Production Example: Software Delivery Platform¶
Overview¶
This example shows how to build a production-ready software delivery platform with Polycrate on bare-metal servers. The stack includes:
- Highly available Kubernetes cluster (3 control planes, 6 workers)
- Floating IP for control plane high availability (Keepalived)
- Hardening of all nodes (SSH, firewall, security updates)
- Cilium as CNI with BGP, Gateway API, and Hubble
- Longhorn as distributed storage
- VictoriaMetrics for monitoring with Grafana
- VictoriaLogs for centralized logging
- cert-manager for automated TLS certificates (Let's Encrypt)
- NGINX Ingress Controller with monitoring integration
Architecture:
graph TB
subgraph "Controlplanes (HA)"
CP1[Controlplane 1<br/>10.0.0.11]
CP2[Controlplane 2<br/>10.0.0.12]
CP3[Controlplane 3<br/>10.0.0.13]
VIP[Floating IP<br/>10.0.0.27<br/>keepalived]
CP1 --> VIP
CP2 --> VIP
CP3 --> VIP
end
subgraph "Workers"
W1[Worker 1]
W2[Worker 2]
W3[Worker 3]
W4[Worker 4]
W5[Worker 5]
W6[Worker 6]
end
VIP --> K8s[Kubernetes API<br/>:6443]
subgraph "CNI & Networking"
Cilium[Cilium CNI<br/>+ BGP + Gateway API]
Hubble[Hubble Observability]
end
subgraph "Storage"
Longhorn[Longhorn<br/>Replicated Storage]
end
subgraph "Ingress & TLS"
Ingress[NGINX Ingress]
CertManager[cert-manager<br/>Let's Encrypt]
end
subgraph "Observability"
VM[VictoriaMetrics<br/>+ Grafana]
VL[VictoriaLogs<br/>Syslog :514]
end
K8s --> Cilium
K8s --> Longhorn
K8s --> Ingress
K8s --> CertManager
K8s --> VM
K8s --> VL
Cilium --> Hubble
style VIP fill:#e74c3c,color:#fff
style K8s fill:#326ce5,color:#fff
style Cilium fill:#f1c40f,color:#000
Workspace Structure¶
polycrate-demo/
├── workspace.poly # Main configuration
├── inventory.yml # Ansible inventory (hosts)
└── blocks/ # Blocks pulled from PolyHub
└── cargo.ayedo.cloud/
└── ayedo/
├── k8s/
│ ├── cilium/
│ ├── cert-manager/
│ ├── victoria-metrics-stack/
│ ├── victoria-logs/
│ ├── longhorn/
│ ├── nginx/
│ └── library/
└── linux/
├── hardening/
├── keepalived/
└── k8s-1.27/
Prerequisites¶
Hardware¶
- Control planes: 3 servers (at least 4 vCPU, 8 GB RAM, 100 GB SSD)
- Workers: 6 servers (at least 8 vCPU, 16 GB RAM, 500 GB SSD)
- Network: all servers in the same layer 2 network (required for the floating IP)
Software¶
- Operating system: Ubuntu 22.04 LTS or Debian 12
- SSH access: a user with sudo privileges on all servers
- DNS: wildcard DNS for ingress (e.g. *.example.com)
- Cloudflare: account + API key for the Let's Encrypt DNS01 challenge
Local Installation¶
# Install Polycrate
curl -sSL https://get.polycrate.io | bash
# Create a workspace
mkdir -p ~/polycrate-workspaces/production
cd ~/polycrate-workspaces/production
polycrate workspace init --with-name production
Configure the Inventory¶
Create inventory.yml with your server IPs:
all:
hosts:
prod-controlplane-1:
ansible_host: 10.0.0.11
ansible_user: "demo"
ansible_ssh_port: 22
ansible_python_interpreter: "/usr/bin/python3"
ansible_become: "yes"
prod-controlplane-2:
ansible_host: 10.0.0.12
ansible_user: "demo"
ansible_ssh_port: 22
ansible_python_interpreter: "/usr/bin/python3"
ansible_become: "yes"
prod-controlplane-3:
ansible_host: 10.0.0.13
ansible_user: "demo"
ansible_ssh_port: 22
ansible_python_interpreter: "/usr/bin/python3"
ansible_become: "yes"
prod-worker-1:
ansible_host: 10.0.0.21
ansible_user: "demo"
ansible_ssh_port: 22
ansible_python_interpreter: "/usr/bin/python3"
ansible_become: "yes"
prod-worker-2:
ansible_host: 10.0.0.22
ansible_user: "demo"
ansible_ssh_port: 22
ansible_python_interpreter: "/usr/bin/python3"
ansible_become: "yes"
prod-worker-3:
ansible_host: 10.0.0.23
ansible_user: "demo"
ansible_ssh_port: 22
ansible_python_interpreter: "/usr/bin/python3"
ansible_become: "yes"
prod-worker-4:
ansible_host: 10.0.0.24
ansible_user: "demo"
ansible_ssh_port: 22
ansible_python_interpreter: "/usr/bin/python3"
ansible_become: "yes"
prod-worker-5:
ansible_host: 10.0.0.25
ansible_user: "demo"
ansible_ssh_port: 22
ansible_python_interpreter: "/usr/bin/python3"
ansible_become: "yes"
prod-worker-6:
ansible_host: 10.0.0.26
ansible_user: "demo"
ansible_ssh_port: 22
ansible_python_interpreter: "/usr/bin/python3"
ansible_become: "yes"
children:
k8s_controlplane:
hosts:
prod-controlplane-1:
prod-controlplane-2:
prod-controlplane-3:
k8s_worker:
hosts:
prod-worker-1:
prod-worker-2:
prod-worker-3:
prod-worker-4:
prod-worker-5:
prod-worker-6:
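Before wiring this inventory into Polycrate, it is worth checking SSH connectivity with plain Ansible (a quick sanity check, assuming Ansible is installed on your workstation):
# Ping all hosts from the inventory via SSH
ansible -i inventory.yml all -m ping
# Show the resolved group/host structure
ansible-inventory -i inventory.yml --graph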
Workspace Configuration¶
Create workspace.poly:
# YAML anchors for reusable values
project-name: &project-name production
k3s-token: &k3s-token "your-secure-k3s-token-here"
letsencrypt-email: &letsencrypt-email "admin@example.com"
cloudflare-api-email: &cloudflare-api-email "cloudflare@example.com"
cloudflare-api-key: &cloudflare-api-key "your-cloudflare-api-key"
ssh_port: &ssh_port 22
ssh_user: &ssh_user demo
name: *project-name
organization: &organization example-org
# OCI registry for private blocks (optional)
registry:
endpoint: registry.example.com
username: robot+production-user
password: registry-password
blocks:
# 1. Library (base block with helper functions)
- name: library
from: cargo.ayedo.cloud/ayedo/k8s/library:0.0.6
# 2. Hardening (security for all nodes)
- name: hardening
from: cargo.ayedo.cloud/ayedo/linux/hardening:0.1.12
config:
hosts: all
firewall:
enabled: false # dedizierter Router/Firewall vorhanden
ssh:
port: *ssh_port
user: *ssh_user
# 3. Keepalived (floating IP for the control planes)
- name: keepalived-controlplanes
from: cargo.ayedo.cloud/ayedo/linux/keepalived:0.0.1
config:
hosts: k8s_controlplane
app:
virtual_ips:
- 10.0.0.27/24 # Floating IP
password: vrrp-password-change-me
master: prod-controlplane-1
interface: eth0
# 4. Kubernetes Cluster (K3s)
- name: k8s
from: cargo.ayedo.cloud/ayedo/linux/k8s-1.27:0.3.6
config:
external_cloud_provider: false
service_lb: false
cluster_name: *project-name
init:
restart_nodes: false
kube_proxy:
disabled: true # Cilium übernimmt kube-proxy
oidc:
enabled: true
issuer_url: "https://id.example.com/"
client_id: "kube-apiserver"
username_claim: "email"
groups_claim: "groups"
apt:
packages:
- apparmor
- apparmor-utils
- unattended-upgrades
- python3-pip
flannel:
disabled: true # Cilium als CNI
k3s:
token: *k3s-token
version: v1.34.1+k3s1
tlssans:
- "10.0.0.27" # Floating IP für API-Server
ansible_host_groups:
controlplane: k8s_controlplane
worker: "k8s_worker"
# 5. Cilium CNI (Container Network Interface)
- name: cilium
from: cargo.ayedo.cloud/ayedo/k8s/cilium:0.2.12
kubeconfig:
from: k8s
config:
bandwidth_manager: true
host_port: true
kube_proxy_replacement: true
node_port: true
gateway_api: true
k8s_service_host: 10.0.0.27 # Floating IP
k8s_service_port: 6443
loadbalancer_algorithm: maglev
hubble:
enabled: true
loadbalancer_acceleration: best-effort
loadbalancer_mode: snat
routing_mode: tunnel
routing_cidr: "172.16.187.0/24"
auto_direct_node_routes: false
masquerade_enabled: true
masquerade_disabled: false
bpf_masquerade_disabled: false
encryption:
enabled: false
external_ips: true
bgp_controlplane: true
l2announcements: false
# 6. cert-manager (TLS certificates)
- name: cert-manager
from: cargo.ayedo.cloud/ayedo/k8s/cert-manager:0.4.0
kubeconfig:
from: k8s
config:
letsencrypt:
email: *letsencrypt-email
solver:
type:
- dns01
provider: cloudflare
create_secret: true
cloudflare:
api_email: *cloudflare-api-email
api_key: *cloudflare-api-key
# 7. VictoriaMetrics Stack (Monitoring + Grafana)
- name: monitoring
from: cargo.ayedo.cloud/ayedo/k8s/victoria-metrics-stack:0.12.5
kubeconfig:
from: k8s
config:
labels:
cluster: *project-name
customer: *organization
workspace: *project-name
grafana:
enabled: true
custom_dashboards: true
persistence:
enabled: true
storageclassname: longhorn
size: 30Gi
admin_password: "grafana-admin-password-change-me"
ingress:
enabled: true
host: grafana.example.com
class: nginx
tls:
enabled: true
issuer: letsencrypt-production
oidc:
auto_login: true
enabled: true
name: "SSO Login"
client_id: "grafana-client-id"
client_secret: "grafana-client-secret"
auth_url: "https://id.example.com/application/o/authorize/"
token_url: "https://id.example.com/application/o/token/"
api_url: "https://id.example.com/application/o/userinfo/"
allow_sign_up: true
scopes: "openid profile email"
admin_group: "Admins"
editor_group: "Editors"
vm:
replicationfactor: 3
storage:
replicas: 2
size: 100Gi
storageclassname: "longhorn"
memory_limit: 10Gi
cpu_limit: 2
select:
replicas: 2
size: 10Gi
storageclassname: "longhorn"
memory_limit: 8Gi
cpu_limit: 2
insert:
replicas: 2
memory_limit: 2Gi
cpu_limit: 1
agent:
memory_limit: 2Gi
cpu_limit: 2
# 8. VictoriaLogs (centralized logging)
- name: logs
from: cargo.ayedo.cloud/ayedo/k8s/victoria-logs:0.1.4
kubeconfig:
from: k8s
config:
namespace: victoria-logs
syslog:
enabled: true
loadbalancer:
enabled: true
ip: 10.0.0.28 # Dedizierte IP für Syslog
pvc:
class: longhorn
size: 100Gi
resources:
limits:
cpu: 2
memory: 8Gi
requests:
cpu: 1
memory: 4Gi
ingress:
enabled: false
monitoring:
enabled: true
vmservicescrape:
enabled: true
backup:
enabled: false
# 9. Longhorn (Distributed Block Storage)
- name: longhorn
from: cargo.ayedo.cloud/ayedo/k8s/longhorn:0.2.3
kubeconfig:
from: k8s
config:
volume_replica_count: 3
# 10. NGINX Ingress Controller
- name: ingress
from: cargo.ayedo.cloud/ayedo/k8s/nginx:0.2.2
kubeconfig:
from: k8s
config:
metrics:
enabled: true
dashboards:
enabled: true
vmservicescrape:
enabled: true
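The placeholder tokens and passwords in workspace.poly (k3s token, VRRP password, Grafana admin password) should be replaced with strong random values. One simple way to generate them, assuming OpenSSL is available (note that classic VRRP authentication only uses the first 8 characters of the password):
# Generate random values for the secret placeholders above
openssl rand -hex 32    # k3s token
openssl rand -hex 16    # keepalived VRRP password
openssl rand -base64 24 # Grafana admin password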
Deployment Process¶
Phase 1: Hardening & Floating IP¶
# 1. Harden all nodes
polycrate run hardening install
# 2. Keepalived for the floating IP on the control planes
polycrate run keepalived-controlplanes install
# Verify: the floating IP should be reachable
ping 10.0.0.27
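Optionally, verify that the floating IP actually fails over: stop keepalived on the current master and check that the IP stays reachable (a sketch, using the polycrate ssh helper described under Troubleshooting):
# On the master node, stop keepalived temporarily
polycrate ssh prod-controlplane-1
sudo systemctl stop keepalived
# From your workstation: the floating IP should still respond (now served by a backup node)
ping 10.0.0.27
# Start keepalived on the master again afterwards
sudo systemctl start keepalived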
Phase 2: Kubernetes Cluster¶
# 3. Provision the Kubernetes cluster (takes ~10-15 min)
polycrate run k8s install
# The kubeconfig is generated automatically at:
# artifacts/blocks/k8s/kubeconfig.yml
# Verify
export KUBECONFIG=artifacts/blocks/k8s/kubeconfig.yml
kubectl get nodes
# Should show 3 control planes + 6 workers (NotReady, since the CNI is still missing)
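You can also confirm that the API server answers on the floating IP, which Cilium will later use as k8s_service_host (the IP is part of the TLS SANs configured above; depending on anonymous-auth settings you will get either a version response or an authorization error, both of which prove connectivity):
# The API server should respond on the floating IP
curl -k https://10.0.0.27:6443/version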
Phase 3: Networking & Storage¶
# 4. Deploy the Cilium CNI
polycrate run cilium install
# Nodes should now become Ready
kubectl get nodes
# 5. Deploy Longhorn storage
polycrate run longhorn install
# Verify
kubectl get storageclass
# Should show "longhorn" StorageClass
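A quick smoke test for Longhorn: create a small test PVC on the longhorn StorageClass and check that it is provisioned (hypothetical names, cleaned up afterwards):
# Create a 1Gi test volume on the longhorn StorageClass
kubectl create namespace storage-test
kubectl -n storage-test apply -f - <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: longhorn-smoke-test
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: longhorn
  resources:
    requests:
      storage: 1Gi
EOF
# The PVC should become Bound (or stay Pending until a pod uses it, depending on the volume binding mode)
kubectl -n storage-test get pvc longhorn-smoke-test
# Clean up
kubectl delete namespace storage-test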
Phase 4: Ingress & TLS¶
# 6. Deploy cert-manager
polycrate run cert-manager install
# 7. Deploy the NGINX ingress controller
polycrate run ingress install
# Verify
kubectl get pods -n ingress-nginx
kubectl get clusterissuer
# Should show: letsencrypt-production, letsencrypt-staging
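To see whether the ClusterIssuers have registered successfully with the ACME endpoint, inspect their status conditions:
# Both issuers should report Ready once ACME registration has succeeded
kubectl get clusterissuer -o wide
kubectl describe clusterissuer letsencrypt-production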
Phase 5: Observability¶
# 8. Deploy VictoriaMetrics + Grafana
polycrate run monitoring install
# 9. Deploy VictoriaLogs
polycrate run logs install
# Verify
kubectl get pods -n victoria-metrics-stack
kubectl get pods -n victoria-logs
# Open Grafana
# https://grafana.example.com
# (log in via OIDC or with admin / grafana-admin-password-change-me)
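To check the VictoriaLogs syslog endpoint, you can send a test message to the dedicated LoadBalancer IP (a sketch using the util-linux logger tool; whether UDP or TCP is expected depends on the block's syslog configuration):
# Send a test syslog message to the VictoriaLogs listener
logger --server 10.0.0.28 --port 514 --udp "polycrate syslog smoke test"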
Workflows for Complete Deployments¶
You can define workflows to deploy multiple blocks in sequence:
# Add to workspace.poly:
workflows:
- name: deploy-complete-stack
steps:
- name: hardening
block: hardening
action: install
- name: keepalived
block: keepalived-controlplanes
action: install
- name: kubernetes
block: k8s
action: install
- name: cilium
block: cilium
action: install
- name: longhorn
block: longhorn
action: install
- name: cert-manager
block: cert-manager
action: install
- name: ingress
block: ingress
action: install
- name: monitoring
block: monitoring
action: install
- name: logs
block: logs
action: install
Then run the workflow with the Polycrate CLI.
Verifikation & Testing¶
Cluster-Status¶
# Nodes
kubectl get nodes -o wide
# Pods across all namespaces
kubectl get pods -A
# Storage
kubectl get pv,pvc -A
# Ingress
kubectl get ingress -A
Cilium Status¶
# Cilium status
kubectl exec -n kube-system ds/cilium -- cilium status
# Hubble UI (optional)
kubectl port-forward -n kube-system svc/hubble-ui 12000:80
# http://localhost:12000
Monitoring¶
# VictoriaMetrics Targets
kubectl port-forward -n victoria-metrics-stack svc/vmsingle-vm 8429:8429
# http://localhost:8429/targets
# Grafana
# https://grafana.example.com
# Default Dashboards:
# - Kubernetes / Compute Resources / Cluster
# - Kubernetes / Networking / Cluster
# - Cilium Metrics
# - NGINX Ingress Controller
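With the port-forward from above still running, you can also query the Prometheus-compatible API directly to confirm that metrics are being ingested (assuming the vmsingle service used in the port-forward above):
# All scrape targets that are currently up
curl -s 'http://localhost:8429/api/v1/query?query=up'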
Test Deployment¶
Test the platform with a simple application:
# test-app.yaml
apiVersion: v1
kind: Namespace
metadata:
name: test-app
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: nginx
namespace: test-app
spec:
replicas: 3
selector:
matchLabels:
app: nginx
template:
metadata:
labels:
app: nginx
spec:
containers:
- name: nginx
image: nginx:latest
ports:
- containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
name: nginx
namespace: test-app
spec:
selector:
app: nginx
ports:
- port: 80
targetPort: 80
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: nginx
namespace: test-app
annotations:
cert-manager.io/cluster-issuer: letsencrypt-production
spec:
ingressClassName: nginx
tls:
- hosts:
- test.example.com
secretName: test-app-tls
rules:
- host: test.example.com
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: nginx
port:
number: 80
kubectl apply -f test-app.yaml
# Wait for the TLS certificate
kubectl get certificate -n test-app -w
# Test
curl https://test.example.com
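Once the certificate is issued and curl returns the NGINX welcome page, the test resources can be removed again:
# Remove the test application
kubectl delete -f test-app.yaml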
Wartung & Updates¶
Kubernetes-Updates¶
# K3s auf neue Version updaten
# In workspace.poly:
# k3s.version: v1.35.0+k3s1
polycrate run k8s upgrade
Block Updates¶
# Pull a new block version
polycrate blocks pull cargo.ayedo.cloud/ayedo/k8s/cilium:0.3.0
# Adjust the version in workspace.poly
# from: cargo.ayedo.cloud/ayedo/k8s/cilium:0.3.0
# Perform the upgrade
polycrate run cilium upgrade
Backup & Disaster Recovery¶
# Velero for cluster backups (optional)
polycrate blocks pull cargo.ayedo.cloud/ayedo/k8s/velero:latest
# Add to workspace.poly and configure it
# Then deploy:
polycrate run velero install
Troubleshooting¶
Analyzing Logs¶
# Polycrate logs
ls -la .logs/
# Latest transaction
cat .logs/latest.log
# Ansible output
cat .logs/<transaction-id>/ansible-output.log
SSH Debugging¶
# Direct SSH access to a node
polycrate ssh prod-controlplane-1
# Test the Ansible inventory
polycrate inventory list
polycrate inventory ping
Kubernetes Debugging¶
# Events
kubectl get events -A --sort-by='.lastTimestamp'
# Pod logs
kubectl logs -n <namespace> <pod-name>
# Describe for details
kubectl describe pod -n <namespace> <pod-name>
Best Practices¶
1. Versioning¶
# Always pin specific versions
from: cargo.ayedo.cloud/ayedo/k8s/cilium:0.2.12
# Not:
from: cargo.ayedo.cloud/ayedo/k8s/cilium:latest
2. Secrets Management¶
Important: secrets such as API keys, passwords, and tokens do not belong in Git!
Recommended approaches:
- Separate secrets file: create a secrets.poly file (not tracked in Git) containing the sensitive values
- Manual entry: put secrets directly into workspace.poly right before deployment, but do not commit them
- External secrets management: use tools such as HashiCorp Vault, SOPS, or age encryption
Example .gitignore:
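A minimal sketch that keeps the secrets file and locally generated artifacts (such as the kubeconfig under artifacts/) out of the repository:
# Local secrets (never commit)
secrets.poly
# Generated artifacts and logs may contain credentials (e.g. the kubeconfig)
artifacts/
.logs/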
3. Testing¶
- Test updates in a staging environment first
- Use separate workspaces for dev/staging/production
- Document changes in Git commits
4. Monitoring¶
- Set up alerting in Grafana
- Monitor resource usage (CPU, RAM, disk)
- Set PodDisruptionBudgets for critical apps (see the sketch below)
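A PodDisruptionBudget sketch for the nginx test deployment from above (adjust name, namespace, and selector for your own workloads):
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: nginx
  namespace: test-app
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: nginx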
5. Backups¶
- Take regular backups with Velero
- Test your restore processes
- Document recovery procedures
Resources¶
- PolyHub: hub.polycrate.io
- Polycrate Docs: docs.polycrate.io
- Cilium Docs: docs.cilium.io
- VictoriaMetrics Docs: docs.victoriametrics.com
- Longhorn Docs: longhorn.io/docs
Next Steps¶
- Workflows: more complex deployment pipelines
- Best Practices: production tips
- Troubleshooting: problem solving
- Cloud Migration: from hyperscalers to your own infrastructure
Production-Ready Stack
With this setup you have a fully functional, highly available Kubernetes platform that meets the requirements of a modern software delivery platform: HA, monitoring, logging, automated TLS, distributed storage, and security hardening.