Production Example: Software Delivery Platform¶
Overview¶
This example shows how to build a production-ready software delivery platform with Polycrate on bare-metal servers. The stack includes:
- Highly available Kubernetes cluster (3 control planes, 6 workers)
- Floating IP for control plane high availability (Keepalived)
- Hardening of all nodes (SSH, firewall, security updates)
- Cilium as CNI with BGP, Gateway API, and Hubble
- Longhorn as distributed storage
- VictoriaMetrics for monitoring with Grafana
- VictoriaLogs for centralized logging
- cert-manager for automated TLS certificates (Let's Encrypt)
- NGINX Ingress Controller with monitoring integration
Architecture:
graph TB
subgraph "Controlplanes (HA)"
CP1[Controlplane 1<br/>10.0.0.11]
CP2[Controlplane 2<br/>10.0.0.12]
CP3[Controlplane 3<br/>10.0.0.13]
VIP[Floating IP<br/>10.0.0.27<br/>keepalived]
CP1 --> VIP
CP2 --> VIP
CP3 --> VIP
end
subgraph "Workers"
W1[Worker 1]
W2[Worker 2]
W3[Worker 3]
W4[Worker 4]
W5[Worker 5]
W6[Worker 6]
end
VIP --> K8s[Kubernetes API<br/>:6443]
subgraph "CNI & Networking"
Cilium[Cilium CNI<br/>+ BGP + Gateway API]
Hubble[Hubble Observability]
end
subgraph "Storage"
Longhorn[Longhorn<br/>Replicated Storage]
end
subgraph "Ingress & TLS"
Ingress[NGINX Ingress]
CertManager[cert-manager<br/>Let's Encrypt]
end
subgraph "Observability"
VM[VictoriaMetrics<br/>+ Grafana]
VL[VictoriaLogs<br/>Syslog :514]
end
K8s --> Cilium
K8s --> Longhorn
K8s --> Ingress
K8s --> CertManager
K8s --> VM
K8s --> VL
Cilium --> Hubble
style VIP fill:#e74c3c,color:#fff
style K8s fill:#326ce5,color:#fff
style Cilium fill:#f1c40f,color:#000
Workspace Structure¶
polycrate-demo/
├── workspace.poly # Main configuration
├── inventory.yml # Ansible inventory (hosts)
└── blocks/ # Blocks pulled from PolyHub
└── cargo.ayedo.cloud/
└── ayedo/
├── k8s/
│ ├── cilium/
│ ├── cert-manager/
│ ├── victoria-metrics-stack/
│ ├── victoria-logs/
│ ├── longhorn/
│ ├── nginx/
│ └── library/
└── linux/
├── hardening/
├── keepalived/
└── k8s-1.27/
Prerequisites¶
Hardware¶
- Control planes: 3 servers (at least 4 vCPU, 8 GB RAM, 100 GB SSD)
- Workers: 6 servers (at least 8 vCPU, 16 GB RAM, 500 GB SSD)
- Network: all servers in the same layer 2 network (required for the floating IP)
Software¶
- Operating system: Ubuntu 22.04 LTS or Debian 12
- SSH access: a user with sudo privileges on all servers
- DNS: wildcard DNS for ingress (e.g. *.example.com)
- Cloudflare: account + API key for the Let's Encrypt DNS01 challenge
Local Installation¶
# Install Polycrate
curl -sSL https://get.polycrate.io | bash
# Create a workspace
mkdir -p ~/polycrate-workspaces/production
cd ~/polycrate-workspaces/production
polycrate workspace init --with-name production
Configure the Inventory¶
Create inventory.yml with your server IPs:
all:
hosts:
prod-controlplane-1:
ansible_host: 10.0.0.11
ansible_user: "demo"
ansible_ssh_port: 22
ansible_python_interpreter: "/usr/bin/python3"
ansible_become: "yes"
prod-controlplane-2:
ansible_host: 10.0.0.12
ansible_user: "demo"
ansible_ssh_port: 22
ansible_python_interpreter: "/usr/bin/python3"
ansible_become: "yes"
prod-controlplane-3:
ansible_host: 10.0.0.13
ansible_user: "demo"
ansible_ssh_port: 22
ansible_python_interpreter: "/usr/bin/python3"
ansible_become: "yes"
prod-worker-1:
ansible_host: 10.0.0.21
ansible_user: "demo"
ansible_ssh_port: 22
ansible_python_interpreter: "/usr/bin/python3"
ansible_become: "yes"
prod-worker-2:
ansible_host: 10.0.0.22
ansible_user: "demo"
ansible_ssh_port: 22
ansible_python_interpreter: "/usr/bin/python3"
ansible_become: "yes"
prod-worker-3:
ansible_host: 10.0.0.23
ansible_user: "demo"
ansible_ssh_port: 22
ansible_python_interpreter: "/usr/bin/python3"
ansible_become: "yes"
prod-worker-4:
ansible_host: 10.0.0.24
ansible_user: "demo"
ansible_ssh_port: 22
ansible_python_interpreter: "/usr/bin/python3"
ansible_become: "yes"
prod-worker-5:
ansible_host: 10.0.0.25
ansible_user: "demo"
ansible_ssh_port: 22
ansible_python_interpreter: "/usr/bin/python3"
ansible_become: "yes"
prod-worker-6:
ansible_host: 10.0.0.26
ansible_user: "demo"
ansible_ssh_port: 22
ansible_python_interpreter: "/usr/bin/python3"
ansible_become: "yes"
children:
k8s_controlplane:
hosts:
prod-controlplane-1:
prod-controlplane-2:
prod-controlplane-3:
k8s_worker:
hosts:
prod-worker-1:
prod-worker-2:
prod-worker-3:
prod-worker-4:
prod-worker-5:
prod-worker-6:
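Before wiring this inventory into Polycrate, it is worth checking SSH connectivity with plain Ansible (a quick sanity check, assuming Ansible is installed on your workstation):
# Ping all hosts from the inventory via SSH
ansible -i inventory.yml all -m ping
# Show the resolved group/host structure
ansible-inventory -i inventory.yml --graph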
Workspace Configuration¶
Create workspace.poly:
# YAML anchors for reusable values
project-name: &project-name production
k3s-token: &k3s-token "your-secure-k3s-token-here"
letsencrypt-email: &letsencrypt-email "admin@example.com"
cloudflare-api-email: &cloudflare-api-email "cloudflare@example.com"
cloudflare-api-key: &cloudflare-api-key "your-cloudflare-api-key"
ssh_port: &ssh_port 22
ssh_user: &ssh_user demo
name: *project-name
organization: &organization example-org
# OCI registry for private blocks (optional)
registry:
endpoint: registry.example.com
username: robot+production-user
password: registry-password
blocks:
# 1. Library (base block with helper functions)
- name: library
from: cargo.ayedo.cloud/ayedo/k8s/library:0.0.6
# 2. Hardening (security for all nodes)
- name: hardening
from: cargo.ayedo.cloud/ayedo/linux/hardening:0.1.12
config:
hosts: all
firewall:
enabled: false # dedizierter Router/Firewall vorhanden
ssh:
port: *ssh_port
user: *ssh_user
# 3. Keepalived (floating IP for the control planes)
- name: keepalived-controlplanes
from: cargo.ayedo.cloud/ayedo/linux/keepalived:0.0.1
config:
hosts: k8s_controlplane
app:
virtual_ips:
- 10.0.0.27/24 # Floating IP
password: vrrp-password-change-me
master: prod-controlplane-1
interface: eth0
# 4. Kubernetes Cluster (K3s)
- name: k8s
from: cargo.ayedo.cloud/ayedo/linux/k8s-1.27:0.3.6
config:
external_cloud_provider: false
service_lb: false
cluster_name: *project-name
init:
restart_nodes: false
kube_proxy:
disabled: true # Cilium übernimmt kube-proxy
oidc:
enabled: true
issuer_url: "https://id.example.com/"
client_id: "kube-apiserver"
username_claim: "email"
groups_claim: "groups"
apt:
packages:
- apparmor
- apparmor-utils
- unattended-upgrades
- python3-pip
flannel:
disabled: true # Cilium als CNI
k3s:
token: *k3s-token
version: v1.34.1+k3s1
tlssans:
- "10.0.0.27" # Floating IP für API-Server
ansible_host_groups:
controlplane: k8s_controlplane
worker: "k8s_worker"
# 5. Cilium CNI (Container Network Interface)
- name: cilium
from: cargo.ayedo.cloud/ayedo/k8s/cilium:0.2.12
kubeconfig:
from: k8s
config:
bandwidth_manager: true
host_port: true
kube_proxy_replacement: true
node_port: true
gateway_api: true
k8s_service_host: 10.0.0.27 # Floating IP
k8s_service_port: 6443
loadbalancer_algorithm: maglev
hubble:
enabled: true
loadbalancer_acceleration: best-effort
loadbalancer_mode: snat
routing_mode: tunnel
routing_cidr: "172.16.187.0/24"
auto_direct_node_routes: false
masquerade_enabled: true
masquerade_disabled: false
bpf_masquerade_disabled: false
encryption:
enabled: false
external_ips: true
bgp_controlplane: true
l2announcements: false
# 6. cert-manager (TLS certificates)
- name: cert-manager
from: cargo.ayedo.cloud/ayedo/k8s/cert-manager:0.4.0
kubeconfig:
from: k8s
config:
letsencrypt:
email: *letsencrypt-email
solver:
type:
- dns01
provider: cloudflare
create_secret: true
cloudflare:
api_email: *cloudflare-api-email
api_key: *cloudflare-api-key
# 7. VictoriaMetrics Stack (Monitoring + Grafana)
- name: monitoring
from: cargo.ayedo.cloud/ayedo/k8s/victoria-metrics-stack:0.12.5
kubeconfig:
from: k8s
config:
labels:
cluster: *project-name
customer: *organization
workspace: *project-name
grafana:
enabled: true
custom_dashboards: true
persistence:
enabled: true
storageclassname: longhorn
size: 30Gi
admin_password: "grafana-admin-password-change-me"
ingress:
enabled: true
host: grafana.example.com
class: nginx
tls:
enabled: true
issuer: letsencrypt-production
oidc:
auto_login: true
enabled: true
name: "SSO Login"
client_id: "grafana-client-id"
client_secret: "grafana-client-secret"
auth_url: "https://id.example.com/application/o/authorize/"
token_url: "https://id.example.com/application/o/token/"
api_url: "https://id.example.com/application/o/userinfo/"
allow_sign_up: true
scopes: "openid profile email"
admin_group: "Admins"
editor_group: "Editors"
vm:
replicationfactor: 3
storage:
replicas: 2
size: 100Gi
storageclassname: "longhorn"
memory_limit: 10Gi
cpu_limit: 2
select:
replicas: 2
size: 10Gi
storageclassname: "longhorn"
memory_limit: 8Gi
cpu_limit: 2
insert:
replicas: 2
memory_limit: 2Gi
cpu_limit: 1
agent:
memory_limit: 2Gi
cpu_limit: 2
# 8. VictoriaLogs (centralized logging)
- name: logs
from: cargo.ayedo.cloud/ayedo/k8s/victoria-logs:0.1.4
kubeconfig:
from: k8s
config:
namespace: victoria-logs
syslog:
enabled: true
loadbalancer:
enabled: true
ip: 10.0.0.28 # Dedizierte IP für Syslog
pvc:
class: longhorn
size: 100Gi
resources:
limits:
cpu: 2
memory: 8Gi
requests:
cpu: 1
memory: 4Gi
ingress:
enabled: false
monitoring:
enabled: true
vmservicescrape:
enabled: true
backup:
enabled: false
# 9. Longhorn (Distributed Block Storage)
- name: longhorn
from: cargo.ayedo.cloud/ayedo/k8s/longhorn:0.2.3
kubeconfig:
from: k8s
config:
volume_replica_count: 3
# 10. NGINX Ingress Controller
- name: ingress
from: cargo.ayedo.cloud/ayedo/k8s/nginx:0.2.2
kubeconfig:
from: k8s
config:
metrics:
enabled: true
dashboards:
enabled: true
vmservicescrape:
enabled: true
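The placeholder tokens and passwords in workspace.poly (k3s token, VRRP password, Grafana admin password) should be replaced with strong random values. One simple way to generate them, assuming OpenSSL is available (note that classic VRRP authentication only uses the first 8 characters of the password):
# Generate random values for the secret placeholders above
openssl rand -hex 32    # k3s token
openssl rand -hex 16    # keepalived VRRP password
openssl rand -base64 24 # Grafana admin password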
Deployment Process¶
Phase 1: Hardening & Floating IP¶
# 1. Harden all nodes
polycrate run hardening install
# 2. Keepalived for the floating IP on the control planes
polycrate run keepalived-controlplanes install
# Verify: the floating IP should be reachable
ping 10.0.0.27
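Optionally, verify that the floating IP actually fails over: stop keepalived on the current master and check that the IP stays reachable (a sketch, using the polycrate ssh helper described under Troubleshooting):
# On the master node, stop keepalived temporarily
polycrate ssh prod-controlplane-1
sudo systemctl stop keepalived
# From your workstation: the floating IP should still respond (now served by a backup node)
ping 10.0.0.27
# Start keepalived on the master again afterwards
sudo systemctl start keepalived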
Phase 2: Kubernetes Cluster¶
# 3. Provision the Kubernetes cluster (takes ~10-15 min)
polycrate run k8s install
# The kubeconfig is generated automatically at:
# artifacts/blocks/k8s/kubeconfig.yml
# Verify
export KUBECONFIG=artifacts/blocks/k8s/kubeconfig.yml
kubectl get nodes
# Should show 3 control planes + 6 workers (NotReady, since the CNI is still missing)
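You can also confirm that the API server answers on the floating IP, which Cilium will later use as k8s_service_host (the IP is part of the TLS SANs configured above; depending on anonymous-auth settings you will get either a version response or an authorization error, both of which prove connectivity):
# The API server should respond on the floating IP
curl -k https://10.0.0.27:6443/version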
Phase 3: Networking & Storage¶
# 4. Deploy the Cilium CNI
polycrate run cilium install
# Nodes should now become Ready
kubectl get nodes
# 5. Deploy Longhorn storage
polycrate run longhorn install
# Verify
kubectl get storageclass
# Should show "longhorn" StorageClass
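A quick smoke test for Longhorn: create a small test PVC on the longhorn StorageClass and check that it is provisioned (hypothetical names, cleaned up afterwards):
# Create a 1Gi test volume on the longhorn StorageClass
kubectl create namespace storage-test
kubectl -n storage-test apply -f - <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: longhorn-smoke-test
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: longhorn
  resources:
    requests:
      storage: 1Gi
EOF
# The PVC should become Bound (or stay Pending until a pod uses it, depending on the volume binding mode)
kubectl -n storage-test get pvc longhorn-smoke-test
# Clean up
kubectl delete namespace storage-test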
Phase 4: Ingress & TLS¶
# 6. Deploy cert-manager
polycrate run cert-manager install
# 7. Deploy the NGINX ingress controller
polycrate run ingress install
# Verify
kubectl get pods -n ingress-nginx
kubectl get clusterissuer
# Should show: letsencrypt-production, letsencrypt-staging
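To see whether the ClusterIssuers have registered successfully with the ACME endpoint, inspect their status conditions:
# Both issuers should report Ready once ACME registration has succeeded
kubectl get clusterissuer -o wide
kubectl describe clusterissuer letsencrypt-production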
Phase 5: Observability¶
# 8. Deploy VictoriaMetrics + Grafana
polycrate run monitoring install
# 9. Deploy VictoriaLogs
polycrate run logs install
# Verify
kubectl get pods -n victoria-metrics-stack
kubectl get pods -n victoria-logs
# Open Grafana
# https://grafana.example.com
# (log in via OIDC or with admin / grafana-admin-password-change-me)
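To check the VictoriaLogs syslog endpoint, you can send a test message to the dedicated LoadBalancer IP (a sketch using the util-linux logger tool; whether UDP or TCP is expected depends on the block's syslog configuration):
# Send a test syslog message to the VictoriaLogs listener
logger --server 10.0.0.28 --port 514 --udp "polycrate syslog smoke test"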
Workflows for Complete Deployments¶
You can define workflows to deploy multiple blocks in sequence:
# Add to workspace.poly:
workflows:
- name: deploy-complete-stack
steps:
- name: hardening
block: hardening
action: install
- name: keepalived
block: keepalived-controlplanes
action: install
- name: kubernetes
block: k8s
action: install
- name: cilium
block: cilium
action: install
- name: longhorn
block: longhorn
action: install
- name: cert-manager
block: cert-manager
action: install
- name: ingress
block: ingress
action: install
- name: monitoring
block: monitoring
action: install
- name: logs
block: logs
action: install
Then run the workflow with the Polycrate CLI.
Verifikation & Testing¶
Cluster-Status¶
# Nodes
kubectl get nodes -o wide
# Pods across all namespaces
kubectl get pods -A
# Storage
kubectl get pv,pvc -A
# Ingress
kubectl get ingress -A
Cilium Status¶
# Cilium status
kubectl exec -n kube-system ds/cilium -- cilium status
# Hubble UI (optional)
kubectl port-forward -n kube-system svc/hubble-ui 12000:80
# http://localhost:12000
Monitoring¶
# VictoriaMetrics Targets
kubectl port-forward -n victoria-metrics-stack svc/vmsingle-vm 8429:8429
# http://localhost:8429/targets
# Grafana
# https://grafana.example.com
# Default Dashboards:
# - Kubernetes / Compute Resources / Cluster
# - Kubernetes / Networking / Cluster
# - Cilium Metrics
# - NGINX Ingress Controller
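With the port-forward from above still running, you can also query the Prometheus-compatible API directly to confirm that metrics are being ingested (assuming the vmsingle service used in the port-forward above):
# All scrape targets that are currently up
curl -s 'http://localhost:8429/api/v1/query?query=up'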
Test Deployment¶
Test the platform with a simple application:
# test-app.yaml
apiVersion: v1
kind: Namespace
metadata:
name: test-app
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: nginx
namespace: test-app
spec:
replicas: 3
selector:
matchLabels:
app: nginx
template:
metadata:
labels:
app: nginx
spec:
containers:
- name: nginx
image: nginx:latest
ports:
- containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
name: nginx
namespace: test-app
spec:
selector:
app: nginx
ports:
- port: 80
targetPort: 80
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: nginx
namespace: test-app
annotations:
cert-manager.io/cluster-issuer: letsencrypt-production
spec:
ingressClassName: nginx
tls:
- hosts:
- test.example.com
secretName: test-app-tls
rules:
- host: test.example.com
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: nginx
port:
number: 80
kubectl apply -f test-app.yaml
# Wait for the TLS certificate
kubectl get certificate -n test-app -w
# Test
curl https://test.example.com
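Once the certificate is issued and curl returns the NGINX welcome page, the test resources can be removed again:
# Remove the test application
kubectl delete -f test-app.yaml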
Wartung & Updates¶
Kubernetes-Updates¶
# K3s auf neue Version updaten
# In workspace.poly:
# k3s.version: v1.35.0+k3s1
polycrate run k8s upgrade
Block Updates¶
# Pull a new block version
polycrate blocks pull cargo.ayedo.cloud/ayedo/k8s/cilium:0.3.0
# Adjust the version in workspace.poly
# from: cargo.ayedo.cloud/ayedo/k8s/cilium:0.3.0
# Perform the upgrade
polycrate run cilium upgrade
Backup & Disaster Recovery¶
# Velero for cluster backups (optional)
polycrate blocks pull cargo.ayedo.cloud/ayedo/k8s/velero:latest
# Add to workspace.poly and configure it
# Then deploy:
polycrate run velero install
Troubleshooting¶
Analyzing Logs¶
# Polycrate logs
ls -la .logs/
# Latest transaction
cat .logs/latest.log
# Ansible output
cat .logs/<transaction-id>/ansible-output.log
SSH Debugging¶
# Direct SSH access to a node
polycrate ssh prod-controlplane-1
# Test the Ansible inventory
polycrate inventory list
polycrate inventory ping
Kubernetes Debugging¶
# Events
kubectl get events -A --sort-by='.lastTimestamp'
# Pod logs
kubectl logs -n <namespace> <pod-name>
# Describe for details
kubectl describe pod -n <namespace> <pod-name>
Best Practices¶
1. Versioning¶
# Always pin specific versions
from: cargo.ayedo.cloud/ayedo/k8s/cilium:0.2.12
# Not:
from: cargo.ayedo.cloud/ayedo/k8s/cilium:latest
2. Secrets Management¶
Important: secrets such as API keys, passwords, and tokens do not belong in Git!
Recommended approaches:
- Separate secrets file: create a secrets.poly file (not tracked in Git) containing the sensitive values
- Manual entry: put secrets directly into workspace.poly right before deployment, but do not commit them
- External secrets management: use tools such as HashiCorp Vault, SOPS, or age encryption
Example .gitignore:
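A minimal sketch that keeps the secrets file and locally generated artifacts (such as the kubeconfig under artifacts/) out of the repository:
# Local secrets (never commit)
secrets.poly
# Generated artifacts and logs may contain credentials (e.g. the kubeconfig)
artifacts/
.logs/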
3. Testing¶
- Test updates in a staging environment first
- Use separate workspaces for dev/staging/production
- Document changes in Git commits
4. Monitoring¶
- Set up alerting in Grafana
- Monitor resource usage (CPU, RAM, disk)
- Set PodDisruptionBudgets for critical apps (see the sketch below)
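A PodDisruptionBudget sketch for the nginx test deployment from above (adjust name, namespace, and selector for your own workloads):
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: nginx
  namespace: test-app
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: nginx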
5. Backups¶
- Take regular backups with Velero
- Test your restore processes
- Document recovery procedures
Resources¶
- PolyHub: hub.polycrate.io
- Polycrate Docs: docs.polycrate.io
- Cilium Docs: docs.cilium.io
- VictoriaMetrics Docs: docs.victoriametrics.com
- Longhorn Docs: longhorn.io/docs
Next Steps¶
- Workflows: more complex deployment pipelines
- Best Practices: production tips
- Troubleshooting: problem solving
- Cloud Migration: from hyperscalers to your own infrastructure
Production-Ready Stack
With this setup you have a fully functional, highly available Kubernetes platform that meets the requirements of a modern software delivery platform: HA, monitoring, logging, automated TLS, distributed storage, and security hardening.