DevOps Fundamentals: CI/CD, Docker, Kubernetes, Automation, Monitoring & Infrastructure as Code
This post is a comprehensive introduction to DevOps fundamentals, covering CI/CD, Docker, Kubernetes, automation, monitoring, and Infrastructure as Code with practical examples.
In a Nutshell
DevOps is a culture and methodology that brings software development (Dev) and IT operations (Ops) together in order to automate and accelerate the software delivery pipeline.
Technical Overview
DevOps is an approach that bridges the gap between development and operations through automation, collaboration, and continuous improvement.
Core components:
Continuous Integration/Continuous Deployment (CI/CD)
- Version Control: Git, GitHub, GitLab, Bitbucket
- Build Automation: Jenkins, GitHub Actions, GitLab CI
- Testing: Unit Tests, Integration Tests, E2E Tests
- Deployment: Automated Rollouts, Blue/Green, Canary
Containerization
- Docker: container platform for application isolation
- Docker Compose: multi-container applications
- Container Registry: Docker Hub, Harbor, AWS ECR
- Image Optimization: Multi-stage Builds, Layer Caching
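The Docker Compose entry above can be illustrated with a minimal sketch. Service names, ports, and credentials below are illustrative assumptions, not taken from a real project:

```yaml
# docker-compose.yml - hypothetical two-service stack (app + database)
services:
  app:
    build: .                 # build the image from the local Dockerfile
    ports:
      - "3000:3000"          # host:container port mapping
    environment:
      DATABASE_URL: postgres://app:secret@db:5432/app
    depends_on:
      - db                   # start the database before the app
  db:
    image: postgres:15-alpine
    environment:
      POSTGRES_USER: app
      POSTGRES_PASSWORD: secret
      POSTGRES_DB: app
    volumes:
      - db-data:/var/lib/postgresql/data   # persist data across restarts
volumes:
  db-data:
```

`docker compose up -d` starts both containers on a shared network in which services reach each other by service name (here, the app connects to the host `db`).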
Orchestration
- Kubernetes: container orchestration platform
- Services: Pods, Deployments, Services, Ingress
- Configuration: ConfigMaps, Secrets, Helm Charts
- Scaling: Horizontal Pod Autoscaling, Cluster Autoscaling
Infrastructure as Code (IaC)
- Terraform: Multi-Cloud Infrastructure Provisioning
- Ansible: Configuration Management
- CloudFormation: AWS-native IaC
- Pulumi: programmable infrastructure in general-purpose languages
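Ansible is the only IaC tool listed here without a longer example later in the post, so here is a minimal playbook sketch; the host group, package, and inventory names are assumptions:

```yaml
# playbook.yml - hypothetical playbook: install and start nginx on all web hosts
- name: Configure web servers
  hosts: webservers          # host group defined in the inventory file
  become: true               # escalate privileges for package/service tasks
  tasks:
    - name: Install nginx
      ansible.builtin.apt:
        name: nginx
        state: present
        update_cache: true
    - name: Ensure nginx is running and enabled
      ansible.builtin.service:
        name: nginx
        state: started
        enabled: true
```

Run it with `ansible-playbook -i inventory playbook.yml`; because the modules are idempotent, repeated runs only report changes when something actually had to change.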
Monitoring & Observability
- Metrics: Prometheus, Grafana, InfluxDB
- Logging: ELK Stack, Fluentd, Loki
- Tracing: Jaeger, Zipkin, OpenTelemetry
- APM: Application Performance Monitoring
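Metrics collection only becomes actionable with alerting. A minimal Prometheus alerting rule could look like the following sketch; the metric name and threshold are illustrative assumptions:

```yaml
# alert_rules.yml - hypothetical alert: fire when the 5xx error rate
# stays above 5% for five minutes
groups:
  - name: app-alerts
    rules:
      - alert: HighErrorRate
        expr: |
          sum(rate(http_requests_total{status=~"5.."}[5m]))
            / sum(rate(http_requests_total[5m])) > 0.05
        for: 5m                  # condition must hold this long before firing
        labels:
          severity: critical
        annotations:
          summary: "HTTP 5xx error rate above 5%"
```

Prometheus evaluates the `expr` on every evaluation interval; `for` prevents flapping alerts by requiring the condition to hold continuously before Alertmanager is notified.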
Exam-Relevant Key Points
- DevOps: culture and methodology for software development and operations
- CI/CD: Continuous Integration and Continuous Deployment
- Docker: container platform for application isolation
- Kubernetes: container orchestration platform
- Infrastructure as Code: automated infrastructure management
- Monitoring: observing systems and applications
- Automation: automating recurring tasks
- GitOps: Git-based operations workflows
- IHK-relevant: modern DevOps practices and tools
Core Components
- Version Control: Git-Workflows, Branching-Strategien
- CI/CD Pipeline: Build, Test, Deploy, Monitor
- Containerisierung: Docker, Container-Images, Registry
- Orchestrierung: Kubernetes, Services, Scaling
- IaC: Terraform, Ansible, Configuration Management
- Monitoring: Metrics, Logging, Tracing
- Security: Scanning, Compliance, Secret Management
- Collaboration: Team-Workflows, Communication
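The secret-management bullet can be made concrete with a minimal Kubernetes Secret manifest; the name and keys below are illustrative assumptions:

```yaml
# secret.yaml - hypothetical Secret; values are base64-encoded, not encrypted
apiVersion: v1
kind: Secret
metadata:
  name: app-secrets
type: Opaque
stringData:                  # stringData accepts plain text; the API server encodes it
  DATABASE_PASSWORD: change-me
  JWT_SECRET: change-me-too
```

Keep in mind that base64 is encoding, not encryption: restrict access via RBAC and, for secrets stored in Git, consider tools such as Sealed Secrets or External Secrets Operator instead of plain manifests.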
Practical Examples
1. CI/CD Pipeline with GitHub Actions
```yaml
# .github/workflows/ci-cd.yml
name: CI/CD Pipeline

on:
  push:
    branches: [ main, develop ]
  pull_request:
    branches: [ main ]
  release:
    types: [ published ]

env:
  REGISTRY: ghcr.io
  IMAGE_NAME: ${{ github.repository }}
  NODE_VERSION: '18'
  PYTHON_VERSION: '3.11'

jobs:
  # Code quality and security
  quality:
    name: Code Quality & Security
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v4
        with:
          fetch-depth: 0

      - name: Setup Node.js
        uses: actions/setup-node@v4
        with:
          node-version: ${{ env.NODE_VERSION }}
          cache: 'npm'

      - name: Setup Python
        uses: actions/setup-python@v4
        with:
          python-version: ${{ env.PYTHON_VERSION }}
          cache: 'pip'

      - name: Install dependencies
        run: |
          npm ci
          pip install -r requirements.txt
          pip install -r requirements-dev.txt

      - name: Run ESLint
        run: npm run lint

      - name: Run Prettier check
        run: npm run format:check

      - name: Run Python linting
        run: |
          flake8 src/
          black --check src/
          isort --check-only src/

      - name: Run security scan
        run: |
          npm audit --audit-level moderate
          safety check

      - name: Run SonarCloud scan
        uses: SonarSource/sonarcloud-github-action@master
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          SONAR_TOKEN: ${{ secrets.SONAR_TOKEN }}

  # Testing
  test:
    name: Test Suite
    runs-on: ubuntu-latest
    strategy:
      matrix:
        node-version: [16, 18, 20]
        # quote Python versions so YAML does not parse them as numbers
        python-version: ['3.9', '3.11', '3.12']
    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Setup Node.js ${{ matrix.node-version }}
        uses: actions/setup-node@v4
        with:
          node-version: ${{ matrix.node-version }}
          cache: 'npm'

      - name: Setup Python ${{ matrix.python-version }}
        uses: actions/setup-python@v4
        with:
          python-version: ${{ matrix.python-version }}
          cache: 'pip'

      - name: Install dependencies
        run: |
          npm ci
          pip install -r requirements.txt
          pip install -r requirements-test.txt

      - name: Run unit tests
        run: |
          npm run test:unit
          pytest tests/unit/ -v --cov=src --cov-report=xml

      - name: Run integration tests
        run: |
          npm run test:integration
          pytest tests/integration/ -v

      - name: Upload coverage to Codecov
        uses: codecov/codecov-action@v3
        with:
          file: ./coverage.xml
          flags: unittests
          name: codecov-umbrella

  # Build and scan the Docker image
  build:
    name: Build Docker Image
    runs-on: ubuntu-latest
    needs: [quality, test]
    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3

      - name: Log in to Container Registry
        uses: docker/login-action@v3
        with:
          registry: ${{ env.REGISTRY }}
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}

      - name: Extract metadata
        id: meta
        uses: docker/metadata-action@v5
        with:
          images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
          # the full-length sha tag lets later jobs reference ${{ github.sha }}
          tags: |
            type=ref,event=branch
            type=ref,event=pr
            type=sha,prefix={{branch}}-
            type=sha,format=long,prefix=
            type=raw,value=latest,enable={{is_default_branch}}

      - name: Build and push Docker image
        uses: docker/build-push-action@v5
        with:
          context: .
          platforms: linux/amd64,linux/arm64
          push: true
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}
          cache-from: type=gha
          cache-to: type=gha,mode=max

      - name: Run container security scan
        uses: aquasecurity/trivy-action@master
        with:
          image-ref: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:${{ github.sha }}
          format: 'sarif'
          output: 'trivy-results.sarif'

      - name: Upload Trivy scan results to GitHub Security tab
        uses: github/codeql-action/upload-sarif@v2
        with:
          sarif_file: 'trivy-results.sarif'

  # Deploy to staging
  deploy-staging:
    name: Deploy to Staging
    runs-on: ubuntu-latest
    needs: build
    if: github.ref == 'refs/heads/develop'
    environment: staging
    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Setup kubectl
        uses: azure/setup-kubectl@v3
        with:
          version: 'v1.28.0'

      - name: Configure kubectl
        # persist KUBECONFIG for all following steps via GITHUB_ENV
        run: |
          echo "${{ secrets.KUBE_CONFIG_STAGING }}" | base64 -d > kubeconfig
          echo "KUBECONFIG=$PWD/kubeconfig" >> "$GITHUB_ENV"

      - name: Deploy to Kubernetes
        run: |
          helm upgrade --install app-staging ./helm/app \
            --namespace staging \
            --create-namespace \
            --set image.tag=${{ github.sha }} \
            --set environment=staging \
            --values helm/values-staging.yaml

      - name: Run smoke tests
        run: |
          kubectl wait --for=condition=ready pod -l app=app-staging -n staging --timeout=300s
          npm run test:smoke -- --env=staging

      - name: Run integration tests against staging
        run: npm run test:integration -- --env=staging

  # Deploy to production
  deploy-production:
    name: Deploy to Production
    runs-on: ubuntu-latest
    needs: build
    if: github.event_name == 'release'
    environment: production
    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Setup kubectl
        uses: azure/setup-kubectl@v3
        with:
          version: 'v1.28.0'

      - name: Configure kubectl
        run: |
          echo "${{ secrets.KUBE_CONFIG_PRODUCTION }}" | base64 -d > kubeconfig
          echo "KUBECONFIG=$PWD/kubeconfig" >> "$GITHUB_ENV"

      - name: Deploy to Kubernetes (Blue/Green)
        run: |
          # Deploy the new version to the green environment
          helm upgrade --install app-green ./helm/app \
            --namespace production \
            --set image.tag=${{ github.sha }} \
            --set environment=production \
            --set deployment.color=green \
            --values helm/values-production.yaml

          # Wait for the green deployment to become ready
          kubectl wait --for=condition=ready pod -l app=app-green,color=green -n production --timeout=600s

          # Run health checks against green before switching traffic
          npm run test:health -- --env=production-green

          # Switch the service selector to green
          kubectl patch service app-production -n production -p '{"spec":{"selector":{"color":"green"}}}'

          # Give the traffic switch time to settle, then run final smoke tests
          sleep 30
          npm run test:smoke -- --env=production

      - name: Cleanup blue environment
        run: |
          helm uninstall app-blue -n production || true
          kubectl delete deployment app-blue -n production || true

      - name: Notify deployment
        if: always()
        uses: 8398a7/action-slack@v3
        with:
          status: ${{ job.status }}
          channel: '#deployments'
          webhook_url: ${{ secrets.SLACK_WEBHOOK }}

  # Performance testing
  performance:
    name: Performance Testing
    runs-on: ubuntu-latest
    needs: deploy-staging
    if: github.ref == 'refs/heads/develop'
    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Setup k6
        run: |
          sudo gpg -k
          sudo gpg --no-default-keyring --keyring /usr/share/keyrings/k6-archive-keyring.gpg --keyserver hkp://keyserver.ubuntu.com:80 --recv-keys C5AD17C747E3415A3642D57D77C6C491D6AC1D69
          echo "deb [signed-by=/usr/share/keyrings/k6-archive-keyring.gpg] https://dl.k6.io/deb stable main" | sudo tee /etc/apt/sources.list.d/k6.list
          sudo apt-get update
          sudo apt-get install k6

      - name: Run performance tests
        run: k6 run --out json=performance-results.json tests/performance/load-test.js

      - name: Upload performance results
        uses: actions/upload-artifact@v3
        with:
          name: performance-results
          path: performance-results.json

      - name: Analyze performance
        run: npm run analyze:performance -- performance-results.json

  # Documentation
  docs:
    name: Build Documentation
    runs-on: ubuntu-latest
    needs: test
    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Setup Node.js
        uses: actions/setup-node@v4
        with:
          node-version: ${{ env.NODE_VERSION }}
          cache: 'npm'

      - name: Install dependencies
        run: npm ci

      - name: Build documentation
        run: |
          npm run docs:build
          npm run docs:generate-api

      - name: Deploy to GitHub Pages
        uses: peaceiris/actions-gh-pages@v3
        if: github.ref == 'refs/heads/main'
        with:
          github_token: ${{ secrets.GITHUB_TOKEN }}
          publish_dir: ./docs/build
```
A scheduled workflow for dependency updates belongs in its own file, since each workflow file may define only one `name:`/`on:` block:

```yaml
# .github/workflows/dependency-updates.yml
name: Dependency Updates

on:
  schedule:
    - cron: '0 2 * * 1'   # every Monday at 02:00 UTC
  workflow_dispatch:

env:
  NODE_VERSION: '18'
  PYTHON_VERSION: '3.11'

jobs:
  update-dependencies:
    name: Update Dependencies
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v4
        with:
          token: ${{ secrets.GITHUB_TOKEN }}

      - name: Setup Node.js
        uses: actions/setup-node@v4
        with:
          node-version: ${{ env.NODE_VERSION }}
          cache: 'npm'

      - name: Setup Python
        uses: actions/setup-python@v4
        with:
          python-version: ${{ env.PYTHON_VERSION }}
          cache: 'pip'

      - name: Update Node.js dependencies
        run: |
          npm update
          npm audit fix

      - name: Update Python dependencies
        run: |
          pip install pip-tools
          pip-compile requirements.in
          pip-compile requirements-dev.in

      - name: Run tests
        run: |
          npm ci
          npm run test
          pip install -r requirements.txt
          pytest tests/

      - name: Create Pull Request
        uses: peter-evans/create-pull-request@v5
        with:
          token: ${{ secrets.GITHUB_TOKEN }}
          commit-message: 'chore: update dependencies'
          title: 'chore: update dependencies'
          body: |
            Automated dependency update

            - Updated Node.js dependencies
            - Updated Python dependencies

            Please review the changes and ensure all tests pass.
          branch: chore/update-dependencies
          delete-branch: true
```
2. Docker Multi-Stage Build with Best Practices
```dockerfile
# Multi-stage Dockerfile for a production-ready application
# Stage 1: Build stage
FROM node:18-alpine AS builder

# Build arguments
ARG NODE_ENV=production
ARG APP_VERSION=1.0.0

# Environment variables
ENV NODE_ENV=$NODE_ENV
ENV APP_VERSION=$APP_VERSION

# Install build dependencies (py3-pip is required for the pip install below)
RUN apk add --no-cache \
    python3 \
    py3-pip \
    make \
    g++ \
    git

WORKDIR /app

# Copy dependency manifests first to benefit from layer caching
COPY package*.json ./
COPY requirements.txt ./

# Install all Node.js dependencies (dev dependencies are needed for build and tests)
RUN npm ci && npm cache clean --force

# Install Python dependencies
RUN pip install --no-cache-dir -r requirements.txt

# Copy source code
COPY . .

# Run build and tests
RUN npm run build
RUN npm run test

# Stage 2: Runtime stage
FROM python:3.11-slim AS runtime

# Runtime arguments
ARG APP_USER=appuser
ARG APP_UID=1001
ARG APP_GID=1001

# Environment variables
ENV NODE_ENV=production
ENV PYTHONUNBUFFERED=1
ENV PYTHONDONTWRITEBYTECODE=1
ENV APP_PORT=3000

# Install runtime dependencies (nodejs/npm, because the entrypoint runs npm)
RUN apt-get update && apt-get install -y --no-install-recommends \
    curl \
    ca-certificates \
    nodejs \
    npm \
    && rm -rf /var/lib/apt/lists/*

# Create a non-root user
RUN groupadd -g $APP_GID $APP_USER && \
    useradd -m -u $APP_UID -g $APP_GID -s /bin/bash $APP_USER

WORKDIR /app

# Copy the built application from the builder stage
COPY --from=builder --chown=$APP_USER:$APP_GID /app/dist ./dist
COPY --from=builder --chown=$APP_USER:$APP_GID /app/node_modules ./node_modules
COPY --from=builder --chown=$APP_USER:$APP_GID /app/requirements.txt ./

# Install Python production dependencies
RUN pip install --no-cache-dir -r requirements.txt

# Copy configuration files
COPY --chown=$APP_USER:$APP_GID config/ ./config/
COPY --chown=$APP_USER:$APP_GID scripts/ ./scripts/

# Set permissions
RUN chmod +x scripts/*.sh

# Switch to the non-root user
USER $APP_USER

# Health check
HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
    CMD curl -f http://localhost:$APP_PORT/health || exit 1

# Expose the application port
EXPOSE $APP_PORT

ENTRYPOINT ["./scripts/entrypoint.sh"]
CMD ["npm", "start"]

# Stage 3: Development stage
FROM runtime AS development

# Override environment for development
ENV NODE_ENV=development

# Switch to root to install development tools
USER root

RUN apt-get update && apt-get install -y --no-install-recommends \
    git \
    vim \
    && rm -rf /var/lib/apt/lists/*

RUN pip install --no-cache-dir pytest pytest-cov black flake8

# Install Node.js development dependencies
RUN npm install

# Switch back to the app user
USER $APP_USER

# Override command for development
CMD ["npm", "run", "dev"]

# Stage 4: Testing stage
FROM builder AS testing

# Install test and scan dependencies
RUN npm install --no-save
RUN pip install --no-cache-dir pytest pytest-cov safety

# Run the full test suite
RUN npm run test:coverage
RUN pytest tests/ --cov=src --cov-report=xml

# Security scanning
RUN npm audit --audit-level high
RUN safety check

# Stage 5: Security scanning stage
FROM builder AS security

# Install security scanning tools
RUN npm install -g audit-ci
RUN pip install safety bandit

# Run security scans; the reports are written to /app and can be copied
# out of a container built with `--target security` (e.g. via `docker cp`)
RUN audit-ci --moderate
RUN safety check --json > safety-report.json
RUN bandit -r src/ -f json -o bandit-report.json
```
3. Kubernetes Deployment with Helm and GitOps
```yaml
# helm/app/Chart.yaml
apiVersion: v2
name: app
description: A Helm chart for deploying the application
type: application
version: 1.0.0
appVersion: "1.0.0"
home: https://github.com/organization/app
sources:
  - https://github.com/organization/app
maintainers:
  - name: DevOps Team
    email: devops@organization.com
keywords:
  - web
  - application
  - devops
annotations:
  category: WebApplication
```
```yaml
# helm/app/values.yaml
# Default values for the application
replicaCount: 3

image:
  repository: ghcr.io/organization/app
  pullPolicy: IfNotPresent
  tag: "latest"

nameOverride: ""
fullnameOverride: ""

serviceAccount:
  create: true
  annotations: {}
  name: ""

podAnnotations: {}

podSecurityContext:
  fsGroup: 1001

securityContext:
  allowPrivilegeEscalation: false
  runAsNonRoot: true
  runAsUser: 1001
  runAsGroup: 1001
  capabilities:
    drop:
      - ALL
  readOnlyRootFilesystem: true

service:
  type: ClusterIP
  port: 80
  targetPort: 3000

ingress:
  enabled: true
  className: "nginx"
  annotations:
    cert-manager.io/cluster-issuer: "letsencrypt-prod"
    nginx.ingress.kubernetes.io/rate-limit: "100"
    nginx.ingress.kubernetes.io/rate-limit-window: "1m"
  hosts:
    - host: app.example.com
      paths:
        - path: /
          pathType: Prefix
  tls:
    - secretName: app-tls
      hosts:
        - app.example.com

resources:
  limits:
    cpu: 500m
    memory: 512Mi
  requests:
    cpu: 250m
    memory: 256Mi

autoscaling:
  enabled: true
  minReplicas: 3
  maxReplicas: 10
  targetCPUUtilizationPercentage: 70
  targetMemoryUtilizationPercentage: 80

nodeSelector: {}
tolerations: []

affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchExpressions:
              - key: app.kubernetes.io/name
                operator: In
                values:
                  - app
          topologyKey: kubernetes.io/hostname

config:
  environment: production
  logLevel: info
  database:
    host: postgres.example.com
    port: 5432
    name: app_prod
  redis:
    host: redis.example.com
    port: 6379
  monitoring:
    enabled: true
    port: 9090

secrets:
  databasePassword: ""
  jwtSecret: ""
  apiKeys: ""
```
```yaml
# helm/app/templates/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ include "app.fullname" . }}
  labels:
    {{- include "app.labels" . | nindent 4 }}
spec:
  {{- if not .Values.autoscaling.enabled }}
  replicas: {{ .Values.replicaCount }}
  {{- end }}
  selector:
    matchLabels:
      {{- include "app.selectorLabels" . | nindent 6 }}
  template:
    metadata:
      annotations:
        checksum/config: {{ include (print $.Template.BasePath "/configmap.yaml") . | sha256sum }}
        checksum/secret: {{ include (print $.Template.BasePath "/secret.yaml") . | sha256sum }}
        {{- with .Values.podAnnotations }}
        {{- toYaml . | nindent 8 }}
        {{- end }}
      labels:
        {{- include "app.selectorLabels" . | nindent 8 }}
    spec:
      {{- with .Values.imagePullSecrets }}
      imagePullSecrets:
        {{- toYaml . | nindent 8 }}
      {{- end }}
      serviceAccountName: {{ include "app.serviceAccountName" . }}
      securityContext:
        {{- toYaml .Values.podSecurityContext | nindent 8 }}
      initContainers:
        # Block until the database accepts connections
        - name: wait-for-db
          image: postgres:15-alpine
          command:
            - sh
            - -c
            - |
              until pg_isready -h {{ .Values.config.database.host }} -p {{ .Values.config.database.port }}; do
                echo "Waiting for database..."
                sleep 2
              done
        # Run schema migrations before the app starts
        - name: migrate-db
          image: "{{ .Values.image.repository }}:{{ .Values.image.tag | default .Chart.AppVersion }}"
          command:
            - npm
            - run
            - migrate
          envFrom:
            - configMapRef:
                name: {{ include "app.fullname" . }}
            - secretRef:
                name: {{ include "app.fullname" . }}
      containers:
        - name: {{ .Chart.Name }}
          securityContext:
            {{- toYaml .Values.securityContext | nindent 12 }}
          image: "{{ .Values.image.repository }}:{{ .Values.image.tag | default .Chart.AppVersion }}"
          imagePullPolicy: {{ .Values.image.pullPolicy }}
          ports:
            - name: http
              containerPort: {{ .Values.service.targetPort }}
              protocol: TCP
            - name: metrics
              containerPort: {{ .Values.config.monitoring.port }}
              protocol: TCP
          livenessProbe:
            httpGet:
              path: /health
              port: http
            initialDelaySeconds: 30
            periodSeconds: 10
            timeoutSeconds: 5
            failureThreshold: 3
          readinessProbe:
            httpGet:
              path: /ready
              port: http
            initialDelaySeconds: 5
            periodSeconds: 5
            timeoutSeconds: 3
            failureThreshold: 3
          resources:
            {{- toYaml .Values.resources | nindent 12 }}
          envFrom:
            - configMapRef:
                name: {{ include "app.fullname" . }}
            - secretRef:
                name: {{ include "app.fullname" . }}
          volumeMounts:
            - name: tmp
              mountPath: /tmp
            - name: config
              mountPath: /app/config
              readOnly: true
        # Sidecar that ships container logs
        - name: log-shipper
          image: fluent/fluent-bit:2.0
          resources:
            limits:
              cpu: 100m
              memory: 128Mi
            requests:
              cpu: 50m
              memory: 64Mi
          volumeMounts:
            - name: varlog
              mountPath: /var/log
              readOnly: true
            - name: varlibdockercontainers
              mountPath: /var/lib/docker/containers
              readOnly: true
            - name: fluent-bit-config
              mountPath: /fluent-bit/etc/
      volumes:
        - name: tmp
          emptyDir: {}
        - name: config
          configMap:
            name: {{ include "app.fullname" . }}
        - name: varlog
          hostPath:
            path: /var/log
        - name: varlibdockercontainers
          hostPath:
            path: /var/lib/docker/containers
        - name: fluent-bit-config
          configMap:
            name: fluent-bit-config
      {{- with .Values.nodeSelector }}
      nodeSelector:
        {{- toYaml . | nindent 8 }}
      {{- end }}
      {{- with .Values.affinity }}
      affinity:
        {{- toYaml . | nindent 8 }}
      {{- end }}
      {{- with .Values.tolerations }}
      tolerations:
        {{- toYaml . | nindent 8 }}
      {{- end }}
```
```yaml
# helm/app/templates/hpa.yaml
{{- if .Values.autoscaling.enabled }}
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: {{ include "app.fullname" . }}
  labels:
    {{- include "app.labels" . | nindent 4 }}
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: {{ include "app.fullname" . }}
  minReplicas: {{ .Values.autoscaling.minReplicas }}
  maxReplicas: {{ .Values.autoscaling.maxReplicas }}
  metrics:
    {{- if .Values.autoscaling.targetCPUUtilizationPercentage }}
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: {{ .Values.autoscaling.targetCPUUtilizationPercentage }}
    {{- end }}
    {{- if .Values.autoscaling.targetMemoryUtilizationPercentage }}
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: {{ .Values.autoscaling.targetMemoryUtilizationPercentage }}
    {{- end }}
{{- end }}
```
```yaml
# helm/app/templates/monitoring.yaml
{{- if .Values.config.monitoring.enabled }}
apiVersion: v1
kind: Service
metadata:
  name: {{ include "app.fullname" . }}-metrics
  labels:
    {{- include "app.labels" . | nindent 4 }}
spec:
  type: ClusterIP
  ports:
    - port: {{ .Values.config.monitoring.port }}
      targetPort: metrics
      protocol: TCP
      name: metrics
  selector:
    {{- include "app.selectorLabels" . | nindent 4 }}
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: {{ include "app.fullname" . }}
  labels:
    {{- include "app.labels" . | nindent 4 }}
spec:
  selector:
    matchLabels:
      {{- include "app.selectorLabels" . | nindent 6 }}
  endpoints:
    - port: metrics
      interval: 30s
      path: /metrics
{{- end }}
```
```yaml
# GitOps application manifest (Argo CD)
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: app-production
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/organization/app-helm
    targetRevision: HEAD
    path: helm/app
    helm:
      valueFiles:
        - values-production.yaml
  destination:
    server: https://kubernetes.default.svc
    namespace: production
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true
    retry:
      limit: 5
      backoff:
        duration: 5s
        factor: 2
        maxDuration: 3m
```
4. Terraform Infrastructure as Code
```hcl
# terraform/main.tf
provider "aws" {
  region = var.aws_region

  default_tags {
    tags = {
      Environment = var.environment
      Project     = var.project_name
      ManagedBy   = "terraform"
    }
  }
}

# Terraform backend configuration.
# Note: variables are not allowed in backend blocks, so bucket and table
# names must be literals (or supplied via -backend-config at init time).
terraform {
  backend "s3" {
    bucket         = "terraform-state-my-app"
    key            = "infrastructure/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "terraform-locks-my-app"
  }

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
    kubernetes = {
      source  = "hashicorp/kubernetes"
      version = "~> 2.0"
    }
    helm = {
      source  = "hashicorp/helm"
      version = "~> 2.0"
    }
    random = {
      source  = "hashicorp/random"
      version = "~> 3.0"
    }
    null = {
      source  = "hashicorp/null"
      version = "~> 3.0"
    }
  }
}
```
```hcl
# terraform/variables.tf
variable "aws_region" {
  description = "AWS region"
  type        = string
  default     = "us-east-1"
}

variable "environment" {
  description = "Environment name"
  type        = string
  default     = "production"
}

variable "project_name" {
  description = "Project name"
  type        = string
  default     = "my-app"
}

variable "vpc_cidr" {
  description = "CIDR block for VPC"
  type        = string
  default     = "10.0.0.0/16"
}

variable "availability_zones" {
  description = "List of availability zones"
  type        = list(string)
  default     = ["us-east-1a", "us-east-1b", "us-east-1c"]
}

variable "cluster_name" {
  description = "EKS cluster name"
  type        = string
  default     = "my-app-cluster"
}

variable "cluster_version" {
  description = "EKS cluster version"
  type        = string
  default     = "1.28"
}

variable "node_groups" {
  description = "EKS node groups configuration"
  type = map(object({
    instance_type = string
    min_size      = number
    max_size      = number
    desired_size  = number
    disk_size     = number
  }))
  default = {
    general = {
      instance_type = "t3.medium"
      min_size      = 3
      max_size      = 10
      desired_size  = 3
      disk_size     = 50
    }
    compute = {
      instance_type = "c5.large"
      min_size      = 2
      max_size      = 5
      desired_size  = 2
      disk_size     = 100
    }
  }
}
```
```hcl
# terraform/vpc.tf
# VPC
resource "aws_vpc" "main" {
  cidr_block           = var.vpc_cidr
  enable_dns_hostnames = true
  enable_dns_support   = true

  tags = {
    Name = "${var.project_name}-vpc"
  }
}

# Internet Gateway
resource "aws_internet_gateway" "main" {
  vpc_id = aws_vpc.main.id

  tags = {
    Name = "${var.project_name}-igw"
  }
}

# Public subnets
resource "aws_subnet" "public" {
  count                   = length(var.availability_zones)
  vpc_id                  = aws_vpc.main.id
  cidr_block              = cidrsubnet(var.vpc_cidr, 8, count.index)
  availability_zone       = var.availability_zones[count.index]
  map_public_ip_on_launch = true

  tags = {
    Name = "${var.project_name}-public-${count.index}"
    Type = "Public"
  }
}

# Private subnets (note: a NAT gateway plus private route tables would be
# needed for outbound internet access, e.g. for nodes pulling images)
resource "aws_subnet" "private" {
  count             = length(var.availability_zones)
  vpc_id            = aws_vpc.main.id
  cidr_block        = cidrsubnet(var.vpc_cidr, 8, count.index + 3)
  availability_zone = var.availability_zones[count.index]

  tags = {
    Name = "${var.project_name}-private-${count.index}"
    Type = "Private"
  }
}

# Database subnets
resource "aws_subnet" "database" {
  count             = length(var.availability_zones)
  vpc_id            = aws_vpc.main.id
  cidr_block        = cidrsubnet(var.vpc_cidr, 8, count.index + 6)
  availability_zone = var.availability_zones[count.index]

  tags = {
    Name = "${var.project_name}-database-${count.index}"
    Type = "Database"
  }
}

# Route tables
resource "aws_route_table" "public" {
  vpc_id = aws_vpc.main.id

  route {
    cidr_block = "0.0.0.0/0"
    gateway_id = aws_internet_gateway.main.id
  }

  tags = {
    Name = "${var.project_name}-public-rt"
  }
}

resource "aws_route_table_association" "public" {
  count          = length(aws_subnet.public)
  subnet_id      = aws_subnet.public[count.index].id
  route_table_id = aws_route_table.public.id
}
```
```hcl
# EKS cluster
resource "aws_eks_cluster" "main" {
  name     = var.cluster_name
  role_arn = aws_iam_role.eks_cluster.arn
  version  = var.cluster_version

  vpc_config {
    subnet_ids = concat(
      aws_subnet.public[*].id,
      aws_subnet.private[*].id
    )
    endpoint_public_access  = true
    endpoint_private_access = true
    public_access_cidrs     = ["0.0.0.0/0"]
  }

  depends_on = [
    aws_iam_role_policy_attachment.eks_cluster_policy,
  ]

  tags = {
    Name = var.cluster_name
  }
}

# EKS node groups
resource "aws_eks_node_group" "main" {
  for_each = var.node_groups

  cluster_name    = aws_eks_cluster.main.name
  node_group_name = each.key
  node_role_arn   = aws_iam_role.eks_node.arn
  subnet_ids      = aws_subnet.private[*].id

  scaling_config {
    desired_size = each.value.desired_size
    max_size     = each.value.max_size
    min_size     = each.value.min_size
  }

  instance_types = [each.value.instance_type]
  disk_size      = each.value.disk_size

  remote_access {
    # assumes an aws_key_pair "main" defined elsewhere in the configuration
    ec2_ssh_key               = aws_key_pair.main.key_name
    source_security_group_ids = [aws_security_group.eks_nodes.id]
  }

  depends_on = [
    aws_iam_role_policy_attachment.eks_worker_node_policy,
    aws_iam_role_policy_attachment.eks_cni_policy,
    aws_iam_role_policy_attachment.eks_container_registry_policy,
  ]

  tags = {
    Name = "${var.cluster_name}-${each.key}"
    Type = each.key
  }
}
```
```hcl
# IAM roles
resource "aws_iam_role" "eks_cluster" {
  name = "${var.project_name}-eks-cluster-role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Action = "sts:AssumeRole"
        Effect = "Allow"
        Principal = {
          Service = "eks.amazonaws.com"
        }
      }
    ]
  })
}

resource "aws_iam_role_policy_attachment" "eks_cluster_policy" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKSClusterPolicy"
  role       = aws_iam_role.eks_cluster.name
}

resource "aws_iam_role" "eks_node" {
  name = "${var.project_name}-eks-node-role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Action = "sts:AssumeRole"
        Effect = "Allow"
        Principal = {
          Service = "ec2.amazonaws.com"
        }
      }
    ]
  })
}

resource "aws_iam_role_policy_attachment" "eks_worker_node_policy" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy"
  role       = aws_iam_role.eks_node.name
}

resource "aws_iam_role_policy_attachment" "eks_cni_policy" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy"
  role       = aws_iam_role.eks_node.name
}

resource "aws_iam_role_policy_attachment" "eks_container_registry_policy" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly"
  role       = aws_iam_role.eks_node.name
}

# Security groups
resource "aws_security_group" "eks_cluster" {
  name        = "${var.project_name}-eks-cluster-sg"
  description = "Security group for EKS cluster"
  vpc_id      = aws_vpc.main.id

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }

  tags = {
    Name = "${var.project_name}-eks-cluster-sg"
  }
}

resource "aws_security_group" "eks_nodes" {
  name        = "${var.project_name}-eks-nodes-sg"
  description = "Security group for EKS nodes"
  vpc_id      = aws_vpc.main.id

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }

  tags = {
    Name = "${var.project_name}-eks-nodes-sg"
  }
}
```
```hcl
# RDS database
resource "aws_db_subnet_group" "main" {
  name       = "${var.project_name}-db-subnet-group"
  subnet_ids = aws_subnet.database[*].id

  tags = {
    Name = "${var.project_name}-db-subnet-group"
  }
}

resource "aws_security_group" "rds" {
  name        = "${var.project_name}-rds-sg"
  description = "Security group for RDS database"
  vpc_id      = aws_vpc.main.id

  ingress {
    from_port       = 5432
    to_port         = 5432
    protocol        = "tcp"
    security_groups = [aws_security_group.eks_nodes.id]
  }

  tags = {
    Name = "${var.project_name}-rds-sg"
  }
}

resource "aws_db_instance" "postgres" {
  identifier     = "${var.project_name}-postgres"
  engine         = "postgres"
  engine_version = "15.4"
  instance_class = "db.t3.medium"

  allocated_storage     = 100
  max_allocated_storage = 1000
  storage_type          = "gp2"
  storage_encrypted     = true

  db_name  = "app"
  username = "app_user"
  password = random_password.db_password.result

  db_subnet_group_name   = aws_db_subnet_group.main.name
  vpc_security_group_ids = [aws_security_group.rds.id]

  backup_retention_period = 7
  backup_window           = "03:00-04:00"
  maintenance_window      = "sun:04:00-sun:05:00"

  skip_final_snapshot       = false
  final_snapshot_identifier = "${var.project_name}-postgres-final-snapshot"
  deletion_protection       = true

  tags = {
    Name = "${var.project_name}-postgres"
  }
}

# Redis (ElastiCache)
resource "aws_elasticache_subnet_group" "main" {
  name       = "${var.project_name}-cache-subnet-group"
  subnet_ids = aws_subnet.private[*].id

  tags = {
    Name = "${var.project_name}-cache-subnet-group"
  }
}

resource "aws_security_group" "redis" {
  name        = "${var.project_name}-redis-sg"
  description = "Security group for Redis"
  vpc_id      = aws_vpc.main.id

  ingress {
    from_port       = 6379
    to_port         = 6379
    protocol        = "tcp"
    security_groups = [aws_security_group.eks_nodes.id]
  }

  tags = {
    Name = "${var.project_name}-redis-sg"
  }
}

resource "aws_elasticache_replication_group" "redis" {
  replication_group_id = "${var.project_name}-redis"
  description          = "Redis cluster for ${var.project_name}"

  node_type            = "cache.t3.micro"
  port                 = 6379
  parameter_group_name = "default.redis7"

  num_cache_clusters         = 2
  automatic_failover_enabled = true
  multi_az_enabled           = true

  subnet_group_name  = aws_elasticache_subnet_group.main.name
  security_group_ids = [aws_security_group.redis.id]

  at_rest_encryption_enabled = true
  transit_encryption_enabled = true
  auth_token                 = random_password.redis_auth_token.result

  snapshot_retention_limit = 7
  snapshot_window          = "05:00-06:00"
  maintenance_window       = "sun:06:00-sun:07:00"

  tags = {
    Name = "${var.project_name}-redis"
  }
}
```
# S3 Buckets
resource "aws_s3_bucket" "app_storage" {
bucket = "${var.project_name}-storage-${random_string.bucket_suffix.result}"
tags = {
Name = "${var.project_name}-storage"
}
}
resource "aws_s3_bucket_versioning" "app_storage" {
bucket = aws_s3_bucket.app_storage.id
versioning_configuration {
status = "Enabled"
}
}
resource "aws_s3_bucket_encryption" "app_storage" {
bucket = aws_s3_bucket.app_storage.id
server_side_encryption_configuration {
rule {
apply_server_side_encryption_by_default {
sse_algorithm = "AES256"
}
}
}
}
resource "aws_s3_bucket_public_access_block" "app_storage" {
bucket = aws_s3_bucket.app_storage.id
block_public_acls = true
block_public_policy = true
ignore_public_acls = true
restrict_public_buckets = true
}
# Random resources
resource "random_password" "db_password" {
  length           = 32
  special          = true
  override_special = "!#$%&*()-_=+[]{}<>:?"
}

resource "random_password" "redis_auth_token" {
  length           = 64
  special          = true
  override_special = "!#$%&*()-_=+[]{}<>:?"
}

resource "random_string" "bucket_suffix" {
  length  = 8
  special = false
  upper   = false
}
# Outputs
output "cluster_name" {
  description = "EKS cluster name"
  value       = aws_eks_cluster.main.name
}

output "cluster_endpoint" {
  description = "EKS cluster endpoint"
  value       = aws_eks_cluster.main.endpoint
}

output "cluster_certificate_authority_data" {
  description = "EKS cluster certificate authority data"
  value       = aws_eks_cluster.main.certificate_authority[0].data
}

output "database_endpoint" {
  description = "RDS database endpoint"
  value       = aws_db_instance.postgres.endpoint
  sensitive   = true
}

output "redis_endpoint" {
  description = "Redis endpoint"
  value       = aws_elasticache_replication_group.redis.primary_endpoint_address
  sensitive   = true
}

output "storage_bucket" {
  description = "S3 storage bucket name"
  value       = aws_s3_bucket.app_storage.bucket
}
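These outputs are what later automation steps consume. A minimal sketch of wiring them into a kubeconfig update, assuming `terraform output -json` is run in the module directory; the AWS region and the command assembly are illustrative, not part of the Terraform above:

```python
import json
import subprocess

def read_tf_outputs(tf_dir: str) -> dict:
    """Run `terraform output -json` and return {name: value} for all outputs."""
    raw = subprocess.run(
        ["terraform", "output", "-json"],
        cwd=tf_dir, capture_output=True, text=True, check=True,
    ).stdout
    return {name: meta["value"] for name, meta in json.loads(raw).items()}

def kubeconfig_command(outputs: dict, region: str = "eu-central-1") -> list:
    """Build the `aws eks update-kubeconfig` call from the cluster_name output.

    The region default is an assumption for illustration."""
    return ["aws", "eks", "update-kubeconfig",
            "--name", outputs["cluster_name"], "--region", region]
```

Sensitive outputs (`database_endpoint`, `redis_endpoint`) are still present in the JSON; `sensitive = true` only suppresses them in plain `terraform output` and plan logs.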
5. Monitoring with Prometheus and Grafana
# monitoring/prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s

rule_files:
  - "alert_rules.yml"

alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - alertmanager:9093

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'kubernetes-apiservers'
    kubernetes_sd_configs:
      - role: endpoints
    scheme: https
    tls_config:
      ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
    relabel_configs:
      - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
        action: keep
        regex: default;kubernetes;https

  - job_name: 'kubernetes-nodes'
    kubernetes_sd_configs:
      - role: node
    scheme: https
    tls_config:
      ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
    relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)

  - job_name: 'kubernetes-pods'
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
      - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
        action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
        target_label: __address__
      - action: labelmap
        regex: __meta_kubernetes_pod_label_(.+)
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_pod_name]
        action: replace
        target_label: kubernetes_pod_name

  - job_name: 'kubernetes-services'
    kubernetes_sd_configs:
      - role: service
    relabel_configs:
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_probe]
        action: keep
        regex: true
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
      - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
        action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
        target_label: __address__
      - action: labelmap
        regex: __meta_kubernetes_service_label_(.+)
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_service_name]
        action: replace
        target_label: kubernetes_name
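The `kubernetes-pods` job only scrapes pods that opt in via annotations. A pod that wants to be scraped might look like this (names, image, and port are illustrative assumptions):

```yaml
# Illustrative pod fragment: the annotations match the relabel rules of the
# 'kubernetes-pods' job. Name, image, and port are assumptions.
apiVersion: v1
kind: Pod
metadata:
  name: app
  annotations:
    prometheus.io/scrape: "true"   # matched by the 'keep' rule
    prometheus.io/path: "/metrics" # rewrites __metrics_path__
    prometheus.io/port: "8080"     # rewrites __address__ to <pod-ip>:8080
spec:
  containers:
    - name: app
      image: myapp:latest
      ports:
        - containerPort: 8080
```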
# monitoring/alert_rules.yml
groups:
  - name: kubernetes-apps
    rules:
      - alert: KubernetesPodCrashLooping
        expr: rate(kube_pod_container_status_restarts_total[15m]) > 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Pod {{ $labels.pod }} is crash looping"
          description: "Pod {{ $labels.pod }} in namespace {{ $labels.namespace }} is crash looping."

      - alert: KubernetesPodNotReady
        expr: kube_pod_status_ready{condition="true"} == 0
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Pod {{ $labels.pod }} is not ready"
          description: "Pod {{ $labels.pod }} in namespace {{ $labels.namespace }} is not ready."

      - alert: KubernetesNodeNotReady
        expr: kube_node_status_condition{condition="Ready",status="true"} == 0
        for: 10m
        labels:
          severity: critical
        annotations:
          summary: "Node {{ $labels.node }} is not ready"
          description: "Node {{ $labels.node }} has been not ready for more than 10 minutes."

  - name: application
    rules:
      - alert: HighErrorRate
        expr: rate(http_requests_total{status=~"5.."}[5m]) / rate(http_requests_total[5m]) > 0.1
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "High error rate detected"
          description: "Error rate is {{ $value | humanizePercentage }} for {{ $labels.job }}."

      - alert: HighResponseTime
        expr: histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m])) > 1
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High response time detected"
          description: "95th percentile response time is {{ $value }}s for {{ $labels.job }}."

      - alert: LowThroughput
        expr: rate(http_requests_total[5m]) < 10
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Low throughput detected"
          description: "Request rate is {{ $value }} requests/second for {{ $labels.job }}."

  - name: infrastructure
    rules:
      - alert: HighCPUUsage
        expr: 100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "High CPU usage on {{ $labels.instance }}"
          description: "CPU usage is {{ $value }}% on {{ $labels.instance }}."

      - alert: HighMemoryUsage
        expr: (node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes) / node_memory_MemTotal_bytes * 100 > 85
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "High memory usage on {{ $labels.instance }}"
          description: "Memory usage is {{ $value }}% on {{ $labels.instance }}."

      - alert: DiskSpaceLow
        expr: (node_filesystem_avail_bytes / node_filesystem_size_bytes) * 100 < 10
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Disk space low on {{ $labels.instance }}"
          description: "Disk space is {{ $value }}% available on {{ $labels.device }}."
# grafana/dashboards/app-dashboard.json
{
  "dashboard": {
    "id": null,
    "title": "Application Dashboard",
    "tags": ["app", "production"],
    "timezone": "browser",
    "panels": [
      {
        "id": 1,
        "title": "Request Rate",
        "type": "graph",
        "targets": [
          {
            "expr": "rate(http_requests_total[5m])",
            "legendFormat": "{{method}} {{status}}"
          }
        ],
        "yAxes": [
          { "label": "Requests/sec" }
        ]
      },
      {
        "id": 2,
        "title": "Response Time",
        "type": "graph",
        "targets": [
          {
            "expr": "histogram_quantile(0.50, rate(http_request_duration_seconds_bucket[5m]))",
            "legendFormat": "50th percentile"
          },
          {
            "expr": "histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m]))",
            "legendFormat": "95th percentile"
          },
          {
            "expr": "histogram_quantile(0.99, rate(http_request_duration_seconds_bucket[5m]))",
            "legendFormat": "99th percentile"
          }
        ],
        "yAxes": [
          { "label": "Seconds" }
        ]
      },
      {
        "id": 3,
        "title": "Error Rate",
        "type": "graph",
        "targets": [
          {
            "expr": "rate(http_requests_total{status=~\"5..\"}[5m]) / rate(http_requests_total[5m])",
            "legendFormat": "Error Rate"
          }
        ],
        "yAxes": [
          { "label": "Percentage", "min": 0, "max": 1 }
        ]
      },
      {
        "id": 4,
        "title": "Application Status",
        "type": "stat",
        "targets": [
          {
            "expr": "up{job=\"app\"}",
            "legendFormat": "Application Status"
          }
        ],
        "fieldConfig": {
          "defaults": {
            "mappings": [
              {
                "options": {
                  "0": { "text": "DOWN", "color": "red" },
                  "1": { "text": "UP", "color": "green" }
                },
                "type": "value"
              }
            ]
          }
        }
      }
    ],
    "time": { "from": "now-1h", "to": "now" },
    "refresh": "5s"
  }
}
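The Response Time panel leans on `histogram_quantile`, which estimates a quantile from cumulative histogram buckets by linear interpolation inside the bucket where the quantile falls. A simplified sketch of that estimation (the real PromQL function additionally handles edge cases like NaN inputs and negative bounds):

```python
def histogram_quantile(q: float, buckets: list) -> float:
    """Simplified version of PromQL's histogram_quantile.

    buckets: (upper_bound, cumulative_count) pairs sorted by bound,
    ending with (float('inf'), total) like Prometheus' '+Inf' bucket.
    The first bucket's lower bound is assumed to be 0."""
    total = buckets[-1][1]
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if bound == float("inf"):
                return prev_bound  # quantile falls in the open-ended bucket
            frac = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * frac
        prev_bound, prev_count = bound, count
    return prev_bound
```

This is also why histogram quantiles are estimates: the answer can never be more precise than the bucket boundaries chosen when instrumenting the application.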
DevOps Pipeline Architecture
CI/CD Pipeline Stages
graph TD
    A[Code Commit] --> B[Build Stage]
    B --> C[Test Stage]
    C --> D[Security Scan]
    D --> E[Package Stage]
    E --> F[Deploy Staging]
    F --> G[Integration Tests]
    G --> H[Approve Production]
    H --> I[Deploy Production]
    I --> J[Monitoring]
    J --> K[Rollback if needed]

    A1[Git Push] --> A
    B1[Docker Build] --> B
    C1[Unit Tests] --> C
    C2[Integration Tests] --> C
    D1[Vulnerability Scan] --> D
    E1[Image Registry] --> E
    F1[Kubernetes Deploy] --> F
    G1[E2E Tests] --> G
    H1[Manual Approval] --> H
    I1[Blue/Green Deploy] --> I
    J1[Prometheus/Grafana] --> J
    K1[Automated Rollback] --> K
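The first stages of this pipeline map directly onto a CI workflow. A minimal GitHub Actions sketch of the build/test/deploy-staging stages; the image name, test command (`npm test`), and kubectl deploy step are illustrative assumptions, not a complete pipeline:

```yaml
# Illustrative GitHub Actions workflow for the first pipeline stages above.
name: ci-cd
on:
  push:
    branches: [main]
jobs:
  build-test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Build image          # Build Stage
        run: docker build -t myapp:${{ github.sha }} .
      - name: Unit tests           # Test Stage
        run: docker run --rm myapp:${{ github.sha }} npm test
  deploy-staging:
    needs: build-test              # only runs after build/test succeed
    runs-on: ubuntu-latest
    steps:
      - name: Deploy to staging    # Deploy Staging
        run: kubectl set image deployment/myapp app=myapp:${{ github.sha }}
```

A real pipeline would add the security scan, registry push, approval gate, and production deploy as further jobs in the same chain.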
Containerization Comparison
Container Runtimes
| Runtime | Language | Security | Performance | Use Case |
|---|---|---|---|---|
| Docker | Go | Medium | Good | General purpose |
| containerd | Go | High | Very good | Production |
| CRI-O | Go | High | Good | Kubernetes |
| Podman | Go | High | Good | Daemonless |
Orchestration Platforms
| Platform | Complexity | Scalability | Cloud-Native | Use Case |
|---|---|---|---|---|
| Kubernetes | High | Very high | Yes | Enterprise |
| Docker Swarm | Low | Medium | Partial | Small/medium |
| OpenShift | High | Very high | Yes | Enterprise |
| Nomad | Medium | High | Yes | Multi-cloud |
Infrastructure as Code Tools
Terraform vs. CloudFormation vs. Pulumi
| Tool | Language | Multi-Cloud | State Management | Use Case |
|---|---|---|---|---|
| Terraform | HCL | Yes | Own state file | Multi-cloud |
| CloudFormation | YAML/JSON | No | AWS-managed | AWS-only |
| Pulumi | Multiple (TypeScript, Python, Go, ...) | Yes | Own state file | Programmable |
| Ansible | YAML | Yes | No state | Configuration |
IaC Best Practices
- Modularization: small, reusable modules
- Versioning: Git-based version control
- Testing: automated testing of infrastructure changes
- Documentation: automatically generated documentation
- Security: security scanning and compliance checks
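One concrete form of infrastructure testing is asserting on Terraform's machine-readable plan before applying it. The sketch below parses the JSON from `terraform show -json plan.tfplan` and lists resources the plan would delete, so a pipeline can block unexpected destroys; the gating policy itself is an assumption:

```python
import json

def destructive_changes(plan_json: str) -> list:
    """Return addresses of resources a Terraform plan would delete.

    Expects the output of `terraform show -json plan.tfplan`, whose
    resource_changes[*].change.actions lists actions such as
    ["create"], ["update"], ["delete"], or ["delete", "create"] (replace)."""
    plan = json.loads(plan_json)
    return [
        rc["address"]
        for rc in plan.get("resource_changes", [])
        if "delete" in rc["change"]["actions"]
    ]
```

A CI step could fail the build whenever this list is non-empty unless the change was explicitly approved.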
Monitoring and Observability
Observability Pillars
| Pillar | Tools | Data | Use Case |
|---|---|---|---|
| Metrics | Prometheus, InfluxDB | Numeric data | Performance |
| Logs | ELK Stack, Loki | Textual data | Troubleshooting |
| Traces | Jaeger, Zipkin | Request flows | Distributed systems |
| Events | CloudWatch, EventBridge | State changes | Audit trail |
Alerting Strategies
- Threshold-based: static limits on a metric
- Anomaly detection: automatic detection of unusual behavior
- Predictive: forecasting problems before they occur
- Business metrics: alerts on business-relevant indicators
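The difference between the first two strategies can be shown in a few lines: a threshold alert compares against a fixed limit, while a simple anomaly detector compares against the metric's own recent behavior. A minimal z-score sketch, assuming a small sliding window of recent values (real systems use far more robust models):

```python
from statistics import mean, stdev

def is_anomalous(history: list, value: float, z_threshold: float = 3.0) -> bool:
    """Flag `value` if it deviates more than z_threshold standard
    deviations from the recent history. Unlike a static threshold,
    the 'limit' adapts to whatever level the metric normally has."""
    if len(history) < 2:
        return False  # not enough data to estimate a baseline
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return value != mu  # flat history: any change is anomalous
    return abs(value - mu) / sigma > z_threshold
```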
Advantages and Disadvantages
Advantages of DevOps
- Faster delivery: accelerated software releases
- Higher quality: automated tests and quality assurance
- Better collaboration: integration of Dev and Ops
- Scalability: automated scaling of infrastructure
- Reliability: consistent, repeatable deployments
Disadvantages
- Complexity: high initial complexity
- Cost: investment in tools and training
- Cultural change: requires organizational change
- Learning curve: steep learning curve for teams
- Tool overload: many different tools to master
Common Exam Questions
- What is the difference between CI and CD? CI (Continuous Integration) automates building and testing code on every commit; CD automates the release: Continuous Delivery stops at a production-ready artifact, while Continuous Deployment pushes every passing build to production automatically.
- Explain containerization with Docker. Docker isolates applications in containers together with all of their dependencies, which guarantees consistent environments across different systems.
- When do you use Kubernetes vs. Docker Swarm? Kubernetes for complex, highly scalable applications in enterprise environments; Docker Swarm for simpler setups in small to medium-sized organizations.
- What is Infrastructure as Code? The practice of defining and managing infrastructure through code, which enables automation, review, and versioning.
Key Sources
- https://docs.docker.com/
- https://kubernetes.io/docs/
- https://www.terraform.io/docs/
- https://prometheus.io/docs/