DevOps Fundamentals: CI/CD, Docker, Kubernetes, Automation, Monitoring & Infrastructure as Code

This article is a comprehensive introduction to DevOps fundamentals, covering CI/CD, Docker, Kubernetes, automation, monitoring and Infrastructure as Code, with practical examples.

In a Nutshell

DevOps is a culture and methodology that brings together software development (Dev) and IT operations (Ops) to automate and accelerate the software delivery chain.

Compact Technical Description

DevOps is an approach that closes the gap between development and operations through automation, collaboration and continuous improvement.

Core Components:

Continuous Integration/Continuous Deployment (CI/CD)

Version Control: Git, GitHub, GitLab, Bitbucket
Build Automation: Jenkins, GitHub Actions, GitLab CI
Testing: Unit Tests, Integration Tests, E2E Tests
Deployment: Automated Rollouts, Blue/Green, Canary
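Canary releases send a small, growing share of traffic to a new version and widen it only while metrics stay healthy. Below is a minimal Python sketch of one common way to choose the canary audience. It assumes routing by a hash of a user ID. The function name and the 100-bucket scheme are illustrative and are not the API of any particular load balancer or service mesh.

```python
import hashlib


def routes_to_canary(user_id: str, canary_percent: int) -> bool:
    """Return True if this user should be served by the canary release.

    Hashing the user ID gives each user a stable bucket in [0, 100), so the
    same user keeps hitting the same version while the percentage is unchanged.
    """
    if not 0 <= canary_percent <= 100:
        raise ValueError("canary_percent must be between 0 and 100")
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100
    return bucket < canary_percent


if __name__ == "__main__":
    users = [f"user-{i}" for i in range(10_000)]
    for percent in (0, 5, 25, 100):
        share = sum(routes_to_canary(u, percent) for u in users) / len(users)
        print(f"{percent:3d}% configured -> {share:.1%} of users on canary")
```

Each user keeps a fixed bucket, so raising the percentage from 5 to 25 only adds users to the canary. Nobody switches back and forth between versions mid-rollout. Blue/Green, by contrast, moves all traffic at once.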

Containerization

Docker: Container platform for application isolation
Docker Compose: Multi-container applications
Container Registry: Docker Hub, Harbor, AWS ECR
Image Optimization: Multi-stage Builds, Layer Caching
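Layer caching works because each Dockerfile instruction produces a layer. That layer can be reused when the instruction, its inputs and every layer below it are unchanged. The sketch below models this rule with chained hashes. It is a simplification, since BuildKit's real cache keys include more metadata. The `Step` type and the example steps are hypothetical.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Step:
    instruction: str                                        # e.g. "RUN npm ci"
    inputs: dict[str, bytes] = field(default_factory=dict)  # files read by COPY/ADD


def layer_keys(steps: list[Step]) -> list[str]:
    """One cache key per step, chained to the key of the layer below it."""
    keys: list[str] = []
    parent = ""
    for step in steps:
        h = hashlib.sha256()
        h.update(parent.encode())
        h.update(step.instruction.encode())
        # COPY/ADD steps also depend on the content of the copied files
        for path in sorted(step.inputs):
            h.update(path.encode())
            h.update(hashlib.sha256(step.inputs[path]).digest())
        parent = h.hexdigest()
        keys.append(parent)
    return keys


def rebuilt_steps(old: list[Step], new: list[Step]) -> list[str]:
    """Instructions that miss the cache when `new` is built after `old`."""
    cached = set(layer_keys(old))
    return [s.instruction for s, key in zip(new, layer_keys(new)) if key not in cached]


def example_build(source: bytes) -> list[Step]:
    """Hypothetical Dockerfile: dependency manifests are copied before the source code."""
    return [
        Step("FROM node:18-alpine"),
        Step("COPY package*.json ./", {"package.json": b'{"name": "app"}'}),
        Step("RUN npm ci"),
        Step("COPY . .", {"src/index.js": source}),
        Step("RUN npm run build"),
    ]


# Changing only the source code invalidates the last two layers; `npm ci` stays cached
print(rebuilt_steps(example_build(b"v1"), example_build(b"v2")))
# ['COPY . .', 'RUN npm run build']
```

If `COPY . .` came before `RUN npm ci`, every source edit would rerun the dependency install. This is why Dockerfiles usually copy the dependency manifests first.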

Orchestration

Kubernetes: Container orchestration platform
Resources: Pods, Deployments, Services, Ingress
Configuration: ConfigMaps, Secrets, Helm Charts
Scaling: Horizontal Pod Autoscaling, Cluster Autoscaling
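The Kubernetes documentation gives the Horizontal Pod Autoscaler's core rule as desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue). No scaling happens when the ratio is within a tolerance, which is 0.1 by default. The Python sketch below is simplified. It ignores stabilization windows and the special handling of unready pods and missing metrics.

```python
import math


def desired_replicas(current: int, current_util: float, target_util: float,
                     tolerance: float = 0.1) -> int:
    """desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue)."""
    ratio = current_util / target_util
    if abs(ratio - 1.0) <= tolerance:
        return current  # close enough to the target: no scaling
    return math.ceil(current * ratio)


def hpa_decision(current: int, utilization: dict[str, float], targets: dict[str, float],
                 min_replicas: int, max_replicas: int) -> int:
    """With several metrics the largest proposal wins, then min/max bounds apply."""
    proposals = [desired_replicas(current, utilization[m], targets[m]) for m in targets]
    return max(min_replicas, min(max_replicas, max(proposals)))


# Targets and bounds match the autoscaling block in the Helm values.yaml later in this article
targets = {"cpu": 70, "memory": 80}
print(hpa_decision(3, {"cpu": 140, "memory": 60}, targets, 3, 10))  # 6
print(hpa_decision(3, {"cpu": 35, "memory": 40}, targets, 3, 10))   # 3 (minReplicas)
print(hpa_decision(6, {"cpu": 200, "memory": 50}, targets, 3, 10))  # 10 (maxReplicas)
```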

Infrastructure as Code (IaC)

Terraform: Multi-cloud infrastructure provisioning
Ansible: Configuration management
CloudFormation: AWS-native IaC
Pulumi: Infrastructure as Code written in general-purpose languages such as TypeScript, Python and Go
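Declarative IaC tools compare the desired configuration with the recorded state, then derive a plan of creations, updates and deletions. The sketch below shows a simplified version of that comparison. Real tools such as Terraform also refresh state from the provider, order changes by dependency, and distinguish in-place updates from replacements. The resource addresses, including `aws_instance.legacy`, are made up for illustration.

```python
from typing import Any

State = dict[str, dict[str, Any]]  # resource address -> attributes


def plan(desired: State, current: State) -> dict[str, list[str]]:
    """Compare desired configuration with recorded state (a simplified `terraform plan`)."""
    create = sorted(addr for addr in desired if addr not in current)
    delete = sorted(addr for addr in current if addr not in desired)
    update = sorted(
        addr for addr in desired
        if addr in current and desired[addr] != current[addr]
    )
    return {"create": create, "update": update, "delete": delete}


current = {
    "aws_vpc.main": {"cidr_block": "10.0.0.0/16"},
    "aws_subnet.public[0]": {"cidr_block": "10.0.0.0/24"},
    "aws_instance.legacy": {"instance_type": "t2.micro"},
}
desired = {
    "aws_vpc.main": {"cidr_block": "10.0.0.0/16"},
    "aws_subnet.public[0]": {"cidr_block": "10.0.0.0/24", "map_public_ip_on_launch": True},
    "aws_subnet.public[1]": {"cidr_block": "10.0.1.0/24"},
}
print(plan(desired, current))
# {'create': ['aws_subnet.public[1]'], 'update': ['aws_subnet.public[0]'], 'delete': ['aws_instance.legacy']}
```

Once the real infrastructure matches `desired`, `plan(desired, desired)` returns three empty lists. This is what makes repeated applies idempotent.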

Monitoring & Observability

Metrics: Prometheus, Grafana, InfluxDB
Logging: ELK Stack, Fluentd, Loki
Tracing: Jaeger, Zipkin, OpenTelemetry
APM: Application Performance Monitoring
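Prometheus counters only increase, but they reset to zero when a process restarts. Functions such as `rate()` and `increase()` compensate for these resets. The sketch below shows a simplified version of the reset handling. Prometheus also extrapolates to the edges of the query window, which this sketch omits.

```python
Sample = tuple[float, float]  # (unix timestamp in seconds, counter value)


def counter_increase(samples: list[Sample]) -> float:
    """Increase of a counter over the samples; a drop is treated as a reset to zero."""
    total = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # After a reset the counter restarted from 0, so all of `cur` is new
        total += cur - prev if cur >= prev else cur
    return total


def per_second_rate(samples: list[Sample]) -> float:
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    return counter_increase(samples) / (samples[-1][0] - samples[0][0])


# Scraped every 15 s; the process restarted between t=30 and t=45
samples = [(0, 100), (15, 130), (30, 160), (45, 10), (60, 40)]
print(counter_increase(samples))           # 100.0
print(round(per_second_rate(samples), 3))  # 1.667
```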

Exam-Relevant Key Points

DevOps: Culture and methodology for software development and operations
CI/CD: Continuous Integration and Continuous Delivery/Deployment
Docker: Container platform for application isolation
Kubernetes: Container orchestration platform
Infrastructure as Code: Defining and provisioning infrastructure through versioned code instead of manual steps
Monitoring: Observing systems and applications through metrics, logs and traces
Automation: Automating recurring manual tasks such as builds, tests and deployments
GitOps: Git-based operations workflows
IHK-relevant: Modern DevOps practices and tools

Core Components

Version Control: Git workflows, branching strategies
CI/CD Pipeline: Build, Test, Deploy, Monitor
Containerization: Docker, container images, registry
Orchestration: Kubernetes, services, scaling
IaC: Terraform, Ansible, configuration management
Monitoring: Metrics, logging, tracing
Security: Scanning, compliance, secret management
Collaboration: Team workflows, communication

Practical Examples

1. CI/CD Pipeline with GitHub Actions

# .github/workflows/ci-cd.yml
name: CI/CD Pipeline

on:
  push:
    branches: [ main, develop ]
  pull_request:
    branches: [ main ]
  release:
    types: [ published ]

env:
  REGISTRY: ghcr.io
  IMAGE_NAME: ${{ github.repository }}
  NODE_VERSION: '18'
  PYTHON_VERSION: '3.11'

jobs:
  # Code Quality and Security
  quality:
    name: Code Quality & Security
    runs-on: ubuntu-latest
    
    steps:
    - name: Checkout code
      uses: actions/checkout@v4
      with:
        fetch-depth: 0
    
    - name: Setup Node.js
      uses: actions/setup-node@v4
      with:
        node-version: ${{ env.NODE_VERSION }}
        cache: 'npm'
    
    - name: Setup Python
      uses: actions/setup-python@v4
      with:
        python-version: ${{ env.PYTHON_VERSION }}
        cache: 'pip'
    
    - name: Install dependencies
      run: |
        npm ci
        pip install -r requirements.txt
        pip install -r requirements-dev.txt
    
    - name: Run ESLint
      run: npm run lint
    
    - name: Run Prettier check
      run: npm run format:check
    
    - name: Run Python linting
      run: |
        flake8 src/
        black --check src/
        isort --check-only src/
    
    - name: Run security scan
      run: |
        npm audit --audit-level moderate
        safety check
    
    - name: Run SonarCloud scan
      uses: SonarSource/sonarcloud-github-action@master
      env:
        GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        SONAR_TOKEN: ${{ secrets.SONAR_TOKEN }}

  # Testing
  test:
    name: Test Suite
    runs-on: ubuntu-latest
    strategy:
      matrix:
        node-version: [16, 18, 20]
        python-version: ['3.9', '3.11', '3.12']
    
    steps:
    - name: Checkout code
      uses: actions/checkout@v4
    
    - name: Setup Node.js ${{ matrix.node-version }}
      uses: actions/setup-node@v4
      with:
        node-version: ${{ matrix.node-version }}
        cache: 'npm'
    
    - name: Setup Python ${{ matrix.python-version }}
      uses: actions/setup-python@v4
      with:
        python-version: ${{ matrix.python-version }}
        cache: 'pip'
    
    - name: Install dependencies
      run: |
        npm ci
        pip install -r requirements.txt
        pip install -r requirements-test.txt
    
    - name: Run unit tests
      run: |
        npm run test:unit
        pytest tests/unit/ -v --cov=src --cov-report=xml
    
    - name: Run integration tests
      run: |
        npm run test:integration
        pytest tests/integration/ -v
    
    - name: Upload coverage to Codecov
      uses: codecov/codecov-action@v3
      with:
        file: ./coverage.xml
        flags: unittests
        name: codecov-umbrella

  # Build and Test Docker Image
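  build:
    name: Build Docker Image
    runs-on: ubuntu-latest
    needs: [quality, test]
    # GITHUB_TOKEN must be allowed to push to GHCR and upload SARIF results
    permissions:
      contents: read
      packages: write
      security-events: write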
    
    steps:
    - name: Checkout code
      uses: actions/checkout@v4
    
    - name: Set up QEMU (required for the linux/arm64 build)
      uses: docker/setup-qemu-action@v3
    
    - name: Set up Docker Buildx
      uses: docker/setup-buildx-action@v3
    
    - name: Log in to Container Registry
      uses: docker/login-action@v3
      with:
        registry: ${{ env.REGISTRY }}
        username: ${{ github.actor }}
        password: ${{ secrets.GITHUB_TOKEN }}
    
    - name: Extract metadata
      id: meta
      uses: docker/metadata-action@v5
      with:
        images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
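        # The raw tag publishes the full commit SHA, which the Trivy scan and the Helm deployments reference
        tags: |
          type=ref,event=branch
          type=ref,event=pr
          type=sha,prefix={{branch}}-
          type=raw,value=${{ github.sha }}
          type=raw,value=latest,enable={{is_default_branch}}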
    
    - name: Build and push Docker image
      uses: docker/build-push-action@v5
      with:
        context: .
        platforms: linux/amd64,linux/arm64
        push: true
        tags: ${{ steps.meta.outputs.tags }}
        labels: ${{ steps.meta.outputs.labels }}
        cache-from: type=gha
        cache-to: type=gha,mode=max
    
    - name: Run container security scan
      uses: aquasecurity/trivy-action@master
      with:
        image-ref: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:${{ github.sha }}
        format: 'sarif'
        output: 'trivy-results.sarif'
    
    - name: Upload Trivy scan results to GitHub Security tab
      uses: github/codeql-action/upload-sarif@v3
      with:
        sarif_file: 'trivy-results.sarif'

  # Deploy to Staging
  deploy-staging:
    name: Deploy to Staging
    runs-on: ubuntu-latest
    needs: build
    if: github.ref == 'refs/heads/develop'
    environment: staging
    
    steps:
    - name: Checkout code
      uses: actions/checkout@v4
    
    - name: Setup kubectl
      uses: azure/setup-kubectl@v3
      with:
        version: 'v1.28.0'
    
    - name: Configure kubectl
      run: |
        echo "${{ secrets.KUBE_CONFIG_STAGING }}" | base64 -d > kubeconfig
        # Persist KUBECONFIG for all later steps (a plain `export` only lasts for this step)
        echo "KUBECONFIG=$PWD/kubeconfig" >> "$GITHUB_ENV"
    
    - name: Deploy to Kubernetes
      run: |
        helm upgrade --install app-staging ./helm/app \
          --namespace staging \
          --create-namespace \
          --set image.tag=${{ github.sha }} \
          --set environment=staging \
          --values helm/values-staging.yaml
    
    - name: Install test dependencies
      run: npm ci
    
    - name: Run smoke tests
      run: |
        kubectl wait --for=condition=ready pod -l app=app-staging -n staging --timeout=300s
        npm run test:smoke -- --env=staging
    
    - name: Run integration tests against staging
      run: |
        npm run test:integration -- --env=staging

  # Deploy to Production
  deploy-production:
    name: Deploy to Production
    runs-on: ubuntu-latest
    needs: build
    if: github.event_name == 'release'
    environment: production
    
    steps:
    - name: Checkout code
      uses: actions/checkout@v4
    
    - name: Setup kubectl
      uses: azure/setup-kubectl@v3
      with:
        version: 'v1.28.0'
    
    - name: Configure kubectl
      run: |
        echo "${{ secrets.KUBE_CONFIG_PRODUCTION }}" | base64 -d > kubeconfig
        # Persist KUBECONFIG for all later steps
        echo "KUBECONFIG=$PWD/kubeconfig" >> "$GITHUB_ENV"
    
    - name: Install test dependencies
      run: npm ci
    
    - name: Deploy to Kubernetes (Blue/Green)
      run: |
        # Find the color currently receiving traffic (empty on the first deployment)
        ACTIVE=$(kubectl get service app-production -n production \
          -o jsonpath='{.spec.selector.color}' 2>/dev/null || true)
        if [ "$ACTIVE" = "green" ]; then TARGET=blue; else TARGET=green; fi
        echo "ACTIVE_COLOR=${ACTIVE:-none}" >> "$GITHUB_ENV"
        
        # Deploy the new version to the idle color
        helm upgrade --install "app-$TARGET" ./helm/app \
          --namespace production \
          --set image.tag=${{ github.sha }} \
          --set environment=production \
          --set deployment.color=$TARGET \
          --values helm/values-production.yaml
        
        # Wait for the idle deployment to be ready
        kubectl wait --for=condition=ready pod -l app=app-$TARGET,color=$TARGET -n production --timeout=600s
        
        # Run health checks before the new color receives traffic
        npm run test:health -- --env=production-$TARGET
        
        # Switch traffic to the new color
        kubectl patch service app-production -n production \
          -p "{\"spec\":{\"selector\":{\"color\":\"$TARGET\"}}}"
        
        # Give the endpoints time to update, then run final tests
        sleep 30
        npm run test:smoke -- --env=production
    
    - name: Remove previous color
      if: env.ACTIVE_COLOR != 'none'
      run: helm uninstall "app-$ACTIVE_COLOR" -n production
    
    - name: Notify deployment
      uses: 8398a7/action-slack@v3
      with:
        status: ${{ job.status }}
        channel: '#deployments'
      env:
        SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK }}
      if: always()

  # Performance Testing
  performance:
    name: Performance Testing
    runs-on: ubuntu-latest
    needs: deploy-staging
    if: github.ref == 'refs/heads/develop'
    
    steps:
    - name: Checkout code
      uses: actions/checkout@v4
    
    - name: Setup k6
      run: |
        sudo gpg -k
        sudo gpg --no-default-keyring --keyring /usr/share/keyrings/k6-archive-keyring.gpg --keyserver hkp://keyserver.ubuntu.com:80 --recv-keys C5AD17C747E3415A3642D57D77C6C491D6AC1D69
        echo "deb [signed-by=/usr/share/keyrings/k6-archive-keyring.gpg] https://dl.k6.io/deb stable main" | sudo tee /etc/apt/sources.list.d/k6.list
        sudo apt-get update
        sudo apt-get install -y k6
    
    - name: Run performance tests
      run: |
        k6 run --out json=performance-results.json tests/performance/load-test.js
    
    - name: Upload performance results
      uses: actions/upload-artifact@v4
      with:
        name: performance-results
        path: performance-results.json
    
    - name: Analyze performance
      run: |
        npm ci
        npm run analyze:performance -- performance-results.json

  # Documentation
  docs:
    name: Build Documentation
    runs-on: ubuntu-latest
    needs: test
    
    steps:
    - name: Checkout code
      uses: actions/checkout@v4
    
    - name: Setup Node.js
      uses: actions/setup-node@v4
      with:
        node-version: ${{ env.NODE_VERSION }}
        cache: 'npm'
    
    - name: Install dependencies
      run: npm ci
    
    - name: Build documentation
      run: |
        npm run docs:build
        npm run docs:generate-api
    
    - name: Deploy to GitHub Pages
      uses: peaceiris/actions-gh-pages@v3
      if: github.ref == 'refs/heads/main'
      with:
        github_token: ${{ secrets.GITHUB_TOKEN }}
        publish_dir: ./docs/build

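# .github/workflows/dependency-updates.yml (a separate workflow file)
name: Dependency Updates

on:
  schedule:
    - cron: '0 2 * * 1'  # Every Monday at 02:00 UTC
  workflow_dispatch:

# `env` is scoped to a single workflow file, so the versions are redeclared here
env:
  NODE_VERSION: '18'
  PYTHON_VERSION: '3.11'

# Needed to push the update branch and open the pull request
permissions:
  contents: write
  pull-requests: write

jobs: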
  update-dependencies:
    name: Update Dependencies
    runs-on: ubuntu-latest
    
    steps:
    - name: Checkout code
      uses: actions/checkout@v4
      with:
        token: ${{ secrets.GITHUB_TOKEN }}
    
    - name: Setup Node.js
      uses: actions/setup-node@v4
      with:
        node-version: ${{ env.NODE_VERSION }}
        cache: 'npm'
    
    - name: Setup Python
      uses: actions/setup-python@v4
      with:
        python-version: ${{ env.PYTHON_VERSION }}
        cache: 'pip'
    
    - name: Update Node.js dependencies
      run: |
        npm update
        npm audit fix
    
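    - name: Update Python dependencies
      run: |
        pip install pip-tools
        # --upgrade is required; otherwise pip-compile keeps the existing pins
        pip-compile --upgrade requirements.in
        pip-compile --upgrade requirements-dev.in
    
    - name: Run tests
      run: |
        npm ci
        npm run test
        pip install -r requirements.txt -r requirements-dev.txt
        pytest tests/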
    
    - name: Create Pull Request
      uses: peter-evans/create-pull-request@v5
      with:
        token: ${{ secrets.GITHUB_TOKEN }}
        commit-message: 'chore: update dependencies'
        title: 'chore: update dependencies'
        body: |
          Automated dependency update
          
          - Updated Node.js dependencies
          - Updated Python dependencies
          
          Please review the changes and ensure all tests pass.
        branch: chore/update-dependencies
        delete-branch: true
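The `needs:` keys in the CI/CD workflow above define a dependency graph between its jobs. The sketch below uses Python's standard `graphlib` to show the order that graph implies. It is a simplification in two ways. GitHub Actions starts each job as soon as its own dependencies finish, not in strict waves. Also, `if:` conditions can skip jobs such as the staging and production deployments, and a skipped job also skips the jobs that depend on it.

```python
from graphlib import TopologicalSorter

# `needs:` relationships copied from the CI/CD workflow above
NEEDS = {
    "quality": [],
    "test": [],
    "build": ["quality", "test"],
    "deploy-staging": ["build"],
    "deploy-production": ["build"],
    "performance": ["deploy-staging"],
    "docs": ["test"],
}


def execution_waves(needs: dict[str, list[str]]) -> list[list[str]]:
    """Group jobs into waves whose members only depend on earlier waves."""
    sorter = TopologicalSorter(needs)
    sorter.prepare()  # raises graphlib.CycleError on circular `needs`
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


for number, wave in enumerate(execution_waves(NEEDS), start=1):
    print(number, wave)
```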

2. Docker Multi-Stage Build with Best Practices

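# Multi-stage Dockerfile for production-ready application
# Stage 1: Build stage
# Debian-based (glibc) image, so native Node.js modules built here also run in
# the Debian-based runtime stage (Alpine uses musl and is not compatible)
FROM node:18-bookworm-slim AS builder

# Set build arguments
ARG NODE_ENV=production
ARG APP_VERSION=1.0.0

# Set environment variables
ENV NODE_ENV=$NODE_ENV
ENV APP_VERSION=$APP_VERSION

# Install build dependencies (Python toolchain and compilers for native modules)
RUN apt-get update && apt-get install -y --no-install-recommends \
    python3 \
    python3-venv \
    make \
    g++ \
    git \
    && rm -rf /var/lib/apt/lists/*

# Debian's system Python is "externally managed" (PEP 668), so use a virtualenv
RUN python3 -m venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"

# Create app directory
WORKDIR /app

# Copy dependency manifests first so these layers stay cached until they change
COPY package*.json ./
COPY requirements.txt ./

# Install all Node.js dependencies: build and tests need devDependencies,
# which NODE_ENV=production would otherwise skip
RUN npm ci --include=dev && npm cache clean --force

# Install Python dependencies
RUN pip install --no-cache-dir -r requirements.txt

# Copy source code
COPY . .

# Run build and tests, then drop devDependencies before node_modules is copied
RUN npm run build
RUN npm run test
RUN npm prune --omit=dev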

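# Stage 2: Runtime stage
# `npm start` needs Node.js at runtime, so start from the Node.js image and add
# Python 3.11 from Debian bookworm's python3 package
FROM node:18-bookworm-slim AS runtime

# Set runtime arguments
ARG APP_USER=appuser
ARG APP_UID=1001
ARG APP_GID=1001

# Set environment variables
ENV NODE_ENV=production
ENV PYTHONUNBUFFERED=1
ENV PYTHONDONTWRITEBYTECODE=1
ENV APP_PORT=3000

# Install runtime dependencies
RUN apt-get update && apt-get install -y --no-install-recommends \
    curl \
    ca-certificates \
    python3 \
    python3-venv \
    && rm -rf /var/lib/apt/lists/*

# Python packages go into a virtualenv because Debian's system Python is
# marked "externally managed" (PEP 668)
RUN python3 -m venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"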

# Create non-root user
RUN groupadd -g $APP_GID $APP_USER && \
    useradd -m -u $APP_UID -g $APP_GID -s /bin/bash $APP_USER

# Create app directory
WORKDIR /app

# Copy built application from builder stage (package.json is needed by `npm start`)
COPY --from=builder --chown=$APP_USER:$APP_GID /app/package*.json ./
COPY --from=builder --chown=$APP_USER:$APP_GID /app/dist ./dist
COPY --from=builder --chown=$APP_USER:$APP_GID /app/node_modules ./node_modules
COPY --from=builder --chown=$APP_USER:$APP_GID /app/requirements.txt ./

# Install Python production dependencies
RUN pip install --no-cache-dir -r requirements.txt

# Copy configuration files
COPY --chown=$APP_USER:$APP_GID config/ ./config/
COPY --chown=$APP_USER:$APP_GID scripts/ ./scripts/

# Set permissions
RUN chmod +x scripts/*.sh

# Switch to non-root user
USER $APP_USER

# Health check
HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
    CMD curl -f http://localhost:$APP_PORT/health || exit 1

# Expose port
EXPOSE $APP_PORT

# Set entrypoint
ENTRYPOINT ["./scripts/entrypoint.sh"]

# Default command
CMD ["npm", "start"]

# Stage 3: Development stage
FROM runtime AS development

# ARG values do not carry over between stages, so redeclare the user name
ARG APP_USER=appuser

# Override environment for development
ENV NODE_ENV=development

# Package installation requires root (the runtime stage ends as the app user)
USER root

# Install development dependencies
RUN apt-get update && apt-get install -y --no-install-recommends \
    git \
    vim \
    && rm -rf /var/lib/apt/lists/*

# Install Node.js development dependencies and return ownership to the app user
RUN npm install && chown -R $APP_USER:$APP_USER /app

# Install development tools
RUN pip install --no-cache-dir pytest pytest-cov black flake8

# Switch back to app user
USER $APP_USER

# Override command for development
CMD ["npm", "run", "dev"]

# Stage 4: Testing stage
FROM builder AS testing

# Install devDependencies (the builder's node_modules is production-only)
# plus the Python test and scanning tools used below
RUN npm ci --include=dev
RUN pip install --no-cache-dir pytest pytest-cov safety

# Run comprehensive tests
RUN npm run test:coverage
RUN pytest tests/ --cov=src --cov-report=xml

# Security scanning
RUN npm audit --audit-level high
RUN safety check

# Stage 5: Security scanning stage
FROM builder AS security

# Install security scanning tools
RUN npm install -g audit-ci
RUN pip install --no-cache-dir safety bandit

# Run security scans
RUN audit-ci --moderate
RUN safety check --json > safety-report.json
RUN bandit -r src/ -f json -o bandit-report.json

# Stage 6: Report export stage
# A stage cannot COPY --from itself, so the reports are collected here, e.g. with
#   docker build --target reports --output type=local,dest=./reports .
FROM scratch AS reports
COPY --from=security /app/safety-report.json /
COPY --from=security /app/bandit-report.json /
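The Dockerfile's `HEALTHCHECK` calls `/health`, and the Helm chart below probes `/health` for liveness and `/ready` for readiness. The application is therefore assumed to expose both endpoints, but the article does not show that code. Below is a minimal sketch of this contract using Python's standard `http.server`. The `READY` flag and the `probe` helper are hypothetical, and the real application is started with `npm start`.

```python
import json
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical flag: a real app would set it after start-up work
# (database connection, cache warm-up, ...) has succeeded
READY = threading.Event()


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/health":
            # Liveness: the process is up and able to answer requests
            self._reply(200, {"status": "ok"})
        elif self.path == "/ready":
            # Readiness: only accept traffic once dependencies are available
            if READY.is_set():
                self._reply(200, {"status": "ready"})
            else:
                self._reply(503, {"status": "starting"})
        else:
            self._reply(404, {"error": "not found"})

    def _reply(self, code, body):
        payload = json.dumps(body).encode()
        self.send_response(code)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, format, *args):
        pass  # keep probe traffic out of the logs


def serve(port: int = 3000) -> ThreadingHTTPServer:
    """Start the server in a background thread (port 0 picks a free port)."""
    server = ThreadingHTTPServer(("0.0.0.0", port), HealthHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def probe(port: int, path: str) -> int:
    """Return the HTTP status code, roughly what `curl -f` or a kubelet probe checks."""
    try:
        with urllib.request.urlopen(f"http://127.0.0.1:{port}{path}", timeout=2) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code
```

Kubernetes counts any status from 200 to 399 as a passing HTTP probe. Returning 503 from `/ready` during start-up therefore keeps the pod out of the Service endpoints. Because `/health` keeps succeeding, the pod is not restarted.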

3. Kubernetes Deployment with Helm and GitOps

# helm/app/Chart.yaml
apiVersion: v2
name: app
description: A Helm chart for deploying the application
type: application
version: 1.0.0
appVersion: "1.0.0"
home: https://github.com/organization/app
sources:
  - https://github.com/organization/app
maintainers:
  - name: DevOps Team
    email: devops@organization.com
keywords:
  - web
  - application
  - devops
annotations:
  category: WebApplication

# helm/app/values.yaml
# Default values for the application
replicaCount: 3

image:
  repository: ghcr.io/organization/app
  pullPolicy: IfNotPresent
  # Empty tag falls back to .Chart.AppVersion; CI passes the commit SHA via --set image.tag
  tag: ""

nameOverride: ""
fullnameOverride: ""

serviceAccount:
  create: true
  annotations: {}
  name: ""

podAnnotations: {}

podSecurityContext:
  fsGroup: 1001

securityContext:
  allowPrivilegeEscalation: false
  runAsNonRoot: true
  runAsUser: 1001
  runAsGroup: 1001
  capabilities:
    drop:
    - ALL
  readOnlyRootFilesystem: true

service:
  type: ClusterIP
  port: 80
  targetPort: 3000

ingress:
  enabled: true
  className: "nginx"
  annotations:
    cert-manager.io/cluster-issuer: "letsencrypt-prod"
    # ingress-nginx rate limit: 100 requests per minute per client IP
    nginx.ingress.kubernetes.io/limit-rpm: "100"
  hosts:
    - host: app.example.com
      paths:
        - path: /
          pathType: Prefix
  tls:
    - secretName: app-tls
      hosts:
        - app.example.com

resources:
  limits:
    cpu: 500m
    memory: 512Mi
  requests:
    cpu: 250m
    memory: 256Mi

autoscaling:
  enabled: true
  minReplicas: 3
  maxReplicas: 10
  targetCPUUtilizationPercentage: 70
  targetMemoryUtilizationPercentage: 80

nodeSelector: {}

tolerations: []

affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100
      podAffinityTerm:
        labelSelector:
          matchExpressions:
          - key: app.kubernetes.io/name
            operator: In
            values:
            - app
        topologyKey: kubernetes.io/hostname

config:
  environment: production
  logLevel: info
  database:
    host: postgres.example.com
    port: 5432
    name: app_prod
  redis:
    host: redis.example.com
    port: 6379
  monitoring:
    enabled: true
    port: 9090

secrets:
  databasePassword: ""
  jwtSecret: ""
  apiKeys: ""

# helm/app/templates/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ include "app.fullname" . }}
  labels:
    {{- include "app.labels" . | nindent 4 }}
spec:
  {{- if not .Values.autoscaling.enabled }}
  replicas: {{ .Values.replicaCount }}
  {{- end }}
  selector:
    matchLabels:
      {{- include "app.selectorLabels" . | nindent 6 }}
  template:
    metadata:
      annotations:
        checksum/config: {{ include (print $.Template.BasePath "/configmap.yaml") . | sha256sum }}
        checksum/secret: {{ include (print $.Template.BasePath "/secret.yaml") . | sha256sum }}
        {{- with .Values.podAnnotations }}
        {{- toYaml . | nindent 8 }}
        {{- end }}
      labels:
        {{- include "app.selectorLabels" . | nindent 8 }}
    spec:
      {{- with .Values.imagePullSecrets }}
      imagePullSecrets:
        {{- toYaml . | nindent 8 }}
      {{- end }}
      serviceAccountName: {{ include "app.serviceAccountName" . }}
      securityContext:
        {{- toYaml .Values.podSecurityContext | nindent 8 }}
      initContainers:
        - name: wait-for-db
          image: postgres:15-alpine
          command:
            - sh
            - -c
            - |
              until pg_isready -h {{ .Values.config.database.host }} -p {{ .Values.config.database.port }}; do
                echo "Waiting for database..."
                sleep 2
              done
        - name: migrate-db
          image: "{{ .Values.image.repository }}:{{ .Values.image.tag | default .Chart.AppVersion }}"
          command:
            - npm
            - run
            - migrate
          envFrom:
            - configMapRef:
                name: {{ include "app.fullname" . }}
            - secretRef:
                name: {{ include "app.fullname" . }}
      containers:
        - name: {{ .Chart.Name }}
          securityContext:
            {{- toYaml .Values.securityContext | nindent 12 }}
          image: "{{ .Values.image.repository }}:{{ .Values.image.tag | default .Chart.AppVersion }}"
          imagePullPolicy: {{ .Values.image.pullPolicy }}
          ports:
            - name: http
              containerPort: {{ .Values.service.targetPort }}
              protocol: TCP
            - name: metrics
              containerPort: {{ .Values.config.monitoring.port }}
              protocol: TCP
          livenessProbe:
            httpGet:
              path: /health
              port: http
            initialDelaySeconds: 30
            periodSeconds: 10
            timeoutSeconds: 5
            failureThreshold: 3
          readinessProbe:
            httpGet:
              path: /ready
              port: http
            initialDelaySeconds: 5
            periodSeconds: 5
            timeoutSeconds: 3
            failureThreshold: 3
          resources:
            {{- toYaml .Values.resources | nindent 12 }}
          envFrom:
            - configMapRef:
                name: {{ include "app.fullname" . }}
            - secretRef:
                name: {{ include "app.fullname" . }}
          volumeMounts:
            - name: tmp
              mountPath: /tmp
            - name: config
              mountPath: /app/config
              readOnly: true
        - name: log-shipper
          image: fluent/fluent-bit:2.0
          resources:
            limits:
              cpu: 100m
              memory: 128Mi
            requests:
              cpu: 50m
              memory: 64Mi
          volumeMounts:
            - name: varlog
              mountPath: /var/log
              readOnly: true
            - name: varlibdockercontainers
              mountPath: /var/lib/docker/containers
              readOnly: true
            - name: fluent-bit-config
              mountPath: /fluent-bit/etc/
      volumes:
        - name: tmp
          emptyDir: {}
        - name: config
          configMap:
            name: {{ include "app.fullname" . }}
        - name: varlog
          hostPath:
            path: /var/log
        - name: varlibdockercontainers
          hostPath:
            path: /var/lib/docker/containers
        - name: fluent-bit-config
          configMap:
            name: fluent-bit-config
      {{- with .Values.nodeSelector }}
      nodeSelector:
        {{- toYaml . | nindent 8 }}
      {{- end }}
      {{- with .Values.affinity }}
      affinity:
        {{- toYaml . | nindent 8 }}
      {{- end }}
      {{- with .Values.tolerations }}
      tolerations:
        {{- toYaml . | nindent 8 }}
      {{- end }}

# helm/app/templates/hpa.yaml
{{- if .Values.autoscaling.enabled }}
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: {{ include "app.fullname" . }}
  labels:
    {{- include "app.labels" . | nindent 4 }}
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: {{ include "app.fullname" . }}
  minReplicas: {{ .Values.autoscaling.minReplicas }}
  maxReplicas: {{ .Values.autoscaling.maxReplicas }}
  metrics:
    {{- if .Values.autoscaling.targetCPUUtilizationPercentage }}
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: {{ .Values.autoscaling.targetCPUUtilizationPercentage }}
    {{- end }}
    {{- if .Values.autoscaling.targetMemoryUtilizationPercentage }}
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: {{ .Values.autoscaling.targetMemoryUtilizationPercentage }}
    {{- end }}
{{- end }}

# helm/app/templates/monitoring.yaml
{{- if .Values.config.monitoring.enabled }}
apiVersion: v1
kind: Service
metadata:
  name: {{ include "app.fullname" . }}-metrics
  labels:
    {{- include "app.labels" . | nindent 4 }}
spec:
  type: ClusterIP
  ports:
    - port: {{ .Values.config.monitoring.port }}
      targetPort: metrics
      protocol: TCP
      name: metrics
  selector:
    {{- include "app.selectorLabels" . | nindent 4 }}

---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: {{ include "app.fullname" . }}
  labels:
    {{- include "app.labels" . | nindent 4 }}
spec:
  selector:
    matchLabels:
      {{- include "app.selectorLabels" . | nindent 6 }}
  endpoints:
    - port: metrics
      interval: 30s
      path: /metrics
{{- end }}

# GitOps Application Manifest (ArgoCD)
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: app-production
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/organization/app-helm
    targetRevision: HEAD
    path: helm/app
    helm:
      valueFiles:
        - values-production.yaml
  destination:
    server: https://kubernetes.default.svc
    namespace: production
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true
    retry:
      limit: 5
      backoff:
        duration: 5s
        factor: 2
        maxDuration: 3m
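The `retry.backoff` settings in the Argo CD Application control how long Argo CD waits between failed sync attempts. The sketch below assumes each delay is `duration × factor^n`, capped at `maxDuration`, which is the documented meaning of these fields. `parse_duration` is a hypothetical helper. It handles only single-unit values such as `5s` or `3m`, not full Go duration strings.

```python
UNITS = {"s": 1, "m": 60, "h": 3600}


def parse_duration(text: str) -> int:
    """Parse single-unit durations such as '5s' or '3m' into seconds."""
    value, unit = text[:-1], text[-1]
    return int(value) * UNITS[unit]


def backoff_schedule(duration_s: float, factor: float, max_duration_s: float,
                     limit: int) -> list[float]:
    """Delay before each retry: duration * factor**n, capped at max_duration."""
    return [min(duration_s * factor ** n, max_duration_s) for n in range(limit)]


# Values from the Application manifest above: limit 5, duration 5s, factor 2, maxDuration 3m
delays = backoff_schedule(parse_duration("5s"), 2, parse_duration("3m"), limit=5)
print(delays)       # [5, 10, 20, 40, 80]
print(sum(delays))  # 155
```

With `limit: 5`, a sync that keeps failing therefore waits about two and a half minutes in total before Argo CD stops retrying. The 3-minute cap only matters for larger limits.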

4. Terraform Infrastructure as Code

# terraform/main.tf
provider "aws" {
  region = var.aws_region
  
  default_tags {
    tags = {
      Environment = var.environment
      Project     = var.project_name
      ManagedBy   = "terraform"
    }
  }
}

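# Terraform backend configuration
# Backend blocks cannot reference variables, so the names are literals here
# (alternatively, pass them with `terraform init -backend-config=...`)
terraform {
  backend "s3" {
    bucket         = "terraform-state-my-app"
    key            = "infrastructure/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "terraform-locks-my-app"
  }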
  
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
    
    kubernetes = {
      source  = "hashicorp/kubernetes"
      version = "~> 2.0"
    }
    
    helm = {
      source  = "hashicorp/helm"
      version = "~> 2.0"
    }
    
    random = {
      source  = "hashicorp/random"
      version = "~> 3.0"
    }
    
    null = {
      source  = "hashicorp/null"
      version = "~> 3.0"
    }
  }
}

# terraform/variables.tf
variable "aws_region" {
  description = "AWS region"
  type        = string
  default     = "us-east-1"
}

variable "environment" {
  description = "Environment name"
  type        = string
  default     = "production"
}

variable "project_name" {
  description = "Project name"
  type        = string
  default     = "my-app"
}

variable "vpc_cidr" {
  description = "CIDR block for VPC"
  type        = string
  default     = "10.0.0.0/16"
}

variable "availability_zones" {
  description = "List of availability zones"
  type        = list(string)
  default     = ["us-east-1a", "us-east-1b", "us-east-1c"]
}

variable "cluster_name" {
  description = "EKS cluster name"
  type        = string
  default     = "my-app-cluster"
}

variable "cluster_version" {
  description = "EKS cluster version"
  type        = string
  default     = "1.28"
}

variable "node_groups" {
  description = "EKS node groups configuration"
  type = map(object({
    instance_type = string
    min_size      = number
    max_size      = number
    desired_size  = number
    disk_size     = number
  }))
  
  default = {
    general = {
      instance_type = "t3.medium"
      min_size      = 3
      max_size      = 10
      desired_size  = 3
      disk_size     = 50
    }
    
    compute = {
      instance_type = "c5.large"
      min_size      = 2
      max_size      = 5
      desired_size  = 2
      disk_size     = 100
    }
  }
}

# terraform/vpc.tf
# VPC Configuration
resource "aws_vpc" "main" {
  cidr_block           = var.vpc_cidr
  enable_dns_hostnames = true
  enable_dns_support   = true
  
  tags = {
    Name = "${var.project_name}-vpc"
  }
}

# Internet Gateway
resource "aws_internet_gateway" "main" {
  vpc_id = aws_vpc.main.id
  
  tags = {
    Name = "${var.project_name}-igw"
  }
}

# Public Subnets
resource "aws_subnet" "public" {
  count = length(var.availability_zones)
  
  vpc_id                  = aws_vpc.main.id
  cidr_block              = cidrsubnet(var.vpc_cidr, 8, count.index)
  availability_zone       = var.availability_zones[count.index]
  map_public_ip_on_launch = true
  
  tags = {
    Name = "${var.project_name}-public-${count.index}"
    Type = "Public"
  }
}

# Private Subnets
resource "aws_subnet" "private" {
  count = length(var.availability_zones)
  
  vpc_id            = aws_vpc.main.id
  cidr_block        = cidrsubnet(var.vpc_cidr, 8, count.index + 3)
  availability_zone = var.availability_zones[count.index]
  
  tags = {
    Name = "${var.project_name}-private-${count.index}"
    Type = "Private"
  }
}

# Database Subnets
resource "aws_subnet" "database" {
  count = length(var.availability_zones)
  
  vpc_id            = aws_vpc.main.id
  cidr_block        = cidrsubnet(var.vpc_cidr, 8, count.index + 6)
  availability_zone = var.availability_zones[count.index]
  
  tags = {
    Name = "${var.project_name}-database-${count.index}"
    Type = "Database"
  }
}

# Route Tables
resource "aws_route_table" "public" {
  vpc_id = aws_vpc.main.id
  
  route {
    cidr_block = "0.0.0.0/0"
    gateway_id = aws_internet_gateway.main.id
  }
  
  tags = {
    Name = "${var.project_name}-public-rt"
  }
}

resource "aws_route_table_association" "public" {
  count = length(aws_subnet.public)
  
  subnet_id      = aws_subnet.public[count.index].id
  route_table_id = aws_route_table.public.id
}
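The subnet resources above derive their ranges with `cidrsubnet(var.vpc_cidr, 8, count.index + offset)`. This call extends the /16 prefix by 8 bits and picks the `netnum`-th /24 block. The small Python re-implementation below makes the resulting layout easy to check. It is a sketch using the standard `ipaddress` module, not Terraform itself.

```python
import ipaddress


def cidrsubnet(prefix: str, newbits: int, netnum: int) -> str:
    """Mimic Terraform's cidrsubnet(): the netnum-th subnet after adding newbits to the prefix."""
    net = ipaddress.ip_network(prefix)
    new_prefix = net.prefixlen + newbits
    if new_prefix > net.max_prefixlen:
        raise ValueError("not enough address bits for newbits")
    if not 0 <= netnum < 2 ** newbits:
        raise ValueError("netnum does not fit in newbits")
    block_size = 2 ** (net.max_prefixlen - new_prefix)  # addresses per subnet
    first = net.network_address + netnum * block_size
    return f"{first}/{new_prefix}"


VPC_CIDR = "10.0.0.0/16"  # default of var.vpc_cidr
for tier, offset in (("public", 0), ("private", 3), ("database", 6)):
    print(tier, [cidrsubnet(VPC_CIDR, 8, i + offset) for i in range(3)])
```

With the default three availability zones, this gives the following ranges:

- public: 10.0.0.0/24 to 10.0.2.0/24
- private: 10.0.3.0/24 to 10.0.5.0/24
- database: 10.0.6.0/24 to 10.0.8.0/24

The offsets 3 and 6 are hard-coded. Adding a fourth zone would make `public[3]` and `private[0]` both claim 10.0.3.0/24.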

# EKS Cluster
resource "aws_eks_cluster" "main" {
  name     = var.cluster_name
  role_arn = aws_iam_role.eks_cluster.arn
  version  = var.cluster_version
  
  vpc_config {
    subnet_ids = concat(
      aws_subnet.public[*].id,
      aws_subnet.private[*].id
    )
    
    endpoint_public_access  = true
    endpoint_private_access = true
    
    public_access_cidrs = ["0.0.0.0/0"]
  }
  
  depends_on = [
    aws_iam_role_policy_attachment.eks_cluster_policy,
  ]
  
  tags = {
    Name = var.cluster_name
  }
}

# EKS Node Groups
resource "aws_eks_node_group" "main" {
  for_each = var.node_groups
  
  cluster_name    = aws_eks_cluster.main.name
  node_group_name = each.key
  node_role_arn   = aws_iam_role.eks_node.arn
  
  subnet_ids = aws_subnet.private[*].id
  
  scaling_config {
    desired_size = each.value.desired_size
    max_size     = each.value.max_size
    min_size     = each.value.min_size
  }
  
  instance_types = [each.value.instance_type]
  disk_size      = each.value.disk_size
  
  remote_access {
    ec2_ssh_key               = aws_key_pair.main.key_name
    source_security_group_ids = [aws_security_group.eks_nodes.id]
  }
  
  depends_on = [
    aws_iam_role_policy_attachment.eks_worker_node_policy,
    aws_iam_role_policy_attachment.eks_cni_policy,
    aws_iam_role_policy_attachment.eks_container_registry_policy,
  ]
  
  tags = {
    Name = "${var.cluster_name}-${each.key}"
    Type = each.key
  }
}

# IAM Roles
resource "aws_iam_role" "eks_cluster" {
  name = "${var.project_name}-eks-cluster-role"
  
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Action = "sts:AssumeRole"
        Effect = "Allow"
        Principal = {
          Service = "eks.amazonaws.com"
        }
      }
    ]
  })
}

resource "aws_iam_role_policy_attachment" "eks_cluster_policy" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKSClusterPolicy"
  role       = aws_iam_role.eks_cluster.name
}

resource "aws_iam_role" "eks_node" {
  name = "${var.project_name}-eks-node-role"
  
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Action = "sts:AssumeRole"
        Effect = "Allow"
        Principal = {
          Service = "ec2.amazonaws.com"
        }
      }
    ]
  })
}

resource "aws_iam_role_policy_attachment" "eks_worker_node_policy" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy"
  role       = aws_iam_role.eks_node.name
}

resource "aws_iam_role_policy_attachment" "eks_cni_policy" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy"
  role       = aws_iam_role.eks_node.name
}

resource "aws_iam_role_policy_attachment" "eks_container_registry_policy" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly"
  role       = aws_iam_role.eks_node.name
}

# Security Groups
resource "aws_security_group" "eks_cluster" {
  name        = "${var.project_name}-eks-cluster-sg"
  description = "Security group for EKS cluster"
  vpc_id      = aws_vpc.main.id
  
  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
  
  tags = {
    Name = "${var.project_name}-eks-cluster-sg"
  }
}

resource "aws_security_group" "eks_nodes" {
  name        = "${var.project_name}-eks-nodes-sg"
  description = "Security group for EKS nodes"
  vpc_id      = aws_vpc.main.id
  
  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
  
  tags = {
    Name = "${var.project_name}-eks-nodes-sg"
  }
}

# RDS Database
resource "aws_db_subnet_group" "main" {
  name       = "${var.project_name}-db-subnet-group"
  subnet_ids = aws_subnet.database[*].id
  
  tags = {
    Name = "${var.project_name}-db-subnet-group"
  }
}

resource "aws_security_group" "rds" {
  name        = "${var.project_name}-rds-sg"
  description = "Security group for RDS database"
  vpc_id      = aws_vpc.main.id
  
  ingress {
    from_port       = 5432
    to_port         = 5432
    protocol        = "tcp"
    security_groups = [aws_security_group.eks_nodes.id]
  }
  
  tags = {
    Name = "${var.project_name}-rds-sg"
  }
}

resource "aws_db_instance" "postgres" {
  identifier = "${var.project_name}-postgres"
  
  engine         = "postgres"
  engine_version = "15.4"
  instance_class = "db.t3.medium"
  
  allocated_storage     = 100
  max_allocated_storage = 1000
  storage_type          = "gp2"
  storage_encrypted     = true
  
  db_name  = "app"
  username = "app_user"
  password = random_password.db_password.result
  
  db_subnet_group_name   = aws_db_subnet_group.main.name
  vpc_security_group_ids = [aws_security_group.rds.id]
  
  backup_retention_period = 7
  backup_window           = "03:00-04:00"
  maintenance_window      = "sun:04:00-sun:05:00"
  
  skip_final_snapshot       = false
  final_snapshot_identifier = "${var.project_name}-postgres-final-snapshot"
  
  deletion_protection = true
  
  tags = {
    Name = "${var.project_name}-postgres"
  }
}

# Redis ElastiCache
resource "aws_elasticache_subnet_group" "main" {
  name       = "${var.project_name}-cache-subnet-group"
  subnet_ids = aws_subnet.private[*].id
  
  tags = {
    Name = "${var.project_name}-cache-subnet-group"
  }
}

resource "aws_security_group" "redis" {
  name        = "${var.project_name}-redis-sg"
  description = "Security group for Redis"
  vpc_id      = aws_vpc.main.id
  
  ingress {
    from_port       = 6379
    to_port         = 6379
    protocol        = "tcp"
    security_groups = [aws_security_group.eks_nodes.id]
  }
  
  tags = {
    Name = "${var.project_name}-redis-sg"
  }
}

resource "aws_elasticache_replication_group" "redis" {
  replication_group_id = "${var.project_name}-redis"
  description          = "Redis cluster for ${var.project_name}"
  
  node_type            = "cache.t3.micro"
  port                 = 6379
  parameter_group_name = "default.redis7"
  
  num_cache_clusters         = 2
  automatic_failover_enabled = true
  multi_az_enabled           = true
  
  subnet_group_name  = aws_elasticache_subnet_group.main.name
  security_group_ids = [aws_security_group.redis.id]
  
  at_rest_encryption_enabled = true
  transit_encryption_enabled = true
  auth_token                 = random_password.redis_auth_token.result
  
  snapshot_retention_limit = 7
  snapshot_window          = "05:00-06:00"
  maintenance_window       = "sun:06:00-sun:07:00"
  
  tags = {
    Name = "${var.project_name}-redis"
  }
}

# S3 Buckets
resource "aws_s3_bucket" "app_storage" {
  bucket = "${var.project_name}-storage-${random_string.bucket_suffix.result}"
  
  tags = {
    Name = "${var.project_name}-storage"
  }
}

resource "aws_s3_bucket_versioning" "app_storage" {
  bucket = aws_s3_bucket.app_storage.id
  versioning_configuration {
    status = "Enabled"
  }
}

resource "aws_s3_bucket_server_side_encryption_configuration" "app_storage" {
  bucket = aws_s3_bucket.app_storage.id
  
  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm = "AES256"
    }
  }
}

resource "aws_s3_bucket_public_access_block" "app_storage" {
  bucket = aws_s3_bucket.app_storage.id
  
  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}

# Random resources
resource "random_password" "db_password" {
  length           = 32
  special          = true
  override_special = "!#$%&*()-_=+[]{}<>:?"
}

# ElastiCache only accepts the special characters ! & # $ ^ < > - in a Redis AUTH token
resource "random_password" "redis_auth_token" {
  length           = 64
  special          = true
  override_special = "!&#$^<>-"
}

resource "random_string" "bucket_suffix" {
  length  = 8
  special = false
  upper   = false
}

# Outputs
output "cluster_name" {
  description = "EKS cluster name"
  value       = aws_eks_cluster.main.name
}

output "cluster_endpoint" {
  description = "EKS cluster endpoint"
  value       = aws_eks_cluster.main.endpoint
}

output "cluster_certificate_authority_data" {
  description = "EKS cluster certificate authority data"
  value       = aws_eks_cluster.main.certificate_authority[0].data
}

output "database_endpoint" {
  description = "RDS database endpoint"
  value       = aws_db_instance.postgres.endpoint
  sensitive   = true
}

output "redis_endpoint" {
  description = "Redis endpoint"
  value       = aws_elasticache_replication_group.redis.primary_endpoint_address
  sensitive   = true
}

output "storage_bucket" {
  description = "S3 storage bucket name"
  value       = aws_s3_bucket.app_storage.bucket
}
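ElastiCache is stricter about the Redis AUTH token than most password fields. According to the AWS documentation, the token must be 16 to 128 printable ASCII characters long, and the only special characters it may contain are `!`, `&`, `#`, `$`, `^`, `<`, `>` and `-`, so the character set given to `random_password.redis_auth_token` has to stay within that list. The Python sketch below checks a candidate token against those rules and generates a compliant one with the standard `secrets` module. The function names are hypothetical, and the allowed-character list is copied from the AWS docs, so treat it as an assumption to re-check against the current documentation.

```python
import secrets
import string

# Special characters ElastiCache accepts in a Redis AUTH token
# (assumption: taken from the AWS documentation at the time of writing).
ALLOWED_SPECIAL = "!&#$^<>-"
ALLOWED_CHARS = string.ascii_letters + string.digits + ALLOWED_SPECIAL
MIN_LENGTH, MAX_LENGTH = 16, 128


def is_valid_auth_token(token: str) -> bool:
    """Return True if the token satisfies the length and character rules."""
    if not MIN_LENGTH <= len(token) <= MAX_LENGTH:
        return False
    return all(ch in ALLOWED_CHARS for ch in token)


def generate_auth_token(length: int = 64) -> str:
    """Generate a cryptographically random token using only permitted characters."""
    if not MIN_LENGTH <= length <= MAX_LENGTH:
        raise ValueError(f"length must be between {MIN_LENGTH} and {MAX_LENGTH}")
    return "".join(secrets.choice(ALLOWED_CHARS) for _ in range(length))
```

`generate_auth_token()` defaults to 64 characters to match the `length` used in the Terraform resource; a token containing, for example, `%` or `=` fails the check.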

5. Monitoring with Prometheus and Grafana

# monitoring/prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s

rule_files:
  - "alert_rules.yml"

alerting:
  alertmanagers:
    - static_configs:
        - targets:
          - alertmanager:9093

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'kubernetes-apiservers'
    kubernetes_sd_configs:
      - role: endpoints
    scheme: https
    tls_config:
      ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
    relabel_configs:
      - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
        action: keep
        regex: default;kubernetes;https

  - job_name: 'kubernetes-nodes'
    kubernetes_sd_configs:
      - role: node
    scheme: https
    tls_config:
      ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
    relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)

  - job_name: 'kubernetes-pods'
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
      - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
        action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
        target_label: __address__
      - action: labelmap
        regex: __meta_kubernetes_pod_label_(.+)
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_pod_name]
        action: replace
        target_label: kubernetes_pod_name

  - job_name: 'kubernetes-services'
    kubernetes_sd_configs:
      - role: service
    relabel_configs:
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
      - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
        action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
        target_label: __address__
      - action: labelmap
        regex: __meta_kubernetes_service_label_(.+)
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_service_name]
        action: replace
        target_label: kubernetes_name
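The `__address__` rewrite in the `kubernetes-pods` and `kubernetes-services` jobs is the least obvious part of this file. Prometheus joins the values of `source_labels` with `;` (the default separator), matches the result against the fully anchored `regex`, and writes the expanded `replacement` into `target_label`. The Python sketch below mimics that single `replace` step and the `keep` filter with the standard `re` module. It is a simplified model for illustration only (Prometheus itself uses RE2 and supports more actions), and the function names are hypothetical.

```python
import re
from typing import Dict, List

# Same pattern and replacement as the jobs above; Python writes \1 and \2
# where Prometheus writes $1 and $2.
ADDRESS_REGEX = r"([^:]+)(?::\d+)?;(\d+)"
ADDRESS_REPLACEMENT = r"\1:\2"


def _joined(labels: Dict[str, str], source_labels: List[str], separator: str) -> str:
    # Missing labels count as empty strings, as in Prometheus.
    return separator.join(labels.get(name, "") for name in source_labels)


def relabel_replace(labels: Dict[str, str], source_labels: List[str], regex: str,
                    replacement: str, target_label: str, separator: str = ";") -> Dict[str, str]:
    """Simplified model of a Prometheus 'replace' relabel action."""
    match = re.fullmatch(regex, _joined(labels, source_labels, separator))
    result = dict(labels)
    if match is not None:  # no match: labels are left unchanged
        result[target_label] = match.expand(replacement)
    return result


def relabel_keep(labels: Dict[str, str], source_labels: List[str], regex: str,
                 separator: str = ";") -> bool:
    """Simplified model of a 'keep' action: True if the target survives."""
    return re.fullmatch(regex, _joined(labels, source_labels, separator)) is not None
```

For a pod whose address is `10.0.0.5:8080` and whose `prometheus.io/port` annotation is `9102`, the joined value `10.0.0.5:8080;9102` becomes the new target `10.0.0.5:9102`; without the annotation the regex does not match and the address is left alone.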

# monitoring/alert_rules.yml
groups:
  - name: kubernetes-apps
    rules:
      - alert: KubernetesPodCrashLooping
        expr: rate(kube_pod_container_status_restarts_total[15m]) > 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Pod {{ $labels.pod }} is crash looping"
          description: "Pod {{ $labels.pod }} in namespace {{ $labels.namespace }} is crash looping."

      - alert: KubernetesPodNotReady
        expr: kube_pod_status_ready{condition="true"} == 0
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Pod {{ $labels.pod }} is not ready"
          description: "Pod {{ $labels.pod }} in namespace {{ $labels.namespace }} is not ready."

      - alert: KubernetesNodeNotReady
        expr: kube_node_status_condition{condition="Ready",status="true"} == 0
        for: 10m
        labels:
          severity: critical
        annotations:
          summary: "Node {{ $labels.node }} is not ready"
          description: "Node {{ $labels.node }} has been not ready for more than 10 minutes."

  - name: application
    rules:
      - alert: HighErrorRate
        expr: sum by (job) (rate(http_requests_total{status=~"5.."}[5m])) / sum by (job) (rate(http_requests_total[5m])) > 0.1
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "High error rate detected"
          description: "Error rate is {{ $value | humanizePercentage }} for {{ $labels.job }}."

      - alert: HighResponseTime
        expr: histogram_quantile(0.95, sum by (job, le) (rate(http_request_duration_seconds_bucket[5m]))) > 1
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High response time detected"
          description: "95th percentile response time is {{ $value }}s for {{ $labels.job }}."

      - alert: LowThroughput
        expr: sum by (job) (rate(http_requests_total[5m])) < 10
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Low throughput detected"
          description: "Request rate is {{ $value }} requests/second for {{ $labels.job }}."

  - name: infrastructure
    rules:
      - alert: HighCPUUsage
        expr: 100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "High CPU usage on {{ $labels.instance }}"
          description: "CPU usage is {{ $value | humanize }}% on {{ $labels.instance }}."

      - alert: HighMemoryUsage
        expr: (node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes) / node_memory_MemTotal_bytes * 100 > 85
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "High memory usage on {{ $labels.instance }}"
          description: "Memory usage is {{ $value | humanize }}% on {{ $labels.instance }}."

      - alert: DiskSpaceLow
        expr: (node_filesystem_avail_bytes{fstype!~"tmpfs|overlay"} / node_filesystem_size_bytes{fstype!~"tmpfs|overlay"}) * 100 < 10
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Disk space low on {{ $labels.instance }}"
          description: "Only {{ $value | humanize }}% disk space is available on {{ $labels.device }} ({{ $labels.mountpoint }})."
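Ratios such as the error rate are easy to get subtly wrong in PromQL. A binary `/` between two instant vectors matches series one-to-one on their full label sets, so dividing `rate(http_requests_total{status=~"5.."}[5m])` by `rate(http_requests_total[5m])` pairs every 5xx series with itself and yields 1 for every series that has any traffic. Aggregating both sides with `sum by (job)` first gives the intended ratio. The plain-Python sketch below imitates both evaluations on a handful of per-series rates; the label values and numbers are invented for illustration, and the helper names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Tuple

# Hypothetical per-second request rates, keyed by the (job, status) label set.
RATES: Dict[Tuple[str, str], float] = {
    ("api", "200"): 90.0,
    ("api", "500"): 6.0,
    ("api", "503"): 4.0,
}


def naive_ratio(rates):
    """Mimic rate(5xx) / rate(all) with one-to-one matching on all labels."""
    errors = {k: v for k, v in rates.items() if k[1].startswith("5")}
    # Each 5xx series only finds a partner with identical labels: itself.
    return {k: v / rates[k] for k, v in errors.items()}


def aggregated_ratio(rates):
    """Mimic sum by (job)(rate(5xx)) / sum by (job)(rate(all))."""
    errors: Dict[str, float] = defaultdict(float)
    totals: Dict[str, float] = defaultdict(float)
    for (job, status), value in rates.items():
        totals[job] += value
        if status.startswith("5"):
            errors[job] += value
    return {job: errors[job] / totals[job] for job in totals if totals[job] > 0}
```

With these numbers the naive query reports 1.0 for both 5xx series, while the aggregated one reports 10 errors out of 100 requests per second, a ratio of 0.1.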

# grafana/dashboards/app-dashboard.json
{
  "dashboard": {
    "id": null,
    "title": "Application Dashboard",
    "tags": ["app", "production"],
    "timezone": "browser",
    "panels": [
      {
        "id": 1,
        "title": "Request Rate",
        "type": "graph",
        "targets": [
          {
            "expr": "sum by (method, status) (rate(http_requests_total[5m]))",
            "legendFormat": "{{method}} {{status}}"
          }
        ],
        "yaxes": [
          {
            "label": "Requests/sec"
          }
        ]
      },
      {
        "id": 2,
        "title": "Response Time",
        "type": "graph",
        "targets": [
          {
            "expr": "histogram_quantile(0.50, sum by (le) (rate(http_request_duration_seconds_bucket[5m])))",
            "legendFormat": "50th percentile"
          },
          {
            "expr": "histogram_quantile(0.95, sum by (le) (rate(http_request_duration_seconds_bucket[5m])))",
            "legendFormat": "95th percentile"
          },
          {
            "expr": "histogram_quantile(0.99, sum by (le) (rate(http_request_duration_seconds_bucket[5m])))",
            "legendFormat": "99th percentile"
          }
        ],
        "yaxes": [
          {
            "label": "Seconds"
          }
        ]
      },
      {
        "id": 3,
        "title": "Error Rate",
        "type": "graph",
        "targets": [
          {
            "expr": "sum(rate(http_requests_total{status=~\"5..\"}[5m])) / sum(rate(http_requests_total[5m]))",
            "legendFormat": "Error Rate"
          }
        ],
        "yaxes": [
          {
            "format": "percentunit",
            "label": "Error ratio",
            "max": 1,
            "min": 0
          }
        ]
      },
      {
        "id": 4,
        "title": "Application Status",
        "type": "stat",
        "targets": [
          {
            "expr": "up{job=\"app\"}",
            "legendFormat": "Application Status"
          }
        ],
        "fieldConfig": {
          "defaults": {
            "mappings": [
              {
                "options": {
                  "0": {
                    "text": "DOWN",
                    "color": "red"
                  },
                  "1": {
                    "text": "UP",
                    "color": "green"
                  }
                },
                "type": "value"
              }
            ]
          }
        }
      }
    ],
    "time": {
      "from": "now-1h",
      "to": "now"
    },
    "refresh": "5s"
  }
}
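All three latency panels rely on `histogram_quantile`, which never sees individual requests, only cumulative bucket counts (the `le` label). It finds the bucket that contains the requested rank and interpolates linearly inside it, so the result can only be as precise as the bucket boundaries. The Python sketch below follows that logic for a single histogram. It assumes the buckets are already sorted by upper bound and end with `+Inf`, skips most of the validation Prometheus performs, and uses made-up bucket values.

```python
import math
from typing import List, Tuple

Bucket = Tuple[float, float]  # (upper bound "le", cumulative count)


def histogram_quantile(q: float, buckets: List[Bucket]) -> float:
    """Estimate the q-quantile from cumulative histogram buckets.

    Assumes the buckets are sorted by upper bound, hold cumulative counts
    and end with a +Inf bucket, as Prometheus `le` buckets do.
    """
    if not 0 <= q <= 1:
        raise ValueError("q must be between 0 and 1")
    if len(buckets) < 2:
        return math.nan
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    # Index of the first finite bucket whose cumulative count reaches the rank.
    b = next(
        (i for i, (_, cum) in enumerate(buckets[:-1]) if cum >= rank),
        len(buckets) - 1,
    )
    if b == len(buckets) - 1:
        # The rank lies in the +Inf bucket: return the highest finite bound.
        return buckets[-2][0]
    upper, count = buckets[b]
    if b == 0 and upper <= 0:
        return upper
    lower = 0.0
    if b > 0:
        lower, prev_count = buckets[b - 1]
        count -= prev_count
        rank -= prev_count
    if count == 0:
        # Only reachable for q == 0 with an empty first bucket.
        return lower
    # Linear interpolation inside the selected bucket.
    return lower + (upper - lower) * (rank / count)
```

With the buckets `[(0.1, 50), (0.5, 90), (1.0, 100), (math.inf, 100)]`, the 95th percentile comes out as 0.75 s even if no single request took exactly that long; it is an interpolated estimate inside the 0.5 to 1.0 s bucket.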

DevOps Pipeline Architecture

CI/CD Pipeline Stages

graph TD
    A[Code Commit] --> B[Build Stage]
    B --> C[Test Stage]
    C --> D[Security Scan]
    D --> E[Package Stage]
    E --> F[Deploy Staging]
    F --> G[Integration Tests]
    G --> H[Approve Production]
    H --> I[Deploy Production]
    I --> J[Monitoring]
    J --> K[Rollback if needed]
    
    A1[Git Push] --> A
    B1[Docker Build] --> B
    C1[Unit Tests] --> C
    C2[Integration Tests] --> C
    D1[Vulnerability Scan] --> D
    E1[Image Registry] --> E
    F1[Kubernetes Deploy] --> F
    G1[E2E Tests] --> G
    H1[Manual Approval] --> H
    I1[Blue/Green Deploy] --> I
    J1[Prometheus/Grafana] --> J
    K1[Automated Rollback] --> K
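Read as code, the diagram is a linear sequence of stages with an approval gate before production and a rollback path after it. The toy Python model below captures only that control flow: stages run in order, the first failure stops the pipeline, the production deployment waits for approval unless the pipeline is configured for continuous deployment, and a failed post-deployment health check triggers a rollback. The stage names mirror the diagram; the `Stage` type, the `run_pipeline` function and the boolean callables standing in for real build, test and deploy steps are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Stage:
    name: str
    run: Callable[[], bool]  # returns True on success


def run_pipeline(stages: List[Stage], approved: bool, auto_deploy: bool,
                 healthy_after_deploy: Callable[[], bool]) -> List[str]:
    """Run the stages in order and return a log of what happened."""
    log: List[str] = []
    for stage in stages:
        if not stage.run():
            log.append(f"FAILED: {stage.name}")
            return log  # later stages never run
        log.append(f"ok: {stage.name}")

    # Continuous Delivery needs a human approval; Continuous Deployment does not.
    if not (auto_deploy or approved):
        log.append("waiting for approval")
        return log

    log.append("deployed to production")
    # Monitoring after the rollout decides whether to roll back.
    if not healthy_after_deploy():
        log.append("rolled back")
    return log
```

Setting `auto_deploy=False` models the manual approval step shown in the diagram; setting it to `True` turns the same pipeline into continuous deployment.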

Containerization Comparison

Container Runtimes

Runtime	Written in	Security	Performance	Use Case
Docker	Go	Medium	Good	General Purpose
containerd	Go	High	Very Good	Production
CRI-O	Go	High	Good	Kubernetes
Podman	Go	High	Good	Daemonless

Orchestration Platforms

Platform	Complexity	Scalability	Cloud-Native	Use Case
Kubernetes	High	Very High	Yes	Enterprise
Docker Swarm	Low	Medium	Partial	Small/Medium
OpenShift	High	Very High	Yes	Enterprise
Nomad	Medium	High	Yes	Multi-Cloud

Infrastructure as Code Tools

Terraform vs. CloudFormation vs. Pulumi

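Tool	Language	Multi-Cloud	State Management	Use Case
Terraform	HCL	Yes	State file (local or remote backend)	Multi-Cloud
CloudFormation	JSON/YAML	No	AWS Managed	AWS-only
Pulumi	General-purpose languages (TypeScript, Python, Go, C#, Java)	Yes	State file (Pulumi Cloud or self-managed backend)	Programmable
Ansible	YAML	Yes	Stateless	Configuration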
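The State Management column is what lets Terraform and Pulumi compute a plan: they compare the desired configuration with the recorded state and derive which resources to create, update or delete. The Python sketch below shows only that diff step on plain dictionaries. Real tools also refresh state from the provider APIs, resolve dependencies and handle resources that must be replaced, and the resource addresses used here are illustrative; Ansible, listed as stateless, skips this step and lets each module check the target's current state at run time instead.

```python
from typing import Dict, List, Tuple

Resources = Dict[str, Dict[str, object]]  # resource address -> attributes


def plan(desired: Resources, state: Resources) -> List[Tuple[str, str]]:
    """Return (action, address) pairs, sorted by address for stable output.

    Resources whose attributes already match the state produce no action.
    """
    actions: List[Tuple[str, str]] = []
    for address in sorted(desired.keys() | state.keys()):
        if address not in state:
            actions.append(("create", address))
        elif address not in desired:
            actions.append(("delete", address))
        elif desired[address] != state[address]:
            actions.append(("update", address))
    return actions
```

Applying the configuration and then planning again against the updated state yields an empty list, which is the idempotence that makes repeated IaC runs safe.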

IaC Best Practices

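Modularity: Small, reusable modules
Versioning: All infrastructure code in Git, with changes reviewed like application code
Testing: Automated validation and testing of infrastructure changes before they are applied
Documentation: Documentation generated from the code so it stays current
Security: Security scanning and compliance checks in the pipeline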

Monitoring and Observability

Observability Pillars

Pillar	Tools	Data Type	Use Case
Metrics	Prometheus, InfluxDB	Numerical data	Performance
Logs	ELK Stack, Loki	Textual data	Troubleshooting
Traces	Jaeger, Zipkin	Request flows	Distributed Systems
Events	CloudWatch, EventBridge	State changes	Audit Trail
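Most metrics in this article are counters (for example `http_requests_total`), which only ever increase until the process restarts; that is why the queries wrap them in `rate()`. The Python sketch below computes a per-second rate from raw counter samples and treats any drop in value as a counter reset. It is deliberately simpler than Prometheus, which additionally extrapolates to the edges of the range window, so its numbers can differ slightly from what PromQL returns; the function name and sample data are made up for illustration.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (unix timestamp in seconds, counter value)


def simple_rate(samples: List[Sample]) -> float:
    """Per-second increase of a counter, compensating for counter resets."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    increase = 0.0
    for (_, prev), (_, curr) in zip(samples, samples[1:]):
        # A decrease means the process restarted and the counter began at 0,
        # so the whole current value counts as new increase.
        increase += curr - prev if curr >= prev else curr
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed
```

For samples taken every 15 seconds with values 100, 130, 160, 10 and 40, the increases are 30, 30, 10 (after the reset) and 30, so the rate is 100 over 60 seconds, about 1.67 per second.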

Alerting Strategies

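Threshold-based: Fire when a metric crosses a fixed value (e.g., CPU usage above 80% for 10 minutes)
Anomaly Detection: Fire when a metric deviates significantly from its usual behavior
Predictive: Fire when a trend points to a future problem (e.g., a disk that will fill up within hours)
Business Metrics: Alert on business-relevant signals such as failed checkouts or order volume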
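The first three strategies differ mainly in what the current value is compared against. The Python sketch below puts them side by side for a single series: a fixed threshold, a z-score against the recent mean (one simple form of anomaly detection among many), and a least-squares extrapolation in the spirit of PromQL's `predict_linear`. Business metrics concern what is measured rather than how, so they are not modeled. The thresholds, window contents and function names are illustrative assumptions, not recommendations.

```python
from statistics import mean, pstdev
from typing import List


def threshold_alert(value: float, limit: float) -> bool:
    """Static threshold: fire when the value exceeds a fixed limit."""
    return value > limit


def anomaly_alert(history: List[float], value: float, z_limit: float = 3.0) -> bool:
    """Anomaly detection: fire when the value is far from recent behavior."""
    mu, sigma = mean(history), pstdev(history)
    if sigma == 0:
        return value != mu
    return abs(value - mu) / sigma > z_limit


def predict_value(times: List[float], values: List[float], at: float) -> float:
    """Predictive: fit a least-squares line through the samples, evaluate at `at`."""
    t_mean, v_mean = mean(times), mean(values)
    num = sum((t - t_mean) * (v - v_mean) for t, v in zip(times, values))
    den = sum((t - t_mean) ** 2 for t in times)
    return v_mean + (num / den) * (at - t_mean)
```

A CPU reading of 70% after a stable history around 50% stays below an 80% static threshold but is flagged as an anomaly; free disk space shrinking from 20 GB by 2 GB per hour is predicted to reach 0 GB at hour 10, which a predictive rule can report long before the disk is actually full.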

Advantages and Disadvantages

Benefits of DevOps

Faster Delivery: Accelerated software development
Higher Quality: Automated testing and quality assurance
Better Collaboration: Integration of Dev and Ops
Scalability: Automated infrastructure scaling
Reliability: Consistent and repeatable deployments

Disadvantages

Complexity: High initial complexity
Costs: Investment in tools and training
Cultural Change: Requires organizational changes
Learning Curve: Steep learning curve for teams
Tool Overload: Many different tools that must be integrated and maintained

Common Exam Questions

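What is the difference between CI and CD? CI (Continuous Integration) automatically builds and tests every code change pushed to the shared repository. CD stands for either Continuous Delivery, where every change that passes the pipeline is kept releasable and goes to production after a manual approval, or Continuous Deployment, where such changes are deployed to production automatically.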
Explain containerization with Docker! Docker isolates applications in containers with all dependencies, ensuring consistent environments across different systems.
When do you use Kubernetes vs. Docker Swarm? Kubernetes for complex, scalable applications in enterprise environments, Docker Swarm for simpler setups and small to medium-sized companies.
What is Infrastructure as Code? Infrastructure as Code is the practice of defining and managing infrastructure through code, enabling automation and versioning.

IRC-Mania: The site all about security and programming

IRC-Security.de
