Phase 18 — KEDA — Event-Driven Autoscaling
KEDA (Kubernetes Event-Driven Autoscaling) scales your pods based on real signals — queue depth, message count, HTTP request rate, Prometheus metrics, or a cron schedule. The built-in HPA scales only on CPU/memory (unless you wire up a custom metrics adapter) — KEDA scales on what actually matters.
How KEDA Extends HPA
Standard HPA:
CPU > 70% → scale up
CPU < 30% → scale down
Problem: CPU doesn't always reflect load (batch jobs, queues)
KEDA:
RabbitMQ queue depth > 100 → scale up workers
Queue empty → scale down to 0 (zero pods = zero cost)
Cron: 08:00–18:00 → 5 replicas, nights → 1 replica
Prometheus metric: requests/s > 500 → scale up
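For contrast, here is what the resource-based approach looks like as a plain `autoscaling/v2` HPA — a sketch for a hypothetical `my-app` Deployment. Note the hard floor: the built-in HPA cannot go below 1 replica.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 1          # the built-in HPA cannot scale to zero
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # CPU > 70% → scale up
```

Under the hood, KEDA does not replace this mechanism — it creates and manages an HPA for you, feeding it external metrics, and handles the 0 ↔ 1 transition itself.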
Install KEDA
```bash
helm repo add kedacore https://kedacore.github.io/charts
helm install keda kedacore/keda \
  --namespace keda \
  --create-namespace

kubectl get pods -n keda
```
ScaledObject — The Core Resource
A ScaledObject links a Deployment to a trigger:
```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: my-app-scaler
  namespace: production
spec:
  scaleTargetRef:
    name: my-app          # the Deployment to scale
  minReplicaCount: 0      # can scale to zero!
  maxReplicaCount: 20
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus.monitoring.svc:9090
        metricName: http_requests_total
        threshold: "100"  # scale up when > 100 req/s
        query: sum(rate(http_requests_total{app="my-app"}[1m]))
```
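Two optional spec fields control how eagerly KEDA reacts; the values below are the KEDA defaults, shown explicitly for reference:

```yaml
spec:
  pollingInterval: 30   # seconds between trigger checks (default: 30)
  cooldownPeriod: 300   # seconds after the last active trigger before
                        # scaling back down to minReplicaCount (default: 300)
```

Lower `pollingInterval` for snappier reactions; raise `cooldownPeriod` if your workload flaps between busy and idle.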
Example 1 — Scale on RabbitMQ Queue Depth
```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: worker-scaler
  namespace: production
spec:
  scaleTargetRef:
    name: job-worker
  minReplicaCount: 0
  maxReplicaCount: 10
  triggers:
    - type: rabbitmq
      metadata:
        host: amqp://rabbitmq.production.svc:5672
        queueName: job-queue
        queueLength: "5"   # 1 worker per 5 messages in queue
```
When the queue is empty → 0 workers (saves all resources). When 50 messages → 10 workers processing in parallel.
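In practice the AMQP URL usually carries credentials, which don't belong in a ScaledObject. KEDA's answer is a TriggerAuthentication resource that pulls them from a Secret — a sketch assuming a hypothetical `rabbitmq-credentials` Secret with an `amqp-url` key:

```yaml
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: rabbitmq-auth
  namespace: production
spec:
  secretTargetRef:
    - parameter: host              # maps to the trigger's "host" parameter
      name: rabbitmq-credentials   # hypothetical Secret name
      key: amqp-url                # e.g. amqp://user:pass@rabbitmq...:5672
```

The trigger then drops its `host` line and instead references it with `authenticationRef: { name: rabbitmq-auth }`.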
Example 2 — Cron Schedule
```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: api-cron-scaler
  namespace: production
spec:
  scaleTargetRef:
    name: api-deployment
  triggers:
    - type: cron
      metadata:
        timezone: "Europe/Paris"
        start: "0 8 * * 1-5"    # 08:00 Mon–Fri
        end: "0 20 * * 1-5"     # 20:00 Mon–Fri
        desiredReplicas: "5"
    - type: cron
      metadata:
        timezone: "Europe/Paris"
        start: "0 20 * * 1-5"   # nights + weekends
        end: "0 8 * * 1-5"
        desiredReplicas: "1"
```
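Triggers combine: when multiple triggers are active, KEDA scales to the highest replica count any of them demands. That makes cron useful as a business-hours floor under a load-based trigger — a sketch reusing the Prometheus trigger from the core example:

```yaml
triggers:
  - type: cron                 # floor: 5 replicas during business hours
    metadata:
      timezone: "Europe/Paris"
      start: "0 8 * * 1-5"
      end: "0 20 * * 1-5"
      desiredReplicas: "5"
  - type: prometheus           # spikes above the floor follow real traffic
    metadata:
      serverAddress: http://prometheus.monitoring.svc:9090
      threshold: "100"
      query: sum(rate(http_requests_total{app="my-app"}[1m]))
```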
Example 3 — Scale ML Workers on Job Queue
Scale Kubeflow training workers based on pending jobs:
```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: ml-worker-scaler
  namespace: ai
spec:
  scaleTargetRef:
    name: ml-worker
  minReplicaCount: 0
  maxReplicaCount: 3   # max 3 workers (1 per node)
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus.monitoring.svc:9090
        metricName: ml_jobs_pending
        threshold: "1"
        query: ml_training_jobs_pending_total
```
When no ML jobs → 0 pods, freeing cluster resources for other workloads.
ScaledJob — For Batch Processing
For jobs that should run once per event (not continuously):
```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledJob
metadata:
  name: image-processor
  namespace: production
spec:
  jobTargetRef:
    template:
      spec:
        containers:
          - name: processor
            image: harbor.local/platform/image-processor:latest
        restartPolicy: Never
  triggers:
    - type: rabbitmq
      metadata:
        host: amqp://rabbitmq.production.svc:5672   # or use authenticationRef
        queueName: images-to-process
        queueLength: "1"   # 1 job pod per message
```
One message = one pod = process and exit. Perfect for image processing, report generation, one-off tasks.
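ScaledJob has its own tuning knobs on `spec`, since finished Jobs accumulate. A fragment with the fields I'd reach for first (values here are illustrative, not defaults):

```yaml
spec:
  pollingInterval: 10             # check the queue every 10 s
  maxReplicaCount: 20             # cap on concurrently running job pods
  successfulJobsHistoryLimit: 5   # completed Jobs kept around for inspection
  failedJobsHistoryLimit: 5       # failed Jobs kept for debugging
```

Without the history limits, `kubectl get jobs` fills up with completed Jobs over time.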
KEDA + This Cluster
Autoscale based on:
✔ n8n triggers webhook → scale API workers
✔ Airflow DAG enqueues jobs → scale batch processors
✔ Prometheus: request rate → scale web services
✔ Cron: business hours → full capacity, nights → minimal
✔ ML job queue → scale training workers
Combined with scale-to-zero, idle workloads consume no CPU/RAM — your 48 GiB is always available where it's needed.
Done When
✔ KEDA pods Running in keda namespace
✔ First ScaledObject created and monitoring triggers
✔ Deployment scales up when trigger fires
✔ Deployment scales to zero when idle
✔ kubectl get scaledobjects shows all scalers