The average Kubernetes cluster we inherit is running at around 20–35% CPU utilisation and 40–50% memory utilisation. In other words, for every dollar of compute doing useful work, the team is paying two to four. Cloud bills are not a pricing problem; they're an engineering problem.
This playbook covers the interventions we apply, in rough priority order, when taking over or auditing a production Kubernetes environment. We consistently recover 50–65% of compute spend without touching application code or SLOs.
Step 0: Get Visibility With OpenCost
You cannot optimise what you cannot see. Before touching anything, install OpenCost (or Kubecost if you want the commercial edition) to get per-namespace, per-workload cost visibility.
# Add the OpenCost chart repository first
helm repo add opencost https://opencost.github.io/opencost-helm-chart
helm repo update

helm install opencost opencost/opencost \
  --namespace opencost \
  --create-namespace \
  --set opencost.exporter.cloudProviderApiKey="YOUR_KEY"
OpenCost gives you:
- Cost allocation per deployment, namespace, label, and team.
- Efficiency scores (requested vs actual utilisation).
- Idle resource cost — the single most important number for most clusters.
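To pull these numbers programmatically rather than through the UI, port-forward the OpenCost service and query its allocation API. A minimal sketch, assuming the default service name and API port from the Helm install above:

# Forward the OpenCost API locally (name and port are the chart defaults)
kubectl -n opencost port-forward service/opencost 9003:9003 &

# Per-namespace cost allocation for the last 7 days
curl -s "http://localhost:9003/allocation?window=7d&aggregate=namespace"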
Get this in front of engineering leads and finance. In our experience, seeing "a substantial monthly spend on idle pods in the staging namespace" is the fastest way to get organisation-wide buy-in for what comes next.
Step 1: Right-Size Workloads With VPA
Developers consistently over-request CPU and memory. The Vertical Pod Autoscaler (VPA) in recommendation mode analyses actual usage and tells you what your containers actually need.
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-service-vpa
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: api-service
  updatePolicy:
    updateMode: "Off"  # Recommendation only; don't auto-apply yet
Run VPA in Off mode for one week to collect recommendations, then review. We typically see CPU requests that can be halved and memory requests that can be cut by 30–40% with no performance regressions and no OOMKills.
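The recommendations land on the VPA object's status; a quick way to read them, using the api-service-vpa defined above:

kubectl describe vpa api-service-vpa

# Or just the recommendation block, as JSON
kubectl get vpa api-service-vpa \
  -o jsonpath='{.status.recommendation.containerRecommendations}'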
Once you're confident, switch to Auto mode for non-critical workloads. Keep stateful services on Off — VPA evicts pods to resize them, which is fine for stateless APIs but painful for databases.
Step 2: Spot and Preemptible Nodes
Spot instances (AWS) and preemptible VMs (GCP) are 60–80% cheaper than on-demand equivalents. The catch: they can be reclaimed with as little as 2 minutes' notice on AWS and 30 seconds on GCP.
The correct architecture is a mixed node pool:
- On-demand nodes: 20–30% of base capacity, used for stateful workloads, daemonsets, and system components.
- Spot nodes: 70–80% of capacity, used for stateless application pods.
Pod disruption budgets and graceful shutdown handlers (SIGTERM → drain in-flight requests → exit) are prerequisites. Most web applications tolerate this well. Batch jobs and ML training runs actually benefit — you checkpoint and resume, which improves fault tolerance anyway.
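As a concrete sketch of the first prerequisite, here is a PodDisruptionBudget that keeps a floor of ready replicas while spot nodes drain (the app label and replica floor are illustrative):

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-service-pdb
spec:
  minAvailable: 2            # never voluntarily evict below two ready pods
  selector:
    matchLabels:
      app: api-service

Drains triggered by spot reclamation handlers go through the eviction API, so they honour this budget.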
Step 3: Replace Cluster Autoscaler With Karpenter
Karpenter (now a CNCF project, AWS-native but increasingly portable) replaces the Cluster Autoscaler with a fundamentally better model: instead of scaling pre-defined node groups, Karpenter provisions the exact instance type that best fits the pending pods.
Results from a recent migration (300-node EKS cluster):
- Node provisioning time: 7 minutes → 90 seconds.
- Instance type utilisation efficiency: improved by 22% (better bin packing).
- Monthly compute spend: down 18% from Karpenter alone, before other changes.
# Karpenter NodePool: mixed on-demand + spot
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64", "arm64"]
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 30s
With consolidationPolicy: WhenEmptyOrUnderutilized and consolidateAfter: 30s, Karpenter consolidates aggressively: it waits only 30 seconds before acting on an underutilised node, bin-packing its pods onto fewer, fuller nodes and terminating the empties.
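The NodePool above references an EC2NodeClass named default. A minimal sketch of one, assuming the common karpenter.sh/discovery tagging convention; the IAM role name and cluster tag value are hypothetical:

apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: default
spec:
  amiSelectorTerms:
    - alias: al2023@latest               # track the latest AL2023 AMI
  role: "KarpenterNodeRole-my-cluster"   # hypothetical node IAM role
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: my-cluster
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: my-cluster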
Step 4: Event-Driven Scaling With KEDA
Out of the box, HPA scales on CPU and memory. KEDA (Kubernetes Event-Driven Autoscaler) scales on external signals: queue depth, database row count, Datadog metrics, Prometheus queries, whatever actually reflects your workload.
A concrete example: a batch service that consumes messages from an SQS queue. With HPA you'd scale on CPU, which lags the actual work by several minutes. With KEDA:
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: queue-processor
spec:
  scaleTargetRef:
    name: queue-processor
  minReplicaCount: 0   # Scale to zero when queue is empty
  maxReplicaCount: 50
  triggers:
    - type: aws-sqs-queue
      metadata:
        queueURL: https://sqs.us-east-1.amazonaws.com/123456/jobs
        queueLength: "5"  # Target 5 messages per replica
        awsRegion: us-east-1
minReplicaCount: 0 is the key win: the service consumes zero compute between batch runs. For workloads with variable or bursty traffic patterns, this alone can reduce compute spend by 40–70%.
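One prerequisite the manifest above glosses over: the SQS scaler needs AWS credentials. If the workload already has an AWS identity (IRSA or EKS Pod Identity), a TriggerAuthentication can delegate to it; the resource name here is ours:

apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: sqs-pod-identity
spec:
  podIdentity:
    provider: aws    # reuse the pod's own AWS identity

Reference it from the trigger via authenticationRef: {name: sqs-pod-identity}.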
Step 5: Namespace Resource Quotas
Without quotas, a single misconfigured deployment can consume the entire cluster's capacity. Resource quotas enforce limits at the team or environment level:
apiVersion: v1
kind: ResourceQuota
metadata:
  name: staging-quota
  namespace: staging
spec:
  hard:
    requests.cpu: "20"
    requests.memory: "40Gi"
    limits.cpu: "40"
    limits.memory: "80Gi"
    count/pods: "100"
Pair quotas with LimitRanges to set default requests/limits on containers that don't specify them. This matters more than it looks: once a quota constrains a resource, the API server rejects any pod that omits an explicit request or limit for it, so defaults keep careless deployments schedulable instead of broken.
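A minimal LimitRange for the same namespace; the default values are illustrative, not a recommendation:

apiVersion: v1
kind: LimitRange
metadata:
  name: staging-defaults
  namespace: staging
spec:
  limits:
    - type: Container
      defaultRequest:    # applied as requests when a container sets none
        cpu: "100m"
        memory: "128Mi"
      default:           # applied as limits when a container sets none
        cpu: "500m"
        memory: "512Mi"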
Key Takeaways
- Install OpenCost first — visibility is the foundation of all FinOps work.
- VPA in recommendation mode typically shows workloads over-requesting CPU and memory by 30–50%.
- Spot/preemptible nodes at 70–80% of capacity save 60–80% on those node costs.
- Karpenter bin-packing + consolidation is a 15–25% improvement over Cluster Autoscaler alone.
- KEDA scale-to-zero is transformative for batch and event-driven workloads.
- Namespace quotas prevent runaway consumption and enforce team accountability.