Vertical Pod Autoscaling, or VPA, improves cost efficiency by setting each pod's resource requests from its observed usage rather than from a number a developer guessed at deploy time. Because the scheduler places pods, and the cluster autoscaler adds nodes, according to requests rather than actual usage, requests that match reality let the cluster pack tighter and run on fewer nodes. VPA is the automation layer over manual rightsizing: it measures, recommends, and optionally applies right-sized requests continuously, so the request-to-usage gap stays closed as workloads change.
This article is part of our Kubernetes and container cost cluster. For the full picture, start with our complete guide to Kubernetes cost optimization. VPA automates the work described step by step in how to rightsize Kubernetes requests and limits.
How Vertical Pod Autoscaling works
VPA has three parts: a recommender that observes historical and live usage and computes target requests, an updater that evicts pods whose requests have drifted too far from the target so they restart with the new values, and an admission controller that injects the recommended requests when pods are created. You can run it in recommendation-only mode (update mode "Off"), where it surfaces target requests without changing anything, or in auto mode, where it applies them. For cost work, the recommender is the valuable part: it shows, container by container, how far current requests sit above measured usage.
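As a minimal sketch of recommendation-only mode, the snippet below builds a VerticalPodAutoscaler object as a plain Python dict and prints it as JSON, which kubectl accepts in place of YAML. The API group `autoscaling.k8s.io/v1` and the `updateMode: "Off"` setting come from the upstream VPA project; the workload name `checkout`, the namespace `shop`, and the helper `vpa_manifest` are hypothetical and exist only for illustration.

```python
import json


def vpa_manifest(name, namespace, target_kind, target_name, update_mode="Off"):
    """Build a VerticalPodAutoscaler manifest as a plain dict.

    With update_mode "Off" the recommender still publishes target requests
    in the object's status, but nothing evicts or mutates pods.
    """
    return {
        "apiVersion": "autoscaling.k8s.io/v1",
        "kind": "VerticalPodAutoscaler",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # The workload whose pods VPA should observe.
            "targetRef": {
                "apiVersion": "apps/v1",
                "kind": target_kind,
                "name": target_name,
            },
            "updatePolicy": {"updateMode": update_mode},
        },
    }


# Hypothetical workload names, used only to show the shape of the object.
manifest = vpa_manifest("checkout-vpa", "shop", "Deployment", "checkout")
print(json.dumps(manifest, indent=2))
```

Once an object like this has been in place for a while, the recommendations appear in its status and can be read with `kubectl describe vpa checkout-vpa -n shop`, assuming the VPA components are installed in the cluster.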
Where VPA delivers cost efficiency
VPA earns its keep on workloads with steady or slowly changing resource profiles that have been over-requested, which describes a large share of typical clusters. By pulling requests down to observed usage plus a margin, it frees capacity the scheduler can reclaim, and combined with consolidation that freed capacity becomes fewer nodes and a smaller bill. It is especially useful at scale because it rightsizes hundreds of workloads continuously, work no team would sustain by hand. The savings only land if a consolidating autoscaler then removes the freed nodes, as covered in how to reduce idle capacity in Kubernetes.
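The arithmetic behind that claim can be sketched in a few lines. The example below sums the CPU request that would be released if each workload moved from its current request to a recommended target, then converts the total into an upper bound on removable nodes. All workload figures and the 3900m allocatable CPU per node are made-up illustration values, and the `Workload` type and both functions are hypothetical helpers, not part of VPA.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    name: str
    replicas: int
    cpu_request_m: int  # current CPU request per pod, in millicores
    cpu_target_m: int   # recommended CPU request per pod, in millicores


def freed_cpu_millicores(workloads):
    """Total CPU request released if every workload moved to its target.

    Workloads whose target is at or above the current request free nothing.
    """
    return sum(
        w.replicas * max(w.cpu_request_m - w.cpu_target_m, 0) for w in workloads
    )


def removable_nodes_upper_bound(freed_m, node_allocatable_m):
    """Whole nodes' worth of CPU request released (an upper bound only)."""
    return freed_m // node_allocatable_m


# Illustrative numbers, not measurements.
fleet = [
    Workload("api", replicas=10, cpu_request_m=1000, cpu_target_m=400),
    Workload("worker", replicas=6, cpu_request_m=2000, cpu_target_m=1200),
    Workload("cache", replicas=3, cpu_request_m=500, cpu_target_m=500),
]

freed = freed_cpu_millicores(fleet)
print(freed)                                     # 10800
print(removable_nodes_upper_bound(freed, 3900))  # 2
```

The result is an upper bound because memory requests, bin-packing fragmentation, and scheduling constraints also decide whether a node can actually be drained.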
Want requests rightsized automatically and safely?
Our cost audit profiles your workloads, identifies where VPA will help and where it will hurt, and rolls it out with the guardrails to avoid disruption. On the performance model, you pay only from realized savings. No savings, no fee.
Book a cloud cost audit →
The limits: VPA and HPA together
VPA changes how big each pod is; the Horizontal Pod Autoscaler, or HPA, changes how many pods there are. They serve different needs and can conflict if both act on the same CPU or memory signal, so the standard pattern is HPA on a custom or external metric while VPA manages requests, or VPA in recommendation mode feeding manual updates. VPA also restarts pods to apply changes, which workloads must tolerate through disruption budgets and graceful shutdown. It is not a fit for highly spiky workloads where horizontal scaling is the right answer.
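One way to catch the conflict before it reaches a cluster is to compare the two objects directly. The sketch below assumes the HPA is an `autoscaling/v2` object and the VPA an `autoscaling.k8s.io/v1` object, both loaded as dicts; the function names are hypothetical. It treats any update mode other than "Off", including a missing one, as a mode that applies changes, which is a deliberately conservative assumption.

```python
DEFAULT_CONTROLLED = ("cpu", "memory")


def hpa_resource_metrics(hpa):
    """Resource names (such as "cpu") that the HPA scales on."""
    names = set()
    for metric in hpa.get("spec", {}).get("metrics", []):
        if metric.get("type") == "Resource":
            names.add(metric["resource"]["name"])
        elif metric.get("type") == "ContainerResource":
            names.add(metric["containerResource"]["name"])
    return names


def vpa_controlled_resources(vpa):
    """Resources VPA manages; cpu and memory when no policy narrows them."""
    spec = vpa.get("spec", {})
    policies = spec.get("resourcePolicy", {}).get("containerPolicies", [])
    if not policies:
        return set(DEFAULT_CONTROLLED)
    names = set()
    for policy in policies:
        names.update(policy.get("controlledResources", DEFAULT_CONTROLLED))
    return names


def same_target(hpa, vpa):
    """True when both objects point at the same workload kind and name."""
    h = hpa.get("spec", {}).get("scaleTargetRef", {})
    v = vpa.get("spec", {}).get("targetRef", {})
    return h.get("name") is not None and (
        (h.get("kind"), h.get("name")) == (v.get("kind"), v.get("name"))
    )


def conflicting_resources(hpa, vpa):
    """Resources that both autoscalers would act on for the same workload."""
    mode = vpa.get("spec", {}).get("updatePolicy", {}).get("updateMode")
    if mode == "Off" or not same_target(hpa, vpa):
        return set()
    return hpa_resource_metrics(hpa) & vpa_controlled_resources(vpa)


target = {"kind": "Deployment", "name": "checkout"}  # hypothetical workload
hpa_on_cpu = {"spec": {"scaleTargetRef": target, "metrics": [
    {"type": "Resource", "resource": {"name": "cpu"}}]}}
hpa_on_queue = {"spec": {"scaleTargetRef": target, "metrics": [
    {"type": "External", "external": {"metric": {"name": "queue_depth"}}}]}}
vpa_applying = {"spec": {"targetRef": target,
                         "updatePolicy": {"updateMode": "Recreate"}}}

print(sorted(conflicting_resources(hpa_on_cpu, vpa_applying)))    # ['cpu']
print(sorted(conflicting_resources(hpa_on_queue, vpa_applying)))  # []
```

An empty result corresponds to the pattern described above: HPA driven by a custom or external metric while VPA manages requests.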
How to roll VPA out safely
Start in recommendation-only mode across non-critical workloads and compare its targets against current requests to size the opportunity before changing anything. Move the safest, steadiest workloads to auto mode first, with conservative bounds so VPA cannot set requests dangerously low, and watch for eviction-driven restarts. Expand gradually as confidence grows. The discipline here mirrors any rightsizing program: measure, apply to safe candidates, verify, then widen.
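Two of those steps lend themselves to a short sketch: picking steady candidates and setting conservative bounds. The example below flags a workload as steady when the coefficient of variation of its usage samples is low, and builds a VPA container policy whose `minAllowed` and `maxAllowed` stay within a band around the current requests. The `containerName`, `minAllowed`, and `maxAllowed` fields are part of the VPA resource policy; the 0.2 variation threshold, the 50% floor, the 150% ceiling, and both function names are assumptions chosen for illustration, not recommended values.

```python
from statistics import mean, pstdev


def is_steady(usage_samples, max_cv=0.2):
    """True when usage varies little relative to its mean.

    Uses the coefficient of variation (population std dev / mean).
    Empty or all-zero samples are treated as not steady.
    """
    if not usage_samples:
        return False
    avg = mean(usage_samples)
    if avg == 0:
        return False
    return pstdev(usage_samples) / avg <= max_cv


def conservative_policy(container, cpu_request_m, mem_request_mi,
                        floor_ratio=0.5, ceiling_ratio=1.5):
    """Container policy keeping VPA within a band around current requests."""
    return {
        "containerName": container,
        "minAllowed": {
            "cpu": f"{int(cpu_request_m * floor_ratio)}m",
            "memory": f"{int(mem_request_mi * floor_ratio)}Mi",
        },
        "maxAllowed": {
            "cpu": f"{int(cpu_request_m * ceiling_ratio)}m",
            "memory": f"{int(mem_request_mi * ceiling_ratio)}Mi",
        },
    }


# Illustrative CPU usage samples in millicores, not measurements.
print(is_steady([400, 420, 410, 390]))    # True
print(is_steady([100, 900, 150, 1200]))   # False

policy = conservative_policy("app", cpu_request_m=1000, mem_request_mi=512)
print(policy["minAllowed"])  # {'cpu': '500m', 'memory': '256Mi'}
print(policy["maxAllowed"])  # {'cpu': '1500m', 'memory': '768Mi'}
```

A policy like this goes under `spec.resourcePolicy.containerPolicies` in the VPA object. Widening the band in later passes is one way to expand gradually as confidence grows.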
| Aspect | VPA | HPA |
|---|---|---|
| Adjusts | Pod size (requests) | Pod count (replicas) |
| Best for | Steady, over-requested workloads | Spiky, scalable workloads |
| Cost effect | Closes request-to-usage gap | Matches replicas to load |
| Disruption | Restarts pods to apply | Adds and removes pods |
| Run mode | Recommend or auto | Continuous |
VPA behavior and modes above reflect the project as of May 2026. Verify current VPA capabilities, in-place resize support, and managed-platform integration against your platform's documentation before enabling auto mode, as these features evolve.
The Kubernetes Cost Optimization Handbook includes the VPA rollout checklist and the HPA-plus-VPA pattern behind this article. It is the downloadable companion.
The short version
Vertical Pod Autoscaling improves cost efficiency by setting pod requests from observed usage, automating the rightsizing that closes the request-to-usage gap. Use the recommender to size the opportunity, roll auto mode out to steady workloads with conservative bounds, pair it carefully with HPA, and always follow it with consolidation. When you want VPA scoped and rolled out without disruption, that is what our rightsizing and waste elimination service delivers.