To rightsize Kubernetes requests and limits you need one thing first: real usage data. A request is a reservation: the CPU and memory the scheduler sets aside for a pod. Nodes are provisioned to hold the sum of those requests, so you pay for what is reserved regardless of what is used. Most clusters carry requests that were guessed once and never revisited, which is why actual CPU usage typically sits far below what has been requested. Closing that gap is where the money is.
This how-to is part of our Kubernetes and container cost cluster. For the full picture, start with our complete guide to Kubernetes cost optimization, the pillar page for this series. Rightsizing pods is the prerequisite to rightsizing node pools and instance types, which fixes the layer underneath.
Measure actual usage before touching anything
Pull at least a week, and ideally two, of CPU and memory usage per container so weekly cycles and traffic peaks are visible. Look at the distribution, not the average: a high percentile such as the 95th or 99th shows how much headroom is safe, while sizing to the average alone sets requests too low and invites CPU starvation under contention or eviction under memory pressure. The gap between the request and the high-percentile usage is your rightsizing opportunity, expressed in reserved cores and gigabytes you are paying for but not using.
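As a minimal sketch of that comparison, the snippet below computes a nearest-rank percentile over usage samples and the gap between a request and that percentile. The sample values, the 1-core request, and the function names are hypothetical; real samples would come from whatever metrics store your cluster uses.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of the data at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def rightsizing_gap(request, samples, pct=95):
    """Reserved capacity above high-percentile usage; zero if the request is already below it."""
    return max(0.0, request - percentile(samples, pct))


# Hypothetical per-container CPU usage in cores, with one traffic peak.
cpu_samples = [0.10, 0.12, 0.11, 0.15, 0.13, 0.40, 0.12, 0.14, 0.11, 0.13]
cpu_request = 1.0  # hypothetical current request in cores

average = sum(cpu_samples) / len(cpu_samples)
p95 = percentile(cpu_samples, 95)
print(f"average={average:.3f} cores, p95={p95:.2f} cores")
print(f"reserved but unused above p95: {rightsizing_gap(cpu_request, cpu_samples):.2f} cores")
```

In this made-up data the average hides the peak entirely, which is the reason to size from the percentile instead.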
Set requests to reality plus a margin
Set CPU and memory requests to a high percentile of observed usage plus a modest safety margin, not to a round number someone liked. Accurate requests are what let the scheduler pack pods densely, so this single change usually recovers the most capacity. Treat CPU and memory separately, because a pod is often generous on one and tight on the other, and sizing both by the same rule of thumb wastes whichever dimension it overshoots. Roll changes out gradually and watch for throttling so you tune toward fit, not toward fragility.
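One way to express "high percentile plus margin" is sketched below, with CPU and memory handled as separate calls. The 15% and 20% margins, the rounding steps, and the input values are illustrative assumptions, not recommendations from this article; integer arithmetic is used so rounding is exact.

```python
def recommend_request(p_high, margin_pct, step):
    """Return p_high plus margin_pct percent, rounded up to the next multiple of step.

    p_high, margin_pct and step are integers (for example millicores or MiB).
    """
    if p_high < 0 or margin_pct < 0 or step <= 0:
        raise ValueError("p_high and margin_pct must be >= 0 and step must be > 0")
    scaled = p_high * (100 + margin_pct)      # usage plus margin, in hundredths of a unit
    return -(-scaled // (100 * step)) * step  # ceiling division, then back to whole units


# Hypothetical observed p95 values for one container.
cpu_p95_millicores = 400
mem_p95_mib = 700

cpu_request = recommend_request(cpu_p95_millicores, margin_pct=15, step=10)
mem_request = recommend_request(mem_p95_mib, margin_pct=20, step=16)
print(f"cpu: {cpu_request}m, memory: {mem_request}Mi")
```

Keeping the margin as an explicit parameter per dimension makes it easy to be tighter on CPU, where overshoot only slows the workload, than on memory, where overshoot can mean eviction.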
Use limits to protect, not to size
Requests and limits do different jobs, and confusing them is a common, costly mistake. The request drives scheduling and cost; the limit caps how much a pod can burst. For memory, set a limit so a leak cannot take down a node: a container that exceeds its memory limit is OOM-killed. For CPU, be cautious with tight limits because they throttle the workload even when the node has spare capacity. The pattern that works for most services is a request at real usage and a memory limit as a guardrail, with CPU limits used sparingly.
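The pattern in the last sentence can be sketched as a small helper that builds the `resources` stanza of a container spec: requests for both dimensions, a memory limit above the request, and a CPU limit only when explicitly asked for. The 150% memory-limit ratio and the input values are assumptions for illustration; the helper name is hypothetical.

```python
import json


def container_resources(cpu_request_m, mem_request_mib, mem_limit_pct=150, cpu_limit_m=None):
    """Build a Kubernetes container `resources` mapping.

    The memory limit is mem_limit_pct percent of the memory request (rounded up).
    No CPU limit is set unless cpu_limit_m is given.
    """
    if mem_limit_pct < 100:
        raise ValueError("memory limit must not be below the memory request")
    mem_limit_mib = -(-mem_request_mib * mem_limit_pct // 100)  # ceiling division
    resources = {
        "requests": {"cpu": f"{cpu_request_m}m", "memory": f"{mem_request_mib}Mi"},
        "limits": {"memory": f"{mem_limit_mib}Mi"},
    }
    if cpu_limit_m is not None:
        if cpu_limit_m < cpu_request_m:
            raise ValueError("CPU limit must not be below the CPU request")
        resources["limits"]["cpu"] = f"{cpu_limit_m}m"
    return resources


# Hypothetical request values for one service.
print(json.dumps(container_resources(460, 848), indent=2))
```

The printed JSON can be pasted under a container's `resources` key, since Kubernetes manifests accept JSON as well as YAML.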
Want your cluster rightsized for you?
Our cost audit reads your actual pod usage, recommends requests and limits per workload, and tunes the result so utilization rises without risking stability. On the performance model, you pay only from realized savings. No savings, no fee.
Book a cloud cost audit →
Let the Vertical Pod Autoscaler hold the savings
Manual rightsizing is a snapshot that decays as workloads change. The Vertical Pod Autoscaler observes usage and recommends or applies request changes automatically, which keeps requests honest without a quarterly spreadsheet exercise. Start it in recommendation mode to build trust, then move suitable workloads to automatic updates. Be careful combining it with horizontal autoscaling on the same metric, and exclude workloads where restarts are disruptive. Done right, the autoscaler turns rightsizing from a project into a property of the cluster.
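A recommendation-mode setup might look like the sketch below, which builds a `VerticalPodAutoscaler` manifest with `updateMode` set to `"Off"` so the autoscaler only publishes recommendations. The object name, namespace, and target Deployment are hypothetical, and the set of other update modes differs between VPA versions, so check your platform before switching a workload to automatic updates.

```python
import json


def vpa_manifest(name, namespace, target_kind, target_name, update_mode="Off"):
    """Build a VerticalPodAutoscaler manifest as a plain dict.

    update_mode "Off" means recommendations only; other modes depend on the
    installed VPA version and are passed through unvalidated here.
    """
    return {
        "apiVersion": "autoscaling.k8s.io/v1",
        "kind": "VerticalPodAutoscaler",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "targetRef": {
                "apiVersion": "apps/v1",
                "kind": target_kind,
                "name": target_name,
            },
            "updatePolicy": {"updateMode": update_mode},
        },
    }


# Hypothetical workload: a Deployment named "api" in namespace "prod".
manifest = vpa_manifest("api-vpa", "prod", "Deployment", "api")
print(json.dumps(manifest, indent=2))
```

Saved to a file, the output can be applied like any other manifest, and the recommendations then appear in the status of the VPA object.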
Higher requests are not the only fix for instability
When a pod misbehaves, the reflex is to raise requests, which quietly inflates cost across every replica. Often the real issue is a memory leak, a noisy neighbor, or a missing limit, not a genuine need for more reserved capacity. Diagnose before you reserve, because a permanent request increase to paper over a bug is one of the most expensive habits in Kubernetes. Pair this discipline with the structural view in node pool rightsizing so pods and nodes are sized together.
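As one rough diagnostic, the sketch below fits a straight line to a container's memory samples and flags steady growth, which looks more like a leak than a genuine need for a larger request. The 20% growth threshold and the sample series are arbitrary assumptions, and a trend check like this is only a first hint, not a substitute for profiling.

```python
def trend_per_sample(samples):
    """Least-squares slope of the samples against their index (units per sample)."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    num = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(samples))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den


def looks_like_leak(samples, min_growth_fraction=0.2):
    """True if the fitted rise across the window exceeds a fraction of mean usage."""
    mean_y = sum(samples) / len(samples)
    fitted_rise = trend_per_sample(samples) * (len(samples) - 1)
    return mean_y > 0 and fitted_rise > min_growth_fraction * mean_y


# Hypothetical memory usage in MiB, sampled at equal intervals since the last restart.
growing = [500, 520, 545, 560, 590, 610, 640, 655]
steady = [500, 510, 495, 505, 500, 498, 507, 502]

print("growing series flagged:", looks_like_leak(growing))
print("steady series flagged:", looks_like_leak(steady))
```

A flagged series is a reason to look for the bug before raising the request; an unflagged one that still hits its limit is a better candidate for more reserved memory.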
| Setting | What it controls | How to size |
|---|---|---|
| CPU request | Scheduling and cost | High percentile plus margin |
| Memory request | Scheduling and cost | High percentile plus margin |
| Memory limit | Guardrail against leaks | Above request, room to burst |
| CPU limit | Throttle cap | Use sparingly |
| VPA | Holds rightsizing | Recommend, then automate |
Autoscaler behavior and feature names above reflect Kubernetes and the major providers as of May 2026. Verify current VPA behavior in your platform's documentation before automating, because it changes between releases.
The Kubernetes Cost Optimization Handbook includes the request-sizing percentiles and the VPA rollout plan behind this article. It is the downloadable companion.
The short version
Measure real usage, set requests to a high percentile plus a margin, use memory limits as guardrails and CPU limits sparingly, let the Vertical Pod Autoscaler keep requests honest, and diagnose instability instead of inflating reservations. Then size the nodes beneath them with node pool rightsizing. When you want the cluster rightsized and kept that way, that is what our rightsizing and waste elimination service delivers.