Reserved and committed GPU capacity lowers your accelerator rate in exchange for committing to a level of usage over a term, typically one or three years. It is the same bargain as reserved instances and committed-use discounts on ordinary compute. The discount is meaningful because GPU on-demand rates are among the highest in the cloud, but the risk is also higher: commit to capacity you stop using and you pay for idle silicon at a discount, which is still waste. The rule that makes commitments pay is to size them to a clean, rightsized baseline, never to today's unoptimized usage.
This article is part of our AI, GPU and ML cluster. For the full picture, start with the complete guide to AI and GPU cost optimization, the pillar this piece links up to. Committed capacity is the last move in the Cut step of our See, Cut, Lock, Run method: you optimize first, then lock the rate on whatever demand remains steady.
Commit before you rightsize and clear idle GPU time, and you lock in the waste at a discount. Commit after, and you lock the rate on demand you will genuinely run. The discount is the same either way; what changes is whether you are paying a reduced rate for work or for idle accelerators.
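The arithmetic behind that trade-off can be sketched in a few lines. This is a minimal illustration, not a pricing model: the on-demand rate and the discount are hypothetical placeholders, and the helper names `effective_rate` and `break_even_utilization` are ours. It assumes the commitment is billed in full whether or not the GPUs are busy.

```python
def effective_rate(committed_rate: float, utilization: float) -> float:
    """Cost per GPU-hour of real work when only `utilization` of the commitment is used."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    return committed_rate / utilization


def break_even_utilization(discount: float) -> float:
    """Utilization below which a committed hour of work costs more than on-demand."""
    if not 0 <= discount < 1:
        raise ValueError("discount must be in [0, 1)")
    return 1.0 - discount


# Hypothetical figures for illustration only; check live provider pricing.
ON_DEMAND_RATE = 10.0  # $/GPU-hour
DISCOUNT = 0.40        # 40% off on-demand for a term commitment
COMMITTED_RATE = ON_DEMAND_RATE * (1 - DISCOUNT)

for utilization in (0.95, 0.60, 0.50):
    rate = effective_rate(COMMITTED_RATE, utilization)
    print(f"utilization {utilization:.0%}: ${rate:.2f} per useful GPU-hour")

print(f"break-even utilization: {break_even_utilization(DISCOUNT):.0%}")
```

Under these assumed numbers, a commitment that is busy half the time costs more per useful GPU-hour than simply paying on-demand, which is why the baseline has to be cleaned up before it is committed.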
The three ways to buy a GPU-hour
Every GPU workload is bought through some mix of three instruments. On-demand is the flexible, full-price rate with no commitment, right for spiky and short-lived needs. Spot or preemptible capacity offers the deepest discount in exchange for the risk of interruption, right for fault-tolerant training and sweeps, and is covered in our article "Spot GPUs: cutting training costs by up to 90 percent". Reserved or committed capacity sits between them: a lower rate than on-demand in exchange for a usage commitment over a term, right for the steady baseline you will run regardless. The art is splitting your demand across the three so each GPU-hour is bought at the cheapest rate its workload can tolerate.
| Instrument | Discount | Commitment | Best for |
|---|---|---|---|
| On-demand | None | None | Spiky, short-lived, unpredictable |
| Spot / preemptible | Deepest | None, but interruptible | Fault-tolerant training and sweeps |
| Reserved / committed | Moderate to large | 1 or 3 year term | Steady, predictable baseline |
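To make the split concrete, here is a rough sketch of how a blended rate falls out of a mix of the three instruments. Every rate and hour count below is a made-up placeholder, the `Layer` class and `blended` function are hypothetical names, and the committed layer is assumed to be fully used.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    name: str
    gpu_hours: float  # GPU-hours bought through this instrument in a month
    rate: float       # $/GPU-hour, hypothetical


def blended(layers: list[Layer]) -> tuple[float, float]:
    """Return (total cost, blended $/GPU-hour) across all layers."""
    total_hours = sum(layer.gpu_hours for layer in layers)
    total_cost = sum(layer.gpu_hours * layer.rate for layer in layers)
    return total_cost, total_cost / total_hours


# Hypothetical monthly mix: steady baseline committed, training on spot,
# and the unpredictable remainder on-demand.
mix = [
    Layer("committed", gpu_hours=6000, rate=6.0),
    Layer("spot", gpu_hours=3000, rate=3.0),
    Layer("on-demand", gpu_hours=1000, rate=10.0),
]

cost, rate = blended(mix)
all_on_demand = sum(layer.gpu_hours for layer in mix) * 10.0

print(f"blended: ${cost:,.0f} at ${rate:.2f}/GPU-hour")
print(f"all on-demand: ${all_on_demand:,.0f}")
```

The point of the sketch is the shape of the calculation, not the figures: the blended rate only improves if each layer really matches the tolerance of the workload placed on it.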
How GPU commitments differ across clouds
The shape varies by provider. Some offer capacity reservations that guarantee the hardware is available and can be combined with a separate billing discount. Others fold GPUs into broader committed-use or savings-plan-style discounts that apply across instance families. Reserved instances, by contrast, tie the discount to a specific configuration. For scarce, in-demand accelerators, a reservation can also be about securing access, not only price, because the cheapest rate is worthless if you cannot get the GPUs. These programs and their terms move quickly, so before you commit, check each provider's live documentation for the current GPU commitment options, the discount levels, and whether a given program guarantees capacity.
Sitting on steady GPU demand at on-demand rates?
Our cost audit rightsizes your accelerator fleet first, separates the steady baseline from the spiky and interruptible work, and sizes commitments to the clean baseline so you capture the discount without locking in waste. On the performance model, you pay only from realized savings. No savings, no fee.
Book a cloud cost audit →
How to size a GPU commitment safely
Size to the floor, not the ceiling. Look at your steady-state GPU usage over a representative window after rightsizing, identify the level of demand that is present nearly all the time, and commit to that floor. Cover the variable layer above it with on-demand and spot. Committing to the floor means the reservation is almost always in use, so you capture the discount without paying for idle capacity, while the flexible instruments absorb the peaks. Avoid committing to peak usage or to a number inflated by workloads you have not yet rightsized, because that is how a discount turns into a stranded cost. The same logic governs commitments across every cluster, covered in our commitment management work and the FinOps implementation service.
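One way to turn "the level of demand that is present nearly all the time" into a number is to take the GPU count that usage meets or exceeds in a chosen share of hours. The sketch below assumes you already have post-rightsizing hourly GPU-in-use counts. The sample data, the 90 percent coverage threshold, and the function names are all illustrative assumptions, not a prescribed method.

```python
import math


def commitment_floor(hourly_gpus: list[int], coverage: float = 0.95) -> int:
    """Largest GPU count that demand meets or exceeds in at least `coverage` of hours."""
    if not hourly_gpus:
        raise ValueError("need at least one hourly sample")
    if not 0 < coverage <= 1:
        raise ValueError("coverage must be in (0, 1]")
    ordered = sorted(hourly_gpus, reverse=True)
    # Number of hours that must sit at or above the floor.
    k = math.ceil(coverage * len(ordered))
    return ordered[k - 1]


def utilization(hourly_gpus: list[int], committed: int) -> float:
    """Share of committed GPU-hours that are actually used."""
    used = sum(min(gpus, committed) for gpus in hourly_gpus)
    return used / (committed * len(hourly_gpus))


# Hypothetical hourly GPUs in use, measured after rightsizing.
usage = [8, 8, 9, 8, 10, 12, 16, 14, 9, 8, 8, 3, 8, 9, 11, 15, 13, 9, 8, 2]

floor = commitment_floor(usage, coverage=0.90)
peak = max(usage)

print(f"floor: {floor} GPUs, utilization {utilization(usage, floor):.1%}")
print(f"peak:  {peak} GPUs, utilization {utilization(usage, peak):.1%}")
```

In this made-up sample, committing to the floor keeps the reservation busy most of the time, while committing to the peak leaves a large share of the paid hours idle. A real window should be much longer than twenty hours and should cover weekly and monthly cycles.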
When not to commit
Commitments are wrong when demand is genuinely unpredictable, when the model or framework is changing fast enough that you cannot trust next quarter's GPU mix, or when a new accelerator generation is imminent and would strand a long-term commitment to older hardware. In those cases keep flexibility with on-demand and spot, and revisit committing once the baseline stabilizes. The choice does not have to be all or nothing: a shorter-term or smaller commitment can be a sensible hedge while you build confidence in the baseline.
The AI and GPU Cost Control Guide includes our GPU commitment sizing model and the baseline-versus-peak split we use on engagements. It is the downloadable companion to this article.
The short version
Reserved and committed GPU capacity lowers the rate on steady accelerator demand in exchange for a term commitment. The discount is large because GPU rates are high, but it only pays if you rightsize and clear idle time first, then commit to the floor of demand and cover the variable layer with on-demand and spot. Verify each provider's current GPU commitment programs before signing. For the sizing step that should come first, see how to rightsize GPU instances. When you want your GPU commitments sized to a clean baseline, that is exactly what our FinOps implementation service delivers.