Running Kubernetes workloads on Spot instances trades a small amount of reliability for a large discount. Spot (called Spot VMs on Google Cloud, Spot Instances on AWS, and Spot Virtual Machines on Azure) is spare provider capacity sold cheaply on the understanding that it can be reclaimed at short notice. Kubernetes is unusually well suited to it: when a node disappears, its controllers replace the lost pods and the scheduler places them on the remaining nodes.
This how-to is part of our Kubernetes and container cost cluster. For the full picture, start with our complete guide to Kubernetes cost optimization, the pillar this piece links up to.
Step 1: decide which workloads qualify
Spot suits anything that tolerates a node vanishing on short notice: stateless web and API services with multiple replicas, batch and data-processing jobs, CI runners, dev and test environments, and queue consumers that retry. It does not suit single-replica stateful services, workloads with long ungraceful shutdowns, or anything that holds local state it cannot rebuild. The rule of thumb: if losing one pod for a minute is invisible to users, it is a Spot candidate.
Step 2: create a dedicated Spot node pool
Run Spot capacity in its own node pool, separate from the on-demand pool that carries critical workloads. On managed Kubernetes this is a node pool flagged as Spot (or preemptible, on older offerings). Keep a smaller on-demand or committed pool for the interruption-sensitive workloads, and let the Spot pool carry the bulk of stateless compute. This split is the foundation of the whole pattern; it also pairs naturally with GKE Autopilot, Spot, and bin packing.
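How the pool is created depends on the provider and tooling. As one hedged illustration, assuming an EKS cluster managed with eksctl, the split might look like the sketch below. The cluster name, region, instance types, sizes, the `node-pool` label, and the `spot` taint are all placeholders chosen for this example, not values from any real setup.

```yaml
# Sketch of an eksctl ClusterConfig with one on-demand and one Spot node group.
# All names, sizes, instance types, labels, and taints here are illustrative.
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: demo-cluster
  region: us-east-1
managedNodeGroups:
  # Small on-demand pool for interruption-sensitive workloads.
  - name: on-demand-base
    instanceType: m5.large
    minSize: 2
    maxSize: 4
    desiredCapacity: 2
    labels:
      node-pool: on-demand
  # Larger Spot pool for stateless and batch workloads.
  - name: spot-pool
    spot: true
    # Several similar instance types widen the capacity the pool can draw from.
    instanceTypes: ["m5.large", "m5a.large", "m4.large"]
    minSize: 0
    maxSize: 20
    desiredCapacity: 3
    labels:
      node-pool: spot
    # Taint keeps pods without a matching toleration off these nodes.
    taints:
      - key: spot
        value: "true"
        effect: NoSchedule
```

GKE and AKS expose the same idea through their own node pool settings, and each applies its own provider-specific labels to Spot nodes, so check which labels and taints your pool actually carries before writing selectors against them.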
Step 3: steer pods with taints, tolerations, and affinity
Taint the Spot nodes so nothing lands there by accident, then add a matching toleration to the workloads you have cleared for Spot. A toleration only permits scheduling onto Spot; it does not attract pods to it. Use node affinity to prefer the Spot pool for those workloads, or nodeSelector when the placement must be a hard requirement. Workloads without the toleration stay on the on-demand pool. This keeps databases and ingress controllers off Spot while packing batch and stateless services onto it.
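A minimal sketch of a Spot-cleared workload follows. It assumes the Spot nodes carry a hypothetical taint `spot=true:NoSchedule` and a hypothetical label `node-pool: spot`; substitute whatever taint and label your pool really uses. The Deployment name and image are placeholders.

```yaml
# Deployment that tolerates the (assumed) Spot taint and prefers Spot nodes.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: batch-worker            # placeholder name
spec:
  replicas: 4
  selector:
    matchLabels:
      app: batch-worker
  template:
    metadata:
      labels:
        app: batch-worker
    spec:
      # Permit scheduling onto nodes tainted spot=true:NoSchedule.
      tolerations:
        - key: spot
          operator: Equal
          value: "true"
          effect: NoSchedule
      # Prefer Spot nodes, but allow other nodes if Spot has no room.
      affinity:
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              preference:
                matchExpressions:
                  - key: node-pool
                    operator: In
                    values: ["spot"]
      containers:
        - name: worker
          image: registry.example.com/batch-worker:1.0   # placeholder image
```

The preferred (soft) affinity is a deliberate choice here: pods can still run on on-demand nodes when Spot capacity is short. Swapping it for `requiredDuringSchedulingIgnoredDuringExecution` or a `nodeSelector` would pin the workload to Spot and leave pods pending when the pool is empty.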
Step 4: handle disruption gracefully
The provider sends a termination notice before reclaiming a Spot node, typically with only a short warning window. Make sure workloads handle SIGTERM cleanly, set terminationGracePeriodSeconds to fit inside that window, and use PodDisruptionBudgets so that node drains never evict too many replicas at once. Budgets govern evictions issued during a drain, not a node that simply vanishes, so they work best alongside a component that drains the node when the notice arrives. Spread replicas across zones and node pools with topology spread constraints, so a single Spot reclamation never takes a whole service down.
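The settings above can be sketched as a pair of manifests. The names, replica counts, image, and the 25-second grace period are illustrative assumptions; the right grace period depends on your provider's notice window and how long the application needs to finish in-flight work.

```yaml
# Keep at least 4 of the 6 replicas running during voluntary evictions (drains).
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-api-pdb             # placeholder name
spec:
  minAvailable: 4
  selector:
    matchLabels:
      app: web-api
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-api                 # placeholder name
spec:
  replicas: 6
  selector:
    matchLabels:
      app: web-api
  template:
    metadata:
      labels:
        app: web-api
    spec:
      # Time allowed between SIGTERM and SIGKILL; illustrative value.
      terminationGracePeriodSeconds: 25
      # Spread replicas evenly across zones so one reclamation hits few pods.
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: ScheduleAnyway
          labelSelector:
            matchLabels:
              app: web-api
      containers:
        - name: web-api
          image: registry.example.com/web-api:1.0   # placeholder image
```

`ScheduleAnyway` keeps the spread a preference, so pods are not left pending when one zone runs out of Spot capacity; `DoNotSchedule` would enforce the spread strictly at the cost of availability during shortages.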
Want Spot rolled out without the risk?
Our cost audit identifies which of your Kubernetes workloads are safe for Spot, builds the node pools and disruption handling, and measures the savings. On the performance model, you pay only from realized savings. No savings, no fee.
Book a cloud cost audit →
Step 5: blend Spot with on-demand and commitments
The strongest setups run a layered fleet: a committed-use or reserved base for the always-on critical workloads, on-demand for short bursts, and Spot for everything elastic and fault-tolerant. Configure the autoscaler to fill the Spot pool first and fall back to on-demand when Spot capacity is unavailable. This blend is where the cluster bill drops the most while reliability holds; for the autoscaling side, see rightsizing node pools and instance types.
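One way to express the Spot-first preference, assuming you run the open-source Cluster Autoscaler yourself with `--expander=priority`, is its priority expander ConfigMap. The regular expressions below assume node group names that contain `spot` and `on-demand`, which is a naming convention invented for this sketch. Managed autoscalers on some providers handle this differently and may not read this ConfigMap.

```yaml
# Priority expander configuration for Cluster Autoscaler.
# Higher numbers win: groups matching ".*spot.*" are scaled up first, and
# on-demand groups are used when no Spot group can satisfy the request.
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-autoscaler-priority-expander
  namespace: kube-system
data:
  priorities: |-
    10:
      - .*on-demand.*
    50:
      - .*spot.*
```

The committed or reserved base is typically a fixed-size pool outside this ordering, since its cost is already paid whether or not the nodes are busy.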
| Workload | Spot fit | Why |
|---|---|---|
| Stateless web / API (multi-replica) | Strong | Other replicas serve while pods reschedule |
| Batch and data jobs | Strong | Retries on interruption |
| CI runners, dev/test | Strong | No user impact |
| Single-replica stateful services | Weak | No failover headroom |
| Databases, ingress controllers | Avoid | Reclamation risks outages |
Spot product names and behavior above reflect the providers as of May 2026. Verify current interruption notice windows and Spot terms in the provider's documentation before relying on them, as they change.
The Kubernetes Cost Optimization Handbook includes the Spot node pool patterns and the disruption-handling manifests behind this article. It is the downloadable companion.
The short version
Pick fault-tolerant, multi-replica workloads, run them in a dedicated Spot node pool, steer pods with taints and tolerations, handle the termination signal gracefully with disruption budgets, and blend Spot with a committed base. The discount is large and the risk is manageable when the pattern is right. To allocate the savings back to teams, read Kubernetes cost allocation. When you want Spot rolled out safely across your clusters, that is what our rightsizing and waste elimination service delivers.