Zombie Infrastructure: Finding What Everyone Forgot

Zombie infrastructure is any cloud resource that is still running and still billing but no longer serves a live purpose, because the workload it supported is gone and nobody decommissioned the resource behind it. It differs from idle capacity, which belongs to a live workload that simply runs part-time, and from over-provisioning, which is a live resource that is too large. A zombie has no live workload at all. It is the residue of every migration, experiment, and shutdown that cleaned up the application but left the infrastructure standing. On a long-lived estate, zombies can account for a meaningful slice of the bill precisely because no one is looking for them.

This article is part of our complete guide to cloud rightsizing and waste elimination. Hunting zombies belongs to the Cut step in our See, Cut, Lock, Run method, and it overlaps closely with how to find idle cloud resources across providers, since most zombies first surface as idle resources.

A zombie has no owner and no purpose

The defining test is not low utilization but absence of an owner who can explain why it exists. An idle resource with a clear owner is schedulable. An idle resource nobody claims is a zombie, and the fix is decommissioning, not scheduling.
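The ownership-first test can be sketched as a small classifier. This is a minimal illustration, not a production rule. It assumes a hypothetical Resource record whose owner field is empty when nobody claims the resource, and whose idle flag comes from whatever utilization check you already run. Neither field maps to a specific cloud API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Resource:
    # Hypothetical inventory record; field names are illustrative only.
    resource_id: str
    owner: Optional[str]  # None when no one claims the resource
    idle: bool            # output of an existing utilization check


def classify(resource: Resource) -> str:
    """Apply the ownership-first test: low utilization alone is not enough."""
    if resource.owner is not None:
        # A claimed resource is never a zombie; an idle one is a scheduling job.
        return "schedule" if resource.idle else "active"
    if resource.idle:
        # Idle and unclaimed: the decommission path, not the scheduling path.
        return "zombie-candidate"
    # Busy but unclaimed: find the owner before drawing any conclusion.
    return "trace-owner"
```

The busy-but-unowned branch is there on purpose: a resource doing real work without a known owner is an ownership problem, not a zombie.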

Why zombies accumulate

Zombies are a structural consequence of self-service provisioning without a decommission step. Creating a resource takes seconds and requires no approval; retiring one requires someone to remember it exists and to be confident nothing depends on it. That asymmetry guarantees accumulation. A team finishes a migration and moves on, leaving the source environment running. An engineer leaves the company and their experiments keep billing. A project is cancelled but its infrastructure was never tied to the project's lifecycle. None of these is negligence; each is the default outcome when nothing forces a teardown. This is the same dynamic that drives the broader 30 percent cloud waste problem.

Where to look first

Zombies cluster in predictable places. The richest hunting grounds are the leftovers of things that ended.

Zombie type	Where it hides	How to confirm
Orphaned compute	Instances behind retired apps	No traffic, no owner tag, no recent login
Unattached disks	Volumes left after instance deletion	Attachment state shows nothing
Dead environments	Whole dev or staging stacks	No deploys, no traffic for months
Stale snapshots and images	Backup vaults and image registries	Age past any retention rule
Idle endpoints	Load balancers, NAT gateways, reserved IPs	No connections, nothing attached
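The confirmation checks in the table translate into simple rules over an inventory export. The sketch below assumes a hypothetical, provider-neutral InventoryItem record that you would build from your own cloud inventory. The 90-day quiet period and 365-day retention threshold are placeholder assumptions, not recommendations. Anything it returns is a candidate for ownership tracing, not a deletion list.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class InventoryItem:
    # Hypothetical record assembled from your own inventory export.
    resource_id: str
    kind: str                                 # "instance", "disk", "environment", "snapshot", "endpoint"
    tags: dict = field(default_factory=dict)
    attached_to: Optional[str] = None         # what a disk or endpoint serves, if anything
    last_activity: Optional[datetime] = None  # last traffic, login, deploy, or connection
    created: Optional[datetime] = None


def zombie_candidates(items, now, quiet_days=90, retention_days=365):
    """Return (resource_id, reason) pairs matching the table's confirmation checks."""
    quiet = timedelta(days=quiet_days)
    retention = timedelta(days=retention_days)
    found = []
    for item in items:
        # No recorded activity at all counts as inactive.
        inactive = item.last_activity is None or now - item.last_activity > quiet
        if item.kind == "instance" and inactive and "owner" not in item.tags:
            found.append((item.resource_id, "orphaned compute"))
        elif item.kind == "disk" and item.attached_to is None:
            found.append((item.resource_id, "unattached disk"))
        elif item.kind == "environment" and inactive:
            found.append((item.resource_id, "dead environment"))
        elif item.kind == "snapshot" and item.created and now - item.created > retention:
            found.append((item.resource_id, "stale snapshot"))
        elif item.kind == "endpoint" and item.attached_to is None and inactive:
            found.append((item.resource_id, "idle endpoint"))
    return found
```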

Want the zombies found and retired for you?

Our cloud cost audit hunts forgotten resources across every account on AWS, Azure, GCP and OCI, traces ownership, and hands you a safe decommission plan. On the performance model, you pay only from realized savings. No savings, no fee.


Trace ownership before you kill anything

The danger with zombies is that some are not dead, only sleeping. A disaster-recovery standby, a quarterly batch system, and a rarely used compliance tool all look like zombies by utilization but are alive by design. So the step that matters most is tracing ownership. Use tags first, and where tags are missing, follow the trail: the stack or pipeline that created the resource, the account it lives in, the naming convention, and the last person to touch it in the activity log. The discipline of resolving missing ownership is covered in how to tackle untagged and unowned resources. Only when you either find an owner who confirms the resource is dead, or exhaust every trace and find nothing, should it move to the decommission queue.
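The tracing order in this section can be written as a fallback chain. This is a minimal sketch, assuming a hypothetical Clues record that you would fill from your own tags, infrastructure-as-code state, account registry, naming rules, and activity log. A name found this way is someone to ask, not proof the resource is dead.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Clues:
    # Hypothetical evidence gathered for one resource; all fields are illustrative.
    tags: dict = field(default_factory=dict)
    creating_stack_owner: Optional[str] = None  # owner of the stack or pipeline that created it
    account_owner: Optional[str] = None         # owner recorded for the account it lives in
    naming_prefix_owner: Optional[str] = None   # team implied by the naming convention
    last_actor: Optional[str] = None            # last principal seen in the activity log


def trace_owner(clues: Clues):
    """Walk the evidence in order; return (owner, source) or (None, "exhausted")."""
    trail = [
        ("owner tag", clues.tags.get("owner")),
        ("creating stack", clues.creating_stack_owner),
        ("account", clues.account_owner),
        ("naming convention", clues.naming_prefix_owner),
        ("activity log", clues.last_actor),
    ]
    for source, owner in trail:
        if owner:
            return owner, source
    return None, "exhausted"


def next_step(clues: Clues, owner_confirms_dead: bool = False) -> str:
    """Queue for decommission only on confirmation or an exhausted trail."""
    owner, source = trace_owner(clues)
    if owner is None or owner_confirms_dead:
        return "decommission-queue"
    return f"ask {owner} (via {source})"
```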

Decommission safely, with a delay

The safest way to retire a zombie is not to delete it immediately but to snapshot it, stop it, and wait. Stopping the compute halts the expensive part of the bill, while the cheap snapshot stays as insurance. Tag the resource as a decommission candidate with a date, and if nothing breaks and no one objects within a set window, long enough to span the quarterly or disaster-recovery cycles mentioned above, delete it. This staged approach catches the false positive, the resource that turns out to matter, before the irreversible step. Take the snapshot before anything is deleted and keep the rollback obvious, the same care described in storage waste: snapshots, orphaned disks, and old backups. The delay costs a little storage and saves you from the one deletion that would have caused an outage.
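The staged retirement works as a small state machine. The sketch below is an illustration under stated assumptions. The state labels are hypothetical. The 90-day default window is an assumption, chosen only so that it spans a quarterly job. The actual snapshot, stop, and delete calls are left to your provider's tooling.

```python
from datetime import date, timedelta
from typing import Optional


def next_action(state: str, today: date, tagged_on: Optional[date] = None,
                objection: bool = False, window_days: int = 90) -> str:
    """Decide the next step for one decommission candidate.

    state is a hypothetical lifecycle label: "running" (not yet touched) or
    "stopped" (snapshot taken, compute stopped, candidate tag dated tagged_on).
    window_days is an assumed grace period long enough to span a quarterly job.
    """
    if state == "running":
        # Cheap, reversible first step: keep a snapshot, stop paying for compute.
        return "snapshot-stop-tag"
    if state != "stopped" or tagged_on is None:
        raise ValueError("unknown state or missing candidate tag date")
    if objection:
        # Someone claimed it: the resource was only sleeping.
        return "restore"
    if today - tagged_on >= timedelta(days=window_days):
        # The window passed quietly: only now take the irreversible step.
        return "delete"
    return "wait"
```

Keeping deletion as the only branch that requires both elapsed time and silence is what makes the false positive recoverable.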

Go deeper · free framework

The Cloud Waste Audit Framework includes the queries we use to surface orphaned resources and the ownership-tracing checklist that separates true zombies from disaster-recovery standbys.

Stop new zombies from being born

Finding today's zombies is cleanup; the durable fix is preventing tomorrow's. Tie infrastructure to a lifecycle so resources cannot outlive their purpose silently: require an owner tag at creation, give non-production resources an expiry, and make decommissioning part of every project's closeout rather than an afterthought. The systematic version of this is in how to stop cloud waste from coming back, the Lock step that keeps the zombie population from rebuilding. Combined with the continuous detection in how to build a continuous waste detection process, it turns zombie hunting from a recurring crawl into a property of the platform.
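A creation-time guard for these rules might look like the check below. It is a hedged sketch, not a specific policy engine's syntax. The tag keys owner, environment, and expiry are hypothetical, as is the 90-day ceiling on non-production lifetimes. A resource with no environment tag is treated as non-production, so it must carry an expiry.

```python
from datetime import date


def lifecycle_violations(tags: dict, today: date, max_nonprod_days: int = 90) -> list:
    """Return reasons a new resource should be rejected at creation time."""
    problems = []
    if not tags.get("owner"):
        problems.append("missing owner tag")
    # Anything not explicitly production is treated as non-production.
    if tags.get("environment", "") != "production":
        expiry_raw = tags.get("expiry")
        if not expiry_raw:
            problems.append("non-production resource has no expiry")
        else:
            try:
                expiry = date.fromisoformat(expiry_raw)
            except ValueError:
                problems.append("expiry is not an ISO date")
            else:
                if expiry <= today:
                    problems.append("expiry is not in the future")
                elif (expiry - today).days > max_nonprod_days:
                    problems.append("expiry too far out")
    return problems
```

An empty list means the resource may be created. The same expiry tag can later drive the staged decommission once the date passes.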

The short version

Zombie infrastructure is the forgotten resources that kept billing after their workload was gone, and they accumulate because creating is easy and retiring is hard. Hunt them where things ended, trace ownership before killing anything, decommission with a staged delay and a snapshot, and prevent new ones with lifecycle tagging and expiry. When you want the hunt run across the whole estate at once, that is what our rightsizing and waste elimination service delivers.