Logging and telemetry costs are a pipeline, and you pay at every stage of it. You pay to ingest the data, often by the gigabyte, the moment it arrives. You pay again to index it so it can be searched quickly, which on many platforms is the most expensive part. And you pay continuously to retain it, with hot searchable retention costing far more than cold archive. The reason these bills run away is that the volume of logs and metrics scales with traffic, with the number of services, and with every debug line a developer left on, while retention defaults are generous and nobody is incentivized to trim. Reducing the cost means attacking each stage on its own terms: ingest less, index only what you search, and retire data to cheaper tiers on a schedule.
This article is part of our complete guide to cloud storage and data cost optimization. It complements data retention policies that save money, which sets out the retention discipline that this article applies specifically to observability data.
Logs and telemetry are billed at ingestion, indexing and retention. Cut volume at the source, index only what you actually search, and tier the rest to cheap archive before it ever reaches the premium store.
Cut ingestion volume at the source
The cheapest log is the one you never send. Before any data reaches the observability platform, filter and sample it at the agent or collector. Drop the debug and trace lines that were left on after an incident, deduplicate repetitive health-check and heartbeat noise, and sample high-volume, low-value streams such as access logs rather than ingesting every line. For metrics, reduce cardinality: every unique combination of label values is a separate time series to store, and a single high-cardinality label such as a raw user ID or request ID can multiply the number of series by orders of magnitude. Cutting volume at the source reduces the bill at every downstream stage at once, which makes it the highest-leverage change available. It is the same source-side principle as tackling logging and observability sprawl.
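As a rough illustration of the source-side filtering described above, here is a minimal Python sketch of the decision an agent or collector might make per record. The record fields, the stream names, the 10% sample rate and the helper names `should_forward` and `series_count` are all assumptions made for the example, not any particular agent's API. Real collectors usually deduplicate within a time window; this sketch simply drops every repeat.

```python
import random
from math import prod

# Hypothetical settings for illustration only.
DROP_LEVELS = {"DEBUG", "TRACE"}             # levels never forwarded
DEDUP_STREAMS = {"healthcheck", "heartbeat"}  # streams where repeats are noise
SAMPLE_RATES = {"access": 0.1}               # keep roughly 10% of access logs


def should_forward(record, seen, rng=random):
    """Return True if this log record should be sent to the platform.

    `record` is assumed to be a dict with "level", "stream" and "message".
    `seen` is a set the caller keeps between calls for deduplication.
    """
    if record["level"] in DROP_LEVELS:
        return False
    if record["stream"] in DEDUP_STREAMS:
        key = (record["stream"], record["message"])
        if key in seen:
            return False  # repeat of a message already forwarded
        seen.add(key)
    # Streams without a configured rate are forwarded in full (rate 1.0).
    rate = SAMPLE_RATES.get(record["stream"], 1.0)
    return rng.random() < rate


def series_count(label_cardinalities):
    """Upper bound on time series for one metric: the product of the
    number of distinct values each label can take."""
    return prod(label_cardinalities.values())
```

With these assumptions, a metric labelled by 4 regions and 5 status codes has at most 20 series; adding a label with 100,000 distinct user IDs raises that bound to 2,000,000, which is the multiplication effect the paragraph warns about.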
Index only what you search
Indexing is what makes logs instantly searchable, and on many platforms it is the dominant cost, charged separately from raw storage. The lever is to index selectively: keep full indexing for the high-value logs you genuinely query during incidents, such as application errors and security events, and route the rest to a cheaper non-indexed or archive tier that you can still search slowly when you need to. Most platforms now support this split, sometimes called flex or archive logging, where data lands cheaply and is only rehydrated for search on demand. The question to ask of every log stream is not whether it might be useful someday but whether you actually search it, and how fast you need the answer when you do.
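The two questions at the end of that paragraph can be written down as a small routing rule. The sketch below is illustrative only: the `StreamProfile` fields, the tier names and the example streams are assumptions, and a real platform would express this as pipeline or routing configuration rather than Python.

```python
from dataclasses import dataclass


@dataclass
class StreamProfile:
    name: str
    searches_per_month: int   # how often the stream is actually queried
    needs_fast_answer: bool   # True if results are needed during an incident


def choose_tier(profile):
    """Send a stream to the indexed tier only if it is both searched
    and needed quickly; everything else goes to the cheap archive tier."""
    if profile.searches_per_month > 0 and profile.needs_fast_answer:
        return "indexed"
    return "archive"


# Hypothetical streams, for illustration.
streams = [
    StreamProfile("app-errors", searches_per_month=120, needs_fast_answer=True),
    StreamProfile("access-log", searches_per_month=1, needs_fast_answer=False),
    StreamProfile("debug-batch", searches_per_month=0, needs_fast_answer=False),
]
routing = {s.name: choose_tier(s) for s in streams}
```

The point of making the rule explicit is that "might be useful someday" never appears in it; a stream earns indexing only through observed query behaviour.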
| Stage | What drives the cost | The lever |
|---|---|---|
| Ingestion | Gigabytes and time series sent | Filter, sample, reduce metric cardinality |
| Indexing | Data made instantly searchable | Index only high-value streams; archive the rest |
| Hot retention | Searchable days kept | Short hot window, then tier down |
| Archive retention | Long-term cold copies | Cheap object storage with expiry |
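To see how the four rows combine into one bill, a simple steady-state cost model can help. The sketch below is a simplification under stated assumptions: the `Rates` fields are placeholders you would fill in from your own vendor's price list (no real prices are implied), hot retention is assumed to apply only to the indexed fraction, and archive is assumed to hold a copy of everything ingested.

```python
from dataclasses import dataclass


@dataclass
class Rates:
    # Placeholder unit prices; substitute your platform's actual rates.
    ingest_per_gb: float
    index_per_gb: float
    hot_per_gb_day: float
    archive_per_gb_day: float


def monthly_cost(gb_per_day, indexed_fraction, hot_days, archive_days, rates, days=30):
    """Estimate one month's bill per stage at steady state."""
    ingested = gb_per_day * days
    indexed_per_day = gb_per_day * indexed_fraction
    cost = {
        "ingestion": ingested * rates.ingest_per_gb,
        "indexing": ingested * indexed_fraction * rates.index_per_gb,
        # At steady state the hot store holds `hot_days` of indexed data.
        "hot_retention": indexed_per_day * hot_days * rates.hot_per_gb_day * days,
        # The archive holds `archive_days` of all ingested data.
        "archive_retention": gb_per_day * archive_days * rates.archive_per_gb_day * days,
    }
    cost["total"] = sum(cost.values())
    return cost
```

Running the model with your own figures shows which lever matters most: lowering `gb_per_day` reduces every line, while lowering `indexed_fraction` or `hot_days` reduces only the indexing and hot retention lines.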
Is observability the line nobody will cut?
Our cloud cost audit profiles your highest-volume log and metric streams, tunes ingestion and retention to what you actually query, and proves the saving against a clean baseline on AWS, Azure, GCP and OCI. On the performance model, you pay only from realized savings. No savings, no fee.
Book a cloud cost audit →
Tier retention to match how data ages
Observability data loses value quickly. The logs from the last few days are queried constantly during incidents; logs from last quarter are queried almost never, and usually only for compliance or a postmortem. So retention should be tiered: a short hot window of fully searchable data, a longer warm window in a cheaper queryable tier, and a long cold archive in object storage for anything you must keep for compliance. Set these windows per stream rather than applying one long retention to everything, because a ninety-day searchable retention on a high-volume access log is almost always money spent on data nobody will ever open. This is the same staged approach as tiering data automatically by access pattern, applied to telemetry.
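A per-stream retention policy of this kind can be sketched as a lookup from stream and age to tier. The stream names and day counts below are hypothetical examples, not recommendations, and the windows are expressed as cumulative age boundaries (data moves to warm once it is older than `hot_days`, and so on).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RetentionPolicy:
    hot_days: int    # fully searchable until this age
    warm_days: int   # cheaper queryable tier until this age
    cold_days: int   # object-storage archive until this age, then expired


# Hypothetical per-stream windows, for illustration only.
POLICIES = {
    "app-errors": RetentionPolicy(hot_days=14, warm_days=90, cold_days=365),
    "access-log": RetentionPolicy(hot_days=3, warm_days=14, cold_days=30),
}
DEFAULT_POLICY = RetentionPolicy(hot_days=3, warm_days=14, cold_days=30)


def tier_for(stream, age_days):
    """Return the tier a record of the given age belongs in."""
    policy = POLICIES.get(stream, DEFAULT_POLICY)
    if age_days < policy.hot_days:
        return "hot"
    if age_days < policy.warm_days:
        return "warm"
    if age_days < policy.cold_days:
        return "cold"
    return "expired"
```

Note that the default is deliberately short: a stream nobody has classified gets the cheap treatment until someone argues for more.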
The Cloud Storage and Egress Cost Playbook includes the log-tiering worksheet and the metric-cardinality checklist we use to cut observability spend without losing incident visibility.
Govern it so it stays cut
Logging cost creeps back because every new service ships with its own logging and every incident adds a debug stream that never gets turned off. So the saving has to be governed, not just made once. Put a per-team or per-service budget on observability spend so the cost has an owner, route new log sources through a default-cheap configuration rather than a default-verbose one, and review the highest-volume streams on a schedule the way you would any other recurring waste. This is the continuous discipline described in building a continuous waste detection process. Without it, telemetry cost ratchets up with every deploy, because adding a log is easy and removing one is nobody's job.
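The scheduled review can be as simple as two checks run against billing and volume exports. The following sketch assumes you can already produce spend per team and volume per stream as dictionaries; the function names and the data shapes are illustrative.

```python
def budget_findings(spend_by_team, budget_by_team):
    """List teams that are over budget or have no budget (and so no owner)."""
    findings = []
    for team, amount in sorted(spend_by_team.items()):
        budget = budget_by_team.get(team)
        if budget is None:
            findings.append((team, "no budget"))
        elif amount > budget:
            findings.append((team, "over budget"))
    return findings


def top_streams(volume_gb_by_stream, n=5):
    """Return the n highest-volume streams, largest first."""
    return sorted(volume_gb_by_stream, key=volume_gb_by_stream.get, reverse=True)[:n]
```

Treating a missing budget as a finding in its own right matters, because spend without an owner is exactly the spend that nobody trims.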
Watch the per-feature observability traps
Beyond raw logs and metrics, modern observability platforms bill for several features that each have their own runaway mode. Custom metrics are commonly charged per active time series, so a well-meaning dashboard that adds a high-cardinality custom metric can multiply cost overnight; audit the custom-metric inventory and drop the ones nobody graphs. Application tracing is often priced per ingested or indexed span, and full-fidelity tracing on a high-throughput service is expensive, so sample traces rather than capturing every request, keeping full capture only for errors and a representative baseline. Synthetic monitors and uptime checks are typically billed per check per location, and teams accumulate dozens pointed at endpoints that no longer matter. And per-host or per-container agent pricing scales with a fleet that autoscaling can inflate, so it is worth confirming you are not paying agent fees for ephemeral nodes that live for minutes. Each of these is the observability version of the forgotten resources in auditing your cloud storage footprint: small individually, large in aggregate, and invisible until someone counts them.
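The trace-sampling rule mentioned above (keep every error, sample a baseline of everything else) might look like the sketch below. It hashes the trace ID so that the decision is deterministic and every span of one trace gets the same answer; the 5% default rate and the function name `keep_trace` are assumptions for illustration, and production tracing libraries ship their own samplers that you would normally configure instead.

```python
import zlib


def keep_trace(trace_id, is_error, baseline_rate=0.05):
    """Keep all error traces, plus a deterministic sample of the rest."""
    if is_error:
        return True
    # Map the trace ID to a stable bucket in [0, 10000) and keep the
    # lowest buckets, so the same ID always yields the same decision.
    bucket = zlib.crc32(trace_id.encode("utf-8")) % 10_000
    return bucket < round(baseline_rate * 10_000)
```

A deterministic hash is used here rather than a random draw so that services handling different spans of the same request agree on whether to keep it.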
The short version
Reduce logging and telemetry storage costs by treating ingestion, indexing and retention as three separate bills. Filter and sample at the source and cut metric cardinality to ingest less; index only the streams you actually search and archive the rest cheaply; tier retention so hot searchable data is a short window and the long tail lives in cheap archive with expiry; and govern the whole thing with per-team budgets so it does not creep back. Keep the visibility that matters during incidents and stop paying premium rates for data nobody queries. When you want the highest-volume streams found and observability spend proven down across the estate, that is part of what our rightsizing and waste elimination service delivers.