Streaming and messaging costs are easy to start and hard to see. A team stands up a Kafka cluster or a Kinesis stream, sets retention generously, provisions throughput for the peak it might one day hit, and the bill becomes a fixed cost that nobody revisits as the traffic pattern settles. Because these systems are billed on a mix of provisioned capacity, data volume, retention and inter-zone transfer, the waste is spread across several dimensions at once, which is exactly why it survives a casual look at the invoice. The method below works through those dimensions in order, from the throughput you provision to the bytes you move between zones, so that each cut is measured against real usage rather than the high-water mark the system was sized for.
This article is part of our complete guide to cloud storage and data cost optimization, the pillar page for this cluster. It sits alongside reducing logging and telemetry storage costs, since logging pipelines often ride the same streaming backbone and share the same retention and volume levers.
Streaming and messaging systems are billed on capacity, volume, retention and cross-zone transfer at once. Optimize each dimension against measured throughput, not the peak the system was provisioned for.
Step 1: Match provisioned throughput to real usage
The first and usually largest source of streaming and messaging waste is provisioned capacity that exceeds real throughput. Kinesis data streams in provisioned mode are billed per shard, Kafka clusters are sized to broker count and instance type, and a system provisioned for a forecast peak that never arrived pays for that headroom every hour. Pull the actual ingest and consume rates over a representative window, compare them against the provisioned capacity, and right-size the shard count or broker fleet to the observed load plus a sensible burst margin rather than the original guess. Where the traffic is spiky or unpredictable, on-demand or autoscaling capacity modes often beat a fixed provision, because they bill for what flows rather than for the ceiling. This is the same rightsizing logic covered in finding idle cloud resources across providers, applied to throughput instead of compute.
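As a minimal sketch of that comparison for a Kinesis stream in provisioned mode, the function below turns observed peaks into a shard count. The per-shard limits are the published ones (1 MB/s or 1,000 records/s in, 2 MB/s out for shared-throughput consumers); the 30% burst margin, the sample traffic figures and the function name are illustrative assumptions, not recommendations.

```python
import math

# Published per-shard limits for Kinesis Data Streams (provisioned mode).
SHARD_WRITE_MB_PER_S = 1.0
SHARD_WRITE_RECORDS_PER_S = 1000
SHARD_READ_MB_PER_S = 2.0  # shared across consumers without enhanced fan-out


def required_shards(peak_in_mb_s, peak_in_records_s, peak_out_mb_s, burst_margin=0.3):
    """Smallest shard count that covers the observed peaks plus a burst margin.

    burst_margin is an assumed safety factor; tune it to your own traffic.
    """
    scale = 1.0 + burst_margin
    by_write_bytes = peak_in_mb_s * scale / SHARD_WRITE_MB_PER_S
    by_write_records = peak_in_records_s * scale / SHARD_WRITE_RECORDS_PER_S
    by_read_bytes = peak_out_mb_s * scale / SHARD_READ_MB_PER_S
    # The binding constraint is whichever limit needs the most shards.
    return max(1, math.ceil(max(by_write_bytes, by_write_records, by_read_bytes)))


# Hypothetical stream: provisioned at 64 shards, measured peaks far below that.
provisioned = 64
needed = required_shards(peak_in_mb_s=6.0, peak_in_records_s=9000, peak_out_mb_s=12.0)
print(f"needed={needed}, idle shards={provisioned - needed}")
```

In this made-up case the record rate, not the byte rate, is the binding limit, which is a common reason a byte-based estimate alone undersizes a stream of small messages.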
Step 2: Cut retention to what consumers actually need
Retention is the dimension teams set generously and forget. A stream that keeps seven days of data when consumers replay at most a few hours pays to store and replicate days of events nobody reads. Look at how far back consumers actually seek, the longest realistic replay or reprocessing window, and set retention to cover that plus a margin rather than to a round number chosen for comfort. On Kafka this is topic-level retention and log segment settings; on Kinesis it is the retention period per stream; on Pub/Sub it is message retention duration. Trimming retention reduces both the storage the system holds and, where the platform replicates retained data, the volume it keeps in sync. The same retention discipline applies across the data estate, as set out in data retention policies that save money.
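The retention arithmetic can be sketched in a few lines. This example assumes you already know the longest lookback any consumer has actually used; the 50% margin, the replication factor of 3 and the 24-hour floor (the Kinesis minimum; other platforms differ) are assumptions to adjust, and the function names are hypothetical.

```python
import math


def retention_hours(max_lookback_hours, margin=0.5, floor_hours=24):
    """Retention that covers the longest observed replay plus a margin.

    floor_hours reflects a platform minimum (24 h on Kinesis); set it to
    whatever minimum your platform or policy imposes.
    """
    return max(floor_hours, math.ceil(max_lookback_hours * (1 + margin)))


def retained_gb(ingest_mb_s, retention_h, replication_factor=3):
    """Approximate data held at steady state, counting every replica."""
    return ingest_mb_s * 3600 * retention_h * replication_factor / 1000


# Hypothetical topic: 5 MB/s ingest, 7 days kept, consumers replay 6 hours at most.
current = retained_gb(5, 168)
target_h = retention_hours(6)
proposed = retained_gb(5, target_h)
print(f"retention {168}h -> {target_h}h, held data {current:.0f} GB -> {proposed:.0f} GB")
```

The sketch ignores compression and compaction, so treat the output as an upper bound on stored volume rather than a bill estimate.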
| Cost dimension | Where it hides | The lever |
|---|---|---|
| Provisioned throughput | Shards or brokers sized for a peak that never came | Right-size to measured load; use on-demand for spiky traffic |
| Retention | Days kept when consumers replay hours | Set retention to the real replay window |
| Partition and shard count | Over-partitioned topics carrying overhead per partition | Consolidate to the parallelism consumers use |
| Cross-zone transfer | Producers and consumers in different zones from brokers | Co-locate, or use zone-aware routing |
| Idle topics and streams | Streams kept alive after the consumer was retired | Find and delete the orphans |
Is your streaming backbone billed for capacity you never use?
Our cloud cost audit profiles every stream and queue against real throughput, right-sizes capacity and retention, and proves the saving against a clean baseline on AWS, Azure, GCP and OCI. On the performance model, you pay only from realized savings. No savings, no fee.
Book a cloud cost audit →
Step 3: Right-size partitioning, not just capacity
Partition and shard counts carry their own cost beyond the throughput they enable. Each partition adds replication, metadata and broker-side overhead, so a topic over-partitioned for a parallelism consumers never reach pays for coordination it does not use. Set partition count to the consumer parallelism the workload genuinely needs, which is usually far lower than the maximum a team provisions out of caution, and remember that you can scale partitions up later far more easily than you can reclaim the overhead of too many. On the consumer side, batching reads and acknowledgements reduces the request volume that some messaging services bill per operation, so SQS and Pub/Sub costs can fall when consumers pull in batches rather than one message at a time. SNS is push-based, so the equivalent saving there comes from publishing in batches. These are the same over-provisioning patterns described in over-provisioning and how to stop it, expressed in partitions and request counts.
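To make the batching effect concrete, here is a rough sketch of monthly request counts with and without batching. It assumes a queue billed per request, counts only receive calls (deletes and sends are billed too), and ignores payload-size chunking; the batch size of 10 matches the SQS batch limit, while the price per million requests is a placeholder to replace with the current published rate.

```python
import math

SECONDS_PER_MONTH = 30 * 24 * 3600


def monthly_requests(messages_per_s, batch_size=1, seconds=SECONDS_PER_MONTH):
    """Receive calls per month if every call returns a full batch."""
    return math.ceil(messages_per_s / batch_size * seconds)


def request_cost(requests, price_per_million):
    """Cost at a flat per-request price (placeholder value, not a quoted rate)."""
    return requests / 1_000_000 * price_per_million


unbatched = monthly_requests(500)              # one message per call
batched = monthly_requests(500, batch_size=10)  # ten messages per call
print(f"requests per month: {unbatched:,} -> {batched:,}")
```

Real consumers rarely fill every batch, so the achieved reduction sits somewhere between none and the full factor of ten; measuring average messages per receive call shows where a given consumer lands.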
Step 4: Keep producers and consumers close to the brokers
Cross-zone transfer is the streaming cost most often missed because it does not appear on the streaming line at all; it appears as data transfer. When producers, consumers and brokers sit in different availability zones, every message crosses a zone boundary and is billed as inter-zone traffic, and a high-throughput stream can run up a transfer bill that rivals its capacity cost. Where the platform supports it, enable zone-aware or rack-aware routing so consumers prefer a replica in their own zone, and co-locate the heaviest producers and consumers with the brokers they talk to most. The boundary economics are the same as those in reducing inter-service and inter-region traffic, and on a busy event backbone the inter-zone line is worth checking before you assume the capacity is the problem.
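A back-of-the-envelope model shows how quickly this adds up. The sketch below assumes clients and partition leaders are spread evenly across zones, that each replica lives in a different zone, that every consumer group reads each byte once, and that zone-aware consumers are always served by a local replica. All of these are simplifying assumptions, and the function name is hypothetical.

```python
SECONDS_PER_MONTH = 30 * 24 * 3600


def cross_zone_gb_per_month(ingest_mb_s, zones=3, replication_factor=3,
                            consumer_groups=1, zone_aware_consumers=False,
                            seconds=SECONDS_PER_MONTH):
    """Estimate monthly GB crossing zone boundaries for one topic or cluster."""
    produced_gb = ingest_mb_s * seconds / 1000
    # A producer reaches a leader in another zone (zones - 1) / zones of the time.
    producer_gb = produced_gb * (zones - 1) / zones
    # Each follower copy is assumed to sit in a different zone from the leader.
    replication_gb = produced_gb * (replication_factor - 1)
    # Without zone-aware routing, consumers fetch from leaders wherever they are.
    if zone_aware_consumers:
        consumer_gb = 0.0
    else:
        consumer_gb = produced_gb * consumer_groups * (zones - 1) / zones
    return producer_gb + replication_gb + consumer_gb


before = cross_zone_gb_per_month(10, consumer_groups=2)
after = cross_zone_gb_per_month(10, consumer_groups=2, zone_aware_consumers=True)
print(f"cross-zone GB/month: {before:,.0f} -> {after:,.0f}")
```

Multiply the result by your provider's current inter-zone rate to compare it with the capacity line. Note that replication traffic remains after consumer routing is fixed, so it sets the floor for this cost unless the replication design itself changes.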
The Cloud Storage and Egress Cost Playbook includes the streaming throughput worksheet and the cross-zone traffic map we use to size event systems to real load before the bill compounds.
Step 5: Find and retire idle streams and topics
The last step is the cleanup. Streaming and messaging estates accumulate orphans: topics created for a feature that shipped differently, streams left running after the consumer was retired, dead-letter queues that fill but are never drained. Each one holds capacity, retention and sometimes replication that serves no live consumer. Inventory every topic, stream and queue, map each to a current consumer, and flag the ones with no active reader for retirement. This is the streaming-specific case of the broader hunt in zombie infrastructure: finding what everyone forgot, and on a mature platform the orphan list is often longer than anyone expects. Verify current per-shard, per-request and retention pricing for each provider in its documentation as of May 2026 before sizing the saving, since streaming pricing models and capacity modes change.
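The inventory pass in this step can be sketched as a simple filter. The example assumes you have already exported, from whatever monitoring or admin tooling you use, a record per topic, stream or queue with its number of attached consumer groups and the time of its last read; the `StreamRecord` shape, the sample names and the 30-day idle threshold are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class StreamRecord:
    name: str
    consumer_groups: int               # consumers currently attached
    last_read_at: Optional[datetime]   # None if never read


def find_orphans(inventory, now, idle_after=timedelta(days=30)):
    """Names of resources with no attached consumer or no recent read."""
    cutoff = now - idle_after
    return [
        s.name
        for s in inventory
        if s.consumer_groups == 0
        or s.last_read_at is None
        or s.last_read_at < cutoff
    ]


now = datetime(2026, 5, 1, tzinfo=timezone.utc)
inventory = [
    StreamRecord("orders-events", 3, now - timedelta(minutes=2)),
    StreamRecord("feature-x-clicks", 0, now - timedelta(days=200)),
    StreamRecord("payments-dlq", 1, None),
    StreamRecord("legacy-audit", 1, now - timedelta(days=45)),
]
print(find_orphans(inventory, now))
```

The output is a candidate list, not a deletion list: confirm each entry with its owner before retiring it, since some streams are read only during rare reprocessing or incident recovery.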
The short version
Streaming and messaging costs hide across five dimensions at once. Match provisioned throughput to measured load and prefer on-demand for spiky traffic; cut retention to the real replay window; right-size partition counts to the parallelism consumers use and batch reads to reduce per-request charges; co-locate producers and consumers with brokers so messages stop crossing zone boundaries; and find and retire the idle topics, streams and queues that no live consumer reads. Work the dimensions in order, measure each cut against real usage, and verify current pricing before sizing the saving. When you want every stream and queue profiled against actual throughput and the waste proven down, that is part of what our rightsizing and waste elimination service delivers.