Azure Cosmos DB Cost Control: RU/s and Autoscale

Azure Cosmos DB cost control starts with understanding request units per second, the RU/s that meter every read, write, and query. Standard provisioned throughput bills for the capacity you reserve whether you use it or not, autoscale flexes capacity between a floor and a ceiling for a per-unit premium, and serverless charges only for the request units you consume. Picking the wrong model for your traffic, then over-provisioning on top of it, is the most common reason a Cosmos DB bill runs higher than it should.

This article is part of our Azure cluster. For the full account-wide picture, start with the complete guide to Azure cost optimization, the pillar this piece links up to. Sizing Cosmos DB throughput is a Cut-step and Lock-step decision in our See, Cut, Lock, Run method: match capacity to real demand, then put guardrails on it so it does not creep back up.

The request unit is the currency

One RU is the cost of a point read of a 1 KB item. Every operation has an RU charge, and large queries, cross-partition fan-outs, and heavy indexing cost many more RUs. Control the RU charge per operation and choose the right throughput model, and you control the bill.

How request units drive the bill

Cosmos DB does not bill for CPU or memory directly. It bills for throughput, expressed in request units per second, and for stored data. Each operation consumes RU based on item size, the complexity of the query, the indexing it touches, and whether it crosses partitions. A point read by id and partition key is cheap; a cross-partition query that scans and sorts is expensive. The two ways to cut the bill are therefore to lower the RU cost of your operations and to stop paying for RU/s you never consume. The first is an application and schema problem, the second is a provisioning model problem, and a good review tackles both.
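As a rough illustration of how per-operation charges add up to a throughput requirement, the sketch below multiplies operations per second by an RU charge per operation. The RU figures and the workload mix are hypothetical placeholders, not measured values; in practice you would read the request charge your SDK reports for each call. It also assumes standard throughput is set in increments of 100 RU/s.

```python
import math

# Hypothetical RU charges per operation type (assumptions for illustration).
# Replace with the request charges observed for your own operations.
ASSUMED_RU_PER_OP = {
    "point_read": 1.0,
    "write": 6.0,
    "cross_partition_query": 40.0,
}


def required_rus(ops_per_second: dict[str, float],
                 ru_per_op: dict[str, float] = ASSUMED_RU_PER_OP) -> int:
    """Sum (ops/sec x RU/op) and round up to the next 100 RU/s."""
    total = sum(rate * ru_per_op[op] for op, rate in ops_per_second.items())
    return math.ceil(total / 100) * 100


# Hypothetical workload: mostly point reads, a few expensive queries.
workload = {"point_read": 500, "write": 50, "cross_partition_query": 5}
print(required_rus(workload))  # 1000
```

In this made-up mix, the five cross-partition queries per second cost as much as 200 point reads, which is why lowering the RU cost per operation comes before resizing throughput.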

Provisioned, autoscale, and serverless

There are three throughput models, and the right one depends entirely on your traffic shape. Standard provisioned throughput reserves a fixed RU/s and bills for it continuously, which is most economical for steady, predictable, high-utilization workloads. Autoscale scales automatically between a maximum RU/s that you set and a floor of one tenth of that maximum, and it bills per hour for the highest RU/s reached that hour at a per-unit premium over standard. That pays off for spiky or unpredictable traffic that would otherwise force you to provision for the peak. Serverless charges purely for consumed request units with no reserved floor, which suits intermittent, low-volume, or development workloads where capacity sits unused much of the time. Choosing among them is the same provisioned-versus-consumption trade-off you weigh elsewhere on Azure, and it is related to the commitment thinking covered in understanding Azure capacity reservations.

Model	Bills for	Best for
Standard provisioned	Reserved RU/s, always	Steady, high-utilization traffic
Autoscale	Peak RU/s per hour, at a premium	Spiky or unpredictable traffic
Serverless	Consumed RU only	Intermittent, low-volume, dev
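The hourly billing behavior of autoscale can be sketched in a few lines. This is a simplified model, assuming each hour is billed at the highest RU/s reached, never below one tenth of the configured maximum and never above it. The function name and the hourly peaks are hypothetical, and the real meter may round differently.

```python
def autoscale_billed_rus(hourly_peak_rus: list[float], max_rus: int) -> list[float]:
    """Billed RU/s per hour: the peak reached, clamped to [max/10, max]."""
    floor = max_rus / 10  # assumed autoscale floor: 10% of the maximum
    return [float(min(max(peak, floor), max_rus)) for peak in hourly_peak_rus]


# Hypothetical peaks over four hours for a container with a 4000 RU/s maximum.
peaks = [200, 300, 4000, 900]
billed = autoscale_billed_rus(peaks, max_rus=4000)

print(billed)        # [400.0, 400.0, 4000.0, 900.0]
print(sum(billed))   # 5700.0 RU/s-hours billed under autoscale
print(4000 * len(peaks))  # 16000 RU/s-hours if provisioned for the peak
```

The two quiet hours are billed at the 400 RU/s floor rather than at their actual peaks, so the floor still matters when the maximum is set much higher than typical demand.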

When autoscale saves and when it costs more

Autoscale is not automatically cheaper. Because it bills at a premium per RU/s, a workload that runs at consistently high utilization pays more on autoscale than it would on standard provisioned throughput sized to that steady load. The break-even rule is utilization: if your average usage sits well below your peak, autoscale wins because you stop paying for the peak around the clock; if your usage is flat and high, standard wins. The practical mistake is leaving everything on autoscale by default, including steady production workloads that would be cheaper on a right-sized fixed allocation. Profile the traffic for each container, then assign the model that matches its shape rather than applying one model everywhere.
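The break-even logic can be written down directly. The sketch below compares billed RU/s-hours under each model, assuming an autoscale premium of 1.5 times the standard per-unit rate. That multiplier is an assumption for illustration, so check it against current pricing. The function names are hypothetical.

```python
# Assumed autoscale price multiplier over standard; verify before relying on it.
ASSUMED_AUTOSCALE_PREMIUM = 1.5


def break_even_utilization(premium: float = ASSUMED_AUTOSCALE_PREMIUM) -> float:
    """Average billed fraction of peak below which autoscale is cheaper."""
    return 1 / premium


def autoscale_is_cheaper(billed_rus_per_hour: list[float],
                         standard_rus: int,
                         premium: float = ASSUMED_AUTOSCALE_PREMIUM) -> bool:
    """Compare premium-weighted autoscale RU/s-hours with a fixed allocation."""
    autoscale_units = sum(billed_rus_per_hour) * premium
    standard_units = standard_rus * len(billed_rus_per_hour)
    return autoscale_units < standard_units


spiky = [400.0, 400.0, 4000.0, 900.0]      # hypothetical billed RU/s per hour
flat = [4000.0, 4000.0, 4000.0, 4000.0]

print(autoscale_is_cheaper(spiky, standard_rus=4000))  # True
print(autoscale_is_cheaper(flat, standard_rus=4000))   # False
```

Under the assumed 1.5 multiplier, autoscale wins only while the average billed RU/s stays below roughly two thirds of the fixed allocation you would otherwise provision.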

Paying for RU/s your Cosmos DB containers never touch?

Our Azure cost audit profiles throughput per container, moves steady workloads off autoscale premiums and spiky ones onto it, trims indexing waste, and applies reserved capacity where it pays. On the performance model, you pay only from realized savings. No savings, no fee.


Cut the RU cost per operation

Even with the right throughput model, inefficient operations inflate the bill. The biggest lever is the partition key: a poorly chosen key creates hot partitions that throttle under load and force you to over-provision RU/s to compensate. Choose a key that spreads reads and writes evenly. The second lever is the indexing policy, which by default indexes every property; trimming it to index only the paths you actually filter or sort on can cut write RU charges substantially. Set a time-to-live to expire data that ages out rather than paying to store and index it forever, avoid cross-partition queries where a targeted read would do, and tune query patterns to read by id and partition key wherever possible. These are the same disciplined data-cost habits that govern Azure SQL Database cost optimization.
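A trimmed indexing policy might look like the sketch below, which builds an opt-in policy that excludes every path by default and includes only the properties you filter or sort on. The helper name and the property paths (/customerId, /orderDate) are hypothetical. The dictionary follows the indexing policy JSON shape, but validate it against your own container before applying it.

```python
def build_indexing_policy(indexed_paths: list[str]) -> dict:
    """Build an opt-in indexing policy: exclude everything, include listed paths."""
    return {
        "indexingMode": "consistent",
        "automatic": True,
        # "/?" targets the scalar value stored at each property path.
        "includedPaths": [{"path": f"{path}/?"} for path in indexed_paths],
        # Excluding the root means properties are not indexed unless listed above.
        "excludedPaths": [{"path": "/*"}],
    }


# Hypothetical properties that queries filter or sort on.
policy = build_indexing_policy(["/customerId", "/orderDate"])
print(policy["includedPaths"])
```

Any query that filters on a property left out of the included paths will fall back to a scan and cost more RU, so review the query patterns before trimming.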

Lock in the saving

Once throughput is right-sized, keep it there. Reserved capacity for Cosmos DB can discount steady provisioned throughput in exchange for a one- or three-year commitment, which is worth applying to the baseline you are confident will persist. Set budgets and alerts so that a developer doubling a container's RU/s shows up immediately rather than at month end, which is the discipline covered in Azure budgets and cost alerts. Cosmos DB pricing, the autoscale ratio, and reserved-capacity terms change over time, so verify the current models and rates against Microsoft's live Azure documentation before you commit a configuration.
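Alongside budgets and alerts, a simple drift check against an agreed baseline can catch throughput creep. The sketch below is a standalone illustration, not an Azure API: the function name, the container names, and the 20 percent tolerance are all hypothetical, and how you collect the current RU/s per container is left out.

```python
from typing import Optional


def throughput_drift(baseline: dict[str, int],
                     current: dict[str, int],
                     tolerance: float = 0.2) -> dict[str, tuple[Optional[int], int]]:
    """Flag containers whose RU/s exceeds the baseline by more than the tolerance.

    Containers with no baseline entry are flagged too, with a baseline of None.
    """
    flagged = {}
    for name, now in current.items():
        base = baseline.get(name)
        if base is None or now > base * (1 + tolerance):
            flagged[name] = (base, now)
    return flagged


# Hypothetical baseline and current provisioned RU/s per container.
baseline = {"orders": 1000, "carts": 400}
current = {"orders": 2000, "carts": 400, "sessions": 400}

print(throughput_drift(baseline, current))
# {'orders': (1000, 2000), 'sessions': (None, 400)}
```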

Go deeper · free guide

The Azure Cost Optimization Field Guide includes our throughput-model decision tree and the partition-key review we run on Cosmos DB engagements. It is the downloadable companion to this article.

The short version

Azure Cosmos DB cost control comes down to matching the throughput model to each container's traffic shape (standard for steady, autoscale for spiky, serverless for intermittent) and then cutting the RU cost per operation through a good partition key, a trimmed indexing policy, and TTL. Apply reserved capacity to the stable baseline, put budgets and alerts around it, and verify current pricing before committing. When you want every container sized to real demand instead of a default, that is exactly what our Azure cost optimization service delivers.