How-to · Google Cloud · Updated May 2026

Cloud Run and Cloud Functions Cost Optimization

Cloud Run and Cloud Functions cost optimization is about paying only for the work you do. Get CPU and memory right, let services scale to zero, pick the correct billing mode, and tune concurrency, and serverless compute stays as cheap as it promises.

Serverless feels free until it is not. Cloud Run and Cloud Functions bill for the CPU and memory allocated to your code while it runs, so over-allocated memory, services pinned to always-on, the wrong billing mode, and low concurrency quietly inflate the bill. The good news is that the levers are simple and low-risk: most of the saving comes from right-sizing and from letting the platform do what it is designed to do, which is scale to zero when idle.

This article links up to our complete guide to Google Cloud cost optimization, the pillar for this cluster. Serverless containers are one end of the spectrum; the other is managed Kubernetes, covered in GKE cost optimization, and the right choice between them is a cost decision in itself.

Right-size CPU and memory to real usage

Both Cloud Run and Cloud Functions bill on the CPU and memory you allocate, so the most common waste is allocating more than the workload needs. Measure actual memory and CPU usage from Cloud Monitoring and set the allocation to fit with a modest safety margin, rather than defaulting to a large size. Because you pay per allocated unit for the duration of execution, trimming an over-provisioned memory setting cuts the bill on every single invocation. Faster, leaner code also helps directly, since you pay for execution time.
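As a rough illustration of the right-sizing step, the sketch below turns a measured peak into a suggested allocation and compares the memory cost of one invocation before and after. The 20% margin, the 128 MiB rounding step, and the rate value are assumptions for illustration, not Google Cloud figures; substitute your own measurements and the current price list.

```python
import math

def recommend_memory_mib(observed_peak_mib: float,
                         margin: float = 0.2,
                         step_mib: int = 128) -> int:
    """Peak usage plus a safety margin, rounded up to the next step.

    The margin and step are illustrative assumptions, not platform rules.
    """
    target = observed_peak_mib * (1 + margin)
    return int(math.ceil(target / step_mib) * step_mib)

def memory_cost_per_invocation(memory_mib: int,
                               duration_s: float,
                               rate_per_gib_second: float) -> float:
    """Memory charge for one invocation: allocated GiB x seconds x rate."""
    return (memory_mib / 1024) * duration_s * rate_per_gib_second

# Hypothetical numbers: a service allocated 1024 MiB that peaks at 300 MiB.
PLACEHOLDER_RATE = 0.0000025  # placeholder price per GiB-second, not a real quote
suggested = recommend_memory_mib(300)
before = memory_cost_per_invocation(1024, 0.5, PLACEHOLDER_RATE)
after = memory_cost_per_invocation(suggested, 0.5, PLACEHOLDER_RATE)
print(f"suggested allocation: {suggested} MiB")
print(f"memory cost per invocation falls to {after / before:.1%} of the original")
```

The ratio matters more than the absolute figure: the same proportional saving applies to every invocation, whatever the actual rate is.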

Let services scale to zero

The defining benefit of Cloud Run and Cloud Functions is that they scale to zero: when no requests are arriving, no instances run and you pay nothing for compute. The way teams break this is by setting a minimum number of instances to avoid cold starts. Minimum instances keep capacity warm around the clock and bill accordingly, so use them only where cold-start latency genuinely hurts the user, and set the minimum as low as the experience allows. For background, batch and internal services, let them scale to zero and accept the occasional cold start.
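To make the always-warm cost concrete, here is a minimal sketch of what a minimum-instance setting costs per month while idle. The rates are placeholders and the function names are hypothetical; the point is only that the charge scales linearly with the minimum and runs around the clock.

```python
HOURS_PER_MONTH = 730  # average hours in a calendar month

def monthly_warm_cost(min_instances: int,
                      vcpu: float,
                      memory_gib: float,
                      idle_vcpu_rate_per_s: float,
                      idle_gib_rate_per_s: float) -> float:
    """Cost of keeping `min_instances` warm for a whole month with no traffic.

    Both rates are inputs: look up the current idle rates for your region
    and billing mode rather than trusting any value hard-coded here.
    """
    seconds = HOURS_PER_MONTH * 3600
    per_instance = seconds * (vcpu * idle_vcpu_rate_per_s
                              + memory_gib * idle_gib_rate_per_s)
    return min_instances * per_instance

# Placeholder rates, for illustration only.
VCPU_RATE = 0.0000025
GIB_RATE = 0.0000025
for n in (0, 1, 3):
    cost = monthly_warm_cost(n, vcpu=1, memory_gib=0.5,
                             idle_vcpu_rate_per_s=VCPU_RATE,
                             idle_gib_rate_per_s=GIB_RATE)
    print(f"min instances = {n}: {cost:.2f} per month before any request arrives")
```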

Choose the right Cloud Run billing mode

Cloud Run offers two billing modes. Request-based billing charges for CPU and memory only while an instance is handling a request, which is cheapest for spiky, request-driven services that sit idle much of the time. Instance-based billing charges for the full lifetime of each instance, including idle time between requests, and suits services that do background work outside requests or need CPU allocated at all times. Match the mode to the traffic shape: bursty public endpoints favor request-based; services doing continuous background processing favor instance-based.
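The choice between the two modes can be framed as a simple comparison of billed seconds. The sketch below is a simplified model under stated assumptions: it lets the two modes carry different unit rates and lets request-based billing carry a per-request fee, both passed in as parameters (set the fee to zero if it does not apply to you). All numbers are placeholders, not published prices.

```python
def request_based_cost(busy_seconds: float, requests: int,
                       rate_per_s: float, fee_per_request: float = 0.0) -> float:
    """Pay only for seconds spent handling requests, plus any per-request fee."""
    return busy_seconds * rate_per_s + requests * fee_per_request

def instance_based_cost(instance_seconds: float, rate_per_s: float) -> float:
    """Pay for every second an instance exists, busy or idle."""
    return instance_seconds * rate_per_s

def cheaper_mode(busy_seconds: float, instance_seconds: float, requests: int,
                 request_rate: float, instance_rate: float,
                 fee_per_request: float = 0.0) -> str:
    """Name the cheaper mode under this simplified model."""
    req = request_based_cost(busy_seconds, requests, request_rate, fee_per_request)
    inst = instance_based_cost(instance_seconds, instance_rate)
    return "request-based" if req <= inst else "instance-based"

DAY = 86_400  # seconds in a day

# Spiky endpoint: busy for one hour out of the day.
print(cheaper_mode(busy_seconds=3_600, instance_seconds=DAY, requests=10_000,
                   request_rate=1.0, instance_rate=1.0))

# Near-continuous work, with an assumed lower unit rate for instance-based billing.
print(cheaper_mode(busy_seconds=80_000, instance_seconds=DAY, requests=0,
                   request_rate=1.0, instance_rate=0.8))
```

The model ignores how long idle instances linger before shutting down, so treat its answer as a first pass and confirm against the billing report.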

Tune concurrency so you run fewer instances

Concurrency is how many simultaneous requests a single Cloud Run instance handles. Raising it where your code is safely concurrent means each instance does more work, so the platform starts fewer instances for the same traffic, which directly lowers cost. Check what each service is actually set to: Cloud Run defaults to 80 concurrent requests per instance at the time of writing, but services are often pinned lower, sometimes to 1, for code that was not written to be concurrent. Profile how much load an instance can take without latency degrading, then set concurrency accordingly. For Cloud Functions, 2nd gen functions run on Cloud Run and expose the same concurrency setting, while 1st gen functions handle one request per instance.
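A back-of-the-envelope way to see the effect is Little's law: requests in flight equal arrival rate times average latency, and the instance count is that figure divided by per-instance concurrency. The sketch below assumes latency stays flat as concurrency rises, which is exactly what profiling needs to confirm; the traffic numbers are hypothetical.

```python
import math

def instances_needed(requests_per_second: float,
                     avg_latency_s: float,
                     concurrency: int) -> int:
    """Estimate instances required to carry a steady load.

    In-flight requests = arrival rate x average latency (Little's law),
    spread across instances that each take `concurrency` requests at once.
    """
    if concurrency < 1:
        raise ValueError("concurrency must be at least 1")
    in_flight = requests_per_second * avg_latency_s
    return math.ceil(in_flight / concurrency)

# Hypothetical service: 200 requests per second at 250 ms average latency.
for c in (1, 10, 80):
    print(f"concurrency {c:>2}: about {instances_needed(200, 0.25, c)} instances")
```

In this example, moving from a concurrency of 1 to 10 cuts the estimate from 50 instances to 5, provided the code and its dependencies cope with ten requests at once.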

Mind the surrounding costs

Compute is not the whole bill. Watch egress, which is charged when responses leave Google Cloud, and the cost of the services your functions call, for example a database connection opened on every invocation rather than pooled and reused. Set budgets and alerts so a runaway scale-out, for instance a retry storm, is caught quickly. Cloud Run and Cloud Functions features and billing modes reflect Google Cloud as of May 2026; verify current options in the Cloud Run and Cloud Functions documentation before changing configuration.
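On the database point, the usual fix is to create the connection once per instance and reuse it across invocations. The sketch below shows the pattern with a stand-in class; `FakeConnection`, `ReusedResource`, and `handle_request` are hypothetical names for illustration, and a real service would pass its database driver's connect or pool constructor as the factory.

```python
import threading
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")

class ReusedResource(Generic[T]):
    """Create an expensive resource on first use, then hand back the same one."""

    def __init__(self, factory: Callable[[], T]) -> None:
        self._factory = factory
        self._value: Optional[T] = None
        self._lock = threading.Lock()  # instances may serve requests concurrently

    def get(self) -> T:
        with self._lock:
            if self._value is None:
                self._value = self._factory()
            return self._value

class FakeConnection:
    """Stand-in for a real database connection; counts how often it is opened."""
    opened = 0

    def __init__(self) -> None:
        FakeConnection.opened += 1

    def query(self, sql: str) -> str:
        return f"ran: {sql}"

# Module scope: lives as long as the instance does, not just one request.
db = ReusedResource(FakeConnection)

def handle_request(sql: str) -> str:
    """Per-request handler that reuses the instance-wide connection."""
    return db.get().query(sql)

for _ in range(5):
    handle_request("SELECT 1")
print(f"connections opened for 5 requests: {FakeConnection.opened}")
```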

Lever | What it cuts
Right-size CPU and memory | Cost on every invocation
Allow scale-to-zero | Idle-time compute charges
Low / zero minimum instances | Always-warm capacity cost
Request-based billing (where it fits) | Charges during idle between requests
Higher concurrency | Number of instances needed

Common questions about serverless cost on Google Cloud

Does Cloud Run charge when no requests are arriving?

With request-based billing and no minimum instances, no. The service scales to zero and you pay nothing for compute while idle. If you set minimum instances or use instance-based billing, you pay to keep capacity warm between requests, so reserve those settings for services that genuinely need them.

What are minimum instances and when should I use them?

Minimum instances keep a set number of containers warm so requests avoid cold-start latency. They bill around the clock, so use them only where cold-start delay measurably hurts the user experience, and set the number as low as that experience allows. Background and internal services rarely need them.

Cloud Run or Cloud Functions, which is cheaper?

For a given workload the cost is usually similar because both bill on allocated resources and execution time. The right choice is driven more by fit: Cloud Functions suits small event-driven snippets, while Cloud Run suits containerized services and gives finer control over concurrency and billing mode, which can make it cheaper at higher request volumes.

The short version

For Cloud Run and Cloud Functions cost optimization, right-size CPU and memory to measured usage, let services scale to zero and keep minimum instances as low as the experience allows, choose request-based billing for spiky services and instance-based for background work, and raise concurrency so fewer instances carry the load. When you want serverless spend profiled and tuned across the estate, that is what our Google Cloud cost optimization service delivers.
