AWS Lambda Cost Optimization: Memory, Duration, and Concurrency

AWS Lambda cost optimization is unusual because the meter rewards counterintuitive moves: giving a function more memory can make it cheaper, not more expensive. Lambda bills for the number of requests plus compute measured in gigabyte-seconds, which is the memory you allocate multiplied by how long the function runs. Tune that formula deliberately and serverless stays cheap as it scales; leave it on defaults and it quietly becomes a meaningful line on the bill.

This article links up to our complete guide to AWS cost optimization, the pillar for this cluster. Lambda tuning is part of the Cut step of our See, Cut, Lock, Run method. And because a Compute Savings Plan covers Lambda, it connects directly to the commitment strategy in Compute vs EC2 Instance Savings Plans.

How Lambda billing works

Cost = (requests × per-request rate) + (allocated memory in GB × execution duration in seconds × per-GB-second rate). Lambda allocates CPU in proportion to memory, so more memory means more CPU power. The optimization is finding the memory setting where the speed-up pays for the extra memory.
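As a rough worked example of that formula, the sketch below computes a monthly bill from request count, memory, and average duration. The two rates are assumptions for illustration (they resemble published x86 on-demand figures for a single region, with the free tier ignored), and the function name is hypothetical. Check the current price list before relying on the numbers.

```python
# Illustrative rates only (assumed; verify against current AWS pricing).
PRICE_PER_MILLION_REQUESTS = 0.20     # USD per 1M requests
PRICE_PER_GB_SECOND = 0.0000166667    # USD per GB-second, x86

def monthly_lambda_cost(requests, memory_mb, avg_duration_ms,
                        gb_second_rate=PRICE_PER_GB_SECOND,
                        request_rate=PRICE_PER_MILLION_REQUESTS):
    """Return (request_cost, compute_cost, total) in USD, ignoring the free tier."""
    request_cost = requests / 1_000_000 * request_rate
    # Gigabyte-seconds = invocations x memory in GB x duration in seconds.
    gb_seconds = requests * (memory_mb / 1024) * (avg_duration_ms / 1000)
    compute_cost = gb_seconds * gb_second_rate
    return request_cost, compute_cost, request_cost + compute_cost

req, compute, total = monthly_lambda_cost(10_000_000, memory_mb=512, avg_duration_ms=200)
print(f"requests ${req:.2f} + compute ${compute:.2f} = ${total:.2f}")
```

Under these assumed rates, 10 million calls at 512 MB and 200 ms come to about $18.67, and nearly all of it is the compute term, which is why the levers below focus on memory and duration.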

Lever 1: Tune memory, because it sets CPU too

Memory is the single most important Lambda setting, because allocating more memory also allocates more CPU. A function under-provisioned on memory runs on a slow CPU, takes longer, and can cost more in gigabyte-seconds than the same function with more memory that finishes faster. The relationship is not linear, so the only reliable way to find the sweet spot is to measure. The open-source AWS Lambda Power Tuning tool runs a function across memory settings and charts cost against speed, showing you the configuration that minimizes cost or balances cost and latency. Run it on your highest-volume functions first, where the saving is largest.
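To make the trade-off concrete, here is a minimal sketch of the comparison a tuning run produces. The duration figures are invented for illustration, not measurements, and the rate is an assumed x86 per-GB-second price. Real numbers should come from running your own function at each setting.

```python
def cost_per_invocation(memory_mb, duration_ms, gb_second_rate=0.0000166667):
    """Compute cost of one call in USD (request fee excluded); rate is assumed."""
    return (memory_mb / 1024) * (duration_ms / 1000) * gb_second_rate

# Hypothetical measurements: memory setting (MB) -> average duration (ms).
measured = {128: 2400, 256: 1100, 512: 520, 1024: 300, 2048: 280}

costs = {mb: cost_per_invocation(mb, ms) for mb, ms in measured.items()}
cheapest = min(costs, key=costs.get)

for mb in sorted(costs):
    marker = "  <- cheapest" if mb == cheapest else ""
    print(f"{mb:>5} MB  {measured[mb]:>5} ms  ${costs[mb] * 1_000_000:.2f} per 1M calls{marker}")
```

In this made-up data set the 512 MB setting is cheapest, and 1024 MB costs about the same as 128 MB while finishing eight times faster. Past that point the speed-up flattens and extra memory only adds cost.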

Lever 2: Cut duration

Since you pay for every millisecond, shortening execution directly cuts cost. The usual wins: reuse connections and SDK clients across invocations by initializing them outside the handler, cache configuration and secrets rather than fetching them on every call, trim oversized dependencies and deployment packages that slow cold starts, and move slow synchronous calls to asynchronous patterns where the function does not have to wait. Right-sizing memory under Lever 1 often cuts duration as a side effect, because the function gets more CPU and finishes sooner.

Lever 3: Manage concurrency and cold starts

Concurrency controls how many function instances run at once, and it carries cost implications in both directions. Provisioned concurrency keeps a set number of instances warm to eliminate cold-start latency, but you pay for that warm capacity whether or not it is used, so apply it only to latency-sensitive functions with predictable traffic, and schedule it to match the daily curve rather than running it flat around the clock. For most functions, the cheaper path is to reduce cold-start duration through smaller packages and faster initialization rather than paying to avoid cold starts entirely. Reserved concurrency, by contrast, caps a function's scale to protect downstream systems and your budget from a runaway invocation storm.
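A back-of-the-envelope sketch of why scheduling matters: the warm-capacity charge scales with the hours the instances are kept provisioned. The rate below is an assumed per-GB-second figure for idle provisioned capacity, and the schedule (10 hours on 22 weekdays) is hypothetical. The sketch models only the warm-capacity charge, not the per-invocation charges on top.

```python
# Assumed rate for keeping provisioned concurrency warm (USD per GB-second).
PROVISIONED_RATE = 0.0000041667

def warm_capacity_cost(instances, memory_mb, hours, rate=PROVISIONED_RATE):
    """Cost of holding `instances` warm for `hours`, used or not."""
    return instances * (memory_mb / 1024) * hours * 3600 * rate

# Flat: 10 instances at 1024 MB, around the clock for a 30-day month.
flat = warm_capacity_cost(instances=10, memory_mb=1024, hours=24 * 30)
# Scheduled: same capacity, but only 10 hours a day on 22 weekdays.
scheduled = warm_capacity_cost(instances=10, memory_mb=1024, hours=10 * 22)

print(f"flat ${flat:.2f}  scheduled ${scheduled:.2f}  saved {1 - scheduled / flat:.0%}")
```

With these assumptions the flat configuration costs roughly $108 a month against roughly $33 when scheduled, because the charge is proportional to provisioned hours.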

Want your serverless spend tuned?

Our AWS cost audit power-tunes your highest-volume Lambda functions, models Graviton and Savings Plan coverage, and quantifies the saving before you change a line of code. On the performance model, you pay only from realized savings. No savings, no fee.

Lever 4: Move to Graviton

Lambda functions can run on AWS Graviton, the Arm-based architecture, which is priced lower per gigabyte-second than x86 and frequently runs faster as well. For most functions the switch is a single configuration change plus a test pass, as long as your dependencies support Arm. The combination of a lower rate and shorter duration makes Graviton one of the highest-return, lowest-effort Lambda moves available, much as it is for EC2 with Graviton.
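The two effects multiply, which a short calculation shows. Both rates and both durations below are assumptions for illustration: the Arm rate is taken as roughly 20% below x86, and the 10% speed-up is hypothetical and workload-dependent, so measure your own function before counting on it.

```python
def graviton_saving(x86_rate, arm_rate, x86_duration_ms, arm_duration_ms):
    """Fractional compute saving per invocation at the same memory setting."""
    x86_cost = x86_rate * x86_duration_ms
    arm_cost = arm_rate * arm_duration_ms
    return 1 - arm_cost / x86_cost

saving = graviton_saving(
    x86_rate=0.0000166667,   # assumed USD per GB-second on x86
    arm_rate=0.0000133334,   # assumed USD per GB-second on Arm
    x86_duration_ms=200,     # hypothetical measurement
    arm_duration_ms=180,     # hypothetical measurement
)
print(f"compute saving: {saving:.0%}")
```

Under these assumptions the saving is about 28%: roughly 20% from the rate and the rest from the shorter run. If a function runs slower on Arm, the same formula shows how much of the rate advantage is lost.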

Lever 5: Cover steady Lambda spend with a Savings Plan

Lambda is included in AWS Compute Savings Plans, so a baseline of predictable serverless spend can be committed for a discount just like EC2 and Fargate. If your Lambda bill has a steady floor, fold it into the same Compute Savings Plan ladder that covers the rest of your compute, following the coverage and laddering discipline in Savings Plans vs Reserved Instances. As always, tune the functions first so the commitment lands on optimized usage, not waste.
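One simple way to size the Lambda share of a commitment is to cover only the steady floor of hourly spend. The sketch below assumes a hypothetical hourly on-demand spend series (taken after tuning) and a hypothetical 15% discount. The actual discount depends on term and payment option, and the function name is illustrative.

```python
def baseline_commitment(hourly_spend, discount):
    """Size a commitment to the steady floor of on-demand spend.

    Returns (on-demand floor per hour, committed spend per hour after
    discount, share of total on-demand spend the commitment covers).
    """
    floor = min(hourly_spend)
    commitment = floor * (1 - discount)
    coverage = floor * len(hourly_spend) / sum(hourly_spend)
    return floor, commitment, coverage

# Hypothetical on-demand Lambda spend per hour (USD), measured after tuning.
hourly = [1.20, 1.10, 1.00, 1.40, 2.30, 2.80, 1.90, 1.30]

floor, commitment, coverage = baseline_commitment(hourly, discount=0.15)
print(f"floor ${floor:.2f}/h, commit ${commitment:.2f}/h, covers {coverage:.0%} of spend")
```

Committing to the minimum is deliberately conservative: the commitment is fully used every hour, and the spiky remainder stays on demand. A longer history than this eight-hour sample would be needed in practice.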

Lever	What it changes	Effort
Memory tuning	Finds the cheapest memory and CPU point	Low, use Power Tuning
Duration cuts	Fewer billed milliseconds per call	Medium, code changes
Concurrency control	Avoids paying for idle warm capacity	Low to medium
Graviton	Lower rate and often faster	Low, config plus test
Compute Savings Plan	Discounts steady Lambda spend	Low, purchase

Lambda billing model, tooling, and Graviton availability above reflect AWS offerings as of May 2026. Verify current rates on the AWS Lambda pricing page and architecture support in the Lambda console before changing functions, as features and rates change.

Go deeper · free field guide

The AWS Cost Optimization Field Guide includes the Lambda tuning checklist and the gigabyte-second model we use to prioritize functions by spend. It is the downloadable companion to this guide.

The short version

Power-tune memory because it sets CPU and often lowers cost, cut duration by reusing connections and trimming packages, apply provisioned concurrency only where latency demands it, move functions to Graviton, and fold steady Lambda spend into a Compute Savings Plan. Tune first, then commit. When you want serverless spend optimized across the estate, that is what our AWS cost optimization service delivers.
