How to · Storage & Data · Updated May 2026

How to Optimize Data Warehouse Costs

A cloud data warehouse can be the fastest-growing line in the bill, because its cost scales with how much data teams scan and how long the compute runs, and both tend to climb quietly. Optimizing data warehouse costs means getting control of three things at once: the compute you pay for, the storage you hold, and above all the volume of data each query touches.

Data warehouse costs come from three sources, and optimizing them means working all three. Compute is the engine that runs queries, billed either per query by data scanned or by the time a warehouse or cluster stays running. Storage is the data held in the warehouse, usually a smaller line but one that grows without limit if nothing expires it. And the data scanned per query is the multiplier that turns a single badly written query into a large bill on scan-priced platforms. Whether you run BigQuery, Amazon Redshift, Snowflake or Azure Synapse, the same levers apply: shrink what each query scans, right-size and schedule the compute, and keep the stored data lean. The biggest wins almost always come from the scan side, because that is where a small change in query or table design produces a large change in cost.

This article is part of our complete guide to cloud storage and data cost optimization. It pairs with the platform-level view in BigQuery vs Redshift vs Synapse: cost compared, which weighs the pricing models against each other.

The core idea

Warehouse cost is compute plus storage plus data scanned. On scan-priced platforms the scan term dominates, so partitioning, clustering and column pruning are the highest-leverage changes you can make.
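As a rough illustration of that sum, the sketch below models a monthly bill as three terms. The class names, field names and every rate are hypothetical placeholders chosen for round arithmetic, not any vendor's price list.

```python
from dataclasses import dataclass


@dataclass
class MonthlyUsage:
    compute_hours: float  # hours a warehouse or cluster was running
    stored_tb: float      # average terabytes held in warehouse storage
    scanned_tb: float     # terabytes read by queries billed per scan


@dataclass
class Rates:
    # Hypothetical rates for illustration only.
    per_compute_hour: float
    per_stored_tb_month: float
    per_scanned_tb: float


def monthly_cost(usage: MonthlyUsage, rates: Rates) -> dict:
    """Split a monthly bill into its compute, storage and scan terms."""
    compute = usage.compute_hours * rates.per_compute_hour
    storage = usage.stored_tb * rates.per_stored_tb_month
    scan = usage.scanned_tb * rates.per_scanned_tb
    return {
        "compute": compute,
        "storage": storage,
        "scan": scan,
        "total": compute + storage + scan,
    }


# A scan-priced setup: no separately billed compute, 50 TB stored, 400 TB scanned.
rates = Rates(per_compute_hour=0.0, per_stored_tb_month=20.0, per_scanned_tb=5.0)
before = monthly_cost(MonthlyUsage(0, 50, 400), rates)
after = monthly_cost(MonthlyUsage(0, 50, 200), rates)  # same data, half the scan
```

With these made-up numbers the scan term is twice the storage term, so halving the scanned volume removes a third of the total while storage stays unchanged.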

Understand which pricing model you are on

The right optimization depends entirely on how your warehouse charges. Scan-based pricing, such as BigQuery on-demand, bills by the bytes a query reads, so the goal is to make each query read less. Capacity or time-based pricing, such as Redshift clusters, Snowflake virtual warehouses, Synapse dedicated pools, and BigQuery editions with slots, bills for compute provisioned or running, so the goal is to keep the compute right-sized and idle time to a minimum. Many platforms now offer both models, and choosing the cheaper one for your workload shape is itself a major lever: steady heavy usage usually favors a committed capacity model, while spiky or exploratory usage often favors paying per query. Pricing models and rates change frequently, so verify the current ones in your platform's documentation before restructuring; the descriptions here reflect the position as of May 2026.
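The choice between the two models can be framed as a break-even calculation. The sketch below is a simplification: it assumes the capacity option is large enough to run the whole workload, and the function names and prices are hypothetical.

```python
def break_even_tb(price_per_scanned_tb: float, capacity_cost_per_month: float) -> float:
    """Monthly scan volume at which per-query and capacity pricing cost the same."""
    return capacity_cost_per_month / price_per_scanned_tb


def cheaper_model(scanned_tb_per_month: float,
                  price_per_scanned_tb: float,
                  capacity_cost_per_month: float) -> str:
    """Return which pricing model is cheaper for a given monthly scan volume.

    Assumes the capacity option can serve the workload without extra charges.
    """
    per_query_cost = scanned_tb_per_month * price_per_scanned_tb
    return "per-query" if per_query_cost < capacity_cost_per_month else "capacity"


# Hypothetical prices: 5.0 per TB scanned versus 2000.0 per month of capacity.
threshold = break_even_tb(5.0, 2000.0)
light = cheaper_model(100, 5.0, 2000.0)
heavy = cheaper_model(1000, 5.0, 2000.0)
```

Under these assumed prices the break-even sits at 400 TB scanned per month: a workload well below it is cheaper per query, and one well above it is cheaper on capacity. Rerun the comparison after optimizing scans, since a lower scan volume can move a workload across the threshold.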

Cut the data each query scans

On scan-priced warehouses this is where most of the money is, and the techniques are well established. Partition tables by a column that queries filter on, usually a date, so a query for last week reads one week of data rather than the whole table. Cluster or sort within partitions so the engine can skip blocks that cannot match. Select only the columns you need rather than using a wildcard, because a columnar engine reads only the columns a query references, and a select-star query references, and is billed for, all of them. Materialize common aggregations so dashboards query a small summary table instead of re-scanning raw events every refresh. And avoid re-scanning the same data repeatedly in pipelines, the subject of reducing ETL and data pipeline costs. On tables that were previously scanned in full, these changes can cut scan volume by a large factor, and the bill falls with it.
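To make the effect of partition and column pruning concrete, the sketch below models a table as daily partitions, each holding a byte count per column, and estimates what a query would read. The table layout, column names and sizes are invented for illustration; real engines also apply compression and block-level skipping that this toy model ignores.

```python
from datetime import date, timedelta
from typing import Dict, List, Optional

# Toy model: partition date -> column name -> bytes stored in that partition.
Table = Dict[date, Dict[str, int]]


def bytes_scanned(table: Table,
                  columns: Optional[List[str]] = None,
                  start: Optional[date] = None,
                  end: Optional[date] = None) -> int:
    """Estimate bytes read by a query.

    columns=None models SELECT * (every column is read).
    start/end model a filter on the partitioning column (inclusive bounds).
    """
    total = 0
    for day, col_bytes in table.items():
        if start is not None and day < start:
            continue  # partition pruned by the date filter
        if end is not None and day > end:
            continue
        if columns is None:
            total += sum(col_bytes.values())
        else:
            total += sum(col_bytes[c] for c in columns)
    return total


def toy_table(days: int = 365) -> Table:
    """Build a hypothetical events table with one partition per day."""
    first = date(2025, 1, 1)
    cols = {"event_id": 1_000, "user_id": 1_000, "event_time": 1_000, "payload": 7_000}
    return {first + timedelta(days=i): dict(cols) for i in range(days)}


events = toy_table()
full_scan = bytes_scanned(events)  # SELECT * with no date filter
pruned = bytes_scanned(events, ["user_id", "event_time"],
                       start=date(2025, 12, 25), end=date(2025, 12, 31))
```

In this toy table the unfiltered select-star reads 3,650,000 bytes, while the query restricted to the last seven days and two narrow columns reads 14,000. The ratio is an artifact of the made-up sizes, but the shape is typical: the date filter and the column list each multiply the saving.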

| Lever | Cuts | Applies to |
| --- | --- | --- |
| Partitioning | Data scanned per query | All major warehouses |
| Clustering / sort keys | Blocks scanned within partitions | BigQuery, Redshift, Snowflake |
| Column pruning | Columns read by select-star | All columnar warehouses |
| Materialized views | Repeated aggregation scans | All major warehouses |
| Auto-suspend | Idle compute time | Snowflake, capacity models |
| Storage lifecycle / expiry | Stored data that has aged out | All |


Right-size and schedule the compute

On capacity-priced warehouses the cost is the compute, so the levers are the same ones that work everywhere else: size it correctly and do not pay for it when it is idle. Right-size clusters and virtual warehouses to the actual workload rather than provisioning for the largest query, the warehouse equivalent of rightsizing compute. Use auto-suspend so a warehouse stops when no queries are running and resumes on demand, which on per-second-billed platforms removes nearly all idle compute cost; what remains is the idle timeout before each suspend and, on some platforms, a minimum charge each time the warehouse resumes. Separate workloads onto appropriately sized compute so a heavy batch job does not force you to run a large warehouse for light interactive queries. And schedule non-urgent batch work for off-peak windows where the platform supports cheaper scheduled capacity. Idle warehouse compute is the same pure waste covered in the economics of idle.
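The effect of the auto-suspend timeout can be estimated from a query log. The sketch below assumes a simplified model: the warehouse starts when a query arrives, stays up until it has been idle for the timeout, and is billed per second while up. The function name, the `min_billed_s` parameter and the sample workload are hypothetical.

```python
from typing import List, Tuple


def billed_seconds(queries: List[Tuple[int, int]],
                   auto_suspend_s: int,
                   min_billed_s: int = 0) -> int:
    """Total seconds a warehouse is up and billed.

    queries: (start_second, duration_seconds) pairs.
    auto_suspend_s: idle seconds before the warehouse suspends.
    min_billed_s: optional minimum charge per resume (assumption; set to 0 to ignore).
    """
    total = 0
    run_start = run_end = None
    for start, duration in sorted(queries):
        end = start + duration
        if run_start is None:
            run_start, run_end = start, end
        elif start <= run_end + auto_suspend_s:
            # Query arrived before the suspend timer fired: same running period.
            run_end = max(run_end, end)
        else:
            # Warehouse suspended in between: close the previous running period.
            total += max(run_end + auto_suspend_s - run_start, min_billed_s)
            run_start, run_end = start, end
    if run_start is not None:
        total += max(run_end + auto_suspend_s - run_start, min_billed_s)
    return total


# Three queries spread over about an hour.
workload = [(0, 120), (200, 60), (3600, 30)]
short_timeout = billed_seconds(workload, auto_suspend_s=60)
long_timeout = billed_seconds(workload, auto_suspend_s=600)
```

For this sample workload a 60-second timeout bills 390 seconds and a 600-second timeout bills 1,490, against roughly an hour if the warehouse never suspended. A very short timeout is not automatically best, since frequent resumes can add latency and, where a per-resume minimum applies, cost.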

Commit once the baseline is clean

Most warehouse platforms offer committed-use discounts: committed slots, reserved capacity, or annual commitments that cut the rate substantially in exchange for a usage promise. The sequencing rule is the same as everywhere in our method: optimize first, commit second. Restructure the heavy queries, right-size the compute and clear the idle time to establish a clean, lower baseline, then commit to that baseline rather than to the inflated pre-optimization figure. Committing first locks in a one-to-three-year promise to pay for the waste you were about to remove. Once the steady-state usage is genuinely steady and minimized, a capacity commitment against it is one of the largest single discounts available on warehouse spend.
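The cost of getting the sequence wrong is simple arithmetic. The sketch below compares committing to the pre-optimization usage with committing to the optimized baseline; the usage figures, the 25 percent discount and the 12-month term are hypothetical, and real commitment terms vary by platform.

```python
def committed_cost(commit_units: float,
                   used_units: float,
                   on_demand_rate: float,
                   discount: float,
                   term_months: int) -> float:
    """Cost over the term when committing to commit_units per month.

    Committed units are paid for whether used or not; usage above the
    commitment is charged at the on-demand rate.
    """
    committed_rate = on_demand_rate * (1 - discount)
    overage = max(used_units - commit_units, 0)
    monthly = commit_units * committed_rate + overage * on_demand_rate
    return monthly * term_months


# Usage is 1000 units a month before optimization and 600 after.
commit_then_optimize = committed_cost(1000, 600, 1.0, 0.25, 12)
optimize_then_commit = committed_cost(600, 600, 1.0, 0.25, 12)
optimize_no_commit = committed_cost(0, 600, 1.0, 0.25, 12)
```

With these numbers, committing first costs 9,000 over the term, more than the 7,200 of optimizing and never committing at all, while optimizing first and then committing costs 5,400.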


Keep the stored data lean

Storage is usually the smaller part of a warehouse bill, but it grows without bound if nothing manages it, and large tables also make queries more expensive to scan. Expire data that has aged past its useful window, drop staging and intermediate tables that pipelines leave behind, and move cold historical data to cheaper external storage queried on demand rather than holding it all in the warehouse's premium storage. Many platforms let you query data in object storage directly, so genuinely cold history can live in an archive tier and still be reachable. This is the same retention discipline as data retention policies that save money, applied to the warehouse. Leaner tables cost less to store and less to scan, so the saving compounds.
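A retention rule of this kind can be expressed as a small planning function. The sketch below sorts date partitions into those to keep in the warehouse, those to move to cheaper external storage, and those to expire. The function name and the 90-day and 730-day thresholds are hypothetical; it only produces a plan and does not call any warehouse API.

```python
from datetime import date, timedelta
from typing import Dict, Iterable, List


def plan_retention(partition_dates: Iterable[date],
                   today: date,
                   hot_days: int,
                   archive_days: int) -> Dict[str, List[date]]:
    """Classify partitions by age.

    keep:    age <= hot_days (stays in warehouse storage)
    archive: hot_days < age <= archive_days (move to external storage)
    expire:  age > archive_days (past its useful window)
    """
    plan: Dict[str, List[date]] = {"keep": [], "archive": [], "expire": []}
    for d in sorted(partition_dates):
        age = (today - d).days
        if age <= hot_days:
            plan["keep"].append(d)
        elif age <= archive_days:
            plan["archive"].append(d)
        else:
            plan["expire"].append(d)
    return plan


today = date(2026, 5, 1)
partitions = [today - timedelta(days=n) for n in (10, 100, 1000)]
plan = plan_retention(partitions, today, hot_days=90, archive_days=730)
```

Here the 10-day-old partition stays in the warehouse, the 100-day-old one is marked for archive, and the 1,000-day-old one is marked for expiry. The thresholds should come from how far back queries actually reach and from any retention obligations, not from this example.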

The short version

Optimize data warehouse costs by first knowing whether you pay per scan or per capacity, then cutting the data each query scans through partitioning, clustering, column pruning and materialized views, right-sizing and auto-suspending the compute, committing only once the baseline is clean, and expiring stored data that has aged out. On scan-priced platforms the query restructuring is the largest lever; on capacity-priced ones it is the compute. Verify your platform's current pricing, since it changes. When you want the costliest queries found and the warehouse spend proven down across the estate, that is part of what our rightsizing and waste elimination service delivers.
