Across our cost audits, one number keeps recurring: roughly 40% of the cloud data platform bill is waste. Not "waste" as in capability someone uses — waste as in warehouses idling at 3 a.m., dev environments cloned and forgotten, and queries scanning terabytes to return forty rows. We call it the 40% rule, and to be clear about what it is: a heuristic drawn from our engagements, not a guarantee. Some estates yield 25%, a few have yielded over 55%. But the pattern is consistent enough that when a CFO asks whether the data platform bill can come down without touching capability, our answer is almost always yes — and most of it is findable in week one.
Why the same money leaks from every estate
Cloud data platforms made compute frictionless, and friction was the old cost control. Nobody files a ticket to spin up a warehouse anymore; they click. The result is that costs grow with team behaviour, not with business value — and because the bill arrives as one undifferentiated number, nobody owns any specific part of it. The waste isn't exotic. It is the same six categories, every time:
- Oversized warehouses — sized for one heavy month-end job, then left running Large all month. Right-sizing and schedule-based scaling routinely recover 10–15% of the bill on their own.
- Auto-suspend left at defaults — a warehouse idling on a 10-minute suspend after sub-second dashboard queries burns paid minutes for nothing. Tuning suspend windows per workload is an afternoon of work.
- Query hygiene — SELECT * into BI tools, joins that fan out before they filter, scans that ignore clustering and partition pruning. The top 20 queries by cost are usually 5–10% of total spend and most are trivially fixable.
- Storage tiering and time-travel retention — 90-day time travel on staging tables that are rebuilt nightly, and years of cold data on hot storage tiers.
- Dev/test sprawl — full production clones per developer, per branch, never torn down. We have found dev environments outspending production.
- No chargeback visibility — the meta-problem that lets the other five persist.
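The auto-suspend leak is easiest to see as arithmetic. Below is a minimal Python sketch that compares billed time with active query time for one warehouse. It assumes a deliberately simplified billing model: the warehouse runs from the first query until `suspend_after` seconds pass with nothing executing, and every running second is billed. Real platforms add details this ignores (per-resume minimum charges, startup time), and the query timings in the example are invented for illustration.

```python
# Simplified model (an assumption, not any vendor's exact billing rules):
# a warehouse stays up until `suspend_after` seconds pass with no query running,
# and every second it is up is billed.
from typing import List, Tuple

Interval = Tuple[float, float]  # (start_seconds, end_seconds)


def merge(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or touching intervals into a sorted, disjoint list."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def total(intervals: List[Interval]) -> float:
    """Sum the lengths of disjoint intervals."""
    return sum(end - start for start, end in intervals)


def billed_vs_active(queries: List[Interval], suspend_after: float) -> Tuple[float, float]:
    """Return (billed_seconds, active_seconds) for one warehouse."""
    active = merge(queries)
    # Each busy period keeps the warehouse running for suspend_after seconds
    # after it ends; a query arriving inside that tail extends the same run.
    running = merge([(start, end + suspend_after) for start, end in active])
    return total(running), total(active)


# Invented workload: a 1-second dashboard query every 15 minutes for an hour.
queries = [(t, t + 1) for t in range(0, 3600, 900)]
for window in (600, 60):
    billed, active = billed_vs_active(queries, window)
    print(f"suspend={window}s billed={billed}s active={active}s")
```

With these invented timings, shortening the window from 600 to 60 seconds drops billed time from 2,404 to 244 seconds while active time stays at 4 seconds. Running the same comparison against real query history is one way to choose a per-workload suspend setting.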
The first five checks we run
If you want to test the rule on your own estate before calling anyone, run these in order of effort-to-payoff:
1. Idle-to-active ratio per warehouse. Pull seven days of metering. Any warehouse billing more idle time than query time is your first target.
2. Auto-suspend settings vs workload pattern. Interactive BI warehouses should suspend in 60 seconds or less; defaults are usually 5–10x that.
3. Top 20 queries by cumulative cost. Not the slowest, but the most expensive over the week. Repetitive scheduled queries dominate this list, and a fix to one of them pays off on every future run.
4. Time-travel and retention settings on non-production schemas. Staging and scratch data rarely needs more than one day.
5. Environments with no queries in 30 days. Every estate has them. Suspend first, delete after a polite warning email.
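Checks 1, 3 and 5 reduce to simple aggregations once the data is exported. The Python sketch below assumes hypothetical record shapes: `MeteringRow`, `QueryRow` and all their fields are illustrative, not any platform's schema, so map them from whatever metering and query-history views your platform exposes. The design choice that matters is in check 3: rows are grouped by a query fingerprint and ranked by summed cost, so a cheap query that runs hundreds of times a day can outrank one expensive monthly job.

```python
# A minimal sketch of checks 1, 3 and 5. All record shapes are hypothetical:
# populate them from your platform's own metering and query-history exports.
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, Iterable, List, Tuple


@dataclass
class MeteringRow:            # hypothetical: e.g. one row per warehouse per hour
    warehouse: str
    billed_seconds: float     # seconds the warehouse was running (and billed)
    query_seconds: float      # seconds at least one query was executing


@dataclass
class QueryRow:               # hypothetical: one row per executed query
    warehouse: str
    environment: str          # e.g. the database or clone the query ran against
    query_fingerprint: str    # normalised text or hash, so reruns group together
    cost: float               # cost attributed to this run
    ended_at: datetime


def idle_heavy_warehouses(rows: Iterable[MeteringRow]) -> List[str]:
    """Check 1: warehouses billing more idle time than query time."""
    billed: Dict[str, float] = defaultdict(float)
    busy: Dict[str, float] = defaultdict(float)
    for r in rows:
        billed[r.warehouse] += r.billed_seconds
        busy[r.warehouse] += r.query_seconds
    # Idle time is billed minus busy; flag warehouses where idle exceeds busy.
    return sorted(w for w in billed if billed[w] - busy[w] > busy[w])


def top_queries_by_cost(rows: Iterable[QueryRow], n: int = 20) -> List[Tuple[str, float]]:
    """Check 3: rank fingerprints by cumulative cost, not by single-run duration."""
    totals: Dict[str, float] = defaultdict(float)
    for r in rows:
        totals[r.query_fingerprint] += r.cost
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:n]


def dormant_environments(environments: Iterable[str], rows: Iterable[QueryRow],
                         now: datetime, days: int = 30) -> List[str]:
    """Check 5: environments with no query in the last `days` days."""
    cutoff = now - timedelta(days=days)
    recent = {r.environment for r in rows if r.ended_at >= cutoff}
    return sorted(e for e in environments if e not in recent)
```

Checks 2 and 4 are settings comparisons rather than aggregations, so they are left out of this sketch.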
Nobody overspends on purpose. They overspend because nobody can see who's spending.
Key takeaways
- ~40% savings is a recurring engagement pattern, not a promise — your number depends on how long the estate has grown unwatched.
- Warehouse right-sizing and auto-suspend tuning are the fastest wins; both are config changes, not projects.
- Attack the top 20 queries by cumulative cost — scheduled, repetitive queries pay back the fix every single day.
- Storage is the quiet leak: time-travel retention and hot-tier cold data compound monthly.
- Chargeback visibility is the only fix that keeps the other fixes fixed.
Visibility is the cut that lasts
Every technical fix above decays without ownership. The lasting change in our engagements is chargeback — or at minimum showback: cost per team, per warehouse, per pipeline, on a dashboard the teams themselves see weekly. Behaviour changes fast when an engineering lead watches their own line. One client's analytics team cut its spend by 30% in the first month of showback with no mandate from above — they simply hadn't known a single scheduled report cost more than their tooling budget. Tag everything, attribute everything, and make the bill a metric teams manage like latency.
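Once costs carry tags, showback is mostly a grouping problem. A minimal Python sketch, assuming each cost record arrives as a `(cost, tags)` pair; the record shape and tag keys such as `team` and `pipeline` are hypothetical. Untagged spend is reported as its own line rather than spread across teams, so the tagging gap stays visible on the same dashboard.

```python
# Minimal showback rollup. The (cost, tags) record shape and the tag keys
# used below ("team", "pipeline") are hypothetical illustrations.
from collections import defaultdict
from typing import Dict, List, Mapping, Tuple

CostRecord = Tuple[float, Mapping[str, str]]  # (cost, tags)


def showback(records: List[CostRecord], key: str = "team") -> Dict[str, float]:
    """Sum cost per value of `key`; records missing the tag land in 'UNTAGGED'."""
    totals: Dict[str, float] = defaultdict(float)
    for cost, tags in records:
        totals[tags.get(key, "UNTAGGED")] += cost
    return dict(totals)


def untagged_share(records: List[CostRecord], key: str = "team") -> float:
    """Fraction of total cost that cannot be attributed via `key`."""
    total = sum(cost for cost, _ in records)
    if total == 0:
        return 0.0
    return showback(records, key).get("UNTAGGED", 0.0) / total


records: List[CostRecord] = [
    (100.0, {"team": "analytics", "pipeline": "daily_report"}),
    (50.0, {"team": "data-eng"}),
    (50.0, {}),
]
print(showback(records))             # per-team view
print(showback(records, "pipeline")) # per-pipeline view of the same records
print(untagged_share(records))
```

Tracking the untagged share week over week is a simple way to tell whether "tag everything" is actually happening.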
What "without cutting capability" means in practice
None of the changes above remove a dashboard, slow a pipeline beyond its SLA, or take a tool away from an analyst. That is the discipline of the 40% rule: capability is defined by what the business consumes, not by what the platform idles on. We typically structure this as a one-week audit producing a costed, prioritised backlog, then two to three weeks of implementation alongside your team as part of our platform services. Either the audit pays for itself before implementation starts, or we tell you upfront that your estate is already tight; it happens, and we'd rather say so. Send us last month's bill and we'll give you a first read.