Zero-downtime warehouse migration: our 16-week playbook

Last year we moved a 40TB warehouse with 300 pipelines for a manufacturing client, and not a single dashboard went dark. No weekend freeze, no "the numbers look different" emails on Monday. That outcome was not luck — it was the same 16-week playbook we have now run across multiple migrations, and the core of it fits in one sentence: never cut anyone over to a system you haven't proven against the old one, row by row, for weeks. Here is the playbook in full.

Weeks 1–2: inventory and dependency mapping

You cannot migrate what you haven't counted. We start by harvesting query logs, scheduler configs and BI metadata to build a complete inventory: every table, every pipeline, every downstream consumer — dashboards, extracts, APIs, that one finance spreadsheet refreshing via ODBC since 2019. On the manufacturing migration, the client's official catalogue listed 220 pipelines; the logs found 300. The inventory becomes a dependency graph that dictates migration order and, crucially, surfaces the 20–30% of objects that nothing has touched in six months. Those get decommissioned on paper now, not migrated.
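To make the inventory-to-graph step concrete, here is a minimal sketch of how a dependency map and last-access dates could yield both a decommission list and a migration order. Everything named here is hypothetical: the table names, the dates, the 183-day idle cutoff standing in for "six months", and the rule that an idle object is kept if anything still in use reads from it. It illustrates the idea and is not the tooling we ran.

```python
from datetime import date, timedelta
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def decommission_candidates(depends_on, last_accessed, today, max_idle_days=183):
    """Return objects idle past the cutoff that nothing live depends on."""
    cutoff = today - timedelta(days=max_idle_days)
    # Assumption: objects that never appear in the logs count as idle.
    idle = {n for n in depends_on if last_accessed.get(n, date.min) < cutoff}
    # Keep any idle object that feeds something still in use.
    changed = True
    while changed:
        changed = False
        for name, upstream in depends_on.items():
            if name in idle:
                continue
            for dep in upstream:
                if dep in idle:
                    idle.discard(dep)
                    changed = True
    return idle


def migration_order(depends_on, skip=frozenset()):
    """Order the remaining objects so every upstream comes before its consumers."""
    live = {
        name: {dep for dep in upstream if dep not in skip}
        for name, upstream in depends_on.items()
        if name not in skip
    }
    return list(TopologicalSorter(live).static_order())


# Hypothetical inventory: object -> the upstream objects it reads from.
depends_on = {
    "raw_orders": set(),
    "raw_plants": set(),
    "stg_orders": {"raw_orders"},
    "fct_output": {"stg_orders", "raw_plants"},
    "legacy_scrap_report": {"raw_plants"},
}
# Hypothetical last-access dates harvested from query logs.
last_accessed = {
    "raw_orders": date(2024, 12, 30),
    "raw_plants": date(2024, 12, 30),
    "stg_orders": date(2024, 12, 31),
    "fct_output": date(2024, 12, 31),
    "legacy_scrap_report": date(2024, 3, 1),
}

retire = decommission_candidates(depends_on, last_accessed, today=date(2025, 1, 1))
order = migration_order(depends_on, skip=retire)
```

In this toy inventory only `legacy_scrap_report` is retired, and the remaining four objects come out with raw tables ahead of the models that read them.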

The other non-negotiable in week one: rollback criteria, defined upfront and signed by the steering group. If reconciliation pass rates drop below threshold, or a severity-one incident lands during a cutover wave, we roll back — automatically, without a meeting. Deciding this when nobody is under pressure is what makes the decision usable when everybody is.
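"Automatically, without a meeting" only works if the criteria are written down as something a machine can evaluate. A minimal sketch of that rule follows, under stated assumptions: the 0.995 pass-rate threshold is purely illustrative (the real number is whatever the steering group signs), and `RollbackCriteria` and `should_roll_back` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RollbackCriteria:
    # Illustrative value only; the real threshold is agreed in week one.
    min_pass_rate: float = 0.995


def should_roll_back(criteria, gates_passed, gates_run, sev1_during_wave):
    """Return True when either signed-off rollback condition is met."""
    if sev1_during_wave:
        return True
    if gates_run == 0:
        # Assumption: no reconciliation evidence is treated as a failure.
        return True
    return gates_passed / gates_run < criteria.min_pass_rate


criteria = RollbackCriteria()
healthy = should_roll_back(criteria, gates_passed=1000, gates_run=1000, sev1_during_wave=False)
degraded = should_roll_back(criteria, gates_passed=990, gates_run=1000, sev1_during_wave=False)
incident = should_roll_back(criteria, gates_passed=1000, gates_run=1000, sev1_during_wave=True)
```

The function takes no opinion and no context beyond the two agreed inputs, which is the point: nothing is left to argue about mid-incident.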

Weeks 3–6: pipeline conversion with dbt

We convert pipelines into dbt against the target platform, in dependency order from the graph. dbt is the workhorse here for three reasons: conversions become reviewable pull requests instead of hand-ported scripts, tests travel with the models, and lineage is generated rather than documented. Stored-procedure logic gets rewritten as models; orchestration moves to the target scheduler. We deliberately resist "improving" business logic mid-flight — the goal of this phase is a provably equivalent system, because every intentional change you mix in becomes noise in the reconciliation that follows. Improvements go in a backlog for week 17 onwards.

Weeks 7–12: dual-run with reconciliation gates

This is the secret, and it is why the playbook spends six of its sixteen weeks here. Both warehouses run in parallel on the same source feeds, and every table must clear three automated gates daily:

Row counts — exact match per table, per load window. Cheap, fast, catches gross failures.
Aggregate checksums — sums, distinct counts and min/max on key business columns, which catch the subtle stuff: type coercion, timezone drift, NULL-handling differences between SQL dialects.
Sampled row-level diffs — a deterministic sample of full rows compared field by field, catching what aggregates average away.
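The three gates can be sketched in a few lines. This is a simplified illustration over in-memory rows (lists of dicts), not the production checks, which would run as queries against both warehouses. The function names, the SHA-256 sampling rule and the `one_in` parameter are assumptions made for the example; numeric columns are assumed to hold `Decimal` or integer values.

```python
import hashlib
from decimal import Decimal


def row_count_gate(old_rows, new_rows):
    """Gate 1: exact row-count match."""
    return len(old_rows) == len(new_rows)


def aggregate_profile(rows, column):
    """Sum, distinct count and min/max for one column, ignoring NULLs."""
    values = [r[column] for r in rows if r[column] is not None]
    return {
        "sum": sum(values, Decimal(0)),
        "distinct": len(set(values)),
        "min": min(values, default=None),
        "max": max(values, default=None),
    }


def aggregate_gate(old_rows, new_rows, columns):
    """Gate 2: aggregate profiles must match on every key business column."""
    return all(
        aggregate_profile(old_rows, c) == aggregate_profile(new_rows, c)
        for c in columns
    )


def in_sample(key, one_in=100):
    """Deterministic sampling: the same key is always in or out of the sample."""
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % one_in == 0


def sampled_diff(old_rows, new_rows, key, one_in=100):
    """Gate 3: field-by-field comparison of the sampled rows."""
    old_by_key = {r[key]: r for r in old_rows if in_sample(r[key], one_in)}
    new_by_key = {r[key]: r for r in new_rows if in_sample(r[key], one_in)}
    diffs = []
    for k in sorted(old_by_key.keys() | new_by_key.keys()):
        old, new = old_by_key.get(k), new_by_key.get(k)
        if old is None or new is None:
            diffs.append((k, "<row>", old, new))  # row missing on one side
            continue
        for field in sorted(old.keys() | new.keys()):
            if old.get(field) != new.get(field):
                diffs.append((k, field, old.get(field), new.get(field)))
    return diffs
```

Hashing the key rather than drawing random rows means both warehouses select the same rows every day, so a diff that appears on Tuesday can be re-checked on Wednesday against the same sample.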

Results land on a reconciliation dashboard visible to the client, not just to us. A table is cutover-eligible only after 14 consecutive green days. On the 40TB migration, the gates caught 41 discrepancies in week eight — including a legacy rounding behaviour finance had unknowingly depended on for years. Every one was found by a machine before a human ever saw a wrong number.
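The 14-consecutive-green-days rule is simple enough to state as code. A minimal sketch, assuming a hypothetical `daily_results` mapping from date to whether all three gates passed that day, and assuming that a day with no recorded result counts as not green:

```python
from datetime import date, timedelta

REQUIRED_GREEN_DAYS = 14  # the playbook's eligibility rule


def is_cutover_eligible(daily_results, as_of, required=REQUIRED_GREEN_DAYS):
    """True only if every one of the last `required` days, ending at as_of, was green."""
    for offset in range(required):
        day = as_of - timedelta(days=offset)
        # Assumption: a missing day earns no credit and breaks the streak.
        if daily_results.get(day) is not True:
            return False
    return True


as_of = date(2025, 3, 14)
all_green = {as_of - timedelta(days=i): True for i in range(14)}
eligible = is_cutover_eligible(all_green, as_of)

one_red = dict(all_green)
one_red[date(2025, 3, 10)] = False
blocked = is_cutover_eligible(one_red, as_of)
```

A single red day resets the clock, which is what makes the streak meaningful as evidence rather than as an average.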

Trust isn't migrated. It's reconciled, daily, in front of the people who own the numbers.

Key takeaways

Inventory from query logs, not from documentation: expect more pipelines than the official catalogue lists, plus 20–30% of objects that nothing has touched in six months.
Define rollback criteria in week one and make them automatic; mid-incident debates are how downtime happens.
Convert with dbt and keep logic changes out of the migration — equivalence first, improvements after.
Dual-run with daily reconciliation gates — row counts, aggregate checksums, sampled diffs — is the entire secret.
Cut consumers over in waves, rehearse every wave, and never decommission until the last wave has soaked.

Weeks 13–15: cutover in waves, rehearsed

Consumers move in waves ordered by blast radius: internal analysts first, departmental dashboards second, executive reporting and external feeds last. Each wave gets a rehearsal — a full dry run against a checklist with named owners and timed steps — before the real repointing, which is a connection-string change, not a data move, because the data has already been proven green for weeks. Each wave soaks for two to three days before the next begins. On the manufacturing migration all three waves completed inside ten days, and the rollback plan was never invoked. It existed, rehearsed, for every wave anyway.
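The wave sequencing reduces to a small scheduling rule: sort by blast radius, refuse to schedule anything unrehearsed, and leave a soak gap before the next wave. The sketch below is illustrative only. The `Wave` type, the numeric blast-radius scores and the assumption that each repointing happens on a single day are ours for the example, not a record of the actual plan.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Wave:
    name: str
    blast_radius: int  # hypothetical score: higher means more painful if it breaks
    rehearsed: bool


def schedule_waves(waves, start, soak_days=3):
    """Return (plan, soak_end): cutover dates in blast-radius order, plus the
    date the final wave finishes soaking."""
    plan = []
    day = start
    for wave in sorted(waves, key=lambda w: w.blast_radius):
        if not wave.rehearsed:
            raise ValueError(f"wave {wave.name!r} has not been rehearsed")
        plan.append((wave.name, day))
        day += timedelta(days=soak_days)
    return plan, day


waves = [
    Wave("executive reporting and external feeds", blast_radius=3, rehearsed=True),
    Wave("internal analysts", blast_radius=1, rehearsed=True),
    Wave("departmental dashboards", blast_radius=2, rehearsed=True),
]
plan, soak_end = schedule_waves(waves, start=date(2025, 4, 1), soak_days=3)
```

With three waves and a three-day soak, the last soak ends nine days after the first cutover, which is consistent with all three waves completing inside ten days.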

Week 16: decommission — the step everyone skips

The old warehouse goes read-only for a final week, audit-retention data is exported, and then it is shut down. Skipping this step is how enterprises end up paying for two platforms indefinitely and how "shadow" reports keep pulling from a stale system months later. The migration is finished when the old bill is zero, a theme we return to in our FinOps work.

If a warehouse migration is on your roadmap, our platform practice runs this playbook end to end, fixed scope and fixed timeline. Talk to us about what you're moving and we'll tell you honestly whether 16 weeks is realistic for your estate.
