India's Digital Personal Data Protection (DPDP) Act is, underneath the legal language, a set of engineering requirements. Consent must bind data to a purpose. Data principals can demand erasure and correction. Retention beyond purpose needs justification. Breaches must be detected and reported. Every one of those obligations lands on the data platform, and whether compliance becomes a quarterly scramble or a routine, low-cost operation depends almost entirely on architecture decisions you can make now. To be clear: this is an engineering perspective on what the Act means for platform design, not legal advice. Work with counsel on the obligations themselves.
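We've spent the past year helping Indian enterprises get their lakehouses and warehouses DPDP-ready, and the same six checklist areas come up every time: discovery, purpose binding, data-principal rights, retention, breach readiness and the processor chain. Here they are, grouped under four headings, in the order we tackle them.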
1. Know your personal data — all of it
You cannot protect, purpose-bind or erase data you can't find. Most estates we audit have personal data in three to five times more places than the data owners believe: staging tables, abandoned sandbox copies, BI extracts, log files with request payloads. Manual surveys always undercount, so the first build is automated PII discovery — scanners that profile every column across the estate, classify fields (name, phone, Aadhaar-class identifiers, financial data), and write the results into the catalog as machine-readable tags. Run it continuously, not once: new pipelines create new copies every week.
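To make the discovery step concrete, here is a minimal sketch of a column profiler that samples values and emits catalog-style tags. The detector patterns, the `ColumnTag` shape and the 0.8 match threshold are illustrative assumptions, not a production rule set; a real scanner would add checksum validation, column-name hints and far broader coverage.

```python
import re
from dataclasses import dataclass

# Hypothetical, deliberately simplified detectors (illustration only).
DETECTORS = {
    "phone_in": re.compile(r"^(?:\+91[\s-]?)?[6-9]\d{9}$"),
    "aadhaar_like": re.compile(r"^\d{4}\s?\d{4}\s?\d{4}$"),
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "pan_like": re.compile(r"^[A-Z]{5}\d{4}[A-Z]$"),
}


@dataclass(frozen=True)
class ColumnTag:
    """Machine-readable tag destined for the catalog (assumed shape)."""
    table: str
    column: str
    pii_class: str
    match_rate: float


def classify_column(table, column, samples, threshold=0.8):
    """Return a tag for the best-matching PII class, or None if nothing matches."""
    values = [str(v).strip() for v in samples if v is not None and str(v).strip()]
    if not values:
        return None
    best = None
    for pii_class, pattern in DETECTORS.items():
        rate = sum(1 for v in values if pattern.match(v)) / len(values)
        if rate >= threshold and (best is None or rate > best.match_rate):
            best = ColumnTag(table, column, pii_class, round(rate, 2))
    return best


def scan(tables):
    """tables: {table_name: {column_name: [sample values]}} -> list of ColumnTag."""
    tags = []
    for table, columns in tables.items():
        for column, samples in columns.items():
            tag = classify_column(table, column, samples)
            if tag is not None:
                tags.append(tag)
    return tags
```

Scheduling `scan` against every schema on a recurring basis, and writing the resulting tags to the catalog, is what turns a one-off survey into continuous discovery.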
2. Bind purpose to data, and enforce it at access time
DPDP's consent model ties processing to a stated purpose. Architecturally, that means purpose becomes metadata: every dataset carrying personal data gets purpose tags in the catalog, and — this is the part most programmes skip — those tags get enforced at query and access time through the platform's policy engine. A marketing analyst querying data collected for service delivery should hit a policy denial, not a clean result set. If purpose lives only in a spreadsheet, it's documentation. In the policy engine, it's compliance.
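As a rough sketch of what enforcement at access time can look like, the check below compares the purpose declared on a query against both the caller's entitlements and the dataset's purpose tags. The `DatasetTags` type, the `ROLE_PURPOSES` table and the purpose names are hypothetical; in a real platform this logic would sit inside the policy engine rather than in application code.

```python
from dataclasses import dataclass


class PurposeDenied(PermissionError):
    """Raised when a query's declared purpose is not permitted."""


@dataclass(frozen=True)
class DatasetTags:
    """Catalog metadata for one dataset (assumed shape)."""
    name: str
    has_personal_data: bool
    purposes: frozenset  # purposes the data was collected for


# Hypothetical role-to-purpose entitlements.
ROLE_PURPOSES = {
    "support_agent": frozenset({"service_delivery"}),
    "marketing_analyst": frozenset({"marketing"}),
}


def check_access(role, declared_purpose, dataset):
    """Allow the query only if the purpose is valid for both role and dataset."""
    if not dataset.has_personal_data:
        return True  # purpose binding applies only to personal data
    if declared_purpose not in ROLE_PURPOSES.get(role, frozenset()):
        raise PurposeDenied(f"role {role!r} may not process for {declared_purpose!r}")
    if declared_purpose not in dataset.purposes:
        raise PurposeDenied(f"{dataset.name} was not collected for {declared_purpose!r}")
    return True
```

With tags like these, the marketing analyst in the example above is denied on the second check: the role is entitled to the marketing purpose, but the dataset was never collected for it.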
3. Rights, retention and the lineage that makes them possible
Data principals can demand erasure and correction — and "we deleted it from the CRM" is not erasure if the same record survives in the warehouse, four downstream marts and a BI extract. The only scalable answer is column-level lineage: when an erasure request arrives, lineage tells you every table, copy and derived dataset the record reached, and the deletion job walks that graph. Without lineage, every request is a manual archaeology project measured in days; with it, it's a parameterised job measured in minutes.
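The graph walk itself is a plain breadth-first traversal. The sketch below assumes lineage is available as a mapping from each (table, column) pair to the columns derived from it; the table names and the `erasure_plan` output shape are hypothetical. It also assumes the identifier is copied unchanged downstream, so hashed or transformed columns would need transform-aware matching.

```python
from collections import deque

# Hypothetical column-level lineage: source column -> columns derived from it.
LINEAGE = {
    ("crm.customers", "email"): [("warehouse.dim_customer", "email")],
    ("warehouse.dim_customer", "email"): [
        ("mart_sales.orders", "customer_email"),
        ("mart_support.tickets", "contact_email"),
    ],
    ("mart_sales.orders", "customer_email"): [("bi_extract.q3_orders", "email")],
}


def reachable_columns(start, lineage):
    """Breadth-first walk returning every column the starting column flows into."""
    seen = {start}
    order = [start]
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for child in lineage.get(node, []):
            if child not in seen:  # guards against cycles and diamond-shaped paths
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


def erasure_plan(start, lineage, value):
    """One deletion step per table that received the identifier."""
    return [
        {"table": table, "match_column": column, "match_value": value}
        for table, column in reachable_columns(start, lineage)
    ]
```

Calling `erasure_plan(("crm.customers", "email"), LINEAGE, "a@example.com")` yields five steps, one per table the address reached, which a deletion job can then execute and log.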
Retention is the same discipline applied on a schedule. We implement it as policy-as-code:
- Retention rules versioned in git — per dataset class, with owners and review dates, deployed like any other code.
- Automated expiry jobs — data past its retention window is deleted or anonymised by the platform, not by someone remembering.
- Time-travel windows tuned down — lakehouse formats keep deleted data recoverable by default; if your vacuum window is 90 days, your "erased" data isn't erased for 90 days. Tune it deliberately.
- Deletion certificates — every expiry run logs what was removed and under which rule, which is exactly what an audit asks for.
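A minimal sketch of how the rule, the expiry job and the certificate fit together is shown below. The `RetentionRule` fields, the row shape (`id`, `collected_at`) and the certificate keys are assumptions for illustration; only deletion is shown, and anonymisation would follow the same pattern with a different action on the expired rows.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class RetentionRule:
    """One versioned rule, as it might be stored in git (assumed shape)."""
    rule_id: str
    dataset_class: str
    max_age_days: int
    owner: str


def run_expiry(rule, dataset, rows, now):
    """Split rows into kept and expired, and return a deletion certificate.

    Each row is a dict with an 'id' and a timezone-aware 'collected_at'.
    """
    cutoff = now - timedelta(days=rule.max_age_days)
    kept = [r for r in rows if r["collected_at"] >= cutoff]
    expired = [r for r in rows if r["collected_at"] < cutoff]
    certificate = {
        "rule_id": rule.rule_id,
        "dataset": dataset,
        "cutoff": cutoff.isoformat(),
        "rows_removed": len(expired),
        "removed_ids": [r["id"] for r in expired],
        "executed_at": now.isoformat(),
    }
    return kept, certificate


# Usage with a hypothetical rule: support tickets kept for 365 days.
rule = RetentionRule("RET-007", "support_tickets", 365, "support-data-owner")
now = datetime(2025, 1, 1, tzinfo=timezone.utc)
rows = [
    {"id": 1, "collected_at": datetime(2023, 6, 1, tzinfo=timezone.utc)},
    {"id": 2, "collected_at": datetime(2024, 6, 1, tzinfo=timezone.utc)},
]
kept, certificate = run_expiry(rule, "mart_support.tickets", rows, now)
```

Passing `now` in explicitly keeps the job deterministic and testable, and the certificate records which rule justified each removal.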
If finding every copy of a customer's data takes a war room, you don't have a platform — you have a liability.
4. Breach readiness and the processor chain
The Act expects breaches to be detected, reported and handled — which presupposes you can see access in the first place. The platform-side checklist: access logging on every dataset carrying personal data, retained and queryable; anomaly alerts on unusual patterns (a service account bulk-reading a PII table at 3 a.m. should page someone); and incident runbooks that are rehearsed, with the catalog's classification tags pre-wired into them so you can state within hours — not weeks — whose data and which fields were touched.
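The anomaly rule described above can be sketched as a simple check over access-log events. The event fields, the per-principal read history, the quiet-hours window and the thresholds are all assumptions chosen for illustration; a production detector would learn baselines from the retained logs rather than hard-code them.

```python
from datetime import datetime
from statistics import median


def find_anomalies(events, pii_datasets, history, factor=10, min_rows=10_000,
                   quiet_hours=range(0, 6)):
    """Flag reads of PII-tagged datasets that are unusually large or off-hours.

    events:  dicts with 'principal', 'dataset', 'rows_read', 'ts' (datetime)
    history: {principal: [rows read in past accesses]} used as a baseline
    """
    alerts = []
    for event in events:
        if event["dataset"] not in pii_datasets:
            continue  # only datasets tagged as personal data are in scope
        baseline = median(history.get(event["principal"]) or [0])
        reasons = []
        if event["rows_read"] > max(min_rows, factor * baseline):
            reasons.append("bulk_read")
        if event["ts"].hour in quiet_hours:
            reasons.append("off_hours")
        if reasons:
            # Page only when both signals fire; otherwise raise a ticket.
            severity = "page" if len(reasons) == 2 else "ticket"
            alerts.append({**event, "reasons": reasons, "severity": severity})
    return alerts
```

Because the check keys off the catalog's PII tags, newly discovered personal-data tables come under monitoring without a separate configuration step.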
Finally, look beyond your own estate. Every SaaS tool, analytics vendor and offshore processor touching personal data sits in your compliance chain: contracts need to reflect DPDP obligations, and cross-border flows need to be mapped and reviewed against the government's notified restrictions as they evolve. The platform contribution here is an accurate, current map of which external systems receive which data — which, again, is lineage.
Key takeaways
- DPDP obligations are platform requirements in disguise — architecture decides whether compliance is cheap or a recurring scramble.
- Automated, continuous PII discovery comes first: you can't purpose-bind or erase data you haven't found.
- Purpose tags only count when enforced at query time by the policy engine — a spreadsheet of purposes is documentation, not control.
- Erasure and correction at scale require column-level lineage; retention should be policy-as-code with automated expiry and tuned time-travel windows.
- Breach readiness means access logs, anomaly alerts and rehearsed runbooks wired to your classification tags — plus a current map of processor and cross-border flows.
Make compliance a property of the architecture
The common thread across all six areas is that none of them are bolt-ons. A platform with a catalog, column-level lineage and policy enforcement built into its core gets DPDP readiness largely as a side effect — the same machinery also powers data quality, cost attribution and self-service access. A platform without them turns every regulatory obligation into a manual project. That's why our data platform and governance engagements treat catalog, lineage and policy as foundation work, not phase two. If you're planning a DPDP readiness programme — or building a new platform and want compliance designed in from day one — talk to our governance practice and we'll walk you through the checklist against your own estate.