GCP · Analytics on regulated data · 10 min read

A confidential data lake on GCP, audit ready

Build a BigQuery centred data lake that handles regulated data with classification, customer managed keys, VPC Service Controls, and access approval baked in from the start.

The starting point

Most data platforms grow before they are governed. The first hard question is usually who can read what, and that question gets harder once a dozen teams have copied tables sideways.

GCP gives you tight primitives to fix this early: customer managed keys in Cloud KMS, column and row level access policies in BigQuery, VPC Service Controls around the perimeter, and Workload Identity Federation for non human access.

Pipeline

Sources land in Cloud Storage. A DLP scan runs on landing and tags each object with a sensitivity class. Dataflow normalises the data and writes it to a raw BigQuery dataset, then Dataform builds the curated tables from raw. Every step runs inside a VPC Service Controls perimeter.
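The tagging step on landing can be sketched as a pure function that turns scan findings into one sensitivity class per object. This is a minimal sketch, assuming a simplified finding shape (an info type name plus a likelihood string) rather than the full DLP response. The likelihood names follow the DLP scale and the info types are built in ones, but the class names, the info type to class mapping, and the `sensitivity` metadata key are hypothetical.

```python
from dataclasses import dataclass
from enum import IntEnum


class Sensitivity(IntEnum):
    # Hypothetical classes, ordered so that max() picks the most sensitive one.
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3


# Only findings at these DLP likelihood levels count as a real hit in this sketch.
CONFIDENT = {"LIKELY", "VERY_LIKELY"}

# Hypothetical mapping from built in DLP info types to sensitivity classes.
INFO_TYPE_CLASS = {
    "EMAIL_ADDRESS": Sensitivity.CONFIDENTIAL,
    "PHONE_NUMBER": Sensitivity.CONFIDENTIAL,
    "CREDIT_CARD_NUMBER": Sensitivity.RESTRICTED,
    "US_SOCIAL_SECURITY_NUMBER": Sensitivity.RESTRICTED,
}


@dataclass(frozen=True)
class Finding:
    info_type: str
    likelihood: str


def classify(findings: list[Finding]) -> Sensitivity:
    """Return the highest class implied by confident findings."""
    classes = [
        # Unknown info types fall back to INTERNAL rather than PUBLIC.
        INFO_TYPE_CLASS.get(f.info_type, Sensitivity.INTERNAL)
        for f in findings
        if f.likelihood in CONFIDENT
    ]
    # An object with no confident findings is still treated as INTERNAL.
    return max(classes, default=Sensitivity.INTERNAL)


def object_metadata(findings: list[Finding]) -> dict[str, str]:
    """Build custom metadata for the landed object (key name is hypothetical)."""
    return {"sensitivity": classify(findings).name.lower()}
```

For example, an object with a `LIKELY` email address and only a `POSSIBLE` card number is tagged `confidential`, because the low likelihood card hit is ignored. Defaulting to `INTERNAL` keeps an unscanned or clean object from being treated as public by accident.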

Data at rest is encrypted with CMEK in every service in the pipeline. Every key in the shared key ring rotates quarterly, and the highest sensitivity classes get HSM backed keys.
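The key policy can be expressed as data. The sketch below builds a request body using the field names of the Cloud KMS `CryptoKey` REST resource (`purpose`, `versionTemplate.protectionLevel`, `rotationPeriod`, `nextRotationTime`, `labels`) for one sensitivity class. It assumes "quarterly" means a fixed 90 day period and that only a hypothetical `restricted` class gets HSM keys; it builds the dictionary only and makes no API call.

```python
from datetime import datetime, timedelta, timezone

# Assumption: quarterly rotation modelled as a fixed 90 day period.
ROTATION = timedelta(days=90)

# Hypothetical class names that require HSM protection.
HSM_CLASSES = {"restricted"}


def crypto_key_body(sensitivity: str, now: datetime) -> dict:
    """Return a CryptoKey body (REST JSON field names) for one sensitivity class.

    `now` should be timezone aware so the next rotation time is computed in UTC.
    """
    protection = "HSM" if sensitivity in HSM_CLASSES else "SOFTWARE"
    next_rotation = (now + ROTATION).astimezone(timezone.utc)
    return {
        "purpose": "ENCRYPT_DECRYPT",
        "versionTemplate": {
            "protectionLevel": protection,
            "algorithm": "GOOGLE_SYMMETRIC_ENCRYPTION",
        },
        # Durations are encoded as whole seconds with an "s" suffix.
        "rotationPeriod": f"{int(ROTATION.total_seconds())}s",
        # A rotation period needs a first rotation time, in RFC 3339 UTC.
        "nextRotationTime": next_rotation.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "labels": {"sensitivity": sensitivity},
    }
```

A body like this could be sent to `cryptoKeys.create` on the key ring, with the key ID as a query parameter. Keeping the class to protection level mapping in one place means tightening a class later is a one line change rather than a hunt through console settings.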

Access model

Access is granted to groups, never to individuals. Column level policies attach to policy tags in a taxonomy, so a single decision, such as marking a column as PII, propagates to every consumer of that column.
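In BigQuery a column is tied to a policy tag through the `policyTags.names` list in its schema field, and reading a tagged column requires the Fine Grained Reader role (`roles/datacatalog.categoryFineGrainedReader`) on that tag. The sketch below models that decision in plain Python to show why one tag change propagates. The tag resource name and the group names are hypothetical, and table level IAM, which still applies, is not modelled.

```python
# Hypothetical resource name of the "PII" policy tag in one taxonomy.
PII_TAG = "projects/example-project/locations/eu/taxonomies/111/policyTags/222"

# Hypothetical groups holding the Fine Grained Reader role on each tag.
TAG_READERS = {PII_TAG: {"group:pii-readers@example.com"}}


def tag_column(field: dict, tag: str) -> dict:
    """Return a copy of a BigQuery schema field with a policy tag attached."""
    tagged = dict(field)
    tagged["policyTags"] = {"names": [tag]}
    return tagged


def readable_columns(schema: list[dict], principals: set[str]) -> list[str]:
    """Model the column level check: untagged columns pass, tagged ones need a reader grant."""
    readable = []
    for field in schema:
        tags = field.get("policyTags", {}).get("names", [])
        # all() over an empty list is True, so untagged columns stay readable.
        if all(principals & TAG_READERS.get(t, set()) for t in tags):
            readable.append(field["name"])
    return readable
```

With a schema of an untagged `order_id` and a PII tagged `email`, a member of only `group:analysts@example.com` can read `order_id`, while adding `group:pii-readers@example.com` unlocks `email` as well. Granting or revoking on the tag changes the answer for every table that references it, which is the propagation the paragraph above relies on.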

  • Workload Identity Federation for GitHub Actions and external services
  • Access Approval required before support engineers read customer data
  • IAM conditions to scope role bindings to projects and time windows
  • Audit logs streamed to BigQuery and pinned to Security Command Center (SCC)
  • Break glass procedure logged and reviewed weekly
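The time window part of the IAM conditions bullet can be sketched as a binding builder that also enforces the groups only rule. This is a minimal sketch: the binding shape (`role`, `members`, `condition.title`, `condition.expression`) and the CEL expression `request.time < timestamp(...)` follow IAM Conditions, while the function name and the title format are hypothetical. Project scoping comes from where the binding is set, so it is not shown.

```python
from datetime import datetime, timezone


def time_boxed_binding(role: str, member: str, expires: datetime) -> dict:
    """Build an IAM binding that only holds until `expires` (timezone aware).

    Rejects any member that is not a group, matching the groups only rule.
    """
    if not member.startswith("group:"):
        raise ValueError(f"bindings go to groups, not {member!r}")
    stamp = expires.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return {
        "role": role,
        "members": [member],
        "condition": {
            "title": f"expires-{stamp[:10]}",
            # CEL expression evaluated by IAM at request time.
            "expression": f'request.time < timestamp("{stamp}")',
        },
    }
```

A policy containing conditional bindings has to be written with policy `version` 3, otherwise IAM rejects it. Expiring grants by condition means a forgotten grant stops working on its own, instead of waiting for a cleanup pass.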

Cost and operations

BigQuery slots are reserved for predictable workloads. Ad hoc and exploratory queries land on the on demand pool with a per user cap. Storage cost is controlled by table partitioning and lifecycle rules on raw zones.
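The per user cap can be sketched as simple accounting over query size estimates. In practice the estimate would come from a dry run (in the BigQuery Python client, a job run with `QueryJobConfig(dry_run=True)` exposes `total_bytes_processed`), and BigQuery custom quotas can enforce a per user daily limit without any code. This sketch only illustrates the arithmetic, assuming the cap is a daily byte budget; `UserBudget` is a hypothetical helper.

```python
from dataclasses import dataclass, field

TIB = 1024**4


@dataclass
class UserBudget:
    """Track on demand bytes per user for one day (hypothetical helper)."""

    daily_cap_bytes: int
    used: dict[str, int] = field(default_factory=dict)

    def admit(self, user: str, estimated_bytes: int) -> bool:
        """Admit a query only if its estimate fits in the user's remaining budget."""
        spent = self.used.get(user, 0)
        if spent + estimated_bytes > self.daily_cap_bytes:
            return False
        # Record the estimate so later queries see the reduced budget.
        self.used[user] = spent + estimated_bytes
        return True
```

With a 1 TiB cap, a 600 GiB query is admitted and a following 500 GiB query from the same user is refused, while another user is unaffected. Partitioning shows up directly here: a query filtered on the partition column scans fewer partitions, so its estimate and its draw on the budget shrink.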

Takeaway

Governance is cheap when you set it up before the data lands. It is a programme of work once the data is already everywhere.
