Context
Databricks operates regulated customer data pipelines across AWS, Azure, and GCP. Today, ingestion and transformation jobs run under broad cloud IAM roles and shared service principals, creating audit risk, over-permissioned access to storage, and weak isolation between dev, staging, and prod.
You are asked to design a least-privilege access model for Databricks pipelines that run batch and streaming workloads using Delta Live Tables / Lakeflow Declarative Pipelines, Databricks Workflows, Unity Catalog, and cloud-native identities. The design must support secure cross-cloud ingestion, environment isolation, and auditable access to bronze, silver, and gold data products.
Scale Requirements
- Pipelines: 1,200 scheduled batch pipelines and 180 continuous streaming pipelines
- Data volume: 9 PB total in object storage, 45 TB/day new data
- Tenancy: 60 internal platform teams, 300+ service identities, 3 environments per cloud
- Latency: streaming SLA < 3 minutes end-to-end; batch completion by 6 AM local time in each region
- Auditability: all access decisions traceable within 15 minutes
Requirements
- Design identity and access boundaries for Databricks Workspaces, Unity Catalog metastores, catalogs, schemas, tables, volumes, and external locations across AWS, Azure, and GCP.
- Enforce least privilege for pipeline execution identities so each pipeline can read only its declared sources and write only its approved targets (see the grant sketch after this list).
- Support secretless authentication where possible, using AWS IAM roles, Azure managed identities, and GCP service accounts surfaced through Unity Catalog storage credentials or equivalent workload identity patterns (external-location sketch below).
- Define how Databricks Workflows and Lakeflow Declarative Pipelines obtain short-lived credentials for cloud storage and downstream systems (run-as sketch below).
- Include controls for schema evolution, data quality failures, and gated promotion from bronze to silver/gold (expectations sketch below).
- Provide monitoring, alerting, and automated remediation for privilege drift, failed policy enforcement, and unauthorized access attempts (audit-log sketch below).
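
For the least-privilege requirement, the core mechanism is that Unity Catalog is default-deny: a pipeline's service principal can touch only what is explicitly granted. A minimal sketch with hypothetical catalog, schema, and principal names (`sp-pipeline-orders` stands in for the service principal's application ID); `spark` is the ambient SparkSession in a Databricks governance job:

```python
# Hypothetical per-pipeline grants: the pipeline's service principal may
# read only its declared bronze sources and write only its silver target.
read_grants = [
    "GRANT USE CATALOG ON CATALOG prod_bronze TO `sp-pipeline-orders`",
    "GRANT USE SCHEMA ON SCHEMA prod_bronze.sales TO `sp-pipeline-orders`",
    "GRANT SELECT ON TABLE prod_bronze.sales.orders_raw TO `sp-pipeline-orders`",
]
write_grants = [
    "GRANT USE CATALOG ON CATALOG prod_silver TO `sp-pipeline-orders`",
    "GRANT USE SCHEMA, CREATE TABLE, MODIFY ON SCHEMA prod_silver.sales TO `sp-pipeline-orders`",
]
for stmt in read_grants + write_grants:
    spark.sql(stmt)
```

Because privileges are additive, the boundary is enforced by what is not granted; per-environment catalogs (e.g. `dev_bronze` vs `prod_bronze`) then give dev/staging/prod isolation without extra machinery.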
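For secretless authentication, the Databricks-native pattern is a Unity Catalog storage credential backed by a cloud identity (AWS IAM role, Azure managed identity, or GCP service account), scoped to specific paths through external locations. A sketch with hypothetical names, assuming a platform admin has already registered the credential `bronze_ingest_cred`:

```python
# Bind a landing path to an IAM-role-backed credential; no static keys.
spark.sql("""
    CREATE EXTERNAL LOCATION IF NOT EXISTS landing_orders
    URL 's3://acme-landing/orders/'
    WITH (STORAGE CREDENTIAL bronze_ingest_cred)
""")

# The pipeline identity gets read-only file access to this location only.
spark.sql("GRANT READ FILES ON EXTERNAL LOCATION landing_orders TO `sp-pipeline-orders`")
```

The same two-step pattern (credential, then path-scoped location) applies on Azure (abfss:// URLs) and GCP (gs:// URLs).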
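For short-lived credentials at run time, Workflows can execute under a dedicated service principal via the Jobs API run_as setting; Unity Catalog then vends temporary, down-scoped storage tokens to the job, so neither notebooks nor pipeline configs ever hold keys. A sketch of the relevant fragment of a Jobs API 2.1 settings payload (IDs are hypothetical placeholders):

```python
# Fragment of a Jobs API 2.1 job-settings payload. run_as takes the
# service principal's application ID; the pipeline task references a
# Lakeflow Declarative Pipeline by ID.
job_settings = {
    "name": "orders-silver-nightly",
    "run_as": {"service_principal_name": "<sp-application-id>"},
    "tasks": [
        {
            "task_key": "transform",
            "pipeline_task": {"pipeline_id": "<pipeline-id>"},
        }
    ],
}
```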
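For data quality and promotion control, expectations in the pipeline definition act as the gate: rows failing a drop expectation never reach silver, and a fail expectation halts the update instead of promoting bad data. A minimal sketch with hypothetical table and column names:

```python
import dlt

@dlt.table(name="orders_silver", comment="Quality-gated promotion from bronze")
@dlt.expect_or_drop("valid_order_id", "order_id IS NOT NULL")
@dlt.expect_or_fail("amount_parses", "try_cast(amount AS DECIMAL(18,2)) IS NOT NULL")
def orders_silver():
    # The explicit select list pins the silver schema contract, so
    # upstream schema evolution cannot silently propagate new columns.
    return dlt.read_stream("orders_bronze").select(
        "order_id", "customer_id", "amount", "ingest_ts"
    )
```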
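For the 15-minute auditability target, Unity Catalog audit events land in the `system.access.audit` system table and can be polled by a scheduled job. A sketch that flags denied requests; a companion query over permission-change events would cover privilege drift (the alerting hook is an assumption, not a Databricks API):

```python
def notify_security(event):
    # Hypothetical hook: replace with a webhook into the SIEM / on-call.
    print(f"ALERT: {event.action_name} denied for {event.principal}")

# Denied access attempts in the last 15 minutes.
denied = spark.sql("""
    SELECT event_time,
           user_identity.email AS principal,
           action_name,
           request_params
    FROM system.access.audit
    WHERE event_time > current_timestamp() - INTERVAL 15 MINUTES
      AND response.status_code = 403
""")
for row in denied.collect():
    notify_security(row)
```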
Constraints
- Must use Databricks-native governance first: Unity Catalog, service principals, cluster (compute) policies, and audit logs (cluster policy sketch after this list).
- No long-lived static cloud keys stored in notebooks or pipeline configs.
- Must satisfy SOC 2 and GDPR and support customer-managed VPC/VNet deployment patterns.
- Incremental platform budget increase is capped at $40K/month.
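
On the compute side, a cluster policy can make the governed posture the only one available: pin a Unity Catalog security mode, forbid instance profiles that would bypass catalog-level grants, and bound cluster size against the $40K/month cap. A sketch of a policy definition body (attribute choices are illustrative, not prescriptive):

```python
import json

# Illustrative cluster policy for pipeline compute: forces UC-governed
# single-user mode, blocks direct instance-profile access, and caps
# autoscaling and idle time to help stay within the budget ceiling.
pipeline_policy = {
    "data_security_mode": {"type": "fixed", "value": "SINGLE_USER"},
    "aws_attributes.instance_profile_arn": {"type": "forbidden"},
    "autoscale.max_workers": {"type": "range", "maxValue": 20},
    "autotermination_minutes": {"type": "range", "minValue": 10, "maxValue": 60},
}
print(json.dumps(pipeline_policy, indent=2))  # definition body for a policy create call
```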