Project Context
Databricks' internal platform engineering team is replacing a manual quarterly access review process that covers Databricks workspaces, Unity Catalog data access, and production jobs. Today, 6 operations analysts each spend several days per quarter exporting permissions, emailing approvers, tracking responses in spreadsheets, and filing audit evidence. You are the DevOps Engineer asked to lead delivery of an automated workflow that replaces this process before the next SOX audit window opens in 10 weeks.
Key Stakeholders
Security wants stronger controls and immutable audit logs. Finance and Internal Audit need the process live before quarter close. Data platform engineering wants minimal disruption to production Databricks jobs and does not want to pause other roadmap work. Workspace owners want a simple approval experience and are reluctant to adopt another internal tool.
Constraints
- Deadline: 10 weeks until the next audit checkpoint
- Team: 4 engineers, 1 security analyst, 1 internal audit partner; no additional headcount
- Budget: $85,000 for contractor support and tooling only
- Scope: 120 Databricks workspaces, 1,800 users/groups, and 350 production jobs
- Requirement: reduce manual effort by at least 70% while preserving approval evidence for 7 years
- Dependency: identity data comes from Okta, and approver metadata is incomplete for ~15% of groups
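The 70% requirement above is simple arithmetic once a baseline is measured. A minimal sketch of that check follows, assuming effort is tracked in analyst-hours per review cycle. The function names and the sample figures in the usage lines are hypothetical, since the brief gives no hour counts beyond "several days" per analyst.

```python
REQUIRED_REDUCTION = 0.70  # from the stated requirement: at least 70% less manual effort


def effort_reduction(baseline_hours: float, automated_hours: float) -> float:
    """Return the fraction of manual effort removed relative to the baseline."""
    if baseline_hours <= 0:
        raise ValueError("baseline_hours must be positive")
    if automated_hours < 0:
        raise ValueError("automated_hours cannot be negative")
    return (baseline_hours - automated_hours) / baseline_hours


def meets_target(baseline_hours: float, automated_hours: float) -> bool:
    """True when the measured reduction reaches the required threshold."""
    return effort_reduction(baseline_hours, automated_hours) >= REQUIRED_REDUCTION


if __name__ == "__main__":
    # Hypothetical figures: 6 analysts x 3 days x 8 hours as the manual baseline,
    # and 40 residual manual hours after automation. Neither number is from the brief.
    baseline = 6 * 3 * 8
    residual = 40
    print(f"reduction: {effort_reduction(baseline, residual):.1%}")
    print(f"meets target: {meets_target(baseline, residual)}")
```

Measuring the baseline during the current manual cycle, before any automation ships, is what makes the later comparison defensible to Internal Audit.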
Complications
- A recent incident exposed over-provisioned access in two production workspaces, increasing executive scrutiny.
- The Security team wants full automation in one release, while Internal Audit is willing to accept a phased rollout if evidence quality is strong.
- One senior engineer is available at only 50% capacity because they also own Databricks job reliability for a revenue-critical pipeline.
Deliverables
- Create a 10-week execution plan with milestones, owners, and dependency management.
- Define the MVP scope versus deferred scope, including clear trade-offs.
- Propose a rollout and rollback plan that protects production Databricks workflows.
- Define success metrics for operational efficiency, audit readiness, and adoption.
- Identify the top risks and mitigation actions, including how you would handle missing approver metadata.
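For the missing approver metadata named in the last deliverable, one possible mitigation is to triage groups before any review requests are sent. The sketch below is illustrative only: the `GroupRecord` fields are hypothetical names for data exported from Okta, and falling back to a workspace owner is an assumed policy that Security and Internal Audit would need to approve. It does not call any Okta or Databricks API.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class GroupRecord:
    name: str
    approver_email: Optional[str]         # hypothetical field from the identity export
    workspace_owner_email: Optional[str]  # hypothetical fallback reviewer


def triage_approvers(
    groups: Iterable[GroupRecord],
) -> Tuple[List[Tuple[str, str]], List[Tuple[str, str]], List[str]]:
    """Split groups into routed, fallback-routed, and unresolved buckets."""
    routed: List[Tuple[str, str]] = []    # approver known: send the review as normal
    fallback: List[Tuple[str, str]] = []  # approver missing: assumed fallback to owner
    unresolved: List[str] = []            # nobody to ask: needs manual follow-up
    for group in groups:
        if group.approver_email:
            routed.append((group.name, group.approver_email))
        elif group.workspace_owner_email:
            fallback.append((group.name, group.workspace_owner_email))
        else:
            unresolved.append(group.name)
    return routed, fallback, unresolved


if __name__ == "__main__":
    # Made-up sample records for illustration only.
    sample = [
        GroupRecord("finance-analysts", "lead@example.com", "owner@example.com"),
        GroupRecord("etl-operators", None, "owner@example.com"),
        GroupRecord("legacy-readers", None, None),
    ]
    routed, fallback, unresolved = triage_approvers(sample)
    print(len(routed), len(fallback), len(unresolved))
```

Keeping the unresolved bucket explicit gives the plan a measurable backlog to burn down and avoids silently skipping groups, which would weaken the audit evidence.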