Databricks is standardizing how internal platform engineering supports four user groups on a shared Lakehouse platform: data engineering, data science, ML, and application integration teams. Today, these teams use overlapping but inconsistent workflows across Databricks Workflows, Delta Live Tables, Unity Catalog, MLflow, Model Serving, and partner integrations, creating delivery delays and unclear ownership.
You are the DevOps Engineer responsible for driving a 12-week execution plan with a cross-functional team of 11 people: 4 platform engineers, 2 data engineers, 1 MLOps engineer, 1 security engineer, 1 SRE, 1 product manager, and you. The goal is not to redesign the platform from scratch, but to launch a practical operating model, baseline automation, and onboarding path that improves reliability and speed for all four groups before the next quarter starts.
The Engineering Director wants a fast rollout with minimal disruption. The Head of Data Platform wants stronger governance through Unity Catalog and fewer one-off exceptions. Data science leads want self-service access to MLflow and Model Serving without waiting on platform tickets. Application integration teams want stable APIs and SLAs for downstream consumption. Security is concerned that cluster policies and secret-management practices are inconsistent from team to team.
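As an illustration of the kind of standardization Security is asking for, Databricks cluster policies let the platform team pin or bound cluster attributes centrally instead of relying on per-team conventions. The sketch below is a minimal policy definition; the specific runtime versions, ranges, and tag values are placeholders for this scenario, not recommendations.

```json
{
  "spark_version": {
    "type": "allowlist",
    "values": ["13.3.x-scala2.12", "14.3.x-scala2.12"],
    "defaultValue": "14.3.x-scala2.12"
  },
  "autotermination_minutes": {
    "type": "range",
    "minValue": 10,
    "maxValue": 120,
    "defaultValue": 60
  },
  "data_security_mode": {
    "type": "fixed",
    "value": "USER_ISOLATION"
  },
  "custom_tags.team": {
    "type": "fixed",
    "value": "data-platform"
  }
}
```

Fixing `data_security_mode` also reinforces the Head of Data Platform's Unity Catalog governance goal, since it forces clusters into a UC-compatible access mode rather than leaving that choice to each team.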