Project Background
Databricks wants to launch an internal-facing support assistant, also offered as a customer pilot, built on the Databricks Agent Framework. It should help solutions architects and support engineers answer product questions using Databricks documentation, runbooks, and incident summaries. The prototype already uses Databricks Vector Search over Delta Lake-backed knowledge sources, DBRX through Databricks Foundation Model APIs, and MLflow Agent Evaluation for offline scoring, but it is not production-ready.
You are the AI engineer driving execution across a 9-person cross-functional team: 4 AI/data engineers, 1 platform engineer, 1 PM, 1 support operations lead, 1 security engineer, and 1 technical writer. Leadership wants a pilot live in 8 weeks because the field team is entering a major renewal cycle and wants faster answers on Apache Spark, Unity Catalog, Model Serving, and Mosaic AI questions.
Key Stakeholders
The VP of Support wants a broad pilot quickly to reduce ticket handling time. The Security lead insists that all retrieval sources be governed through Databricks Unity Catalog and that no sensitive incident data leak into prompts. The Engineering Manager wants to avoid overbuilding before usage is proven. The GTM enablement lead wants coverage across the full Databricks stack, even if answer quality is uneven at launch.
Constraints
- Timeline: 8 weeks to pilot launch, 12 weeks to executive review
- Budget: $120,000 for external annotation, evaluation, and limited contractor support
- Data: 2.4M documentation chunks, 180K support case summaries, 14K incident postmortems in Delta Lake
- Team capacity: no additional full-time headcount; 2 engineers are only 50% allocated
- Non-negotiables: Unity Catalog governance, human escalation path, rollback within 30 minutes
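The capacity constraint can be turned into a rough planning number. The sketch below is only an illustration: it assumes the seven team members not mentioned as part-time are fully allocated, which the brief does not state, and the function names are hypothetical.

```python
# Rough capacity arithmetic from the Constraints list.
# ASSUMPTION: everyone except the 2 half-allocated engineers is 100% allocated.

TEAM_SIZE = 9                  # cross-functional team size from the brief
HALF_ALLOCATED_ENGINEERS = 2   # engineers at 50% allocation
ALLOCATION = 0.5
WEEKS_TO_PILOT = 8


def effective_fte(team_size: int, part_time_count: int, allocation: float) -> float:
    """Full-time-equivalent headcount after discounting part-time members."""
    return team_size - part_time_count * (1 - allocation)


def person_weeks(fte: float, weeks: int) -> float:
    """Total person-weeks available over the planning window."""
    return fte * weeks


fte = effective_fte(TEAM_SIZE, HALF_ALLOCATED_ENGINEERS, ALLOCATION)
print(fte, person_weeks(fte, WEEKS_TO_PILOT))
```

Under that assumption the plan has 8 effective FTEs, or 64 person-weeks before pilot launch, across all roles rather than engineering alone.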
Complications
- The support case summaries contain inconsistent redaction quality, and Security has flagged 6% as potentially sensitive.
- The VP of Support is asking to include Spark Streaming and Delta Lake troubleshooting on day 1, but offline faithfulness and groundedness scores are strong only for Unity Catalog and Model Serving content.
- The Vector Search indexing pipeline currently refreshes once every 24 hours, while product docs change multiple times per day.
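To size the redaction complication, 6% of 180K support case summaries is about 10,800 records. One conservative way to handle them is to keep any flagged or unverified summary out of the retrieval index until it is reviewed. The sketch below assumes that policy; the `CaseSummary` fields and the `eligible_for_index` helper are hypothetical names, not part of any Databricks API.

```python
from dataclasses import dataclass

SUPPORT_CASE_SUMMARIES = 180_000  # from the Constraints list
FLAGGED_PERCENT = 6               # share Security flagged as potentially sensitive


def flagged_count(total: int, percent: int) -> int:
    """Number of records covered by a whole-number percentage."""
    return total * percent // 100


@dataclass(frozen=True)
class CaseSummary:
    case_id: str
    security_flagged: bool     # hypothetical field: flagged by Security
    redaction_verified: bool   # hypothetical field: redaction passed review


def eligible_for_index(summary: CaseSummary) -> bool:
    """Admit a summary to retrieval only if it is verified and not flagged."""
    return summary.redaction_verified and not summary.security_flagged


print(flagged_count(SUPPORT_CASE_SUMMARIES, FLAGGED_PERCENT))
```

A deny-by-default filter like this trades retrieval coverage for safety, which fits the Security lead's requirement that no sensitive incident data reach prompts.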
Your Task
- Build an 8-week execution plan with milestones, owners, and launch gates.
- Define the MVP scope and what you would defer despite stakeholder pressure.
- Propose how you will use MLflow Agent Evaluation, LLM-as-Judge, faithfulness, and groundedness to make launch decisions under ambiguity.
- Create a risk mitigation and rollback plan for pilot launch.
- Specify success metrics for the first 30 days and how you would iterate rapidly without losing control of scope.
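One way to make the launch-gate idea concrete is a per-topic check that admits a topic to the MVP only when both offline scores clear a threshold on enough evaluation examples. The sketch below is plain Python, not the MLflow Agent Evaluation API; the 0.85 threshold, the 100-example minimum, and all names are assumptions to be replaced with values the team agrees on.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TopicScores:
    topic: str
    faithfulness: float   # mean offline judge score in [0, 1]
    groundedness: float   # mean offline judge score in [0, 1]
    n_examples: int       # size of the evaluation set for this topic


def launch_decision(
    scores: TopicScores,
    min_score: float = 0.85,    # ASSUMED threshold, not from the brief
    min_examples: int = 100,    # ASSUMED minimum evaluation-set size
) -> str:
    """Return 'launch' or a deferral reason for one topic."""
    if scores.n_examples < min_examples:
        return "defer: insufficient evaluation data"
    if min(scores.faithfulness, scores.groundedness) < min_score:
        return "defer: below quality gate"
    return "launch"


def mvp_scope(all_scores: list[TopicScores]) -> list[str]:
    """Topics that pass the gate, in the order given."""
    return [s.topic for s in all_scores if launch_decision(s) == "launch"]
```

Gating on the weaker of the two scores avoids launching a topic that looks faithful on average but is poorly grounded, and the example-count check keeps a small, lucky evaluation set from passing the gate.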