Context
Athina Ai powers a Customer Success copilot used by CSMs during renewals, escalations, and onboarding. The workflow summarizes account history, retrieves relevant product and support context, and drafts next-best actions and customer replies.
Constraints
- p95 end-to-end latency: 2,500ms per copilot turn
- Cost ceiling: $0.035/request and $18K/month at target volume
- Hallucination ceiling: <2% on high-stakes recommendations and customer-facing drafts
- Prompt-injection success rate from retrieved notes/docs: <0.5%
- Must not expose PII or data from accounts the CSM is not authorized to view
- The team needs a decision in 3 weeks on whether to expand rollout from 50 to 500 CSMs
Available Resources
- Athina Ai traces, prompts, evals, annotations, and experiment dashboards
- Historical CRM notes, support tickets, call transcripts, knowledge-base articles, and renewal outcomes
- A current workflow using a hosted LLM plus retrieval over internal customer context
- 20 CSMs and 5 managers available for rubric design and spot-labeling
- Access to a smaller cheaper model and a stronger slower model for comparison
Deliverables
- Define what “customer success” means for this workflow at three levels: model quality, user behavior, and business outcomes. Specify primary metrics and guardrails.
- Design an offline evaluation plan first, including a golden set, LLM-as-judge or human review rubric, hallucination measurement, prompt-injection testing, and segmentation by use case (renewal, onboarding, escalation).
- Propose the online evaluation / rollout plan in Athina Ai: experiment design, success criteria, guardrails, and how you would attribute improvements to the workflow rather than seasonality or rep skill.
- Recommend any architecture or prompt changes only after the eval plan, including how retrieval, citations, or structured outputs should change to improve trust.
- Estimate cost/latency tradeoffs for your proposed setup and explain what you would ship, monitor, and revisit after launch.