You’re the analytics lead for NovaDesk, a B2B SaaS customer support platform serving 6,000 companies and ~1.8M end-customer conversations/day. NovaDesk recently launched an AI support agent that can autonomously resolve tickets (refunds, password resets, delivery status) and escalate to humans when needed. The AI agent is priced as an add-on: $0.65 per successful resolution or $49/seat/month for “agent-assisted” workflows.
After a strong launch, leadership is worried about retention: many customers try the agent for a week and then revert to human-only workflows. The Head of Product asks you to define “AI agent retention” in a way that is (a) meaningful for revenue, (b) resistant to gaming (e.g., forcing the agent to touch every ticket), and (c) actionable for product and ML teams.
In the last 8 weeks:
Stakeholders disagree on what “retained” means:
You have one week to propose a retention measurement framework to be used in the next QBR, and it must work for both self-serve SMB and enterprise customers with custom workflows.
| Source | What it captures | Grain |
|---|---|---|
agent_conversations | Each conversation handled by AI/human, timestamps, outcome | conversation_id |
agent_actions | Tool calls (refund API, order lookup), policy version, latency | action_id |
org_billing | Plan, add-on status, invoices, credits, churn dates | org_id, invoice |
routing_rules_audit | Routing configuration changes over time | org_id, change_event |
human_handoff | Escalations, reasons, time-to-human | handoff_event |
csat_surveys | End-user CSAT and comments | conversation_id |
Define “AI agent retention” with at least two complementary retention metrics:
Specify:
Provide a calculation approach including cohorting rules and edge cases:
Decompose retention into actionable drivers (product, routing, model quality, operations), and list the top hypotheses for the observed W4 drop.
Propose guardrails to ensure retention improvements don’t come from harmful behavior (e.g., higher deflection but worse CSAT, or cost blowups from tool calls).
Recommend 3–5 actions you’d take if your decomposition shows retention is primarily driven by (a) poor outcomes, (b) latency/reliability, or (c) misconfigured routing.