Context
Intuit wants a GenAI-powered analyst assistant that scans user feedback and product signals across TurboTax, Credit Karma, and QuickBooks to uncover new growth opportunities, such as unmet jobs-to-be-done, onboarding friction, or cross-sell moments. The output will be used by Product Growth Analysts, so recommendations must be evidence-backed and auditable.
Constraints
- Daily batch generation for leadership review must process 2M text records/day and finish within 45 minutes
- Interactive analyst drill-down must return in under 4 seconds at p95
- Cost ceiling: $25K/month at projected usage of 8K analyst queries/month plus daily batch jobs
- Hallucination ceiling: fewer than 4% of claims unsupported, measured on a labeled golden set
- Every recommendation must cite supporting evidence from approved sources only
- System must resist prompt injection from user-generated text and must not expose PII from customer feedback
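The throughput and cost constraints above imply per-unit budgets that are worth computing before choosing an architecture. The following is a minimal sketch: the 30-day month and the `batch_share` split between batch and interactive spend are illustrative assumptions, not part of the brief, and the function names are hypothetical.

```python
# Figures come from the stated constraints; the 30-day month and the
# batch/interactive budget split are illustrative assumptions.
RECORDS_PER_DAY = 2_000_000
BATCH_WINDOW_MINUTES = 45
MONTHLY_BUDGET_USD = 25_000
ANALYST_QUERIES_PER_MONTH = 8_000
DAYS_PER_MONTH = 30  # assumption


def required_throughput(records: int, window_minutes: float) -> float:
    """Records per second needed to finish inside the batch window."""
    return records / (window_minutes * 60)


def unit_cost_ceilings(batch_share: float) -> dict:
    """Split the monthly budget and return per-record and per-query ceilings.

    batch_share is a hypothetical knob: the fraction of the budget
    reserved for the daily batch job.
    """
    if not 0.0 <= batch_share <= 1.0:
        raise ValueError("batch_share must be between 0 and 1")
    batch_budget = MONTHLY_BUDGET_USD * batch_share
    interactive_budget = MONTHLY_BUDGET_USD - batch_budget
    return {
        "usd_per_record": batch_budget / (RECORDS_PER_DAY * DAYS_PER_MONTH),
        "usd_per_query": interactive_budget / ANALYST_QUERIES_PER_MONTH,
    }


if __name__ == "__main__":
    rate = required_throughput(RECORDS_PER_DAY, BATCH_WINDOW_MINUTES)
    print(f"required throughput: {rate:.0f} records/s")  # roughly 741
    print(unit_cost_ceilings(batch_share=0.8))
```

Even with the entire budget assigned to batch, the ceiling is about $0.0004 per record, which suggests that an LLM call on every raw record is unlikely to fit and that cheap filtering, clustering, or sampling ahead of the LLM deserves evaluation.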
Available Data / Models
- 12 months of anonymized support chats, app reviews, NPS verbatims, community posts, and call transcripts from TurboTax, Credit Karma, and QuickBooks
- Structured product telemetry: funnel steps, drop-off events, plan type, tenure, acquisition channel, and feature adoption
- Existing warehouse tables with user segment metadata and experiment history
- Approved LLMs from OpenAI or Anthropic, plus an internal vector store and BM25 search
- A small team of Growth Analysts available to label a 300-example golden set
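A 300-example golden set is small relative to a 4% ceiling, so the measured unsupported-claim rate carries real sampling uncertainty. The sketch below computes a Wilson score interval for that rate. It assumes, for simplicity, one scored claim per golden-set example and a 95% confidence level (`z = 1.96`); both are assumptions, and `wilson_interval` is a hypothetical helper name.

```python
import math


def wilson_interval(unsupported: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for the unsupported-claim rate.

    unsupported: number of claims labeled unsupported by analysts
    total: number of claims scored (assumed one per golden-set example)
    z: normal quantile; 1.96 corresponds to roughly 95% confidence
    """
    if total <= 0 or not 0 <= unsupported <= total:
        raise ValueError("need 0 <= unsupported <= total and total > 0")
    p = unsupported / total
    denom = 1 + z**2 / total
    center = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return max(0.0, center - half), min(1.0, center + half)


if __name__ == "__main__":
    low, high = wilson_interval(unsupported=12, total=300)
    print(f"observed 4.0%, interval {low:.3f} to {high:.3f}")
```

With 12 unsupported claims out of 300 (exactly 4% observed), the upper bound lands above 4%, so a release gate may want to compare the interval's upper bound, not the point estimate, against the ceiling.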
Deliverables
- Design an eval-first LLM system that identifies and ranks new growth opportunities, including how retrieval, prompting, and structured output work.
- Write the core system prompt that forces grounded recommendations with citations, confidence, and refusal behavior when evidence is weak.
- Define offline and online evaluation, including how you measure opportunity quality, hallucination, prompt-injection robustness, and analyst usefulness.
- Estimate cost and latency for both daily batch processing and interactive analyst queries, and explain key tradeoffs.
- List major failure modes and mitigations, especially around unsupported insights, stale evidence, segment bias, and PII leakage.
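One way to make the citation, confidence, and refusal requirements in the deliverables checkable is to validate each model output against a fixed schema before it reaches an analyst. The sketch below is illustrative only: the field names, the source-type labels, and the `min_citations` and `min_confidence` thresholds are hypothetical choices, with the source types loosely mirroring the data listed under Available Data / Models.

```python
from dataclasses import dataclass, field

# Hypothetical labels for the approved evidence sources.
APPROVED_SOURCE_TYPES = {
    "support_chat", "app_review", "nps_verbatim",
    "community_post", "call_transcript", "telemetry",
}


@dataclass
class Citation:
    source_type: str
    record_id: str
    quote: str


@dataclass
class Recommendation:
    title: str
    product: str
    confidence: float  # model-reported, expected in [0, 1]
    citations: list[Citation] = field(default_factory=list)


def validate(rec: Recommendation, min_citations: int = 2,
             min_confidence: float = 0.5) -> list[str]:
    """Return a list of problems; an empty list means the record passes."""
    problems = []
    if not 0.0 <= rec.confidence <= 1.0:
        problems.append("confidence out of range")
    elif rec.confidence < min_confidence:
        problems.append("confidence below threshold")
    if len(rec.citations) < min_citations:
        problems.append("too few citations")
    for c in rec.citations:
        if c.source_type not in APPROVED_SOURCE_TYPES:
            problems.append(f"unapproved source: {c.source_type}")
        if not c.quote.strip():
            problems.append(f"empty quote for {c.record_id}")
    return problems


def accept_or_refuse(rec: Recommendation) -> dict:
    """Turn a failed validation into an explicit, auditable refusal."""
    problems = validate(rec)
    if problems:
        return {"status": "refused", "reasons": problems}
    return {"status": "accepted", "title": rec.title}
```

Keeping the check outside the prompt means a refusal is enforced deterministically even when the model ignores its instructions. A fuller version would also confirm that each `quote` actually appears in the cited record.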