Context
Northstar, a B2B SaaS analytics company, wants an internal AI feature that identifies new growth opportunities from unstructured customer signals. Product and growth teams want weekly, evidence-grounded recommendations covering underserved segments, unmet use cases, expansion triggers, and feature gaps.
Constraints
- Weekly batch job processes 2M text records; the ad hoc analyst query path must return in under 4 seconds at p95
- Monthly budget ceiling: $18,000
- Hallucination ceiling: fewer than 5% of surfaced opportunities may contain unsupported claims on a labeled evaluation set
- Every recommendation must include supporting evidence and confidence
- System must resist prompt injection from user-generated text (e.g. reviews saying "ignore prior instructions")
- PII from CRM notes must not be exposed in outputs
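The numeric limits above can be expressed as simple acceptance checks. What follows is a minimal Python sketch, not part of the brief: the function names, the nearest-rank percentile method, and the token-volume and price arguments are assumptions for illustration. Only the four constants come from the constraints.

```python
# Constants taken from the Constraints list.
WEEKLY_RECORDS = 2_000_000
MONTHLY_BUDGET_USD = 18_000
P95_LATENCY_LIMIT_S = 4.0
HALLUCINATION_CEILING_PCT = 5


def p95_latency(latencies_s):
    """Nearest-rank 95th percentile of a non-empty list of latencies (seconds)."""
    if not latencies_s:
        raise ValueError("need at least one latency sample")
    ordered = sorted(latencies_s)
    rank = (95 * len(ordered) + 99) // 100  # integer ceil(0.95 * n), 1-based
    return ordered[rank - 1]


def meets_latency_limit(latencies_s):
    """True when the p95 latency is strictly under the 4-second limit."""
    return p95_latency(latencies_s) < P95_LATENCY_LIMIT_S


def meets_hallucination_ceiling(unsupported, total):
    """True when fewer than 5% of surfaced opportunities are unsupported."""
    if total <= 0:
        raise ValueError("evaluation set must not be empty")
    # Integer comparison avoids floating-point edge cases at exactly 5%.
    return unsupported * 100 < HALLUCINATION_CEILING_PCT * total


def monthly_batch_cost_usd(tokens_per_record, usd_per_million_tokens,
                           runs_per_month=52 / 12):
    """Rough monthly spend for the weekly batch.

    tokens_per_record and usd_per_million_tokens are placeholders to be
    filled in from measurement and the current price list, not real figures.
    """
    monthly_tokens = WEEKLY_RECORDS * tokens_per_record * runs_per_month
    return monthly_tokens / 1_000_000 * usd_per_million_tokens


def within_budget(cost_usd):
    """True when the estimated monthly cost does not exceed the ceiling."""
    return cost_usd <= MONTHLY_BUDGET_USD
```

With a 300-item labeled set, `meets_hallucination_ceiling(14, 300)` passes while `meets_hallucination_ceiling(15, 300)` fails, because 15 of 300 is exactly 5% and the constraint requires fewer than 5%.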
Available Resources
- 12 months of Zendesk tickets, app store reviews, Gong call summaries, CRM notes, and win/loss interview transcripts
- Structured metadata: account size, industry, plan tier, region, ARR, churn risk, feature usage
- Approved models: GPT-4.1-mini and GPT-4.1 for generation; text-embedding-3-large for embeddings
- Existing warehouse, vector store, and BI dashboard
- 20 PMs and sales leads available to label a 300-item golden set
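One way to picture how the text sources and the structured metadata listed above might be joined is a single normalized record type. The sketch below is an assumption for illustration only: the class names, field names, field types, and the idea that some records have no linked account are not specified in the brief.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Source(Enum):
    """The five text sources named under Available Resources."""
    ZENDESK_TICKET = "zendesk_ticket"
    APP_STORE_REVIEW = "app_store_review"
    GONG_CALL_SUMMARY = "gong_call_summary"
    CRM_NOTE = "crm_note"
    WIN_LOSS_TRANSCRIPT = "win_loss_transcript"


@dataclass
class AccountMetadata:
    """Structured metadata fields from the brief; the types are assumed."""
    account_size: str
    industry: str
    plan_tier: str
    region: str
    arr_usd: float
    churn_risk: float
    feature_usage: dict = field(default_factory=dict)


@dataclass
class Signal:
    """One unstructured text record plus optional account context."""
    record_id: str
    source: Source
    text: str
    # Assumption: some records (for example anonymous reviews) may not
    # map to a known account, so the metadata is optional.
    account: Optional[AccountMetadata] = None


def needs_pii_redaction(signal):
    """Flag CRM notes, the source the brief names as carrying PII.

    A real pipeline may well need to screen the other sources too.
    """
    return signal.source is Source.CRM_NOTE
```

Keeping `record_id` on every signal is what later allows each recommendation to cite the specific records it relies on.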
Task
- Design an LLM-powered pipeline that clusters signals, retrieves evidence, and generates a ranked list of growth opportunities with citations.
- Write the core system prompt for grounded opportunity generation, including refusal behavior when evidence is weak or conflicting.
- Define an evaluation plan first: offline quality metrics, hallucination checks, prompt-injection tests, and online success metrics after launch.
- Estimate cost and latency for both the weekly batch workflow and the ad hoc analyst workflow, and explain how you would stay within budget.
- Identify key failure modes, safety risks, and tradeoffs, including when you would use a larger model versus a cheaper model.
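Two of the requirements above, citations on every opportunity and resistance to instructions embedded in customer text, lend themselves to small deterministic checks around the model call. The following Python sketch shows one possible shape. The `Opportunity` fields, the thresholds, and the `<record>` delimiter format are hypothetical choices rather than part of the brief, and delimiting untrusted text reduces injection risk without eliminating it.

```python
from dataclasses import dataclass, field


@dataclass
class Opportunity:
    """A generated recommendation with its citations (hypothetical schema)."""
    title: str
    claim: str
    evidence_ids: list = field(default_factory=list)
    confidence: float = 0.0  # expected range: 0.0 to 1.0


def validate_opportunity(opp, retrieved_ids, min_evidence=2, min_confidence=0.5):
    """Return a list of problems; an empty list means it may be surfaced.

    min_evidence and min_confidence are placeholder thresholds.
    """
    problems = []
    if len(opp.evidence_ids) < min_evidence:
        problems.append("insufficient evidence")
    # A citation that was never retrieved cannot support the claim.
    unknown = [e for e in opp.evidence_ids if e not in retrieved_ids]
    if unknown:
        problems.append("cites records that were not retrieved: " + ", ".join(unknown))
    if not 0.0 <= opp.confidence <= 1.0:
        problems.append("confidence out of range")
    elif opp.confidence < min_confidence:
        problems.append("confidence below threshold")
    return problems


def wrap_untrusted(record_id, text):
    """Wrap customer text in delimiters so the prompt can treat it as data.

    Angle brackets are escaped so the text cannot close the delimiter early.
    record_id is assumed to be a trusted internal identifier.
    """
    cleaned = text.replace("<", "&lt;").replace(">", "&gt;")
    return f'<record id="{record_id}">\n{cleaned}\n</record>'
```

An opportunity that returns any problems would be withheld or sent back for regeneration instead of being shown to analysts, which gives the refusal behavior a concrete enforcement point outside the prompt itself.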