Context
Jasper offers enterprise customers workflow-specific content generation, such as producing outbound sales emails, ad copy variants, and brand-safe product descriptions from customer inputs and style guides. A large retail customer says the outputs are "inconsistent" and wants a measurable quality bar before expanding usage.
Constraints
- p95 end-to-end latency: 2,500 ms per generation
- Cost ceiling: $0.03 per completed generation at 2M generations/month
- Hallucination / unsupported-claim rate: <2% on customer-approved eval sets
- Brand/style compliance: >95%
- Prompt injection success rate from user inputs or retrieved brand docs: <0.5%
- Human review capacity is limited to 500 samples/week, so evaluation must scale beyond manual QA
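
To make the numbers above concrete, here is a minimal Python sketch of two pieces of arithmetic they imply: the monthly spend that corresponds to the per-generation cost ceiling, and how much statistical certainty 500 human-reviewed samples can give about a rate near 2%. The constants come from the constraints; the function names and the choice of a Wilson score interval at roughly 95% confidence are assumptions made for illustration, not part of the brief.

```python
import math

# Figures taken from the constraints list.
COST_CEILING_PER_GENERATION_USD = 0.03
GENERATIONS_PER_MONTH = 2_000_000
HALLUCINATION_RATE_LIMIT = 0.02
WEEKLY_REVIEW_SAMPLES = 500


def monthly_cost_ceiling(per_generation_usd: float, generations: int) -> float:
    """Total monthly spend implied by the per-generation ceiling."""
    return per_generation_usd * generations


def wilson_interval(failures: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion.

    z = 1.96 corresponds to roughly 95% confidence (an assumed choice).
    """
    if n <= 0:
        raise ValueError("n must be positive")
    p = failures / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


if __name__ == "__main__":
    budget = monthly_cost_ceiling(COST_CEILING_PER_GENERATION_USD, GENERATIONS_PER_MONTH)
    print(f"Monthly cost ceiling: ${budget:,.0f}")

    # Hypothetical week: 5 unsupported claims found in 500 reviewed samples (1%).
    low, high = wilson_interval(5, WEEKLY_REVIEW_SAMPLES)
    print(f"Observed 1.0%, interval: {low:.2%} to {high:.2%}")
    print("Upper bound under limit:", high < HALLUCINATION_RATE_LIMIT)
```

Under these assumptions, even an observed 1% rate in a single week of 500 reviews leaves an upper bound a little above the 2% limit, which is one way to see why evaluation has to scale beyond manual QA.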
Available Resources
- Historical prompts, model outputs, user edits, thumbs-up/down signals, and regeneration events for this customer workflow
- Customer assets: style guide, prohibited phrases list, product catalog, approved claims, and 5,000 past human-approved outputs
- Access to one frontier model and one cheaper model for judge / generation experiments
- Ability to add structured output, retrieval over customer documents, and lightweight post-generation validators
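
As one illustration of what a "lightweight post-generation validator" could look like, the sketch below checks a generated text against the customer's prohibited phrases list. It is a hedged example only: the `ValidationResult` type, the `find_prohibited_phrases` function, and the sample phrases are hypothetical, and a real validator would likely also need to cover approved claims and style rules.

```python
import re
from dataclasses import dataclass


@dataclass
class ValidationResult:
    passed: bool
    violations: list[str]


def find_prohibited_phrases(text: str, prohibited: list[str]) -> ValidationResult:
    """Flag case-insensitive, whole-phrase matches of prohibited phrases."""
    violations = []
    for phrase in prohibited:
        # Lookarounds on word characters avoid matching inside longer words
        # (e.g. "cure" inside "secure") and still work when a phrase
        # starts or ends with punctuation.
        pattern = r"(?<!\w)" + re.escape(phrase) + r"(?!\w)"
        if re.search(pattern, text, flags=re.IGNORECASE):
            violations.append(phrase)
    return ValidationResult(passed=not violations, violations=violations)


if __name__ == "__main__":
    # Hypothetical phrases; the real list comes from the customer assets.
    prohibited = ["best ever", "guaranteed results", "cure"]
    draft = "Our BEST EVER deal, with a secure checkout and guaranteed results!"
    result = find_prohibited_phrases(draft, prohibited)
    print(result.passed, result.violations)
```

A deterministic check like this is cheap in both cost and latency, so it can run on every generation and leave the LLM judge for the criteria that string matching cannot capture.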
Task
- Define an eval-first framework for determining whether Jasper is producing high-quality outputs for this specific customer workflow. Be explicit about offline and online evaluation, golden-set construction, rubrics, segmentation, and how you would calibrate any LLM-as-judge.
- Propose the minimum viable architecture and prompt changes that follow from the evaluation plan, including whether you would use plain prompting, structured generation, retrieval over customer assets, or fine-tuning.
- Specify how you would measure and reduce hallucinations, brand violations, and prompt-injection risk from both user inputs and retrieved customer content.
- Estimate cost and latency tradeoffs for your approach, including what you would log, monitor, and alert on in production.
- Describe how you would decide whether the workflow is ready for rollout, partial rollout, or rollback for this customer.
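
The rollout question in the last item can be framed as a gate over the stated constraints. The sketch below is one possible shape for such a gate, not a prescribed answer. The thresholds come from the constraints; everything else is an assumption for illustration: the `EvalSnapshot` fields, treating the latency and cost limits as inclusive, and the policy that any quality or safety failure means rollback while a latency-only or cost-only failure means partial rollout.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ROLLOUT = "rollout"
    PARTIAL_ROLLOUT = "partial_rollout"
    ROLLBACK = "rollback"


@dataclass(frozen=True)
class EvalSnapshot:
    """Hypothetical summary of one evaluation window for this workflow."""
    p95_latency_ms: float
    cost_per_generation_usd: float
    hallucination_rate: float
    brand_compliance_rate: float
    injection_success_rate: float


# Assumed policy: failures in these gates are treated as blocking.
QUALITY_AND_SAFETY_GATES = {"hallucination", "brand_compliance", "prompt_injection"}


def failed_gates(s: EvalSnapshot) -> list[str]:
    """Return the names of the constraints that the snapshot violates."""
    checks = {
        "p95_latency": s.p95_latency_ms <= 2500,           # assumed inclusive
        "cost": s.cost_per_generation_usd <= 0.03,         # assumed inclusive
        "hallucination": s.hallucination_rate < 0.02,
        "brand_compliance": s.brand_compliance_rate > 0.95,
        "prompt_injection": s.injection_success_rate < 0.005,
    }
    return [name for name, ok in checks.items() if not ok]


def decide(s: EvalSnapshot) -> Decision:
    failed = failed_gates(s)
    if not failed:
        return Decision.ROLLOUT
    if any(name in QUALITY_AND_SAFETY_GATES for name in failed):
        return Decision.ROLLBACK
    # Only latency and/or cost are out of bounds.
    return Decision.PARTIAL_ROLLOUT


if __name__ == "__main__":
    snapshot = EvalSnapshot(2700, 0.024, 0.012, 0.97, 0.002)
    print(failed_gates(snapshot), decide(snapshot))
```

In practice the rates fed into such a gate would carry uncertainty from the eval-set size, so a stricter variant could compare confidence-interval bounds rather than point estimates.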