Business Context
ZendeskFlow, a B2B customer support platform, wants to use prompt-based AI to draft responses for repetitive inbound tickets such as password resets, billing clarification, and account access issues. Your task is to design an NLP workflow that generates reply drafts and automatically decides whether each draft is safe to send, needs human review, or should be rejected.
Data
- Volume: 180,000 historical support tickets and agent-written replies collected over 12 months
- Text length: customer messages range from 20 to 900 words, with a median of 110 words
- Language: English only for the first release
- Label distribution: 62% safe_to_send, 28% review, 10% reject
- Input fields: ticket subject, ticket body, product tier, issue category, prior macros used, final QA disposition
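The class imbalance above matters for evaluation: a naive random split can starve the 10% reject class. A minimal pure-Python sketch of a stratified split, assuming tickets arrive as dicts keyed by a hypothetical `disposition` field (toy counts mirroring the 62/28/10 distribution):

```python
import random
from collections import Counter, defaultdict

def stratified_split(rows, label_key, test_frac=0.2, seed=7):
    """Split rows so each class keeps its share in both partitions."""
    by_label = defaultdict(list)
    for row in rows:
        by_label[row[label_key]].append(row)
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    train, test = [], []
    for group in by_label.values():
        rng.shuffle(group)
        cut = int(len(group) * test_frac)
        test.extend(group[:cut])
        train.extend(group[cut:])
    return train, test

# Toy rows mirroring the 62/28/10 disposition distribution
rows = ([{"disposition": "safe_to_send"}] * 62
        + [{"disposition": "review"}] * 28
        + [{"disposition": "reject"}] * 10)
train, test = stratified_split(rows, "disposition")
print(Counter(r["disposition"] for r in test))
```

In a real pipeline, scikit-learn's `train_test_split(..., stratify=labels)` does the same job; the point is that the reject class keeps its proportion in the held-out set so its recall estimate is not based on a handful of examples.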
Success Criteria
A good solution should achieve macro-F1 >= 0.84 on the 3-way approval decision and recall >= 0.95 on the reject class so inaccurate or risky drafts are rarely auto-approved. The system should also produce deterministic, auditable prompt outputs for repeated ticket patterns.
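Both gates can be computed directly from gold vs. predicted decisions. A minimal plain-Python sketch of per-class recall and macro-F1 (toy labels, not real data; scikit-learn's `f1_score(..., average="macro")` and `recall_score` are the production equivalents):

```python
def per_class_prf(y_true, y_pred, labels):
    """Per-class precision, recall, F1 from gold vs. predicted decisions."""
    stats = {}
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        stats[c] = (prec, rec, f1)
    return stats

def macro_f1(stats):
    """Unweighted mean of per-class F1 — every class counts equally."""
    return sum(f1 for _, _, f1 in stats.values()) / len(stats)

LABELS = ["safe_to_send", "review", "reject"]
gold = ["safe_to_send", "review", "reject", "reject"]
pred = ["safe_to_send", "review", "reject", "review"]
stats = per_class_prf(gold, pred, LABELS)
print(round(stats["reject"][1], 2))  # reject recall: 0.5 — fails the 0.95 gate
print(round(macro_f1(stats), 2))     # macro-F1: 0.78 — below the 0.84 target
```

Macro averaging is the right choice here precisely because the classes are imbalanced: the 10% reject class contributes a full third of the score.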
Constraints
- End-to-end latency must stay under 700 ms per ticket
- No customer PII may be stored in prompts or logs
- The model must run in a private VPC and support weekly prompt/version updates
- Human reviewers must be able to inspect why a draft was flagged
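For the PII constraint, a common approach is a redaction pass before ticket text ever reaches a prompt or a log line. A sketch with illustrative regex patterns (the patterns here are hypothetical starting points; a production system would pair them with a vetted PII detector or NER model):

```python
import re

# Illustrative patterns only — real deployments need broader coverage
PII_PATTERNS = [
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"), "[EMAIL]"),
    (re.compile(r"\b(?:\d[ -]?){13,16}\b"), "[CARD]"),
    (re.compile(r"\b\d{3}[-. ]\d{3}[-. ]\d{4}\b"), "[PHONE]"),
]

def redact(text: str) -> str:
    """Replace PII spans with placeholders before prompting or logging."""
    for pattern, token in PII_PATTERNS:
        text = pattern.sub(token, text)
    return text

print(redact("Reach me at jane.doe@example.com or 555-867-5309."))
# -> Reach me at [EMAIL] or [PHONE].
```

Redacting at ingestion (rather than at log time) also satisfies the prompt half of the constraint: the model never sees raw PII, so it cannot echo it into a draft.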
Requirements
- Build a prompt-based response generation pipeline for repetitive support tickets.
- Build a downstream NLP model that classifies generated drafts into safe_to_send, review, or reject.
- Define preprocessing for ticket text, prompt templates, and redaction of sensitive fields.
- Implement training and evaluation in Python using a realistic modern stack.
- Explain how you would enforce consistency, monitor drift, and audit prompt changes over time.
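For the audit requirement, one option is to treat prompt templates like code: each version is content-hashed and registered, so any generated draft can be traced back to the exact template that produced it. A minimal sketch, assuming a hypothetical in-memory registry:

```python
import hashlib
from datetime import datetime, timezone

def register_prompt(registry, name, version, template):
    """Content-hash a template so audits can verify it was never edited in place."""
    digest = hashlib.sha256(template.encode("utf-8")).hexdigest()[:12]
    registry[(name, version)] = {
        "sha256_12": digest,
        "template": template,
        "registered_at": datetime.now(timezone.utc).isoformat(),
    }
    return digest

registry = {}
tmpl = "You are a support agent. Draft a concise reply to: {ticket_body}"
digest = register_prompt(registry, "password_reset", "v3", tmpl)
# Each generated draft then logs (name, version, digest), so a reviewer
# can reproduce the exact prompt behind any flagged draft.
print(digest)
```

Combined with temperature 0 and a pinned model version, logging the (template name, version, digest) triple also supports the deterministic, auditable outputs called for in the success criteria, and weekly prompt updates become explicit new registry entries rather than silent edits.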