Business Context
ZendeskFlow, a B2B customer support platform, wants to use prompt-based AI to draft responses for repetitive inbound tickets such as password resets, billing clarification, and account access issues. Your task is to design an NLP workflow that generates reply drafts and automatically decides whether each draft is safe to send, needs human review, or should be rejected.
Data
- Volume: 180,000 historical support tickets and agent-written replies collected over 12 months
- Text length: customer messages range from 20 to 900 words, with a median of 110 words
- Language: English only for the first release
- Label distribution: 62% safe_to_send, 28% review, 10% reject
- Input fields: ticket subject, ticket body, product tier, issue category, prior macros used, final QA disposition
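The class imbalance above matters for evaluation: a naive random split can starve the 10% reject class. A minimal pure-Python sketch of a stratified split, assuming tickets arrive as dicts keyed by a hypothetical `disposition` field (toy counts mirroring the 62/28/10 distribution):

```python
import random
from collections import Counter, defaultdict

def stratified_split(rows, label_key, test_frac=0.2, seed=7):
    """Split rows so each class keeps its share in both partitions."""
    by_label = defaultdict(list)
    for row in rows:
        by_label[row[label_key]].append(row)
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    train, test = [], []
    for group in by_label.values():
        rng.shuffle(group)
        cut = int(len(group) * test_frac)
        test.extend(group[:cut])
        train.extend(group[cut:])
    return train, test

# Toy rows mirroring the 62/28/10 disposition distribution
rows = ([{"disposition": "safe_to_send"}] * 62
        + [{"disposition": "review"}] * 28
        + [{"disposition": "reject"}] * 10)
train, test = stratified_split(rows, "disposition")
print(Counter(r["disposition"] for r in test))
```

In a real pipeline, scikit-learn's `train_test_split(..., stratify=labels)` does the same job; the point is that the reject class keeps its proportion in the held-out set so its recall estimate is not based on a handful of examples.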
Success Criteria
A good solution should achieve macro-F1 >= 0.84 on the 3-way approval decision and recall >= 0.95 on the reject class so inaccurate or risky drafts are rarely auto-approved. The system should also produce deterministic, auditable prompt outputs for repeated ticket patterns.
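Both gates can be computed directly from gold vs. predicted decisions. A minimal plain-Python sketch of per-class recall and macro-F1 (toy labels, not real data; scikit-learn's `f1_score(..., average="macro")` and `recall_score` are the production equivalents):

```python
def per_class_prf(y_true, y_pred, labels):
    """Per-class precision, recall, F1 from gold vs. predicted decisions."""
    stats = {}
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        stats[c] = (prec, rec, f1)
    return stats

def macro_f1(stats):
    """Unweighted mean of per-class F1 — every class counts equally."""
    return sum(f1 for _, _, f1 in stats.values()) / len(stats)

LABELS = ["safe_to_send", "review", "reject"]
gold = ["safe_to_send", "review", "reject", "reject"]
pred = ["safe_to_send", "review", "reject", "review"]
stats = per_class_prf(gold, pred, LABELS)
print(round(stats["reject"][1], 2))  # reject recall: 0.5 — fails the 0.95 gate
print(round(macro_f1(stats), 2))     # macro-F1: 0.78 — below the 0.84 target
```

Macro averaging is the right choice here precisely because the classes are imbalanced: the 10% reject class contributes a full third of the score.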
Constraints
- End-to-end latency must stay under 700 ms per ticket
- No customer PII may be stored in prompts or logs
- The model must run in a private VPC and support weekly prompt/version updates
- Human reviewers must be able to inspect why a draft was flagged
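For the PII constraint, a common approach is a redaction pass before ticket text ever reaches a prompt or a log line. A sketch with illustrative regex patterns (the patterns here are hypothetical starting points; a production system would pair them with a vetted PII detector or NER model):

```python
import re

# Illustrative patterns only — real deployments need broader coverage
PII_PATTERNS = [
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"), "[EMAIL]"),
    (re.compile(r"\b(?:\d[ -]?){13,16}\b"), "[CARD]"),
    (re.compile(r"\b\d{3}[-. ]\d{3}[-. ]\d{4}\b"), "[PHONE]"),
]

def redact(text: str) -> str:
    """Replace PII spans with placeholders before prompting or logging."""
    for pattern, token in PII_PATTERNS:
        text = pattern.sub(token, text)
    return text

print(redact("Reach me at jane.doe@example.com or 555-867-5309."))
# -> Reach me at [EMAIL] or [PHONE].
```

Redacting at ingestion (rather than at log time) also satisfies the prompt half of the constraint: the model never sees raw PII, so it cannot echo it into a draft.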
Requirements
- Build a prompt-based response generation pipeline for repetitive support tickets.
- Build a downstream NLP model that classifies generated drafts into safe_to_send, review, or reject.
- Define preprocessing for ticket text, prompt templates, and redaction of sensitive fields.
- Implement training and evaluation in Python using a realistic modern stack.
- Explain how you would enforce consistency, monitor drift, and audit prompt changes over time.
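For the audit requirement, one option is to treat prompt templates like code: each version is content-hashed and registered, so any generated draft can be traced back to the exact template that produced it. A minimal sketch, assuming a hypothetical in-memory registry:

```python
import hashlib
from datetime import datetime, timezone

def register_prompt(registry, name, version, template):
    """Content-hash a template so audits can verify it was never edited in place."""
    digest = hashlib.sha256(template.encode("utf-8")).hexdigest()[:12]
    registry[(name, version)] = {
        "sha256_12": digest,
        "template": template,
        "registered_at": datetime.now(timezone.utc).isoformat(),
    }
    return digest

registry = {}
tmpl = "You are a support agent. Draft a concise reply to: {ticket_body}"
digest = register_prompt(registry, "password_reset", "v3", tmpl)
# Each generated draft then logs (name, version, digest), so a reviewer
# can reproduce the exact prompt behind any flagged draft.
print(digest)
```

Combined with temperature 0 and a pinned model version, logging the (template name, version, digest) triple also supports the deterministic, auditable outputs called for in the success criteria, and weekly prompt updates become explicit new registry entries rather than silent edits.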