Business Context
Zendesk Support wants to measure average customer issue resolution time from raw chat transcripts across web and mobile support channels. The transcripts are noisy: system events, bot messages, repeated transfers, missing timestamps, and inconsistent closure language make the metric unreliable without an NLP-driven cleaning pipeline.
Data
- Volume: 8 million chat sessions collected over 18 months
- Text length: 5-250 utterances per session (median 32)
- Language: English (88%), Spanish (9%), other (3%)
- Structure: speaker tags, timestamps, queue events, agent handoffs, bot prompts, free-text messages
- Label availability: only 12% of chats have manually verified resolution timestamps for evaluation
A typical session may include duplicate reconnect events, automated disclaimers, greetings, idle timeouts, and post-resolution surveys. Some chats are resolved in-session, some escalate to email, and some end without confirmation.
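Given this mix of event types, a natural first preprocessing target is to parse every raw event into one normalized record. The sketch below shows one possible schema; the field names and event categories are assumptions for illustration, not Zendesk's actual export format.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum

class EventType(Enum):
    MESSAGE = "message"   # free-text utterance from a customer or agent
    BOT = "bot"           # automated prompt, disclaimer, or bot reply
    SYSTEM = "system"     # queue event, transfer, reconnect, idle timeout
    SURVEY = "survey"     # post-resolution survey prompt

@dataclass
class Utterance:
    session_id: str
    speaker: str                 # "customer", "agent", "bot", or "system"
    event_type: EventType
    timestamp: datetime | None   # None when the raw event lacks a timestamp
    text: str
```

Downstream steps (noise removal, resolution detection) then operate on `list[Utterance]` per session rather than on raw transcript text.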
Success Criteria
A good solution should produce a cleaned conversation timeline and an estimated resolution timestamp per chat with:
- ≥ 0.90 precision for identifying true resolution events
- ≤ 5 minutes median absolute error on the manually verified subset
- Average resolution-time estimates that remain stable when reported by queue, language, and channel
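These targets can be checked directly against the 12% verified subset. Below is a minimal scoring sketch; it assumes predicted and verified resolution times keyed by session ID in Unix seconds, and the 60-second matching tolerance that decides what counts as a correctly identified resolution event is a hypothetical choice.

```python
import statistics

def score(predicted: dict[str, float], verified: dict[str, float],
          tolerance_s: float = 60.0) -> tuple[float, float]:
    """Compare predicted resolution timestamps (Unix seconds) with the
    manually verified subset. Only sessions that have a verified label
    are scored; a prediction counts as a true resolution event when it
    lands within `tolerance_s` of the verified timestamp."""
    errors = [abs(predicted[sid] - verified[sid])
              for sid in predicted if sid in verified]
    if not errors:
        return 0.0, float("inf")
    precision = sum(e <= tolerance_s for e in errors) / len(errors)
    median_abs_error_min = statistics.median(errors) / 60.0  # minutes
    return precision, median_abs_error_min
```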
Constraints
- Must run in a secure environment; transcripts may contain PII
- Batch processing SLA: 8 million chats in under 6 hours
- Solution should be explainable enough for operations analysts to audit edge cases
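For scale, the batch SLA works out to roughly 8,000,000 / (6 × 3,600) ≈ 370 chats per second of sustained throughput, which in practice points toward a parallel batch framework rather than serial per-session processing.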
Requirements
- Design a preprocessing pipeline to clean raw chat transcripts and normalize conversation structure.
- Detect and remove non-conversational noise such as bot boilerplate, system events, and duplicate reconnect messages (see the filtering sketch after this list).
- Identify the most likely resolution event using NLP and timestamp logic (see the detection sketch after this list).
- Distinguish in-session resolution from unresolved or externally escalated chats.
- Describe how you would implement, evaluate, and monitor the pipeline in Python.
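A first pass at the noise-removal requirement can be purely rule-based, which also satisfies the auditability constraint. The sketch below reuses the `Utterance` and `EventType` records from the Data section; the boilerplate patterns are hypothetical stand-ins for patterns that would, in practice, be mined from frequent near-duplicate messages in the corpus.

```python
import re

# Hypothetical boilerplate patterns; real ones would be mined from the corpus.
BOILERPLATE = [
    re.compile(r"you are now connected to", re.IGNORECASE),
    re.compile(r"this chat may be recorded", re.IGNORECASE),
    re.compile(r"please rate your experience", re.IGNORECASE),
]

def is_noise(utt: Utterance, prev: Utterance | None) -> bool:
    """Flag system events, survey prompts, bot boilerplate, and
    duplicate reconnect messages."""
    if utt.event_type in (EventType.SYSTEM, EventType.SURVEY):
        return True
    if utt.event_type is EventType.BOT and any(p.search(utt.text) for p in BOILERPLATE):
        return True
    # Duplicate reconnect: the same speaker repeats identical text back-to-back.
    if prev is not None and utt.speaker == prev.speaker and utt.text == prev.text:
        return True
    return False

def clean(session: list[Utterance]) -> list[Utterance]:
    """Return the conversational timeline with noise events dropped."""
    kept: list[Utterance] = []
    for utt in session:
        if not is_noise(utt, kept[-1] if kept else None):
            kept.append(utt)
    return kept
```

Because every drop is attributable to a named rule, an operations analyst can trace exactly why any event was removed.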
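For the resolution-event and escalation requirements, a transparent baseline is to scan the cleaned timeline backwards for closure language and take the latest matching timestamped message; sessions with no match fall through as unresolved or externally escalated. Continuing the sketches above, and with an illustrative cue list that a trained classifier would likely replace once this baseline is measured on the verified subset:

```python
import re
from datetime import datetime

# Illustrative closure cues; a learned classifier would replace this list.
RESOLUTION_CUES = re.compile(
    r"\b(resolved|fixed|that works|glad i could help|issue (is )?closed)\b",
    re.IGNORECASE,
)

def estimate_resolution(session: list[Utterance]) -> datetime | None:
    """Scan the cleaned timeline from the end for closure language and
    return the timestamp of the latest matching message. None means no
    in-session resolution was detected (unresolved or escalated)."""
    for utt in reversed(clean(session)):
        if (utt.event_type is EventType.MESSAGE
                and utt.timestamp is not None
                and RESOLUTION_CUES.search(utt.text)):
            return utt.timestamp
    return None
```

One reasonable definition of per-chat resolution time is then the gap between the first customer message and this estimate, computed only for sessions where an estimate exists.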