Business Context
ShopSphere is a global e-commerce marketplace with 38M monthly active buyers and a customer support org handling ~2.5M tickets/week across email, chat transcripts, and agent notes. The company has accumulated ~6.5TB of unstructured text over 4 years (tickets, chat logs, call-center summaries, refund disputes). Today, routing and resolution are only lightly automated: a rules engine makes the initial assignment and human triagers handle the rest, which leads to high first-response times (P90 = 9 hours) and inconsistent escalation of high-risk issues (fraud, chargebacks, safety complaints). Leadership wants to use the historical text corpus to improve customer service efficiency: faster routing, better self-serve deflection, and higher agent productivity.
Data Characteristics
You have access to:
- Raw sources: Zendesk tickets (subject + body), chat transcripts, agent internal notes, disposition codes, resolution outcomes, CSAT, refund amounts, and handle time.
- Volume: ~4.2B messages/utterances; ~650M “tickets” after conversation grouping.
- Text length: chat turns 1–40 tokens; emails 30–2,000 tokens (median ~180).
- Languages: English 78%, Spanish 12%, Portuguese 6%, other 4%.
- Domain vocabulary: SKU IDs, order IDs, carrier names, payment processors, policy terms (“A-to-Z claim”, “returnless refund”), and abbreviations used by agents.
- Labels: Only ~8% of tickets have reliable structured labels (issue type) due to taxonomy drift; however, you have outcomes (refund issued, escalation, time-to-resolution) for nearly all tickets.
Success Criteria (Business)
A solution is “good enough” if it:
- Reduces median time-to-first-response by 25% via better routing and prioritization.
- Increases self-serve deflection by 10% for low-risk, repetitive issues (e.g., tracking, address changes).
- Maintains or improves CSAT while preventing overload: high-risk queues must not exceed current staffing capacity.
Constraints
- Latency: routing decisions must be produced in <150 ms per incoming ticket.
- Privacy/Compliance: PII (names, emails, phone numbers, addresses, payment tokens) must be redacted; data cannot leave the VPC.
- Compute: training can use a multi-GPU cluster, but inference is limited to 2× A10 GPUs per region.
- Taxonomy drift: issue categories change quarterly; the system must be maintainable.
Requirements (Deliverables)
- Propose an end-to-end NLP approach that turns the historical corpus into (a) an automated ticket triage classifier (issue type + priority) and (b) a suggested next-best action (macro response template / KB article / escalation).
- Describe how you would create training data given sparse/noisy labels (e.g., weak supervision, distant labels from outcomes, active learning); see the weak-supervision sketch under Implementation Sketches below.
- Implement a baseline and an advanced model (sketches for both appear below):
  - Baseline: TF-IDF + linear classifier for issue type.
  - Advanced: transformer fine-tuning for multi-task prediction (issue type + priority), with multilingual handling.
- Add a PII redaction + entity extraction layer (order IDs, SKUs, carrier, refund amount) to support analytics and templated responses; a combined redaction/extraction sketch appears below.
- Define an evaluation plan tied to business metrics and include an error analysis strategy focused on high-cost mistakes (fraud/safety false negatives); see the cost-weighted evaluation sketch below.
- Outline a deployment/monitoring plan for drift (new issue types, seasonal spikes, policy changes); a PSI-based drift sketch closes this section.
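Implementation Sketches
The following sketches are illustrative starting points, not production code; identifiers and thresholds that do not appear in the brief above are assumptions.

For the training-data deliverable, one bootstrap path is Snorkel-style labeling functions that combine keyword heuristics with distant labels from outcomes. A minimal sketch using a simple majority vote (a production version would fit a learned label model instead); the `text`, `refund_amount`, and `escalated` fields and the three-label set are hypothetical:

```python
import re
from collections import Counter

ABSTAIN = None  # labeling functions abstain rather than guess

def lf_tracking_keywords(ticket):
    # Keyword heuristic: tracking questions mention packages or tracking numbers.
    if re.search(r"where is my (order|package)|tracking number", ticket["text"], re.I):
        return "tracking"
    return ABSTAIN

def lf_refund_outcome(ticket):
    # Distant label from outcomes: a refund was issued and the text mentions one.
    if ticket["refund_amount"] > 0 and "refund" in ticket["text"].lower():
        return "refund_dispute"
    return ABSTAIN

def lf_fraud_escalation(ticket):
    # High-risk heuristic: escalated tickets mentioning chargebacks or fraud.
    if ticket["escalated"] and re.search(r"chargeback|unauthorized|fraud", ticket["text"], re.I):
        return "fraud_safety"
    return ABSTAIN

LABELING_FUNCTIONS = [lf_tracking_keywords, lf_refund_outcome, lf_fraud_escalation]

def weak_label(ticket):
    """Majority vote over labeling functions; None means 'leave unlabeled'."""
    votes = [v for lf in LABELING_FUNCTIONS if (v := lf(ticket)) is not ABSTAIN]
    return Counter(votes).most_common(1)[0][0] if votes else ABSTAIN

print(weak_label({"text": "Where is my package?", "refund_amount": 0, "escalated": False}))
```

Tickets on which every function abstains are natural targets for active learning, concentrating human labeling where the heuristics are silent.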
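The baseline maps directly onto scikit-learn. A sketch, assuming redacted `texts` paired with weak or gold `issue_types` from the step above; combining word and character n-grams is a judgment call to handle the ID-heavy, typo-prone agent text described in Data Characteristics:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import FeatureUnion, Pipeline

# Word n-grams capture policy phrases ("returnless refund"); char n-grams are
# robust to typos, agent abbreviations, and mixed languages.
features = FeatureUnion([
    ("word", TfidfVectorizer(ngram_range=(1, 2), min_df=5, max_features=500_000)),
    ("char", TfidfVectorizer(analyzer="char_wb", ngram_range=(3, 5), min_df=5,
                             max_features=500_000)),
])

baseline = Pipeline([
    ("tfidf", features),
    # Linear model scoring is fast enough to fit the routing latency budget on
    # CPU alone, and predict_proba supports confidence-thresholded routing.
    ("clf", LogisticRegression(max_iter=1000, class_weight="balanced")),
])

# texts, issue_types = load_training_data()  # hypothetical loader
# baseline.fit(texts, issue_types)
# probs = baseline.predict_proba(["Where is my package? Order 112-4455667-8899001"])
```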
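For the advanced model, one common pattern is a shared multilingual encoder (e.g., XLM-R, which covers the corpus's language mix) with two lightweight classification heads trained jointly. A minimal PyTorch/transformers sketch; the class counts (40 issue types, 4 priority levels) and the 2.0 loss weight are illustrative assumptions:

```python
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class TriageModel(nn.Module):
    """Shared multilingual encoder with one classification head per task."""
    def __init__(self, encoder_name="xlm-roberta-base", n_issue_types=40, n_priorities=4):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(encoder_name)
        hidden = self.encoder.config.hidden_size
        self.issue_head = nn.Linear(hidden, n_issue_types)
        self.priority_head = nn.Linear(hidden, n_priorities)

    def forward(self, input_ids, attention_mask, issue_labels=None, priority_labels=None):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        pooled = out.last_hidden_state[:, 0]          # first-token ("CLS") representation
        issue_logits = self.issue_head(pooled)
        priority_logits = self.priority_head(pooled)
        loss = None
        if issue_labels is not None and priority_labels is not None:
            ce = nn.CrossEntropyLoss()
            # Upweight the priority task so high-risk mistakes dominate the
            # gradient; the 2.0 factor is an assumption to tune on validation data.
            loss = ce(issue_logits, issue_labels) + 2.0 * ce(priority_logits, priority_labels)
        return loss, issue_logits, priority_logits

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
model = TriageModel()
batch = tokenizer(["Mi pedido no ha llegado"], return_tensors="pt",
                  truncation=True, max_length=256)
loss, issue_logits, priority_logits = model(**batch)   # loss is None at inference
```

If the fine-tuned encoder proves too slow for the 150 ms budget on the A10s, one option is to distill it into a smaller student; the two-head structure carries over unchanged.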
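The redaction and extraction layer can share a single pass over each message. A regex-only sketch covering the easy cases; the ORDER_ID/SKU formats are assumptions about ShopSphere's ID schemes, and a production layer would add a trained NER model for names and street addresses, which regexes handle poorly:

```python
import re

# PII patterns that must be masked before text is stored or sent to a model.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
    "CARD":  re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}

# Business entities kept for analytics/templating; formats are illustrative.
ENTITY_PATTERNS = {
    "ORDER_ID": re.compile(r"\b\d{3}-\d{7}-\d{7}\b"),
    "SKU":      re.compile(r"\bSKU-[A-Z0-9]{6,10}\b"),
    "CARRIER":  re.compile(r"\b(UPS|FedEx|DHL|USPS)\b", re.I),
    "REFUND":   re.compile(r"\$\d+(?:\.\d{2})?"),
}

def redact_and_extract(text):
    # Extract entities first so ID-like strings aren't swallowed by the broad
    # phone/card patterns.
    entities, keep = {}, []
    for name, pat in ENTITY_PATTERNS.items():
        matches = list(pat.finditer(text))
        entities[name] = [m.group() for m in matches]
        keep += [m.span() for m in matches]

    # Collect PII spans on the original text, drop any overlapping an entity,
    # then rewrite right-to-left so earlier offsets stay valid.
    spans = []
    for name, pat in PII_PATTERNS.items():
        for m in pat.finditer(text):
            if not any(m.start() < e and m.end() > s for s, e in keep):
                spans.append((m.start(), m.end(), name))
    last = len(text)
    for start, end, name in sorted(spans, reverse=True):
        if end <= last:               # skip spans overlapping a prior replacement
            text = text[:start] + f"[{name}]" + text[end:]
            last = start
    return text, entities

clean, ents = redact_and_extract(
    "Order 112-4455667-8899001 never arrived; email me at jo@example.com")
# clean -> "Order 112-4455667-8899001 never arrived; email me at [EMAIL]"
# ents["ORDER_ID"] -> ["112-4455667-8899001"]
```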
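For evaluation, raw accuracy hides exactly the mistakes the requirements flag: a fraud/safety ticket routed to a low-risk queue costs far more than a misrouted tracking question. A sketch of a cost-weighted report; the label subset and the 50x cost figure are assumptions to be set with risk and finance stakeholders:

```python
import numpy as np
from sklearn.metrics import confusion_matrix, recall_score

LABELS = ["tracking", "refund_dispute", "fraud_safety"]   # illustrative subset
HIGH_RISK = {"fraud_safety"}

# Asymmetric cost matrix (rows = true, cols = predicted): sending a fraud
# ticket to a low-risk queue is far worse than the reverse mistake.
COSTS = np.array([
    [0,  1,  1],
    [1,  0,  1],
    [50, 50, 0],
])

def cost_weighted_report(y_true, y_pred):
    cm = confusion_matrix(y_true, y_pred, labels=LABELS)
    report = {"total_cost": int((cm * COSTS).sum())}
    for label in HIGH_RISK:
        # Recall on high-risk classes is the headline number: false negatives
        # here are the high-cost mistakes called out in the requirements.
        report[f"recall[{label}]"] = recall_score(
            y_true, y_pred, labels=[label], average="macro", zero_division=0)
    return report

print(cost_weighted_report(
    ["fraud_safety", "tracking", "tracking"],
    ["tracking", "tracking", "refund_dispute"]))
```

Error analysis then starts from the highest-cost confusion-matrix cells rather than the most frequent ones.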
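For monitoring, one cheap first-line drift signal is the population stability index (PSI) of the predicted issue-type mix, computed weekly against a training-time baseline; the 0.2 alert threshold is a common rule of thumb rather than a ShopSphere-specific value:

```python
import numpy as np

def psi(expected, observed, eps=1e-6):
    """Population stability index between two class-frequency vectors."""
    e = np.asarray(expected, dtype=float) + eps
    o = np.asarray(observed, dtype=float) + eps
    e, o = e / e.sum(), o / o.sum()
    return float(np.sum((o - e) * np.log(o / e)))

# Baseline class mix from training time vs. this week's predictions
# (counts are illustrative).
baseline = [52_000, 31_000, 9_000, 8_000]
this_week = [44_000, 30_000, 9_500, 16_500]   # e.g., a seasonal spike

score = psi(baseline, this_week)
if score > 0.2:   # common heuristic: >0.2 signals a meaningful shift
    print(f"PSI={score:.3f}: prediction mix drifted, review routing thresholds")
```

PSI flags shifts among known categories; tickets on which both the labeling functions and the classifier are low-confidence can be queued for human review as candidate new issue types, covering the taxonomy-drift constraint.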