Business Context
You’re on the ML Platform team at HelpDeskNow, a customer support SaaS used by 3,200 enterprise clients (banking, travel, and e-commerce). The platform processes ~12M tickets/month across email and chat. Two high-impact product initiatives are being launched this quarter:
- Ticket Triage: Automatically route each incoming ticket to one of 38 queues (e.g., Billing Disputes, Login Issues, API Outage) and predict priority (P0–P3). Misrouting increases time-to-resolution and can breach SLAs.
- Agent Assist: Generate a draft response that agents can edit. This must be safe (no hallucinated refunds, no policy violations) and fast enough for interactive use.
Your interviewer wants to test whether you can explain and operationalize the difference between encoder-only and decoder-only Transformer architectures in a real system—beyond definitions.
Dataset
You have two labeled datasets and one unlabeled corpus:
| Dataset | Size | Target | Inputs | Notes |
|---|---|---|---|---|
| Triage labels | 8.5M tickets | 38-way queue + P0–P3 | subject + body + last 3 messages | Class imbalance: top 5 queues = 62% of volume; rare queues <0.3% each |
| Agent reply pairs | 2.1M (ticket, reply) | next-token generation | ticket thread → agent reply | Replies average 110 tokens; some contain PII |
| Unlabeled corpus | 90M messages | none | raw threads | Contains multilingual (12% ES/PT/FR) and policy-sensitive content |
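The triage imbalance above (top 5 queues = 62% of volume, rare queues <0.3% each) is commonly handled with inverse-frequency class weights in the loss. A minimal sketch, with illustrative queue names and a smoothing term to dampen extreme weights on the rarest queues:

```python
from collections import Counter

def inverse_freq_weights(labels, smoothing=1.0):
    """Per-class weights proportional to inverse frequency.

    `smoothing` dampens extreme weights for very rare classes;
    weights are normalized so their mean is 1.0.
    """
    counts = Counter(labels)
    total = sum(counts.values())
    raw = {c: total / (n + smoothing) for c, n in counts.items()}
    mean = sum(raw.values()) / len(raw)
    return {c: w / mean for c, w in raw.items()}

# Toy example: one dominant queue, one rare queue (names illustrative).
labels = ["billing"] * 90 + ["api_outage"] * 10
weights = inverse_freq_weights(labels)
```

These weights would feed a weighted cross-entropy; alternatives (oversampling, focal loss) trade off differently against the calibration target below.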
Additional characteristics:
- PII appears in ~18% of tickets (names, emails, order IDs). You must avoid training leakage and unsafe generations.
- Missingness: ~9% of tickets have empty subject; ~6% have truncated bodies due to upstream limits.
- Latency requirements: triage must run at p95 < 80ms per ticket (batch + near-real-time). Agent assist must stream first tokens in < 300ms.
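Given the ~18% PII rate, any use of the unlabeled corpus for pretraining implies a redaction pass first. A hedged sketch with simple regexes; the patterns and the `ORD-` order-ID format are illustrative, not the real HelpDeskNow schema, and a production system would use a vetted PII library:

```python
import re

# Illustrative patterns only; real PII detection needs a vetted library.
PII_PATTERNS = [
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"), "<EMAIL>"),
    (re.compile(r"\bORD-\d{6,}\b"), "<ORDER_ID>"),
]

def redact(text: str) -> str:
    """Replace matched PII spans with placeholder tokens."""
    for pattern, token in PII_PATTERNS:
        text = pattern.sub(token, text)
    return text

print(redact("Contact jane.doe@example.com about ORD-1234567."))
# prints: Contact <EMAIL> about <ORDER_ID>.
```

Placeholder tokens (rather than deletion) preserve sentence structure for pretraining and make leakage auditable downstream.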
Success Criteria
- Triage: Macro-F1 ≥ 0.72 on queues; Recall ≥ 0.85 for P0/P1 tickets; calibration ECE ≤ 0.03.
- Agent Assist: Human eval: ≥ 4.2/5 helpfulness, ≤ 1.0% policy violations; automated: toxicity rate < 0.2%, PII leakage < 0.1%.
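The ECE ≤ 0.03 target refers to Expected Calibration Error. A minimal sketch of the standard equal-width-bin estimator (the bin count is a free choice, not specified by the criteria above):

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE: bin-weight-averaged |accuracy - mean confidence| over confidence bins."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if not mask.any():
            continue
        weight = mask.mean()  # fraction of samples in this bin
        gap = abs(correct[mask].mean() - confidences[mask].mean())
        ece += weight * gap
    return ece

# Perfectly calibrated toy case: 80% confidence, 80% accuracy -> ECE = 0.
conf = [0.8] * 10
corr = [1] * 8 + [0] * 2
```

Encoder-only classifiers with softmax heads are often overconfident, so hitting this target typically requires post-hoc calibration (e.g., temperature scaling) measured with exactly this estimator.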
Constraints
- Deployment: single-region Kubernetes, 2×A10 GPUs for online generation, CPU autoscaling for triage.
- Compliance: must support data deletion requests within 30 days; training pipeline must be reproducible.
- Interpretability: triage model must provide top contributing spans for audit (e.g., highlight phrases that drove routing).
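The audit requirement (top contributing spans) can be met architecture-agnostically with occlusion: mask each span, rescore the routed queue, and rank spans by the score drop. A sketch at token granularity; `score_fn` is a stand-in for the real triage model, and `toy_score` below is purely illustrative:

```python
def top_spans(tokens, score_fn, k=3, mask="[MASK]"):
    """Rank tokens by how much masking each one lowers the model's score."""
    base = score_fn(tokens)
    drops = []
    for i, tok in enumerate(tokens):
        occluded = tokens[:i] + [mask] + tokens[i + 1:]
        drops.append((base - score_fn(occluded), tok))
    drops.sort(reverse=True)
    return [tok for _, tok in drops[:k]]

# Toy scorer: "refund" and "dispute" drive a Billing Disputes routing.
def toy_score(tokens):
    return 0.5 * ("refund" in tokens) + 0.3 * ("dispute" in tokens) + 0.1

tokens = "please refund my dispute asap".split()
```

Occlusion costs one forward pass per span, so for the 80ms p95 budget it would run asynchronously for audit logs rather than inline with routing.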
Deliverables (what you must produce in the interview)
- Architecture choice: For each task (triage vs agent assist), decide whether you’d use an encoder-only model, a decoder-only model, or a hybrid—and justify with attention patterns, objective functions, and compute/latency implications.
- Training plan: Pretraining/finetuning strategy, including how you’d use the unlabeled corpus, handle imbalance, and prevent leakage.
- Evaluation plan: Offline metrics + online monitoring; how you’d set thresholds for P0/P1 and how you’d evaluate generation safety.
- Production design: Inference architecture, caching, batching, model update cadence, and rollback strategy.
- Risk analysis: Failure modes specific to each architecture (e.g., miscalibration, hallucination, prompt injection) and mitigations.
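For the evaluation plan's P0/P1 threshold setting, one standard approach is to pick, on a validation set, the highest score cutoff that still meets the 0.85 recall floor, leaving maximal precision headroom. A minimal sketch; variable names and the toy data are illustrative:

```python
def pick_threshold(scores, is_urgent, recall_floor=0.85):
    """Highest threshold whose recall on urgent (P0/P1) tickets >= floor.

    Sweeps candidate thresholds from high to low; higher thresholds flag
    fewer tickets, so the first one meeting the floor maximizes precision
    headroom while honoring the SLA-driven recall target.
    """
    total_urgent = sum(is_urgent)
    for t in sorted(set(scores), reverse=True):
        flagged_urgent = sum(u for s, u in zip(scores, is_urgent) if s >= t)
        if flagged_urgent / total_urgent >= recall_floor:
            return t
    return min(scores)  # fall back to flagging everything

scores = [0.9, 0.8, 0.7, 0.6, 0.2]
is_urgent = [1, 1, 1, 1, 0]
```

In production the chosen threshold would be monitored online, since score drift (e.g., new ticket types) silently erodes the recall guarantee.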