Business Context
You’re on the ML Platform team at HelpDeskNow, a customer support SaaS used by 3,200 enterprise clients (banking, travel, and e-commerce). The platform processes ~12M tickets/month across email and chat. Two high-impact product initiatives are being launched this quarter:
- Ticket Triage: Automatically route each incoming ticket to one of 38 queues (e.g., Billing Disputes, Login Issues, API Outage) and predict priority (P0–P3). Misrouting increases time-to-resolution and can breach SLAs.
- Agent Assist: Generate a draft response that agents can edit. This must be safe (no hallucinated refunds, no policy violations) and fast enough for interactive use.
Your interviewer wants to test whether you can explain and operationalize the difference between encoder-only and decoder-only Transformer architectures in a real system—beyond definitions.
Dataset
You have two labeled datasets and one unlabeled corpus:
| Dataset | Size | Target | Inputs | Notes |
|---|---|---|---|---|
| Triage labels | 8.5M tickets | 38-way queue + P0–P3 | subject + body + last 3 messages | Class imbalance: top 5 queues = 62% of volume; rare queues <0.3% each |
| Agent reply pairs | 2.1M (ticket, reply) | next-token generation | ticket thread → agent reply | Replies average 110 tokens; some contain PII |
| Unlabeled corpus | 90M messages | none | raw threads | Contains multilingual (12% ES/PT/FR) and policy-sensitive content |
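The triage imbalance above (top 5 queues = 62% of volume, rare queues <0.3% each) is commonly handled with inverse-frequency class weights in the loss. A minimal sketch, with illustrative queue names and a smoothing term to dampen extreme weights on the rarest queues:

```python
from collections import Counter

def inverse_freq_weights(labels, smoothing=1.0):
    """Per-class weights proportional to inverse frequency.

    `smoothing` dampens extreme weights for very rare classes;
    weights are normalized so their mean is 1.0.
    """
    counts = Counter(labels)
    total = sum(counts.values())
    raw = {c: total / (n + smoothing) for c, n in counts.items()}
    mean = sum(raw.values()) / len(raw)
    return {c: w / mean for c, w in raw.items()}

# Toy example: one dominant queue, one rare queue (names illustrative).
labels = ["billing"] * 90 + ["api_outage"] * 10
weights = inverse_freq_weights(labels)
```

These weights would feed a weighted cross-entropy; alternatives (oversampling, focal loss) trade off differently against the calibration target below.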
Additional characteristics:
- PII appears in ~18% of tickets (names, emails, order IDs). You must avoid training leakage and unsafe generations.
- Missingness: ~9% of tickets have empty subject; ~6% have truncated bodies due to upstream limits.
- Latency requirements: triage must run at p95 < 80ms per ticket (batch + near-real-time). Agent assist must stream first tokens in < 300ms.
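Given the ~18% PII rate, any use of the unlabeled corpus for pretraining implies a redaction pass first. A hedged sketch with simple regexes; the patterns and the `ORD-` order-ID format are illustrative, not the real HelpDeskNow schema, and a production system would use a vetted PII library:

```python
import re

# Illustrative patterns only; real PII detection needs a vetted library.
PII_PATTERNS = [
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"), "<EMAIL>"),
    (re.compile(r"\bORD-\d{6,}\b"), "<ORDER_ID>"),
]

def redact(text: str) -> str:
    """Replace matched PII spans with placeholder tokens."""
    for pattern, token in PII_PATTERNS:
        text = pattern.sub(token, text)
    return text

print(redact("Contact jane.doe@example.com about ORD-1234567."))
# prints: Contact <EMAIL> about <ORDER_ID>.
```

Placeholder tokens (rather than deletion) preserve sentence structure for pretraining and make leakage auditable downstream.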
Success Criteria
- Triage: Macro-F1 ≥ 0.72 on queues; Recall ≥ 0.85 for P0/P1 tickets; calibration ECE ≤ 0.03.
- Agent Assist: Human eval: ≥ 4.2/5 helpfulness, ≤ 1.0% policy violations; automated: toxicity rate < 0.2%, PII leakage < 0.1%.
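The ECE ≤ 0.03 target refers to Expected Calibration Error. A minimal sketch of the standard equal-width-bin estimator (the bin count is a free choice, not specified by the criteria above):

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE: bin-weight-averaged |accuracy - mean confidence| over confidence bins."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if not mask.any():
            continue
        weight = mask.mean()  # fraction of samples in this bin
        gap = abs(correct[mask].mean() - confidences[mask].mean())
        ece += weight * gap
    return ece

# Perfectly calibrated toy case: 80% confidence, 80% accuracy -> ECE = 0.
conf = [0.8] * 10
corr = [1] * 8 + [0] * 2
```

Encoder-only classifiers with softmax heads are often overconfident, so hitting this target typically requires post-hoc calibration (e.g., temperature scaling) measured with exactly this estimator.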
Constraints
- Deployment: single-region Kubernetes, 2×A10 GPUs for online generation, CPU autoscaling for triage.
- Compliance: must support data deletion requests within 30 days; training pipeline must be reproducible.
- Interpretability: triage model must provide top contributing spans for audit (e.g., highlight phrases that drove routing).
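The audit requirement (top contributing spans) can be met architecture-agnostically with occlusion: mask each span, rescore the routed queue, and rank spans by the score drop. A sketch at token granularity; `score_fn` is a stand-in for the real triage model, and `toy_score` below is purely illustrative:

```python
def top_spans(tokens, score_fn, k=3, mask="[MASK]"):
    """Rank tokens by how much masking each one lowers the model's score."""
    base = score_fn(tokens)
    drops = []
    for i, tok in enumerate(tokens):
        occluded = tokens[:i] + [mask] + tokens[i + 1:]
        drops.append((base - score_fn(occluded), tok))
    drops.sort(reverse=True)
    return [tok for _, tok in drops[:k]]

# Toy scorer: "refund" and "dispute" drive a Billing Disputes routing.
def toy_score(tokens):
    return 0.5 * ("refund" in tokens) + 0.3 * ("dispute" in tokens) + 0.1

tokens = "please refund my dispute asap".split()
```

Occlusion costs one forward pass per span, so for the 80ms p95 budget it would run asynchronously for audit logs rather than inline with routing.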
Deliverables (what you must produce in the interview)
- Architecture choice: For each task (triage vs agent assist), decide whether you’d use an encoder-only model, a decoder-only model, or a hybrid—and justify with attention patterns, objective functions, and compute/latency implications.
- Training plan: Pretraining/finetuning strategy, including how you’d use the unlabeled corpus, handle imbalance, and prevent leakage.
- Evaluation plan: Offline metrics + online monitoring; how you’d set thresholds for P0/P1 and how you’d evaluate generation safety.
- Production design: Inference architecture, caching, batching, model update cadence, and rollback strategy.
- Risk analysis: Failure modes specific to each architecture (e.g., miscalibration, hallucination, prompt injection) and mitigations.
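For the evaluation plan's P0/P1 threshold setting, one standard approach is to pick, on a validation set, the highest score cutoff that still meets the 0.85 recall floor, leaving maximal precision headroom. A minimal sketch; variable names and the toy data are illustrative:

```python
def pick_threshold(scores, is_urgent, recall_floor=0.85):
    """Highest threshold whose recall on urgent (P0/P1) tickets >= floor.

    Sweeps candidate thresholds from high to low; higher thresholds flag
    fewer tickets, so the first one meeting the floor maximizes precision
    headroom while honoring the SLA-driven recall target.
    """
    total_urgent = sum(is_urgent)
    for t in sorted(set(scores), reverse=True):
        flagged_urgent = sum(u for s, u in zip(scores, is_urgent) if s >= t)
        if flagged_urgent / total_urgent >= recall_floor:
            return t
    return min(scores)  # fall back to flagging everything

scores = [0.9, 0.8, 0.7, 0.6, 0.2]
is_urgent = [1, 1, 1, 1, 0]
```

In production the chosen threshold would be monitored online, since score drift (e.g., new ticket types) silently erodes the recall guarantee.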