## Product Context
Sparksoft Support Cloud routes inbound customer conversations to the best help content, automation, or human queue. The platform serves enterprise support teams that need fast responses while keeping inference and infrastructure costs predictable.
## Scale
| Signal | Value |
|---|---|
| Enterprise agents supported | 85K |
| End customers served monthly | 120M |
| Peak inbound conversation QPS | 18K requests/sec |
| Daily support events | 900M (messages, clicks, status changes) |
| Knowledge base size | 14M articles/macros/past resolutions |
| Active routing targets | 35K queues, bots, workflows |
| p99 latency budget | 180ms end-to-end |
| Availability target | 99.95% |
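A quick back-of-envelope check of these figures can anchor the sizing discussion. The sketch below derives a few rates purely from the numbers in the table; the derived values are rough estimates, not additional requirements:

```python
# Back-of-envelope sizing from the scale table above.
# Inputs are taken directly from the table; derived figures are estimates.

PEAK_QPS = 18_000            # peak inbound conversation requests/sec
DAILY_EVENTS = 900_000_000   # messages, clicks, status changes per day
P99_BUDGET_MS = 180          # end-to-end p99 latency budget
AVAILABILITY = 0.9995        # availability target

avg_events_per_sec = DAILY_EVENTS / 86_400
peak_to_avg = PEAK_QPS / avg_events_per_sec
downtime_min_per_month = (1 - AVAILABILITY) * 30 * 24 * 60

print(f"avg event rate:      {avg_events_per_sec:,.0f} events/sec")
print(f"peak vs. avg stream: {peak_to_avg:.2f}x")
print(f"allowed downtime:    {downtime_min_per_month:.1f} min/month")
```

The roughly 21-minute monthly downtime allowance implied by 99.95% is what makes the deterministic-fallback constraint below load-bearing rather than optional.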
## Task
Design an end-to-end ML system for Sparksoft Support Cloud that, for each incoming customer message, selects the best next action: retrieve relevant help content, rank likely resolution paths, and optionally re-rank for business rules such as SLA priority, language, and compliance. Your design should explicitly balance cost, performance, and reliability rather than optimizing only model quality.
Address the following:
- Clarify the product objective, prediction target, and success metrics for automated support triage.
- Propose a multi-stage architecture (retrieval → ranking → re-ranking) and explain which parts run online vs. batch.
- Size the system and give a latency and cost budget across stages, including feature serving and model inference.
- Choose models for each stage and justify why they fit the scale and cost constraints.
- Define offline and online evaluation, including how you would measure cost-efficiency and guard against regressions.
- Identify key failure modes such as feature drift, training-serving skew, stale indexes, and degraded fallbacks.
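One way to make the staged latency budget concrete is a small pipeline sketch. The stage names match the architecture above, but the per-stage budgets and candidate counts are illustrative assumptions a candidate would justify, not part of the spec:

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Stage:
    name: str
    budget_ms: float          # share of the 180 ms end-to-end p99 budget
    out_candidates: int       # candidates handed to the next stage
    run: Callable[[List[str]], List[str]]

def make_pipeline() -> List[Stage]:
    # Illustrative budget split; real numbers would come from profiling.
    # The remaining ~50 ms covers feature fetch, network, and serialization.
    return [
        Stage("retrieval",  40, 500, lambda c: c[:500]),  # ANN over 14M docs, CPU
        Stage("ranking",    60,  50, lambda c: c[:50]),   # lightweight scorer, CPU
        Stage("re-ranking", 30,  10, lambda c: c[:10]),   # SLA / language / compliance rules
    ]

def total_model_budget_ms(pipeline: List[Stage]) -> float:
    return sum(s.budget_ms for s in pipeline)
```

Writing the budget down this way forces the funnel shape (wide, cheap retrieval; narrow, costlier re-ranking) to be explicit before any model choices are made.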
## Constraints
- 40% of requests come from the top 200 enterprise tenants, creating strong traffic skew.
- New knowledge-base articles must become retrievable within 10 minutes.
- Some tenants prohibit cross-tenant training data leakage and require regional data residency.
- GPU capacity is limited; most serving must run on CPU, with selective use of heavier models.
- The system must degrade gracefully to deterministic routing rules if ML components are slow or unavailable.
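The last constraint can be sketched as a deadline-bounded call with a rule-based fallback. This is a minimal illustration, assuming a hypothetical `ml_ranker` callable and a 150 ms deadline carved out of the 180 ms budget; neither name nor number comes from the spec:

```python
import concurrent.futures

def deterministic_route(message: dict) -> str:
    # Rule-based fallback: e.g. a tenant-configured default queue.
    return message.get("default_queue", "general-support")

def route_with_fallback(message: dict, ml_ranker, deadline_s: float = 0.15) -> str:
    """Return the ML ranker's choice, or the deterministic route if it is slow or fails."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(ml_ranker, message)
        try:
            return future.result(timeout=deadline_s)
        except Exception:  # timeout or ranker error
            return deterministic_route(message)
```

Note that the worker thread still runs to completion after a timeout; a production version would put the deadline on the RPC itself (or use a cancellable call) rather than abandoning a thread per request.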