Product Context
ShopNow is a large e-commerce marketplace whose search and recommendation surfaces use an ML ranking system to order products for shoppers. The team wants a robust post-deployment monitoring design that catches feature drift, model-quality regressions, and training-serving skew before they materially hurt revenue.
Scale
| Signal | Value |
|---|---|
| DAU | 35M |
| Peak ranking QPS | 180K requests/sec |
| Active product catalog | 120M SKUs |
| Candidate set per request | ~5K retrieved → 300 ranked → 40 re-ranked |
| End-to-end p99 latency budget | 120ms |
| New / updated SKUs per day | 9M |
| Daily impression events | ~4.5B |
Task
Design the end-to-end ML system and monitoring strategy for this ranking stack after deployment. Address the following:
- Clarify the product objective, prediction target, and what “model quality” means online versus offline.
- Propose a multi-stage architecture (retrieval → ranking → re-ranking) and explain which features are computed in batch, in near-real-time, and fully online.
- Design a monitoring framework for feature drift, label drift, calibration, delayed feedback, and training-serving skew across the pipeline (a minimal drift-check sketch follows this list).
- Define how you would evaluate the system offline and online, including alert thresholds, dashboards, and rollback criteria.
- Identify likely failure modes at this scale and explain how the system degrades safely when features or models are stale, missing, or unhealthy (a staleness-fallback sketch also follows this list).
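For the monitoring bullet, here is a minimal sketch of a per-feature Population Stability Index (PSI) check, assuming reference samples are snapshotted at training time and live values are sampled from serving logs; the bin count, window sizes, and `ALERT_THRESHOLD` are illustrative placeholders rather than tuned values.

```python
import numpy as np

def psi(expected: np.ndarray, observed: np.ndarray, bins: int = 10) -> float:
    """Population Stability Index between a training-time reference sample
    and a window of live serving values for one feature."""
    # Bin edges come from the reference distribution, so the comparison is
    # anchored to what the model was trained on.
    edges = np.quantile(expected, np.linspace(0, 1, bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf
    ref_frac = np.histogram(expected, edges)[0] / len(expected)
    live_frac = np.histogram(observed, edges)[0] / len(observed)
    # Clip away empty bins to avoid log(0) and division by zero.
    eps = 1e-6
    ref_frac = np.clip(ref_frac, eps, None)
    live_frac = np.clip(live_frac, eps, None)
    return float(np.sum((live_frac - ref_frac) * np.log(live_frac / ref_frac)))

# Conventional rule of thumb: < 0.1 stable, 0.1-0.25 investigate, > 0.25 alert.
ALERT_THRESHOLD = 0.25

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    reference = rng.normal(0.0, 1.0, 50_000)   # training-time snapshot
    live = rng.normal(0.4, 1.0, 50_000)        # drifted serving window
    score = psi(reference, live)
    print(f"PSI={score:.3f} alert={score > ALERT_THRESHOLD}")
```

The 0.1 / 0.25 bands are only a starting point; at ~4.5B daily impressions, per-feature thresholds would need validation against historical false-alarm rates before they page anyone.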
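For the safe-degradation bullet, a sketch of per-feature staleness fallbacks; the `MAX_AGE_SECONDS` budgets and field names are assumptions chosen to line up with the 10-minute freshness constraint below.

```python
import time
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class FeatureRecord:
    value: float
    updated_at: float  # unix seconds of the last successful refresh

# Illustrative staleness budgets per feature: price/inventory track the
# business's 10-minute freshness requirement; slow aggregates get more slack.
MAX_AGE_SECONDS = {"price": 600, "inventory": 600, "ctr_7d": 86_400}

def resolve_feature(name: str, record: Optional[FeatureRecord],
                    default: float, now: Optional[float] = None) -> Tuple[float, bool]:
    """Return (value, degraded). Missing or stale features fall back to a safe
    default so the ranker degrades gracefully instead of failing the request."""
    now = time.time() if now is None else now
    if record is None or now - record.updated_at > MAX_AGE_SECONDS.get(name, 3600):
        # Increment a degradation counter here; a sustained degraded fraction
        # above some budget should trip the same alerting path as drift.
        return default, True
    return record.value, False
```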
Constraints
- p99 latency must stay under 120ms globally.
- The business requires inventory and price changes to be reflected in ranking within 10 minutes.
- Some labels are delayed: purchases can occur hours after the impression (see the label-join sketch after this list).
- Cost matters: the online ranker must run primarily on CPU, with a limited GPU budget reserved for offline training only.
- The system must support auditable monitoring for regulated categories where ranking changes can affect seller exposure.
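To make the delayed-label constraint concrete, a small sketch of an attribution-window label join with a watermark: an impression is only emitted as a training example once its window has fully closed, so late purchases cannot silently flip labels afterward. The 24-hour window and all names here are hypothetical.

```python
from datetime import datetime, timedelta

# Hypothetical attribution window: a purchase within 24h of the impression
# counts as a positive; too short a window biases labels negative.
ATTRIBUTION_WINDOW = timedelta(hours=24)

def label_impressions(impressions, purchases, now):
    """Yield (impression_id, label) for impressions whose attribution window
    has closed. impressions: (id, user_id, sku, ts); purchases: (user_id, sku, ts)."""
    purchase_index = {}
    for user_id, sku, ts in purchases:
        purchase_index.setdefault((user_id, sku), []).append(ts)

    for imp_id, user_id, sku, ts in impressions:
        if now - ts < ATTRIBUTION_WINDOW:
            continue  # window still open: hold back rather than emit a false negative
        hits = purchase_index.get((user_id, sku), [])
        label = any(ts <= p_ts <= ts + ATTRIBUTION_WINDOW for p_ts in hits)
        yield imp_id, int(label)

# Impression older than the window, purchased 2h later -> positive label.
imps = [("i1", "u1", "sku9", datetime(2024, 5, 1, 10, 0))]
buys = [("u1", "sku9", datetime(2024, 5, 1, 12, 0))]
print(list(label_impressions(imps, buys, now=datetime(2024, 5, 3))))  # [('i1', 1)]
```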