Design Feature Drift Monitoring System

Scenario

You are supporting a production ML ranking system for a high-traffic digital platform where model quality directly affects revenue and user engagement. The system uses a mix of real-time and batch features from user behavior, item metadata, and contextual signals, and the team has seen silent regressions caused by upstream schema changes and shifting user behavior. You have been asked to design an end-to-end approach for detecting, diagnosing, and responding to feature drift before it materially harms model performance. The solution should work for both fast-changing online features and slower batch-computed features.

Scale

Signal	Value
DAU	18M
Peak prediction QPS	120K
Ranked items per request	150
Total model features	650
Real-time features	220
Batch features	430
Feature freshness SLA	real-time < 5 min; batch < 24 hr
p99 inference latency budget	120 ms
Training data retained	90 days
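
The scale figures imply a very large monitoring volume. The sketch below is a back-of-envelope calculation under two assumptions. First, every ranked item is scored with all 650 features at peak. Second, drift logging keeps 1 in 1,000 requests. The names `scoring_load` and `SAMPLE_ONE_IN` are hypothetical, and the sampling rate is an illustrative assumption rather than part of the scenario.

```python
# Back-of-envelope load implied by the scale table (peak figures).
PEAK_QPS = 120_000          # prediction requests per second
ITEMS_PER_REQUEST = 150     # candidates ranked per request
REALTIME_FEATURES = 220
BATCH_FEATURES = 430
TOTAL_FEATURES = REALTIME_FEATURES + BATCH_FEATURES  # 650

# Hypothetical: log 1 in 1,000 requests for drift statistics (assumption).
SAMPLE_ONE_IN = 1_000


def scoring_load(qps: int, items: int, features: int) -> tuple[int, int]:
    """Return (item scorings per second, feature values read per second)."""
    scorings = qps * items
    return scorings, scorings * features


scorings_per_s, values_per_s = scoring_load(PEAK_QPS, ITEMS_PER_REQUEST, TOTAL_FEATURES)
# Integer division keeps the result exact (no float rounding).
sampled_values_per_s = values_per_s // SAMPLE_ONE_IN

print(f"item scorings/s:           {scorings_per_s:,}")
print(f"feature values/s:          {values_per_s:,}")
print(f"sampled values/s (1/1000): {sampled_values_per_s:,}")
```

At roughly 11.7 billion feature values per second, exhaustive logging is impractical, so any drift design for this system has to decide what to sample and where to aggregate.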

Question

How would you design the production ML system so feature drift monitoring is a first-class part of the end-to-end architecture, including how data is generated, served, evaluated, alerted on, and used to trigger mitigation or rollback decisions?
