You are responsible for a deployed ML ranking system for sponsored content in a large consumer app. The model scores ad candidates in real time using user, ad, and context features, and its output directly affects revenue and user experience. Recently, performance has become unstable after upstream schema changes and shifting traffic patterns, and leadership wants a robust way to detect feature drift before it causes business impact. You need to design monitoring that works for both fast-changing online features and slower batch features without creating excessive alert noise.
| Signal | Value |
|---|---|
| DAU | 45M |
| Peak ranking QPS | 220K requests/sec |
| Candidates scored per request | 120 |
| Active ad catalog | 18M ads |
| Distinct model features | 350 |
| p99 latency budget | 120 ms end-to-end |
| Training data volume | 2.5B scored impressions/day |
How would you design the end-to-end system to monitor feature drift in this deployed ranking stack, including how drift signals are computed, where they live in the architecture, and how they connect to retraining, alerting, rollback, and overall model health evaluation?