## Product Context
AdStream is a large ad network serving sponsored search and display ads across publisher apps and websites. You need to design an ML system that detects and filters fraudulent ad clicks in real time before they are billed to advertisers or used for downstream optimization.
## Scale
| Signal | Value |
|---|---|
| DAU | 120M users |
| Peak ad impression QPS | 900K |
| Peak click-event QPS | 85K |
| Advertisers | 1.8M active |
| Publishers | 220K active |
| Devices / browsers seen per day | 300M+ |
| End-to-end decision latency budget | 50ms p99 |
| Historical labeled events | ~30B clicks over 180 days |
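The per-device cardinality in the table dominates the feature store's memory footprint. A rough sizing sketch (the 64 bytes per key is an illustrative assumption, not a given):

```python
# Back-of-envelope sizing for hot per-device streaming state.
# BYTES_PER_KEY is an assumption: a handful of windowed counters
# (click velocity, fingerprint frequency) packed per device key.
DEVICES_PER_DAY = 300_000_000   # devices/browsers seen per day, from the table
BYTES_PER_KEY = 64              # assumed counter payload per key

state_gb = DEVICES_PER_DAY * BYTES_PER_KEY / 1e9
print(f"~{state_gb:.0f} GB of hot per-device state")  # ~19 GB
```

At this size the hot state fits in a small sharded in-memory store, which is what makes the "minimal per-event state lookups" constraint below achievable.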
## Task
- Clarify the product goal and define what counts as fraudulent vs suspicious vs unknown traffic.
- Design the end-to-end architecture for real-time scoring, filtering, and post-hoc investigation.
- Propose a multi-stage decision system (fast rules / retrieval / ranking / re-review) and justify model choices at each stage.
- Explain the data pipeline, labels, feature store design, and how you avoid training-serving skew.
- Define offline and online evaluation, including delayed labels, false-positive cost, and advertiser trust guardrails.
- Identify major failure modes, monitoring, and rollback strategies.
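One way to make the multi-stage decision system concrete is a cheap-rules-first cascade with asymmetric thresholds. The stage names, thresholds, and `ClickEvent` fields below are illustrative assumptions, not a prescribed design:

```python
# Sketch of a multi-stage decision cascade: deterministic rules first,
# then a model score, with a "hold for review" band because false
# positives (blocking real clicks) are expensive. All thresholds are
# hypothetical placeholders to be tuned against advertiser-trust guardrails.
from dataclasses import dataclass
from enum import Enum

class Verdict(Enum):
    BLOCK = "block"   # filtered before billing
    ALLOW = "allow"   # billed normally
    HOLD = "hold"     # billed provisionally, queued for post-hoc review

@dataclass
class ClickEvent:
    ip_velocity_1m: int       # clicks from this IP in the last minute
    publisher_anomaly: float  # streaming anomaly score for the publisher
    model_score: float        # model-estimated probability of fraud

def decide(click: ClickEvent) -> Verdict:
    # Stage 1: cheap deterministic rules catch obvious abuse in microseconds.
    if click.ip_velocity_1m > 100:
        return Verdict.BLOCK
    # Stage 2: model score with asymmetric thresholds -- only very
    # confident predictions block outright.
    if click.model_score > 0.95:
        return Verdict.BLOCK
    # Stage 3: ambiguous traffic is held, not blocked, and routed to
    # the investigation queue where delayed labels can resolve it.
    if click.model_score > 0.60 or click.publisher_anomaly > 3.0:
        return Verdict.HOLD
    return Verdict.ALLOW
```

The HOLD band is what reconciles delayed, noisy labels with the tight latency budget: billing is provisional, and the verdict can be revised once chargebacks or manual reviews arrive.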
## Constraints
- Fraud labels are delayed and noisy: chargebacks, manual reviews, and advertiser complaints may arrive days later.
- False positives are expensive because blocking legitimate clicks hurts publisher revenue and advertiser delivery.
- Some features must be computed in streaming fashion (IP velocity, device fingerprint frequency, publisher-level anomaly rates).
- The system must support regional compliance requirements; raw IPs and user identifiers may have retention limits.
- Cost matters: the highest-volume path should run primarily on CPU, with minimal per-event state lookups.
- The platform must continue serving ads even if the ML scorer is degraded; define safe fallbacks.