## Business Context
You’re working on AdPulse, a large-scale ads marketplace powering sponsored listings across a global e-commerce app (≈ 45M DAUs, 8–12B ad impressions/day, peak 250K requests/sec). The ranking stack uses predicted click-through rate (CTR) as a key input to expected revenue and relevance. A 0.1% relative improvement in calibrated CTR is estimated to be worth $3–5M/month due to better auction outcomes and fewer irrelevant ads.
The current CTR model is a legacy logistic regression trained on a random split. It performs well offline but regresses badly after launches, suggesting data leakage and distribution shift (new campaigns, new creatives, seasonal effects). You are asked to design a robust CTR prediction system suitable for production.
## Dataset
You have an event log of ad impressions with click labels, joined with user, ad, and context metadata.
| Feature Group | Approx. Count | Examples | Notes |
|---|---|---|---|
| User | 10 | user_country, device_os, account_age_days, historical_ctr_7d | Some are delayed aggregates |
| Ad / Campaign | 12 | advertiser_id, campaign_id, bid_cpc, objective_type | High-cardinality IDs |
| Creative | 8 | creative_id, format, image_hash, text_length | Frequent cold-start |
| Context | 15 | query_category, page_type, placement_id, hour_of_day, weekday | Strong seasonality |
| Interaction / Cross | 5 | user×category affinity, ad×placement stats | Must avoid leakage |
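The interaction and aggregate features above (e.g. user×category affinity, historical_ctr_7d) are where leakage most often creeps in. One leakage-safe pattern is a point-in-time computation: each impression's feature uses only strictly earlier events, never its own label. A minimal pure-Python sketch (the tuple layout and function name are illustrative, not from the spec):

```python
from collections import defaultdict

def add_prior_ctr(events):
    """events: list of (user_id, ts, clicked) tuples sorted by ts.
    Returns one prior-CTR value per impression, computed only from that
    user's strictly earlier impressions, so the current label can never
    leak into its own feature."""
    impressions = defaultdict(int)
    clicks = defaultdict(int)
    out = []
    for user, ts, clicked in events:
        # Feature is computed BEFORE the current event updates the counters.
        out.append(clicks[user] / impressions[user] if impressions[user] else 0.0)
        impressions[user] += 1
        clicks[user] += clicked
    return out
```

In production this becomes a point-in-time join against streamed aggregates, but the invariant is the same: counters are updated only after the feature is emitted.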
- Scale: ~6.5B impressions over 28 days; training sample budget for iteration: 200M rows/day (you may downsample negatives, but must justify the rate and correct the resulting probabilities).
- Target: clicked = 1 if a click occurs within 60 seconds of the impression, else 0.
- Class balance: Highly imbalanced — overall CTR ≈ 1.2% (varies by placement 0.2%–6%).
- Missingness: 20–30% missing in some aggregates for new users/ads; 5% missing device fields due to privacy settings.
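With overall CTR around 1.2%, negative downsampling is a natural way to fit the 200M rows/day budget, but it biases predicted probabilities upward. The standard correction (for a model trained with all positives and a fraction `neg_rate` of negatives) is a sketch worth knowing; the function name is mine:

```python
def recalibrate(p_hat: float, neg_rate: float) -> float:
    """Map a probability learned on negative-downsampled data back to the
    original scale. neg_rate is the fraction of negatives kept (e.g. 0.1).
    Derivation: sampling inflates the odds by 1/neg_rate, so we multiply
    the predicted odds by neg_rate and convert back to a probability."""
    return p_hat / (p_hat + (1.0 - p_hat) / neg_rate)
```

Without this (or an equivalent intercept/isotonic correction), downsampling alone would sink the ECE < 0.02 calibration target.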
## Success Criteria
- Offline: improve log loss by ≥ 1.5% relative vs baseline on a strictly time-based test set.
- Calibration: ECE < 0.02 overall and within key slices (top 10 placements, top 5 countries).
- Ranking utility: NDCG@K or AUC-PR improves meaningfully; additionally, click lift in the top-scored decile must exceed 2.5× baseline.
- Online readiness: model supports p99 < 10 ms inference in the ranking service (excluding network), and can be retrained daily.
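The ECE < 0.02 criterion needs a concrete estimator to be checkable per slice. A minimal equal-width-bin ECE (weighted mean of |observed rate − mean prediction| per bin) is sketched below; bin count and binning scheme are assumptions, and adaptive (equal-mass) bins are a common alternative for skewed CTR distributions:

```python
import numpy as np

def expected_calibration_error(y_true, p_pred, n_bins=20):
    """Equal-width-bin ECE: bucket predictions into n_bins bins on [0, 1],
    then take the impression-weighted mean of |click rate - mean prediction|
    across non-empty bins."""
    y_true = np.asarray(y_true, dtype=float)
    p_pred = np.asarray(p_pred, dtype=float)
    bins = np.clip((p_pred * n_bins).astype(int), 0, n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        mask = bins == b
        if mask.any():
            ece += mask.mean() * abs(y_true[mask].mean() - p_pred[mask].mean())
    return ece
```

Running the same estimator per placement and per country gives the required slice-level checks.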
## Constraints
- No leakage: any feature derived from future clicks, post-impression events, or aggregates that include the label window is disallowed.
- High-cardinality categoricals: millions of users/ads/creatives; memory and feature hashing choices matter.
- Cold start: new ads/creatives appear continuously; model must degrade gracefully.
- Interpretability: Ads policy team requires the ability to explain large score changes at least at feature-group level.
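For the high-cardinality ID constraint, feature hashing bounds memory at a fixed bucket count and handles never-before-seen IDs (cold start) without a vocabulary rebuild. A sketch, where the bucket count and the namespacing-by-feature-name scheme are my assumptions:

```python
import hashlib

HASH_DIM = 2 ** 22  # ~4M buckets; a tunable assumption, not from the spec.

def hashed_index(feature_name: str, value: str) -> int:
    """Map a (feature, value) pair to a stable bucket in [0, HASH_DIM).
    Prefixing with the feature name keeps e.g. user and ad IDs in separate
    namespaces; collisions within a feature are accepted as bounded noise."""
    key = f"{feature_name}={value}".encode()
    return int.from_bytes(hashlib.md5(key).digest()[:8], "big") % HASH_DIM
```

The trade-off to call out in the interview: hashing loses the exact inverse mapping, so feature-group-level attributions (required by the Ads policy team) should be tracked per namespace rather than per bucket.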
## Deliverables (what you must produce in the interview)
- A modeling approach (baseline + improved model), including feature representations for high-cardinality IDs.
- A training/validation/test strategy that prevents leakage and reflects production traffic.
- A plan for handling imbalance, missing data, and cold-start.
- An evaluation plan: metrics, slice analysis, and thresholding (if applicable).
- A production plan: serving architecture, feature freshness, retraining cadence, and monitoring (drift + calibration).
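For the training/validation/test deliverable, the spec's "strictly time-based test set" rules out random splits over the 28-day log. One concrete scheme (the day counts here are illustrative, not mandated) is a chronological block split:

```python
from datetime import date, timedelta

def time_based_split(start: date, n_days: int = 28,
                     val_days: int = 3, test_days: int = 3):
    """Chronological split over an n_days log: train on the earliest block,
    validate on the next, test on the most recent. Never shuffle across
    time, so delayed aggregates and seasonal shift are evaluated the way
    production will see them. Returns three (start, end) half-open ranges."""
    train_end = start + timedelta(days=n_days - val_days - test_days)
    val_end = train_end + timedelta(days=val_days)
    test_end = val_end + timedelta(days=test_days)
    return (start, train_end), (train_end, val_end), (val_end, test_end)
```

A rolling variant (re-splitting daily as new data lands) matches the daily retraining requirement and doubles as a backtest harness for the launch-regression problem described above.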