## Business Context
You’re working on AdPulse, a large-scale ads marketplace powering sponsored listings across a global e-commerce app (≈ 45M DAUs, 8–12B ad impressions/day, peak 250K requests/sec). The ranking stack uses predicted click-through rate (CTR) as a key input to expected revenue and relevance. A 0.1% relative improvement in calibrated CTR is estimated to be worth $3–5M/month due to better auction outcomes and fewer irrelevant ads.
The current CTR model is a legacy logistic regression trained on a random split. It performs well offline but regresses badly after launches, suggesting data leakage and distribution shift (new campaigns, new creatives, seasonal effects). You are asked to design a robust CTR prediction system suitable for production.
## Dataset
You have an event log of ad impressions with click labels, joined with user, ad, and context metadata.
| Feature Group | Approx. Count | Examples | Notes |
|---|---|---|---|
| User | 10 | user_country, device_os, account_age_days, historical_ctr_7d | Some are delayed aggregates |
| Ad / Campaign | 12 | advertiser_id, campaign_id, bid_cpc, objective_type | High-cardinality IDs |
| Creative | 8 | creative_id, format, image_hash, text_length | Frequent cold-start |
| Context | 15 | query_category, page_type, placement_id, hour_of_day, weekday | Strong seasonality |
| Interaction / Cross | 5 | user×category affinity, ad×placement stats | Must avoid leakage |
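The interaction and aggregate features above (e.g. user×category affinity, historical_ctr_7d) are where leakage most often creeps in. One leakage-safe pattern is a point-in-time computation: each impression's feature uses only strictly earlier events, never its own label. A minimal pure-Python sketch (the tuple layout and function name are illustrative, not from the spec):

```python
from collections import defaultdict

def add_prior_ctr(events):
    """events: list of (user_id, ts, clicked) tuples sorted by ts.
    Returns one prior-CTR value per impression, computed only from that
    user's strictly earlier impressions, so the current label can never
    leak into its own feature."""
    impressions = defaultdict(int)
    clicks = defaultdict(int)
    out = []
    for user, ts, clicked in events:
        # Feature is computed BEFORE the current event updates the counters.
        out.append(clicks[user] / impressions[user] if impressions[user] else 0.0)
        impressions[user] += 1
        clicks[user] += clicked
    return out
```

In production this becomes a point-in-time join against streamed aggregates, but the invariant is the same: counters are updated only after the feature is emitted.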
- Scale: ~6.5B impressions over 28 days; training sample budget for iteration: 200M rows/day (you may downsample negatives, but must justify the rate and correct the resulting probabilities).
- Target: clicked = 1 if a click occurs within 60 seconds of the impression, else 0.
- Class balance: Highly imbalanced — overall CTR ≈ 1.2% (varies by placement 0.2%–6%).
- Missingness: 20–30% missing in some aggregates for new users/ads; 5% missing device fields due to privacy settings.
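With overall CTR around 1.2%, negative downsampling is a natural way to fit the 200M rows/day budget, but it biases predicted probabilities upward. The standard correction (for a model trained with all positives and a fraction `neg_rate` of negatives) is a sketch worth knowing; the function name is mine:

```python
def recalibrate(p_hat: float, neg_rate: float) -> float:
    """Map a probability learned on negative-downsampled data back to the
    original scale. neg_rate is the fraction of negatives kept (e.g. 0.1).
    Derivation: sampling inflates the odds by 1/neg_rate, so we multiply
    the predicted odds by neg_rate and convert back to a probability."""
    return p_hat / (p_hat + (1.0 - p_hat) / neg_rate)
```

Without this (or an equivalent intercept/isotonic correction), downsampling alone would sink the ECE < 0.02 calibration target.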
## Success Criteria
- Offline: improve log loss by ≥ 1.5% relative vs baseline on a strictly time-based test set.
- Calibration: ECE < 0.02 overall and within key slices (top 10 placements, top 5 countries).
- Ranking utility: NDCG@K or AUC-PR improves meaningfully; additionally, click lift in the top-scored decile must exceed 2.5× baseline.
- Online readiness: model supports p99 < 10 ms inference in the ranking service (excluding network), and can be retrained daily.
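The ECE < 0.02 criterion needs a concrete estimator to be checkable per slice. A minimal equal-width-bin ECE (weighted mean of |observed rate − mean prediction| per bin) is sketched below; bin count and binning scheme are assumptions, and adaptive (equal-mass) bins are a common alternative for skewed CTR distributions:

```python
import numpy as np

def expected_calibration_error(y_true, p_pred, n_bins=20):
    """Equal-width-bin ECE: bucket predictions into n_bins bins on [0, 1],
    then take the impression-weighted mean of |click rate - mean prediction|
    across non-empty bins."""
    y_true = np.asarray(y_true, dtype=float)
    p_pred = np.asarray(p_pred, dtype=float)
    bins = np.clip((p_pred * n_bins).astype(int), 0, n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        mask = bins == b
        if mask.any():
            ece += mask.mean() * abs(y_true[mask].mean() - p_pred[mask].mean())
    return ece
```

Running the same estimator per placement and per country gives the required slice-level checks.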
## Constraints
- No leakage: any feature derived from future clicks, post-impression events, or aggregates that include the label window is disallowed.
- High-cardinality categoricals: millions of users/ads/creatives; memory and feature hashing choices matter.
- Cold start: new ads/creatives appear continuously; model must degrade gracefully.
- Interpretability: Ads policy team requires the ability to explain large score changes at least at feature-group level.
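For the high-cardinality ID constraint, feature hashing bounds memory at a fixed bucket count and handles never-before-seen IDs (cold start) without a vocabulary rebuild. A sketch, where the bucket count and the namespacing-by-feature-name scheme are my assumptions:

```python
import hashlib

HASH_DIM = 2 ** 22  # ~4M buckets; a tunable assumption, not from the spec.

def hashed_index(feature_name: str, value: str) -> int:
    """Map a (feature, value) pair to a stable bucket in [0, HASH_DIM).
    Prefixing with the feature name keeps e.g. user and ad IDs in separate
    namespaces; collisions within a feature are accepted as bounded noise."""
    key = f"{feature_name}={value}".encode()
    return int.from_bytes(hashlib.md5(key).digest()[:8], "big") % HASH_DIM
```

The trade-off to call out in the interview: hashing loses the exact inverse mapping, so feature-group-level attributions (required by the Ads policy team) should be tracked per namespace rather than per bucket.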
## Deliverables (what you must produce in the interview)
- A modeling approach (baseline + improved model), including feature representations for high-cardinality IDs.
- A training/validation/test strategy that prevents leakage and reflects production traffic.
- A plan for handling imbalance, missing data, and cold-start.
- An evaluation plan: metrics, slice analysis, and thresholding (if applicable).
- A production plan: serving architecture, feature freshness, retraining cadence, and monitoring (drift + calibration).
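For the training/validation/test deliverable, the spec's "strictly time-based test set" rules out random splits over the 28-day log. One concrete scheme (the day counts here are illustrative, not mandated) is a chronological block split:

```python
from datetime import date, timedelta

def time_based_split(start: date, n_days: int = 28,
                     val_days: int = 3, test_days: int = 3):
    """Chronological split over an n_days log: train on the earliest block,
    validate on the next, test on the most recent. Never shuffle across
    time, so delayed aggregates and seasonal shift are evaluated the way
    production will see them. Returns three (start, end) half-open ranges."""
    train_end = start + timedelta(days=n_days - val_days - test_days)
    val_end = train_end + timedelta(days=val_days)
    test_end = val_end + timedelta(days=test_days)
    return (start, train_end), (train_end, val_end), (val_end, test_end)
```

A rolling variant (re-splitting daily as new data lands) matches the daily retraining requirement and doubles as a backtest harness for the launch-regression problem described above.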