Business Context
The Google Ads ranking team wants a click-through rate (CTR) model for Search ads that can be retrained weekly and scored online during ad serving. You are given a candidate feature set from logs, advertiser metadata, and query context, and your task is to decide which features should be included in the first production model.
Dataset
The training data is built at the impression level from the last 90 days of U.S. English Search traffic.
| Feature Group | Count | Examples |
|---|---|---|
| Query context | 12 | query_length, query_category, device_type, hour_of_day |
| Ad & campaign metadata | 15 | campaign_type, bidding_strategy, ad_format, advertiser_vertical |
| Historical performance aggregates | 18 | ad_ctr_7d, campaign_ctr_30d, advertiser_spend_7d, quality_score_bucket |
| User/context signals | 10 | geo_region, is_signed_in, browser_family, prior_search_count_24h |
| Text-derived features | 8 | headline_length, keyword_overlap_ratio, landing_page_lang_match |
- Rows: 42M impressions, 63 candidate features
- Target: Binary click label per impression
- Positive rate: 6.4% clicked, 93.6% not clicked
- Missing data: 20% missing in some advertiser-history features for new campaigns; 3% missing in user-context fields due to logging gaps
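Before any modeling, it is worth verifying the class balance and missingness figures above directly from the logs. A minimal pandas sketch, assuming the impressions land in a single Parquet file with a `clicked` label column (the file name and column names are placeholders for whatever the real log schema uses):

```python
import pandas as pd

# Placeholder file and column names; the real log schema will differ.
impressions = pd.read_parquet("impressions_90d.parquet")

# Class balance: expect roughly 6.4% positives at the impression level.
print(f"positive rate: {impressions['clicked'].mean():.3f}")

# Missingness per feature, to separate structural gaps (new campaigns
# with no history) from logging gaps in the user-context fields.
missing = impressions.isna().mean().sort_values(ascending=False)
print(missing.head(15))

# Advertiser-history features should show ~20% missingness, driven by
# new campaigns; materially higher rates would suggest a pipeline bug
# rather than genuine cold start.
hist_cols = ["ad_ctr_7d", "campaign_ctr_30d", "advertiser_spend_7d"]
print(impressions[hist_cols].isna().mean())
```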
Success Criteria
A good solution should:
- reduce log loss by at least 3% relative to a frequency-based baseline,
- reach at least 0.76 AUC-ROC and 0.22 PR-AUC on a held-out week,
- produce a feature set that is stable, explainable, and safe for online serving.
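The first two criteria are directly checkable with scikit-learn. A minimal sketch, assuming `y_train`/`y_valid` are 0-1 click label arrays from a time-based split and `p_valid` holds the candidate model's predicted click probabilities (all three names are placeholders):

```python
import numpy as np
from sklearn.metrics import log_loss, roc_auc_score, average_precision_score

# Frequency baseline: predict the training-set CTR for every impression.
p_base = np.full_like(y_valid, y_train.mean(), dtype=float)

baseline_ll = log_loss(y_valid, p_base)
model_ll = log_loss(y_valid, p_valid)
print(f"log-loss lift: {(baseline_ll - model_ll) / baseline_ll:.1%}")  # target >= 3%

print("AUC-ROC:", roc_auc_score(y_valid, p_valid))            # target >= 0.76
print("PR-AUC :", average_precision_score(y_valid, p_valid))  # target >= 0.22
```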
Constraints
- Online inference budget: p95 < 15 ms in the Google Ads serving stack
- Avoid leakage: exclude post-impression signals and any aggregate whose window includes the impression itself or later traffic
- Prefer features that are robust for cold-start advertisers and new queries
- The final model should remain interpretable enough to support feature reviews and policy audits
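The leakage constraint is the easiest to violate silently: an aggregate like ad_ctr_7d leaks if its window includes the impression it is scoring. One way to keep such aggregates point-in-time correct is to compute them from strictly earlier rows only. The sketch below does this for a lifetime prior CTR; a true 7-day window would need a time-based rolling with `closed="left"`, which follows the same exclude-the-current-row principle. File and column names are placeholders:

```python
import pandas as pd

# Placeholder schema: one row per impression with a timestamp, an
# ad_id, and the 0-1 click label, sorted by time.
logs = pd.read_parquet("impression_log.parquet").sort_values("timestamp")

g = logs.groupby("ad_id")["clicked"]

# CTR over all of an ad's traffic strictly *before* each impression:
# cumsum includes the current row, so subtract its label; cumcount is
# the number of prior rows. The feature never sees the label of the
# impression it will be used to score.
prior_clicks = g.cumsum() - logs["clicked"]
prior_views = g.cumcount()
logs["ad_ctr_prior"] = (prior_clicks / prior_views).where(prior_views > 0)

# Cold-start ads get NaN by construction -- exactly the rows where the
# constraints above call for robust fallbacks rather than fabricated
# history.
```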
Deliverables
- Propose a feature selection strategy for this product setting, including how you would screen for leakage, redundancy, and instability (a selection sketch follows this list).
- Build a baseline and a regularized model using the selected features.
- Compare feature importance methods and justify which features you would keep, transform, or drop (see the importance-comparison sketch below).
- Evaluate the model on a time-based split and recommend a production-ready feature set.
- Explain how you would monitor feature drift and retrain the model after launch (a PSI sketch below illustrates one monitoring option).
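For the selection deliverable, a minimal sketch of one defensible pipeline: a redundancy screen on rank correlation followed by an embedded L1 selector. Here `X_train`/`y_train` are the earlier weeks of the 90-day window, the 0.95 threshold and `C=0.1` are illustrative choices, and at 42M rows you would likely subsample or switch to `solver="saga"`:

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# 1) Redundancy screen: drop one of any near-duplicate numeric pair
#    (|Spearman rho| > 0.95), keeping the first of each pair.
corr = X_train.corr(method="spearman").abs()
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
drop = [c for c in upper.columns if (upper[c] > 0.95).any()]
X_train_sel = X_train.drop(columns=drop)

# 2) L1-regularized logistic regression: sparse coefficients act as an
#    embedded selector while staying cheap to serve and easy to audit.
model = make_pipeline(
    SimpleImputer(strategy="median"),  # history features are ~20% missing
    StandardScaler(),
    LogisticRegression(penalty="l1", solver="liblinear", C=0.1),
)
model.fit(X_train_sel, y_train)

coef = model.named_steps["logisticregression"].coef_.ravel()
kept = X_train_sel.columns[coef != 0]
print(f"{len(kept)} of {X_train_sel.shape[1]} features survive L1")
```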
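For the importance-comparison deliverable, permutation importance on the held-out week gives a model-agnostic second opinion on the L1 coefficients; the sketch reuses `model` and `X_train_sel` from above, with `X_valid`/`y_valid` as the held-out week and `n_repeats=5` as an illustrative setting:

```python
from sklearn.inspection import permutation_importance

# Permutation importance measured on log loss: drop in score when a
# feature's values are shuffled on the held-out week.
result = permutation_importance(
    model, X_valid[X_train_sel.columns], y_valid,
    scoring="neg_log_loss", n_repeats=5, random_state=0,
)

ranked = sorted(
    zip(X_train_sel.columns, result.importances_mean),
    key=lambda kv: kv[1], reverse=True,
)
for name, imp in ranked[:10]:
    print(f"{name:30s} {imp:+.5f}")

# Features ranking high on one method but near zero on the other
# deserve manual review before shipping: the disagreement often points
# at correlated pairs or leakage.
```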
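For the drift-monitoring deliverable, one common choice is the Population Stability Index (PSI) per feature, comparing the training-time distribution against a recent serving-time sample. A minimal sketch; the bin count and the decision thresholds are conventions, not guarantees:

```python
import numpy as np

def psi(expected: np.ndarray, actual: np.ndarray, bins: int = 10) -> float:
    """PSI between a training-time reference distribution and a
    serving-time sample of one (non-constant) numeric feature."""
    # Quantile bin edges from the reference; np.unique guards against
    # duplicate edges on discrete features.
    edges = np.unique(np.quantile(expected, np.linspace(0, 1, bins + 1)))
    edges[0], edges[-1] = -np.inf, np.inf  # catch out-of-range values
    e = np.histogram(expected, bins=edges)[0] / len(expected)
    a = np.histogram(actual, bins=edges)[0] / len(actual)
    e, a = np.clip(e, 1e-6, None), np.clip(a, 1e-6, None)
    return float(np.sum((a - e) * np.log(a / e)))

# Example usage on one feature (frame names are placeholders):
# psi(train_df["ad_ctr_7d"].dropna().to_numpy(),
#     serving_df["ad_ctr_7d"].dropna().to_numpy())
#
# Rule of thumb: PSI < 0.1 stable, 0.1-0.25 investigate, > 0.25
# significant drift -- trigger an off-cycle retrain or a feature review.
```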