Business Context
Google Ads wants to predict whether a search ad impression will receive a click so bidding and ranking systems can use calibrated click-through-rate estimates. You are given an offline training dataset and asked to evaluate whether the current model suffers more from high bias or high variance, then recommend changes.
Dataset
| Feature Group | Count | Examples |
|---|---|---|
| Query and ad text features | 12 | query_length, ad_title_length, keyword_match_type, semantic_similarity_score |
| Auction context | 9 | device_type, country, hour_of_day, ad_position, page_type |
| Historical performance | 8 | advertiser_ctr_7d, campaign_ctr_30d, quality_score, conversion_rate_30d |
| User and session signals | 6 | returning_user, prior_searches_24h, signed_in, browser_family |
| Engineered interaction features | 5 | query_x_device, position_x_quality_score, hour_bucket |
- Size: 2.4M ad impressions, 40 features
- Target: Binary label indicating whether the impression was clicked
- Class balance: 11.6% clicked, 88.4% not clicked
- Missing data: ~7% missing in historical advertiser features for new campaigns; <2% missing in some session fields
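
For a calibration reference, a constant predictor that always outputs the 11.6% base rate already achieves a log loss of about 0.359 on this class balance, so the 0.31 target under Success Criteria demands a genuine improvement over the prior. A quick check:

```python
import numpy as np

# Log loss of a constant predictor that always outputs the 11.6% base rate.
# Any model worth shipping must beat this floor; the 0.31 success criterion
# below implies roughly a 14% relative improvement over it.
p = 0.116
baseline_log_loss = -(p * np.log(p) + (1 - p) * np.log(1 - p))
print(f"base-rate log loss: {baseline_log_loss:.4f}")  # ~0.3589
```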
Success Criteria
A strong solution should:
- correctly diagnose bias vs variance using train/validation/test behavior,
- compare at least two model families or complexity settings,
- use cross-validation and learning curves rather than a single split (a minimal sketch follows this list),
- recommend concrete actions that improve generalization,
- achieve log loss < 0.31 and AUC-ROC > 0.78 on the held-out test set.
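
The sketch below shows one way to produce those learning curves with scikit-learn. It assumes a feature matrix `X` (2.4M × 40, already numerically encoded and imputed) and a binary label vector `y`; the logistic-regression estimator and fold count are illustrative choices, not part of the brief.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, learning_curve

# X: (2.4M, 40) numeric feature matrix, y: binary click labels (assumed given).
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
sizes, train_scores, val_scores = learning_curve(
    LogisticRegression(max_iter=1000),
    X,
    y,
    cv=cv,
    scoring="neg_log_loss",
    train_sizes=np.linspace(0.1, 1.0, 5),
    n_jobs=-1,
)

train_ll = -train_scores.mean(axis=1)  # mean log loss on the training folds
val_ll = -val_scores.mean(axis=1)      # mean log loss on the validation folds
# High bias: both curves plateau at a high loss with a small gap.
# High variance: low training loss and a persistent train/validation gap.
for n, t, v in zip(sizes, train_ll, val_ll):
    print(f"n={n:>9}  train={t:.4f}  val={v:.4f}  gap={v - t:.4f}")
```

Read the two curves together: if both plateau near the 0.359 base-rate floor, the model is underfitting; if validation loss stays well above a low training loss, regularization or more data will help more than added capacity.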
Constraints
- Inference must stay under 15 ms p95 in a Google Ads batch scoring service (a quick latency check is sketched after this list).
- The model should be explainable enough to justify major feature or regularization changes.
- Retraining happens daily, so tuning must be computationally practical.
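
A rough offline check of the latency budget, assuming `model` and a NumPy array `X_test` from the comparison sketch under Deliverables; production p95 numbers must come from the serving stack itself, so treat this only as a pre-flight sanity check.

```python
import time

import numpy as np

# Pre-flight p95 latency check for single-impression scoring. `model` and
# `X_test` (a NumPy array) are assumed from the comparison sketch under
# Deliverables; the serving stack is the source of truth for real numbers.
latencies_ms = []
for row in X_test[:2000]:
    start = time.perf_counter()
    model.predict_proba(row.reshape(1, -1))
    latencies_ms.append((time.perf_counter() - start) * 1e3)

print(f"p95 single-row latency: {np.percentile(latencies_ms, 95):.2f} ms")
```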
Deliverables
- Build a baseline and at least one higher-capacity model (see the comparison sketch after this list).
- Use training/validation curves to determine whether errors are caused by underfitting or overfitting.
- Quantify the impact of regularization, feature engineering, and model complexity.
- Report final offline metrics and explain the chosen operating point.
- Recommend production changes for improving the bias-variance trade-off.
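
As a starting point, the sketch below pairs a regularized linear baseline with gradient-boosted trees. Both model choices, the median-imputation strategy, and the hyperparameters are illustrative assumptions, not the required solution; `HistGradientBoostingClassifier` handles the missing historical-advertiser values natively, while the linear pipeline imputes them first.

```python
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# X, y as in the learning-curve sketch; a stratified split preserves the
# 11.6% / 88.4% class balance in the held-out set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

models = {
    "baseline: logistic regression": make_pipeline(
        SimpleImputer(strategy="median"),  # fills the ~7% missing history
        LogisticRegression(C=1.0, max_iter=1000),
    ),
    "higher capacity: boosted trees": HistGradientBoostingClassifier(
        max_iter=300, learning_rate=0.05, l2_regularization=1.0, random_state=0
    ),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    p_train = model.predict_proba(X_train)[:, 1]
    p_test = model.predict_proba(X_test)[:, 1]
    # A large train/test gap signals variance; two similarly high losses signal bias.
    print(
        f"{name}: train LL={log_loss(y_train, p_train):.4f}  "
        f"test LL={log_loss(y_test, p_test):.4f}  "
        f"test AUC={roc_auc_score(y_test, p_test):.4f}"
    )
```

If the boosted trees close most of the gap to the 0.31 / 0.78 targets while the linear baseline plateaus, the dominant problem is bias; if the trees overfit (training loss far below test loss), tune `l2_regularization`, `max_leaf_nodes`, or early stopping before adding features.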