Business Context
Google Ads uses click-through rate (CTR) prediction models to rank and price sponsored results. You are asked to improve a binary classifier that predicts whether an impression will receive a click, while also reducing measurable bias against underrepresented advertiser and user segments.
Dataset
You are given a historical training set built from ad impression logs spanning 90 days.
| Feature Group | Count | Examples |
|---|---|---|
| Ad features | 12 | campaign_type, creative_format, bid_amount, ad_quality_score |
| Query/context | 10 | query_length, device_type, country, hour_of_day |
| User behavior aggregates | 8 | prior_ctr_7d, sessions_30d, conversion_rate_30d |
| Advertiser/account | 9 | vertical, account_age_days, spend_tier, region |
| Sensitive / audit-only attributes | 4 | user_gender, user_age_bucket, advertiser_size_bucket, market_tier |
- Rows: 24M ad impressions; columns: 39 model features plus 4 audit-only attributes (43 total, per the table above)
- Target: clicked (1 if clicked, 0 otherwise)
- Class balance: 6.1% positive, 93.9% negative
- Missing data: 9% missing in user aggregates for cold-start users, 4% missing in advertiser metadata, sparse long-tail categories in region and vertical
- Known issue: the training data over-represents large advertisers in Tier-1 markets and Android mobile traffic (a preprocessing and weighting sketch follows this list)
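A minimal sketch of how these dataset issues could feed the training pipeline, assuming a pandas DataFrame with the column names from the feature table. The median imputation, the rare-category cutoff of 1,000 rows, and the specific weighting scheme are illustrative choices, not requirements; note that the weights use audit-only attributes, which is the offline use permitted by the constraints below.

```python
import numpy as np
import pandas as pd

USER_AGGS = ["prior_ctr_7d", "sessions_30d", "conversion_rate_30d"]

def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    df = df.copy()
    # Cold-start users (~9% missing): impute user aggregates with the column
    # median and keep an explicit indicator so the model can learn the
    # cold-start regime rather than silently absorbing imputed values.
    for col in USER_AGGS:
        df[f"{col}_missing"] = df[col].isna().astype(np.int8)
        df[col] = df[col].fillna(df[col].median())
    # Sparse long-tail categories: collapse rare levels into a single bucket.
    for col in ["region", "vertical"]:
        counts = df[col].value_counts()
        rare = counts[counts < 1000].index  # illustrative cutoff
        df[col] = df[col].where(~df[col].isin(rare), "OTHER")
    return df

def sample_weights(df: pd.DataFrame) -> np.ndarray:
    # Class imbalance (6.1% positive): upweight clicks so both classes
    # contribute comparably to the loss; many GBDT libraries expose this
    # as a scale_pos_weight parameter instead.
    pos_rate = df["clicked"].mean()
    w_class = np.where(df["clicked"] == 1, (1 - pos_rate) / pos_rate, 1.0)
    # Segment skew: inverse-frequency weights over the over-represented
    # strata (advertiser size x market tier) so no stratum dominates.
    strata = (df["advertiser_size_bucket"].astype(str) + "|"
              + df["market_tier"].astype(str))
    freq = strata.map(strata.value_counts(normalize=True))
    w_segment = 1.0 / (freq * strata.nunique())
    return (w_class * w_segment).to_numpy()
```

Keeping the missingness indicators as features lets the model treat cold-start users as their own regime, which tends to matter for the per-segment calibration target below.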
Success Criteria
A good solution should improve fairness without causing unacceptable ranking degradation:
- PR-AUC drop must be less than 2% relative to the current production baseline
- The worst-group false negative rate (FNR) gap across audit groups must be reduced by at least 30%
- Calibration error for each major segment must remain below 0.03 (a measurement sketch for both criteria follows this list)
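A sketch of how these two fairness criteria could be measured offline. The 0.5 decision threshold, the ten-bin calibration estimate, and the array-based interface are assumptions for illustration:

```python
import numpy as np
import pandas as pd

def false_negative_rate(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    # Fraction of actual clicks the model predicted as non-clicks.
    pos = y_true == 1
    return float((y_pred[pos] == 0).mean()) if pos.any() else float("nan")

def worst_group_fnr_gap(y_true, y_score, groups, threshold=0.5) -> float:
    # Gap between the best and worst audit group's FNR at a shared threshold.
    y_pred = (y_score >= threshold).astype(int)
    rates = pd.Series(
        {g: false_negative_rate(y_true[groups == g], y_pred[groups == g])
         for g in np.unique(groups)}
    ).dropna()
    return float(rates.max() - rates.min())

def expected_calibration_error(y_true, y_score, n_bins=10) -> float:
    # |mean predicted CTR - observed CTR| per score bin, weighted by bin
    # size; apply per segment to check the 0.03 bound above.
    bins = np.clip((y_score * n_bins).astype(int), 0, n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        mask = bins == b
        if mask.any():
            ece += mask.mean() * abs(y_score[mask].mean() - y_true[mask].mean())
    return float(ece)
```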
Constraints
- Batch retraining on Vertex AI once per day; online inference latency under 20 ms at p95
- Sensitive attributes may be used for offline auditing and bias-mitigation analysis, but must not be served directly at inference (see the sketch after this list)
- The model must support explainability for policy and ads quality reviews
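One simple way to enforce the attribute constraint in code, assuming the audit-only columns from the feature table; the helper name and frame layout are hypothetical:

```python
import pandas as pd

# Audit-only columns from the feature table: never part of the served input.
AUDIT_ONLY = ["user_gender", "user_age_bucket",
              "advertiser_size_bucket", "market_tier"]

def split_features(df: pd.DataFrame):
    audit = df[AUDIT_ONLY]                                    # offline audits and weighting only
    model_inputs = df.drop(columns=AUDIT_ONLY + ["clicked"])  # what inference sees
    return model_inputs, audit
```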
Deliverables
- Define what bias means in this CTR system and how you would measure it.
- Build a training pipeline that handles imbalance, missingness, and segment skew.
- Compare at least two mitigation strategies (for example: reweighting, thresholding, constrained training, or post-hoc calibration); a comparison sketch for two of these follows this list.
- Report overall and per-group performance on a held-out time-based test set.
- Recommend a production rollout and monitoring plan in Google Cloud.
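As referenced above, a sketch comparing two of the suggested mitigations on a time-based holdout. The logistic-regression stand-in, the timestamp column name, the group interface, and the shared FNR target are all assumptions for illustration, not the production design:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score

def time_split(df: pd.DataFrame, cutoff):
    # Time-based holdout: train strictly before the cutoff, test at or after.
    return df[df["timestamp"] < cutoff], df[df["timestamp"] >= cutoff]

def fit_reweighted(X, y, groups):
    # Strategy (a): fold inverse-frequency group weights into training so
    # underrepresented audit groups carry comparable loss.
    g = pd.Series(groups)
    weights = 1.0 / g.map(g.value_counts(normalize=True))
    model = LogisticRegression(max_iter=1000)
    model.fit(X, y, sample_weight=weights)
    return model

def per_group_thresholds(y_true, y_score, groups, target_fnr=0.2):
    # Strategy (b): after training, pick each group's threshold as the
    # target_fnr quantile of its positive scores, roughly equalizing miss
    # rates across groups. The shared target value is an illustrative choice.
    thresholds = {}
    for g in np.unique(groups):
        scores = np.sort(y_score[(groups == g) & (y_true == 1)])
        if scores.size:
            thresholds[g] = scores[int(target_fnr * (scores.size - 1))]
    return thresholds

def pr_auc(model, X, y) -> float:
    # Overall ranking quality, for the <2% relative-drop criterion.
    return average_precision_score(y, model.predict_proba(X)[:, 1])
```

One trade-off worth reporting: reweighting changes the learned model itself, while per-group thresholds leave it untouched and are easy to roll back, but thresholding requires group membership at decision time, which under the constraints above would have to be handled in an offline or policy layer rather than at serving.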