Google Ads wants to predict whether a newly created search ad will receive a low-quality rating within 7 days, so policy and optimization systems can intervene early. You are given a supervised learning dataset and asked to design a feature selection strategy that improves model quality without introducing leakage or making the model too expensive to serve.
The training data consists of ad-level snapshots taken at ad creation time, with aggregates computed over the first 24 hours only. The available feature groups are:
| Feature Group | Feature Count | Examples |
|---|---|---|
| Ad text and metadata | 18 | headline_length, description_length, keyword_match_type, language, device_targeting |
| Advertiser/account signals | 12 | account_age_days, prior_policy_strikes, campaign_budget, vertical |
| Landing page features | 10 | page_load_ms, mobile_friendly_score, content_length, https_enabled |
| Early performance aggregates | 14 | impressions_24h, ctr_24h, avg_cpc_24h, bounce_rate_24h |
| Geography/time | 6 | country, hour_created, day_of_week, market_tier |
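The label window runs for 7 days after creation, but the early performance aggregates need 24 hours to accumulate. One way to reason about leakage is to tag each feature group with the time at which it becomes available and keep only the groups that are fully known at the intended scoring time. The sketch below is hypothetical: the group keys are invented, the counts come from the table, and the availability offsets (including the assumption that landing page features are computed at creation time) are assumptions, not facts about the dataset.

```python
# Hypothetical feature-group registry. Counts come from the table; the
# "available_after_h" offsets (hours after ad creation) are ASSUMPTIONS.
FEATURE_GROUPS = {
    "ad_text_metadata": {"count": 18, "available_after_h": 0},
    "advertiser_account": {"count": 12, "available_after_h": 0},
    # Assumes the landing page is crawled when the ad is created.
    "landing_page": {"count": 10, "available_after_h": 0},
    # The *_24h aggregates only exist once the first 24 hours have elapsed.
    "early_performance_24h": {"count": 14, "available_after_h": 24},
    "geography_time": {"count": 6, "available_after_h": 0},
}


def usable_groups(scoring_offset_h: int) -> list[str]:
    """Return the groups whose values are fully known at the scoring time.

    Any group that only becomes available after the scoring time would leak
    information the model cannot have in production.
    """
    return [
        name
        for name, spec in FEATURE_GROUPS.items()
        if spec["available_after_h"] <= scoring_offset_h
    ]


def feature_count(groups: list[str]) -> int:
    """Total number of individual features across the given groups."""
    return sum(FEATURE_GROUPS[g]["count"] for g in groups)


if __name__ == "__main__":
    for offset_h in (0, 24):
        groups = usable_groups(offset_h)
        print(offset_h, feature_count(groups), groups)
```

Under these assumptions, scoring at creation time leaves 46 of the 60 features usable, while scoring at the 24-hour mark admits all 60 but delays any intervention by a day.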
Target label: low_quality_7d = 1 if the ad is rated low quality within 7 days, else 0.

A good solution should improve validation PR-AUC over a simple all-features logistic regression baseline by at least 10%, while keeping offline feature generation simple enough for daily retraining and keeping online scoring under 20 ms per ad.
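The acceptance criteria can be made concrete with a small check. This is a minimal sketch under stated assumptions: it reads "at least 10%" as a relative gain (candidate PR-AUC at least 1.1 times the baseline), uses average precision as the PR-AUC estimator, and treats the 20 ms budget as a p99 latency bound. The function names are hypothetical. In practice, a library routine such as scikit-learn's `average_precision_score`, which handles tied scores, would normally replace the hand-rolled estimator.

```python
import math


def average_precision(y_true: list[int], scores: list[float]) -> float:
    """Step-wise PR-AUC: mean of the precision at each positive's rank.

    Ties in scores are broken by input order (a simplification).
    """
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("PR-AUC is undefined without positive labels")
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / rank  # precision at this recall step
    return precision_sum / n_pos


def meets_pr_auc_target(baseline_ap: float, candidate_ap: float,
                        min_relative_gain: float = 0.10) -> bool:
    """ASSUMPTION: the 10% target is a relative, not absolute, improvement."""
    return candidate_ap >= baseline_ap * (1.0 + min_relative_gain)


def within_latency_budget(latencies_ms: list[float], budget_ms: float = 20.0,
                          quantile: float = 0.99) -> bool:
    """ASSUMPTION: "under 20 ms per ad" is enforced at the p99 latency."""
    ordered = sorted(latencies_ms)
    # Nearest-rank percentile: the ceil(q * n)-th smallest value.
    idx = max(0, math.ceil(quantile * len(ordered)) - 1)
    return ordered[idx] < budget_ms


if __name__ == "__main__":
    ap = average_precision([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1])
    print(round(ap, 4), meets_pr_auc_target(0.30, 0.34))
```

Reading the target as relative matters when the positive rate is low: a baseline PR-AUC of 0.30 would then need to reach 0.33, rather than the 0.40 an absolute reading would require.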