Choose Features for Google Ads CTR

Difficulty: Medium
Category: Machine Learning
Topics: Supervised Learning, Cross-Validation, Feature Engineering
Asked at 2 companies: Google, Meta

Problem

Business Context

The Google Ads ranking team wants a click-through rate (CTR) model for Search ads that can be retrained weekly and scored online during ad serving. You are given a candidate feature set from logs, advertiser metadata, and query context, and your task is to decide which features should be included in the first production model.

Dataset

The training data is built at the impression level from the last 90 days of U.S. English Search traffic.

Feature Group                      Count  Examples
Query context                      12     query_length, query_category, device_type, hour_of_day
Ad & campaign metadata             15     campaign_type, bidding_strategy, ad_format, advertiser_vertical
Historical performance aggregates  18     ad_ctr_7d, campaign_ctr_30d, advertiser_spend_7d, quality_score_bucket
User/context signals               10     geo_region, is_signed_in, browser_family, prior_search_count_24h
Text-derived features              8      headline_length, keyword_overlap_ratio, landing_page_lang_match

  • Rows: 42M impressions, 63 candidate features
  • Target: Binary click label per impression
  • Positive rate: 6.4% clicked, 93.6% not clicked
  • Missing data: 20% missing in some advertiser-history features for new campaigns; 3% missing in user-context fields due to logging gaps
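
One common way to handle the missing advertiser-history values is to keep an explicit missingness flag next to an imputed value, so the model can tell "new campaign" apart from "average campaign". The sketch below assumes rows are plain dicts and that a missing value is stored as `None`; the helper name `impute_with_flag` and the fallback value are illustrative, not part of the problem statement.

```python
def impute_with_flag(rows, feature, fallback):
    """Return copies of rows with `feature` imputed and a 0/1 missingness flag added.

    Assumption: a missing value is represented as None or an absent key.
    """
    out = []
    for row in rows:
        value = row.get(feature)
        missing = value is None
        new_row = dict(row)  # copy so the input rows are not mutated
        new_row[feature] = fallback if missing else value
        new_row[feature + "_missing"] = int(missing)
        out.append(new_row)
    return out


# Usage: a new campaign has no 7-day CTR history yet.
rows = [
    {"campaign_id": "a", "ad_ctr_7d": 0.08},
    {"campaign_id": "b", "ad_ctr_7d": None},
]
filled = impute_with_flag(rows, "ad_ctr_7d", fallback=0.064)
```

Here the fallback is the overall positive rate from the dataset description; a vertical-level or campaign-type-level prior may be a better choice for cold-start advertisers.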

Success Criteria

A good solution should:

  • improve log loss by at least 3% over a frequency-based baseline,
  • achieve AUC-ROC ≥ 0.76 and PR-AUC ≥ 0.22 on a held-out week,
  • produce a feature set that is stable, explainable, and safe for online serving.
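
To make the log loss target concrete, the sketch below computes the log loss of a constant predictor and the relative improvement of a candidate model over it. It assumes that "frequency-based baseline" means predicting the overall click rate for every impression, and it reads the AUC thresholds as minimums; both are interpretations rather than details given in the problem. The function names are illustrative.

```python
import math


def log_loss(labels, probs, eps=1e-15):
    """Mean binary cross-entropy, with probabilities clipped away from 0 and 1."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(labels)


def baseline_log_loss(positive_rate):
    """Log loss of predicting the same base rate for every impression."""
    p = positive_rate
    return -(p * math.log(p) + (1 - p) * math.log(1 - p))


def relative_improvement(baseline_ll, model_ll):
    """Fractional reduction in log loss relative to the baseline."""
    return (baseline_ll - model_ll) / baseline_ll


def meets_criteria(baseline_ll, model_ll, auc_roc, pr_auc):
    """Check the three numeric targets (thresholds treated as minimums)."""
    return (
        relative_improvement(baseline_ll, model_ll) >= 0.03
        and auc_roc >= 0.76
        and pr_auc >= 0.22
    )


# With a 6.4% positive rate the constant predictor scores about 0.238.
base = baseline_log_loss(0.064)
```

Under this reading, a 3% improvement means the model's held-out log loss must be roughly 0.231 or lower.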

Constraints

  • Online inference budget: p95 < 15 ms in the Google Ads serving stack
  • Avoid leakage from post-impression or future aggregates
  • Prefer features that are robust for cold-start advertisers and new queries
  • The final model should remain interpretable enough to support feature reviews and policy audits
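
The leakage constraint matters most for the historical aggregates such as ad_ctr_7d: each training row should see only events that happened strictly before its own impression. A minimal sketch of a point-in-time aggregate follows, assuming history is available as (timestamp, clicked) pairs per ad; the function name `trailing_ctr` and the data layout are hypothetical.

```python
from datetime import datetime, timedelta


def trailing_ctr(history, as_of, window=timedelta(days=7)):
    """CTR over the half-open window [as_of - window, as_of).

    history: iterable of (timestamp, clicked) pairs, clicked in {0, 1}.
    Events at or after `as_of` are excluded, so the current impression
    and anything in its future cannot leak into the feature.
    Returns None when there is no history (cold start).
    """
    start = as_of - window
    clicks = 0
    impressions = 0
    for ts, clicked in history:
        if start <= ts < as_of:
            impressions += 1
            clicks += clicked
    if impressions == 0:
        return None
    return clicks / impressions


history = [
    (datetime(2024, 1, 1), 1),
    (datetime(2024, 1, 3), 0),
    (datetime(2024, 1, 5), 0),
    (datetime(2024, 1, 6), 1),
    (datetime(2024, 1, 8), 1),
]
# Feature value for an impression served at 2024-01-06 00:00.
ctr_at_serve_time = trailing_ctr(history, as_of=datetime(2024, 1, 6))
```

Returning None for an empty window keeps cold-start cases explicit, so they can be routed to a fallback instead of being silently encoded as a CTR of zero.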

Deliverables

  1. Propose a feature selection strategy for this product setting, including how you would screen for leakage, redundancy, and instability.
  2. Build a baseline and a regularized model using the selected features.
  3. Compare feature importance methods and justify which features you would keep, transform, or drop.
  4. Evaluate the model on a time-based split and recommend a production-ready feature set.
  5. Explain how you would monitor feature drift and retrain the model after launch.
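
Deliverables 4 and 5 can be sketched with two small helpers: a time-based split that holds out the final week, and a population stability index (PSI) for comparing a feature's distribution between training and live traffic. PSI is one common drift measure, chosen here for illustration and not prescribed by the problem; the row layout, bin edges, and function names are assumptions.

```python
import math
from datetime import date, timedelta


def time_based_split(rows, holdout_start, holdout_days=7):
    """Train on rows before holdout_start; evaluate on the following days."""
    holdout_end = holdout_start + timedelta(days=holdout_days)
    train = [r for r in rows if r["day"] < holdout_start]
    holdout = [r for r in rows if holdout_start <= r["day"] < holdout_end]
    return train, holdout


def psi(expected, actual, edges, eps=1e-6):
    """Population stability index between two samples of one numeric feature.

    edges: sorted bin boundaries; values fall into len(edges) + 1 bins.
    Empty bins are floored at eps to keep the logarithm finite.
    """
    def shares(values):
        counts = [0] * (len(edges) + 1)
        for v in values:
            counts[sum(1 for e in edges if v >= e)] += 1
        return [max(c / len(values), eps) for c in counts]

    exp_shares = shares(expected)
    act_shares = shares(actual)
    return sum(
        (a - e) * math.log(a / e) for e, a in zip(exp_shares, act_shares)
    )


# Usage: 90 days of rows, last 7 days held out.
first_day = date(2024, 1, 1)
rows = [{"day": first_day + timedelta(days=i)} for i in range(90)]
train, holdout = time_based_split(rows, first_day + timedelta(days=83))

# Usage: a feature whose high-value share moved from 50% to 80%.
train_values = [0.1] * 50 + [0.9] * 50
live_values = [0.1] * 20 + [0.9] * 80
drift = psi(train_values, live_values, edges=[0.5])
```

A frequently quoted rule of thumb treats PSI above roughly 0.25 as a significant shift, but the alerting threshold should be tuned per feature against the weekly retraining cadence.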
