Choose Features for Google Ads CTR | Dataford Interview Questions

Welcome to your interview.

The question is on your right: Choose Features for Google Ads CTR. Take a moment with it first.

Talk your thinking through with me if you like - when you're confident, submit your answer and I'll grade it like a real screen (7/10 or better passes). Discussion and graded submissions share your five interviewer interactions, so spend them well.

Business Context

The Google Ads ranking team wants a click-through rate (CTR) model for Search ads that can be retrained weekly and scored online during ad serving. You are given a candidate feature set from logs, advertiser metadata, and query context, and your task is to decide which features should be included in the first production model.

Dataset

The training data is built at the impression level from the last 90 days of U.S. English Search traffic.

Feature Group	Count	Examples
Query context	12	query_length, query_category, device_type, hour_of_day
Ad & campaign metadata	15	campaign_type, bidding_strategy, ad_format, advertiser_vertical
Historical performance aggregates	18	ad_ctr_7d, campaign_ctr_30d, advertiser_spend_7d, quality_score_bucket
User/context signals	10	geo_region, is_signed_in, browser_family, prior_search_count_24h
Text-derived features	8	headline_length, keyword_overlap_ratio, landing_page_lang_match

Rows: 42M impressions, 63 candidate features
Target: Binary click label per impression
Positive rate: 6.4% clicked, 93.6% not clicked
Missing data: 20% missing in some advertiser-history features for new campaigns; 3% missing in user-context fields due to logging gaps

Success Criteria

A good solution should:

improve log loss by at least 3% over a frequency-based baseline,
achieve AUC-ROC 0.76 and PR-AUC 0.22 on a held-out week,
produce a feature set that is stable, explainable, and safe for online serving.

Constraints

Online inference budget: p95 < 15 ms in the Google Ads serving stack
Avoid leakage from post-impression or future aggregates
Prefer features that are robust for cold-start advertisers and new queries
The final model should remain interpretable enough to support feature reviews and policy audits

Deliverables

Propose a feature selection strategy for this product setting, including how you would screen for leakage, redundancy, and instability.
Build a baseline and a regularized model using the selected features.
Compare feature importance methods and justify which features you would keep, transform, or drop.
Evaluate the model on a time-based split and recommend a production-ready feature set.
Explain how you would monitor feature drift and retrain the model after launch.