Tune L1 vs L2 for Ads

Business Context

Meta Ads ranking models use high-dimensional behavioral and campaign features to predict whether a user will click an ad impression. A simple linear baseline is still valuable in production because it is fast, stable, and easy to debug, but it can overfit badly when the feature space is wide and sparse. Your task is to compare L1 and L2 regularization in a click-through-rate classification setting and explain when each is preferable.
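For reference, the two penalties being compared can be written as below. The notation (weights w, bias b, regularization strength λ, sigmoid σ) is an assumption introduced here for illustration and does not come from the problem statement.

```latex
\begin{aligned}
\mathcal{L}(w, b) &= -\frac{1}{n}\sum_{i=1}^{n}\Big[\, y_i \log \sigma(w^\top x_i + b) + (1 - y_i)\log\big(1 - \sigma(w^\top x_i + b)\big) \Big] \\
\text{L1:}\quad & \min_{w,\,b}\ \mathcal{L}(w, b) + \lambda \lVert w \rVert_1, \qquad \lVert w \rVert_1 = \sum_j |w_j| \\
\text{L2:}\quad & \min_{w,\,b}\ \mathcal{L}(w, b) + \tfrac{\lambda}{2} \lVert w \rVert_2^2, \qquad \lVert w \rVert_2^2 = \sum_j w_j^2
\end{aligned}
```

The L1 term is not differentiable at zero, which is why it can push individual weights exactly to zero, while the L2 term shrinks all weights smoothly without zeroing them.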

Dataset

You are given a training table built from one week of ad impression logs.

Feature Group	Count	Examples
Dense numerical	18	historical_ctr, advertiser_spend_7d, user_session_depth, page_load_ms
One-hot categorical	420	device_type, country, placement, campaign_objective, app_surface
Sparse hashed text/context	560	query intent bucket, ad text n-gram hashes, landing page topic hashes
Temporal	6	hour_of_day, day_of_week, recency_since_last_click

Rows: 1.2M ad impressions
Features: 1,004 engineered features after encoding
Target: clicked = 1 if the impression received a click, else 0
Class balance: 6.4% positive, 93.6% negative
Missing data: ~8% missing in some advertiser quality features and ~3% missing in user activity recency fields
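Because both penalties act on raw coefficient magnitudes, the dense numerical features generally need imputation and scaling before the comparison is meaningful. The following is a minimal sketch, assuming scikit-learn and a small synthetic stand-in for the dense columns; the median strategy and the missing-value indicator are illustrative choices, not part of the problem statement.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for three dense numerical features (hypothetical data).
X_dense = rng.normal(size=(1000, 3))

# Blank out roughly 8% of the first column to mimic the missing-rate in the brief.
missing_mask = rng.random(1000) < 0.08
X_dense[missing_mask, 0] = np.nan

# Median-impute, append a binary "was missing" column for affected features,
# then standardize so the penalty treats every column on the same scale.
dense_prep = make_pipeline(
    SimpleImputer(strategy="median", add_indicator=True),
    StandardScaler(),
)
X_ready = dense_prep.fit_transform(X_dense)

print(X_ready.shape)
```

The indicator column lets the linear model learn a separate weight for "value was missing", which can matter when missingness itself correlates with click behavior.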

Success Criteria

A strong solution should:

improve generalization versus an unregularized logistic regression baseline,
compare L1 and L2 using cross-validated model selection,
explain the impact on sparsity, stability, and interpretability,
achieve log loss < 0.23 and AUC-ROC > 0.78 on the holdout set.
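To put the log loss target in context, it helps to compare it with a constant predictor that always outputs the 6.4% base rate; that predictor scores roughly 0.238, so the 0.23 threshold is a fairly narrow margin above a no-feature model. The sketch below computes that reference value and wraps both targets in a small helper; the function name `meets_targets` is hypothetical.

```python
import numpy as np
from sklearn.metrics import log_loss, roc_auc_score

base_rate = 0.064  # positive rate from the dataset description

# Log loss of a model that always predicts the base rate (no features used).
constant_log_loss = -(
    base_rate * np.log(base_rate) + (1 - base_rate) * np.log(1 - base_rate)
)

def meets_targets(y_true, p_pred, max_log_loss=0.23, min_auc=0.78):
    """Check holdout predictions against the two success thresholds."""
    ll = log_loss(y_true, p_pred)
    auc = roc_auc_score(y_true, p_pred)
    return {
        "log_loss": ll,
        "auc": auc,
        "passes": bool(ll < max_log_loss and auc > min_auc),
    }

# Tiny illustrative example, not real ad data.
report = meets_targets([0, 0, 0, 1], [0.1, 0.1, 0.1, 0.9])
print(round(constant_log_loss, 4), report)
```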

Constraints

Inference must stay under 10 ms per request for a linear model in Meta Ads serving.
The model should remain interpretable enough to inspect top weighted features.
Training should fit in a single production training job without expensive feature selection passes.
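The latency and interpretability constraints both come down to simple operations on the weight vector. As a rough sketch, assuming a NumPy weight array and a request represented by the indices and values of its non-zero features (both representations are assumptions, as are the helper names), scoring and top-weight inspection might look like this:

```python
import numpy as np

def sparse_score(weights, bias, active_idx, active_val):
    """Return P(click) using only the features that are non-zero for this request."""
    z = bias + float(np.dot(weights[active_idx], active_val))
    return 1.0 / (1.0 + np.exp(-z))

def top_features(weights, names, k=5):
    """Return the k (name, weight) pairs with the largest absolute weight."""
    order = np.argsort(-np.abs(weights))[:k]
    return [(names[i], float(weights[i])) for i in order]

# Toy weights with exact zeros, as an L1-penalized model might produce.
weights = np.array([0.0, 1.5, 0.0, -2.0, 0.3])
names = ["f0", "f1", "f2", "f3", "f4"]

p = sparse_score(
    weights, bias=-2.7,
    active_idx=np.array([1, 4]), active_val=np.array([1.0, 2.0]),
)
print(p, top_features(weights, names, k=2))
```

The per-request cost scales with the number of active features rather than the full 1,004, so a linear model of this size is unlikely to be the latency bottleneck; weights that are exactly zero can additionally be dropped from the served model.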

Deliverables

Train logistic regression baselines with no regularization, L1, and L2 penalties.
Use cross-validation to tune regularization strength and select the best model.
Compare coefficient sparsity, validation performance, and holdout performance.
Explain what regularization is and when L1 should be preferred over L2, and vice versa.
Provide a production recommendation for Meta Ads ranking.
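The first three deliverables can be sketched end to end on synthetic data. This is a minimal illustration under several assumptions: scikit-learn with the `liblinear` solver, a small `make_classification` dataset standing in for the impression logs, a very large `C` as a stand-in for "no regularization", and a three-value grid for `C`. The `penalty` argument is used as in long-standing scikit-learn releases; check the installed version's documentation, since parameter conventions can change.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss, roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic, imbalanced stand-in for the click data (about 6% positives).
X, y = make_classification(
    n_samples=4000, n_features=60, n_informative=8,
    weights=[0.936], random_state=0,
)
X_train, X_hold, y_train, y_hold = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

def summarize(model):
    """Holdout metrics plus the number of coefficients that are exactly zero."""
    p = model.predict_proba(X_hold)[:, 1]
    return {
        "log_loss": log_loss(y_hold, p),
        "auc": roc_auc_score(y_hold, p),
        "n_zero": int(np.sum(model.coef_ == 0)),
    }

results = {}

# Near-unregularized baseline: a huge C makes the penalty negligible.
baseline = LogisticRegression(penalty="l2", C=1e6, solver="liblinear")
results["none"] = summarize(baseline.fit(X_train, y_train))

# Tune C (inverse regularization strength) by cross-validated log loss.
for penalty in ("l1", "l2"):
    search = GridSearchCV(
        LogisticRegression(penalty=penalty, solver="liblinear"),
        param_grid={"C": [0.01, 0.1, 1.0]},
        scoring="neg_log_loss",
        cv=3,
    )
    search.fit(X_train, y_train)
    results[penalty] = summarize(search.best_estimator_)
    results[penalty]["best_C"] = search.best_params_["C"]

for name, row in results.items():
    print(name, row)
```

Selecting on cross-validated log loss rather than accuracy matches the success criteria, since accuracy is uninformative at a 6.4% positive rate. The `n_zero` column is where the two penalties are expected to differ most: L1 typically zeroes out uninformative features, while L2 keeps all of them with small weights.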
