Tune Feed CTR Models

Business Context

Meta wants to predict whether a Facebook Feed impression will receive a click so ranking teams can compare simple and complex models before deployment. You are asked to explain the bias-variance trade-off with a concrete modeling exercise rather than a purely theoretical answer.

Dataset

You are given an offline training dataset built from historical Facebook Feed impressions.

Feature Group	Count	Examples
User features	8	account_age_days, prior_ctr_7d, sessions_7d, follows_count
Content features	7	post_type, media_count, text_length, page_category
Context features	6	hour_of_day, device_type, network_type, country_tier
Interaction features	5	user_page_affinity, recent_video_watch_rate, prior_page_ctr
Label	1	clicked (1) / not clicked (0)

Size: 1.2M impressions, 26 input features
Target: Binary classification — whether the impression was clicked
Class balance: 11.4% positive, 88.6% negative
Missing data: ~6% missing in prior engagement features for new or low-activity users; ~2% missing in content metadata
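One possible way to handle the missing values described above is to impute and add a missingness indicator, so the model can still tell new or low-activity users apart. The sketch below assumes scikit-learn and pandas; the four column names come from the feature table, but the toy values, the median/constant imputation strategy, and the `"missing"` category label are illustrative assumptions rather than part of the task.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy rows; the values are invented for illustration only.
df = pd.DataFrame({
    "prior_ctr_7d": [0.12, np.nan, 0.03, 0.08],
    "sessions_7d": [14.0, np.nan, 3.0, 7.0],
    "post_type": ["video", "photo", np.nan, "link"],
    "device_type": ["ios", "android", "web", "android"],
})
numeric_cols = ["prior_ctr_7d", "sessions_7d"]
categorical_cols = ["post_type", "device_type"]

# Scale first (StandardScaler ignores NaN when fitting and keeps it when
# transforming), then impute the median and append 0/1 missing indicators.
numeric_pipe = Pipeline([
    ("scale", StandardScaler()),
    ("impute", SimpleImputer(strategy="median", add_indicator=True)),
])
# Treat missing content metadata as its own category (an assumption).
categorical_pipe = Pipeline([
    ("impute", SimpleImputer(strategy="constant", fill_value="missing")),
    ("onehot", OneHotEncoder(handle_unknown="ignore")),
])
preprocess = ColumnTransformer(
    [("num", numeric_pipe, numeric_cols),
     ("cat", categorical_pipe, categorical_cols)],
    sparse_threshold=0,  # always return a dense array
)

X = preprocess.fit_transform(df)
print(X.shape)
```

In a real experiment the transformer would be fit on the training split only and then applied to the validation and test splits.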

Success Criteria

A strong solution demonstrates, with metrics, how underfit and overfit models behave and identifies a model complexity level that generalizes well. A solution is good enough if it improves validation and test log loss over a naive baseline, keeps the train/validation gap small, and explains the observed bias-variance trade-off.
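To make "a naive baseline" concrete, one reasonable choice (an assumption here, since the task does not name a specific baseline) is a model that predicts the 11.4% base rate for every impression. Its expected log loss follows directly from the class balance and comes out to roughly 0.355, which gives a number that any trained model should beat. The function name `constant_log_loss` is hypothetical.

```python
import math

def constant_log_loss(p_pos: float, q: float) -> float:
    """Expected log loss when every impression gets predicted probability q
    and the true positive rate is p_pos."""
    return -(p_pos * math.log(q) + (1.0 - p_pos) * math.log(1.0 - q))

base_rate = 0.114  # positive-class share from the dataset description
baseline = constant_log_loss(base_rate, base_rate)
print(f"base-rate log loss: {baseline:.4f}")
```

Predicting any other constant, such as 0.5, gives a higher log loss, so the base rate is the strongest constant predictor under this metric.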

Constraints

Inference should stay under 10 ms per impression in an online ranking service.
The solution should be interpretable enough to explain why a simpler model may outperform a more complex one on unseen data.
Training should be feasible on a single machine for experimentation.
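The latency constraint can be checked offline with a rough timing loop. The sketch below is only an approximation under stated assumptions: it uses random data with 26 features, a scikit-learn logistic regression as a stand-in model, and a hypothetical helper `single_row_latency_ms`. It measures model scoring only, not feature lookup or network overhead in a real ranking service.

```python
import time
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 26))              # random stand-in features
y = (rng.random(5000) < 0.114).astype(int)   # labels at roughly the stated base rate
model = LogisticRegression(max_iter=200).fit(X, y)

def single_row_latency_ms(model, X, n_calls=200):
    """Time predict_proba on one row at a time; return (p50, p99) in ms."""
    timings = []
    for i in range(n_calls):
        row = X[i:i + 1]
        start = time.perf_counter()
        model.predict_proba(row)
        timings.append((time.perf_counter() - start) * 1000.0)
    return float(np.percentile(timings, 50)), float(np.percentile(timings, 99))

p50_ms, p99_ms = single_row_latency_ms(model, X)
print(f"p50={p50_ms:.3f} ms, p99={p99_ms:.3f} ms (budget: 10 ms)")
```

Reporting a tail percentile alongside the median is a design choice here, since a per-impression budget is usually violated by slow outliers rather than by the typical call.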

Deliverables

Train at least three models of increasing complexity (for example, a regularized logistic regression, a shallow decision tree, and either a deep decision tree or a high-degree polynomial model).
Compare train, validation, and test performance using appropriate classification metrics.
Explain which model shows high bias, which shows high variance, and how regularization or pruning changes the trade-off.
Recommend a production-ready model for Facebook Feed CTR prediction under the stated latency constraint.
Provide concise Python code that reproduces preprocessing, training, and evaluation.
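The deliverables above could be approached along the following lines. This is a minimal sketch on synthetic data from scikit-learn's `make_classification`, not the real Feed dataset: the sample size, the 60/20/20 stratified split, and the hyperparameters are all assumptions chosen to keep the example fast. It trains three models of increasing complexity and reports log loss on each split, so the train/validation gap can be read off directly.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in: 26 features, about 11% positives, some label noise.
X, y = make_classification(
    n_samples=20000, n_features=26, n_informative=8,
    weights=[0.886], flip_y=0.05, random_state=0,
)

# 60/20/20 train/validation/test split, stratified on the label.
X_train, X_tmp, y_train, y_tmp = train_test_split(
    X, y, test_size=0.4, stratify=y, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(
    X_tmp, y_tmp, test_size=0.5, stratify=y_tmp, random_state=0)

models = {
    "logreg_l2": LogisticRegression(C=1.0, max_iter=1000),
    "shallow_tree": DecisionTreeClassifier(max_depth=4, random_state=0),
    "deep_tree": DecisionTreeClassifier(max_depth=None, random_state=0),
}
splits = {
    "train": (X_train, y_train),
    "val": (X_val, y_val),
    "test": (X_test, y_test),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    results[name] = {
        split: log_loss(ys, model.predict_proba(Xs), labels=[0, 1])
        for split, (Xs, ys) in splits.items()
    }
    r = results[name]
    print(f"{name:13s} train={r['train']:.3f} val={r['val']:.3f} "
          f"test={r['test']:.3f} gap={r['val'] - r['train']:.3f}")
```

On this synthetic data the unpruned tree fits the training split almost perfectly and its validation log loss is much worse, which is the high-variance pattern the exercise asks for. Limiting `max_depth` or raising `ccp_alpha` on the tree, or lowering `C` on the logistic regression, are ways to explore how pruning and regularization shift the trade-off. The numbers here say nothing about how these models would rank on real Feed impressions.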
