Stabilize Deep Ranking Training

Business Context

Meta is training a deeper neural network for Facebook Feed ranking to capture higher-order interactions across user, content, and context features. The current prototype underperforms a shallower baseline because gradients in early layers collapse during training, slowing convergence and reducing ranking quality.
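The collapse described above is the textbook vanishing-gradient mechanism. The derivation below is a standard chain-rule argument under generic assumptions (a plain L-layer MLP with pre-activations z_l = W_l h_{l-1} + b_l and activations h_l = phi(z_l)), not an analysis of the prototype itself. It shows why the earliest layers suffer most and why an identity shortcut helps.

```latex
% Plain MLP: z_l = W_l h_{l-1} + b_l,\; h_l = \phi(z_l); J_l is the layer-to-layer Jacobian
J_l = \frac{\partial h_l}{\partial h_{l-1}} = \operatorname{diag}\big(\phi'(z_l)\big)\,W_l,
\qquad
\frac{\partial \mathcal{L}}{\partial h_1} = \big(J_L J_{L-1} \cdots J_2\big)^{\top}\,\frac{\partial \mathcal{L}}{\partial h_L}

% Sigmoid: 0 < \phi'(z) \le 1/4. If additionally \|W_l\|_2 \le 1 for every layer:
\Big\|\frac{\partial \mathcal{L}}{\partial h_1}\Big\|_2
  \le \prod_{l=2}^{L} \big\|\operatorname{diag}\big(\phi'(z_l)\big)\big\|_2\,\|W_l\|_2
      \,\Big\|\frac{\partial \mathcal{L}}{\partial h_L}\Big\|_2
  \le \Big(\tfrac{1}{4}\Big)^{L-1}\Big\|\frac{\partial \mathcal{L}}{\partial h_L}\Big\|_2

% Residual block h_l = h_{l-1} + F_l(h_{l-1}): every Jacobian gains an identity term
J_l = I + \frac{\partial F_l}{\partial h_{l-1}}
\quad\Longrightarrow\quad
J_L \cdots J_2 = I + \sum_{l=2}^{L} \frac{\partial F_l}{\partial h_{l-1}} + \text{(higher-order products)}
```

Each standard mitigation targets a factor in this product: ReLU's derivative is 1 on active units instead of at most 1/4, He initialization (weight variance 2/fan-in) sizes W_l so ReLU activations keep roughly constant variance across layers, normalization keeps z_l in a well-scaled range, and the residual identity term gives the gradient a path whose magnitude does not shrink geometrically with depth.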

Dataset

You are given an offline supervised learning dataset built from Feed impression logs.

Feature Group	Count	Examples
Dense numerical	28	prior CTR, dwell time stats, friend interaction counts, session depth
Categorical IDs	14	user_id bucket, author_id bucket, content_type, device_type, country
Temporal/context	9	hour_of_day, day_of_week, recency_since_last_open, network_type
Aggregated embedding inputs	6	historical author affinity, topic affinity, content embedding clusters

Size: 12.4M impressions, 57 input features
Target: Binary label indicating whether the impression received a meaningful engagement within 24 hours
Class balance: 11.6% positive, 88.4% negative
Missing data: 6% of values missing in some engagement aggregates, concentrated among new users; sparse categorical coverage for tail creators
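Two of these dataset facts translate directly into modelling choices. The plain-Python sketch below (function names are hypothetical) sets the output-layer bias to the log-odds of the 11.6% positive rate, so an untrained network starts at the base rate rather than 0.5. It also computes the log loss of a constant base-rate predictor, which is the denominator of normalized entropy, a calibration-sensitive metric often reported alongside AUC for engagement models. Finally, it imputes missing engagement aggregates with the observed median plus an explicit missingness flag, so new users remain distinguishable from users with genuinely typical values. The median should be computed on the training split only.

```python
import math
import statistics

POSITIVE_RATE = 0.116  # share of impressions with a meaningful engagement (from the brief)


def prior_logit(p):
    """Output-layer bias that makes an untrained network predict the base rate p."""
    return math.log(p / (1.0 - p))


def base_rate_entropy(p):
    """Log loss of a constant predictor that always outputs p (binary entropy, in nats)."""
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))


def normalized_entropy(labels, probs, eps=1e-12):
    """Mean log loss divided by the entropy of the empirical positive rate; < 1 beats the prior."""
    n = len(labels)
    log_loss = -sum(y * math.log(max(p, eps)) + (1 - y) * math.log(max(1.0 - p, eps))
                    for y, p in zip(labels, probs)) / n
    return log_loss / base_rate_entropy(sum(labels) / n)


def impute_with_indicator(values):
    """Fill None with the median of observed values and return a parallel 0/1 missing flag."""
    fill = statistics.median(v for v in values if v is not None)
    filled = [fill if v is None else v for v in values]
    missing = [1.0 if v is None else 0.0 for v in values]
    return filled, missing


print(round(prior_logit(POSITIVE_RATE), 3), round(base_rate_entropy(POSITIVE_RATE), 3))
# prints -2.031 0.359
```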

Success Criteria

A good solution should:

Improve validation AUC-ROC by at least 0.015 over a plain deep MLP baseline without stabilization
Keep training numerically stable for 20+ epochs without early gradient collapse
Achieve p95 online inference latency below 15 ms per request after export
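The two quantitative criteria can be checked mechanically. Below is a minimal plain-Python sketch, assuming both models are scored on the same held-out split and latency samples are per-request wall-clock times measured after export; the function names and default thresholds simply mirror the brief and are hypothetical. AUC uses the rank (Mann-Whitney) formulation, the probability that a random positive outscores a random negative, which runs in O(n log n) and so scales to millions of impressions. The percentile uses the nearest-rank definition.

```python
def roc_auc(labels, scores):
    """AUC-ROC via the Mann-Whitney rank statistic; tied scores share their average rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next score ties with the current one.
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[start]]:
            end += 1
        avg_rank = (start + end) / 2.0 + 1.0  # 1-based average rank of the tie run
        for k in range(start, end + 1):
            ranks[order[k]] = avg_rank
        start = end + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)


def p95_latency(latencies_ms):
    """Nearest-rank 95th percentile, using integer arithmetic for the rank."""
    ordered = sorted(latencies_ms)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n) without float rounding
    return ordered[rank - 1]


def meets_success_criteria(auc_baseline, auc_stabilized, latencies_ms,
                           min_auc_gain=0.015, max_p95_ms=15.0):
    """True when both the AUC-gain and the p95-latency targets are met."""
    return (auc_stabilized - auc_baseline >= min_auc_gain
            and p95_latency(latencies_ms) < max_p95_ms)


print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # prints 0.75
```

In practice `sklearn.metrics.roc_auc_score` computes the same AUC; the hand-rolled version only makes the definition explicit.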

Constraints

The model must remain deployable in a low-latency ranking stack
You should explain which interventions specifically address vanishing gradients
The solution should be reproducible and monitorable in production
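Reproducibility and monitorability largely come down to recording exactly what was trained and emitting comparable signals every epoch. A minimal stdlib sketch, assuming a JSON-lines logging sink; the config keys, record fields and function names are hypothetical. The fingerprint is order-independent, so the same config always yields the same run ID, and the first-to-last-layer gradient-norm ratio doubles as a vanishing-gradient alarm. If the training stack is PyTorch, `torch.manual_seed` and `torch.use_deterministic_algorithms(True)` cover the framework side of seeding.

```python
import hashlib
import json
import random


def config_fingerprint(config: dict) -> str:
    """Short, key-order-independent hash of the training config for tagging runs and exports."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


def make_rng(config: dict) -> random.Random:
    """Dedicated RNG seeded from the config so shuffles and samples replay exactly."""
    return random.Random(config["seed"])


def epoch_record(config: dict, epoch: int, layer_grad_norms: list, val_auc: float) -> str:
    """One JSON line per epoch; field names are hypothetical."""
    first, last = layer_grad_norms[0], layer_grad_norms[-1]
    return json.dumps({
        "run_id": config_fingerprint(config),
        "epoch": epoch,
        "val_auc": val_auc,
        "grad_norm_first_layer": first,
        "grad_norm_last_layer": last,
        # A ratio drifting toward 0 means early layers have stopped learning.
        "first_to_last_ratio": first / last if last > 0 else None,
    }, sort_keys=True)


# Hypothetical config values, for illustration only.
config = {"seed": 7, "depth": 8, "width": 256, "lr": 1e-3, "residual": True}
print(epoch_record(config, epoch=1, layer_grad_norms=[0.8, 0.9, 1.0], val_auc=0.74))
```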

Deliverables

Build a deep neural network baseline and a stabilized version.
Show how you detect vanishing gradients during training.
Apply and justify mitigation techniques such as residual connections, normalization, activation choice, and initialization.
Evaluate both models on held-out data using ranking-relevant metrics.
Recommend a production-ready training and refresh strategy.
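The detection and mitigation deliverables can be illustrated with a minimal NumPy sketch, assuming a toy 20-layer, 64-unit network on synthetic inputs rather than the Feed features; all sizes, names and init scales are illustrative. It runs one manual forward and backward pass and records the Frobenius norm of each layer's weight gradient; the first-to-last ratio is the quantity to log periodically during real training. The baseline uses sigmoid activations and a small fixed-std init. The stabilized variant combines ReLU, He initialization, identity shortcuts and a 1/sqrt(depth) residual-branch scale. That scale is a simplification standing in for normalization; in a production model a LayerNorm or BatchNorm layer would normally play that role.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def plain_grad_norms(x, y, weights, w_out):
    """Plain MLP h_l = sigmoid(h_{l-1} W_l); returns ||dL/dW_l||_F per layer."""
    hs, zs = [x], []
    for w in weights:                            # forward pass, caching layer inputs
        zs.append(hs[-1] @ w)
        hs.append(sigmoid(zs[-1]))
    p = sigmoid(hs[-1] @ w_out)                  # single-logit engagement head
    dh = np.outer((p - y) / len(y), w_out)       # dL/dh_L for mean binary cross-entropy
    norms = []
    for l in reversed(range(len(weights))):      # manual backward pass
        s = sigmoid(zs[l])
        dz = dh * s * (1.0 - s)                  # sigmoid' <= 1/4 shrinks the signal
        norms.append(float(np.linalg.norm(hs[l].T @ dz)))
        dh = dz @ weights[l].T
    return norms[::-1]                           # index 0 = earliest layer


def residual_grad_norms(x, y, blocks, w_out):
    """Residual MLP h_l = h_{l-1} + alpha * relu(h_{l-1} A_l) B_l; returns ||dL/dA_l||_F."""
    alpha = 1.0 / np.sqrt(len(blocks))           # branch scaling, standing in for normalization
    hs, us = [x], []
    for a_mat, b_mat in blocks:
        us.append(hs[-1] @ a_mat)
        hs.append(hs[-1] + alpha * np.maximum(us[-1], 0.0) @ b_mat)
    p = sigmoid(hs[-1] @ w_out)
    dh = np.outer((p - y) / len(y), w_out)
    norms = []
    for l in reversed(range(len(blocks))):
        a_mat, b_mat = blocks[l]
        du = (alpha * dh @ b_mat.T) * (us[l] > 0)  # ReLU' = 1 on active units
        norms.append(float(np.linalg.norm(hs[l].T @ du)))
        dh = dh + du @ a_mat.T                   # identity shortcut passes dh through unchanged
    return norms[::-1]


rng = np.random.default_rng(0)
n, d, depth = 256, 64, 20                        # hypothetical toy sizes, not the Feed model
x = rng.standard_normal((n, d))
y = (rng.random(n) < 0.116).astype(float)        # synthetic labels at the 11.6% positive rate
w_out = rng.normal(0.0, 1.0 / np.sqrt(d), d)

# Baseline: sigmoid activations and a small fixed-std init.
plain_w = [rng.normal(0.0, 0.1, (d, d)) for _ in range(depth)]
# Stabilized: He init (std sqrt(2/fan_in)) on A, std 1/sqrt(fan_in) on B, identity shortcuts.
res_w = [(rng.normal(0.0, np.sqrt(2.0 / d), (d, d)), rng.normal(0.0, 1.0 / np.sqrt(d), (d, d)))
         for _ in range(depth)]

plain = plain_grad_norms(x, y, plain_w, w_out)
stable = residual_grad_norms(x, y, res_w, w_out)
print(f"plain    first/last layer grad-norm ratio: {plain[0] / plain[-1]:.1e}")
print(f"residual first/last layer grad-norm ratio: {stable[0] / stable[-1]:.2f}")
```

On this toy setup the plain network's first-to-last ratio comes out many orders of magnitude below one while the residual network's does not. A real training job would log the same per-layer norms from its framework's parameter gradients and alert when the early-layer norm falls below a chosen fraction of the last layer's.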
