Choose Logistic Regression for Integrity Risk

Easy

Machine Learning

Business Context

Meta Integrity needs a binary classifier to predict whether a newly created Facebook ad account will trigger a policy enforcement action within 14 days. You must decide when a regularized logistic regression is preferable to a tree-based model for this production use case, then build and evaluate both.

Dataset

You are given a tabular dataset of ad-account snapshots collected at account creation time.

Feature Group	Count	Examples
Numeric activity	14	account_age_hours, payment_attempts_24h, spend_velocity, device_count
Categorical metadata	9	country, business_type, payment_method_type, signup_surface
Aggregated integrity signals	8	prior_disabled_assets, linked_entity_risk_score, ip_reputation_bucket
Simple interaction candidates	5	spend_per_device, attempts_per_payment_method, linked_assets_per_admin

Size: 420K ad accounts, 36 features
Target: Binary — enforcement within 14 days (1) vs no enforcement (0)
Class balance: 8.7% positive, 91.3% negative
Missing data: 12% missing in linked-entity signals, 4% missing in payment metadata
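
The class balance above fixes a few reference numbers worth having in hand before modeling. The sketch below derives them from the stated size and positive rate only; the function name `class_prior_summary` is illustrative and not part of the problem statement.

```python
import math

N_ACCOUNTS = 420_000      # dataset size from the problem statement
POSITIVE_RATE = 0.087     # 8.7% of accounts enforced within 14 days


def class_prior_summary(n_rows: int, positive_rate: float) -> dict:
    """Derive class counts and chance-level baselines from the class balance."""
    n_pos = round(n_rows * positive_rate)
    n_neg = n_rows - n_pos
    return {
        "n_positive": n_pos,
        "n_negative": n_neg,
        # A model that ranks accounts at random has PR-AUC close to the prevalence.
        "chance_pr_auc": n_pos / n_rows,
        # Log-odds of the base rate: roughly what an intercept-only logistic model learns.
        "prior_log_odds": math.log(n_pos / n_neg),
    }


summary = class_prior_summary(N_ACCOUNTS, POSITIVE_RATE)
for key, value in summary.items():
    print(f"{key}: {value}")
```

Because chance-level PR-AUC sits near the prevalence, any PR-AUC figure for this dataset is best read as a multiple of roughly 0.087 rather than against 0.5, which is the chance level for ROC-AUC.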

Success Criteria

A solution is good enough if it:

achieves PR-AUC >= 0.34 on the held-out test set,
maintains recall >= 0.70 at an operating precision of at least 0.25,
and provides a defensible explanation for choosing logistic regression over a tree-based model in this setting.
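
The two numeric criteria above can be checked directly from held-out labels and scores. The following is a minimal pure-Python sketch, assuming PR-AUC is computed as step-wise average precision (precision weighted by the recall gained at each threshold); the helper names and the toy labels and scores are made up for illustration.

```python
from typing import List, Optional, Sequence, Tuple


def pr_points(y_true: Sequence[int], scores: Sequence[float]) -> List[Tuple[float, float, float]]:
    """Return (threshold, precision, recall) for each distinct score, highest first."""
    total_pos = sum(y_true)
    if total_pos == 0:
        raise ValueError("need at least one positive label")
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    points = []
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        tp += label
        fp += 1 - label
        # Emit a point only after the last row of a tied score group.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((score, tp / (tp + fp), tp / total_pos))
    return points


def average_precision(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise PR-AUC: sum of precision times the recall gained at each threshold."""
    ap, prev_recall = 0.0, 0.0
    for _, precision, recall in pr_points(y_true, scores):
        ap += precision * (recall - prev_recall)
        prev_recall = recall
    return ap


def recall_at_precision(y_true: Sequence[int], scores: Sequence[float],
                        min_precision: float) -> Tuple[float, Optional[float]]:
    """Best recall among thresholds whose precision is >= min_precision, as (recall, threshold)."""
    feasible = [(r, t) for t, p, r in pr_points(y_true, scores) if p >= min_precision]
    return max(feasible) if feasible else (0.0, None)


# Toy data, not drawn from the real dataset.
y_toy = [1, 0, 1, 0, 0, 1, 0, 0, 0, 0]
s_toy = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]

ap_toy = average_precision(y_toy, s_toy)
recall_toy, threshold_toy = recall_at_precision(y_toy, s_toy, min_precision=0.5)
print(f"average precision: {ap_toy:.4f}")
print(f"recall at precision >= 0.5: {recall_toy:.2f} (threshold {threshold_toy})")
```

On the real test set, the same two calls would be made with `min_precision=0.25`, and the returned threshold is a candidate operating point for the recall criterion.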

Constraints

Scores are used in a reviewer-assist workflow, so interpretability and calibration matter.
Batch scoring runs every 15 minutes on fresh account creations; p95 inference should be < 20 ms per 1K rows.
The model is retrained weekly and must be stable enough for policy and ops teams to monitor coefficient or feature drift.
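
One way the weekly coefficient monitoring mentioned above might look is a simple comparison between consecutive retrains. This is a sketch under assumptions: features are standardized so coefficients are comparable, the tolerances `rel_tol` and `min_abs` are arbitrary placeholders, and the coefficient values are invented for illustration.

```python
from typing import Dict, List


def coefficient_drift(previous: Dict[str, float], current: Dict[str, float],
                      rel_tol: float = 0.25, min_abs: float = 0.05) -> List[dict]:
    """Flag features that were added or removed, flipped sign, or moved by more than rel_tol."""
    flags = []
    for name in sorted(set(previous) | set(current)):
        old, new = previous.get(name), current.get(name)
        if old is None or new is None:
            flags.append({"feature": name, "reason": "added" if old is None else "removed"})
            continue
        # Skip coefficients that stay close to zero in both weeks.
        if max(abs(old), abs(new)) < min_abs:
            continue
        if old * new < 0:
            flags.append({"feature": name, "reason": "sign_flip", "old": old, "new": new})
        elif abs(new - old) > rel_tol * max(abs(old), min_abs):
            flags.append({"feature": name, "reason": "magnitude", "old": old, "new": new})
    return flags


# Hypothetical coefficients from two consecutive weekly retrains.
last_week = {"payment_attempts_24h": 0.42, "device_count": 0.10,
             "linked_entity_risk_score": 0.95, "spend_velocity": 0.01}
this_week = {"payment_attempts_24h": 0.45, "device_count": -0.12,
             "linked_entity_risk_score": 1.40, "spend_velocity": -0.02}

drift_flags = coefficient_drift(last_week, this_week)
for flag in drift_flags:
    print(flag)
```

A check like this is only meaningful when the model exposes a small, stable set of named coefficients, which is part of the operational argument for a linear model over a tree ensemble here.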

Deliverables

Train a regularized logistic regression baseline and a tree-based benchmark.
Compare them on PR-AUC, ROC-AUC, recall at fixed precision, calibration, and latency.
Explain when logistic regression should be preferred for this Meta integrity problem.
Describe preprocessing, feature engineering, threshold selection, and validation strategy.
Recommend a production model and justify the tradeoff between accuracy, interpretability, and operational simplicity.
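
As a starting point for the logistic regression side of these deliverables, the sketch below wires up imputation with missing-value indicators, scaling, one-hot encoding, and an L2-regularized model in scikit-learn. It assumes pandas and scikit-learn are available, uses synthetic data with a handful of column names borrowed from the feature table, and uses a random split only for brevity; a time-based split would likely suit weekly retraining better. The tree-based benchmark is not shown.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score, brier_score_loss
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

rng = np.random.default_rng(0)
n = 5_000

# Synthetic stand-in data: column names come from the problem statement, values are made up.
df = pd.DataFrame({
    "payment_attempts_24h": rng.poisson(2, n).astype(float),
    "device_count": rng.poisson(1, n).astype(float) + 1,
    "linked_entity_risk_score": rng.uniform(0, 1, n),
    "country": rng.choice(["US", "BR", "IN", "VN"], n),
    "payment_method_type": rng.choice(["card", "paypal", "prepaid"], n),
})
logit = (-4.0 + 0.4 * df["payment_attempts_24h"]
         + 2.0 * df["linked_entity_risk_score"]).to_numpy()
y = (rng.uniform(size=n) < 1 / (1 + np.exp(-logit))).astype(int)
# Blank out about 12% of the linked-entity signal to mimic the stated missingness.
df.loc[rng.uniform(size=n) < 0.12, "linked_entity_risk_score"] = np.nan

numeric = ["payment_attempts_24h", "device_count", "linked_entity_risk_score"]
categorical = ["country", "payment_method_type"]

preprocess = ColumnTransformer([
    ("num", Pipeline([
        # Median imputation plus an indicator column so missingness itself can carry signal.
        ("impute", SimpleImputer(strategy="median", add_indicator=True)),
        ("scale", StandardScaler()),
    ]), numeric),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
])

model = Pipeline([
    ("prep", preprocess),
    # L2 regularization is the default penalty; C is the inverse regularization strength.
    ("clf", LogisticRegression(C=1.0, max_iter=1000)),
])

X_train, X_test, y_train, y_test = train_test_split(
    df, y, test_size=0.25, stratify=y, random_state=0)
model.fit(X_train, y_train)
proba = model.predict_proba(X_test)[:, 1]

pr_auc = average_precision_score(y_test, proba)
brier = brier_score_loss(y_test, proba)
print(f"PR-AUC: {pr_auc:.3f}  Brier score: {brier:.3f}  prevalence: {y_test.mean():.3f}")
```

Class weighting is deliberately left off in this sketch because reweighting shifts predicted probabilities away from observed rates, and the constraints call for calibrated scores; whether that tradeoff holds on the real data is something to verify rather than assume.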
