Predict Equipment Failure in Manufacturing

Business Context

NorthRiver Manufacturing operates 1,200 industrial pumps across 18 plants and wants to predict whether a pump will fail within the next 7 days so that maintenance can be scheduled before unplanned downtime occurs. False negatives are costly because a missed failure can stop a production line, while too many false positives create unnecessary maintenance work.

Dataset

You are given a historical predictive maintenance dataset built from sensor readings, maintenance logs, and machine metadata.

Feature Group	Count	Examples
Sensor aggregates	18	temperature_mean_24h, vibration_std_6h, pressure_max_12h, current_draw_trend
Operating context	7	load_pct, ambient_temp, shift_type, runtime_hours
Maintenance history	6	days_since_last_service, parts_replaced_90d, prior_failures_12m
Asset metadata	5	machine_type, manufacturer, install_age_days, plant_id
Derived trend features	8	rolling_slope_vibration_3d, temp_to_load_ratio, anomaly_count_24h

Size: 96K machine-day records across 24 months, 44 features
Target: Binary label indicating failure within the next 7 days
Class balance: 6.5% positive, 93.5% negative
Missing data: ~12% missing in some sensor windows due to telemetry dropouts; 4% missing in maintenance fields for newly onboarded assets
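
One possible way to handle the missing sensor and maintenance values is to impute with statistics learned from the training period only and to keep a flag recording that the value was absent, since a telemetry dropout may itself carry signal. The sketch below is an illustration under that assumption, not a prescribed method; the helper names fit_medians and impute_with_flags and the toy rows are hypothetical.

```python
from statistics import median

def fit_medians(train_rows, columns):
    """Compute per-column medians from training rows only, ignoring None."""
    medians = {}
    for col in columns:
        observed = [row[col] for row in train_rows if row.get(col) is not None]
        # Fall back to 0.0 if a column is entirely missing (an arbitrary choice).
        medians[col] = median(observed) if observed else 0.0
    return medians

def impute_with_flags(rows, medians):
    """Fill missing values with the fitted medians and add <col>_missing flags."""
    out = []
    for row in rows:
        new_row = dict(row)
        for col, fill in medians.items():
            missing = row.get(col) is None
            new_row[f"{col}_missing"] = int(missing)
            if missing:
                new_row[col] = fill
        out.append(new_row)
    return out

# Toy rows; None marks a telemetry dropout or a newly onboarded asset.
train = [
    {"vibration_std_6h": 0.2, "days_since_last_service": 30},
    {"vibration_std_6h": None, "days_since_last_service": 45},
    {"vibration_std_6h": 0.4, "days_since_last_service": None},
]
medians = fit_medians(train, ["vibration_std_6h", "days_since_last_service"])
imputed = impute_with_flags(train, medians)
```

Fitting the medians on the training window and reusing them for validation and test rows keeps later data from leaking into the imputation.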

Success Criteria

A strong solution should achieve recall >= 0.80 on failure events while keeping precision >= 0.35 and PR-AUC >= 0.40 on a held-out time-based test set. The candidate should explain how model complexity affects underfitting vs overfitting in this maintenance setting.
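
The recall and precision floors above can be turned into a threshold search on validation scores. The following is a minimal sketch, assuming the model outputs a failure probability per machine-day; pick_threshold and the toy labels and scores are hypothetical, and the rule used here (take the highest threshold that satisfies both floors) is only one reasonable choice.

```python
def precision_recall_at(y_true, scores, threshold):
    """Precision and recall when every score >= threshold is flagged as a failure."""
    tp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 0)
    fn = sum(1 for y, s in zip(y_true, scores) if s < threshold and y == 1)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

def pick_threshold(y_true, scores, min_recall=0.80, min_precision=0.35):
    """Return (threshold, precision, recall) for the highest candidate
    threshold meeting both floors, or None if no threshold qualifies."""
    for t in sorted(set(scores), reverse=True):
        p, r = precision_recall_at(y_true, scores, t)
        if r >= min_recall and p >= min_precision:
            return t, p, r
    return None

# Toy validation set: 5 failures among 10 machine-days.
y_true = [1, 1, 0, 1, 0, 1, 0, 0, 1, 0]
scores = [0.95, 0.90, 0.85, 0.80, 0.60, 0.55, 0.40, 0.30, 0.20, 0.10]
choice = pick_threshold(y_true, scores)
```

The threshold should be chosen on a validation window and then confirmed on the held-out test window, so the reported precision and recall are not tuned on the data used to report them.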

Constraints

Predictions run every 6 hours in batch for all active machines
Maintenance managers need a reasonably interpretable model and feature importance
Training data is temporal, so leakage must be avoided
Inference should remain under 200 ms per machine in offline batch scoring
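
Because the label looks 7 days ahead, a training row dated shortly before the validation window has a label that depends on events inside that window. One way to respect the leakage constraint is a forward-chaining split with a gap equal to the label horizon. This is a sketch under the assumption that each record carries a calendar date; forward_chaining_folds and its parameters are hypothetical names.

```python
from datetime import date, timedelta

def forward_chaining_folds(dates, n_folds=3, horizon_days=7):
    """Yield (train_idx, valid_idx) pairs where training rows end more than
    horizon_days before the validation window starts."""
    unique_days = sorted(set(dates))
    # Split the timeline into n_folds + 1 contiguous blocks; the first is train-only.
    # Assumes there are at least n_folds + 1 distinct days.
    block = len(unique_days) // (n_folds + 1)
    for k in range(1, n_folds + 1):
        valid_start = unique_days[k * block]
        if k < n_folds:
            valid_end = unique_days[(k + 1) * block - 1]
        else:
            valid_end = unique_days[-1]
        # A row on day d is labeled from events up to d + horizon_days,
        # so keep only rows with d + horizon_days < valid_start.
        train_cutoff = valid_start - timedelta(days=horizon_days)
        train_idx = [i for i, d in enumerate(dates) if d < train_cutoff]
        valid_idx = [i for i, d in enumerate(dates) if valid_start <= d <= valid_end]
        yield train_idx, valid_idx

# Toy timeline: one record per day for 40 days.
dates = [date(2024, 1, 1) + timedelta(days=i) for i in range(40)]
folds = list(forward_chaining_folds(dates))
```

Splitting on dates rather than on row positions keeps all machines observed on the same day in the same fold. The final held-out test window would be carved off the end of the timeline in the same way, with the same gap.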

Deliverables

Build and compare at least one low-variance model and one higher-capacity model
Explain the bias-variance trade-off for predictive maintenance and how it appears in train vs validation performance
Design a leakage-safe validation strategy for temporal data
Recommend a final model and decision threshold for operations use
Describe how you would monitor drift and retrain the model in production
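
For the drift-monitoring deliverable, one commonly used signal is the Population Stability Index (PSI), which compares the distribution of a feature or of the model score in a recent window against the training window. The sketch below is one possible implementation with equal-width bins taken from the reference data; the function name psi, the bin count, and the toy data are assumptions, and the often-quoted alert levels (around 0.1 for moderate shift and 0.25 for large shift) are rules of thumb rather than requirements from this problem.

```python
import math

def psi(reference, current, n_bins=10, eps=1e-6):
    """Population Stability Index of `current` against `reference`,
    using equal-width bins spanning the reference range."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / n_bins or 1.0  # avoid zero width for constant features

    def bin_fractions(values):
        counts = [0] * n_bins
        for v in values:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), n_bins - 1)  # clamp out-of-range values
            counts[idx] += 1
        total = len(values)
        # Floor empty bins at eps so the logarithm stays defined.
        return [max(c / total, eps) for c in counts]

    ref_frac = bin_fractions(reference)
    cur_frac = bin_fractions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_frac, cur_frac))

# Toy data: a reference window and a shifted recent window.
reference = [i / 100 for i in range(100)]
shifted = [v + 0.3 for v in reference]
psi_same = psi(reference, reference)
psi_shifted = psi(reference, shifted)
```

A monitoring job could compute this per feature after each batch run and trigger a retraining review when several features or the score distribution cross the chosen alert level, alongside tracking realized precision and recall once failure outcomes become known.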
