Business Context
AeroGrid operates 1,200 industrial gas turbines across power plants in North America. Unplanned turbine downtime is expensive, so the reliability team wants a model that predicts whether a turbine will require a maintenance intervention within the next 7 days, using historical sensor and maintenance data.
Dataset
The training data is built at the turbine-day level from SCADA logs, alarm events, and maintenance records.
| Feature Group | Count | Examples |
|---|---|---|
| Sensor aggregates | 28 | avg_bearing_temp_1h, max_vibration_24h, exhaust_temp_std_6h, oil_pressure_min_12h |
| Operating context | 9 | ambient_temp, load_pct, startup_count_7d, runtime_hours_since_overhaul |
| Alarm/event features | 7 | alarm_count_24h, critical_alarm_flag, trip_events_30d |
| Asset metadata | 6 | turbine_model, site_id, fuel_type, turbine_age_years |
| Maintenance history | 5 | days_since_last_service, prior_failure_count_180d, replaced_component_type |
- Size: 410K turbine-days across 36 months, 55 features
- Target: Binary label indicating whether a turbine has a failure or maintenance-triggering fault within the next 7 days
- Class balance: Highly imbalanced, about 4.6% positive
- Missing data: 12% of values are missing in some sensor windows because of telemetry gaps; 6% are missing in maintenance-history fields for newly onboarded turbines
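The target definition above can be made concrete with a small labeling sketch. It assumes a hypothetical per-turbine list of failure dates and treats day t as positive when a failure falls in the window (t, t + 7 days]. The brief does not say whether a failure on the current day counts, so that boundary is an assumption, as are the names `label_turbine_days` and `HORIZON_DAYS` and the toy dates.

```python
from bisect import bisect_right
from datetime import date, timedelta

HORIZON_DAYS = 7  # prediction horizon from the target definition


def label_turbine_days(days, failure_dates, horizon_days=HORIZON_DAYS):
    """Return a 0/1 label for each day in `days` for a single turbine.

    Assumption: day t is positive when at least one failure date lies in
    (t, t + horizon_days], i.e. strictly after t and at most 7 days ahead.
    """
    failures = sorted(failure_dates)
    labels = []
    for day in days:
        # Position of the first failure strictly after `day`.
        i = bisect_right(failures, day)
        in_horizon = (
            i < len(failures)
            and failures[i] <= day + timedelta(days=horizon_days)
        )
        labels.append(1 if in_horizon else 0)
    return labels


# Toy example (made-up dates): one failure on 2024-01-10.
days = [date(2024, 1, 1) + timedelta(days=k) for k in range(12)]
labels = label_turbine_days(days, [date(2024, 1, 10)])
print(labels)  # [0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0]
```

On this toy input the seven days before the failure are labeled positive. Turbine-days within 7 days of the end of the data extract have an incomplete look-ahead window, so one reasonable choice is to drop them from training rather than label them negative.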
Success Criteria
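A production-ready model should deliver strong early-warning performance: PR-AUC above 0.45, recall above 75% at a precision of at least 35%, and top-decile lift above 4x versus random ranking. With a base rate of about 4.6%, a 4x lift means that more than roughly 18% of the turbine-days in the top-scoring 10% are true positives.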
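The three success criteria can be checked with a few lines of plain Python. The sketch below is illustrative only: it estimates PR-AUC as average precision (a step-wise sum over the precision-recall curve, which is one common estimator, not necessarily the one the team intends), and the function names and toy labels and scores are made up for the example.

```python
def pr_points(y_true, scores):
    """(threshold, precision, recall) at each distinct score, highest first."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    total_pos = sum(y_true)
    points, tp = [], 0
    for rank, i in enumerate(order, start=1):
        tp += y_true[i]
        # Emit a point only once a run of tied scores is complete.
        if rank == len(order) or scores[order[rank]] != scores[i]:
            points.append((scores[i], tp / rank, tp / total_pos))
    return points


def average_precision(y_true, scores):
    """Step-wise PR-AUC: sum of (recall increase) * precision."""
    ap, prev_recall = 0.0, 0.0
    for _, precision, recall in pr_points(y_true, scores):
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


def recall_at_precision(y_true, scores, min_precision):
    """Best recall among thresholds meeting the precision floor (0.0 if none)."""
    feasible = [r for _, p, r in pr_points(y_true, scores) if p >= min_precision]
    return max(feasible, default=0.0)


def top_decile_lift(y_true, scores):
    """Positive rate in the top-scoring 10% divided by the overall rate."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    k = max(1, len(order) // 10)
    top_rate = sum(y_true[i] for i in order[:k]) / k
    return top_rate / (sum(y_true) / len(y_true))


# Toy data: 20 turbine-days, 3 positives, scores strictly decreasing.
y_true = [1, 1, 0, 1] + [0] * 16
scores = [(20 - i) / 20 for i in range(20)]

ap = average_precision(y_true, scores)
recall = recall_at_precision(y_true, scores, min_precision=0.35)
lift = top_decile_lift(y_true, scores)
print(round(ap, 4), recall, round(lift, 2))  # 0.9167 1.0 6.67
```

In practice these would be computed on held-out, later-in-time data rather than on the training set, and compared against the targets of 0.45, 75%, and 4x.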
Constraints
- Predictions run every hour in batch and must score all active turbines in under 5 minutes
- Reliability engineers need interpretable drivers behind each alert
- Data leakage from future maintenance actions or post-failure signals is not allowed
- Model retraining should be feasible monthly on standard cloud CPU instances
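One way to honor the leakage constraint during validation is to split by time and leave a gap between the training and test windows. The sketch below builds forward-chaining folds by date. The 7-day gap is an assumption chosen to match the label horizon, so that labels for the last training day do not look into the test window. The function name, fold count, and 30-day test length are hypothetical.

```python
from datetime import date, timedelta


def forward_chaining_splits(last_day, n_folds, test_days, gap_days=7):
    """Return (train_end, test_start, test_end) triples, oldest fold first.

    Training rows are those dated on or before train_end; test rows are
    those dated from test_start to test_end inclusive. The gap_days between
    them are excluded so 7-day look-ahead labels cannot overlap the test set.
    """
    splits = []
    for fold in range(n_folds):
        # Build folds backwards from the most recent day in the data.
        test_end = last_day - timedelta(days=fold * test_days)
        test_start = test_end - timedelta(days=test_days - 1)
        train_end = test_start - timedelta(days=gap_days + 1)
        splits.append((train_end, test_start, test_end))
    return list(reversed(splits))


splits = forward_chaining_splits(date(2024, 12, 31), n_folds=3, test_days=30)
for train_end, test_start, test_end in splits:
    print(f"train <= {train_end} | test {test_start} .. {test_end}")
```

Each fold trains only on data that predates its test window, which mirrors how a monthly retrained model would be used in production. Features built from maintenance actions or post-failure signals would still need to be screened separately, since a time split alone does not remove them.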
Deliverables
- Build a binary classification pipeline to predict 7-day turbine failure risk.
- Explain feature engineering choices for time-windowed sensor data.
- Justify model selection and validation strategy for temporal data.
- Evaluate the model with metrics appropriate for rare-event prediction.
- Describe how you would set an alert threshold for maintenance operations.
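For the alert-threshold deliverable, one possible starting point is to choose the score cutoff on validation data that maximizes recall while keeping precision at or above the 35% floor from the success criteria. The sketch below is one such rule under that assumption. The function name and the toy labels and scores are hypothetical, and an operations team might instead set the cutoff from inspection capacity or intervention cost.

```python
def pick_alert_threshold(y_true, scores, min_precision=0.35):
    """Choose a cutoff: alert when score >= threshold.

    Returns (threshold, precision, recall) for the cutoff with the highest
    recall among those meeting the precision floor. Ties in recall go to
    the higher cutoff, which raises fewer alerts. Returns None if no cutoff
    meets the floor.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    total_pos = sum(y_true)
    best, tp = None, 0
    for rank, i in enumerate(order, start=1):
        tp += y_true[i]
        # Only evaluate a cutoff once a run of tied scores is complete.
        at_boundary = rank == len(order) or scores[order[rank]] != scores[i]
        precision, recall = tp / rank, tp / total_pos
        if at_boundary and precision >= min_precision:
            if best is None or recall > best[2]:
                best = (scores[i], precision, recall)
    return best


# Toy validation set (made-up values): 10 turbine-days, 3 positives.
val_labels = [1, 0, 1, 0, 0, 1, 0, 0, 0, 0]
val_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]

choice = pick_alert_threshold(val_labels, val_scores)
print(choice)  # (0.4, 0.5, 1.0)
```

Here a cutoff of 0.4 captures all three positives at 50% precision. Lower cutoffs would add alerts without improving recall, so the rule stops there.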