
A
You’re interviewing for an ML engineering role at VoltGrid, a logistics-and-energy operator that runs 18,000 industrial pumps and compressors across 420 facilities (ports, warehouses, and fuel depots). Unplanned failures can halt operations and trigger environmental compliance incidents. A single pump failure costs $30K–$120K in downtime and emergency repair, but unnecessary preventive maintenance also has a real cost: each truck roll is ~$900 plus lost utilization.
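
To make the cost asymmetry concrete, here is a rough break-even sketch that uses only the dollar figures above. It rests on two simplifying assumptions that are not part of the scenario: a truck roll on a truly at-risk asset fully prevents the failure, and lost utilization is ignored. The function name `break_even_precision` is illustrative.

```python
def break_even_precision(truck_roll_cost: float, failure_cost: float) -> float:
    """Minimum fraction of flagged assets that must be real upcoming failures
    for preventive maintenance to pay for itself.

    Expected saving per flagged asset = precision * failure_cost
    Cost per flagged asset            = truck_roll_cost
    Break-even is where the two are equal.
    """
    return truck_roll_cost / failure_cost


TRUCK_ROLL = 900        # per truck roll, excluding lost utilization
FAILURE_LOW = 30_000    # low end of the failure cost range
FAILURE_HIGH = 120_000  # high end of the failure cost range

p_cheap_failure = break_even_precision(TRUCK_ROLL, FAILURE_LOW)
p_costly_failure = break_even_precision(TRUCK_ROLL, FAILURE_HIGH)

print(f"Break-even precision, $30K failure:  {p_cheap_failure:.2%}")
print(f"Break-even precision, $120K failure: {p_costly_failure:.2%}")
```

Under these assumptions, a risk list can pay off even when only a few percent of flagged assets would actually have failed. The real break-even point is higher once lost utilization and imperfect repairs are included.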
VoltGrid has instrumented assets with high-frequency sensors and maintains a CMMS (maintenance ticketing) system. The VP of Reliability wants a model that can predict whether an asset will fail in the next 7 days, so planners can schedule maintenance during low-demand windows.
The core challenge: failures are extremely rare and labels are noisy (some failures are never logged; some tickets are mislabeled as “failure”). You must propose a robust approach that works in production.
You are given a feature table built from raw telemetry and maintenance logs. Each row is an asset-day snapshot.
| Feature Group | Count | Examples | Notes |
|---|---|---|---|
| Sensor aggregates | 28 | mean_vibration_1h, rms_vibration_24h, temp_max_24h, pressure_std_6h | Aggregated from 1 Hz telemetry into rolling windows |
| Trend / change features | 14 | vib_slope_7d, temp_delta_24h, pressure_drift_3d | Computed per asset; sensitive to missingness |
| Operating context | 9 | load_pct, rpm_avg_24h, start_stop_count_24h, ambient_temp | Context shifts across facilities |
| Asset metadata | 8 | asset_type, manufacturer, install_age_days, facility_id | High-cardinality facility_id |
| Maintenance history | 11 | days_since_last_service, last_service_type, tickets_90d, parts_replaced_180d | Derived from CMMS |
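
The sensor aggregates in the table could be computed roughly as follows. This is a minimal sketch, not VoltGrid's actual pipeline: the raw column name `vibration`, the synthetic data, and the helper `rolling_rms` are assumptions for illustration. Only the feature name `rms_vibration_24h` comes from the table.

```python
import numpy as np
import pandas as pd


def rolling_rms(series: pd.Series, window: str) -> pd.Series:
    """Root-mean-square over a trailing time window.

    The series must have a DatetimeIndex so that `window` (e.g. "24h")
    is interpreted as a time span rather than a row count.
    """
    return np.sqrt((series ** 2).rolling(window).mean())


# Synthetic telemetry for one hypothetical asset. Real telemetry is 1 Hz;
# 1-minute samples are used here only to keep the example small.
idx = pd.date_range("2024-01-01", periods=2 * 24 * 60, freq="min")
rng = np.random.default_rng(0)
telemetry = pd.DataFrame({"vibration": rng.normal(0.0, 1.0, len(idx))}, index=idx)

# Trailing 24-hour RMS at every timestamp.
telemetry["rms_vibration_24h"] = rolling_rms(telemetry["vibration"], "24h")

# One value per day: the last trailing-window value seen that day, so each
# asset-day snapshot uses only data from before the snapshot time.
daily = telemetry["rms_vibration_24h"].resample("1D").last()
print(daily)
```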

Additional details:

- fail_7d = 1 if a confirmed failure occurs within the next 7 days, else 0.
- Your model will be used to generate a daily risk list.
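
One way to construct the fail_7d label from a list of confirmed failure events is sketched below. The table layout (`asset_id`, `snapshot_date`, `failure_date`) and the helper `add_fail_7d` are hypothetical; the scenario only defines the label itself. The sketch also assumes that "within the next 7 days" means the window (snapshot_date, snapshot_date + 7 days], which is one possible reading of the boundary.

```python
import pandas as pd


def add_fail_7d(snapshots: pd.DataFrame, failures: pd.DataFrame,
                horizon_days: int = 7) -> pd.DataFrame:
    """Add fail_7d: 1 if the same asset has a confirmed failure in
    (snapshot_date, snapshot_date + horizon_days], else 0."""
    left = snapshots.sort_values("snapshot_date").reset_index(drop=True)
    right = failures.sort_values("failure_date")
    # For each snapshot, attach the next failure strictly after it (same asset).
    merged = pd.merge_asof(
        left, right,
        left_on="snapshot_date", right_on="failure_date",
        by="asset_id", direction="forward", allow_exact_matches=False,
    )
    gap = merged["failure_date"] - merged["snapshot_date"]
    # Rows with no later failure have a missing gap, which compares as False.
    merged["fail_7d"] = (gap <= pd.Timedelta(days=horizon_days)).astype(int)
    return merged.drop(columns="failure_date")


# Hypothetical data: asset A fails on 2024-01-09, asset B never fails.
snapshots = pd.DataFrame({
    "asset_id": ["A", "A", "A", "B"],
    "snapshot_date": pd.to_datetime(
        ["2024-01-01", "2024-01-05", "2024-01-10", "2024-01-01"]),
})
failures = pd.DataFrame({
    "asset_id": ["A"],
    "failure_date": pd.to_datetime(["2024-01-09"]),
})

labeled = add_fail_7d(snapshots, failures)
print(labeled)
```

Because some failures are never logged, a 0 produced this way means "no confirmed failure was recorded", not "no failure occurred".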