AeroSight monitors telemetry from 1,200 commercial aircraft and wants a model for engine health analytics. Depending on the downstream use case, the team may need either a discrete operational decision or a continuous estimate, so you must decide when classification is appropriate versus regression.
You are given historical flight-level and maintenance data collected over 24 months.
| Feature Group | Count | Examples |
|---|---|---|
| Sensor aggregates | 18 | exhaust_gas_temp_mean, vibration_std, oil_pressure_min, fuel_flow_mean |
| Flight context | 9 | route_length_km, cruise_altitude_ft, outside_air_temp, aircraft_age_years |
| Maintenance history | 6 | days_since_last_inspection, prior_fault_count_90d, component_cycles |
| Categorical metadata | 5 | engine_model, airline_region, mission_type, airport_class |

Target options depend on the business question:
Classification target: maintenance_required_7d (1 if the engine required unscheduled maintenance within 7 days, else 0)
Regression target: remaining_cycles_to_service (continuous estimate of cycles until next required service)
Size: 96K flight records, 38 features
Class balance: 11% positive for maintenance_required_7d
Missing data: 8% missing in some sensor aggregates due to intermittent telemetry dropouts; 3% missing in maintenance logs
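
One way to handle the telemetry dropouts and the 11% positive rate is sketched below. It is a minimal illustration, not a prescribed solution: the column subset, the median imputation with missing-value indicators, the logistic regression baseline, and the synthetic data are all assumptions made for the example. Only the column names themselves come from the feature table.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Illustrative subset of the 38 features (names taken from the feature table).
NUMERIC_COLS = ["exhaust_gas_temp_mean", "vibration_std", "days_since_last_inspection"]
CATEGORICAL_COLS = ["engine_model", "mission_type"]


def build_classifier() -> Pipeline:
    """Baseline pipeline for maintenance_required_7d (an assumed design)."""
    numeric = Pipeline([
        # Median imputation; add_indicator keeps a flag for imputed values,
        # since a telemetry dropout may itself carry signal.
        ("impute", SimpleImputer(strategy="median", add_indicator=True)),
        ("scale", StandardScaler()),
    ])
    preprocess = ColumnTransformer([
        ("num", numeric, NUMERIC_COLS),
        ("cat", OneHotEncoder(handle_unknown="ignore"), CATEGORICAL_COLS),
    ])
    return Pipeline([
        ("preprocess", preprocess),
        # class_weight="balanced" reweights the minority positive class.
        ("model", LogisticRegression(class_weight="balanced", max_iter=1000)),
    ])


# Synthetic stand-in data that mimics the stated missingness and class balance.
rng = np.random.default_rng(0)
n = 400
X = pd.DataFrame({
    "exhaust_gas_temp_mean": rng.normal(600, 30, n),
    "vibration_std": rng.normal(1.0, 0.2, n),
    "days_since_last_inspection": rng.integers(1, 120, n).astype(float),
    "engine_model": rng.choice(["A", "B", "C"], n),
    "mission_type": rng.choice(["short_haul", "long_haul"], n),
})
X.loc[rng.random(n) < 0.08, "vibration_std"] = np.nan               # sensor dropouts
X.loc[rng.random(n) < 0.03, "days_since_last_inspection"] = np.nan  # log gaps
y = (rng.random(n) < 0.11).astype(int)                              # ~11% positive

clf = build_classifier().fit(X, y)
pred = clf.predict(X)
```

The same preprocessing step could feed a regressor for remaining_cycles_to_service by swapping the final estimator. In practice the model would be fit on a training split and scored on held-out flights rather than on the data it was fit on.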
A good solution should clearly justify when to frame the problem as classification versus regression, build one model for each target, and compare them using appropriate metrics. For classification, target F1 >= 0.68 and recall >= 0.75 on the positive class. For regression, target MAE <= 18 cycles.
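
The acceptance thresholds above can be checked mechanically. The sketch below computes positive-class precision, recall, and F1 plus MAE in plain Python and compares them with the stated targets. The helper names and the toy labels are hypothetical; `sklearn.metrics` provides equivalent `f1_score`, `recall_score`, and `mean_absolute_error` functions.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Precision, recall, and F1 for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


def mean_abs_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error, in the target's units (cycles here)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def meets_targets(cls: Dict[str, float], mae: float) -> Dict[str, bool]:
    """Compare metrics with the thresholds stated in the problem."""
    return {
        "f1_ok": cls["f1"] >= 0.68,
        "recall_ok": cls["recall"] >= 0.75,
        "mae_ok": mae <= 18.0,
    }


# Toy labels for illustration only: 3 true positives, 1 miss, 1 false alarm.
cls = binary_metrics(
    y_true=[1, 1, 1, 1, 0, 0, 0, 0, 0, 0],
    y_pred=[1, 1, 1, 0, 1, 0, 0, 0, 0, 0],
)
# Toy cycle counts: absolute errors of 10, 15, and 10 cycles.
mae = mean_abs_error([120.0, 80.0, 200.0], [110.0, 95.0, 190.0])
checks = meets_targets(cls, mae)
```

The two models are not compared on a shared scale: F1 and recall describe decision quality at a chosen threshold, while MAE describes estimation error in cycles, so each model is judged against its own target.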