Autodesk’s internal data platform supports analytics and ML workloads for products such as Fusion 360. The data engineering team wants a model that predicts, before execution completes, whether a scheduled pipeline run will fail, so operators can prioritize intervention and reduce downstream SLA misses.
You are given historical metadata for 420,000 pipeline runs collected from Autodesk Data Platform orchestration logs over 18 months. Each row represents one scheduled job run, with the target indicating whether the run failed within the execution window.
| Feature Group | Count | Examples |
|---|---|---|
| Orchestration metadata | 12 | schedule_hour, retry_count, upstream_dependency_count, backfill_flag |
| Runtime statistics | 10 | avg_runtime_7d, p95_runtime_30d, input_row_count, output_row_count |
| Data quality signals | 8 | null_rate_delta, schema_change_flag, freshness_delay_minutes |
| Infrastructure signals | 9 | cluster_type, worker_count, cpu_utilization_prev_run, memory_spill_flag |
| Ownership and domain | 6 | team_name, pipeline_tier, source_system, region |
Target: `failed_run` (1 if the pipeline run failed, 0 otherwise).

A good solution should identify likely failures early enough to support alert routing and triage. Aim for PR-AUC above 0.45, recall above 75% at precision above 35%, and clear feature-level explanations that a data engineering team can act on.
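The operating-point targets can be checked with a precision-recall sweep over a held-out set. The sketch below uses synthetic scores in place of a trained model's output, so the score distribution and class balance are assumptions; only the metric logic carries over.

```python
import numpy as np
from sklearn.metrics import average_precision_score, precision_recall_curve

rng = np.random.default_rng(0)
n = 20_000
y = rng.binomial(1, 0.08, size=n)  # assumed ~8% failure rate
# Hypothetical model scores: failed runs skew higher than successful ones.
scores = np.clip(rng.normal(0.25 + 0.35 * y, 0.18), 0.0, 1.0)

# PR-AUC (average precision), the headline metric; target is > 0.45.
pr_auc = average_precision_score(y, scores)

# Sweep thresholds and look for an operating point meeting both targets:
# recall above 75% at precision above 35%.
precision, recall, thresholds = precision_recall_curve(y, scores)
ok = (recall[:-1] >= 0.75) & (precision[:-1] >= 0.35)
if ok.any():
    # thresholds are sorted ascending, so ok[0] is the most permissive cutoff
    t = thresholds[ok][0]
    print(f"PR-AUC={pr_auc:.3f}, alerting threshold={t:.3f}")
else:
    print(f"PR-AUC={pr_auc:.3f}, no threshold meets both targets")
```

Reporting the chosen threshold alongside PR-AUC matters here, since alert routing needs a single cutoff, not just a ranking.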