Engineer Features for Delivery Delay Prediction

Difficulty: Easy
Category: Machine Learning
Topics: Supervised Learning, Cross-Validation, Feature Engineering
Asked at: Avenue Code (1 company)

Problem

Business Context

Avenue Code’s delivery operations team wants a lightweight model to predict whether a client delivery ticket will miss its promised SLA. The goal is not only to train a classifier, but to demonstrate strong feature engineering on messy operational data that includes timestamps, categorical fields, and partially missing inputs.

Dataset

You are given a historical dataset of delivery tickets exported from an Avenue Code internal operations workflow. Each row represents one ticket at the moment it was assigned to a delivery squad.

Feature Group                | Count | Examples
Numeric operational metrics  | 12    | estimated_hours, prior_revisions, team_load, client_tenure_months
Categorical attributes       | 9     | region, service_line, priority, squad_id, client_segment
Temporal fields              | 6     | created_at, assigned_at, due_at, day_of_week, hour_of_day
Text-derived flags           | 4     | has_urgent_keyword, request_length, has_attachment, contains_change_request
Target                       | 1     | sla_missed

  • Size: 48K tickets over 18 months, 31 raw features
  • Target: Binary classification — whether the ticket missed SLA
  • Class balance: 21% positive, 79% negative
  • Missing data: 18% missing in estimated_hours, 11% in client_tenure_months, 7% in text-derived fields
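
With 18% of estimated_hours missing, one option is to keep a record of the missingness instead of only filling the gap, since "no estimate was entered" may itself carry signal. The following is a minimal sketch, assuming median imputation fitted on training rows only plus an explicit indicator column. The helper names `fit_median` and `impute_with_flag` are hypothetical and the values are made up for illustration.

```python
from statistics import median
from typing import List, Optional, Sequence, Tuple


def fit_median(values: Sequence[Optional[float]]) -> float:
    """Median of the observed (non-missing) training values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no observed values to fit on")
    return median(observed)


def impute_with_flag(
    values: Sequence[Optional[float]], fill: float
) -> Tuple[List[float], List[int]]:
    """Return (filled values, 0/1 flags marking which entries were missing)."""
    filled = [fill if v is None else v for v in values]
    flags = [1 if v is None else 0 for v in values]
    return filled, flags


# Illustrative data only: the fill value is learned from training rows...
train_hours = [4.0, None, 10.0, 6.0, None]
fill_value = fit_median(train_hours)

# ...and then reused unchanged on held-out rows, so no test statistics leak in.
test_hours = [None, 3.0]
filled, flags = impute_with_flag(test_hours, fill_value)
```

Fitting the fill value on training rows and reusing it at scoring time keeps the transformation reproducible in production.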

Success Criteria

A good solution should improve materially over a raw-feature baseline by using thoughtful feature engineering. Aim for ROC-AUC >= 0.82 and F1 >= 0.60 on the held-out test set, while keeping the pipeline interpretable enough for operations managers to review the main drivers.
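
To put the F1 target in context: at 21% prevalence, flagging every ticket gives precision 0.21 and recall 1.0, which works out to an F1 of about 0.35. The sketch below shows one way to pick a decision threshold by sweeping candidates on a validation set. It assumes plain lists of labels and scores, and the function names and toy numbers are hypothetical.

```python
from typing import Sequence


def f1_at_threshold(
    y_true: Sequence[int], scores: Sequence[float], threshold: float
) -> float:
    """F1 when every score >= threshold is predicted as a missed SLA."""
    tp = fp = fn = 0
    for label, score in zip(y_true, scores):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 1:
            fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def best_threshold(
    y_true: Sequence[int], scores: Sequence[float], candidates: Sequence[float]
) -> float:
    """Candidate threshold with the highest F1 (first one wins on ties)."""
    return max(candidates, key=lambda t: f1_at_threshold(y_true, scores, t))


# F1 of the "flag everything" strategy at 21% positives: 2p / (1 + p).
trivial_f1 = 2 * 0.21 / (1 + 0.21)

# Toy validation labels and model scores, for illustration only.
y_val = [0, 0, 0, 1, 0, 1, 0, 1, 0, 0]
s_val = [0.10, 0.20, 0.30, 0.80, 0.40, 0.60, 0.55, 0.52, 0.05, 0.15]
chosen = best_threshold(y_val, s_val, candidates=[0.3, 0.5, 0.7])
```

The threshold should be chosen on validation data and then frozen before the held-out test set is scored, so that the reported F1 is not optimistically biased.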

Constraints

  • Batch scoring only; inference should complete in under 5 minutes for 10K daily tickets
  • Avoid leakage from future timestamps or post-assignment information
  • Prefer features that can be recomputed reliably in production
  • Keep the solution simple enough for a small ML platform team to maintain
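
The leakage constraint can be made concrete by deriving temporal features only from timestamps known at assignment time, and by splitting train and test chronologically rather than at random. This is a sketch under those assumptions. The feature names (`assignment_lag_hours`, `hours_until_due`, `assigned_on_weekend`) and both helpers are hypothetical, not part of the dataset.

```python
from datetime import datetime
from typing import Dict, List, Tuple


def temporal_features(
    created_at: datetime, assigned_at: datetime, due_at: datetime
) -> Dict[str, float]:
    """Features computable at the moment of assignment (nothing later)."""
    lag_hours = (assigned_at - created_at).total_seconds() / 3600
    slack_hours = (due_at - assigned_at).total_seconds() / 3600
    return {
        "assignment_lag_hours": lag_hours,   # time the ticket waited unassigned
        "hours_until_due": slack_hours,      # remaining slack before the SLA
        "assigned_on_weekend": int(assigned_at.weekday() >= 5),
    }


def chronological_split(
    rows: List[dict], cutoff: datetime
) -> Tuple[List[dict], List[dict]]:
    """Train on tickets assigned before the cutoff, test on the rest."""
    train = [r for r in rows if r["assigned_at"] < cutoff]
    test = [r for r in rows if r["assigned_at"] >= cutoff]
    return train, test


# Illustrative ticket: created Friday morning, assigned Saturday, due Monday.
feats = temporal_features(
    created_at=datetime(2024, 3, 1, 9, 0),
    assigned_at=datetime(2024, 3, 2, 10, 30),
    due_at=datetime(2024, 3, 4, 10, 30),
)

tickets = [
    {"assigned_at": datetime(2024, 2, 10)},
    {"assigned_at": datetime(2024, 3, 2)},
    {"assigned_at": datetime(2024, 4, 5)},
]
train_rows, test_rows = chronological_split(tickets, cutoff=datetime(2024, 3, 15))
```

A time-based split mirrors how the model would be used in production, where it is always trained on past tickets and scored on future ones.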

Deliverables

  1. Build a reproducible feature engineering pipeline for numeric, categorical, and temporal data.
  2. Train at least one baseline model and one improved model using engineered features.
  3. Explain which engineered features are most useful and why.
  4. Evaluate the model with appropriate classification metrics and threshold selection.
  5. Identify leakage risks and productionization considerations for the feature pipeline.
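
One possible shape for the first two deliverables is a single scikit-learn pipeline that bundles preprocessing with the classifier, so the same steps run in training and in batch scoring. The sketch below is an assumption-laden illustration, not a reference solution. Logistic regression is chosen here only because its coefficients are easy to review, the column subset is arbitrary, `hours_until_due` is a hypothetical engineered feature, and the data is synthetic.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

numeric_cols = ["estimated_hours", "team_load", "hours_until_due"]
categorical_cols = ["region", "priority"]

# Numeric branch: median-impute, append missing-value indicators, then scale.
numeric_pipe = Pipeline([
    ("impute", SimpleImputer(strategy="median", add_indicator=True)),
    ("scale", StandardScaler()),
])

preprocess = ColumnTransformer([
    ("num", numeric_pipe, numeric_cols),
    # Categories unseen during fit are encoded as all zeros at scoring time.
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
])

model = Pipeline([
    ("preprocess", preprocess),
    # class_weight="balanced" reweights classes for the 21/79 imbalance.
    ("clf", LogisticRegression(class_weight="balanced", max_iter=1000)),
])

# Synthetic stand-in for the ticket table; values are made up.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "estimated_hours": rng.uniform(1, 40, n),
    "team_load": rng.uniform(0, 1, n),
    "hours_until_due": rng.uniform(4, 120, n),
    "region": rng.choice(["na", "latam", "emea"], n),
    "priority": rng.choice(["low", "medium", "high"], n),
})
df.loc[rng.random(n) < 0.18, "estimated_hours"] = np.nan  # mimic 18% missing
y = ((df["team_load"] > 0.6) | (df["hours_until_due"] < 24)).astype(int)

# Fit on the earlier rows, score the later ones (stand-in for a time split).
model.fit(df.iloc[:150], y.iloc[:150])
proba = model.predict_proba(df.iloc[150:])[:, 1]
```

Because imputation statistics, scaling parameters, and category vocabularies are all learned inside `fit`, the fitted pipeline can be serialized and reused for daily batch scoring without recomputing anything from test data. The fitted coefficients are available via `model.named_steps["clf"].coef_` for review.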
