## Business Context
FitVision trains a neural network to predict daily calorie burn from wearable sensor data for 8 million active users. The ML team wants to standardize optimizer selection and needs a clear, empirical comparison of Adam vs SGD under realistic training constraints.
## Dataset
You are given a supervised regression dataset built from 12 months of wearable telemetry and profile features.
| Feature Group | Count | Examples |
|---|---|---|
| Continuous sensor aggregates | 18 | avg_heart_rate, resting_hr, steps_24h, active_minutes |
| User profile | 6 | age, weight_kg, height_cm, sex, device_type |
| Sleep and recovery | 7 | sleep_hours, sleep_efficiency, hrv_score |
| Temporal/context | 5 | day_of_week, weekend_flag, workout_count_7d |
- Size: 240K user-day records, 36 input features
- Target: Daily calorie burn (continuous)
- Data quality: 6% missing in recovery metrics, 2% missing in sensor aggregates due to sync failures
- Outliers: Some extreme workout days and device glitches
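The missing recovery/sensor values and device-glitch outliers above need handling before training. A minimal numeric-only sketch of one reasonable approach (median imputation plus percentile clipping): the `clip_pct` bounds are an assumed choice, not specified in the brief, and categorical columns such as `sex` and `device_type` would be encoded separately.

```python
import numpy as np

def impute_and_clip(X, clip_pct=(1.0, 99.0)):
    """Median-impute NaN entries per column, then clip to percentile bounds.

    Sketch only: clip_pct is an assumed default; tune against validation RMSE.
    """
    X = np.asarray(X, dtype=float).copy()
    for j in range(X.shape[1]):
        col = X[:, j]                      # view into X, edits write through
        col[np.isnan(col)] = np.nanmedian(col)
        lo, hi = np.percentile(col, clip_pct)
        X[:, j] = np.clip(col, lo, hi)     # tame glitch spikes / extreme days
    return X
```

Fitting the medians and clip bounds on the training split only (and reapplying them to validation/test) avoids leakage when this is wired into the pipeline.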
## Success Criteria
A strong solution should:
- Compare Adam and SGD on the same model architecture and preprocessing pipeline
- Reach test RMSE < 115 kcal and show a justified optimizer choice
- Report convergence speed, final validation performance, and training stability
- Explain when Adam is preferable and when SGD may generalize better
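To make the comparison concrete, the two update rules can be written out directly. The sketch below implements plain SGD and Adam in numpy and runs both on a toy 1-D quadratic; the learning rates and step counts are illustrative choices, not tuned recommendations for the real model.

```python
import numpy as np

def sgd_step(w, g, lr=0.1):
    """Vanilla SGD: a constant-size step in the negative-gradient direction."""
    return w - lr * g

def adam_step(w, g, state, lr=0.05, b1=0.9, b2=0.999, eps=1e-8):
    """Adam: bias-corrected first/second moment estimates scale each step."""
    m, v, t = state
    t += 1
    m = b1 * m + (1 - b1) * g          # EMA of gradients (momentum)
    v = b2 * v + (1 - b2) * g * g      # EMA of squared gradients
    m_hat = m / (1 - b1 ** t)          # bias correction for early steps
    v_hat = v / (1 - b2 ** t)
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), (m, v, t)

# Toy comparison on f(w) = (w - 3)^2, gradient 2(w - 3).
w_sgd = 0.0
for _ in range(100):
    w_sgd = sgd_step(w_sgd, 2.0 * (w_sgd - 3.0))

w_adam, state = 0.0, (0.0, 0.0, 0)
for _ in range(500):
    w_adam, state = adam_step(w_adam, 2.0 * (w_adam - 3.0), state)
```

Because Adam normalizes the step by the recent gradient magnitude, it is far less sensitive to the raw learning-rate choice than SGD, which is the main sensitivity tradeoff the report should quantify.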
## Constraints
- Training must finish within 30 minutes on a single CPU or modest GPU
- The final model will be retrained weekly
- Inference latency must stay under 20 ms per example
- The approach should be simple enough for another engineer to maintain
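The 20 ms inference budget is easy to verify with a small timing harness. A hypothetical sketch, assuming a `predict_fn` that scores one example at a time (name and interface are illustrative, not from the brief):

```python
import time
import statistics

def per_example_latency_ms(predict_fn, examples, warmup=10):
    """Median wall-clock latency per single-example call, in milliseconds.

    Hypothetical harness: predict_fn/examples are assumed interfaces.
    Median is used so a few slow outlier calls don't dominate the estimate.
    """
    for x in examples[:warmup]:          # warm caches / JIT before timing
        predict_fn(x)
    times = []
    for x in examples:
        t0 = time.perf_counter()
        predict_fn(x)
        times.append((time.perf_counter() - t0) * 1000.0)
    return statistics.median(times)
```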
## Deliverables
- Build a neural-network regression pipeline with identical architecture for both optimizers
- Implement preprocessing for missing values, categorical encoding, and feature scaling
- Train and compare Adam vs SGD using the same train/validation/test split
- Evaluate with RMSE, MAE, and convergence behavior across epochs
- Recommend one optimizer for production and explain the tradeoffs in learning rate sensitivity, convergence speed, and generalization
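The evaluation metrics named above are simple enough to pin down in code; a minimal numpy sketch of RMSE and MAE (in kcal, matching the target units):

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error: penalizes large misses quadratically."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def mae(y_true, y_pred):
    """Mean absolute error: more robust to the extreme-workout outlier days."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs(y_true - y_pred)))
```

Reporting both per epoch, for each optimizer on the shared validation split, yields the convergence curves the comparison calls for.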