## Business Context
SnapCart, a grocery delivery app, uses an image classifier to identify products from shelf photos uploaded by store associates. The current teacher model is accurate but too large and slow for on-device inference, so the ML team wants a smaller student model trained with knowledge distillation.
## Dataset
The training set contains labeled product images collected from stores across 12 regions.
| Feature Group | Count | Examples |
|---|---|---|
| Image pixels | 1 input tensor | 224x224 RGB product images |
| Metadata | 3 | store_region, camera_type, lighting_bucket |
| Label | 1 target | product_category_id |
- Size: 1.2M training images, 120K validation images, 120K test images
- Classes: 250 product categories
- Class balance: Moderately imbalanced; the top 20 classes account for 58% of samples, while tail classes have fewer than 1,000 images each
- Missing data: 8% of metadata fields are missing; images include blur, glare, and partial occlusion
## Success Criteria
A good solution should preserve most of the teacher's accuracy while reducing model size and latency enough for mobile deployment. Targets: at least 92% absolute top-1 accuracy, at least 95% of the teacher's top-1 accuracy, and p95 inference latency under 25 ms on a mid-range device.
## Constraints
- Student model must be under 20 MB after export
- Inference runs on-device, so latency and memory matter more than absolute peak accuracy
- The solution should explain when distillation helps beyond standard supervised training
- The training budget covers one large offline teacher run plus weekly student retraining
## Deliverables
- Define knowledge distillation and explain the role of teacher logits, soft targets, and temperature.
- Build a student training pipeline that combines hard-label cross-entropy with a distillation loss (see the loss and training-step sketch after this list).
- Compare a distilled student against the same student trained only on ground-truth labels.
- Report top-1 accuracy, macro-F1, model size, and latency (see the metric-reporting sketch after this list).
- Describe production tradeoffs, including calibration, tail-class performance, and deployment constraints.
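To make the first two deliverables concrete, here is a minimal PyTorch sketch of the Hinton-style combined loss and a single training step. The function names, the temperature `T=4.0`, and the mixing weight `alpha=0.5` are illustrative assumptions, not part of the brief; both hyperparameters would need tuning on the validation set.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """Combined hard-label CE + soft-target KD loss (values of T and alpha are assumptions)."""
    # Hard-label term: standard cross-entropy against ground-truth categories.
    ce = F.cross_entropy(student_logits, labels)
    # Soft targets: teacher logits softened by temperature T. A higher T spreads
    # probability mass over more classes, exposing the teacher's learned
    # similarity structure between product categories.
    soft_targets = F.softmax(teacher_logits / T, dim=1)
    student_log_probs = F.log_softmax(student_logits / T, dim=1)
    # KL divergence between the softened distributions; the T^2 factor keeps
    # gradient magnitudes comparable as T changes.
    kd = F.kl_div(student_log_probs, soft_targets, reduction="batchmean") * (T * T)
    # alpha balances imitating the teacher against fitting the hard labels.
    return alpha * kd + (1.0 - alpha) * ce

def train_step(student, teacher, images, labels, optimizer, T=4.0, alpha=0.5):
    # The teacher is frozen: eval mode, no gradients.
    teacher.eval()
    with torch.no_grad():
        teacher_logits = teacher(images)
    student_logits = student(images)
    loss = distillation_loss(student_logits, teacher_logits, labels, T, alpha)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Setting `alpha=0` reduces `train_step` to plain supervised training, which yields the label-only baseline for the comparison deliverable without changing the pipeline.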
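For the reporting deliverable, one way the four headline metrics could be assembled is sketched below, assuming predictions and ground truth arrive as NumPy arrays and that `latencies_ms` holds per-image timings already collected on the target mid-range device. All names here are hypothetical.

```python
import os
import numpy as np
from sklearn.metrics import f1_score

def report_metrics(y_true, logits, model_path, latencies_ms):
    """Assemble the four reporting metrics from the deliverables list."""
    preds = logits.argmax(axis=1)
    top1 = float((preds == y_true).mean())
    # Macro-F1 averages per-class F1 over all 250 categories equally, so
    # tail-class regressions stay visible even when top-1 holds steady.
    macro_f1 = f1_score(y_true, preds, average="macro")
    # Exported model size on disk, in MB (checked against the 20 MB constraint).
    size_mb = os.path.getsize(model_path) / (1024 ** 2)
    # p95 of per-image latencies measured on-device, in ms (the 25 ms target).
    p95_ms = float(np.percentile(latencies_ms, 95))
    return {"top1": top1, "macro_f1": macro_f1, "size_mb": size_mb, "p95_ms": p95_ms}
```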