## Business Context
SnapCart, a grocery delivery app, uses an image classifier to identify products from shelf photos uploaded by store associates. The current teacher model is accurate but too large and slow for on-device inference, so the ML team wants a smaller student model trained with knowledge distillation.
## Dataset
The training set contains labeled product images collected from stores across 12 regions.
| Feature Group | Count | Examples |
|---|---|---|
| Image pixels | 1 input tensor | 224x224 RGB product images |
| Metadata | 3 | store_region, camera_type, lighting_bucket |
| Label | 1 target | product_category_id |
- Size: 1.2M training images, 120K validation images, 120K test images
- Classes: 250 product categories
- Class balance: Moderately imbalanced; the top 20 classes account for 58% of samples, while tail classes have fewer than 1,000 images each
- Missing data: 8% of metadata fields are missing; images include blur, glare, and partial occlusion
## Success Criteria
A good solution should preserve most of the teacher's accuracy while reducing model size and latency enough for mobile deployment. Targets: at least 92% absolute top-1 accuracy, at least 95% of the teacher's top-1 accuracy, and p95 inference latency under 25 ms on a mid-range device.
## Constraints
- Student model must be under 20 MB after export
- Inference runs on-device, so latency and memory matter more than absolute peak accuracy
- The solution should explain when distillation helps beyond standard supervised training
- The training budget covers one large offline teacher run plus weekly student retraining
## Deliverables
- Define knowledge distillation and explain the role of teacher logits, soft targets, and temperature.
- Build a student training pipeline that combines hard-label cross-entropy with a distillation loss (see the loss and training-step sketch after this list).
- Compare a distilled student against the same student trained only on ground-truth labels.
- Report top-1 accuracy, macro-F1, model size, and latency (see the metric-reporting sketch after this list).
- Describe production tradeoffs, including calibration, tail-class performance, and deployment constraints.
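To make the first two deliverables concrete, here is a minimal PyTorch sketch of the Hinton-style combined loss and a single training step. The function names, the temperature `T=4.0`, and the mixing weight `alpha=0.5` are illustrative assumptions, not part of the brief; both hyperparameters would need tuning on the validation set.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """Combined hard-label CE + soft-target KD loss (values of T and alpha are assumptions)."""
    # Hard-label term: standard cross-entropy against ground-truth categories.
    ce = F.cross_entropy(student_logits, labels)
    # Soft targets: teacher logits softened by temperature T. A higher T spreads
    # probability mass over more classes, exposing the teacher's learned
    # similarity structure between product categories.
    soft_targets = F.softmax(teacher_logits / T, dim=1)
    student_log_probs = F.log_softmax(student_logits / T, dim=1)
    # KL divergence between the softened distributions; the T^2 factor keeps
    # gradient magnitudes comparable as T changes.
    kd = F.kl_div(student_log_probs, soft_targets, reduction="batchmean") * (T * T)
    # alpha balances imitating the teacher against fitting the hard labels.
    return alpha * kd + (1.0 - alpha) * ce

def train_step(student, teacher, images, labels, optimizer, T=4.0, alpha=0.5):
    # The teacher is frozen: eval mode, no gradients.
    teacher.eval()
    with torch.no_grad():
        teacher_logits = teacher(images)
    student_logits = student(images)
    loss = distillation_loss(student_logits, teacher_logits, labels, T, alpha)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Setting `alpha=0` reduces `train_step` to plain supervised training, which yields the label-only baseline for the comparison deliverable without changing the pipeline.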
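For the reporting deliverable, one way the four headline metrics could be assembled is sketched below, assuming predictions and ground truth arrive as NumPy arrays and that `latencies_ms` holds per-image timings already collected on the target mid-range device. All names here are hypothetical.

```python
import os
import numpy as np
from sklearn.metrics import f1_score

def report_metrics(y_true, logits, model_path, latencies_ms):
    """Assemble the four reporting metrics from the deliverables list."""
    preds = logits.argmax(axis=1)
    top1 = float((preds == y_true).mean())
    # Macro-F1 averages per-class F1 over all 250 categories equally, so
    # tail-class regressions stay visible even when top-1 holds steady.
    macro_f1 = f1_score(y_true, preds, average="macro")
    # Exported model size on disk, in MB (checked against the 20 MB constraint).
    size_mb = os.path.getsize(model_path) / (1024 ** 2)
    # p95 of per-image latencies measured on-device, in ms (the 25 ms target).
    p95_ms = float(np.percentile(latencies_ms, 95))
    return {"top1": top1, "macro_f1": macro_f1, "size_mb": size_mb, "p95_ms": p95_ms}
```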