## Business Context
ShopSight trains image classification models to detect catalog quality issues before products go live. The ML team needs a repeatable way to choose between CPU, single-GPU, multi-GPU, and TPU training environments for different model sizes and delivery deadlines.
## Dataset
You are given a product-image classification dataset used for offline training experiments.
| Feature Group | Count | Examples |
|---|---|---|
| Image inputs | 1 | 224x224 RGB product photos |
| Labels | 1 | ok, blurry, wrong_background, watermark, duplicate |
| Metadata | 6 | category, seller_tier, image_source, upload_region, width, height |
| Training logs | 8 | step_time_ms, gpu_utilization, memory_gb, throughput_img_s, epoch_time_min |
- Size: 1.2M images, 5 classes, average compressed image size 180 KB
- Target: Multiclass classification of catalog image quality issues
- Class balance: moderately imbalanced (62% ok; remaining 38% split across the 4 defect classes)
- Missing data: ~3% missing metadata fields; no missing labels
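The dataset bullets above translate into concrete storage and input-pipeline numbers. A back-of-envelope sketch (the 30-minute epoch budget is an illustrative assumption, not a stated requirement):

```python
# Sizing arithmetic from the dataset description: 1.2M images, ~180 KB each.
NUM_IMAGES = 1_200_000
AVG_BYTES = 180 * 1024                      # 180 KB per compressed image

total_gb = NUM_IMAGES * AVG_BYTES / 1024**3
print(f"On-disk size: ~{total_gb:.0f} GB")  # roughly 200+ GB

# If one epoch had to finish in, say, 30 minutes (hypothetical budget),
# the input pipeline would need to sustain this many images per second:
epoch_budget_s = 30 * 60
required_img_s = NUM_IMAGES / epoch_budget_s
print(f"Required throughput: ~{required_img_s:.0f} img/s")
```

Numbers like these are what make the input pipeline a first-class factor in compute selection: an accelerator that the data loader cannot feed sits idle.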
## Success Criteria
A good solution should recommend compute resources for at least three training scenarios (baseline CNN, ResNet-50 fine-tuning, larger ViT-style model) and justify the choice using measurable tradeoffs:
- Validation macro-F1 of at least 0.84 for the production candidate
- End-to-end training time under 8 hours for the selected production setup
- Estimated infrastructure cost per full training run under $250
- Inference artifact must be deployable to a GPU-backed batch scoring job
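The time and cost criteria above can be checked mechanically for any candidate setup. A minimal sketch, where the instance price and runtime are illustrative assumptions rather than quotes:

```python
# Check one candidate training setup against the success criteria:
# <= 8 hours end-to-end and <= $250 per full training run.

def run_feasible(train_hours, hourly_usd, max_hours=8.0, max_cost=250.0):
    """Return (fits_time, fits_cost, total_cost) for one full training run."""
    cost = train_hours * hourly_usd
    return train_hours <= max_hours, cost <= max_cost, cost

# Example: a single-GPU instance at an assumed $3.10/hr finishing in 6.5 hours.
ok_time, ok_cost, cost = run_feasible(6.5, 3.10)
print(ok_time, ok_cost, round(cost, 2))
```

Running the same check across CPU, single-GPU, multi-GPU, and TPU candidates gives a quick first-pass filter before any quality comparison.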
## Constraints
- Budget is limited; overprovisioning expensive accelerators is discouraged
- Model retraining happens weekly, so turnaround time matters
- The team needs a decision framework, not just the highest-performing model
- Candidate should consider memory limits, mixed precision, distributed training overhead, and engineering complexity
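A decision framework of the kind asked for can start as a simple mapping from model size and turnaround deadline to a compute tier. The thresholds below are illustrative assumptions for this workload, not prescribed values:

```python
# Sketch of a compute-selection rule: prefer the cheapest tier that can
# plausibly meet the weekly retraining deadline. Thresholds are assumptions.

def pick_compute(params_m: float, deadline_h: float) -> str:
    """Map model size (millions of parameters) and deadline (hours)
    to a compute tier."""
    if params_m < 5 and deadline_h > 24:
        return "cpu"                 # tiny model, relaxed deadline
    if params_m < 100:
        return "single_gpu"          # e.g. baseline CNN or ResNet-50 fine-tune
    if deadline_h >= 8:
        return "multi_gpu"           # data-parallel within one node
    return "tpu_or_multi_node"       # largest ViT-style models on tight deadlines

print(pick_compute(25, 12))          # ResNet-50-scale fine-tune
print(pick_compute(300, 6))          # large ViT under a tight deadline
```

A real framework would add memory-fit checks, mixed-precision savings, and distributed-training overhead as further inputs, but the shape stays the same: constraints in, tier out.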
## Deliverables
- Build a benchmark pipeline comparing CPU, GPU, and accelerator-oriented training configurations.
- Train at least two model families and measure throughput, training time, validation quality, and estimated cost.
- Recommend when to use CPU, single GPU, multi-GPU, or TPU for this workload.
- Explain how dataset size, batch size, model architecture, and input pipeline affect compute selection.
- Propose a production training setup and a lighter fallback option.
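The benchmark pipeline's bookkeeping can be sketched independently of any framework: turn measured step times into the throughput, training-time, and cost figures the comparison needs. Step times and the hourly price here are placeholders, not measurements:

```python
# Convert raw per-step timings from a benchmark run into the metrics
# the comparison calls for: img/s, wall-clock hours, and estimated cost.

def summarize(step_times_s, batch_size, steps_per_epoch, epochs, hourly_usd):
    avg_step = sum(step_times_s) / len(step_times_s)
    throughput = batch_size / avg_step                      # images per second
    train_hours = avg_step * steps_per_epoch * epochs / 3600
    return {"img_per_s": round(throughput, 1),
            "train_hours": round(train_hours, 2),
            "cost_usd": round(train_hours * hourly_usd, 2)}

# Example: 0.25 s steps, batch 256, 4,700 steps/epoch, 10 epochs,
# at an assumed $3.10/hr instance price.
print(summarize([0.25] * 20, 256, 4700, 10, 3.10))
```

Emitting one such summary per configuration (CPU, single GPU, multi-GPU, TPU) for each model family yields the comparison table the recommendation should rest on.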