## Business Context
ShopSight trains image classification models to detect catalog quality issues before products go live. The ML team needs a repeatable way to choose between CPU, single-GPU, multi-GPU, and TPU training environments for different model sizes and delivery deadlines.
## Dataset
You are given a product-image classification dataset used for offline training experiments.
| Feature Group | Count | Examples |
|---|---|---|
| Image inputs | 1 | 224x224 RGB product photos |
| Labels | 1 | ok, blurry, wrong_background, watermark, duplicate |
| Metadata | 6 | category, seller_tier, image_source, upload_region, width, height |
| Training logs | 8 | step_time_ms, gpu_utilization, memory_gb, throughput_img_s, epoch_time_min |
- Size: 1.2M images, 5 classes, average compressed image size 180 KB
- Target: Multiclass classification of catalog image quality issues
- Class balance: moderately imbalanced (62% ok; remaining 38% split across the 4 defect classes)
- Missing data: ~3% missing metadata fields; no missing labels
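The dataset bullets above translate into concrete storage and input-pipeline numbers. A back-of-envelope sketch (the 30-minute epoch budget is an illustrative assumption, not a stated requirement):

```python
# Sizing arithmetic from the dataset description: 1.2M images, ~180 KB each.
NUM_IMAGES = 1_200_000
AVG_BYTES = 180 * 1024                      # 180 KB per compressed image

total_gb = NUM_IMAGES * AVG_BYTES / 1024**3
print(f"On-disk size: ~{total_gb:.0f} GB")  # roughly 200+ GB

# If one epoch had to finish in, say, 30 minutes (hypothetical budget),
# the input pipeline would need to sustain this many images per second:
epoch_budget_s = 30 * 60
required_img_s = NUM_IMAGES / epoch_budget_s
print(f"Required throughput: ~{required_img_s:.0f} img/s")
```

Numbers like these are what make the input pipeline a first-class factor in compute selection: an accelerator that the data loader cannot feed sits idle.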
## Success Criteria
A good solution should recommend compute resources for at least three training scenarios (baseline CNN, ResNet-50 fine-tuning, larger ViT-style model) and justify the choice using measurable tradeoffs:
- Validation macro-F1 of at least 0.84 for the production candidate
- End-to-end training time under 8 hours for the selected production setup
- Estimated infrastructure cost per full training run under $250
- Inference artifact must be deployable to a GPU-backed batch scoring job
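The time and cost criteria above can be checked mechanically for any candidate setup. A minimal sketch, where the instance price and runtime are illustrative assumptions rather than quotes:

```python
# Check one candidate training setup against the success criteria:
# <= 8 hours end-to-end and <= $250 per full training run.

def run_feasible(train_hours, hourly_usd, max_hours=8.0, max_cost=250.0):
    """Return (fits_time, fits_cost, total_cost) for one full training run."""
    cost = train_hours * hourly_usd
    return train_hours <= max_hours, cost <= max_cost, cost

# Example: a single-GPU instance at an assumed $3.10/hr finishing in 6.5 hours.
ok_time, ok_cost, cost = run_feasible(6.5, 3.10)
print(ok_time, ok_cost, round(cost, 2))
```

Running the same check across CPU, single-GPU, multi-GPU, and TPU candidates gives a quick first-pass filter before any quality comparison.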
## Constraints
- Budget is limited; overprovisioning expensive accelerators is discouraged
- Model retraining happens weekly, so turnaround time matters
- The team needs a decision framework, not just the highest-performing model
- Candidate should consider memory limits, mixed precision, distributed training overhead, and engineering complexity
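A decision framework of the kind asked for can start as a simple mapping from model size and turnaround deadline to a compute tier. The thresholds below are illustrative assumptions for this workload, not prescribed values:

```python
# Sketch of a compute-selection rule: prefer the cheapest tier that can
# plausibly meet the weekly retraining deadline. Thresholds are assumptions.

def pick_compute(params_m: float, deadline_h: float) -> str:
    """Map model size (millions of parameters) and deadline (hours)
    to a compute tier."""
    if params_m < 5 and deadline_h > 24:
        return "cpu"                 # tiny model, relaxed deadline
    if params_m < 100:
        return "single_gpu"          # e.g. baseline CNN or ResNet-50 fine-tune
    if deadline_h >= 8:
        return "multi_gpu"           # data-parallel within one node
    return "tpu_or_multi_node"       # largest ViT-style models on tight deadlines

print(pick_compute(25, 12))          # ResNet-50-scale fine-tune
print(pick_compute(300, 6))          # large ViT under a tight deadline
```

A real framework would add memory-fit checks, mixed-precision savings, and distributed-training overhead as further inputs, but the shape stays the same: constraints in, tier out.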
## Deliverables
- Build a benchmark pipeline comparing CPU, GPU, and accelerator-oriented training configurations.
- Train at least two model families and measure throughput, training time, validation quality, and estimated cost.
- Recommend when to use CPU, single GPU, multi-GPU, or TPU for this workload.
- Explain how dataset size, batch size, model architecture, and input pipeline affect compute selection.
- Propose a production training setup and a lighter fallback option.
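The benchmark pipeline's bookkeeping can be sketched independently of any framework: turn measured step times into the throughput, training-time, and cost figures the comparison needs. Step times and the hourly price here are placeholders, not measurements:

```python
# Convert raw per-step timings from a benchmark run into the metrics
# the comparison calls for: img/s, wall-clock hours, and estimated cost.

def summarize(step_times_s, batch_size, steps_per_epoch, epochs, hourly_usd):
    avg_step = sum(step_times_s) / len(step_times_s)
    throughput = batch_size / avg_step                      # images per second
    train_hours = avg_step * steps_per_epoch * epochs / 3600
    return {"img_per_s": round(throughput, 1),
            "train_hours": round(train_hours, 2),
            "cost_usd": round(train_hours * hourly_usd, 2)}

# Example: 0.25 s steps, batch 256, 4,700 steps/epoch, 10 epochs,
# at an assumed $3.10/hr instance price.
print(summarize([0.25] * 20, 256, 4700, 10, 3.10))
```

Emitting one such summary per configuration (CPU, single GPU, multi-GPU, TPU) for each model family yields the comparison table the recommendation should rest on.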