Business Context
ShopVision, an e-commerce catalog platform, wants a junior ML engineer to explain and prototype a convolutional neural network (CNN) that classifies product images into catalog categories. The goal is not just to define CNN layers but to show how the architecture maps to a real image classification workflow and how you would evaluate it in practice.
Dataset
You are given a labeled image dataset of product photos collected from the mobile seller app.
| Feature Group | Count | Examples |
|---|---|---|
| Image pixels | 150,000 images | RGB images resized to 128x128 |
| Metadata (optional) | 3 fields | upload_device, aspect_ratio, brightness_score |
| Labels | 12 classes | shoes, bags, watches, shirts, electronics |
- Size: 150K images, 12 target classes
- Target: Multiclass product category
- Class balance: Moderately imbalanced; largest class is 18%, smallest is 4%
- Data quality: ~2% of images are corrupted or unreadable; some images have inconsistent lighting or backgrounds
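One common way to handle the moderate class imbalance noted above is to weight the loss by inverse class frequency. The sketch below is an assumption about how that could be done, not something the brief requires. The helper name `inverse_frequency_weight` is hypothetical; only the 18% and 4% class shares and the 12-class count come from the dataset description.

```python
def inverse_frequency_weight(class_share: float, n_classes: int) -> float:
    """Return the loss weight w_c = 1 / (K * share_c) for one class.

    With this weighting, share_c * w_c equals 1/K for every class, so a class
    holding exactly 1/K of the data gets weight 1.0.
    """
    if not 0.0 < class_share <= 1.0:
        raise ValueError("class_share must be in (0, 1]")
    return 1.0 / (n_classes * class_share)


N_CLASSES = 12  # from the dataset description

largest = inverse_frequency_weight(0.18, N_CLASSES)   # largest class: 18% of images
smallest = inverse_frequency_weight(0.04, N_CLASSES)  # smallest class: 4% of images

print(f"largest-class weight:  {largest:.3f}")
print(f"smallest-class weight: {smallest:.3f}")
print(f"smallest / largest:    {smallest / largest:.2f}")
```

Under these shares the smallest class would be weighted about 4.5 times more heavily than the largest. A weighted loss is only one option; oversampling rare classes, or leaving the loss unweighted and monitoring per-class recall, are reasonable alternatives for an imbalance this mild.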
Success Criteria
A good solution should:
- Achieve top-1 accuracy >= 82% on a held-out test set
- Clearly explain the role of convolution, activation, pooling, and fully connected layers
- Show a training pipeline that can run on a single GPU within a reasonable time budget
- Include overfitting controls and a clear evaluation approach
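To make the layer roles concrete, it can help to trace tensor shapes and parameter counts through a small network. The sketch below does that in plain Python for a hypothetical baseline: three blocks of 3x3 convolution (padding 1), ReLU, and 2x2 max pooling with 32, 64, and 128 filters, followed by one fully connected layer. That architecture is an assumption chosen for illustration; only the 128x128 RGB input and the 12 classes come from the brief.

```python
def conv2d_out(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Spatial output size of a convolution or pooling window along one axis."""
    return (size + 2 * padding - kernel) // stride + 1


def conv2d_params(in_ch: int, out_ch: int, kernel: int) -> int:
    """Each filter has in_ch * kernel * kernel weights plus one bias."""
    return out_ch * (in_ch * kernel * kernel + 1)


def linear_params(in_features: int, out_features: int) -> int:
    """Fully connected layer: one weight per input per output, plus biases."""
    return out_features * (in_features + 1)


size, channels = 128, 3  # 128x128 RGB input, as in the dataset description
total_params = 0

for out_ch in (32, 64, 128):  # hypothetical filter counts
    # Convolution: learns local feature detectors with weights shared across positions.
    size = conv2d_out(size, kernel=3, stride=1, padding=1)  # padding 1 keeps H and W
    total_params += conv2d_params(channels, out_ch, kernel=3)
    # ReLU activation: adds non-linearity; no parameters and no shape change.
    # Max pooling: downsamples by 2 in each dimension; no parameters.
    size = conv2d_out(size, kernel=2, stride=2)
    channels = out_ch
    print(f"after block with {out_ch} filters: {channels} x {size} x {size}")

# Fully connected layer: maps the flattened feature map to 12 class scores.
flat = channels * size * size
total_params += linear_params(flat, 12)

print(f"flattened features: {flat}")
print(f"total parameters:   {total_params}")
```

In this sketch the single fully connected layer holds most of the parameters (393,228 of 486,476), which is one reason small CNNs often add more pooling or global average pooling before the classifier.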
Constraints
- Inference latency should stay under 30 ms/image in batch serving
- The model should be simple enough for an interview explanation
- Training budget is limited to a single mid-range GPU
- The team prefers an architecture that is easy to debug and extend
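The 30 ms/image latency constraint can be checked with a small timing harness. The sketch below is a minimal, framework-agnostic version: `per_image_latency_ms` and `dummy_predict` are hypothetical names, and the dummy function is only a stand-in for a real batched inference call. The batch size of 64 is likewise an assumption.

```python
import statistics
import time


def per_image_latency_ms(predict_batch, batch, n_runs=20, warmup=3):
    """Median wall-clock milliseconds per image for one batched call."""
    for _ in range(warmup):
        predict_batch(batch)  # warm-up calls are not timed (caches, lazy init)
    timings = []
    for _ in range(n_runs):
        start = time.perf_counter()
        predict_batch(batch)
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        timings.append(elapsed_ms / len(batch))
    return statistics.median(timings)


def dummy_predict(batch):
    """Stand-in for a real model; returns class 0 for every image."""
    return [0 for _ in batch]


LATENCY_BUDGET_MS = 30.0  # from the constraints

latency = per_image_latency_ms(dummy_predict, batch=list(range(64)))
print(f"{latency:.4f} ms/image, within budget: {latency < LATENCY_BUDGET_MS}")
```

If the real model runs on a GPU with asynchronous execution, the callable passed in should block until the results are ready, otherwise the timer stops before the work has finished and the measurement is too optimistic.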
Deliverables
- Explain the core CNN architecture and why each layer is used.
- Build a baseline CNN for multiclass image classification.
- Describe preprocessing and augmentation choices.
- Evaluate the model with appropriate classification metrics.
- Discuss tradeoffs between a simple custom CNN and transfer learning.
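Because the classes are imbalanced, top-1 accuracy alone can hide weak performance on small categories, so per-class recall and macro-averaged F1 are worth reporting alongside it. The sketch below computes all three from a confusion matrix in plain Python. The function name `summarize_confusion` is hypothetical, and the 3-class matrix is toy data invented for illustration, not a result from the ShopVision dataset.

```python
def summarize_confusion(confusion):
    """Compute top-1 accuracy, per-class recall, and macro F1.

    confusion[i][j] is the number of samples whose true class is i
    and whose predicted class is j.
    """
    k = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(k))

    recalls, f1_scores = [], []
    for c in range(k):
        true_pos = confusion[c][c]
        actual = sum(confusion[c])                           # row sum: true members of c
        predicted = sum(confusion[r][c] for r in range(k))   # column sum: predicted as c
        recall = true_pos / actual if actual else 0.0
        precision = true_pos / predicted if predicted else 0.0
        denom = precision + recall
        f1_scores.append(2 * precision * recall / denom if denom else 0.0)
        recalls.append(recall)

    return {
        "top1_accuracy": correct / total,
        "per_class_recall": recalls,
        "macro_f1": sum(f1_scores) / k,  # unweighted mean: every class counts equally
    }


# Toy 3-class confusion matrix (illustrative values only).
toy_confusion = [
    [8, 1, 1],
    [2, 6, 2],
    [0, 1, 4],
]

report = summarize_confusion(toy_confusion)
print(f"top-1 accuracy: {report['top1_accuracy']:.3f}")
print("per-class recall:", [round(r, 3) for r in report["per_class_recall"]])
print(f"macro F1: {report['macro_f1']:.3f}")
```

The same function applies unchanged to a 12x12 matrix from the held-out test set, where the top-1 accuracy would be compared against the 82% target.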