Business Context
You’re interviewing with VoltForge, a manufacturer of EV battery modules supplying two major automakers. A missed defect can trigger costly recalls and safety incidents, while too many false alarms slow the production line and increase scrap. VoltForge operates 12 assembly lines, producing ~1.8M modules/month. Each module is photographed at multiple stations; the goal is to flag surface defects (micro-cracks, contamination, misaligned welds) from images.
The challenge: VoltForge has millions of labeled images for an older product line (Gen-2), but only a small labeled dataset for the new line (Gen-3) due to a recent tooling change and a new camera sensor. You must propose and implement a transfer learning approach that achieves strong performance quickly, and explain why it’s better than training from scratch.
Dataset
You are given two datasets:
| Dataset | Size | Labeling | Classes | Notes |
|---|---|---|---|---|
| Gen-2 (source) | 4.2M images | High quality, mature labels | 5 defect types + “no defect” | Older lighting/camera, stable process |
| Gen-3 (target) | 38K images | Limited labels, some noise | Same label set | New sensor + different reflections |
Additional details:
- Image resolution: 256×256 RGB crops centered on weld seams.
- Target variable: Multi-class classification (6 classes).
- Class balance (Gen-3): Imbalanced — ~92% “no defect”; rarest defect ~0.3%.
- Missing/corrupt data: ~1–2% of image files are corrupted; ~8% are partially obscured by motion blur.
- Domain shift: Gen-3 images have different color temperature and specular highlights.
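One common way to counter the Gen-3 imbalance described above is to weight the loss by inverse class frequency. The sketch below uses the "balanced" heuristic, weight = N / (K × n_c). Only the ~92% and ~0.3% shares come from the brief; the other four counts and the generic class names (`defect_a` to `defect_e`) are hypothetical placeholders chosen so the total is 38,000.

```python
def inverse_frequency_weights(counts):
    """Return weight_c = N / (K * n_c) for each class c.

    Assumes every class has at least one labeled example.
    """
    total = sum(counts.values())
    num_classes = len(counts)
    return {name: total / (num_classes * n) for name, n in counts.items()}


# Hypothetical Gen-3 label counts (illustration only, not real data).
gen3_counts = {
    "no_defect": 34960,  # ~92% of 38,000, as stated in the brief
    "defect_a": 1200,    # assumed
    "defect_b": 900,     # assumed
    "defect_c": 526,     # assumed
    "defect_d": 300,     # assumed
    "defect_e": 114,     # ~0.3%, the rarest class per the brief
}

weights = inverse_frequency_weights(gen3_counts)
for name, w in weights.items():
    print(f"{name:10s} weight = {w:.2f}")
```

Weights like these could be passed as the `weight` tensor of `torch.nn.CrossEntropyLoss`. With a spread this wide (roughly 0.18 versus 55), clipping the weights or using a class-balanced sampler instead may be worth testing, since very aggressive weighting can raise the false positive rate.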
Success Criteria
Your solution is considered successful if on a held-out Gen-3 test set it achieves:
- Macro F1 ≥ 0.70 across the 6 classes (to avoid ignoring rare defects)
- Recall ≥ 0.85 on the union of defect classes (safety-driven)
- False positive rate ≤ 3% on “no defect” images, i.e. at most 3% of defect-free modules are flagged as defective (line throughput constraint)
- Inference latency ≤ 30 ms/image on an NVIDIA T4 (batch size 32)
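The three accuracy criteria above can be computed from predicted and true class indices. The following is a minimal sketch in plain Python under two stated assumptions: class index 0 is “no defect”, and a defective image counts as recalled if it is assigned any defect class (one reading of "union of defect classes"). The function name `gen3_metrics` and the toy labels are illustrative only.

```python
def gen3_metrics(y_true, y_pred, num_classes=6, no_defect=0):
    """Macro F1, union-of-defects recall, and FPR on no-defect images."""
    f1_scores = []
    for c in range(num_classes):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # A class that is never present and never predicted scores 0 here.
        f1_scores.append(2 * tp / denom if denom else 0.0)
    macro_f1 = sum(f1_scores) / num_classes

    # Predictions for truly defective and truly defect-free images.
    defect_preds = [p for t, p in zip(y_true, y_pred) if t != no_defect]
    clean_preds = [p for t, p in zip(y_true, y_pred) if t == no_defect]

    # Assumes the evaluation set contains both defective and clean images.
    defect_recall = sum(p != no_defect for p in defect_preds) / len(defect_preds)
    false_positive_rate = sum(p != no_defect for p in clean_preds) / len(clean_preds)
    return {
        "macro_f1": macro_f1,
        "defect_recall": defect_recall,
        "false_positive_rate": false_positive_rate,
    }


# Toy example with 3 classes (0 = no defect), for illustration only.
y_true = [0, 0, 0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 0, 1, 1, 2, 2, 0]
metrics = gen3_metrics(y_true, y_pred, num_classes=3)
print(metrics)
```

Note that under this reading, confusing one defect type with another still counts toward defect recall but lowers macro F1, so the two criteria capture different failure modes.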
Constraints
- You have one 16GB GPU for training and a 48-hour experimentation window.
- Model must be deployable in an on-prem environment; avoid overly heavy architectures.
- You must be able to explain to manufacturing engineers what transfer learning is doing (high-level interpretability), and how you’ll monitor drift.
- You cannot use Gen-3 test labels for any tuning.
Deliverables
- Explain transfer learning in this context (what is transferred, why it helps) and compare:
  - training from scratch on Gen-3
  - using a pretrained ImageNet backbone
  - pretraining on Gen-2, then fine-tuning on Gen-3
- Propose a training plan (freezing strategy, learning rates, augmentation, class imbalance handling).
- Provide a PyTorch implementation that trains a transfer learning model and reports the required metrics.
- Describe an evaluation protocol (splits/CV, thresholding, calibration if needed) that avoids leakage.
- Outline a production plan: deployment format, monitoring for domain shift, and retraining triggers.
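For the thresholding part of the evaluation protocol, one option is to pick the decision threshold on a Gen-3 validation split only, so the test labels are never used for tuning. The sketch below assumes each image has a scalar defect score (for example, 1 minus the predicted “no defect” probability) and chooses the lowest threshold whose validation false positive rate stays within the budget, which maximizes recall under that constraint. The function name `pick_threshold` and the toy scores are hypothetical.

```python
def pick_threshold(scores, is_defect, max_fpr=0.03):
    """Lowest threshold with validation FPR <= max_fpr.

    An image is flagged as defective when score >= threshold.
    Returns (threshold, fpr, recall), or None if no candidate qualifies.
    """
    negatives = [s for s, d in zip(scores, is_defect) if not d]
    positives = [s for s, d in zip(scores, is_defect) if d]
    if not negatives or not positives:
        raise ValueError("validation split needs both defect and no-defect images")

    # FPR is non-increasing as the threshold rises, so the first
    # qualifying candidate is also the one with the highest recall.
    for t in sorted(set(scores)):
        fpr = sum(s >= t for s in negatives) / len(negatives)
        if fpr <= max_fpr:
            recall = sum(s >= t for s in positives) / len(positives)
            return t, fpr, recall
    return None


# Toy validation scores (illustration only); a looser budget of 0.2 is
# used because the example has just five no-defect images.
val_scores = [0.01, 0.02, 0.05, 0.10, 0.40, 0.30, 0.80, 0.90]
val_is_defect = [False, False, False, False, False, True, True, True]
result = pick_threshold(val_scores, val_is_defect, max_fpr=0.2)
print(result)
```

The chosen threshold would then be frozen and applied once to the held-out Gen-3 test set to report the final metrics.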