Business Context
SnapArt, a startup specializing in AI-generated art, has developed a deep learning model to classify images into various artistic styles. However, the model overfits because the dataset is limited to 5,000 labeled images. The product team aims to improve classification accuracy by using data augmentation to expand the effective training set and improve generalization.
Dataset
| Feature Group | Count | Examples |
|---|---|---|
| Image Data | 5,000 | artistic_style_1.jpg, artistic_style_2.jpg |
- Size: 5,000 images, each 256x256 pixels
- Target: Categorical — artistic style (e.g., Impressionism, Cubism, Abstract)
- Class balance: Approximately balanced across 5 classes (1,000 images per class)
- Missing data: No missing labels, but images may have varying quality
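Because the classes are roughly balanced, a stratified hold-out keeps the same balance in the validation set. The following is a minimal sketch, assuming the 1,000-image validation set is drawn from the 5,000 labeled images (200 per class); the brief does not say where the validation images come from, and the function name `stratified_split` and the integer labels are hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(labels, val_per_class, seed=0):
    """Return (train_idx, val_idx), holding out val_per_class items of each class."""
    rng = random.Random(seed)

    # Group sample indices by their class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, val_idx = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        val_idx.extend(indices[:val_per_class])
        train_idx.extend(indices[val_per_class:])
    return train_idx, val_idx


# Assumption: 5 classes with 1,000 images each, encoded as integers 0..4.
labels = [i % 5 for i in range(5000)]
train_idx, val_idx = stratified_split(labels, val_per_class=200)
```

Augmentation would then be applied to the training indices only, so that validation accuracy is measured on unmodified images.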
Requirements
- Implement at least three different data augmentation techniques (e.g., rotation, flipping, color jitter).
- Train a convolutional neural network (CNN) on the augmented dataset and evaluate its performance.
- Provide a comparison of model performance with and without data augmentation.
- Discuss how the augmentation techniques affect model training time and accuracy.
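The three example techniques named in the requirements can be sketched with NumPy alone. This is an illustration under stated assumptions, not a prescribed pipeline: images are assumed to be float arrays of shape (256, 256, 3) with values in [0, 1], rotation is simplified to multiples of 90 degrees (arbitrary angles would need an image library), and the function names and jitter ranges are hypothetical.

```python
import numpy as np


def random_flip(image, rng):
    """Mirror the image left-to-right with probability 0.5."""
    return image[:, ::-1, :] if rng.random() < 0.5 else image


def random_rot90(image, rng):
    """Rotate by a random multiple of 90 degrees; square images keep their shape."""
    k = int(rng.integers(0, 4))
    return np.rot90(image, k=k, axes=(0, 1))


def color_jitter(image, rng, brightness=0.2, channel_shift=0.05):
    """Scale overall brightness, shift each channel slightly, then clip to [0, 1]."""
    scale = 1.0 + rng.uniform(-brightness, brightness)
    shift = rng.uniform(-channel_shift, channel_shift, size=(1, 1, image.shape[2]))
    return np.clip(image * scale + shift, 0.0, 1.0)


def augment(image, rng):
    """Apply the three augmentations in sequence to a single image."""
    image = random_flip(image, rng)
    image = random_rot90(image, rng)
    return color_jitter(image, rng)


rng = np.random.default_rng(0)
image = rng.random((256, 256, 3))  # synthetic stand-in for one 256x256 RGB image
augmented = augment(image, rng)
```

Applying `augment` on the fly at each epoch, rather than storing augmented copies, keeps the dataset on disk at its original size but adds per-batch preprocessing cost, which is one of the training-time effects the discussion should cover.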
Constraints
- The final model must achieve at least 85% accuracy on a validation set of 1,000 images.
- The training process should not exceed 2 hours on standard GPU hardware.
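Both constraints can be checked mechanically for each run, which also gives the numbers needed for the with/without-augmentation comparison. The harness below is a minimal sketch: `train_fn` and `predict_fn` are hypothetical callables standing in for whatever training and inference code is used, and wall-clock time is measured on the machine at hand, since the brief does not define "standard GPU hardware".

```python
import time

ACCURACY_TARGET = 0.85        # at least 85% validation accuracy
TIME_BUDGET_S = 2 * 60 * 60   # training must not exceed 2 hours


def accuracy(predictions, labels):
    """Fraction of predictions that match the true labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def run_and_check(train_fn, predict_fn, val_inputs, val_labels):
    """Time one training run and test the result against both constraints."""
    start = time.perf_counter()
    model = train_fn()
    elapsed = time.perf_counter() - start

    acc = accuracy(predict_fn(model, val_inputs), val_labels)
    return {
        "accuracy": acc,
        "train_seconds": elapsed,
        "meets_accuracy": acc >= ACCURACY_TARGET,
        "within_time_budget": elapsed <= TIME_BUDGET_S,
    }


# Usage with placeholder callables (no real model is trained here).
report = run_and_check(
    train_fn=lambda: "placeholder-model",
    predict_fn=lambda model, inputs: [0] * len(inputs),
    val_inputs=[None, None, None, None],
    val_labels=[0, 0, 0, 1],
)
```

Calling `run_and_check` once for the baseline and once for the augmented pipeline yields two reports that can be placed side by side.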