Business Context
TechCorp, a manufacturer of electronic devices, faces a challenge with its quality control process. The company has collected data from 100K units produced each month, but the data is often noisy due to sensor inaccuracies and human error during data entry. The goal is to build a classification model to predict whether a unit is defective, aiding in proactive quality assurance measures.
Dataset
| Feature Group | Count | Examples |
|---|---|---|
| Sensor Data | 15 | temperature, humidity, vibration, voltage |
| Production Info | 10 | production_line, shift, operator_id, machine_id |
| Quality Metrics | 5 | inspection_score, repair_time, defect_type |
- Size: 100K units, 30 features
- Target: Binary — defective (1) vs non-defective (0)
- Class balance: 5% positive (defective), 95% negative (non-defective)
- Missing data: 10% missing in sensor features, 2% in production info
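
To make the class balance concrete, the short calculation below derives the implied class counts and shows what a trivial "always non-defective" model would score. It uses only the figures stated above; the variable names are illustrative.

```python
# Figures taken from the dataset description.
n_units = 100_000
positive_rate = 0.05  # share of defective units

n_defective = round(n_units * positive_rate)
n_non_defective = n_units - n_defective

# A trivial baseline that labels every unit as non-defective (0):
# it is right on every negative and wrong on every positive.
baseline_accuracy = n_non_defective / n_units
baseline_recall = 0 / n_defective

print(f"defective units:     {n_defective}")
print(f"non-defective units: {n_non_defective}")
print(f"baseline accuracy:   {baseline_accuracy:.0%}")
print(f"baseline recall:     {baseline_recall:.0%}")
```

The baseline reaches 95% accuracy while catching no defects, which is why the recall and precision targets are more informative than accuracy for this dataset.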
Requirements
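- Build a binary classification model that reliably flags defective units. Because only 5% of units are defective, raw accuracy is a poor measure of success; use the recall and precision targets below instead.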
- Achieve at least 75% recall while ensuring precision does not drop below 60%.
- Implement explicit strategies for the noisy and missing data described above (sensor inaccuracies, data-entry errors, missing values).
- Provide a detailed analysis of feature importance to inform quality control decisions.
- Explain your choice of model and evaluation strategy.
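
One way to approach the recall and precision targets is to pick the decision threshold on held-out predictions. The sketch below is a minimal, dependency-free illustration: the labels and scores are made-up toy values, the function names are hypothetical, and choosing the feasible threshold with the highest precision is only one of several reasonable tie-breaking rules.

```python
def precision_recall(y_true, scores, threshold):
    """Precision and recall when a score >= threshold is predicted defective (1)."""
    tp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 0)
    fn = sum(1 for y, s in zip(y_true, scores) if s < threshold and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def pick_threshold(y_true, scores, min_recall=0.75, min_precision=0.60):
    """Return (threshold, precision, recall) meeting both targets, or None.

    Among feasible thresholds, the one with the highest precision wins
    (an assumption of this sketch, not a stated requirement).
    """
    best = None
    for t in sorted(set(scores)):
        p, r = precision_recall(y_true, scores, t)
        if r >= min_recall and p >= min_precision:
            if best is None or p > best[1]:
                best = (t, p, r)
    return best


# Toy validation data (illustrative only): 4 defective, 6 non-defective units.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.4, 0.2, 0.2, 0.1, 0.1]

result = pick_threshold(y_true, scores)
print(result)
```

If `pick_threshold` returns `None`, no threshold satisfies both targets on that data, which suggests revisiting the model or features rather than the threshold.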
Constraints
- Model must be interpretable enough for quality engineers to understand the predictions.
- Inference must run in under 1 second per unit to integrate with the production line.
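
The latency constraint can be checked with a small timing harness such as the sketch below. `predict_one` is a hypothetical stand-in for the real model's single-unit prediction, and the 0 to 1 scale assumed for `inspection_score` is an illustration, not something the dataset description specifies.

```python
import time

LATENCY_BUDGET_S = 1.0  # from the constraint: under 1 second per unit


def predict_one(features):
    """Hypothetical placeholder for the trained model's per-unit prediction."""
    # Placeholder rule, not a real model: flag units with a low inspection score.
    return 1 if features["inspection_score"] < 0.5 else 0


def worst_case_latency(predict, samples):
    """Return the slowest single-call wall-clock time in seconds."""
    worst = 0.0
    for features in samples:
        start = time.perf_counter()
        predict(features)
        worst = max(worst, time.perf_counter() - start)
    return worst


# Synthetic units, assuming inspection_score lies in [0, 1).
samples = [{"inspection_score": i / 100} for i in range(100)]

worst = worst_case_latency(predict_one, samples)
within_budget = worst < LATENCY_BUDGET_S
print(f"worst-case latency: {worst:.6f}s, within budget: {within_budget}")
```

Measuring the worst case rather than the mean is a deliberate choice here, since a production line is blocked by the slowest prediction, not the average one.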