Business Context
OptiManufacture, a mid-sized manufacturing company with annual revenues of $50M, aims to optimize its production process to reduce costs and improve yield. The company has traditionally relied on first-principles models based on physical laws governing production but recognizes the potential of data-driven approaches to enhance accuracy and adapt to variability in operations.
Dataset
| Feature Group | Feature Count | Examples |
|---|---|---|
| Sensor Data | 100 | temperature, pressure, humidity, vibration |
| Process Info | 50 | machine_id, operation_time, maintenance_history |
| Quality Metrics | 20 | defect_rate, yield_percentage, downtime_hours |
- Size: 150K records, 170 features
- Target: yield percentage of the product (a continuous variable, so this is a regression task)
- Class balance: not applicable, since the target is continuous
- Missing data: 10% missing in sensor readings, 5% in process info due to equipment failures
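
One way to handle the missingness described above is median imputation paired with a missing-value flag. The sketch below is illustrative only: the `impute_with_indicator` helper, the record layout (a list of dicts with `None` for a lost reading), and the sample values are assumptions, not part of the dataset specification. Because the gaps come from equipment failures, the fact that a value is missing may itself carry signal, which is the reason for keeping the flag.

```python
from statistics import median

def impute_with_indicator(rows, column):
    """Return new rows with `column` median-imputed and a `<column>_missing` flag added.

    `rows` is a list of dicts; None marks a missing reading (assumed layout).
    The input rows are not modified.
    """
    observed = [r[column] for r in rows if r[column] is not None]
    fill = median(observed)  # in practice, compute this on the training split only
    out = []
    for r in rows:
        missing = r[column] is None
        new_row = dict(r)
        new_row[column] = fill if missing else r[column]
        new_row[f"{column}_missing"] = int(missing)  # 1 if the value was imputed
        out.append(new_row)
    return out

# Hypothetical sensor records; None stands for a reading lost to equipment failure.
records = [
    {"temperature": 71.0},
    {"temperature": None},
    {"temperature": 69.0},
    {"temperature": 74.0},
]
imputed = impute_with_indicator(records, "temperature")
```

The median is used here because it is less sensitive to sensor outliers than the mean; whether that suits a given sensor is a judgment call that should be justified per feature group.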
Requirements
- Develop a hybrid model combining first-principles and data-driven approaches to predict yield percentage.
- Achieve an R² of at least 0.85 on validation data.
- Provide a detailed explanation of how the first-principles model informs the data-driven model and vice versa.
- Implement feature engineering strategies to enhance model performance.
- Address missing data appropriately and justify your approach.
Constraints
- The model must be interpretable to allow engineers to understand the predictions.
- Inference time must not exceed 2 seconds per record to support real-time decision-making.
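
The latency constraint can be checked empirically by timing single-record predictions. The sketch below assumes a hypothetical `predict_one` function standing in for the deployed model and a synthetic sample of records; only the 2-second budget comes from the constraint itself.

```python
import time

LATENCY_BUDGET_S = 2.0  # from the constraint: at most 2 seconds per record

def predict_one(record):
    """Hypothetical stand-in for the hybrid model's single-record prediction."""
    baseline = 95.0 - 0.1 * abs(record["temperature"] - 70.0)
    return baseline - 2.0 * record["vibration"]

def worst_case_latency(predict, records):
    """Return the slowest single-record inference time, in seconds."""
    worst = 0.0
    for record in records:
        start = time.perf_counter()
        predict(record)
        worst = max(worst, time.perf_counter() - start)
    return worst

# Synthetic records (assumed values) to exercise the timing loop.
sample = [
    {"temperature": 65.0 + (i % 15), "vibration": 0.1 * (i % 10)}
    for i in range(1000)
]
worst = worst_case_latency(predict_one, sample)
within_budget = worst <= LATENCY_BUDGET_S
```

For a realistic measurement, the timed call should include any imputation and feature engineering applied at inference time, and the test should run on hardware comparable to production.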