Optimize Factory Defect Models Under Constraints

Business Context

VoltForge manufactures industrial motor controllers across 12 production lines and inspects roughly 1.8 million units per month. The quality team wants a model that predicts whether a unit will fail final inspection, but the solution must be optimized for plant constraints: false negatives are expensive, inference must run on edge devices, and engineers need understandable drivers of risk.

Dataset

You are given one year of unit-level production data collected from the manufacturing execution system (MES), sensor logs, and operator records.

Feature Group	Count	Examples
Process measurements	18	solder_temp_mean, torque_std, cycle_time_sec, humidity_pct
Equipment metadata	7	line_id, machine_id, tool_version, maintenance_age_days
Material / supplier	6	supplier_id, lot_age_days, pcb_batch_grade
Operator / shift	5	shift, operator_tenure_days, overtime_flag
Quality history	6	prior_line_defect_rate_7d, rework_rate_30d, calibration_gap_days

Rows: 420K manufactured units, 42 features
Target: failed_final_inspection (1 = defective, 0 = passed)
Class balance: 4.6% defective, 95.4% non-defective
Missing data: ~12% missing in sensor-derived features during maintenance windows; ~4% missing in operator fields for temporary staff
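The missingness here is structured rather than random. Sensor gaps line up with maintenance windows and operator gaps with temporary staff, so the fact that a value is missing may itself carry signal. The sketch below assumes a row-as-dict layout and uses only a few example columns from the table. `fit_preprocessor`, `transform`, and the `__missing__` level are hypothetical names. It learns medians on training rows only, adds a per-column missingness flag, and keeps missing categoricals as an explicit level. Identifiers such as `line_id` are cast to strings so they are treated as categories rather than magnitudes.

```python
from statistics import median
from typing import Any

# Example subsets of the 42 columns; the full numeric/categorical split is an
# assumption to be filled in from the real schema.
NUMERIC_COLS = ["solder_temp_mean", "torque_std", "operator_tenure_days"]
CATEGORICAL_COLS = ["line_id", "supplier_id", "shift"]
MISSING_LEVEL = "__missing__"  # hypothetical explicit category for missing values


def fit_preprocessor(rows: list[dict[str, Any]]) -> dict[str, float]:
    """Learn per-column medians from training rows only (avoids leaking test data)."""
    medians: dict[str, float] = {}
    for col in NUMERIC_COLS:
        observed = [r[col] for r in rows if r.get(col) is not None]
        medians[col] = median(observed) if observed else 0.0
    return medians


def transform(row: dict[str, Any], medians: dict[str, float]) -> dict[str, Any]:
    """Impute numerics, add missingness flags, keep missing categoricals as a level."""
    out: dict[str, Any] = {}
    for col in NUMERIC_COLS:
        value = row.get(col)
        # The flag lets a model learn that "sensor offline" (for example during
        # a maintenance window) is itself a risk signal.
        out[f"{col}_was_missing"] = int(value is None)
        out[col] = medians[col] if value is None else value
    for col in CATEGORICAL_COLS:
        value = row.get(col)
        # IDs become strings so they are encoded as categories, not magnitudes.
        out[col] = MISSING_LEVEL if value is None else str(value)
    return out
```

In scikit-learn the same idea maps onto `SimpleImputer(strategy="median", add_indicator=True)` for numeric columns and `SimpleImputer(strategy="constant", fill_value="__missing__")` followed by `OneHotEncoder(handle_unknown="ignore")` for categorical ones, fitted on the training window only.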

Success Criteria

A good solution should:

achieve recall >= 0.85 on defective units,
maintain precision >= 0.30 to avoid overwhelming manual review,
keep batch scoring latency under 5 minutes for 200K units on a CPU-only edge server,
provide feature-level explanations usable by manufacturing engineers.
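The recall and precision targets together fix the manual-review load, and the latency target fixes a minimum scoring rate. The arithmetic below is a sketch that assumes the training year's 4.6% defect rate holds in a 200K-unit batch, which drift may not guarantee. `review_load` and `required_throughput` are hypothetical helper names.

```python
def review_load(n_units: int, prevalence: float, recall: float, precision: float) -> dict[str, float]:
    """Secondary-inspection volume implied by a given recall and precision."""
    if not 0 < precision <= 1:
        raise ValueError("precision must be in (0, 1]")
    defects = n_units * prevalence
    true_positives = recall * defects        # recall = TP / defects
    flagged = true_positives / precision     # precision = TP / flagged
    return {
        "defects": defects,
        "true_positives": true_positives,
        "flagged": flagged,
        "false_positives": flagged - true_positives,
        "flag_rate": flagged / n_units,
    }


def required_throughput(n_units: int, budget_seconds: float) -> float:
    """Minimum sustained scoring rate (units per second) to finish within budget."""
    return n_units / budget_seconds
```

At the boundary where both criteria are just met, `review_load(200_000, 0.046, 0.85, 0.30)` gives about 9,200 defects, 7,820 of them caught, and roughly 26,067 flagged units (about 13% of the batch, roughly 18,247 of them non-defective). That is the inspection capacity the plant must have for the criteria to be feasible. `required_throughput(200_000, 300)` is about 667 units per second, which should be measured end to end (preprocessing included) on the actual edge server.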

Constraints

No GPU at inference time
Model retrains weekly
Predictions are used to trigger secondary inspection, so threshold selection must reflect inspection capacity
The plant prefers simpler models if performance is within 2-3 percentage points of a more complex alternative on the agreed evaluation metric
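Because flagged units go to secondary inspection, the threshold is bounded from two sides: it must be low enough to reach the recall target and high enough that the flagged count fits inspection capacity. The first function below walks scores from highest to lowest on a held-out (later-in-time) window and returns the highest threshold that reaches `min_recall` without exceeding `max_flagged`, or `None` if capacity is insufficient. The second encodes the simpler-model preference. Both are hypothetical helpers. `max_flagged` is assumed to be the plant's inspection capacity scaled to the validation window, and `tolerance=0.02` assumes "2-3 points" means 0.02-0.03 on a 0-1 metric, taking the lower end.

```python
from typing import Optional, Sequence


def select_threshold(
    scores: Sequence[float],
    labels: Sequence[int],
    min_recall: float,
    max_flagged: int,
) -> Optional[tuple[float, float, float, int]]:
    """Highest threshold (flag if score >= threshold) that reaches min_recall
    while flagging at most max_flagged units.

    Returns (threshold, recall, precision, n_flagged), or None when the recall
    target cannot be met within capacity.
    """
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("validation window contains no defective units")
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    tp = 0
    for i, (score, label) in enumerate(ranked, start=1):
        tp += label
        # Cut only between distinct scores: tied units are flagged together.
        if i < len(ranked) and ranked[i][0] == score:
            continue
        if i > max_flagged:
            return None  # capacity exhausted before the recall target was reached
        recall = tp / total_pos
        if recall >= min_recall:
            return score, recall, tp / i, i
    return None


def pick_model(results: Sequence[tuple[str, int, float]], tolerance: float = 0.02) -> str:
    """Return the simplest model whose metric is within `tolerance` of the best.

    Each result is (name, complexity_rank, metric); a lower rank means simpler.
    """
    best = max(metric for _, _, metric in results)
    eligible = [r for r in results if r[2] >= best - tolerance]
    return min(eligible, key=lambda r: r[1])[0]
```

If `select_threshold` returns `None` for the required recall, the constraint is a capacity problem rather than a modeling one, and that trade-off should be surfaced to the plant rather than silently lowering the recall target.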

Deliverables

Build and compare at least two classification approaches suitable for tabular industrial data.
Design preprocessing for mixed feature types and missing values.
Choose an evaluation strategy that reflects class imbalance and temporal production drift.
Select an operating threshold based on plant inspection capacity and defect recall.
Explain the final model choice in terms of accuracy, latency, interpretability, and operational fit.
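For evaluation, accuracy is uninformative here: predicting "pass" for every unit scores 95.4% accuracy with zero recall. Random k-fold splits would also mix future weeks into training and overstate performance under drift. A minimal sketch of the alternative trains on a rolling window and tests on the following week, mirroring the weekly retrain, and scores each fold with non-interpolated average precision (the step-wise definition scikit-learn's `average_precision_score` also uses). It assumes a per-unit integer week index derived from a production timestamp, which is not among the listed features. `walk_forward_splits` and `average_precision` are hypothetical names.

```python
from typing import Iterator, Sequence


def walk_forward_splits(
    weeks: Sequence[int], train_weeks: int, test_weeks: int = 1
) -> Iterator[tuple[list[int], list[int]]]:
    """Yield (train_idx, test_idx) for a rolling window that always tests on
    the weeks immediately after the training window."""
    start, last = min(weeks), max(weeks)
    while start + train_weeks + test_weeks - 1 <= last:
        train_end = start + train_weeks      # exclusive
        test_end = train_end + test_weeks    # exclusive
        train_idx = [i for i, w in enumerate(weeks) if start <= w < train_end]
        test_idx = [i for i, w in enumerate(weeks) if train_end <= w < test_end]
        yield train_idx, test_idx
        start += test_weeks


def average_precision(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Sum of (recall increase) * precision over distinct score thresholds,
    from highest to lowest; no interpolation."""
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("no defective units in this fold")
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    tp = fp = 0
    ap, prev_recall = 0.0, 0.0
    for i, (score, label) in enumerate(ranked):
        tp += label
        fp += 1 - label
        # Evaluate only at the last unit of each group of tied scores.
        if i + 1 < len(ranked) and ranked[i + 1][0] == score:
            continue
        recall = tp / total_pos
        ap += (recall - prev_recall) * (tp / (tp + fp))
        prev_recall = recall
    return ap
```

Reporting average precision per test week, rather than one pooled number, also shows whether performance degrades as drift accumulates between retrains.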