NovaThera runs a high-throughput biology platform that executes ~40,000 wet-lab experiments per month. The R&D leadership team wants a model that predicts whether an assay run will succeed, but the senior experimentalist approving deployment is skeptical of black-box AI and requires a clear, evidence-based explanation of the model.
You are given historical assay-run data collected over 24 months.
| Feature Group | Count | Examples |
|---|---|---|
| Assay setup | 12 | assay_type, reagent_lot, plate_format, incubation_minutes |
| Instrument telemetry | 18 | temperature_mean, pressure_std, dispense_error_rate, calibration_score |
| Sample metadata | 10 | sample_source, concentration_ng_ml, storage_days, operator_experience |
| Quality controls | 8 | control_signal_mean, control_cv, contamination_flag, baseline_drift |
| Derived features | 6 | temp_range, signal_to_noise, reagent_age_days, run_order_within_batch |
A good solution should achieve strong predictive performance while producing explanations a senior experimentalist can validate against domain knowledge. The model should reach ROC-AUC >= 0.86, PR-AUC >= 0.74, and provide both global and per-run explanations.