An OpenAI research team is reviewing a new GPT-4.1-based assistant variant for customer-facing deployment in ChatGPT. The model was updated with additional instruction tuning and policy-focused preference optimization to improve refusal behavior and reduce unsafe outputs, but product teams report that task completion may have regressed.
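The scenario does not specify which preference-optimization method was applied; as one illustration, here is a minimal PyTorch sketch of a DPO-style objective, a common choice for policy-focused preference tuning. The tensor names (`policy_chosen_logps`, etc.) are hypothetical inputs: summed log-probabilities of the preferred and rejected responses under the policy being tuned and under a frozen reference model.

```python
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """DPO-style preference loss over paired (chosen, rejected) responses.

    Each argument is a 1-D tensor of summed token log-probs for a batch of
    preference pairs; beta controls how far the policy may drift from the
    reference model.
    """
    policy_logratios = policy_chosen_logps - policy_rejected_logps
    ref_logratios = ref_chosen_logps - ref_rejected_logps
    # Push the policy to prefer the chosen response more strongly than the
    # reference model does; the log-sigmoid keeps gradients bounded.
    return -F.logsigmoid(beta * (policy_logratios - ref_logratios)).mean()
```

Tuning hard on an objective like this is what can drive the trade-offs visible in the evaluation results below.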
| Metric | Baseline Model | New Aligned Model | Change (new - baseline) |
|---|---|---|---|
| Helpfulness win rate (human eval) | 71% | 66% | -5 pts |
| Safety violation rate | 2.8% | 0.9% | -1.9 pts |
| Over-refusal rate on benign prompts | 6% | 18% | +12 pts |
| Factual accuracy on eval set | 84% | 81% | -3 pts |
| Calibration error | 0.07 | 0.11 | +0.04 |
| Task completion rate | 88% | 79% | -9 pts |
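Reading the table requires tracking each metric's preferred direction, since a negative change is good for some rows and bad for others. A minimal sketch of that bookkeeping, where the values are transcribed from the table and the `higher_is_better` flags are our own encoding of each metric's direction:

```python
# Each entry: (baseline, new, higher_is_better). Values come from the table
# above; the direction flags are our reading of each metric.
METRICS = {
    "helpfulness_win_rate":  (0.71, 0.66, True),
    "safety_violation_rate": (0.028, 0.009, False),
    "over_refusal_rate":     (0.06, 0.18, False),
    "factual_accuracy":      (0.84, 0.81, True),
    "calibration_error":     (0.07, 0.11, False),
    "task_completion_rate":  (0.88, 0.79, True),
}

for name, (base, new, higher_is_better) in METRICS.items():
    delta = new - base
    improved = (delta > 0) == higher_is_better
    print(f"{name:22s} {delta:+.3f}  {'improved' if improved else 'regressed'}")
```

Run as-is, this flags `safety_violation_rate` as the only improvement; every other metric regressed, which is the pattern the task below asks you to explain.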
The team wants to determine whether these results indicate better alignment, worse model quality, or both. Your task is to explain how model alignment differs from model evaluation, and to use the metrics above to diagnose what happened in this release.
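Of the metrics above, calibration error is the least self-explanatory. The table does not say which calibration metric was used; a common choice is expected calibration error (ECE), sketched below under the assumption that per-example model confidences and correctness labels are available.

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE: bin predictions by confidence, then average |accuracy - confidence|
    per bin, weighted by the fraction of examples in each bin."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            # Gap between how accurate the model is and how confident it claims
            # to be, within this confidence bin.
            gap = abs(correct[in_bin].mean() - confidences[in_bin].mean())
            ece += in_bin.mean() * gap
    return ece
```

On a measure like this, a rise from 0.07 to 0.11 means the new model's expressed confidence tracks its actual accuracy less well than before.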