NovaLearn evaluates an LLM that solves physics word problems and shows its full mathematical derivation. Reviewers found that some final answers are numerically close to correct even though the reasoning chain misapplies a physical law. The task is to evaluate whether the model's derivation correctly applies the inverse-square law and to identify the exact step where it fails.
The team audited 1,200 derivation traces on radiation, gravity, and light-intensity problems. A derivation is marked correct only if both the final answer and the reasoning steps are valid.
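The correctness criterion above (final answer valid *and* every step valid) can be sketched as follows; the names `DerivationTrace` and `trace_is_correct` are hypothetical, not from the audit tooling:

```python
from dataclasses import dataclass

@dataclass
class DerivationTrace:
    steps_valid: list  # per-step validity judgments (booleans)
    answer_correct: bool  # final numeric answer within tolerance

def trace_is_correct(trace: DerivationTrace) -> bool:
    # A trace counts as correct only if the final answer AND all steps hold.
    return trace.answer_correct and all(trace.steps_valid)

# A correct final answer with one invalid step is still marked incorrect.
print(trace_is_correct(DerivationTrace([True, False, True], True)))  # False
```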
| Metric | Value |
|---|---|
| Step-level accuracy | 0.91 |
| Final-answer accuracy | 0.84 |
| Precision on "derivation error" flag | 0.88 |
| Recall on "derivation error" flag | 0.69 |
| F1 score | 0.77 |
| Cases involving inverse-square law | 320 |
| Inverse-square-law cases with reasoning error | 96 |
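The reported F1 is consistent with the precision and recall in the table, assuming it is the standard harmonic mean on the "derivation error" flag:

```python
# Sanity check: F1 as the harmonic mean of the reported precision and recall.
precision, recall = 0.88, 0.69
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 2))  # 0.77, matching the table
```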
In inverse-square-law questions, the model often produces a plausible final number while making a subtle reasoning mistake, such as scaling by $1/r$ instead of $1/r^2$, or inverting the ratio between the two distances. The evaluation challenge is to diagnose where the derivation becomes invalid, not just whether the final answer is wrong.
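A step checker for this failure mode can compare the model's claimed intensity ratio against the inverse-square prediction $I_2/I_1 = (r_1/r_2)^2$ and against the two error patterns named above. This is a minimal sketch; `diagnose_ratio` and its labels are hypothetical:

```python
def diagnose_ratio(r1: float, r2: float, claimed: float, tol: float = 1e-6) -> str:
    """Classify a claimed intensity ratio I2/I1 for distances r1 -> r2."""
    correct = (r1 / r2) ** 2  # inverse-square law: I2/I1 = (r1/r2)^2
    if abs(claimed - correct) < tol:
        return "correct"
    if abs(claimed - r1 / r2) < tol:  # scaled by 1/r instead of 1/r^2
        return "linear-scaling error"
    if abs(claimed - (r2 / r1) ** 2) < tol:  # inverted the distance ratio
        return "inverted-ratio error"
    return "other error"

# Doubling the distance should quarter the intensity:
print(diagnose_ratio(1.0, 2.0, 0.25))  # correct
print(diagnose_ratio(1.0, 2.0, 0.5))   # linear-scaling error
```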