You've shipped a model that looked strong in offline testing, but it is not holding up in production. You need to figure out why the offline results were misleading and how to get the model back to acceptable performance.
Describe a situation where a model performed well in testing but failed in production. How did you diagnose and fix the issue?