
You are tuning a supervised learning model that will be deployed in production. Several candidate settings improve the validation score, but the team needs a repeatable way to pick hyperparameters without overfitting to the holdout set.
How would you tune hyperparameters for a production machine learning model?