You have trained a medical imaging model and now need to show that its performance will hold up when data comes from different hospitals or scanner vendors. The team is concerned that a strong internal validation result may not reflect real-world variation in patient mix, acquisition protocols, and device settings.
How would you validate a model so that it generalizes across different hospitals or imaging devices?