You are preparing a supervised learning dataset and notice that some fields are missing, inconsistent, or clearly noisy. You want a clean training pipeline that improves model quality without introducing leakage.
How would you handle missing or noisy data in a machine learning dataset?
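The core of a leakage-safe answer is to split cleaning into two phases. First, learn every statistic (imputation values, clipping bounds) from the training split only. Second, apply those frozen values unchanged to the training, validation, and test splits. Below is a minimal sketch of that pattern in plain Python with no third-party libraries. The function names `fit_cleaner` and `apply_cleaner`, median imputation, the 5th/95th percentile clipping bounds, the `_missing` indicator columns, and the `__missing__` placeholder are all illustrative assumptions, not a prescribed method. The sketch also assumes numeric columns hold only numbers, `None`, or `NaN`.

```python
import math
from statistics import median


def _is_missing(value):
    # Treat None and float NaN as missing (assumption: numeric columns hold numbers, None, or NaN).
    return value is None or (isinstance(value, float) and math.isnan(value))


def _nearest_rank(sorted_vals, q):
    # Simple quantile on a pre-sorted list: index rounded down, no interpolation.
    return sorted_vals[int(q * (len(sorted_vals) - 1))]


def fit_cleaner(train_rows, numeric_cols, categorical_cols, clip_q=(0.05, 0.95)):
    """Learn cleaning statistics from the training split only, so nothing leaks from val/test."""
    stats = {"numeric": {}, "categorical": list(categorical_cols)}
    for col in numeric_cols:
        observed = sorted(r[col] for r in train_rows if not _is_missing(r.get(col)))
        # median() raises StatisticsError if a column has no observed training values.
        stats["numeric"][col] = {
            "median": median(observed),
            "low": _nearest_rank(observed, clip_q[0]),
            "high": _nearest_rank(observed, clip_q[1]),
        }
    return stats


def apply_cleaner(rows, stats):
    """Apply the frozen training statistics unchanged to any split."""
    cleaned = []
    for r in rows:
        out = {}
        for col, s in stats["numeric"].items():
            value = r.get(col)
            out[f"{col}_missing"] = int(_is_missing(value))  # keep missingness as a feature
            if _is_missing(value):
                value = s["median"]  # impute with the training median
            out[col] = min(max(value, s["low"]), s["high"])  # clip outliers to training bounds
        for col in stats["categorical"]:
            raw = r.get(col)
            # Normalize inconsistent spellings such as " NY" vs "ny"; blanks become a sentinel.
            if isinstance(raw, str) and raw.strip():
                out[col] = raw.strip().lower()
            else:
                out[col] = "__missing__"
        cleaned.append(out)
    return cleaned


if __name__ == "__main__":
    train = [
        {"age": 25, "city": " NY"},
        {"age": None, "city": "ny"},
        {"age": 31, "city": "SF"},
        {"age": 400, "city": None},  # 400 is an obvious entry error
        {"age": 28, "city": "sf "},
    ]
    test = [{"age": None, "city": "Ny"}, {"age": -5, "city": ""}]

    stats = fit_cleaner(train, ["age"], ["city"])  # fit on train only
    print(apply_cleaner(test, stats))
    # [{'age_missing': 1, 'age': 29.5, 'city': 'ny'}, {'age_missing': 0, 'age': 25, 'city': '__missing__'}]
```

The missingness indicator is kept because the fact that a value is absent can itself be predictive when data is not missing at random. Clipping is only one way to tame noisy values; dropping or relabeling clearly corrupt rows is another. If scikit-learn is available, `SimpleImputer(strategy="median", add_indicator=True)` placed inside a `Pipeline` enforces the same fit-on-train discipline, including within each fold of cross-validation.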