Business Context
HomePriceAI, a real estate analytics company, provides predictive insights on housing prices to real estate agents and buyers. Its house-price regression model currently suffers from high variance, so its predictions are inconsistent from one dataset to the next. The goal is to improve generalization so that predictions stay accurate and reliable across market conditions.
Dataset
| Feature Group | Count | Examples |
|---|---|---|
| Numeric Features | 10 | square_footage, num_bedrooms, num_bathrooms, year_built |
| Categorical Features | 5 | neighborhood, property_type, condition, heating_type |
| Temporal Features | 3 | days_on_market, last_renovation_date, listing_date |
- Size: 10,000 observations, 18 features
- Target: Continuous variable — house price in USD
- Class balance: Not applicable (regression task)
- Missing data: 8% missing in last_renovation_date, 2% in heating_type
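One possible way to handle the two columns with missing values is sketched below. It is an illustration under assumptions, not a prescribed method: the rows are made up, and the derived names `years_since_renovation` and `renovation_missing` are hypothetical. The idea is to turn `last_renovation_date` into a numeric gap relative to `listing_date`, fill gaps with the median, keep a flag recording that the value was missing, and fill `heating_type` with the most frequent category.

```python
from collections import Counter
from datetime import date
from statistics import median

# Made-up rows for illustration; only the column names come from the dataset description.
rows = [
    {"listing_date": date(2023, 5, 1), "last_renovation_date": date(2015, 6, 1), "heating_type": "gas"},
    {"listing_date": date(2023, 6, 15), "last_renovation_date": None, "heating_type": "electric"},
    {"listing_date": date(2023, 7, 4), "last_renovation_date": date(2020, 1, 10), "heating_type": None},
    {"listing_date": date(2023, 8, 20), "last_renovation_date": date(2008, 3, 5), "heating_type": "gas"},
]


def years_between(start, end):
    """Approximate number of years between two dates."""
    return (end - start).days / 365.25


def impute(rows):
    """Median-fill the renovation gap (with a missing flag) and mode-fill heating_type."""
    known_years = [
        years_between(r["last_renovation_date"], r["listing_date"])
        for r in rows
        if r["last_renovation_date"] is not None
    ]
    fill_years = median(known_years)
    known_heating = [r["heating_type"] for r in rows if r["heating_type"] is not None]
    fill_heating = Counter(known_heating).most_common(1)[0][0]

    cleaned = []
    for r in rows:
        missing = r["last_renovation_date"] is None
        years = fill_years if missing else years_between(r["last_renovation_date"], r["listing_date"])
        cleaned.append({
            "years_since_renovation": years,
            "renovation_missing": int(missing),  # hypothetical indicator column
            "heating_type": r["heating_type"] if r["heating_type"] is not None else fill_heating,
        })
    return cleaned


cleaned = impute(rows)
for row in cleaned:
    print(row)
```

In a real pipeline the fill values would be computed on the training split only and then reused for validation and test data, so that no information leaks across splits.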
Requirements
- Identify and implement specific techniques to reduce model variance.
- Use regularization methods (L1/L2) to improve model performance.
- Explore ensemble methods (e.g., bagging, boosting) to enhance predictions.
- Provide a strategy for hyperparameter tuning to optimize model performance.
- Evaluate the model using appropriate metrics and discuss results.
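The requirements above can be sketched end to end on synthetic data. This is a minimal illustration under stated assumptions rather than a reference solution: the data is randomly generated (the real dataset is not shown here), all function names are hypothetical, and only NumPy is used so the mechanics stay visible. It covers L2 regularization (closed-form ridge), a k-fold grid search over the penalty strength `alpha`, bagging of the regularized model, and RMSE as the evaluation metric.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the housing data (assumption: values are random, not real prices).
n, p = 200, 10
X = rng.normal(size=(n, p))
X[:, 1] = X[:, 0] + 0.05 * rng.normal(size=n)  # near-collinear pair inflates coefficient variance
true_coef = rng.normal(size=p)
y = X @ true_coef + rng.normal(scale=2.0, size=n)


def fit_ridge(X, y, alpha):
    """Closed-form L2-regularized least squares; alpha=0 gives ordinary least squares."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean  # centering leaves the intercept unpenalized
    coef = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    return coef, y_mean - x_mean @ coef


def predict(model, X):
    coef, intercept = model
    return X @ coef + intercept


def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def cv_rmse(X, y, alpha, folds):
    """Mean validation RMSE across the given folds."""
    scores = []
    for i, val_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        model = fit_ridge(X[train_idx], y[train_idx], alpha)
        scores.append(rmse(y[val_idx], predict(model, X[val_idx])))
    return float(np.mean(scores))


def fit_bagged(X, y, alpha, n_models=25):
    """Bagging: fit one model per bootstrap resample of the training rows."""
    models = []
    for _ in range(n_models):
        idx = rng.integers(0, len(y), size=len(y))
        models.append(fit_ridge(X[idx], y[idx], alpha))
    return models


def predict_bagged(models, X):
    """Average the predictions of all bagged models."""
    return np.mean([predict(m, X) for m in models], axis=0)


# Hyperparameter tuning: pick the alpha with the lowest cross-validated RMSE.
folds = np.array_split(rng.permutation(n), 5)
alphas = [0.0, 0.1, 1.0, 10.0, 100.0]
cv_scores = {a: cv_rmse(X, y, a, folds) for a in alphas}
best_alpha = min(cv_scores, key=cv_scores.get)

models = fit_bagged(X, y, best_alpha)
bagged_pred = predict_bagged(models, X)

print("CV RMSE per alpha:", cv_scores)
print("selected alpha:", best_alpha)
print("bagged training RMSE:", rmse(y, bagged_pred))
```

Two caveats on the design. An L1 penalty (lasso) has no closed form and would need an iterative solver, so only L2 is shown. Bagging tends to help most with unstable learners such as deep decision trees; a linear base learner is used here only to keep the sketch dependency-free, and a final evaluation should use a held-out test set rather than the training RMSE printed above.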
Constraints
- The model must maintain interpretability for real estate agents.
- Inference time should not exceed 2 seconds for individual predictions.
- The solution should be scalable to accommodate future increases in data volume.
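The interpretability and latency constraints can be checked together with a small sketch like the one below. It assumes a linear model, where each feature's contribution to the price is simply its coefficient times its value. The coefficient values, the listing, and the function name `explain_prediction` are hypothetical and chosen only for illustration.

```python
import time

# Hypothetical fitted linear model; these numbers are illustrative, not learned from data.
INTERCEPT = 50_000.0
COEFFICIENTS = {
    "square_footage": 150.0,
    "num_bedrooms": 10_000.0,
    "num_bathrooms": 7_500.0,
}
LATENCY_BUDGET_SECONDS = 2.0  # from the inference-time constraint


def explain_prediction(features):
    """Return the predicted price and each feature's additive contribution in USD."""
    contributions = {name: coef * features[name] for name, coef in COEFFICIENTS.items()}
    price = INTERCEPT + sum(contributions.values())
    return price, contributions


listing = {"square_footage": 1800, "num_bedrooms": 3, "num_bathrooms": 2}

start = time.perf_counter()
price, contributions = explain_prediction(listing)
elapsed = time.perf_counter() - start

print(f"predicted price: {price:,.0f} USD")
for name, value in contributions.items():
    print(f"  {name}: {value:+,.0f} USD")
print(f"latency: {elapsed:.6f} s (budget {LATENCY_BUDGET_SECONDS} s)")
```

A per-feature breakdown like this is something an agent can read directly. For an ensemble, the same timing harness still applies, but the explanation would need a different approach, since averaged or boosted models do not decompose this simply.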