HomeValue Analytics builds automated valuation models for regional real-estate brokers. The pricing team wants a regression model that generalizes well to new listings, along with an explanation of why regularization improves performance over an unregularized baseline.
You are given a tabular dataset of 62,000 residential property sales from the last 24 months across 8 metro areas. The goal is to predict the final sale price from listing, property, and neighborhood features.
| Feature Group | Count | Examples |
|---|---|---|
| Property attributes | 14 | square_feet, bedrooms, bathrooms, lot_size, year_built |
| Listing details | 6 | listing_price, days_on_market_at_offer, seller_type |
| Neighborhood | 9 | zip_code, school_rating, crime_index, distance_to_transit |
| Derived indicators | 7 | renovated_flag, price_per_sqft_listed, home_age |
A good solution should outperform an unregularized linear regression baseline and demonstrate, with validation results, why regularization is useful for reducing overfitting. Target a test RMSE below $42K and a stable gap between train and validation error.
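
As a rough illustration of the comparison requested above, the following minimal sketch fits an unregularized least-squares baseline and a ridge (L2-regularized) model with the same closed-form solver, then reports train and validation RMSE for each. Everything in it is an assumption made for illustration: the data is synthetic (it is not the HomeValue dataset), the helper names `fit_ridge` and `rmse` are hypothetical, the penalty strength `alpha=20.0` is arbitrary, and ridge is only one of several possible regularizers.

```python
import numpy as np


def fit_ridge(X, y, alpha):
    """Closed-form ridge regression; alpha=0 reduces to ordinary least squares.

    X and y are centered first so the intercept is not penalized.
    """
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    gram = Xc.T @ Xc + alpha * np.eye(X.shape[1])
    weights = np.linalg.solve(gram, Xc.T @ yc)
    intercept = y_mean - x_mean @ weights
    return weights, intercept


def rmse(X, y, weights, intercept):
    """Root mean squared error of the linear predictions."""
    residuals = X @ weights + intercept - y
    return float(np.sqrt(np.mean(residuals ** 2)))


# Synthetic stand-in data (an assumption, NOT the real sales data):
# 36 unit-scale features, of which only 5 carry signal, and deliberately
# few training rows so the unregularized fit has room to overfit.
rng = np.random.default_rng(0)
n_train, n_val, n_features = 80, 400, 36
true_w = np.zeros(n_features)
true_w[:5] = [3.0, -2.0, 1.5, 1.0, -0.5]
X = rng.normal(size=(n_train + n_val, n_features))
y = X @ true_w + rng.normal(scale=2.0, size=n_train + n_val)
X_train, y_train = X[:n_train], y[:n_train]
X_val, y_val = X[n_train:], y[n_train:]

results = {}
for name, alpha in [("ols", 0.0), ("ridge", 20.0)]:  # alpha chosen arbitrarily
    w, b = fit_ridge(X_train, y_train, alpha)
    train_err = rmse(X_train, y_train, w, b)
    val_err = rmse(X_val, y_val, w, b)
    results[name] = {
        "train": train_err,
        "val": val_err,
        "w_norm": float(np.linalg.norm(w)),
    }
    print(f"{name:5s} alpha={alpha:5.1f} train={train_err:.3f} "
          f"val={val_err:.3f} gap={val_err - train_err:.3f}")
```

The unregularized fit always achieves the lowest training error, and the ridge weights always have a smaller norm. Whether the validation error and the train/validation gap improve depends on the data and on `alpha`, which is why the brief asks for validation results. On the real dataset one would likely standardize numeric features, encode categorical ones such as `zip_code` and `seller_type`, and choose `alpha` by cross-validation rather than fixing it by hand.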