HomeValueNow, a residential real estate platform, wants to use supervised learning for two related tasks: estimating a home's sale price and predicting whether a listing will sell above asking price. The team wants you to explain when to use regression versus classification, then build baseline models for both tasks on the same dataset.
You are given a historical dataset of 120,000 home listings from the last 24 months across 12 metro areas.
| Feature Group | Count | Examples |
|---|---|---|
| Numerical property features | 14 | square_feet, lot_size, bedrooms, bathrooms, age_years, hoa_fee |
| Categorical listing features | 9 | city, zip_code, property_type, condition, school_rating_bucket |
| Listing and market features | 8 | asking_price, days_on_market_at_snapshot, month_listed, mortgage_rate |
| Derived agent/seller features | 5 | agent_experience_years, prior_sale_count, seller_type |
sale_price (continuous USD value)sold_above_asking (1 if sale_price > asking_price, else 0)hoa_fee, 8% in lot_size, 6% in agent_experience_yearsA strong solution should:
sale_pricesold_above_asking