Dataford
Interview Guides
Upgrade
All questions/Machine Learning/Predict Apartment Prices with Linear Regression

Predict Apartment Prices with Linear Regression

Easy
Machine Learning
RegressionSupervised LearningFeature Engineering

Problem

Business Context

HomeValue, a residential real-estate marketplace, wants a simple and interpretable model to estimate apartment sale prices for listing guidance in one major city. The pricing team needs a baseline regression model that can be retrained weekly and explained to non-technical stakeholders.

Dataset

You are given a historical dataset of apartment sales from the last 24 months.

Feature GroupCountExamples
Property attributes8square_feet, bedrooms, bathrooms, floor_number, building_age_years
Location5neighborhood, distance_to_metro_km, school_rating, zip_code, latitude_bucket
Listing metadata4days_on_market, renovated_flag, parking_spaces, hoa_fee
Market context3month_of_sale, mortgage_rate, local_price_index
  • Size: 52K sold apartments, 20 input features
  • Target: Final sale price in USD
  • Feature types: Numerical and categorical
  • Missing data: ~6% missing in hoa_fee, ~4% in school_rating, ~2% in parking_spaces
  • Outliers: Luxury penthouses create a long right tail in price

Success Criteria

A solution is considered good enough if it:

  • Achieves RMSE below $45,000 on a held-out test set
  • Achieves MAE below $28,000
  • Produces interpretable coefficients or feature effects for the pricing team

Constraints

  • Prefer a linear-model-based approach for interpretability
  • Training should complete in minutes on a laptop
  • Batch inference for 10K listings should finish in under 1 second
  • The solution must avoid target leakage from post-sale fields

Deliverables

  1. Build a linear regression pipeline in Python for price prediction.
  2. Preprocess missing values and categorical variables correctly.
  3. Evaluate the model using regression metrics on train/validation/test splits.
  4. Explain how you would interpret coefficients and diagnose underfitting or multicollinearity.
  5. Suggest at least one improvement beyond plain linear regression, such as Ridge or Lasso regularization.

Problem

Business Context

HomeValue, a residential real-estate marketplace, wants a simple and interpretable model to estimate apartment sale prices for listing guidance in one major city. The pricing team needs a baseline regression model that can be retrained weekly and explained to non-technical stakeholders.

Dataset

You are given a historical dataset of apartment sales from the last 24 months.

Feature GroupCountExamples
Property attributes8square_feet, bedrooms, bathrooms, floor_number, building_age_years
Location5neighborhood, distance_to_metro_km, school_rating, zip_code, latitude_bucket
Listing metadata4days_on_market, renovated_flag, parking_spaces, hoa_fee
Market context3month_of_sale, mortgage_rate, local_price_index
  • Size: 52K sold apartments, 20 input features
  • Target: Final sale price in USD
  • Feature types: Numerical and categorical
  • Missing data: ~6% missing in hoa_fee, ~4% in school_rating, ~2% in parking_spaces
  • Outliers: Luxury penthouses create a long right tail in price

Success Criteria

A solution is considered good enough if it:

  • Achieves RMSE below $45,000 on a held-out test set
  • Achieves MAE below $28,000
  • Produces interpretable coefficients or feature effects for the pricing team

Constraints

  • Prefer a linear-model-based approach for interpretability
  • Training should complete in minutes on a laptop
  • Batch inference for 10K listings should finish in under 1 second
  • The solution must avoid target leakage from post-sale fields

Deliverables

  1. Build a linear regression pipeline in Python for price prediction.
  2. Preprocess missing values and categorical variables correctly.
  3. Evaluate the model using regression metrics on train/validation/test splits.
  4. Explain how you would interpret coefficients and diagnose underfitting or multicollinearity.
  5. Suggest at least one improvement beyond plain linear regression, such as Ridge or Lasso regularization.
Your answer
Try one AI text evaluation on us
Get structured feedback, scored against a 4-axis rubric. Premium unlocks unlimited.
0 wordstarget ~200
Up next
HouseCanaryRegularize House Price RegressionEasyURegularize House Price RegressionEasyTrain House Prices with Gradient DescentEasy
Next question