Interview Guides

Predict Apartment Prices with Linear Regression

Easy

Machine Learning

Business Context

HomeValue, a residential real-estate marketplace, wants a simple and interpretable model to estimate apartment sale prices for listing guidance in one major city. The pricing team needs a baseline regression model that can be retrained weekly and explained to non-technical stakeholders.

Dataset

You are given a historical dataset of apartment sales from the last 24 months.

Feature Group	Count	Examples
Property attributes	8	square_feet, bedrooms, bathrooms, floor_number, building_age_years
Location	5	neighborhood, distance_to_metro_km, school_rating, zip_code, latitude_bucket
Listing metadata	4	days_on_market, renovated_flag, parking_spaces, hoa_fee
Market context	3	month_of_sale, mortgage_rate, local_price_index

Size: 52K sold apartments, 20 input features
Target: Final sale price in USD
Feature types: Numerical and categorical
Missing data: ~6% missing in hoa_fee, ~4% in school_rating, ~2% in parking_spaces
Outliers: Luxury penthouses create a long right tail in price

Success Criteria

A solution is considered good enough if it:

Achieves RMSE below $45,000 on a held-out test set
Achieves MAE below $28,000
Produces interpretable coefficients or feature effects for the pricing team

Constraints

Prefer a linear-model-based approach for interpretability
Training should complete in minutes on a laptop
Batch inference for 10K listings should finish in under 1 second
The solution must avoid target leakage from post-sale fields

Deliverables

Build a linear regression pipeline in Python for price prediction.
Preprocess missing values and categorical variables correctly.
Evaluate the model using regression metrics on train/validation/test splits.
Explain how you would interpret coefficients and diagnose underfitting or multicollinearity.
Suggest at least one improvement beyond plain linear regression, such as Ridge or Lasso regularization.

Predict Apartment Prices with Linear Regression

Easy

Machine Learning

Business Context

Dataset

You are given a historical dataset of apartment sales from the last 24 months.

Feature Group	Count	Examples
Property attributes	8	square_feet, bedrooms, bathrooms, floor_number, building_age_years
Location	5	neighborhood, distance_to_metro_km, school_rating, zip_code, latitude_bucket
Listing metadata	4	days_on_market, renovated_flag, parking_spaces, hoa_fee
Market context	3	month_of_sale, mortgage_rate, local_price_index

Size: 52K sold apartments, 20 input features
Target: Final sale price in USD
Feature types: Numerical and categorical
Missing data: ~6% missing in hoa_fee, ~4% in school_rating, ~2% in parking_spaces
Outliers: Luxury penthouses create a long right tail in price

Success Criteria

A solution is considered good enough if it:

Achieves RMSE below $45,000 on a held-out test set
Achieves MAE below $28,000
Produces interpretable coefficients or feature effects for the pricing team

Constraints

Prefer a linear-model-based approach for interpretability
Training should complete in minutes on a laptop
Batch inference for 10K listings should finish in under 1 second
The solution must avoid target leakage from post-sale fields

Deliverables

Build a linear regression pipeline in Python for price prediction.
Preprocess missing values and categorical variables correctly.
Evaluate the model using regression metrics on train/validation/test splits.
Explain how you would interpret coefficients and diagnose underfitting or multicollinearity.
Suggest at least one improvement beyond plain linear regression, such as Ridge or Lasso regularization.

Your Answer

Predict Apartment Prices with Linear Regression

Easy

Machine Learning

Business Context

Dataset

You are given a historical dataset of apartment sales from the last 24 months.

Feature Group	Count	Examples
Property attributes	8	square_feet, bedrooms, bathrooms, floor_number, building_age_years
Location	5	neighborhood, distance_to_metro_km, school_rating, zip_code, latitude_bucket
Listing metadata	4	days_on_market, renovated_flag, parking_spaces, hoa_fee
Market context	3	month_of_sale, mortgage_rate, local_price_index

Size: 52K sold apartments, 20 input features
Target: Final sale price in USD
Feature types: Numerical and categorical
Missing data: ~6% missing in hoa_fee, ~4% in school_rating, ~2% in parking_spaces
Outliers: Luxury penthouses create a long right tail in price

Success Criteria

A solution is considered good enough if it:

Achieves RMSE below $45,000 on a held-out test set
Achieves MAE below $28,000
Produces interpretable coefficients or feature effects for the pricing team

Constraints

Prefer a linear-model-based approach for interpretability
Training should complete in minutes on a laptop
Batch inference for 10K listings should finish in under 1 second
The solution must avoid target leakage from post-sale fields

Deliverables

Build a linear regression pipeline in Python for price prediction.
Preprocess missing values and categorical variables correctly.
Evaluate the model using regression metrics on train/validation/test splits.
Explain how you would interpret coefficients and diagnose underfitting or multicollinearity.
Suggest at least one improvement beyond plain linear regression, such as Ridge or Lasso regularization.

Predict Apartment Prices with Linear Regression

Easy

Machine Learning

Business Context

Dataset

You are given a historical dataset of apartment sales from the last 24 months.

Feature Group	Count	Examples
Property attributes	8	square_feet, bedrooms, bathrooms, floor_number, building_age_years
Location	5	neighborhood, distance_to_metro_km, school_rating, zip_code, latitude_bucket
Listing metadata	4	days_on_market, renovated_flag, parking_spaces, hoa_fee
Market context	3	month_of_sale, mortgage_rate, local_price_index

Size: 52K sold apartments, 20 input features
Target: Final sale price in USD
Feature types: Numerical and categorical
Missing data: ~6% missing in hoa_fee, ~4% in school_rating, ~2% in parking_spaces
Outliers: Luxury penthouses create a long right tail in price

Success Criteria

A solution is considered good enough if it:

Achieves RMSE below $45,000 on a held-out test set
Achieves MAE below $28,000
Produces interpretable coefficients or feature effects for the pricing team

Constraints

Prefer a linear-model-based approach for interpretability
Training should complete in minutes on a laptop
Batch inference for 10K listings should finish in under 1 second
The solution must avoid target leakage from post-sale fields

Deliverables

Build a linear regression pipeline in Python for price prediction.
Preprocess missing values and categorical variables correctly.
Evaluate the model using regression metrics on train/validation/test splits.
Explain how you would interpret coefficients and diagnose underfitting or multicollinearity.
Suggest at least one improvement beyond plain linear regression, such as Ridge or Lasso regularization.