Choose Classification vs Regression for Pricing

Difficulty: Easy
Category: Machine Learning
Topics: Supervised Learning, Decision Trees, Feature Engineering
Asked at: MYND (1 company)

Problem

Business Context

HomeValueNow, a residential real estate platform, wants to use supervised learning for two related tasks: estimating a home's sale price and predicting whether a listing will sell above asking price. The team wants you to explain when to use regression versus classification, then build baseline models for both tasks on the same dataset.

Dataset

You are given a historical dataset of 120,000 home listings from the last 24 months across 12 metro areas.

Feature Group | Count | Examples
Numerical property features | 14 | square_feet, lot_size, bedrooms, bathrooms, age_years, hoa_fee
Categorical listing features | 9 | city, zip_code, property_type, condition, school_rating_bucket
Listing and market features | 8 | asking_price, days_on_market_at_snapshot, month_listed, mortgage_rate
Derived agent/seller features | 5 | agent_experience_years, prior_sale_count, seller_type

  • Regression target: sale_price (continuous USD value)
  • Classification target: sold_above_asking (1 if sale_price > asking_price, else 0)
  • Class balance: 41% positive, 59% negative for classification
  • Missing data: 12% missing in hoa_fee, 8% in lot_size, 6% in agent_experience_years
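The two targets and the missing-value handling can be sketched in a few lines. This is a minimal illustration using only the standard library, not a prescribed solution: the toy rows and their values are invented, the helper names `add_label` and `impute_median` are hypothetical, and median imputation with a missing-value flag is one assumed strategy among several.

```python
from statistics import median

# Hypothetical toy rows: column names follow the dataset description,
# but the values are invented for illustration only.
listings = [
    {"asking_price": 400_000, "sale_price": 415_000, "hoa_fee": 120.0},
    {"asking_price": 550_000, "sale_price": 540_000, "hoa_fee": None},
    {"asking_price": 300_000, "sale_price": 300_000, "hoa_fee": 80.0},
]


def add_label(row):
    """Derive the classification target: 1 if sale_price > asking_price, else 0."""
    out = dict(row)
    out["sold_above_asking"] = int(row["sale_price"] > row["asking_price"])
    return out


def impute_median(rows, column):
    """Fill missing values with the column median and flag the imputed rows."""
    observed = [r[column] for r in rows if r[column] is not None]
    fill = median(observed)
    result = []
    for r in rows:
        out = dict(r)
        out[column + "_missing"] = int(r[column] is None)
        if r[column] is None:
            out[column] = fill
        result.append(out)
    return result


prepared = impute_median([add_label(r) for r in listings], "hoa_fee")
for row in prepared:
    print(row)
```

Two points are worth noting. A listing that sells exactly at asking is labeled 0, because the definition uses a strict inequality. And because `sold_above_asking` is computed from `sale_price`, that column must be excluded from the classifier's features; likewise, in a real split the median would be computed on the training rows only.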

Success Criteria

A strong solution should:

  • Clearly explain supervised learning and the difference between classification and regression
  • Build one regression model and one classification model using the same feature set where appropriate
  • Achieve MAE < $32,000 for sale price prediction and ROC-AUC > 0.80 for above-asking prediction
  • Show how evaluation metrics differ by problem type
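To make the difference between the two metrics concrete, here is a small standard-library sketch of both. The toy predictions are invented, and the pairwise ROC-AUC below is written for readability rather than speed (it is quadratic in the number of rows); a production pipeline would more likely rely on a library implementation.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute gap between true and predicted values (same units as the target)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true, scores):
    """Probability that a random positive is scored above a random negative.

    Ties between a positive and a negative count as half a win.
    """
    positives = [s for y, s in zip(y_true, scores) if y == 1]
    negatives = [s for y, s in zip(y_true, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Regression: errors are measured in dollars.
true_prices = [400_000, 550_000, 300_000]
predicted_prices = [380_000, 580_000, 310_000]
mae = mean_absolute_error(true_prices, predicted_prices)

# Classification: only the ranking of the scores matters for ROC-AUC.
true_labels = [1, 0, 1, 0]
predicted_scores = [0.9, 0.4, 0.35, 0.2]
auc = roc_auc(true_labels, predicted_scores)

print(f"MAE: ${mae:,.0f}")
print(f"ROC-AUC: {auc:.2f}")
```

MAE is reported in USD and can be compared directly with the $32,000 target, whereas ROC-AUC is unitless and depends only on how well positives are ranked above negatives, not on any particular decision threshold.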

Constraints

  • Predictions will run in batch once per day for up to 200,000 active listings
  • The business team needs interpretable drivers, not only raw predictions
  • The solution should be simple enough for a small data team to maintain
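One simple, interpretable model family that fits these constraints is a shallow decision tree. The sketch below is an assumption about how such a baseline might look, not the expected answer: it fits a single-split "stump" on one feature, with invented toy data and hypothetical helper names. The same split search serves both tasks; only the impurity measure and the meaning of the leaf value change.

```python
def variance(values):
    """Regression impurity: spread of a continuous target within a node."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def gini(labels):
    """Classification impurity for 0/1 labels: 2 * p * (1 - p)."""
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def split(xs, ys, threshold):
    """Partition targets by whether the feature is <= threshold."""
    left = [y for x, y in zip(xs, ys) if x <= threshold]
    right = [y for x, y in zip(xs, ys) if x > threshold]
    return left, right


def best_split(xs, ys, impurity):
    """Return the threshold that minimizes the size-weighted child impurity."""
    best_threshold, best_score = None, float("inf")
    candidates = sorted(set(xs))
    for lo, hi in zip(candidates, candidates[1:]):
        threshold = (lo + hi) / 2
        left, right = split(xs, ys, threshold)
        score = (len(left) * impurity(left) + len(right) * impurity(right)) / len(ys)
        if score < best_score:
            best_threshold, best_score = threshold, score
    return best_threshold


def leaf_values(xs, ys, threshold):
    """Mean target per leaf: a price for regression, a positive rate for classification."""
    left, right = split(xs, ys, threshold)
    return sum(left) / len(left), sum(right) / len(right)


# Invented toy data, for illustration only.
square_feet = [900, 1100, 1300, 2000, 2200, 2500]
sale_price = [250_000, 270_000, 290_000, 480_000, 500_000, 530_000]
sold_above_asking = [1, 1, 0, 0, 0, 0]

reg_threshold = best_split(square_feet, sale_price, variance)
clf_threshold = best_split(square_feet, sold_above_asking, gini)

print("regression rule: square_feet <=", reg_threshold,
      "->", leaf_values(square_feet, sale_price, reg_threshold))
print("classification rule: square_feet <=", clf_threshold,
      "->", leaf_values(square_feet, sold_above_asking, clf_threshold))
```

Each fitted rule reads as a plain if/else statement, which is the kind of driver a business team can inspect. A real baseline would search over all features and allow a few levels of depth, at some cost to readability.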

Deliverables

  1. Define supervised learning and explain when to use classification versus regression
  2. Build a regression pipeline for sale_price
  3. Build a classification pipeline for sold_above_asking
  4. Compare metrics, thresholding, and business tradeoffs across both tasks
  5. Recommend which model should be deployed first and why
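Thresholding is where the two tasks diverge most in practice: a regression output is used as-is, while a classification score has to be converted into a yes/no decision. The following standard-library sketch, with invented scores and a hypothetical `precision_recall` helper, shows how precision and recall move as the threshold changes.

```python
def precision_recall(y_true, scores, threshold):
    """Precision and recall when scores >= threshold are predicted positive."""
    tp = fp = fn = 0
    for y, s in zip(y_true, scores):
        predicted = int(s >= threshold)
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == 0:
            fp += 1
        elif predicted == 0 and y == 1:
            fn += 1
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Invented toy labels and model scores, for illustration only.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.7, 0.6, 0.4, 0.3, 0.1]

for threshold in (0.35, 0.5, 0.8):
    precision, recall = precision_recall(labels, scores, threshold)
    print(f"threshold={threshold:.2f}  precision={precision:.2f}  recall={recall:.2f}")
```

A lower threshold flags more listings as likely to sell above asking (higher recall, more false alarms), while a higher threshold flags fewer but with more confidence. Which side to favor depends on the relative cost of the two error types, which the problem statement leaves to the business team.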
