RidePulse is a large ride-hailing marketplace (~8M weekly active riders across the US). The pricing team is evaluating a new “smart surge” algorithm and wants a quick model of price elasticity: how much ride requests drop when the effective price increases. A data scientist fits an OLS linear regression on a random sample of sessions to estimate the relationship between demand and price.
The model is used to make a high-stakes decision: whether to roll out smart surge nationwide (expected to move weekly revenue by 1–3%). Leadership asks you to sanity-check whether the linear regression assumptions are plausible and whether the inference (p-values / confidence intervals) can be trusted.
You fit the following OLS model at the city-hour level (city $i$, hour $t$):

$$
\log(\text{rides}_{it}) = \beta_0 + \beta_1 \log(\text{price}_{it}) + \beta_2\,\text{rain}_{it} + \beta_3\,\text{event}_{it} + \epsilon_{it},
$$

where rain and event are binary indicators.
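As a concrete sketch, a model of this form can be fit with `statsmodels` on simulated data. The variable names and the data-generating process below are illustrative assumptions, not RidePulse's actual schema or numbers:

```python
# Hypothetical sketch: fit the log-log demand model on simulated city-hour data.
# Variable names and the data-generating process are illustrative assumptions,
# not RidePulse's actual schema or numbers.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 2400  # matches the n in the table below, purely for realism

df = pd.DataFrame({
    "log_price": rng.normal(0.0, 0.3, n),
    "rain": rng.integers(0, 2, n),
    "event": rng.integers(0, 2, n),
})
# Assumed true elasticity of -1.2, chosen only for illustration
df["log_rides"] = (
    5.0
    - 1.2 * df["log_price"]
    + 0.1 * df["rain"]
    + 0.3 * df["event"]
    + rng.normal(0.0, 0.5, n)
)

model = smf.ols("log_rides ~ log_price + rain + event", data=df).fit()
print(model.summary().tables[1])  # coefficient table incl. the elasticity estimate
```

In a log-log specification, the coefficient on `log_price` is read directly as the price elasticity of demand.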
You’re given the regression output and a few diagnostic summaries:
| Item | Value |
|---|---|
| Observations (n) | 2,400 |
| Parameters incl. intercept (p) | 4 |
| OLS estimate $\hat\beta_1$ (log-price coefficient) | -1.18 |
| Conventional (non-robust) SE for $\hat\beta_1$ | 0.21 |
| Heteroskedasticity-robust (HC1) SE for $\hat\beta_1$ | 0.34 |
| Correlation between $\lvert\hat\epsilon\rvert$ and fitted values | (value not shown) |
| Durbin–Watson statistic (residuals ordered by time within city) | 1.05 |
| Mean of residuals | ~0.000 |
| Mean residual by city (min, median, max) | (-0.22, 0.01, 0.19) |
Assume the coefficient estimate is unchanged; only the standard error differs depending on assumptions.
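Plugging the table's numbers in directly shows how the choice of standard error moves the t-statistic and 95% confidence interval for the elasticity:

```python
# Using only the numbers given in the table: t-statistic and 95% CI for the
# elasticity under the conventional vs. the HC1-robust standard error.
beta_hat = -1.18
for label, se in [("conventional", 0.21), ("HC1 robust", 0.34)]:
    t = beta_hat / se
    lo, hi = beta_hat - 1.96 * se, beta_hat + 1.96 * se
    print(f"{label:>12}: t = {t:.2f}, 95% CI = ({lo:.2f}, {hi:.2f})")
```

The elasticity remains statistically significant either way, but the robust interval is about 60% wider (0.34/0.21 ≈ 1.62), which matters when the rollout decision hinges on how far the true elasticity could plausibly be from -1.18.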
Answer the following, focusing on assumptions of linear regression and how violations affect business conclusions.