Business Context
You’re a senior data scientist at StreamWave, a subscription video streaming service with 18M monthly active users across the US and Canada. Marketing spend is ~$45M/month across TV, Paid Search, Paid Social, YouTube, Display, and Affiliate. The CFO is pushing to reallocate budget next quarter and wants a defensible estimate of incremental revenue and marginal ROI by channel.
Historically, the team has relied on last-click attribution, but it over-credits Paid Search and under-credits TV/YouTube. You are asked to build a Media Mix Model (MMM) from scratch using observational time-series data.
Problem Statement
Design an MMM approach that can (a) estimate the incremental contribution of each channel while accounting for adstock/carryover, diminishing returns, and seasonality, and (b) produce uncertainty estimates that are credible enough for finance.
To make the discussion concrete, assume you have 104 weeks of weekly data. You will:
- Specify the minimum data you need (what tables/fields, grain, and key joins).
- Propose a baseline MMM specification (equation + transformations).
- Using the provided simplified regression output, test whether TV has statistically significant incremental impact at α=0.05.
- Compute a 95% confidence interval for TV’s incremental revenue per $1K spend (holding other variables fixed).
- Translate the result into a budget recommendation and list the top modeling risks.
Given Data (Simplified)
You fit an OLS MMM on weekly revenue (in $M) with controls and transformed media:
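$$\text{Revenue}_t = \beta_0 + \beta_{TV} \cdot \text{TV}_t^{\text{adstock}} + \beta_{Search} \cdot \text{Search}_t^{\log} + \beta_{Social} \cdot \text{Social}_t^{\log} + \gamma \cdot \text{PriceIndex}_t + \delta \cdot \text{Promo}_t + \text{Seasonality}_t + \epsilon_t$$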
Where:
- $\text{TV}_t^{\text{adstock}}$ is weekly TV spend in \$K after adstock (carryover).
- $\text{Search}_t^{\log}$ and $\text{Social}_t^{\log}$ are $\log(1 + \text{spend})$ transforms (diminishing returns).
- Seasonality is captured via week-of-year Fourier terms.
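The two media transforms named above can be sketched in a few lines. This is a minimal illustration and not the specification used to produce the regression output: the source does not say which adstock form was fitted, so the geometric form, the `decay` value, the function names, and the spend series are all assumptions.

```python
import math


def geometric_adstock(spend, decay):
    """Geometric carryover: a_t = x_t + decay * a_{t-1}, starting from 0.

    The geometric form and the decay parameter are assumptions for
    illustration; the source only says TV spend is "adstocked".
    """
    if not 0.0 <= decay < 1.0:
        raise ValueError("decay must be in [0, 1)")
    adstocked, carry = [], 0.0
    for x in spend:
        carry = x + decay * carry
        adstocked.append(carry)
    return adstocked


def log_saturation(spend):
    """log(1 + spend): concave, so each extra dollar adds less than the last."""
    return [math.log1p(x) for x in spend]


# Hypothetical weekly TV spend in $K: one burst followed by dark weeks.
tv_spend = [100.0, 0.0, 0.0, 0.0]
tv_adstock = geometric_adstock(tv_spend, decay=0.5)
print(tv_adstock)  # [100.0, 50.0, 25.0, 12.5]
```

With a decay of 0.5, half of each week's effective pressure carries into the next week, so a single burst keeps contributing to the regressor during the dark weeks that follow. In practice the decay rate would be estimated or chosen by grid search rather than fixed by hand.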
Regression output for TV term only (others omitted here):
| Term | Estimate | Std. Error | Notes |
|---|---|---|---|
| $\beta_{TV}$ | 0.00162 | 0.00071 | Revenue in \$M per \$K of adstocked TV spend |
Assume:
- n=104 weeks
- Total parameters in the model p=18 (including intercept + controls + seasonality)
- Residuals are approximately normal; you will use a t-test with df=n−p
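Under these assumptions, the test statistic and interval follow directly from the estimate and its standard error. The sketch below uses only the standard library, so the Student's t CDF is obtained by numerical integration and the critical value by bisection. The helper names are illustrative, and a statistics library such as SciPy would normally replace them.

```python
import math


def t_pdf(x, df):
    """Density of Student's t with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x, df, steps=2000):
    """CDF via composite Simpson's rule on [0, |x|], using symmetry about 0."""
    a = abs(x)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


def t_quantile(prob, df):
    """Upper quantile (prob in (0.5, 1)) found by bisection on the CDF."""
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < prob:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


beta_tv, se_tv = 0.00162, 0.00071  # from the regression output above
n, p = 104, 18
df = n - p  # 86 residual degrees of freedom

t_stat = beta_tv / se_tv
p_value = 2 * (1 - t_cdf(abs(t_stat), df))  # two-sided
t_crit = t_quantile(0.975, df)
ci_low = beta_tv - t_crit * se_tv
ci_high = beta_tv + t_crit * se_tv

# The coefficient is in $M per $K, so multiply by 1e6 for dollars per $1K.
print(f"t = {t_stat:.3f}, df = {df}, two-sided p = {p_value:.4f}")
print(f"95% CI: ${ci_low * 1e6:,.0f} to ${ci_high * 1e6:,.0f} per $1K adstocked TV spend")
```

The t-statistic is about 2.28 on 86 degrees of freedom, which exceeds the two-sided critical value of about 1.99 and gives a p-value close to 0.025. The null hypothesis is therefore rejected at α=0.05. The interval is wide, though: it runs from roughly 0.2 to 3.0 thousand dollars of revenue per thousand dollars of adstocked TV spend, so it includes values below break-even.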
Requirements
- Data requirements: list the datasets/fields you need (media, outcomes, controls, and metadata), including how you would handle:
  - channel definitions (impressions vs spend)
  - geo granularity (national vs DMA) and why it matters
  - offline conversions and delayed revenue recognition
- Model design: specify adstock and saturation choices and justify them.
- Hypothesis test: $H_0: \beta_{TV} = 0$ vs $H_1: \beta_{TV} \neq 0$. Compute the t-statistic and p-value.
- Confidence interval: compute a 95% CI for $\beta_{TV}$ and interpret it as incremental revenue per \$1K of TV spend.
- Business interpretation: what would you tell the CFO, and what are the top caveats (confounding, multicollinearity, autocorrelation, measurement error, and policy changes like pricing/promos)?
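For the data-requirements item, one way to make "grain and key joins" concrete is to roll daily channel spend up to a weekly key and join it to weekly revenue. The sketch below is a toy illustration: the field names, the Monday-start week convention, and all values are hypothetical and do not describe StreamWave's actual schema.

```python
from collections import defaultdict
from datetime import date, timedelta


def week_start(d):
    """Monday of the week containing d; used as the join key (an assumption)."""
    return d - timedelta(days=d.weekday())


def weekly_spend(daily_rows):
    """Sum daily (date, channel, spend_k) rows to {(week, channel): spend_k}."""
    totals = defaultdict(float)
    for day, channel, spend_k in daily_rows:
        totals[(week_start(day), channel)] += spend_k
    return dict(totals)


def build_model_table(daily_rows, weekly_revenue_m, channels):
    """One row per week: revenue plus one spend column per channel.

    Weeks with no spend rows for a channel get 0.0, so dark weeks stay
    in the series instead of dropping out of the join.
    """
    spend = weekly_spend(daily_rows)
    table = []
    for week in sorted(weekly_revenue_m):
        row = {"week": week, "revenue_m": weekly_revenue_m[week]}
        for ch in channels:
            row[f"{ch}_spend_k"] = spend.get((week, ch), 0.0)
        table.append(row)
    return table


# Hypothetical rows, purely illustrative.
daily = [
    (date(2024, 1, 1), "tv", 40.0),      # a Monday
    (date(2024, 1, 3), "tv", 60.0),
    (date(2024, 1, 3), "search", 25.0),
    (date(2024, 1, 8), "search", 30.0),  # the following Monday
]
revenue = {date(2024, 1, 1): 52.0, date(2024, 1, 8): 49.5}

for row in build_model_table(daily, revenue, ["tv", "search"]):
    print(row)
```

Driving the loop from the revenue weeks and defaulting missing spend to zero keeps the time series contiguous, which matters because adstock is computed over consecutive weeks.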
Assumptions and Constraints
- Weekly aggregation; spend is measured accurately, but TV GRPs are not available (spend only).
- No randomized geo experiments exist for this period.
- Potential autocorrelation in $\epsilon_t$; you may comment on Newey–West/HAC as a robustness check.
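Before relying on HAC standard errors, a quick residual diagnostic can indicate whether serial correlation is material. The sketch below computes the Durbin–Watson statistic, which is a diagnostic and not a Newey–West estimator. The residual values are made up for illustration, since the source does not provide the fitted residuals.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: sum of squared first differences of the
    residuals divided by the sum of squared residuals.

    Values near 2 suggest little first-order autocorrelation, values
    toward 0 suggest positive and values toward 4 negative autocorrelation.
    """
    if len(residuals) < 2:
        raise ValueError("need at least two residuals")
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(e * e for e in residuals)
    return num / den


# Hypothetical weekly residuals (in $M) that drift slowly, which is the
# pattern positive autocorrelation tends to produce.
resid = [0.8, 0.9, 0.7, 0.4, -0.1, -0.5, -0.8, -0.6, -0.2, 0.3]
print(f"Durbin-Watson = {durbin_watson(resid):.2f}")
```

If the statistic is well below 2, the OLS standard error for the TV coefficient is likely understated, and the t-test should be rerun with HAC standard errors. In statsmodels, for example, this can be requested by passing `cov_type="HAC"` with a `maxlags` setting when fitting OLS.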