Business Context
StreamHub, a video streaming app, noticed that users who enable push notifications appear to watch more content. The product manager wants to know whether notifications cause higher engagement or whether the relationship is driven by more engaged users being more likely to opt in.
Problem Statement
Use the observational data below to distinguish correlation from causation. Quantify the naive relationship between notification opt-in and watch time, then assess whether the relationship remains after controlling for prior engagement.
Given Data
| Segment | Notification Opt-In | Users | Avg Prior-Week Watch Hours | Avg Current-Week Watch Hours |
|---|---|---:|---:|---:|
| High prior engagement | Yes | 800 | 12.0 | 13.2 |
| High prior engagement | No | 200 | 12.0 | 12.8 |
| Low prior engagement | Yes | 200 | 2.0 | 2.4 |
| Low prior engagement | No | 800 | 2.0 | 2.2 |
Assume the within-segment standard deviation of current-week watch hours is 4.0 hours for all four groups.
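As a starting point, the table can be encoded directly and the pooled means computed as user-weighted averages of the segment means. The sketch below is one way to do this in Python; the names `rows` and `pooled_mean` are illustrative and not part of the case.

```python
# Each tuple mirrors one table row:
# (segment, opted_in, users, prior_week_hours, current_week_hours)
rows = [
    ("high", True, 800, 12.0, 13.2),
    ("high", False, 200, 12.0, 12.8),
    ("low", True, 200, 2.0, 2.4),
    ("low", False, 800, 2.0, 2.2),
]


def pooled_mean(opted_in: bool) -> float:
    """User-weighted mean of current-week hours for one opt-in group."""
    group = [r for r in rows if r[1] == opted_in]
    total_users = sum(r[2] for r in group)
    return sum(r[2] * r[4] for r in group) / total_users


mean_opt_in = pooled_mean(True)
mean_no_opt_in = pooled_mean(False)
naive_diff = mean_opt_in - mean_no_opt_in

print(f"opt-in: {mean_opt_in:.2f} h, no opt-in: {mean_no_opt_in:.2f} h, "
      f"naive difference: {naive_diff:.2f} h")
# prints: opt-in: 11.04 h, no opt-in: 4.32 h, naive difference: 6.72 h
```

This gap pools two segments with very different baseline watch hours, so on its own it cannot be read as an effect of notifications.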
Requirements
- Compute the overall average current-week watch hours for notification opt-in users and non-opt-in users.
- Calculate the naive difference in means and explain why it shows correlation but not necessarily causation.
- Compute the within-segment treatment effect for high- and low-engagement users.
- Estimate the adjusted causal effect by weighting the segment-level effects (for example, by each segment's share of all users) and state the weights used.
- Run a hypothesis test for the adjusted effect using the provided standard deviation and determine whether it is statistically significant at α=0.05.
- Explain what additional evidence would be needed to make a stronger causal claim.
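The stratified steps in this list can be sketched as follows. This is a minimal illustration, not a prescribed solution: it assumes each segment is weighted by its share of all users, treats the 4.0-hour standard deviation as known so that a two-sided z-test applies, and uses illustrative names (`segments`, `adjusted_effect`, `adjusted_var`).

```python
import math

SD = 4.0  # within-group SD of current-week hours, as given in the case

# segment -> (n_opt_in, mean_opt_in, n_no_opt_in, mean_no_opt_in)
segments = {
    "high": (800, 13.2, 200, 12.8),
    "low": (200, 2.4, 800, 2.2),
}

total_users = sum(n1 + n0 for n1, _, n0, _ in segments.values())

adjusted_effect = 0.0
adjusted_var = 0.0
for name, (n1, m1, n0, m0) in segments.items():
    effect = m1 - m0                  # within-segment difference in means
    var = SD**2 * (1 / n1 + 1 / n0)   # variance of that difference
    weight = (n1 + n0) / total_users  # assumed weight: segment share of users
    adjusted_effect += weight * effect
    adjusted_var += weight**2 * var   # segments are treated as independent
    print(f"{name}: effect = {effect:.2f} h, weight = {weight:.2f}")

se = math.sqrt(adjusted_var)
z = adjusted_effect / se
p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value under N(0, 1)

print(f"adjusted effect = {adjusted_effect:.2f} h, SE = {se:.3f}, "
      f"z = {z:.2f}, p = {p_value:.3f}")
```

Under these assumptions the adjusted effect is about 0.3 hours with z of roughly 1.34 and a two-sided p-value near 0.18, which is not significant at α=0.05. A different weighting scheme, such as weighting by opt-in users only, would give a different point estimate.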
Assumptions
- Users were not randomly assigned to opt in; notification choice is self-selected.
- Prior-week watch time is a confounder: it affects both opt-in behavior and current-week watch time.
- Segment-level means are representative, and user outcomes are independent within groups.
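The self-selection and confounding assumptions can be checked against the table itself by comparing opt-in rates across segments. The following is a small sketch with illustrative names (`users`, `opt_in_rate`); the counts are taken from the Given Data table.

```python
# segment -> (opt-in users, non-opt-in users), taken from the table
users = {"high": (800, 200), "low": (200, 800)}

# Share of each segment that opted in to notifications
opt_in_rate = {seg: yes / (yes + no) for seg, (yes, no) in users.items()}

for seg, rate in opt_in_rate.items():
    print(f"{seg} prior engagement: {rate:.0%} opted in")
# prints:
# high prior engagement: 80% opted in
# low prior engagement: 20% opted in
```

Because opt-in is four times as common among users with high prior engagement, any comparison that ignores the segment mixes the effect of notifications with the effect of prior engagement.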