Business Context
StreamCart, a subscription video platform, tested a redesigned signup onboarding flow intended to increase paid-start conversion. The product manager called the experiment a failure because the primary metric was not significant after 10 days, despite an apparent lift in the dashboard.
Problem Statement
Analyze whether the A/B test truly failed statistically, and explain what should be done next.
Given Data
| Group | Users Exposed | Paid Starts | Conversion Rate |
|---|---|---|---|
| Control (old onboarding) | 52,400 | 6,026 | 11.50% |
| Treatment (new onboarding) | 51,900 | 6,179 | 11.91% |
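As a quick sanity check on the table above, the absolute and relative lifts follow directly from the raw counts (a minimal sketch; variable names are illustrative):

```python
# Observed lift from the StreamCart signup experiment table.
n_control, conv_control = 52_400, 6_026
n_treatment, conv_treatment = 51_900, 6_179

p_control = conv_control / n_control        # 0.1150 (11.50%)
p_treatment = conv_treatment / n_treatment  # ~0.1191 (11.91%)

abs_lift = p_treatment - p_control          # absolute lift, ~0.41 percentage points
rel_lift = abs_lift / p_control             # relative lift, ~3.5%

print(f"absolute lift: {abs_lift * 100:.2f} pp")
print(f"relative lift: {rel_lift * 100:.2f}%")
```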
Additional context:
| Metric | Value |
|---|---|
| Significance level | 0.05 |
| Test type | Two-sided |
| Planned minimum detectable effect | 0.80 percentage points |
| Historical baseline conversion | 11.5% |
| Planned power | 80% |
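The planning parameters above imply a required sample size per arm, which is worth reconstructing before judging the result. This is a sketch using the standard two-proportion sample-size formula; the team's actual planning tool may differ slightly in its approximation:

```python
# Sample size implied by the planning parameters:
# baseline 11.5%, MDE 0.80 pp, alpha 0.05 (two-sided), power 80%.
from math import ceil, sqrt
from statistics import NormalDist

p1 = 0.115            # historical baseline conversion
mde = 0.008           # planned minimum detectable effect (0.80 pp)
p2 = p1 + mde
alpha, power = 0.05, 0.80

z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # ~1.96
z_beta = NormalDist().inv_cdf(power)            # ~0.84

p_bar = (p1 + p2) / 2
n_per_arm = ceil(
    (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
     + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    / mde ** 2
)
print(n_per_arm)  # ~25,700 per arm; both observed arms (~52k) exceed this
```

Under these assumptions the experiment collected more than enough users per arm to detect the planned 0.80 pp effect, so a lack of significance would not be explained by underpowering against the MDE.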
Requirements
- State the null and alternative hypotheses for the conversion-rate comparison.
- Compute the observed absolute lift and relative lift.
- Run a two-proportion z-test using the pooled standard error.
- Calculate the two-sided p-value and a 95% confidence interval for the treatment-control difference.
- Decide whether the result is statistically significant at α=0.05.
- Explain why the test may be considered a "failed" experiment even if treatment is directionally positive.
- Recommend what the team should do afterward: ship, stop, rerun, or redesign the experiment.
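The analytical steps above can be carried out with only the Python standard library. This is a worked sketch, not the team's actual analysis pipeline; with these inputs the z statistic lands just above the 1.96 cutoff:

```python
# Pooled two-proportion z-test, two-sided p-value, and 95% CI
# for the treatment-minus-control conversion difference.
from math import sqrt
from statistics import NormalDist

n_c, x_c = 52_400, 6_026    # control: exposed users, paid starts
n_t, x_t = 51_900, 6_179    # treatment: exposed users, paid starts

p_c, p_t = x_c / n_c, x_t / n_t
diff = p_t - p_c

# Standard error pooled under H0: p_t == p_c
p_pool = (x_c + x_t) / (n_c + n_t)
se_pooled = sqrt(p_pool * (1 - p_pool) * (1 / n_c + 1 / n_t))
z = diff / se_pooled
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

# The 95% CI for the difference conventionally uses the unpooled SE
se_unpooled = sqrt(p_c * (1 - p_c) / n_c + p_t * (1 - p_t) / n_t)
z_crit = NormalDist().inv_cdf(0.975)
ci = (diff - z_crit * se_unpooled, diff + z_crit * se_unpooled)

print(f"z = {z:.2f}, p = {p_value:.3f}, "
      f"95% CI = ({ci[0]:.4f}, {ci[1]:.4f})")
```

Note that the pooled SE is the correct choice for the test statistic (it assumes the null of equal rates), while the unpooled SE is the usual choice for the confidence interval around the observed difference.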
Assumptions
- Users were randomly assigned at the user level.
- Each exposed user is counted once.
- No sample-ratio mismatch or instrumentation bug is present.
- The normal approximation is appropriate because both groups have large sample sizes and many conversions/non-conversions.