You ran an A/B test on a new onboarding flow, with onboarding completion within 7 days as the primary metric. The control group had 8,400 users with 2,142 completions, and the treatment group had 8,100 users with 2,145 completions. Before launch, you planned for 80% power at a 5% significance level to detect a minimum detectable effect of 2 percentage points, but the test was stopped after 10 days because traffic was lower than expected.
How would you determine whether this test truly failed or was simply inconclusive, what statistical evidence supports your conclusion, and what would you do next?
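
As a starting point for the statistical evidence, the figures above can be run through a two-proportion z-test. The sketch below is one possible approach, not a prescribed method. It computes the observed difference, a two-sided p-value, a 95% confidence interval, and an approximate power figure for the 2-point effect at the achieved sample sizes. The function name `two_proportion_summary` is hypothetical. The choices of a pooled standard error for the test, an unpooled (Wald) interval, and the observed control rate as the baseline for the power calculation are all assumptions.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_summary(x_c, n_c, x_t, n_t, alpha=0.05, mde=0.02):
    """Two-sided z-test, Wald CI, and approximate power for a difference in proportions.

    Hypothetical helper for illustration; `mde` is an absolute lift (0.02 = 2 points).
    """
    norm = NormalDist()
    p_c, p_t = x_c / n_c, x_t / n_t
    diff = p_t - p_c

    # Pooled standard error for the test of H0: p_t == p_c.
    p_pool = (x_c + x_t) / (n_c + n_t)
    se_pooled = sqrt(p_pool * (1 - p_pool) * (1 / n_c + 1 / n_t))
    z = diff / se_pooled
    p_value = 2 * (1 - norm.cdf(abs(z)))

    # Unpooled standard error for the confidence interval on the difference.
    z_crit = norm.inv_cdf(1 - alpha / 2)
    se_unpooled = sqrt(p_c * (1 - p_c) / n_c + p_t * (1 - p_t) / n_t)
    ci = (diff - z_crit * se_unpooled, diff + z_crit * se_unpooled)

    # Approximate power to detect `mde` at the achieved sample sizes.
    # Assumption: the observed control rate is the true baseline; the
    # negligible opposite-tail rejection probability is ignored.
    p_alt = p_c + mde
    se_alt = sqrt(p_c * (1 - p_c) / n_c + p_alt * (1 - p_alt) / n_t)
    power = norm.cdf(mde / se_alt - z_crit)

    return {
        "control_rate": p_c,
        "treatment_rate": p_t,
        "diff": diff,
        "z": z,
        "p_value": p_value,
        "ci_low": ci[0],
        "ci_high": ci[1],
        "power_at_mde": power,
    }


# Figures taken from the scenario above.
result = two_proportion_summary(2142, 8400, 2145, 8100)
for name, value in result.items():
    print(f"{name}: {value:.4f}")
```

When reading the output, compare the confidence interval against both zero and the 2-point target. An interval that contains both is compatible with no effect and with the planned effect, which points toward "inconclusive" rather than "failed". Note that this fixed-horizon calculation does not account for the early stop, or for users who had not yet had a full 7 days to complete onboarding when the test ended.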