Context
The QuickBooks Payments team wants to test a new invoice reminder experience in QuickBooks Online: when a small business sends an invoice, the payer receives a redesigned reminder email and hosted payment page intended to increase paid invoices.
Hypothesis Seed
The team believes the new reminder flow will increase invoice payment conversion by making it easier for payers to complete payment. However, payer-merchant relationships overlap: a payer may receive invoices from multiple businesses, and a business may have many repeat payers. If the same payer is exposed to both control and treatment across different invoices, interference could bias the estimated treatment effect.
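To get a feel for how large the mixed-exposure problem could be under merchant-level randomization, the following is a rough sketch, not a definitive estimate. It assumes each merchant is assigned independently with a 50/50 split, that every multi-merchant payer deals with exactly two merchants, and that the payer receives an invoice from both during the test window. None of these assumptions come from the brief; only the 22% figure does. The function name `mixed_exposure_prob` is illustrative.

```python
def mixed_exposure_prob(k: int, p_treat: float = 0.5) -> float:
    """Probability that a payer invoiced by k independently randomized
    merchants sees both arms at least once."""
    if k < 2:
        return 0.0  # a single merchant means a single arm
    all_treatment = p_treat ** k
    all_control = (1.0 - p_treat) ** k
    return 1.0 - all_treatment - all_control


# From the brief: 22% of payers transact with more than one merchant.
MULTI_MERCHANT_SHARE = 0.22

# Assumption (not in the brief): multi-merchant payers have exactly 2 merchants
# and are invoiced by both during the test window.
share_of_payers_mixed = MULTI_MERCHANT_SHARE * mixed_exposure_prob(2)
print(f"Payers seeing both arms: {share_of_payers_mixed:.1%}")
```

Under these assumptions roughly one payer in nine would see both experiences. Payers with three or more merchants would push the figure up, while payers who receive only one invoice in the window would pull it down.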
Constraints
- Eligible traffic: 180,000 invoices/day sent by 45,000 QuickBooks merchants
- Average invoice payment rate within 14 days: 62%
- Average of 1.8 invoices per payer per month; 22% of payers transact with more than one QuickBooks merchant
- Maximum experiment window: 28 days to reach a launch decision before peak billing season
- False positives are costly because a bad reminder flow could reduce payment trust and increase support contacts; a false negative is acceptable as long as it delays launch by no more than one quarter
- Engineering can support either merchant-level randomization or payer-level hashing, but not a complex marketplace-wide graph clustering solution in this cycle
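For reference, payer-level hashing is usually implemented as a deterministic, salted hash of the payer identifier, so the same payer lands in the same arm on every invoice regardless of which merchant sent it. The sketch below is a minimal illustration under assumed names: `assign_arm`, the salt string, and the payer ID format are hypothetical and not taken from any QuickBooks system.

```python
import hashlib


def assign_arm(payer_id: str,
               salt: str = "invoice-reminder-v1",  # hypothetical experiment salt
               treat_share: float = 0.5) -> str:
    """Map a payer ID to an arm deterministically.

    The salt keeps this experiment's split independent of any other
    experiment that hashes the same payer IDs.
    """
    digest = hashlib.sha256(f"{salt}:{payer_id}".encode("utf-8")).digest()
    # Use the first 8 bytes as a uniform number in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2 ** 64
    return "treatment" if bucket < treat_share else "control"


# The same payer always gets the same arm, whichever merchant invoices them.
print(assign_arm("payer-000123"))
print(assign_arm("payer-000123"))
```

The design choice here is to key on the payer rather than the invoice: repeated invoices to one payer then form a cluster within a single arm, which is what the analysis has to account for.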
Deliverables
- Define the hypothesis, primary metric, guardrails, and a realistic MDE for this QuickBooks Payments test.
- Choose the unit of randomization and explain how your design addresses network interference, SUTVA concerns, and repeated invoices.
- Calculate the required sample size and expected duration using the provided traffic assumptions.
- Pre-register the analysis plan: statistical test, peeking policy, SRM checks, and treatment of secondary metrics / multiple comparisons.
- State a clear ship / don't-ship rule that respects both the primary metric and guardrails, including what you would do if results are statistically significant but operationally small.
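As a starting point for the sample-size and SRM deliverables, the sketch below shows one common way to set up the arithmetic: a two-proportion sample-size formula, inflated by a design effect for repeated invoices per randomization unit, plus a one-degree-of-freedom chi-square SRM check. The baseline rate, daily traffic, and invoices per payer come from the constraints. The MDE of 1 percentage point, the intraclass correlation of 0.3, the default alpha and power, and all function names are assumptions chosen for illustration only.

```python
import math
from statistics import NormalDist


def n_per_arm(p_base: float, mde_abs: float,
              alpha: float = 0.05, power: float = 0.80) -> int:
    """Invoices per arm for a two-sided two-proportion z-test,
    treating invoices as independent."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_treat = p_base + mde_abs
    variance = p_base * (1 - p_base) + p_treat * (1 - p_treat)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / mde_abs ** 2)


def design_effect(avg_cluster_size: float, icc: float) -> float:
    """Variance inflation when outcomes within a cluster are correlated."""
    return 1 + (avg_cluster_size - 1) * icc


def srm_p_value(n_control: int, n_treatment: int,
                expected_treat_share: float = 0.5) -> float:
    """Chi-square (1 df) p-value for sample ratio mismatch."""
    total = n_control + n_treatment
    exp_t = total * expected_treat_share
    exp_c = total - exp_t
    chi2 = ((n_treatment - exp_t) ** 2 / exp_t
            + (n_control - exp_c) ** 2 / exp_c)
    # Survival function of a chi-square with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2))


BASE_RATE = 0.62                  # from the brief
DAILY_INVOICES = 180_000          # from the brief
INVOICES_PER_PAYER = 1.8 * 28 / 30  # brief's monthly rate scaled to 28 days
MDE_ABS = 0.01                    # assumption: 1 percentage point
ICC = 0.3                         # assumption: within-payer correlation

n_iid = n_per_arm(BASE_RATE, MDE_ABS)
n_adj = math.ceil(n_iid * design_effect(INVOICES_PER_PAYER, ICC))
enroll_days = math.ceil(2 * n_adj / DAILY_INVOICES)
print(n_iid, n_adj, enroll_days)
```

Whatever enrollment period results, the primary metric is measured over 14 days, so the last enrolled invoice needs 14 further days to mature, and both periods together have to fit inside the 28-day window. Under merchant-level randomization the cluster size is far larger (about 4 invoices per merchant per day from the stated traffic), so the design effect and the required sample would grow accordingly.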