StreamCart is building a propensity model to predict which users will purchase a premium subscription within 7 days of seeing an in-app upsell. The candidate model has strong offline metrics, but product leadership is not convinced that those metrics translate into real business value or justify an online launch.
| Metric | Candidate Model | Current Heuristic Baseline | Change |
|---|---|---|---|
| AUC-ROC | 0.81 | 0.69 | +0.12 |
| Log Loss | 0.41 | 0.53 | -0.12 |
| Precision @ Top 10% | 0.24 | 0.15 | +0.09 |
| Recall @ Top 10% | 0.38 | 0.27 | +0.11 |
| Lift @ Top 10% | 3.0x | 1.9x | +1.1x |
| Calibration Error | 0.03 | 0.11 | -0.08 |
| Avg predicted conversion (top decile) | 22% | 14% | +8 pts |
| Actual conversion (top decile, holdout) | 24% | 15% | +9 pts |
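
Before relying on these figures, it can help to check that they agree with one another. The sketch below uses only values from the table plus two standard identities: lift equals precision divided by the overall conversion rate, and recall at the top k fraction equals k times lift when both are computed on the same population. The function names are illustrative, not part of any StreamCart codebase.

```python
# Values copied from the offline metrics table.
TOP_FRACTION = 0.10

metrics = {
    "candidate": {"precision": 0.24, "lift": 3.0, "recall": 0.38,
                  "predicted": 0.22, "actual": 0.24},
    "baseline": {"precision": 0.15, "lift": 1.9, "recall": 0.27,
                 "predicted": 0.14, "actual": 0.15},
}


def implied_base_rate(precision, lift):
    """Overall conversion rate implied by lift = precision / base rate."""
    return precision / lift


def implied_recall(lift, top_fraction=TOP_FRACTION):
    """Recall at the top k fraction implied by recall = k * lift."""
    return top_fraction * lift


for name, m in metrics.items():
    base = implied_base_rate(m["precision"], m["lift"])
    recall = implied_recall(m["lift"])
    gap = m["actual"] - m["predicted"]  # positive means under-prediction
    print(
        f"{name}: implied base rate {base:.3f}, "
        f"implied recall {recall:.2f} (reported {m['recall']:.2f}), "
        f"top-decile calibration gap {gap:+.2f}"
    )
```

Both rows imply an overall conversion rate of roughly 8%, which is reassuring. The reported recall values are higher than the identity implies, so it may be worth confirming that precision, recall, and lift were all computed on the same holdout population and cutoff.
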

The candidate model beats the heuristic on every offline metric, but the business question is whether targeting the top-scored users generates enough incremental subscription revenue to offset notification costs and user fatigue. You need to design a validation approach that connects offline metrics to expected business impact before running a full A/B test.
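
One way to frame that trade-off is a back-of-envelope expected-value calculation per targeted user. The sketch below is a minimal illustration, not a definitive method. The only inputs taken from the table are the top-decile holdout conversion rates (24% and 15%). The subscription value, the per-notification cost (intended to include a proxy for user fatigue), and the share of conversions that are truly incremental are hypothetical placeholders that would need to come from finance data and a holdout experiment.

```python
# Hypothetical inputs: NOT from the source, replace with real estimates.
SUBSCRIPTION_VALUE = 30.0      # assumed revenue per incremental subscription
COST_PER_NOTIFICATION = 0.50   # assumed send cost plus a fatigue proxy

# Top-decile holdout conversion rates from the offline metrics table.
CONVERSION = {"candidate": 0.24, "baseline": 0.15}


def net_value_per_user(conversion_rate, incremental_share,
                       subscription_value, cost_per_notification):
    """Expected net value of notifying one targeted user.

    incremental_share is the fraction of observed conversions caused by
    the upsell rather than purchases that would have happened anyway.
    """
    incremental_conversions = conversion_rate * incremental_share
    return incremental_conversions * subscription_value - cost_per_notification


def break_even_incremental_share(conversion_rate, subscription_value,
                                 cost_per_notification):
    """Smallest incremental share at which targeting pays for itself."""
    return cost_per_notification / (conversion_rate * subscription_value)


for name, rate in CONVERSION.items():
    threshold = break_even_incremental_share(
        rate, SUBSCRIPTION_VALUE, COST_PER_NOTIFICATION)
    print(f"{name}: break-even incremental share = {threshold:.3f}")
    # Sweep the unknown incremental share to see how sensitive the answer is.
    for share in (0.10, 0.25, 0.50):
        value = net_value_per_user(
            rate, share, SUBSCRIPTION_VALUE, COST_PER_NOTIFICATION)
        print(f"  incremental share {share:.2f}: net value per user {value:+.2f}")
```

The incremental share cannot be read off the offline metrics, since a propensity model may simply rank users who would have subscribed anyway. Sweeping it shows how much true uplift would be needed for the launch to pay off, which in turn informs how large a holdout or A/B test has to be.
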