Business Context
StreamCart produces a daily revenue-per-order report used by finance and operations. A recent pipeline run showed a sharp jump in average order value, and the team suspects a few extreme transactions are distorting the metric.
Problem Statement
You need to evaluate how outliers should be handled in the reporting pipeline and quantify the impact of different summary statistics. Use the sample of daily order values below to identify outliers with the IQR rule, compare the mean before and after excluding outliers, and compute a 95% confidence interval for the cleaned mean.
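For reference, the standard definitions behind these steps can be written as follows. The symbols are introduced here for illustration only: $x_1, \dots, x_n$ are the observations kept after outlier removal, and $t_{0.975,\,n-1}$ is the 97.5th percentile of the t-distribution with $n-1$ degrees of freedom.

$$
\begin{aligned}
\mathrm{IQR} &= Q_3 - Q_1 \\
\text{outlier if } x &< Q_1 - 1.5\,\mathrm{IQR} \quad \text{or} \quad x > Q_3 + 1.5\,\mathrm{IQR} \\
\bar{x} &= \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad
s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2} \\
\text{95\% CI} &= \bar{x} \pm t_{0.975,\,n-1}\,\frac{s}{\sqrt{n}}
\end{aligned}
$$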
Given Data
A sample of 15 order values (USD) from one day:
| Order Value (USD) |
|---|
| 42 |
| 45 |
| 47 |
| 48 |
| 49 |
| 50 |
| 51 |
| 52 |
| 53 |
| 54 |
| 55 |
| 56 |
| 58 |
| 60 |
| 220 |
Assume the reporting team currently publishes the arithmetic mean as the headline KPI.
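To see how much a single extreme order can move the headline KPI, the following minimal Python sketch compares the arithmetic mean with the median on the sample above. The variable name `order_values` is illustrative, and the median is shown only as a comparison point, not as something the reporting team currently publishes.

```python
from statistics import mean, median

# Sample of 15 order values (USD) from the table above.
order_values = [42, 45, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 58, 60, 220]

raw_mean = mean(order_values)      # sensitive to the single 220 USD order
raw_median = median(order_values)  # middle (8th) value of the sorted sample

print(f"mean   = {raw_mean:.2f}")
print(f"median = {raw_median:.2f}")
```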
Requirements
- Sort the data and compute Q1, Q3, and IQR, stating which quartile convention you use
- Use the 1.5 × IQR rule to identify outliers
- Calculate the mean with all observations included
- Calculate the mean after excluding detected outliers
- Compute the sample standard deviation of the cleaned data
- Construct a 95% confidence interval for the cleaned mean using the t-distribution
- Recommend how the pipeline should handle outliers for reporting and monitoring
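The computational steps above can be sketched in Python using only the standard library. This is one possible walk-through under stated assumptions, not a definitive implementation: quartiles use the default "exclusive" convention of `statistics.quantiles`, and the t critical value for 13 degrees of freedom (about 2.160) is hard-coded from a standard t-table rather than computed. All variable names are illustrative.

```python
from math import sqrt
from statistics import mean, quantiles, stdev

order_values = [42, 45, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 58, 60, 220]

# Sort and compute quartiles.
# Assumption: the default "exclusive" (n + 1) convention of
# statistics.quantiles; other conventions can shift Q1 and Q3 slightly.
data = sorted(order_values)
q1, _q2, q3 = quantiles(data, n=4)
iqr = q3 - q1

# Apply the 1.5 x IQR rule.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_fence or x > upper_fence]
cleaned = [x for x in data if lower_fence <= x <= upper_fence]

# Means before and after exclusion, and the sample SD of the cleaned data.
mean_all = mean(data)
mean_cleaned = mean(cleaned)
sd_cleaned = stdev(cleaned)  # sample standard deviation (divides by n - 1)

# 95% t-based confidence interval for the cleaned mean.
# Assumption: critical value taken from a standard t-table for df = 13,
# so it is only valid when exactly 14 observations remain.
T_CRIT_DF13 = 2.160
n = len(cleaned)
if n - 1 != 13:
    raise ValueError("hard-coded t critical value assumes df = 13")
margin = T_CRIT_DF13 * sd_cleaned / sqrt(n)
ci_low, ci_high = mean_cleaned - margin, mean_cleaned + margin

print(f"Q1={q1}, Q3={q3}, IQR={iqr}")
print(f"fences=({lower_fence}, {upper_fence}), outliers={outliers}")
print(f"mean (all)={mean_all:.2f}, mean (cleaned)={mean_cleaned:.2f}")
print(f"sample SD (cleaned)={sd_cleaned:.2f}")
print(f"95% CI=({ci_low:.2f}, {ci_high:.2f})")
```

Under this quartile convention the sketch flags 220 as the only outlier. If SciPy is available, `scipy.stats.t.ppf(0.975, df)` could replace the hard-coded critical value.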
Assumptions
- This sample is representative of one reporting day
- The cleaned observations are approximately independent
- After removing clear outliers, the remaining values are close enough to normally distributed that a t-based confidence interval for the mean is reasonable
- The goal is reporting robustness, not fraud detection or root-cause diagnosis
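Since the goal is reporting robustness rather than diagnosis, one option is for the pipeline to publish a robust headline figure while still surfacing flagged orders as monitoring fields instead of silently dropping them. The sketch below illustrates that idea. The function `build_daily_report` and its field names are hypothetical, and the quartile convention is the same assumed default of `statistics.quantiles`.

```python
from statistics import mean, median, quantiles


def build_daily_report(order_values):
    """Hypothetical helper: headline and monitoring fields for one day.

    Assumes at least two order values and a non-zero revenue total.
    """
    data = sorted(order_values)
    q1, _, q3 = quantiles(data, n=4)  # assumption: default "exclusive" quartiles
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    flagged = [x for x in data if x < lower or x > upper]
    kept = [x for x in data if lower <= x <= upper]
    return {
        # Candidate headline figures, shown side by side for comparison.
        "mean_all": mean(data),
        "mean_cleaned": mean(kept),
        "median": median(data),
        # Monitoring fields: keep outliers visible rather than discarding them.
        "outlier_count": len(flagged),
        "outlier_revenue_share": sum(flagged) / sum(data),
        "outliers": flagged,
    }


report = build_daily_report(
    [42, 45, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 58, 60, 220]
)
for field, value in report.items():
    print(f"{field}: {value}")
```

Keeping the raw mean next to the cleaned figure lets finance see both numbers, while the outlier count and revenue share give operations a signal to alert on.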