Business Context
You’re a data engineer at a fintech payments processor that settles millions of dollars per day across thousands of merchants. Finance produces a daily revenue report by exporting transactions into Excel, while the data warehouse is considered the source of truth for settled revenue. Recently, Finance flagged that their Excel-based daily totals don’t match the warehouse totals for several days in January 2025, and leadership wants the top discrepancy days prioritized for investigation.
A common root cause is that Excel exports can contain duplicate rows for the same transaction within the same export run (e.g., a user re-runs an export or copy/paste duplicates lines). Another cause is missing transactions on one side.
Task
Write a SQL query to reconcile daily gross settled revenue between:
office_exports (Excel exports used by Finance)
transactions (warehouse source of truth)
Requirements
- Consider only
office_exports.tool = 'Excel'.
- Restrict to dates in January 2025 (
2025-01-01 inclusive to 2025-02-01 exclusive).
- De-duplicate Excel export lines so the same
(export_id, transaction_id) is only counted once.
- Compute daily totals:
excel_gross_usd: sum of de-duplicated Excel amounts by report_date
warehouse_gross_usd: sum of transactions.amount_usd for status = 'SETTLED' grouped by DATE(occurred_at)
- FULL OUTER JOIN the two daily totals so days present in only one source still appear.
- Compute:
delta_usd = excel_gross_usd - warehouse_gross_usd
delta_pct = delta_usd / warehouse_gross_usd (NULL if warehouse_gross_usd = 0)
- Filter to material discrepancies where
ABS(delta_usd) >= 100.
- Rank days by largest absolute discrepancy (1 = largest) and return only the top 3 discrepancy ranks.
- Order results by
discrepancy_rank, then report_date.
Table Definitions
transactions
| column | type | description |
|---|
| transaction_id | BIGINT | Primary key |
| merchant_id | BIGINT | Merchant identifier |
| occurred_at | TIMESTAMP | Time the transaction occurred (UTC) |
| amount_usd | DECIMAL(12,2) | Transaction amount in USD |
| status | VARCHAR(20) | Transaction status (e.g., SETTLED, DECLINED, REFUNDED) |
office_exports
| column | type | description |
|---|
| export_id | BIGINT | Identifier for a single export run |
| tool | VARCHAR(20) | Office tool used (e.g., Excel, PowerPoint) |
| exported_at | TIMESTAMP | When the export was generated (UTC) |
| report_date | DATE | Date Finance is reporting for |
| transaction_id | BIGINT | Transaction included in the export (FK-like to transactions.transaction_id) |
| amount_usd | DECIMAL(12,2) | Amount as exported (should match warehouse) |
Sample Data
transactions
| transaction_id | merchant_id | occurred_at | amount_usd | status |
|---|
| 1001 | 10 | 2025-01-05 10:10:00 | 300.00 | SETTLED |
| 1002 | 10 | 2025-01-05 12:05:00 | 250.00 | SETTLED |
| 1003 | 11 | 2025-01-10 09:00:00 | 500.00 | SETTLED |
| 1004 | 12 | 2025-01-10 09:30:00 | 400.00 | SETTLED |
| 1005 | 12 | 2025-01-20 15:00:00 | 600.00 | SETTLED |
office_exports
| export_id | tool | exported_at | report_date | transaction_id | amount_usd |
|---|
| 9001 | Excel | 2025-01-06 08:00:00 | 2025-01-05 | 1001 | 300.00 |
| 9001 | Excel | 2025-01-06 08:00:00 | 2025-01-05 | 1001 | 300.00 |
| 9001 | Excel | 2025-01-06 08:00:00 | 2025-01-05 | 1002 | 250.00 |
| 9002 | Excel | 2025-01-11 08:00:00 | 2025-01-10 | 1003 | 500.00 |
| 9003 | Excel | 2025-01-21 08:00:00 | 2025-01-20 | 1005 | 450.00 |
Expected Output (from sample data)
| report_date | excel_gross_usd | warehouse_gross_usd | delta_usd | delta_pct | discrepancy_rank |
|---|
| 2025-01-10 | 500.00 | 900.00 | -400.00 | -0.4444444444 | 1 |
| 2025-01-20 | 450.00 | 600.00 | -150.00 | -0.25 | 2 |