You’re on the data engineering team at a fintech wallet that processes millions of card and bank payments per day through multiple processors (e.g., Stripe and Adyen). Due to upstream retries and late-arriving updates, the raw ingestion table can contain multiple rows per payment_id, and key fields like timestamps and amounts arrive as messy strings (ISO timestamps, US-formatted timestamps, currency symbols, commas, and negative signs).
The finance analytics team is building a daily revenue dashboard for January 2025. If you don’t deduplicate and standardize these raw fields correctly, the dashboard will overstate revenue and understate refunds—creating real reporting and reconciliation risk.
Write a SQL query (PostgreSQL) that cleans the raw payments data and returns daily settled, refunded, and net settled USD for January 2025.
raw_payments by keeping only the most recently ingested row per payment_id (largest ingested_at).event_time_raw into a timestamp supporting both formats:
2025-01-05T10:15:00Z01/05/2025 11:00:00amount_raw into a numeric USD amount by removing USD, $, commas, and spaces; preserve the negative sign if present.status_raw into:
SETTLED for settled, paidREFUNDED for refunded, refund, chargebackOTHER otherwiseevent_datesettled_usd (sum of settled amounts)refunded_usd (sum of absolute refunded amounts)net_settled_usd = settled_usd - refunded_usdevent_date ascending.raw_payments| column | type | description |
|---|---|---|
| payment_id | VARCHAR(64) | Payment identifier; not unique in raw due to replays/updates |
| user_id | BIGINT | Wallet user id |
| processor | VARCHAR(32) | Payment processor (e.g., stripe, adyen) |
| event_time_raw | VARCHAR(64) | Raw timestamp string in mixed formats |
| amount_raw | VARCHAR(64) | Raw amount string; may include symbols, commas, currency text |
| currency | VARCHAR(3) | Currency code; assume USD for this question |
| status_raw | VARCHAR(32) | Raw status value from upstream |
| ingested_at | TIMESTAMP | Warehouse ingestion time; used to keep most recent record per payment_id |
| payment_id | user_id | processor | event_time_raw | amount_raw | currency | status_raw | ingested_at |
|---|---|---|---|---|---|---|---|
| p_1001 | 501 | stripe | 2025-01-05T10:15:00Z | $1,200.50 | USD | settled | 2025-01-05 10:16:00 |
| p_1001 | 501 | stripe | 2025-01-05T10:15:00Z | $1,200.50 | USD | SETTLED | 2025-01-05 10:20:00 |
| p_1002 | 502 | adyen | 01/05/2025 11:00:00 | USD 75.00 | USD | paid | 2025-01-05 11:01:00 |
| p_1003 | 503 | stripe | 2025-01-06T09:00:00Z | -$20.00 | USD | refunded | 2025-01-06 09:05:00 |
| p_1004 | 504 | adyen | 01/06/2025 12:30:00 | 100.00 | USD | chargeback | 2025-01-06 12:35:00 |
| event_date | settled_usd | refunded_usd | net_settled_usd |
|---|---|---|---|
| 2025-01-05 | 1275.50 | 0.00 | 1275.50 |
| 2025-01-06 | 0.00 | 120.00 | -120.00 |