Task
You are given PSCU transaction data with inconsistent merchant names, mixed status values, missing member references, and duplicate transaction rows. Write a PostgreSQL query that cleans the data and returns, for each merchant category, the number of valid posted transactions, the number of unique active members involved, and the total posted amount.
Treat merchant names case-insensitively after trimming spaces, map status values such as posted, POSTED, and settled to a clean POSTED status, exclude reversed and declined activity, keep only transactions tied to active members, and deduplicate rows by keeping the latest record for the same member, merchant, amount, and transaction date.
Schema
pscu_members
| column | type | description |
|---|
| member_id | INT | Member identifier |
| member_name | VARCHAR(100) | Member name |
| member_status | VARCHAR(20) | Member status |
| | |
pscu_merchants
| column | type | description |
|---|
| merchant_id | INT | Merchant identifier |
| merchant_name | VARCHAR(100) | Merchant name |
| merchant_category | VARCHAR(50) | Merchant category |
| | |
pscu_transactions
| column | type | description |
|---|
| txn_id | INT | Transaction identifier |
| member_id | INT | Referenced member |
| merchant_name_raw | VARCHAR(100) | Raw merchant text from source |
| txn_date | DATE | Transaction date |
| amount | DECIMAL(10,2) | Transaction amount |
| txn_status | VARCHAR(20) | Raw transaction status |
| load_ts | TIMESTAMP | Record load timestamp |
Sample data
Representative rows include " amazon", "AMAZON", "Netflix ", "posted", "settled", "DECLINED", missing member_id, and duplicate posted rows loaded at different timestamps.
Expected output
| merchant_category | valid_posted_txn_count | active_member_count | total_posted_amount |
|---|
| Digital Services | 1 | 1 | 15.99 |
| Grocery | 2 | 2 | 95.10 |
| Retail | 2 | 2 | 145.50 |