Task
You are given expense data exported from CU Denver Finance reporting, where department names, categories, and amounts contain inconsistent entries. Write a PostgreSQL query that cleans the data and returns, for each department, the number of valid expense rows and the total valid amount.
Treat department names case-insensitively and map them through the reference table. Standardize categories so values like travel, TRAVEL, and Travel are treated the same. Parse amounts stored as text by removing $, commas, and spaces; rows with blank, NULL, non-numeric, or negative amounts should be excluded from the valid totals. Also exclude rows whose department cannot be matched to the reference table. Return only departments with at least 2 valid cleaned rows, ordered by total valid amount descending and then department name.
Schema
| Table | Column | Type | Description |
|---|
| expense_staging | expense_id | INT | Raw expense row ID |
| expense_staging | dept_name_raw | VARCHAR(100) | Department name from source file |
| expense_staging | category_raw | VARCHAR(50) | Raw expense category |
| expense_staging | amount_raw | VARCHAR(30) | Raw amount stored as text |
| expense_staging | expense_date | DATE | Expense date |
| dept_reference | dept_code | VARCHAR(10) | Department code |
| dept_reference | dept_name_standard | VARCHAR(100) | Standard department name |
Sample data
| expense_id | dept_name_raw | category_raw | amount_raw | expense_date |
|---|
| 1 | finance | travel | $1,200.50 | 2024-01-10 |
| 2 | FINANCE | Travel | 300 | 2024-01-12 |
| 4 | Budget Office | supplies | -50 | 2024-01-18 |
| 8 | Student Affairs | travel | abc | 2024-02-01 |
Expected output
| department_name | valid_expense_rows | total_valid_amount |
|---|
| Finance | 3 | 1700.50 |
| Student Affairs | 2 | 700.00 |