Business Context
A US state benefits agency runs a Curam-based eligibility and payments platform serving ~8M residents. The production system processes hundreds of thousands of case actions per day across programs like SNAP and Medicaid. After a recent release, the on-call data engineer is asked to quickly triage whether there is an error spike tied to a specific release and which subsystem and program type are most impacted.
The platform emits application error logs that include the deployment active at the time of the error and the case impacted. Leadership wants a daily rollup that can be pasted into an incident channel and used to decide whether to roll back.
Task
Write a SQL query that produces a daily triage report for production ERROR logs during the incident window 2026-02-01 through 2026-02-03 (inclusive).
Requirements
- Filter to
app_error_logs.environment = 'prod' and severity = 'ERROR'.
- Group results by event_date (date portion of
occurred_at) and release_version.
- For each (event_date, release_version), return:
total_errors: total number of error rows
affected_cases: number of distinct case_id impacted
- Also return:
top_component: the component with the most error rows that day for that release (break ties by component name ascending)
top_case_type: the case type with the most distinct affected cases that day for that release (break ties by case_type ascending)
- Order output by
event_date, then release_version.
Tables
cases
| column | type | description |
|---|
| case_id | BIGINT | Unique case identifier (PK) |
| case_type | VARCHAR(50) | Program/case category (e.g., SNAP, Medicaid) |
| region_code | VARCHAR(10) | Operating region/office code |
| created_at | TIMESTAMP | Case creation timestamp |
deployments
| column | type | description |
|---|
| deployment_id | BIGINT | Unique deployment identifier (PK) |
| release_version | VARCHAR(20) | Release tag deployed to an environment |
| environment | VARCHAR(10) | Environment name (prod, stage) |
| deployed_at | TIMESTAMP | Deployment timestamp |
app_error_logs
| column | type | description |
|---|
| error_id | BIGINT | Unique error log row (PK) |
| occurred_at | TIMESTAMP | When the error occurred |
| environment | VARCHAR(10) | Environment name (prod, stage) |
| deployment_id | BIGINT | Deployment active when error occurred |
| case_id | BIGINT | Case impacted by the error |
| component | VARCHAR(80) | Subsystem/module (Eligibility, Payments, etc.) |
| severity | VARCHAR(10) | Log severity (ERROR/WARN/INFO) |
| error_code | VARCHAR(30) | Stable code used for grouping similar errors |
Sample Data
cases
| case_id | case_type | region_code | created_at |
|---|
| 1001 | SNAP | NE-01 | 2026-01-15 09:10:00 |
| 1002 | Medicaid | NE-01 | 2026-01-20 11:05:00 |
| 1003 | TANF | SW-02 | 2026-02-01 08:00:00 |
| 1004 | SNAP | SW-02 | 2026-02-02 10:30:00 |
deployments
| deployment_id | release_version | environment | deployed_at |
|---|
| 501 | 7.0.1 | prod | 2026-01-28 02:00:00 |
| 502 | 7.0.2 | prod | 2026-02-02 01:00:00 |
| 601 | 7.0.2 | stage | 2026-02-01 01:00:00 |
app_error_logs
| error_id | occurred_at | environment | deployment_id | case_id | component | severity | error_code |
|---|
| 9001 | 2026-02-01 09:00:00 | prod | 501 | 1001 | Eligibility | ERROR | ELG-500 |
| 9002 | 2026-02-01 09:05:00 | prod | 501 | 1002 | Eligibility | ERROR | ELG-500 |
| 9003 | 2026-02-01 10:00:00 | prod | 501 | 1001 | Payments | ERROR | PAY-201 |
| 9004 | 2026-02-02 09:00:00 | prod | 502 | 1004 | Payments | ERROR | PAY-201 |
| 9005 | 2026-02-02 09:10:00 | prod | 502 | 1004 | Payments | ERROR | PAY-201 |
Expected Output
| event_date | release_version | total_errors | affected_cases | top_component | top_case_type |
|---|
| 2026-02-01 | 7.0.1 | 3 | 2 | Eligibility | SNAP |
| 2026-02-02 | 7.0.2 | 2 | 1 | Payments | SNAP |