Task
A Databricks support analytics team stores ticket data in Delta Lake and theme extraction output from a PySpark pipeline in separate tables. Write a SQL query to summarize the most important support themes for open or recently resolved tickets.
Your query should use the extracted theme labels produced upstream by PySpark, but the task here is to analyze them in PostgreSQL.
Requirements
- Return one row per
theme_name for tickets created in April 2024.
- Include only tickets where
status is open or resolved.
- Count distinct tickets per theme as
ticket_count.
- Calculate
high_priority_count where ticket priority is P1 or P2.
- Return only themes appearing in at least 2 distinct tickets.
- Sort by
ticket_count descending, then high_priority_count descending, then theme_name ascending.
Table Definitions
support_tickets
| column | type | description |
|---|
| ticket_id | INT | Unique support ticket ID |
| customer_id | INT | Customer account ID |
| created_at | DATE | Ticket creation date |
| status | VARCHAR(20) | Ticket status |
| priority | VARCHAR(5) | Ticket priority |
| product_area | VARCHAR(50) | Databricks product area |
| | |
ticket_themes
| column | type | description |
|---|
| theme_id | INT | Unique extracted theme row |
| ticket_id | INT | Related ticket |
| theme_name | VARCHAR(100) | Theme extracted by a PySpark job |
| confidence_score | DECIMAL(4,2) | Extraction confidence |
| | |
customers
| column | type | description |
|---|
| customer_id | INT | Customer account ID |
| account_name | VARCHAR(100) | Customer name |
| segment | VARCHAR(30) | Customer segment |
| | |
Sample Data
support_tickets
| ticket_id | customer_id | created_at | status | priority | product_area |
|---|
| 101 | 1 | 2024-04-03 | open | P1 | Unity Catalog |
| 102 | 2 | 2024-04-06 | resolved | P2 | Model Serving |
| 103 | 3 | 2024-04-08 | pending | P3 | Vector Search |
| 104 | 2 | 2024-04-12 | open | P2 | Delta Lake |
| 105 | 4 | 2024-04-15 | resolved | P4 | Foundation Model APIs |
| 106 | 5 | 2024-03-29 | open | P1 | DBRX |
| | | | | |
ticket_themes
| theme_id | ticket_id | theme_name | confidence_score |
|---|
| 1001 | 101 | permissions | 0.95 |
| 1002 | 101 | catalog sync | 0.72 |
| 1003 | 102 | endpoint latency | 0.91 |
| 1004 | 104 | permissions | 0.88 |
| 1005 | 105 | prompt timeout | 0.67 |
| 1006 | 105 | null | 0.40 |
| | | |
Expected Output
| theme_name | ticket_count | high_priority_count |
|---|
| permissions | 2 | 2 |
| endpoint latency | 2 | 2 |
| vector index sync | 2 | 1 |