
A Databricks support analytics team stores ticket data in Delta Lake and theme extraction output from a PySpark pipeline in separate tables. Write a SQL query to summarize the most important support themes for open or recently resolved tickets.
Your query should use the extracted theme labels produced upstream by PySpark, but the task here is to analyze them in PostgreSQL.
theme_name for tickets created in April 2024.status is open or resolved.ticket_count.high_priority_count where ticket priority is P1 or P2.ticket_count descending, then high_priority_count descending, then theme_name ascending.support_tickets| column | type | description |
|---|---|---|
| ticket_id | INT | Unique support ticket ID |
| customer_id | INT | Customer account ID |
| created_at | DATE | Ticket creation date |
| status | VARCHAR(20) | Ticket status |
| priority | VARCHAR(5) | Ticket priority |
| product_area | VARCHAR(50) | Databricks product area |
ticket_themes| column | type | description |
|---|---|---|
| theme_id | INT | Unique extracted theme row |
| ticket_id | INT | Related ticket |
| theme_name | VARCHAR(100) | Theme extracted by a PySpark job |
| confidence_score | DECIMAL(4,2) | Extraction confidence |
customers| column | type | description |
|---|---|---|
| customer_id | INT | Customer account ID |
| account_name | VARCHAR(100) | Customer name |
| segment | VARCHAR(30) | Customer segment |
support_tickets| ticket_id | customer_id | created_at | status | priority | product_area |
|---|---|---|---|---|---|
| 101 | 1 | 2024-04-03 | open | P1 | Unity Catalog |
| 102 | 2 | 2024-04-06 | resolved | P2 | Model Serving |
| 103 | 3 | 2024-04-08 | pending | P3 | Vector Search |
| 104 | 2 | 2024-04-12 | open | P2 | Delta Lake |
| 105 | 4 | 2024-04-15 | resolved | P4 | Foundation Model APIs |
| 106 | 5 | 2024-03-29 | open | P1 | DBRX |
ticket_themes| theme_id | ticket_id | theme_name | confidence_score |
|---|---|---|---|
| 1001 | 101 | permissions | 0.95 |
| 1002 | 101 | catalog sync | 0.72 |
| 1003 | 102 | endpoint latency | 0.91 |
| 1004 | 104 | permissions | 0.88 |
| 1005 | 105 | prompt timeout | 0.67 |
| 1006 | 105 | null | 0.40 |
| theme_name | ticket_count | high_priority_count |
|---|---|---|
| permissions | 2 | 2 |
| endpoint latency | 2 | 2 |
| vector index sync | 2 | 1 |
| Column | Type | Description |
|---|---|---|
| ticket_idPK | INT | Unique support ticket identifier |
| customer_id | INT | Customer account identifier |
| created_at | DATE | Date the support ticket was created |
| status | VARCHAR(20) | Current ticket status |
| priority | VARCHAR(5) | Ticket priority such as P1 or P2 |
| product_area | VARCHAR(50) | Databricks product area mentioned in the ticket |
| Column | Type | Description |
|---|---|---|
| theme_idPK | INT | Unique extracted theme row identifier |
| ticket_id | INT | Related support ticket identifier |
| theme_name | VARCHAR(100) | Theme extracted from ticket text by a PySpark pipeline |
| confidence_score | DECIMAL(4,2) | Confidence score for the extracted theme |
| Column | Type | Description |
|---|---|---|
| customer_idPK | INT | Customer account identifier |
| account_name | VARCHAR(100) | Customer account name |
| segment | VARCHAR(30) | Customer segment |
{"customers":[["1","Acme Retail","Enterprise"],["2","Northwind Health","Enterprise"],["3","BluePeak Labs","Mid-Market"],["4","Crest Finance","Enterprise"],["5","Helio Media","SMB"],["6","Orbit Logistics","Mid-Market"],["7","Pioneer Energy","Enterprise"],["8","Summit AI","Startup"],["9","Lakehouse Foods","SMB"],["10","Vertex Bio",null],["11","Crest Finance","Consumer Tech"],["12","Keystone Travel","Retail"],["13","Pioneer Energy","SMB"],["14","Beacon Robotics","null"],["15","Granite Bank","MarketOutput[["endpoint latency","2","2"],["permissions","2","2"],["vector index sync","2","1"]]