Interview Guides

Aggregate Support Ticket Themes | Dataford Interview Questions - Dataford - Ace your Interview

Aggregate Support Ticket Themes

Medium

SQL & Data Manipulation

Asked at 1 company1AggregationsGroup ByData Wrangling

Also asked at

Problem

Task

A Databricks support analytics team stores ticket data in Delta Lake and theme extraction output from a PySpark pipeline in separate tables. Write a SQL query to summarize the most important support themes for open or recently resolved tickets.

Your query should use the extracted theme labels produced upstream by PySpark, but the task here is to analyze them in PostgreSQL.

Requirements

Return one row per theme_name for tickets created in April 2024.
Include only tickets where status is open or resolved.
Count distinct tickets per theme as ticket_count.
Calculate high_priority_count where ticket priority is P1 or P2.
Return only themes appearing in at least 2 distinct tickets.
Sort by ticket_count descending, then high_priority_count descending, then theme_name ascending.

Table Definitions

`support_tickets`

column	type	description
ticket_id	INT	Unique support ticket ID
customer_id	INT	Customer account ID
created_at	DATE	Ticket creation date
status	VARCHAR(20)	Ticket status
priority	VARCHAR(5)	Ticket priority
product_area	VARCHAR(50)	Databricks product area

`ticket_themes`

column	type	description
theme_id	INT	Unique extracted theme row
ticket_id	INT	Related ticket
theme_name	VARCHAR(100)	Theme extracted by a PySpark job
confidence_score	DECIMAL(4,2)	Extraction confidence

`customers`

column	type	description
customer_id	INT	Customer account ID
account_name	VARCHAR(100)	Customer name
segment	VARCHAR(30)	Customer segment

Sample Data

`support_tickets`

ticket_id	customer_id	created_at	status	priority	product_area
101	1	2024-04-03	open	P1	Unity Catalog
102	2	2024-04-06	resolved	P2	Model Serving
103	3	2024-04-08	pending	P3	Vector Search
104	2	2024-04-12	open	P2	Delta Lake
105	4	2024-04-15	resolved	P4	Foundation Model APIs
106	5	2024-03-29	open	P1	DBRX

`ticket_themes`

theme_id	ticket_id	theme_name	confidence_score
1001	101	permissions	0.95
1002	101	catalog sync	0.72
1003	102	endpoint latency	0.91
1004	104	permissions	0.88
1005	105	prompt timeout	0.67
1006	105	null	0.40

Expected Output

theme_name	ticket_count	high_priority_count
permissions	2	2
endpoint latency	2	2
vector index sync	2	1

Schema

support_tickets

Column	Type	Description
ticket_idPK	INT	Unique support ticket identifier
customer_id	INT	Customer account identifier
created_at	DATE	Date the support ticket was created
status	VARCHAR(20)	Current ticket status
priority	VARCHAR(5)	Ticket priority such as P1 or P2
product_area	VARCHAR(50)	Databricks product area mentioned in the ticket

ticket_themes

Column	Type	Description
theme_idPK	INT	Unique extracted theme row identifier
ticket_id	INT	Related support ticket identifier
theme_name	VARCHAR(100)	Theme extracted from ticket text by a PySpark pipeline
confidence_score	DECIMAL(4,2)	Confidence score for the extracted theme

customers

Column	Type	Description
customer_idPK	INT	Customer account identifier
account_name	VARCHAR(100)	Customer account name
segment	VARCHAR(30)	Customer segment

Sample Data

Example 1

Input

{"customers":[["1","Acme Retail","Enterprise"],["2","Northwind Health","Enterprise"],["3","BluePeak Labs","Mid-Market"],["4","Crest Finance","Enterprise"],["5","Helio Media","SMB"],["6","Orbit Logistics","Mid-Market"],["7","Pioneer Energy","Enterprise"],["8","Summit AI","Startup"],["9","Lakehouse Foods","SMB"],["10","Vertex Bio",null],["11","Crest Finance","Consumer Tech"],["12","Keystone Travel","Retail"],["13","Pioneer Energy","SMB"],["14","Beacon Robotics","null"],["15","Granite Bank","Market

Output[["endpoint latency","2","2"],["permissions","2","2"],["vector index sync","2","1"]]

Problem

Task

Your query should use the extracted theme labels produced upstream by PySpark, but the task here is to analyze them in PostgreSQL.

Requirements

Return one row per theme_name for tickets created in April 2024.
Include only tickets where status is open or resolved.
Count distinct tickets per theme as ticket_count.
Calculate high_priority_count where ticket priority is P1 or P2.
Return only themes appearing in at least 2 distinct tickets.
Sort by ticket_count descending, then high_priority_count descending, then theme_name ascending.

Table Definitions

`support_tickets`

column	type	description
ticket_id	INT	Unique support ticket ID
customer_id	INT	Customer account ID
created_at	DATE	Ticket creation date
status	VARCHAR(20)	Ticket status
priority	VARCHAR(5)	Ticket priority
product_area	VARCHAR(50)	Databricks product area

`ticket_themes`

column	type	description
theme_id	INT	Unique extracted theme row
ticket_id	INT	Related ticket
theme_name	VARCHAR(100)	Theme extracted by a PySpark job
confidence_score	DECIMAL(4,2)	Extraction confidence

`customers`

column	type	description
customer_id	INT	Customer account ID
account_name	VARCHAR(100)	Customer name
segment	VARCHAR(30)	Customer segment

Sample Data

`support_tickets`

ticket_id	customer_id	created_at	status	priority	product_area
101	1	2024-04-03	open	P1	Unity Catalog
102	2	2024-04-06	resolved	P2	Model Serving
103	3	2024-04-08	pending	P3	Vector Search
104	2	2024-04-12	open	P2	Delta Lake
105	4	2024-04-15	resolved	P4	Foundation Model APIs
106	5	2024-03-29	open	P1	DBRX

`ticket_themes`

theme_id	ticket_id	theme_name	confidence_score
1001	101	permissions	0.95
1002	101	catalog sync	0.72
1003	102	endpoint latency	0.91
1004	104	permissions	0.88
1005	105	prompt timeout	0.67
1006	105	null	0.40

Expected Output

theme_name	ticket_count	high_priority_count
permissions	2	2
endpoint latency	2	2
vector index sync	2	1

Schema

support_tickets

Column	Type	Description
ticket_idPK	INT	Unique support ticket identifier
customer_id	INT	Customer account identifier
created_at	DATE	Date the support ticket was created
status	VARCHAR(20)	Current ticket status
priority	VARCHAR(5)	Ticket priority such as P1 or P2
product_area	VARCHAR(50)	Databricks product area mentioned in the ticket

ticket_themes

Column	Type	Description
theme_idPK	INT	Unique extracted theme row identifier
ticket_id	INT	Related support ticket identifier
theme_name	VARCHAR(100)	Theme extracted from ticket text by a PySpark pipeline
confidence_score	DECIMAL(4,2)	Confidence score for the extracted theme

customers

Column	Type	Description
customer_idPK	INT	Customer account identifier
account_name	VARCHAR(100)	Customer account name
segment	VARCHAR(30)	Customer segment

Sample Data

Example 1

Input

{"customers":[["1","Acme Retail","Enterprise"],["2","Northwind Health","Enterprise"],["3","BluePeak Labs","Mid-Market"],["4","Crest Finance","Enterprise"],["5","Helio Media","SMB"],["6","Orbit Logistics","Mid-Market"],["7","Pioneer Energy","Enterprise"],["8","Summit AI","Startup"],["9","Lakehouse Foods","SMB"],["10","Vertex Bio",null],["11","Crest Finance","Consumer Tech"],["12","Keystone Travel","Retail"],["13","Pioneer Energy","SMB"],["14","Beacon Robotics","null"],["15","Granite Bank","Market

Output[["endpoint latency","2","2"],["permissions","2","2"],["vector index sync","2","1"]]

Practice SQL

PostgreSQL

Open on desktop for the full SQL editor with syntax highlighting and autocomplete.

Up next

LSummarize Lyft Support Ticket TrendsEasy ASummarize Customer Support OutcomesMedium

Rank Common Support Issues by PracticeMedium

Next question

Aggregate Support Ticket Themes

Medium

SQL & Data Manipulation

Asked at 1 company1AggregationsGroup ByData Wrangling

Also asked at

Problem

Task

Your query should use the extracted theme labels produced upstream by PySpark, but the task here is to analyze them in PostgreSQL.

Requirements

Return one row per theme_name for tickets created in April 2024.
Include only tickets where status is open or resolved.
Count distinct tickets per theme as ticket_count.
Calculate high_priority_count where ticket priority is P1 or P2.
Return only themes appearing in at least 2 distinct tickets.
Sort by ticket_count descending, then high_priority_count descending, then theme_name ascending.

Table Definitions

`support_tickets`

column	type	description
ticket_id	INT	Unique support ticket ID
customer_id	INT	Customer account ID
created_at	DATE	Ticket creation date
status	VARCHAR(20)	Ticket status
priority	VARCHAR(5)	Ticket priority
product_area	VARCHAR(50)	Databricks product area

`ticket_themes`

column	type	description
theme_id	INT	Unique extracted theme row
ticket_id	INT	Related ticket
theme_name	VARCHAR(100)	Theme extracted by a PySpark job
confidence_score	DECIMAL(4,2)	Extraction confidence

`customers`

column	type	description
customer_id	INT	Customer account ID
account_name	VARCHAR(100)	Customer name
segment	VARCHAR(30)	Customer segment

Sample Data

`support_tickets`

ticket_id	customer_id	created_at	status	priority	product_area
101	1	2024-04-03	open	P1	Unity Catalog
102	2	2024-04-06	resolved	P2	Model Serving
103	3	2024-04-08	pending	P3	Vector Search
104	2	2024-04-12	open	P2	Delta Lake
105	4	2024-04-15	resolved	P4	Foundation Model APIs
106	5	2024-03-29	open	P1	DBRX

`ticket_themes`

theme_id	ticket_id	theme_name	confidence_score
1001	101	permissions	0.95
1002	101	catalog sync	0.72
1003	102	endpoint latency	0.91
1004	104	permissions	0.88
1005	105	prompt timeout	0.67
1006	105	null	0.40

Expected Output

theme_name	ticket_count	high_priority_count
permissions	2	2
endpoint latency	2	2
vector index sync	2	1

Schema

support_tickets

Column	Type	Description
ticket_idPK	INT	Unique support ticket identifier
customer_id	INT	Customer account identifier
created_at	DATE	Date the support ticket was created
status	VARCHAR(20)	Current ticket status
priority	VARCHAR(5)	Ticket priority such as P1 or P2
product_area	VARCHAR(50)	Databricks product area mentioned in the ticket

ticket_themes

Column	Type	Description
theme_idPK	INT	Unique extracted theme row identifier
ticket_id	INT	Related support ticket identifier
theme_name	VARCHAR(100)	Theme extracted from ticket text by a PySpark pipeline
confidence_score	DECIMAL(4,2)	Confidence score for the extracted theme

customers

Column	Type	Description
customer_idPK	INT	Customer account identifier
account_name	VARCHAR(100)	Customer account name
segment	VARCHAR(30)	Customer segment

Sample Data

Example 1

Input

{"customers":[["1","Acme Retail","Enterprise"],["2","Northwind Health","Enterprise"],["3","BluePeak Labs","Mid-Market"],["4","Crest Finance","Enterprise"],["5","Helio Media","SMB"],["6","Orbit Logistics","Mid-Market"],["7","Pioneer Energy","Enterprise"],["8","Summit AI","Startup"],["9","Lakehouse Foods","SMB"],["10","Vertex Bio",null],["11","Crest Finance","Consumer Tech"],["12","Keystone Travel","Retail"],["13","Pioneer Energy","SMB"],["14","Beacon Robotics","null"],["15","Granite Bank","Market

Output[["endpoint latency","2","2"],["permissions","2","2"],["vector index sync","2","1"]]

Problem

Task

Your query should use the extracted theme labels produced upstream by PySpark, but the task here is to analyze them in PostgreSQL.

Requirements

Return one row per theme_name for tickets created in April 2024.
Include only tickets where status is open or resolved.
Count distinct tickets per theme as ticket_count.
Calculate high_priority_count where ticket priority is P1 or P2.
Return only themes appearing in at least 2 distinct tickets.
Sort by ticket_count descending, then high_priority_count descending, then theme_name ascending.

Table Definitions

`support_tickets`

column	type	description
ticket_id	INT	Unique support ticket ID
customer_id	INT	Customer account ID
created_at	DATE	Ticket creation date
status	VARCHAR(20)	Ticket status
priority	VARCHAR(5)	Ticket priority
product_area	VARCHAR(50)	Databricks product area

`ticket_themes`

column	type	description
theme_id	INT	Unique extracted theme row
ticket_id	INT	Related ticket
theme_name	VARCHAR(100)	Theme extracted by a PySpark job
confidence_score	DECIMAL(4,2)	Extraction confidence

`customers`

column	type	description
customer_id	INT	Customer account ID
account_name	VARCHAR(100)	Customer name
segment	VARCHAR(30)	Customer segment

Sample Data

`support_tickets`

ticket_id	customer_id	created_at	status	priority	product_area
101	1	2024-04-03	open	P1	Unity Catalog
102	2	2024-04-06	resolved	P2	Model Serving
103	3	2024-04-08	pending	P3	Vector Search
104	2	2024-04-12	open	P2	Delta Lake
105	4	2024-04-15	resolved	P4	Foundation Model APIs
106	5	2024-03-29	open	P1	DBRX

`ticket_themes`

theme_id	ticket_id	theme_name	confidence_score
1001	101	permissions	0.95
1002	101	catalog sync	0.72
1003	102	endpoint latency	0.91
1004	104	permissions	0.88
1005	105	prompt timeout	0.67
1006	105	null	0.40

Expected Output

theme_name	ticket_count	high_priority_count
permissions	2	2
endpoint latency	2	2
vector index sync	2	1

Schema

support_tickets

Column	Type	Description
ticket_idPK	INT	Unique support ticket identifier
customer_id	INT	Customer account identifier
created_at	DATE	Date the support ticket was created
status	VARCHAR(20)	Current ticket status
priority	VARCHAR(5)	Ticket priority such as P1 or P2
product_area	VARCHAR(50)	Databricks product area mentioned in the ticket

ticket_themes

Column	Type	Description
theme_idPK	INT	Unique extracted theme row identifier
ticket_id	INT	Related support ticket identifier
theme_name	VARCHAR(100)	Theme extracted from ticket text by a PySpark pipeline
confidence_score	DECIMAL(4,2)	Confidence score for the extracted theme

customers

Column	Type	Description
customer_idPK	INT	Customer account identifier
account_name	VARCHAR(100)	Customer account name
segment	VARCHAR(30)	Customer segment

Sample Data

Example 1

Input

{"customers":[["1","Acme Retail","Enterprise"],["2","Northwind Health","Enterprise"],["3","BluePeak Labs","Mid-Market"],["4","Crest Finance","Enterprise"],["5","Helio Media","SMB"],["6","Orbit Logistics","Mid-Market"],["7","Pioneer Energy","Enterprise"],["8","Summit AI","Startup"],["9","Lakehouse Foods","SMB"],["10","Vertex Bio",null],["11","Crest Finance","Consumer Tech"],["12","Keystone Travel","Retail"],["13","Pioneer Energy","SMB"],["14","Beacon Robotics","null"],["15","Granite Bank","Market

Output[["endpoint latency","2","2"],["permissions","2","2"],["vector index sync","2","1"]]

Practice SQL

PostgreSQL

Open on desktop for the full SQL editor with syntax highlighting and autocomplete.

Up next

LSummarize Lyft Support Ticket TrendsEasy ASummarize Customer Support OutcomesMedium

Rank Common Support Issues by PracticeMedium

Next question