Analyze Steampunk Support Intelligence

Business Context

Steampunk wants to apply advanced analytics, NLP, and LLMs to customer-facing text that matters operationally: support tickets, implementation notes, and user feedback from Steampunk Spotter and related delivery workflows. Your goal is to design an NLP pipeline that turns unstructured text into actionable signals for routing, trend detection, and executive reporting.

Data

Volume: 850,000 historical tickets and comments collected over 18 months
Text length: 20-1,200 tokens per record; median 140 tokens
Language: English only for v1
Labels available:
- Issue category (12 classes)
- Business impact (low/medium/high)
- Partial entity annotations for product, environment, agency, and feature names on ~90,000 records
Distribution: Highly imbalanced; the top 3 categories represent ~62% of tickets
Noise: Duplicated boilerplate, stack traces, URLs, markdown, and inconsistent product naming
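
The noise sources listed above suggest a cleaning pass before any model sees the text. What follows is a minimal sketch using only the standard library. The regular expressions, the `<URL>` and `<TRACE>` placeholder tokens, and the product alias pattern are all assumptions made for illustration; real rules would need to be derived from the actual tickets.

```python
import hashlib
import re

# All patterns are illustrative assumptions, not rules taken from the ticket data.
STACK_LINE_RE = re.compile(
    r'^[ \t]*(?:Traceback \(most recent call last\):|File ".*", line \d+.*|at \S+\(.*\))[ \t]*$',
    re.MULTILINE,
)
URL_RE = re.compile(r"https?://\S+")
# Bold markers, inline-code backticks, and heading / block-quote prefixes.
MARKDOWN_RE = re.compile(r"\*\*|`|^[ \t]*(?:#{1,6}|>)[ \t]*", re.MULTILINE)
# Hypothetical alias rule: "spotter", "Steampunk-Spotter", etc. map to one name.
PRODUCT_RE = re.compile(r"\b(?:steampunk[\s-]*)?spotter\b", re.IGNORECASE)


def clean_ticket(text):
    """Replace stack-trace lines and URLs with placeholders and normalize naming."""
    text = STACK_LINE_RE.sub(" <TRACE> ", text)
    text = URL_RE.sub("<URL>", text)
    text = MARKDOWN_RE.sub("", text)
    text = PRODUCT_RE.sub("Steampunk Spotter", text)
    text = re.sub(r"\s+", " ", text).strip()
    # Collapse runs of trace placeholders into a single token.
    return re.sub(r"(?:<TRACE> ?)+", "<TRACE> ", text).strip()


def fingerprint(text):
    """Hash of the cleaned, lower-cased text, used to spot exact duplicates."""
    return hashlib.sha1(clean_ticket(text).lower().encode("utf-8")).hexdigest()


def drop_duplicates(texts):
    """Keep the first occurrence of each distinct cleaned ticket."""
    seen, kept = set(), []
    for t in texts:
        fp = fingerprint(t)
        if fp not in seen:
            seen.add(fp)
            kept.append(t)
    return kept
```

Replacing traces and URLs with placeholder tokens, rather than deleting them, keeps the signal that a trace or link was present while cutting token count. Exact-hash deduplication only catches identical boilerplate after cleaning; near-duplicates would need a fuzzier method.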

Success Criteria

A strong solution should achieve macro-F1 >= 0.82 on issue category classification, F1 >= 0.88 on entity extraction for key operational entities, and produce an LLM-generated summary that is grounded in retrieved evidence and useful for weekly operations reviews.
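
Because the category distribution is highly imbalanced, it helps to see why the target is stated as macro-F1 rather than accuracy. The sketch below computes per-class F1 and its unweighted mean in plain Python; the label names `login` and `billing` are hypothetical and stand in for two of the 12 issue categories.

```python
def per_class_f1(y_true, y_pred):
    """F1 for every label that appears in the gold or predicted labels."""
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # A class with no true or predicted instances scores 0 by convention here.
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so rare classes count as much as common ones."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical labels: a model that always predicts the majority class.
y_true = ["login"] * 8 + ["billing"] * 2
y_pred = ["login"] * 10
```

On this toy input, accuracy is 0.8 while macro-F1 is about 0.44, because the rare class contributes an F1 of zero. A model that ignores the long tail therefore cannot meet the 0.82 target by fitting only the top 3 categories.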

Constraints

Inference for classification and NER combined must stay under 250 ms per ticket under batch-serving conditions.
The solution must run in a secure environment with no dependency on external APIs.
Outputs must be auditable and robust to prompt injection in ticket text.
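
One way to make these constraints checkable is sketched below, under stated assumptions. The injection phrase list is a small heuristic invented for illustration and is not a complete defence; the audit record fields and the `model_version` string are hypothetical; and "per ticket" latency is interpreted here as batch wall-clock time divided by batch size, which is one reading of the constraint rather than a confirmed definition.

```python
import hashlib
import json
import re
import time

# Heuristic phrases only: an illustrative assumption, not a complete defence.
INJECTION_RE = re.compile(
    r"ignore (?:all |any )?(?:previous|prior|above) instructions"
    r"|disregard the system prompt|you are now",
    re.IGNORECASE,
)


def looks_like_injection(text):
    """Flag ticket text that contains instruction-like phrases for review."""
    return INJECTION_RE.search(text) is not None


def audit_record(ticket_id, text, outputs, model_version):
    """Build a record tying outputs to the exact input and model (hypothetical schema)."""
    return {
        "ticket_id": ticket_id,
        "input_sha256": hashlib.sha256(text.encode("utf-8")).hexdigest(),
        "model_version": model_version,
        "outputs": outputs,
        "injection_flag": looks_like_injection(text),
    }


def to_audit_line(record):
    """Serialize deterministically so records can be diffed and replayed."""
    return json.dumps(record, sort_keys=True)


def mean_latency_ms(predict_batch, tickets):
    """Wall-clock time of one batch call divided by batch size, in milliseconds."""
    start = time.perf_counter()
    predict_batch(tickets)
    return (time.perf_counter() - start) * 1000.0 / len(tickets)
```

A flagged ticket would still be classified, but the flag lets the summarization step exclude or quarantine that text rather than pass it to the LLM as if it were instructions. Hashing the input instead of storing it keeps the audit log small; whether raw text must also be retained is a policy question the prompt does not settle.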

Requirements

Build a multi-task NLP workflow for issue classification, entity extraction, and retrieval-grounded summarization.
Define a realistic preprocessing pipeline for noisy enterprise text.
Implement modern Python components using transformers, spaCy, and sentence embeddings.
Explain how you would fine-tune, validate, and monitor the system.
Describe how advanced analytics and LLM outputs would be tied to decisions that matter, such as escalation, backlog prioritization, and recurring-issue detection.
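
To illustrate the last requirement, the sketch below shows how model outputs could feed two of the named decisions: escalation routing and recurring-issue detection. It assumes predictions have already been produced upstream. The `Prediction` fields, the confidence threshold, the four-week lookback, and the spike ratio are all hypothetical values chosen for the example, not figures from the brief.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Prediction:
    """Hypothetical per-ticket output of the classification and NER stages."""
    ticket_id: str
    week: int             # week index, assumed precomputed from the ticket timestamp
    category: str
    category_conf: float
    impact: str           # "low", "medium", or "high"
    product: str


def route(p, min_conf=0.6):
    """Send low-confidence tickets to a human; escalate confident high-impact ones."""
    if p.category_conf < min_conf:
        return "human_triage"
    if p.impact == "high":
        return "escalate"
    return "queue:" + p.category


def recurring_issues(preds, current_week, lookback=4, ratio=2.0, min_count=3):
    """Flag (category, product) pairs whose current-week count spikes over the baseline."""
    weekly = defaultdict(Counter)
    for p in preds:
        weekly[(p.category, p.product)][p.week] += 1
    flagged = []
    for key, counts in weekly.items():
        now = counts[current_week]
        # Mean weekly count over the lookback window; missing weeks count as zero.
        baseline = sum(counts[w] for w in range(current_week - lookback, current_week)) / lookback
        if now >= min_count and now >= ratio * max(baseline, 1.0):
            flagged.append((key, now, baseline))
    return sorted(flagged, key=lambda item: -item[1])
```

Routing low-confidence predictions to a person before checking impact keeps a weak classifier from triggering escalations on its own. The spike rule is deliberately simple; the flagged pairs are the natural retrieval keys for pulling evidence tickets into the weekly summary.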
