Classify DONE by NONE Feedback Themes

Business Context

DONE by NONE receives a steady stream of short user comments from product feedback forms, support tickets, and app store reviews. The data team wants a simple NLP pipeline that turns raw text into structured categories so product and support teams can triage issues faster.

Data Characteristics

You are given roughly 80,000 English feedback messages collected over 12 months from DONE by NONE surfaces. Messages range from 5 to 180 words (median ~28 words). Labels are manually assigned into 4 classes: Bug Report (30%), Feature Request (25%), Billing/Account (15%), and General Praise/Other (30%). Text is noisy: typos, emojis, URLs, repeated punctuation, and product-specific terms.
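The noise sources listed above (typos, emojis, URLs, repeated punctuation) can be handled with a small normalization step before feature extraction. The following is a minimal sketch using only the standard library; the function name `normalize_feedback`, the `<url>` and `<emoji>` placeholder tokens, and the emoji code point ranges are illustrative assumptions, not part of the original task.

```python
import re

# All patterns below are illustrative assumptions, not a complete cleaner.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
# Rough emoji coverage only: common pictographs plus misc symbols/dingbats.
EMOJI_RE = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")
REPEAT_PUNCT_RE = re.compile(r"([!?.,])\1+")   # "!!!" -> "!"
ELONGATED_RE = re.compile(r"([a-z])\1{2,}")     # "soooo" -> "soo"
WHITESPACE_RE = re.compile(r"\s+")


def normalize_feedback(text: str) -> str:
    """Lowercase and collapse common noise in a short feedback message."""
    text = text.lower()
    text = URL_RE.sub(" <url> ", text)          # keep a marker that a URL was present
    text = EMOJI_RE.sub(" <emoji> ", text)      # keep a marker instead of dropping emojis
    text = REPEAT_PUNCT_RE.sub(r"\1", text)     # collapse repeated punctuation
    text = ELONGATED_RE.sub(r"\1\1", text)      # cap letter runs at two characters
    return WHITESPACE_RE.sub(" ", text).strip()


if __name__ == "__main__":
    print(normalize_feedback("App CRASHES!!!   see https://example.com/x 😡"))
```

Replacing URLs and emojis with placeholder tokens, rather than deleting them, keeps a signal the classifier may find useful (for example, an angry emoji in a bug report). Typos and product-specific terms are not handled here; character n-gram features are one common way to make a TF-IDF model more tolerant of them.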

Success Criteria

A good solution should achieve at least 0.80 macro-F1 on a held-out test set, with precision and recall both above 0.75 for the Bug Report and Billing/Account classes. The approach should be interpretable enough that non-ML stakeholders can see which tokens most commonly drive predictions.
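To make these thresholds concrete, here is a minimal pure-Python sketch of how they could be checked from true and predicted labels. The helper names `per_class_scores` and `meets_success_criteria` are hypothetical; in practice a library routine such as scikit-learn's `classification_report` would likely be used instead.

```python
from collections import Counter


def per_class_scores(y_true, y_pred):
    """Return {label: {"precision", "recall", "f1"}} for every label seen."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1    # predicted class gets a false positive
            fn[truth] += 1   # true class gets a false negative
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        pr_sum = precision + recall
        f1 = 2 * precision * recall / pr_sum if pr_sum else 0.0
        scores[label] = {"precision": precision, "recall": recall, "f1": f1}
    return scores


def meets_success_criteria(
    scores,
    macro_f1_min=0.80,                               # "at least 0.80 macro-F1"
    critical=("Bug Report", "Billing/Account"),
    critical_min=0.75,                               # "above 0.75" for both metrics
):
    """Check the macro-F1 floor and the per-class floors for critical classes."""
    macro_f1 = sum(s["f1"] for s in scores.values()) / len(scores)
    critical_ok = all(
        c in scores
        and scores[c]["precision"] > critical_min
        and scores[c]["recall"] > critical_min
        for c in critical
    )
    return macro_f1 >= macro_f1_min and critical_ok
```

Macro-F1 averages the per-class F1 scores with equal weight, so the smaller Billing/Account class (15% of the data) counts as much as the larger classes.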

Constraints

Use a lightweight solution that can run in a standard Python environment
Inference should be fast enough for batch scoring daily feedback
Prefer methods that are easy to maintain by the DONE by NONE data team
Assume only English text for this version

Requirements

Build an NLP pipeline and explain the basics clearly: tokenization, normalization, stopword handling, and feature extraction.
Train a baseline text classifier using modern Python tooling.
Show how you would preprocess noisy DONE by NONE feedback before modeling.
Evaluate the model with appropriate classification metrics and a confusion matrix.
Briefly explain when you would move from TF-IDF + linear models to transformer-based models.
Provide example predictions on sample feedback messages.
