Business Context
Hexaware Talent Analytics wants to automatically categorize candidate interview responses to identify whether applicants demonstrate the core NLP proficiency expected for an AI Engineer role. The goal is to screen short free-text answers from candidates and route strong NLP-aligned responses to technical reviewers.
Data
You are given a labeled dataset of 18,000 interview responses collected from mock screening rounds.
- Task: classify each response into one of three labels: Core NLP Proficiency, Partial/Generic AI Knowledge, or Irrelevant/Incorrect
- Text length: 20-180 words per response, median 62 words
- Language: English only
- Label distribution: 48% Core NLP Proficiency, 34% Partial/Generic AI Knowledge, 18% Irrelevant/Incorrect
- Responses contain recruiter shorthand, inconsistent punctuation, bullet fragments, and references to Python, transformers, NER, text classification, and LLM workflows
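Given the recruiter shorthand, bullet fragments, and inconsistent punctuation described above, a light normalization pass is usually safer than aggressive cleaning, since domain tokens (NER, LLM, transformers) carry most of the signal. A minimal sketch of such a pass (the function name and exact rules are illustrative, not prescribed by this brief):

```python
import re

def normalize_response(text: str) -> str:
    """Conservative cleanup for noisy, short recruiter-transcribed responses.

    Casing, bullet markers, and repeated punctuation are normalized, but
    domain terms (Python, NER, LLM, ...) are left intact for the model.
    """
    text = text.strip()
    # Drop bullet fragments (-, *, or a bullet character) at line starts.
    text = re.sub(r"(?m)^\s*[-*\u2022]+\s*", "", text)
    # Collapse repeated punctuation from shorthand ("!!", "...").
    text = re.sub(r"([!?.,])\1+", r"\1", text)
    # Collapse all whitespace runs (including newlines) to single spaces.
    text = re.sub(r"\s+", " ", text)
    return text.lower()
```

Lowercasing is a judgment call here: it loses the NER/ner distinction but reduces sparsity on 18,000 short documents, which typically helps a bag-of-words baseline.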
Success Criteria
A production-ready solution should achieve:
- Macro-F1 >= 0.82 on a held-out test set
- Recall >= 0.88 for Core NLP Proficiency
- Inference latency under 80 ms per response in batch scoring
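With a 48/34/18 label split, macro-F1 and per-class recall are the right gates because accuracy alone would reward over-predicting the majority class. `sklearn.metrics` provides these directly; the dependency-free sketch below shows what the acceptance check computes (label strings and the `meets_criteria` helper are illustrative):

```python
LABELS = ["core_nlp", "partial_generic", "irrelevant"]

def per_class_prf(y_true, y_pred):
    """Per-class precision/recall/F1 and macro-F1 from parallel label lists."""
    stats = {}
    for label in LABELS:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        stats[label] = {"precision": prec, "recall": rec, "f1": f1}
    # Macro-F1: unweighted mean over classes, so the 18% class counts equally.
    macro_f1 = sum(s["f1"] for s in stats.values()) / len(LABELS)
    return stats, macro_f1

def meets_criteria(y_true, y_pred):
    """Gate a candidate model against the brief's stated thresholds."""
    stats, macro_f1 = per_class_prf(y_true, y_pred)
    return macro_f1 >= 0.82 and stats["core_nlp"]["recall"] >= 0.88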
Constraints
- The model must run on a single CPU or modest GPU
- Recruiters need interpretable outputs for borderline cases
- The pipeline should support weekly retraining as new labeled responses arrive
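The interpretability constraint points toward a linear model (or a linear layer whose weights are exposed): for a linear classifier, each token's contribution to a class score is simply its learned weight times its count, which can be surfaced to recruiters directly. A toy sketch of that idea, with a hypothetical weight dictionary standing in for learned coefficients:

```python
from collections import Counter

def explain_prediction(tokens, class_weights, top_k=3):
    """Top token contributions to one class score under a linear model.

    class_weights is a toy {token: coefficient} dict standing in for the
    learned weights of a real classifier; contribution = weight * count.
    """
    counts = Counter(tokens)
    contribs = {tok: class_weights.get(tok, 0.0) * n for tok, n in counts.items()}
    return sorted(contribs.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
```

For borderline cases, showing the two or three highest-contributing terms alongside the predicted label gives reviewers a concrete, checkable rationale rather than a bare confidence score.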
Requirements
- Build a multi-class text classification pipeline for interview responses.
- Define a realistic preprocessing workflow for noisy short-form text.
- Implement a strong baseline in modern Python and compare it with a transformer-based model.
- Evaluate performance with class-aware metrics and explain likely failure modes.
- Describe how you would deploy the classifier for recruiter-facing screening.
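One plausible shape for the requested baseline, assuming scikit-learn: TF-IDF features with a logistic regression classifier. It trains and scores quickly on a single CPU, its coefficients are directly inspectable for recruiter-facing explanations, and it gives the transformer model (e.g. a fine-tuned DistilBERT) a meaningful bar to clear. The hyperparameters below are reasonable starting points, not tuned values:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

def build_baseline():
    """TF-IDF + logistic regression baseline for 3-class response screening."""
    return Pipeline([
        # Word uni/bigrams; min_df=2 trims one-off shorthand tokens.
        ("tfidf", TfidfVectorizer(ngram_range=(1, 2), min_df=2,
                                  sublinear_tf=True)),
        # class_weight="balanced" counters the 48/34/18 label skew.
        ("clf", LogisticRegression(max_iter=1000, class_weight="balanced")),
    ])
```

The same fitted pipeline supports the weekly retraining constraint: refit on the accumulated labeled set, re-run the macro-F1 and per-class recall gates on a held-out split, and promote only if the thresholds still hold.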