At NovaMind, the applied AI team tracks fast-moving generative AI updates from research blogs, model release notes, newsletters, and technical forums. Recruiters use this exercise to test whether candidates can turn an open-ended question about “staying updated” into a practical NLP system for organizing and prioritizing information.
You are given a corpus of 180,000 English documents collected over 18 months from arXiv abstracts, vendor blogs, GitHub release notes, benchmark reports, and curated newsletters. Documents range from 40 to 1,200 words (median: 220). Each document is labeled with one of five update types: Model Release (28%), Research Paper (24%), Tooling/Framework (18%), Safety/Policy (12%), and Application Case Study (18%). Roughly 6% of documents contain boilerplate, duplicated snippets, or malformed HTML.
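As a minimal sketch of how the noisy 6% might be handled before modeling (assuming a hypothetical updates.csv with "text" and "label" columns; the file name and column names are not given in the brief), one could strip residual HTML, deduplicate on a normalized hash, and enforce the stated length range:

```python
import hashlib
import re

import pandas as pd

# Hypothetical input: one row per document with "text" and "label" columns.
df = pd.read_csv("updates.csv")

TAG_RE = re.compile(r"<[^>]+>")  # crude removal of malformed HTML tags
WS_RE = re.compile(r"\s+")


def normalize(text: str) -> str:
    """Strip HTML remnants and collapse whitespace for hashing and cleaning."""
    text = TAG_RE.sub(" ", text)
    return WS_RE.sub(" ", text).strip().lower()


df["clean_text"] = df["text"].astype(str).map(normalize)

# Drop exact duplicates of the normalized body (boilerplate / copied snippets).
df["hash"] = df["clean_text"].map(lambda t: hashlib.sha1(t.encode()).hexdigest())
df = df.drop_duplicates(subset="hash")

# Keep documents inside the stated 40-1,200 word range after cleaning.
word_counts = df["clean_text"].str.split().str.len()
df = df[(word_counts >= 40) & (word_counts <= 1200)]
```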
A good solution should achieve macro-F1 ≥ 0.84, F1 ≥ 0.88 on Safety/Policy documents, and support batch inference at under 150 ms per document on a single T4 GPU. The system should be robust to noisy formatting and should handle domain-specific terminology such as RAG, LoRA, quantization, eval harnesses, and synthetic data.
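To make the acceptance criteria concrete, here is a small evaluation sketch using scikit-learn, assuming hypothetical y_true and y_pred arrays of string labels drawn from the five classes above:

```python
from sklearn.metrics import f1_score

LABELS = [
    "Model Release",
    "Research Paper",
    "Tooling/Framework",
    "Safety/Policy",
    "Application Case Study",
]


def meets_targets(y_true, y_pred) -> bool:
    """Check macro-F1 >= 0.84 and Safety/Policy F1 >= 0.88."""
    macro_f1 = f1_score(y_true, y_pred, labels=LABELS, average="macro")
    per_class = f1_score(y_true, y_pred, labels=LABELS, average=None)
    safety_f1 = per_class[LABELS.index("Safety/Policy")]
    print(f"macro-F1: {macro_f1:.3f}  Safety/Policy F1: {safety_f1:.3f}")
    return macro_f1 >= 0.84 and safety_f1 >= 0.88
```

The latency target would be checked separately by timing batched inference on a T4 and amortizing wall-clock time over the number of documents in the batch.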