At AcmeAI, account managers use a generative assistant to draft client-facing answers about product capabilities, pricing, and compliance. Leadership wants a lightweight NLP system that flags responses with high hallucination risk before they are shared with non-technical stakeholders.
You have 180,000 historical prompt-response pairs labeled by reviewers as Low Risk, Needs Review, or High Risk for hallucination. Text is English-only. Prompts range from 10 to 120 words, and model responses from 30 to 600 words (median 145). Labels are moderately imbalanced: 62% Low Risk, 28% Needs Review, 10% High Risk. Metadata includes model version, a retrieval-used flag, and product domain (sales, legal, support), but the core model should rely on the text alone.
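Given the 62/28/10 skew, training will likely need class weighting or resampling so the High Risk class is not swamped. A minimal sketch of inverse-frequency class weights (the helper name and the scaled-down label list are illustrative, not part of the dataset):

```python
from collections import Counter

def inverse_frequency_weights(labels):
    """Per-class weights proportional to inverse class frequency,
    normalized so a perfectly balanced dataset yields 1.0 per class."""
    counts = Counter(labels)
    n_total = len(labels)
    n_classes = len(counts)
    return {cls: n_total / (n_classes * n) for cls, n in counts.items()}

# Illustrative labels mirroring the stated 62/28/10 split (scaled down).
labels = ["Low Risk"] * 62 + ["Needs Review"] * 28 + ["High Risk"] * 10
weights = inverse_frequency_weights(labels)
# High Risk receives the largest weight (~3.3x), Low Risk the smallest.
```

These weights can be passed to most classifiers' loss functions; any equivalent scheme (oversampling, focal loss) would serve the same purpose.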
A good solution achieves High Risk recall >= 0.90, macro-F1 >= 0.82, and supports analyst review with interpretable signals. Inference should stay under 150 ms per response in batch scoring.
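The acceptance criteria can be checked directly from predictions. A small sketch of the evaluation gate, computing per-class recall and macro-F1 from label lists (the function name and the toy labels are illustrative):

```python
def per_class_recall_and_macro_f1(y_true, y_pred, classes):
    """Return (recall per class, macro-averaged F1) from parallel label lists."""
    recall, f1_scores = {}, []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        r = tp / (tp + fn) if tp + fn else 0.0
        prec = tp / (tp + fp) if tp + fp else 0.0
        recall[c] = r
        f1_scores.append(2 * prec * r / (prec + r) if prec + r else 0.0)
    return recall, sum(f1_scores) / len(f1_scores)

classes = ["Low Risk", "Needs Review", "High Risk"]
# Toy predictions for illustration only.
y_true = ["Low Risk", "Low Risk", "Needs Review", "High Risk", "High Risk"]
y_pred = ["Low Risk", "Needs Review", "Needs Review", "High Risk", "High Risk"]
recall, macro_f1 = per_class_recall_and_macro_f1(y_true, y_pred, classes)
passes_gate = recall["High Risk"] >= 0.90 and macro_f1 >= 0.82
```

In practice the same numbers would come from `sklearn.metrics.recall_score` and `f1_score` with `average="macro"`; the explicit version above shows exactly what the gate measures.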