LinguaPrep, an English-learning platform, wants to automatically score sentence completion questions where learners choose the grammatically correct word or phrase to fill a blank. The goal is to build an NLP system that predicts the correct completion option and supports fast feedback in practice sessions.
The training set contains approximately 180,000 multiple-choice sentence completion items collected from grammar exercises and exam prep content. Each example includes a sentence with one blank, 4 candidate options, and the correct answer. Sentences are short to medium length (8-35 tokens), written in English, and cover common grammar topics such as verb tense, subject-verb agreement, articles, prepositions, and pronouns. Correct answers are evenly distributed across the four option positions, but grammar categories are not: tense and preposition items appear most often.
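As an illustration of the item structure described above, the following is a minimal sketch of how one example might be represented and expanded into its four candidate sentences. The blank marker "____", the class name CompletionItem, and all field names are assumptions made for this sketch; the actual dataset schema is not specified here.

```python
from collections import Counter
from dataclasses import dataclass

BLANK = "____"  # assumed blank marker; the real data may use a different one


@dataclass(frozen=True)
class CompletionItem:
    """One multiple-choice item (hypothetical schema, for illustration only)."""
    sentence: str      # sentence containing exactly one blank
    options: tuple     # the 4 candidate completions
    answer_index: int  # position of the correct option, 0 to 3
    category: str      # grammar topic label, e.g. "tense"

    def __post_init__(self):
        # Enforce the constraints stated in the data description.
        if self.sentence.count(BLANK) != 1:
            raise ValueError("sentence must contain exactly one blank")
        if len(self.options) != 4:
            raise ValueError("expected exactly 4 candidate options")
        if not 0 <= self.answer_index < 4:
            raise ValueError("answer_index must be between 0 and 3")

    def filled(self):
        """Return the four candidate sentences, one per option."""
        return [self.sentence.replace(BLANK, opt) for opt in self.options]


def category_counts(items):
    """Count items per grammar category to expose the uneven distribution."""
    return Counter(item.category for item in items)


example = CompletionItem(
    sentence="She ____ to school every day.",
    options=("go", "goes", "going", "gone"),
    answer_index=1,
    category="agreement",
)
candidates = example.filled()
```

Expanding each item into four full sentences is one common way to frame the task, since a model can then score each candidate sentence and pick the highest-scoring one. Counting items per category early on helps decide whether reweighting or stratified sampling is needed for the less frequent topics.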
A production-ready solution should achieve at least 88% accuracy on a held-out test set and keep inference latency under 50 ms per question for real-time quiz feedback. The approach should also be interpretable enough to explain common failure modes.
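The acceptance criteria above could be checked with a small evaluation harness along the following lines. This is a sketch under stated assumptions: predict is a hypothetical callable that takes a sentence and its options and returns the index of the chosen option, items are assumed to be (sentence, options, answer_index, category) tuples, and the latency check uses the worst observed single-question time, which is only one possible reading of the 50 ms requirement. The per-category breakdown is one simple way to surface common failure modes.

```python
import time
from collections import defaultdict


def evaluate(predict, items, accuracy_target=0.88, latency_budget_ms=50.0):
    """Score a predictor against the accuracy and latency targets.

    predict: hypothetical callable (sentence, options) -> chosen option index.
    items:   iterable of (sentence, options, answer_index, category) tuples.
    """
    correct = 0
    per_category = defaultdict(lambda: [0, 0])  # category -> [correct, total]
    latencies_ms = []

    for sentence, options, answer_index, category in items:
        start = time.perf_counter()
        predicted = predict(sentence, options)
        latencies_ms.append((time.perf_counter() - start) * 1000.0)

        hit = int(predicted == answer_index)
        correct += hit
        per_category[category][0] += hit
        per_category[category][1] += 1

    total = len(latencies_ms)
    if total == 0:
        raise ValueError("no items to evaluate")

    accuracy = correct / total
    worst_latency_ms = max(latencies_ms)
    return {
        "accuracy": accuracy,
        "meets_accuracy": accuracy >= accuracy_target,
        "worst_latency_ms": worst_latency_ms,
        "meets_latency": worst_latency_ms < latency_budget_ms,
        # Accuracy per grammar category, to show where the model fails.
        "per_category": {c: h / n for c, (h, n) in per_category.items()},
    }


# Toy usage with a trivial baseline that always picks the first option.
opts = ("a", "b", "c", "d")
toy_items = [
    ("s1 ____", opts, 0, "tense"),
    ("s2 ____", opts, 0, "tense"),
    ("s3 ____", opts, 1, "preposition"),
    ("s4 ____", opts, 0, "article"),
]
report = evaluate(lambda sentence, options: 0, toy_items)
```

Because grammar categories are unevenly represented, overall accuracy can hide weak performance on rarer topics, so the per-category figures are worth reporting alongside the headline number. In a real deployment the latency measurement would be taken on the production hardware with the actual model, and a percentile such as p95 may be a more stable target than the maximum.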