Business Context
VerbaFlow, a meeting transcription platform, wants to improve the grammar correction it applies after automatic speech recognition (ASR) to English transcripts used in customer notes, call summaries, and compliance review. Raw transcripts are readable but contain missing punctuation, casing errors, disfluencies, and ASR-specific grammatical mistakes that degrade those downstream uses.
Data
- Volume: 2.4M paired samples of raw ASR transcript and human-corrected text
- Text length: 5 to 1,200 tokens per segment; median 78 tokens
- Language: English only, with accents, domain jargon, and conversational speech
- Label quality: Human edits are noisy; about 12% of pairs include stylistic rewrites beyond grammar correction
- Error distribution: Missing punctuation and casing are common; agreement, tense, homophone, and segmentation errors are less frequent but higher impact
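The label-noise and error-distribution notes above suggest profiling each training pair before using it. The following is a minimal sketch, assuming whitespace tokenization and a hypothetical cutoff of 0.3 for flagging likely stylistic rewrites; `profile_pair`, `strip_surface`, and `STYLISTIC_THRESHOLD` are illustrative names, and the cutoff would need tuning against a hand-labeled sample.

```python
import difflib
import string

PUNCT = set(string.punctuation)
STYLISTIC_THRESHOLD = 0.3  # hypothetical cutoff, not taken from the brief


def strip_surface(token: str) -> str:
    """Lowercase a token and drop punctuation so only the word itself remains."""
    return "".join(ch for ch in token.lower() if ch not in PUNCT)


def profile_pair(raw: str, corrected: str) -> dict:
    """Count surface-only edits (punctuation/casing) versus lexical edits."""
    raw_tokens = raw.split()
    cor_tokens = corrected.split()
    matcher = difflib.SequenceMatcher(a=raw_tokens, b=cor_tokens, autojunk=False)

    surface_edits = 0
    lexical_edits = 0
    lexical_tokens = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        # Compare the two spans with casing and punctuation removed.
        src = [t for t in map(strip_surface, raw_tokens[i1:i2]) if t]
        tgt = [t for t in map(strip_surface, cor_tokens[j1:j2]) if t]
        if src == tgt:
            surface_edits += 1  # only punctuation or casing changed
        else:
            lexical_edits += 1  # words were added, removed, or substituted
            lexical_tokens += max(i2 - i1, j2 - j1)

    lexical_ratio = lexical_tokens / max(len(raw_tokens), 1)
    return {
        "surface_edits": surface_edits,
        "lexical_edits": lexical_edits,
        "lexical_ratio": lexical_ratio,
        "likely_stylistic": lexical_ratio > STYLISTIC_THRESHOLD,
    }
```

Pairs flagged as `likely_stylistic` could be down-weighted or sent for review rather than dropped outright, since a high lexical ratio can also come from a badly garbled ASR segment that genuinely needed heavy correction.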
Success Criteria
A good solution should improve grammaticality without changing speaker intent, named entities, or domain terms. Targets: at least +8 F0.5 on grammatical error correction (GEC) over the current baseline, entity preservation above 98%, and p95 inference latency under 250 ms for 128-token segments.
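The three targets above can each be computed with a few lines of standard-library Python. This is a sketch under stated assumptions: edit-level true positive, false positive, and false negative counts are taken as already produced by some GEC scorer, entity preservation is approximated by case-insensitive substring matching against a supplied entity list, and p95 uses the nearest-rank method. The function names are illustrative.

```python
def f_beta(tp: int, fp: int, fn: int, beta: float = 0.5) -> float:
    """F-beta over edit counts; beta=0.5 weights precision more than recall."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def entity_preservation(entities: list[str], output: str) -> float:
    """Fraction of expected entities still present in the corrected text.

    Assumption: case-insensitive substring match is a good enough proxy.
    """
    if not entities:
        return 1.0
    lowered = output.lower()
    kept = sum(1 for e in entities if e.lower() in lowered)
    return kept / len(entities)


def p95(latencies_ms: list[float]) -> float:
    """95th percentile by the nearest-rank method."""
    if not latencies_ms:
        raise ValueError("no latency samples")
    ordered = sorted(latencies_ms)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n) in integer math
    return ordered[rank - 1]
```

F0.5 favors precision, which matches the brief's emphasis on avoiding harmful rewrites: a system that makes fewer but more reliable corrections scores better than one that edits aggressively.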
Constraints
- Must run in a secure Python inference service on a single A10 GPU
- No external API calls during training or inference
- Model updates should support weekly fine-tuning on newly corrected transcripts
Requirements
- Build a grammar correction system for noisy transcription text using modern transformer-based methods.
- Design a preprocessing pipeline that handles disfluencies, speaker markers, timestamps, and inconsistent punctuation.
- Explain how you would reduce harmful rewrites and preserve factual content.
- Implement training, validation, and offline evaluation in Python.
- Propose an error analysis framework for ASR-specific failures such as homophones, run-on sentences, and punctuation restoration.
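The preprocessing requirement above can be illustrated with a small rule-based normalizer. This is a minimal sketch, not a definitive pipeline: the bracketed `[HH:MM:SS]` timestamp format, the `Speaker N:` marker format, and the filler list are all assumptions about the transcript layout, and `normalize_segment` is a hypothetical name.

```python
import re

# Assumed formats; adjust to the actual ASR output.
TIMESTAMP = re.compile(r"\[\d{1,2}:\d{2}(?::\d{2})?(?:\.\d{1,3})?\]")
SPEAKER = re.compile(r"\b(?:speaker|spk)[ _]?\d+\s*:\s*", re.IGNORECASE)
FILLER = re.compile(r"\b(?:um+|uh+|erm+|hmm+)\b[,.]?\s*", re.IGNORECASE)
REPEATED_WORD = re.compile(r"\b(\w+)(?:\s+\1\b)+", re.IGNORECASE)
PUNCT_RUN = re.compile(r"([,.!?;:])\1+")
SPACE_BEFORE_PUNCT = re.compile(r"\s+([,.!?;:])")


def normalize_segment(text: str) -> str:
    """Strip transcript markup and obvious disfluencies before correction."""
    text = TIMESTAMP.sub(" ", text)           # drop bracketed timestamps only
    text = SPEAKER.sub(" ", text)             # drop speaker markers
    text = FILLER.sub("", text)               # drop filled pauses
    text = REPEATED_WORD.sub(r"\1", text)     # collapse immediate repetitions
    text = PUNCT_RUN.sub(r"\1", text)         # ",," -> ",", ".." -> "."
    text = SPACE_BEFORE_PUNCT.sub(r"\1", text)
    return re.sub(r"\s+", " ", text).strip()  # collapse leftover whitespace


print(normalize_segment(
    "[00:01:23] Speaker 1: um so we we need the the report,, by friday .."
))
```

Only bracketed timestamps are removed so that spoken times such as "3:30" survive as content. The repeated-word rule is deliberately crude: it would also collapse legitimate repetitions such as "that that" or "had had", so a production version might leave those cases to the model or restrict the rule to a stop list. Speaker markers are discarded here for simplicity; keeping them as separate metadata may be preferable for compliance review.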