VoxBridge, a speech QA platform, reviews customer support calls to measure whether ASR quality degrades for specific English accents. You need to build an NLP system that classifies short call transcripts into accent groups using text-only signals from ASR output.
The dataset contains 180,000 labeled call transcripts collected from voice and accent evaluation sessions over 12 months. Each transcript is 1-8 utterances long, with 20-180 tokens per sample (median: 54). All samples are in English, but the text includes ASR artifacts such as missing punctuation, homophone substitutions, disfluencies, and partial words. Labels are moderately imbalanced: General American (42%), Indian English (21%), British English (14%), Australian English (9%), African English varieties (8%), and Other English accents (6%). Around 5% of records contain noisy labels from manual review disagreements.
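The imbalance and ASR noise described above can be handled with a simple class-weighted baseline. The sketch below is illustrative only, assuming scikit-learn is available; the sample texts, label names, and hyperparameters are hypothetical, not from the brief. Character n-grams are used because they do not require intact spellings or word boundaries, which makes them comparatively robust to homophone substitutions and partial words.

```python
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Character n-grams tolerate ASR artifacts (homophone substitutions,
# partial words) better than word-level features.
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4),
                              sublinear_tf=True)),
    # class_weight="balanced" reweights the loss to counter the
    # 42%-vs-6% label imbalance in the dataset.
    ("clf", LogisticRegression(class_weight="balanced", max_iter=1000)),
])

# Tiny illustrative fit; real training would use the 180,000 transcripts.
texts = [
    "yeah so i was calling about the the invoice",
    "i am calling regarding the refund kindly do the needful",
    "right so i rang up about the broadband going down",
    "no worries mate just chasing up that parcel",
]
labels = ["general_american", "indian_english",
          "british_english", "australian_english"]
pipeline.fit(texts, labels)
preds = pipeline.predict(texts)
```

For the roughly 5% of noisily labeled records, a common follow-up is to hold them out or down-weight them rather than train on them at face value; the right choice depends on whether the review disagreements can be adjudicated.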
A good solution achieves at least 0.82 macro-F1 across accent classes and at least 0.90 recall for the "Other English accents" bucket used for escalation. The model should support batch scoring of 50,000 transcripts per day.
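The two acceptance metrics can be checked directly with scikit-learn, as in this sketch. The gold labels and predictions here are hypothetical placeholders; restricting `labels` in `recall_score` isolates recall for the escalation bucket.

```python
from sklearn.metrics import f1_score, recall_score

# Hypothetical gold labels and model predictions, for illustration only.
y_true = ["general_american", "indian_english", "other",
          "other", "british_english"]
y_pred = ["general_american", "indian_english", "other",
          "other", "british_english"]

# Macro-F1 averages per-class F1 equally, so minority accents
# count as much as General American.
macro_f1 = f1_score(y_true, y_pred, average="macro")

# Recall for the escalation bucket alone: restrict scoring to that class.
other_recall = recall_score(y_true, y_pred, labels=["other"], average="macro")

meets_bar = macro_f1 >= 0.82 and other_recall >= 0.90
```

Note that the throughput target is modest: 50,000 transcripts per day is under one per second on average, so any model that scores a batch in bulk will clear it comfortably.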