American Family Insurance - Colorado collects customer feedback from post-claim surveys, call-center notes, email responses, and mobile app comments. The marketing analytics team wants an NLP pipeline that scores sentiment reliably so they can identify friction in the claims experience and track brand perception by touchpoint.
You are given a historical dataset of approximately 420,000 feedback records from the last 18 months across AmFam Colorado channels. Text is mostly English, with about 7% Spanish and occasional insurance-specific terms and abbreviations (e.g., "FNOL," "adjuster," "deductible," "total loss"). Feedback length ranges from 5 to 900 characters, with a median of 110 characters. Labels are available for 120,000 records and follow a 3-class distribution: positive (52%), neutral (28%), negative (20%). Some records contain personally identifiable information and policy references.
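To make the scale of the dataset concrete, the percentages above can be turned into approximate record counts. The following is a minimal sketch using only the figures stated in the brief. The constant and function names are hypothetical, the average month length of 30.44 days is an assumption, and because the brief's figures are themselves approximate, the results are rough estimates.

```python
# Figures taken from the brief; all names here are illustrative.
TOTAL_RECORDS = 420_000
LABELED_RECORDS = 120_000
CLASS_SHARE_PCT = {"positive": 52, "neutral": 28, "negative": 20}
SPANISH_SHARE_PCT = 7
MONTHS = 18
DAYS_PER_MONTH = 30.44  # assumption: average calendar month length


def approx_counts(total, share_pct):
    """Convert integer percentage shares into approximate record counts."""
    return {name: total * pct // 100 for name, pct in share_pct.items()}


# Approximate number of labeled records per sentiment class.
class_counts = approx_counts(LABELED_RECORDS, CLASS_SHARE_PCT)

# Records without labels, and the approximate Spanish-language subset.
unlabeled = TOTAL_RECORDS - LABELED_RECORDS
spanish = TOTAL_RECORDS * SPANISH_SHARE_PCT // 100

# Average daily volume, assuming feedback arrives evenly over the 18 months.
avg_daily = TOTAL_RECORDS / (MONTHS * DAYS_PER_MONTH)

print(class_counts)
print(f"unlabeled={unlabeled}, spanish~{spanish}, avg_daily~{avg_daily:.0f}")
```

The negative class, which is the smallest, still has on the order of 24,000 labeled examples under these figures. The even-arrival assumption behind the daily average is a simplification, since real feedback volume likely varies by day and season.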
A strong solution should achieve macro-F1 ≥ 0.82 and negative-class recall ≥ 0.88, and should batch-score each day's feedback within 30 minutes. Outputs should support dashboarding by channel, product line, and claim stage.
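The two quality targets can be checked with a few lines of code. The sketch below computes macro-F1 (the unweighted mean of per-class F1) and negative-class recall in plain Python, then compares them with the thresholds from the brief. The function names and the toy labels are hypothetical and are not drawn from the dataset. In practice a library such as scikit-learn would likely be used instead.

```python
LABELS = ("positive", "neutral", "negative")
MACRO_F1_TARGET = 0.82          # threshold from the brief
NEGATIVE_RECALL_TARGET = 0.88   # threshold from the brief


def per_class_prf(y_true, y_pred, label):
    """Return (precision, recall, F1) for one class, treated one-vs-rest."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def macro_f1(y_true, y_pred, labels=LABELS):
    """Unweighted mean of per-class F1, so each class counts equally."""
    return sum(per_class_prf(y_true, y_pred, l)[2] for l in labels) / len(labels)


def meets_targets(y_true, y_pred):
    """Compare macro-F1 and negative-class recall with the stated targets."""
    score = macro_f1(y_true, y_pred)
    neg_recall = per_class_prf(y_true, y_pred, "negative")[1]
    passes = score >= MACRO_F1_TARGET and neg_recall >= NEGATIVE_RECALL_TARGET
    return {"macro_f1": score, "negative_recall": neg_recall, "passes": passes}


# Toy example (illustrative labels only): one positive is misread as neutral.
y_true = ["positive", "positive", "neutral", "negative", "negative"]
y_pred = ["positive", "neutral", "neutral", "negative", "negative"]
result = meets_targets(y_true, y_pred)
print(result)
```

In the toy example every negative record is recovered, yet the single positive-to-neutral error pulls macro-F1 below 0.82, so the check fails. This illustrates why the brief states both targets: macro-F1 weights the three classes equally despite the 52/28/20 imbalance, while the recall floor specifically guards against missed negative feedback.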