Brillio is helping an insurance client improve claim cost prediction in its analytics platform. The current linear model underestimates high-cost claims because raw operational fields are noisy, sparse, and weakly encoded.
You are given a historical claims dataset. The task is to predict the total payout amount for each claim that is still open 14 days after first notice of loss (FNOL), using only information available at that point. The goal is to design a feature engineering approach that improves regression performance while remaining explainable enough for claims operations.

The dataset contains 41 features in five groups:

| Feature Group | Count | Examples |
|---|---|---|
| Claim attributes | 12 | claim_type, loss_cause, injury_flag, policy_tenure_months |
| Customer & policy | 10 | customer_age, vehicle_age, premium_amount, region, coverage_type |
| Early lifecycle signals | 9 | days_to_first_adjuster_contact, documents_submitted_7d, repair_shop_flag |
| Derived dates | 6 | loss_month, report_weekday, days_since_policy_start |
| Text-derived flags | 4 | attorney_mentioned, fraud_keyword_count, sentiment_score, escalation_flag |
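
As one illustration of how fields from the table might be re-encoded for a linear model, the sketch below applies four common transformations to a single claim record: a missing-value indicator for a sparse field, a log transform for a skewed monetary field, a capped count for a noisy field, and an interaction between two flags. It is a minimal sketch, not the client's pipeline. The field names come from the table; everything else is an assumption made for illustration: that missing values arrive as `None`, that flags are 0/1 integers, the cap values, and the helper name `engineer_features`. Each output stays a single named quantity so that a coefficient on it can still be explained to claims operations.

```python
import math


def engineer_features(claim: dict) -> dict:
    """Turn one raw claim record into model-ready features (illustrative only)."""
    out = {}

    # Sparse field: expose missingness as its own 0/1 feature.
    contact = claim.get("days_to_first_adjuster_contact")
    out["adjuster_contact_missing"] = 1 if contact is None else 0
    # Assumption: cap at 14 days (the prediction point) and use the cap when missing.
    out["adjuster_contact_days_capped"] = 14 if contact is None else min(contact, 14)

    # Skewed monetary field: log1p compresses the long right tail.
    out["log_premium_amount"] = math.log1p(claim["premium_amount"])

    # Noisy count: capping limits the influence of outliers (cap of 3 is an assumption).
    out["fraud_keyword_count_capped"] = min(claim.get("fraud_keyword_count") or 0, 3)

    # Interaction: injury claims with attorney involvement (hypothetical severity signal).
    injury = claim.get("injury_flag") or 0
    attorney = claim.get("attorney_mentioned") or 0
    out["injury_x_attorney"] = injury * attorney

    return out


# Made-up record, for demonstration only.
example = {
    "days_to_first_adjuster_contact": None,
    "premium_amount": 1200.0,
    "fraud_keyword_count": 5,
    "injury_flag": 1,
    "attorney_mentioned": 1,
}
features = engineer_features(example)
```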

A good solution should reduce error relative to a regularized linear baseline and clearly justify which engineered features matter most. Aim for MAE < $1,850 and RMSE < $4,200 on a held-out test set that is split by time, so that every test claim is later than every training claim.
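
The evaluation protocol above could be sketched as follows: split the claims by date, then compute MAE and RMSE on the later portion and compare them with the two targets. This is a minimal sketch under stated assumptions. The `report_date` and `payout` keys, the cutoff date, and all numbers in the toy data are hypothetical and do not come from the dataset description; only the two thresholds are taken from the text.

```python
import math
from datetime import date


def time_based_split(rows, date_key, cutoff):
    """Train on claims dated before the cutoff; test on claims dated on or after it."""
    train = [r for r in rows if r[date_key] < cutoff]
    test = [r for r in rows if r[date_key] >= cutoff]
    return train, test


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error; penalizes large misses more heavily than MAE."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def meets_targets(y_true, y_pred, mae_target=1850.0, rmse_target=4200.0):
    """Check both thresholds stated in the problem."""
    return mae(y_true, y_pred) < mae_target and rmse(y_true, y_pred) < rmse_target


# Toy data with made-up values, for demonstration only.
rows = [
    {"report_date": date(2023, 1, 10), "payout": 1200.0},
    {"report_date": date(2023, 3, 5), "payout": 800.0},
    {"report_date": date(2023, 7, 1), "payout": 3000.0},
    {"report_date": date(2023, 8, 15), "payout": 25000.0},
]
train_rows, test_rows = time_based_split(rows, "report_date", date(2023, 6, 1))

y_true = [r["payout"] for r in test_rows]
y_pred = [2500.0, 15000.0]  # hypothetical model output for the two test claims

test_mae = mae(y_true, y_pred)
test_rmse = rmse(y_true, y_pred)
passed = meets_targets(y_true, y_pred)
```

In this toy case a single underestimated high-cost claim pushes both metrics past their targets, and RMSE more so than MAE, which is why the two thresholds are worth tracking together. Running the same functions on the regularized linear baseline and on the engineered-feature model, with an identical cutoff, gives a like-for-like comparison.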