Business Context
BrightFund, a mid-sized online fundraising platform, wants to predict which past donors are likely to donate again in the next 90 days so the marketing team can prioritize outreach. The model will be used in a weekly batch campaign for roughly 300K donors.
Dataset
| Feature Group | Count | Examples |
|---|---|---|
| Donor history | 12 | lifetime_donation_count, total_donated_usd, avg_gift_amount, days_since_last_donation |
| Campaign engagement | 9 | emails_opened_30d, email_click_rate, sms_opt_in, site_visits_30d |
| Donor profile | 8 | acquisition_channel, region, donor_type, tenure_days |
| Temporal features | 6 | quarter_of_year, giving_season_flag, days_since_signup, last_campaign_response_days |
| Payment behavior | 5 | recurring_plan_flag, failed_payment_count, payment_method_type, refund_count |
- Size: 310K donor records, 40 features, 24 months of history
- Target: Binary label indicating whether a donor makes at least one donation in the next 90 days
- Class balance: 18% positive, 82% negative
- Missing data: 10% missing in engagement fields, 6% missing in donor profile fields for older imported records
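
One possible way to handle the missing engagement and profile fields is sketched below: fill numeric gaps with a median learned on the training rows only, and keep a flag recording that the value was absent, since missingness for older imported records may itself carry signal. This is an illustrative sketch, not a prescribed method; the helper names `fit_median` and `impute_with_indicator` are hypothetical, and rows are assumed to be plain dictionaries with `None` marking a missing value.

```python
from statistics import median


def fit_median(train_rows, column):
    """Learn the fill value from training rows only (avoids leaking later data)."""
    observed = [r[column] for r in train_rows if r.get(column) is not None]
    # Assumption: fall back to 0.0 when a column is entirely missing.
    return median(observed) if observed else 0.0


def impute_with_indicator(rows, column, fill_value):
    """Return copies of rows with the gap filled and a <column>_missing flag added."""
    imputed = []
    for row in rows:
        is_missing = row.get(column) is None
        new_row = dict(row)
        new_row[column] = fill_value if is_missing else row[column]
        new_row[f"{column}_missing"] = int(is_missing)
        imputed.append(new_row)
    return imputed


# Toy rows, invented for illustration only.
train_rows = [
    {"emails_opened_30d": 2},
    {"emails_opened_30d": None},
    {"emails_opened_30d": 6},
]
fill = fit_median(train_rows, "emails_opened_30d")
train_imputed = impute_with_indicator(train_rows, "emails_opened_30d", fill)
```

The same `fill` value would then be reused when scoring validation, test, and weekly batch data. Tree-based models that handle missing values natively may not need the imputation step, although the indicator can still help a linear baseline.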
Success Criteria
A good solution should rank donors well enough for campaign targeting: ROC-AUC >= 0.82, PR-AUC >= 0.45, and precision of at least 0.50 among the top 10% of scored donors. The team also needs feature importances to explain why donors are selected.
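
To make the top-10% criterion concrete, here is a minimal, dependency-free sketch of how precision among the highest-scored donors could be computed, together with the score cutoff that selects that group. The function names are hypothetical and the toy labels and scores are invented. With an 18% positive rate, a random ranking would be expected to reach a precision of about 0.18 in any slice, so the 0.50 target corresponds to a lift of roughly 2.8x. In practice ROC-AUC and PR-AUC would more likely come from scikit-learn's `roc_auc_score` and `average_precision_score`.

```python
def precision_at_top_fraction(y_true, scores, fraction=0.10):
    """Share of true donors among the highest-scored `fraction` of records."""
    if len(y_true) != len(scores):
        raise ValueError("y_true and scores must have the same length")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    n_top = max(1, int(len(scores) * fraction))
    # Rank donors by score, highest first, and keep the top slice.
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    return sum(label for _, label in ranked[:n_top]) / n_top


def threshold_for_top_fraction(scores, fraction=0.10):
    """Lowest score that still falls inside the top `fraction` of records."""
    n_top = max(1, int(len(scores) * fraction))
    return sorted(scores, reverse=True)[n_top - 1]


# Toy data: 10 donors, 3 of whom donate again.
y_true = [1, 0, 1, 0, 0, 0, 0, 0, 1, 0]
scores = [0.91, 0.85, 0.60, 0.55, 0.40, 0.35, 0.30, 0.20, 0.15, 0.05]

top20_precision = precision_at_top_fraction(y_true, scores, fraction=0.20)
top20_cutoff = threshold_for_top_fraction(scores, fraction=0.20)

# Lift implied by the brief: target precision divided by the base rate.
target_lift = 0.50 / 0.18
```

On the toy data the top 20% contains two donors, one of whom gives again, so `top20_precision` is 0.5 and `top20_cutoff` is 0.85. A capacity-based cutoff like this is one option for the production threshold; a cost-based threshold is another.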
Constraints
- Weekly batch scoring must finish in under 30 minutes.
- The model should be interpretable enough for fundraising and CRM teams.
- Retraining should be feasible monthly with standard Python tooling.
- Avoid data leakage: features must not draw on donations or campaign activity that occur after the prediction date.
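
The leakage constraint can be illustrated with a cutoff date: features are computed only from events on or before the cutoff, and the label looks only at the 90 days that follow. The sketch below assumes a donor's history is available as a list of donation dates. The helper name `build_example` is hypothetical, and only two of the brief's features are shown.

```python
from datetime import date, timedelta

LABEL_WINDOW_DAYS = 90  # prediction horizon stated in the brief


def build_example(donation_dates, cutoff):
    """Build (features, label) for one donor as of `cutoff`.

    Features see only donations on or before the cutoff; the label sees
    only donations in the 90 days after it.
    """
    window_end = cutoff + timedelta(days=LABEL_WINDOW_DAYS)
    past = [d for d in donation_dates if d <= cutoff]
    future = [d for d in donation_dates if cutoff < d <= window_end]
    features = {
        "lifetime_donation_count": len(past),
        # None when the donor has no donation before the cutoff.
        "days_since_last_donation": (cutoff - max(past)).days if past else None,
    }
    label = int(len(future) > 0)
    return features, label


# Toy donor history, invented for illustration.
donations = [date(2023, 1, 10), date(2023, 5, 1), date(2023, 7, 15)]
features, label = build_example(donations, cutoff=date(2023, 6, 1))
```

For this donor and a cutoff of 2023-06-01, the features count two donations, the last one 31 days earlier, and the label is 1 because the 2023-07-15 gift falls inside the window. The same rule would apply to engagement features such as `emails_opened_30d`: the 30-day lookback should end at the cutoff.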
Deliverables
- Build a binary classification model to predict donation in the next 90 days.
- Define a leakage-safe train/validation/test strategy using time-based splits.
- Engineer useful donor behavior and recency features.
- Compare a simple baseline with a stronger tree-based model.
- Report evaluation metrics and recommend a production decision threshold.
- Summarize feature importance and operational tradeoffs.
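
One way to approach the time-based split deliverable is to build donor snapshots at several cutoff dates and assign whole snapshots to train, validation, or test in chronological order. The sketch below goes a step further and drops any snapshot whose 90-day label window would cross into the next split. That is a conservative assumption, not a requirement of the brief. The function name `time_based_split` and the example dates are hypothetical.

```python
from datetime import date, timedelta


def time_based_split(snapshot_dates, val_start, test_start, label_window_days=90):
    """Assign snapshot cutoff dates to train/validation/test in time order.

    A snapshot is kept in a split only if its label window ends on or
    before the start of the next split; otherwise it is dropped.
    """
    window = timedelta(days=label_window_days)
    splits = {"train": [], "validation": [], "test": []}
    for cutoff in sorted(snapshot_dates):
        if cutoff + window <= val_start:
            splits["train"].append(cutoff)
        elif val_start <= cutoff and cutoff + window <= test_start:
            splits["validation"].append(cutoff)
        elif cutoff >= test_start:
            splits["test"].append(cutoff)
        # Anything else straddles a boundary and is left out.
    return splits


# Illustrative snapshot dates only.
snapshots = [
    date(2023, 1, 1), date(2023, 4, 1), date(2023, 6, 1),
    date(2023, 7, 1), date(2023, 10, 1), date(2024, 1, 1),
]
splits = time_based_split(
    snapshots, val_start=date(2023, 7, 1), test_start=date(2024, 1, 1)
)
```

Here the 2023-06-01 snapshot is dropped because its label window runs to 2023-08-30, past the validation start. The baseline and the tree-based model would then be fit on the train snapshots, tuned on validation, and reported once on test.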