ShieldGate operates a cloud Web Application Firewall (WAF) protecting 12,000 customer web applications and processing roughly 180 million HTTP requests per day. The security team wants a machine learning classifier that distinguishes legitimate human traffic from malicious bots so the WAF can block or challenge suspicious requests without hurting real users.
You are given request-level logs collected over 30 days of WAF traffic. Each row represents one HTTP request, together with engineered session and network features that are available at scoring time.
| Feature Group | Count | Examples |
|---|---|---|
| Request metadata | 12 | method, path_depth, query_length, content_type, response_status |
| Header and client signals | 10 | user_agent_family, accept_language_present, cookie_count, header_count |
| Behavioral features | 11 | requests_per_minute_ip, inter_request_time_ms, repeated_path_ratio, session_duration_sec |
| Network and reputation | 8 | asn, country, ip_reputation_score, proxy_flag, datacenter_ip_flag |
| Browser consistency | 6 | ua_os_browser_match, js_challenge_passed, tls_fingerprint_rarity |
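
As a quick sanity check on the table above, the feature groups can be written down as a small Python mapping. This is only an illustrative sketch: the name `FEATURE_GROUPS` and the helper `total_features` are hypothetical, and the example columns are just the ones listed in the table, not the full schema.

```python
# Hypothetical schema summary built from the feature table above.
# Counts and example names come from the table; the structure itself
# is an assumption made for illustration.
FEATURE_GROUPS = {
    "request_metadata": {
        "count": 12,
        "examples": ["method", "path_depth", "query_length",
                     "content_type", "response_status"],
    },
    "header_and_client_signals": {
        "count": 10,
        "examples": ["user_agent_family", "accept_language_present",
                     "cookie_count", "header_count"],
    },
    "behavioral_features": {
        "count": 11,
        "examples": ["requests_per_minute_ip", "inter_request_time_ms",
                     "repeated_path_ratio", "session_duration_sec"],
    },
    "network_and_reputation": {
        "count": 8,
        "examples": ["asn", "country", "ip_reputation_score",
                     "proxy_flag", "datacenter_ip_flag"],
    },
    "browser_consistency": {
        "count": 6,
        "examples": ["ua_os_browser_match", "js_challenge_passed",
                     "tls_fingerprint_rarity"],
    },
}


def total_features() -> int:
    """Sum the per-group feature counts listed in the table."""
    return sum(group["count"] for group in FEATURE_GROUPS.values())


print(total_features())  # 47
```

Summing the counts gives 47 engineered features per request across the five groups.
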
A good solution should catch most malicious bots while keeping false positives low enough for production use. Target at least 90% recall on malicious bots with at least 70% precision on the blocked/challenged class, and support threshold tuning so that customers with different risk profiles can trade recall against false positives.
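
One possible way to make the recall and precision targets concrete is to scan candidate score thresholds on a labeled validation set and keep the one that satisfies both constraints. The sketch below is a minimal, dependency-free illustration, not a prescribed method: the function names `precision_recall_at` and `pick_threshold` are hypothetical, it assumes the classifier outputs a bot score where higher means more suspicious, and it assumes label `1` marks a malicious bot.

```python
from typing import Optional, Sequence, Tuple


def precision_recall_at(
    scores: Sequence[float], labels: Sequence[int], threshold: float
) -> Tuple[float, float]:
    """Precision and recall when requests with score >= threshold are blocked/challenged."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def pick_threshold(
    scores: Sequence[float],
    labels: Sequence[int],
    min_recall: float = 0.90,     # recall target from the brief
    min_precision: float = 0.70,  # precision target from the brief
) -> Optional[Tuple[float, float, float]]:
    """Return (threshold, precision, recall) for the feasible threshold with
    the highest precision, or None if no threshold meets both targets."""
    best = None
    for t in sorted(set(scores)):
        p, r = precision_recall_at(scores, labels, t)
        if r >= min_recall and p >= min_precision:
            # Prefer higher precision; break ties with the higher threshold.
            if best is None or (p, t) > (best[1], best[0]):
                best = (t, p, r)
    return best


# Toy validation data (made up for illustration only).
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
toy_labels = [1, 1, 1, 0, 1, 0, 0, 0]
print(pick_threshold(toy_scores, toy_labels))
```

A customer with a stricter risk profile could call `pick_threshold` with a higher `min_recall`, while one that is more sensitive to false positives could raise `min_precision`; a `None` result signals that the model cannot meet the requested combination on that validation set.
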