NetShield operates managed firewalls for 8,000 mid-market customers and wants a lightweight ML classifier to flag potentially malicious network connections before they are allowed through the perimeter. The security team needs a model that complements rule-based firewall policies by prioritizing suspicious traffic for blocking or analyst review.
You are given historical firewall connection logs labeled by the security operations team. Each record carries 42 features in five groups:

| Feature Group | Count | Examples |
|---|---|---|
| Network flow metrics | 12 | duration_ms, bytes_in, bytes_out, packets, avg_packet_size |
| Protocol and port data | 8 | protocol, src_port, dst_port, well_known_service |
| Endpoint metadata | 7 | src_ip_internal, dst_geo_region, device_type, subnet_risk_score |
| Session behavior | 9 | failed_connection_count_1h, distinct_dst_ports_24h, connection_rate_5m |
| Firewall context | 6 | rule_id, action_history, policy_zone, time_of_day |

A strong solution should achieve recall >= 0.85 on malicious traffic while keeping precision >= 0.40, since missed attacks are costly but excessive false positives create alert fatigue. The model should also provide feature importance so analysts can understand why traffic was flagged.
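
One way to check the recall and precision targets is to sweep the decision threshold over the classifier's scores and keep the highest threshold that satisfies both. The sketch below is a minimal, standard-library-only illustration under stated assumptions: the model emits a score per connection where higher means more suspicious, labels are 1 for malicious and 0 otherwise, and the helper names (`precision_recall_at`, `pick_threshold`) and the toy data are hypothetical, not part of the brief.

```python
from typing import List, Optional, Tuple

# Targets taken from the brief; everything else here is illustrative.
MIN_RECALL = 0.85
MIN_PRECISION = 0.40


def precision_recall_at(
    scores: List[float], labels: List[int], threshold: float
) -> Tuple[float, float]:
    """Return (precision, recall) when every score >= threshold is flagged."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def pick_threshold(
    scores: List[float], labels: List[int]
) -> Optional[Tuple[float, float, float]]:
    """Highest threshold meeting both targets as (threshold, precision, recall).

    Returns None when no threshold satisfies both constraints.
    The scan is quadratic in the number of records, which is fine for a
    sketch but too slow for full-size firewall logs.
    """
    for t in sorted(set(scores), reverse=True):
        p, r = precision_recall_at(scores, labels, t)
        if r >= MIN_RECALL and p >= MIN_PRECISION:
            return t, p, r
    return None


# Hypothetical validation scores and labels, invented for illustration only.
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30, 0.20, 0.10]
labels = [1, 1, 0, 1, 0, 1, 0, 0, 0, 0]

print(pick_threshold(scores, labels))
```

Preferring the highest qualifying threshold flags the fewest connections that still meet the recall target, which limits analyst workload. The threshold should be chosen on a held-out validation split rather than on the training data, and a `None` result indicates that the model itself needs improvement, not the threshold.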