Classify Malicious Firewall Traffic

Business Context

NetShield operates managed firewalls for 8,000 mid-market customers and wants a lightweight ML classifier to flag potentially malicious network connections before they are allowed through the perimeter. The security team needs a model that complements rule-based firewall policies by prioritizing suspicious traffic for blocking or analyst review.

Dataset

You are given historical firewall connection logs labeled by the security operations team.

Feature Group	Count	Examples
Network flow metrics	12	duration_ms, bytes_in, bytes_out, packets, avg_packet_size
Protocol and port data	8	protocol, src_port, dst_port, well_known_service
Endpoint metadata	7	src_ip_internal, dst_geo_region, device_type, subnet_risk_score
Session behavior	9	failed_connection_count_1h, distinct_dst_ports_24h, connection_rate_5m
Firewall context	6	rule_id, action_history, policy_zone, time_of_day

Size: 420K connection records collected over 6 weeks, 42 features
Target: Binary label — malicious connection (1) vs benign connection (0)
Class balance: 6.5% malicious, 93.5% benign
Missing data: ~10% missing in geo/device enrichment fields and ~3% missing in behavioral aggregates for newly observed IPs
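
The class balance above implies how few positive examples are available and how strongly they might be up-weighted during training. A minimal sketch of that arithmetic, using only the figures stated in this section; the variable names are illustrative, and using the benign-to-malicious ratio as a positive-class weight is one common heuristic, not a requirement of the problem:

```python
# Figures taken from the dataset description above.
n_records = 420_000
malicious_rate = 0.065

# Approximate number of labeled examples per class.
n_malicious = round(n_records * malicious_rate)
n_benign = n_records - n_malicious

# Assumed heuristic: weight each malicious example by the benign/malicious ratio
# so both classes contribute similar total weight to the training loss.
pos_weight = n_benign / n_malicious

# A classifier that flags every connection has precision equal to the base rate.
flag_everything_precision = n_malicious / n_records

print(n_malicious, n_benign, round(pos_weight, 2), round(flag_everything_precision, 3))
```

The base-rate precision of 0.065 is a useful floor when reading any precision figure reported for this dataset.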

Success Criteria

A strong solution should achieve recall >= 0.85 on malicious traffic while keeping precision >= 0.40, since missed attacks are costly but excessive false positives create alert fatigue. The model should also provide feature importance so analysts can understand why traffic was flagged.
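
One way to turn these two targets into an operating point is to sweep candidate thresholds on a held-out set and keep the most precise one that still meets the recall floor. The sketch below is a plain-Python illustration under that assumption; `precision_recall_at` and `pick_threshold` are hypothetical helper names, and the scores are made-up toy values rather than model output:

```python
from typing import Optional, Sequence, Tuple


def precision_recall_at(scores: Sequence[float], labels: Sequence[int],
                        threshold: float) -> Tuple[float, float]:
    """Precision and recall when every score >= threshold is flagged as malicious."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def pick_threshold(scores: Sequence[float], labels: Sequence[int],
                   min_recall: float = 0.85,
                   min_precision: float = 0.40) -> Optional[Tuple[float, float, float]]:
    """Return (threshold, precision, recall) with the highest precision among
    thresholds that satisfy both floors, or None if no threshold qualifies."""
    best = None
    for t in sorted(set(scores)):
        p, r = precision_recall_at(scores, labels, t)
        if r >= min_recall and p >= min_precision:
            if best is None or p > best[1]:
                best = (t, p, r)
    return best


# Toy validation scores and labels (illustrative only).
toy_scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.1]
toy_labels = [1, 1, 0, 1, 0, 0, 0]
print(pick_threshold(toy_scores, toy_labels))
```

If the function returns None, no threshold meets both targets on that validation set, which signals that the model itself needs work rather than the cutoff.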

Constraints

Inference latency must stay under 20 ms per connection in an online scoring service
The first production version must be explainable to security analysts
Retraining can happen daily or weekly, but not continuously
The solution should avoid leakage from future behavioral aggregates
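
The leakage constraint can be made concrete with a point-in-time aggregate: when a feature such as failed_connection_count_1h is computed for a connection, only events already seen in the log may contribute. The following is a sketch under assumptions, not the dataset's actual feature logic; the event tuple layout `(timestamp_seconds, src_ip, failed)` and the function name are hypothetical:

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Iterable, List, Tuple

WINDOW_SECONDS = 3600  # one-hour lookback


def failed_count_1h(events: Iterable[Tuple[int, str, bool]]) -> List[int]:
    """For each event, count failed connections from the same source IP that
    appear earlier in the log and fall within the preceding hour.

    `events` must be sorted by timestamp. The current event is added to the
    window only after its own count is recorded, so it never sees itself or
    anything that happens later.
    """
    recent_failures: Dict[str, Deque[int]] = defaultdict(deque)
    counts: List[int] = []
    for ts, src_ip, failed in events:
        window = recent_failures[src_ip]
        # Drop failures that are one hour old or older.
        while window and window[0] <= ts - WINDOW_SECONDS:
            window.popleft()
        counts.append(len(window))
        if failed:
            window.append(ts)
    return counts


# Toy log (illustrative only): (timestamp_seconds, src_ip, failed)
toy_events = [
    (0, "a", True),
    (100, "a", True),
    (200, "b", True),
    (3000, "a", False),
    (3650, "a", True),
    (3700, "a", False),
]
print(failed_count_1h(toy_events))
```

The same ordering discipline applies to evaluation: splitting train and test by time rather than at random keeps later traffic out of the training aggregates.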

Deliverables

Build a binary classification pipeline for malicious traffic detection
Explain model choice, preprocessing, and leakage prevention
Evaluate the model with appropriate imbalanced-class metrics and threshold tuning
Show how firewall-related features are engineered from raw logs
Describe how the model would be deployed alongside existing firewall rules
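
For the last deliverable, one possible arrangement is to let existing firewall rules keep the final say on denials and use the model score only to escalate traffic the rules would otherwise allow. The sketch below assumes that arrangement; the `decide` function, the action strings, and both threshold values are hypothetical placeholders, not part of the problem statement:

```python
def decide(rule_action: str, score: float,
           block_at: float = 0.9, review_at: float = 0.5) -> str:
    """Combine a rule verdict with a model score into one action.

    rule_action: verdict from the existing rule engine, "deny" or "allow" (assumed values).
    score: model-estimated probability that the connection is malicious.
    block_at / review_at: placeholder thresholds that would be tuned on validation data.
    """
    if rule_action == "deny":
        return "block"   # rules are never overridden by the model
    if score >= block_at:
        return "block"   # high-confidence model detection
    if score >= review_at:
        return "review"  # queue for analyst triage
    return "allow"


print(decide("allow", 0.95), decide("allow", 0.6), decide("allow", 0.1), decide("deny", 0.0))
```

Keeping the rule verdict authoritative means a bad model release can add alerts but cannot open traffic that policy already blocks.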
