Detect Port Scan Sources

Medium

Coding

Problem

At Meta, security systems ingest large server logs from services such as Proxygen and edge infrastructure. Write a function that parses log lines and returns the source IP addresses that appear to be scanning for open ports.

An IP is considered suspicious if, within any sliding time window of window_seconds, it attempts connections to at least threshold distinct destination ports.

Formal Specification

Implement a function that takes the following parameters:

logs: a list of strings, where each string has the format:
"<timestamp> <source_ip> <destination_port>"
window_seconds: integer window size in seconds
threshold: integer minimum number of distinct destination ports in the window

Return a list of suspicious source IPs in lexicographic order.

Each timestamp is a non-negative integer. Multiple log lines may share the same timestamp. If a line is malformed, ignore it.
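The statement does not define "malformed", so the following is only a minimal sketch of one possible reading. It assumes a valid line has exactly three whitespace-separated fields, a non-negative integer timestamp, a port in 1..65535, and a source IP that Python's ipaddress module accepts. The helper name parse_log_line is hypothetical and not part of the problem.

```python
import ipaddress
from typing import Optional, Tuple

MAX_PORT = 65535  # upper bound taken from the problem's constraints


def parse_log_line(line: str) -> Optional[Tuple[int, str, int]]:
    """Return (timestamp, source_ip, destination_port), or None if malformed.

    Hypothetical helper: the validity rules below are assumptions, since the
    problem only says that malformed lines should be ignored.
    """
    parts = line.split()
    if len(parts) != 3:
        return None
    ts_text, source_ip, port_text = parts

    # Accept plain ASCII digits only; this rejects "-1", "1.5" and "1_000".
    if not (ts_text.isascii() and ts_text.isdigit()):
        return None
    if not (port_text.isascii() and port_text.isdigit()):
        return None

    try:
        ipaddress.ip_address(source_ip)  # raises ValueError on a bad address
        timestamp, port = int(ts_text), int(port_text)
    except ValueError:
        return None

    if not 1 <= port <= MAX_PORT:
        return None
    return timestamp, source_ip, port
```

The source IP is returned as the original string so that the final lexicographic sort operates on the text exactly as it appeared in the log.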

Examples

Example 1

Input:
logs = [
  "1 10.0.0.1 22",
  "2 10.0.0.1 80",
  "3 10.0.0.1 443",
  "4 10.0.0.2 22"
], window_seconds = 3, threshold = 3
Output:
["10.0.0.1"]

10.0.0.1 touches 3 distinct ports within timestamps 1..3.

Example 2

Input:
logs = ["1 1.1.1.1 80", "10 1.1.1.1 443", "20 1.1.1.1 8080"], window_seconds = 5, threshold = 2
Output:
[]

No 5-second window contains 2 distinct ports: consecutive events are 9 and 10 seconds apart.

Constraints

1 <= len(logs) <= 2 * 10^5
1 <= window_seconds <= 10^6
1 <= threshold <= 65535
1 <= destination_port <= 65535
The solution should be efficient enough for large logs; with up to 2 * 10^5 lines, a quadratic scan is likely too slow.
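One way to stay within these limits is to group events by source IP, sort each group by timestamp, and slide a two-pointer window while counting distinct ports. The sketch below is not a reference solution and rests on several assumptions. The function name find_port_scanners is hypothetical. The window is treated as covering window_seconds consecutive integer timestamps, so two events share a window when their timestamps differ by less than window_seconds; this matches both examples, but the statement does not settle the boundary case. Logs are not assumed to arrive sorted. A line counts as malformed unless it has three fields, an integer timestamp, a port in 1..65535, and an IP that ipaddress accepts.

```python
import ipaddress
from collections import defaultdict
from typing import Dict, List, Tuple


def find_port_scanners(logs: List[str], window_seconds: int, threshold: int) -> List[str]:
    """Hypothetical solution sketch; see the assumptions stated above."""
    events_by_ip: Dict[str, List[Tuple[int, int]]] = defaultdict(list)

    for line in logs:
        parts = line.split()
        if len(parts) != 3:
            continue  # malformed: wrong number of fields
        ts_text, source_ip, port_text = parts
        if not (ts_text.isascii() and ts_text.isdigit()):
            continue  # malformed: timestamp is not a non-negative integer
        if not (port_text.isascii() and port_text.isdigit()):
            continue  # malformed: port is not an integer
        try:
            ipaddress.ip_address(source_ip)
            timestamp, port = int(ts_text), int(port_text)
        except ValueError:
            continue  # malformed: invalid address (assumed rule)
        if not 1 <= port <= 65535:
            continue  # malformed: port outside the constrained range
        events_by_ip[source_ip].append((timestamp, port))

    suspicious: List[str] = []
    for source_ip, events in events_by_ip.items():
        events.sort()  # input order is not assumed to be chronological
        port_counts: Dict[int, int] = {}
        left = 0
        for timestamp, port in events:
            port_counts[port] = port_counts.get(port, 0) + 1
            # Evict events that fall outside (timestamp - window_seconds, timestamp].
            # The current event always stays, so left never passes it.
            while events[left][0] <= timestamp - window_seconds:
                old_port = events[left][1]
                port_counts[old_port] -= 1
                if port_counts[old_port] == 0:
                    del port_counts[old_port]
                left += 1
            if len(port_counts) >= threshold:
                suspicious.append(source_ip)
                break  # one qualifying window is enough for this IP

    return sorted(suspicious)  # lexicographic order on the IP strings
```

Each event enters and leaves the window at most once, so the cost after parsing is dominated by the per-IP sorts, giving O(n log n) time and O(n) extra space for n log lines under these assumptions.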

Python 3.10
