A fintech security team monitors tens of millions of authentication events per day across mobile and web. Analysts keep repeating the same triage steps: normalize messy logs, deduplicate near-identical events, and quickly identify the most suspicious source IPs. You’re building a small “custom security tool” function that automates this repetitive workflow.
You are given a list of log lines. Each line is a single string formatted as:
"<timestamp> <ip> <action> <status>"
timestamp is an integer (seconds since epoch)
ip is an IPv4 string
action is a lowercase token (e.g., login, reset, transfer)
status is either OK or FAIL

Multiple lines may be duplicated. Lines may also be semantically identical but differ in whitespace or in the casing of status (e.g., fail, FAIL, extra spaces). Your tool must normalize and aggregate.
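The normalization step could be sketched as follows. This is a minimal sketch, not the required implementation: the helper name `parse_line` and the rough IPv4 shape check are assumptions, and malformed lines are silently dropped, which matches the behavior shown in Example 2 below.

```python
import re

# Rough IPv4 shape check (an assumption; the problem only says "an IPv4 string").
IPV4 = re.compile(r"^\d{1,3}(\.\d{1,3}){3}$")

def parse_line(line: str):
    """Normalize one raw log line into (timestamp, ip, action, status), or None if malformed."""
    parts = line.split()  # split() collapses leading/trailing/duplicated whitespace
    if len(parts) != 4:
        return None
    ts, ip, action, status = parts
    if not ts.isdigit() or not IPV4.match(ip):
        return None
    return int(ts), ip, action, status.upper()  # uppercase so fail/Fail/FAIL compare equal
```

For instance, `parse_line("1700000001 10.0.0.1 login fail")` yields `(1700000001, "10.0.0.1", "login", "FAIL")`, while the malformed `" 170 x.y.z login FAIL "` yields `None`.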
For each IP, compute a suspicion score:
score = 3 * (# of FAIL events) + 1 * (# of distinct actions that had at least one FAIL)
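In code the formula is a one-liner; the helper name `suspicion_score` is hypothetical, used only for illustration:

```python
def suspicion_score(fail_count: int, distinct_failed_actions: int) -> int:
    # 3 points per FAIL event, plus 1 per distinct action that failed at least once
    return 3 * fail_count + distinct_failed_actions
```

In Example 1 below, 10.0.0.2 has 2 FAIL events across 2 distinct actions, so `suspicion_score(2, 2)` gives 8.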
Return the top k IPs by descending score. Break ties by:
higher FAIL count, then …

Input: logs: list[str], k: int
Output: list[str] of length min(k, number_of_distinct_ips)

Status is compared case-insensitively (fail, FAIL, Fail all mean FAIL).

Example 1
logs = [
    "1700000000 10.0.0.1 login FAIL",
    "1700000001 10.0.0.1 login fail",
    "1700000002 10.0.0.2 login FAIL",
    "1700000003 10.0.0.2 reset OK",
    "1700000004 10.0.0.2 transfer FAIL"
], k = 2

Output: ['10.0.0.2', '10.0.0.1']

Explanation:
10.0.0.1: FAIL=2, distinct failed actions={login} => score=3*2+1=7
10.0.0.2: FAIL=2, distinct failed actions={login, transfer} => score=3*2+2=8

Example 2
logs = [" 170 x.y.z login FAIL ", "1700000000 1.1.1.1 login OK", "1700000001 1.1.1.1 reset FAIL"], k = 5

Output: ['1.1.1.1']

Explanation:
The first line is malformed (x.y.z is not a valid IPv4 address) and is dropped during normalization.
1.1.1.1: FAIL=1, distinct failed actions={reset} => score=3*1+1=4

Constraints:
1 <= logs.length <= 2 * 10^5
… <= 200
0 <= k <= 10^5
action contains no spaces.
If k = 0, return an empty list.
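Putting the pieces together, one possible end-to-end sketch follows. The function name `top_suspicious_ips`, the choice to deduplicate only identical normalized events, and the final lexicographic-IP tie-break are all assumptions (the source truncates the last tie-break rule).

```python
from collections import defaultdict

def top_suspicious_ips(logs: list[str], k: int) -> list[str]:
    if k == 0:
        return []
    seen = set()                       # dedupe identical events after normalization (assumption)
    fails = defaultdict(int)           # ip -> number of FAIL events
    failed_actions = defaultdict(set)  # ip -> actions that failed at least once
    ips = set()
    for line in logs:
        parts = line.split()           # normalizes stray whitespace
        if len(parts) != 4 or not parts[0].isdigit():
            continue                   # skip malformed lines, as in Example 2
        ts, ip, action, status = parts
        octets = ip.split(".")
        if len(octets) != 4 or not all(o.isdigit() for o in octets):
            continue                   # rough IPv4 check (assumption)
        event = (int(ts), ip, action, status.upper())
        if event in seen:
            continue
        seen.add(event)
        ips.add(ip)
        if event[3] == "FAIL":
            fails[ip] += 1
            failed_actions[ip].add(action)
    score = {ip: 3 * fails[ip] + len(failed_actions[ip]) for ip in ips}
    ranked = sorted(ips, key=lambda ip: (-score[ip], -fails[ip], ip))
    return ranked[:k]
```

On Example 1 this returns ['10.0.0.2', '10.0.0.1'] (scores 8 and 7); on Example 2 it returns ['1.1.1.1'], since the x.y.z line is discarded and k is capped at the number of distinct IPs.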