Canonicalize and Group Stack Traces

Problem

You’re on-call for a fintech payments platform that processes millions of card authorizations per day. A React checkout page occasionally crashes, and your error pipeline receives huge volumes of stack traces whose URLs vary across CDNs and builds (for example, through cache-busting query strings). To triage quickly, you need to deduplicate traces that are “the same” after normalization.

Each stack trace is a list of frame strings in this format:

<function>@<url>:<line>:<column>

Example frame: render@https://cdn.app.com/static/js/main.9f3a.js?build=123:120:9

Canonicalization rules

For each frame, convert it to:

<function>@<file_name>:<line>

Where:

Ignore the host and path; keep only the last path segment (file_name).
Remove any query string (anything after ?) before extracting the file name.
Keep the line number.
Ignore the column number.
Within a single trace, collapse consecutive duplicate canonical frames (e.g., A, A, A, B → A, B).
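
The rules above can be sketched as two small helpers. This is a minimal illustration, not a reference solution: the names canonicalize_frame and canonicalize_trace are hypothetical, and the parsing assumes every frame follows the <function>@<url>:<line>:<column> format exactly.

```python
def canonicalize_frame(frame: str) -> str:
    """Turn '<function>@<url>:<line>:<column>' into '<function>@<file_name>:<line>'."""
    # Split off the function name at the '@'.
    function, location = frame.split("@", 1)
    # Split from the right: the URL itself contains ':' (e.g. 'https:'),
    # so only the last two ':' separate line and column.
    url, line, _column = location.rsplit(":", 2)
    # Drop the query string first, so a '/' inside it cannot be
    # mistaken for a path separator.
    url = url.split("?", 1)[0]
    # Keep only the last path segment.
    file_name = url.rsplit("/", 1)[-1]
    return f"{function}@{file_name}:{line}"


def canonicalize_trace(trace: list[str]) -> tuple[str, ...]:
    """Canonicalize every frame and collapse consecutive duplicates."""
    frames: list[str] = []
    for frame in trace:
        canonical = canonicalize_frame(frame)
        # Only append when the frame differs from the previous canonical frame.
        if not frames or frames[-1] != canonical:
            frames.append(canonical)
    return tuple(frames)
```

Note that the duplicate check compares canonical frames, not raw ones: two raw frames that differ only in column or query string still collapse into one.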

Task

Implement:

def group_stack_traces(traces: list[list[str]]) -> dict[str, int]:

Return a dictionary mapping a stringified tuple of canonical frames to the number of traces that normalize to that exact canonical sequence.

Note: The key must be the Python str() of the tuple (e.g., "('a@app.js:1', 'b@app.js:2')") to match the evaluation harness.
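
One possible shape for the whole function is sketched below, assuming well-formed input as described in the constraints. The helper name _canonical is hypothetical; only group_stack_traces and its signature come from the task statement.

```python
from collections import Counter


def _canonical(frame: str) -> str:
    # '<function>@<url>:<line>:<column>' -> '<function>@<file_name>:<line>'
    function, location = frame.split("@", 1)
    url, line, _column = location.rsplit(":", 2)
    file_name = url.split("?", 1)[0].rsplit("/", 1)[-1]
    return f"{function}@{file_name}:{line}"


def group_stack_traces(traces: list[list[str]]) -> dict[str, int]:
    counts = Counter()
    for trace in traces:
        frames = []
        for frame in trace:
            canonical = _canonical(frame)
            # Collapse consecutive duplicate canonical frames.
            if not frames or frames[-1] != canonical:
                frames.append(canonical)
        # The key is the str() of the tuple, as the task requires.
        counts[str(tuple(frames))] += 1
    return dict(counts)
```

One detail worth keeping in mind: str() of a one-element tuple includes a trailing comma, so a trace that collapses to a single frame produces a key such as "('a@app.js:1',)".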

Examples

Example 1

Input:

traces = [
    ["render@https://cdn.app.com/static/js/main.9f3a.js?build=123:120:9",
     "render@https://cdn.app.com/static/js/main.9f3a.js?build=123:120:9",
     "commitRoot@https://cdn.app.com/static/js/vendor.1a2b.js:88:1"],
    ["render@https://cdn.app.com/static/js/main.9f3a.js?build=999:120:3",
     "commitRoot@https://cdn.app.com/static/js/vendor.1a2b.js?x=y:88:99"],
]

Output:

{"('render@main.9f3a.js:120', 'commitRoot@vendor.1a2b.js:88')": 2}

Explanation: Both traces canonicalize to the same 2-frame sequence; the repeated render frame in the first trace is collapsed.

Example 2

Input:

traces = [
    ["A@https://a.com/app.js:10:1", "B@https://a.com/app.js:11:1"],
    ["A@https://a.com/app.js:10:9", "B@https://a.com/app.js:12:1"],
]

Output:

{"('A@app.js:10', 'B@app.js:11')": 1, "('A@app.js:10', 'B@app.js:12')": 1}

Explanation: Columns are ignored, but line numbers are kept, so the second frame differs (11 vs 12).

Constraints

1 <= len(traces) <= 2 * 10^4
1 <= len(trace) <= 200
sum(len(trace) for trace in traces) <= 2 * 10^6
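Each frame contains exactly one '@' and ends with ':<line>:<column>'; the URL itself may contain additional ':' characters (for example, after 'https'), so the line and column are delimited by the last two ':'.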
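
With up to 2 * 10^6 frames in total, a single pass that does constant work per frame (beyond scanning the string) should be sufficient. If many raw frame strings repeat verbatim across traces, which is an assumption about the data rather than something the problem guarantees, memoizing the per-frame canonicalization may avoid re-parsing them. The sketch below uses functools.lru_cache for this; the names canonicalize_cached and group_stack_traces_cached are hypothetical.

```python
from collections import Counter
from functools import lru_cache


@lru_cache(maxsize=None)  # unbounded cache: trades memory for repeated parsing
def canonicalize_cached(frame: str) -> str:
    function, location = frame.split("@", 1)
    url, line, _column = location.rsplit(":", 2)
    file_name = url.split("?", 1)[0].rsplit("/", 1)[-1]
    return f"{function}@{file_name}:{line}"


def group_stack_traces_cached(traces: list[list[str]]) -> dict[str, int]:
    counts = Counter()
    for trace in traces:
        frames = []
        for frame in trace:
            # Identical raw frame strings are parsed only once.
            canonical = canonicalize_cached(frame)
            if not frames or frames[-1] != canonical:
                frames.append(canonical)
        counts[str(tuple(frames))] += 1
    return dict(counts)
```

The cache is keyed on the raw frame string, so frames that differ only in column or query string are still parsed separately. An unbounded cache can hold one entry per distinct raw frame, so a bounded maxsize may be preferable when memory is tight.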
