Normalize Agent Evaluation Tags

Problem

In a Databricks AI pipeline, mlflow.evaluate and LLM-as-Judge outputs may attach free-form issue labels such as faithfulness, groundedness, or prompt injection to each response. Write idiomatic Python to normalize these labels into a clean, deterministic list suitable for downstream use in Mosaic AI and Model Serving logs.

Implement a function that takes a list of raw label strings and returns a sorted list of unique normalized labels.

Formal Specification

Input: labels, a list of strings.
Output: a list of normalized labels. Each input label is processed with the following steps, applied in order:
1. lowercase the label,
2. trim leading/trailing whitespace,
3. collapse each internal run of spaces, underscores, or hyphens into a single hyphen,
4. strip every remaining character that is neither alphanumeric nor a hyphen.
The resulting labels are then:
5. deduplicated,
6. returned in ascending lexicographic order.
Ignore labels that become empty after normalization.
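
The steps above can be sketched as follows. This is a minimal sketch, not a reference solution: the function name normalize_labels and the two module-level patterns are hypothetical, "alphanumeric" is assumed to mean ASCII letters and digits, and the steps are applied literally in the order listed (so a hyphen left at the edge of a label, or two hyphens left adjacent after punctuation is removed, are kept as-is because the specification does not say otherwise).

```python
import re

# Assumption: only spaces, underscores, and hyphens count as separators (step 3).
_SEPARATORS = re.compile(r"[ _-]+")
# Assumption: "alphanumeric" means ASCII a-z and 0-9 after lowercasing (step 4).
_DISALLOWED = re.compile(r"[^a-z0-9-]")


def normalize_labels(labels: list[str]) -> list[str]:
    """Return the sorted, de-duplicated normalized form of the raw labels."""
    normalized = set()
    for raw in labels:
        label = raw.lower().strip()          # steps 1 and 2
        label = _SEPARATORS.sub("-", label)  # step 3: runs of separators -> one hyphen
        label = _DISALLOWED.sub("", label)   # step 4: drop everything else
        if label:                            # ignore labels that end up empty
            normalized.add(label)            # step 5: the set removes duplicates
    return sorted(normalized)                # step 6: ascending lexicographic order
```

Collecting into a set and sorting once at the end keeps the work roughly linear in the total input length, plus the cost of sorting the unique labels, which fits comfortably within the stated constraints.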

Examples

Example 1
Input: labels = [" Faithfulness ", "groundedness", "Prompt Injection", "prompt_injection"]
Output: ["faithfulness", "groundedness", "prompt-injection"]

Explanation: Prompt Injection and prompt_injection normalize to the same canonical label.

Example 2
Input: labels = ["DBRX", "dbrx!!", "", " ", "model-serving"]
Output: ["dbrx", "model-serving"]

Explanation: Empty values are ignored, punctuation is removed, and duplicates are collapsed.

Constraints

1 <= len(labels) <= 10^4
0 <= len(labels[i]) <= 200
Labels may contain letters, digits, spaces, underscores, hyphens, and punctuation.
