Interview Guides

Normalize Agent Evaluation Tags | Dataford Interview Questions - Dataford - Ace your Interview

Normalize Agent Evaluation Tags

Easy

Coding

Asked at 1 company1ArraysStringsHash Tables

Also asked at

Problem

In a Databricks AI pipeline, mlflow.evaluate and LLM-as-Judge outputs may attach free-form issue labels such as faithfulness, groundedness, or prompt injection to each response. Write idiomatic Python to normalize these labels into a clean, deterministic list suitable for downstream use in Mosaic AI and Model Serving logs.

Implement a function that takes a list of raw label strings and returns a sorted list of unique normalized labels.

Formal Specification

Input: labels, a list of strings.
Output: a list of strings where each label is:
1. lowercased,
2. trimmed of leading/trailing whitespace,
3. internal runs of spaces, underscores, or hyphens collapsed into a single hyphen,
4. stripped of all other non-alphanumeric characters except hyphens,
5. deduplicated,
6. returned in ascending lexicographic order.
Ignore labels that become empty after normalization.

Examples

Example 1 Input: labels = [" Faithfulness ", "groundedness", "Prompt Injection", "prompt_injection"] Output: ["faithfulness", "groundedness", "prompt-injection"]

Explanation: Prompt Injection and prompt_injection normalize to the same canonical label.

Example 2 Input: labels = ["DBRX", "dbrx!!", "", " ", "model-serving"] Output: ["dbrx", "model-serving"]

Explanation: Empty values are ignored, punctuation is removed, and duplicates are collapsed.

Constraints

1 <= len(labels) <= 10^4
0 <= len(labels[i]) <= 200
Labels may contain letters, digits, spaces, underscores, hyphens, and punctuation.

Examples

Example 1

Inputlabels = [" Faithfulness ", "groundedness", "Prompt Injection", "prompt_injection"]Output["faithfulness", "groundedness", "prompt-injection"]WhyWhitespace and casing are normalized, and both prompt-injection variants collapse to the same canonical label.

Example 2

Inputlabels = ["DBRX", "dbrx!!", "", " ", "model-serving"]Output["dbrx", "model-serving"]WhyPunctuation is removed, empty strings are ignored, and duplicate normalized values are deduplicated.

Constraints

1 <= len(labels) <= 10^4
0 <= len(labels[i]) <= 200
Each label may contain letters, digits, spaces, underscores, hyphens, and punctuation
Return labels in ascending lexicographic order
Ignore labels that normalize to an empty string

Function Signature

def normalize_labels(labels):

Problem

Implement a function that takes a list of raw label strings and returns a sorted list of unique normalized labels.

Formal Specification

Input: labels, a list of strings.
Output: a list of strings where each label is:
1. lowercased,
2. trimmed of leading/trailing whitespace,
3. internal runs of spaces, underscores, or hyphens collapsed into a single hyphen,
4. stripped of all other non-alphanumeric characters except hyphens,
5. deduplicated,
6. returned in ascending lexicographic order.
Ignore labels that become empty after normalization.

Examples

Example 1 Input: labels = [" Faithfulness ", "groundedness", "Prompt Injection", "prompt_injection"] Output: ["faithfulness", "groundedness", "prompt-injection"]

Explanation: Prompt Injection and prompt_injection normalize to the same canonical label.

Example 2 Input: labels = ["DBRX", "dbrx!!", "", " ", "model-serving"] Output: ["dbrx", "model-serving"]

Explanation: Empty values are ignored, punctuation is removed, and duplicates are collapsed.

Constraints

1 <= len(labels) <= 10^4
0 <= len(labels[i]) <= 200
Labels may contain letters, digits, spaces, underscores, hyphens, and punctuation.

Examples

Example 1

Example 2

Inputlabels = ["DBRX", "dbrx!!", "", " ", "model-serving"]Output["dbrx", "model-serving"]WhyPunctuation is removed, empty strings are ignored, and duplicate normalized values are deduplicated.

Constraints

1 <= len(labels) <= 10^4
0 <= len(labels[i]) <= 200
Each label may contain letters, digits, spaces, underscores, hyphens, and punctuation
Return labels in ascending lexicographic order
Ignore labels that normalize to an empty string

Function Signature

def normalize_labels(labels):

Practice Python

Python 3.10

Open on desktop for the full Python editor with syntax highlighting and autocomplete.

Up next

Tokenize Databricks Agent JSON LogsMedium

Implement a Minimal MLflow Agent Evaluation Runner and Explain Why DatabricksEasy Clean Dataset RecordsEasy

Next question

Normalize Agent Evaluation Tags

Easy

Coding

Asked at 1 company1ArraysStringsHash Tables

Also asked at

Problem

Implement a function that takes a list of raw label strings and returns a sorted list of unique normalized labels.

Formal Specification

Input: labels, a list of strings.
Output: a list of strings where each label is:
1. lowercased,
2. trimmed of leading/trailing whitespace,
3. internal runs of spaces, underscores, or hyphens collapsed into a single hyphen,
4. stripped of all other non-alphanumeric characters except hyphens,
5. deduplicated,
6. returned in ascending lexicographic order.
Ignore labels that become empty after normalization.

Examples

Example 1 Input: labels = [" Faithfulness ", "groundedness", "Prompt Injection", "prompt_injection"] Output: ["faithfulness", "groundedness", "prompt-injection"]

Explanation: Prompt Injection and prompt_injection normalize to the same canonical label.

Example 2 Input: labels = ["DBRX", "dbrx!!", "", " ", "model-serving"] Output: ["dbrx", "model-serving"]

Explanation: Empty values are ignored, punctuation is removed, and duplicates are collapsed.

Constraints

1 <= len(labels) <= 10^4
0 <= len(labels[i]) <= 200
Labels may contain letters, digits, spaces, underscores, hyphens, and punctuation.

Examples

Example 1

Example 2

Inputlabels = ["DBRX", "dbrx!!", "", " ", "model-serving"]Output["dbrx", "model-serving"]WhyPunctuation is removed, empty strings are ignored, and duplicate normalized values are deduplicated.

Constraints

1 <= len(labels) <= 10^4
0 <= len(labels[i]) <= 200
Each label may contain letters, digits, spaces, underscores, hyphens, and punctuation
Return labels in ascending lexicographic order
Ignore labels that normalize to an empty string

Function Signature

def normalize_labels(labels):

Problem

Implement a function that takes a list of raw label strings and returns a sorted list of unique normalized labels.

Formal Specification

Input: labels, a list of strings.
Output: a list of strings where each label is:
1. lowercased,
2. trimmed of leading/trailing whitespace,
3. internal runs of spaces, underscores, or hyphens collapsed into a single hyphen,
4. stripped of all other non-alphanumeric characters except hyphens,
5. deduplicated,
6. returned in ascending lexicographic order.
Ignore labels that become empty after normalization.

Examples

Example 1 Input: labels = [" Faithfulness ", "groundedness", "Prompt Injection", "prompt_injection"] Output: ["faithfulness", "groundedness", "prompt-injection"]

Explanation: Prompt Injection and prompt_injection normalize to the same canonical label.

Example 2 Input: labels = ["DBRX", "dbrx!!", "", " ", "model-serving"] Output: ["dbrx", "model-serving"]

Explanation: Empty values are ignored, punctuation is removed, and duplicates are collapsed.

Constraints

1 <= len(labels) <= 10^4
0 <= len(labels[i]) <= 200
Labels may contain letters, digits, spaces, underscores, hyphens, and punctuation.

Examples

Example 1

Example 2

Inputlabels = ["DBRX", "dbrx!!", "", " ", "model-serving"]Output["dbrx", "model-serving"]WhyPunctuation is removed, empty strings are ignored, and duplicate normalized values are deduplicated.

Constraints

1 <= len(labels) <= 10^4
0 <= len(labels[i]) <= 200
Each label may contain letters, digits, spaces, underscores, hyphens, and punctuation
Return labels in ascending lexicographic order
Ignore labels that normalize to an empty string

Function Signature

def normalize_labels(labels):

Practice Python

Python 3.10

Open on desktop for the full Python editor with syntax highlighting and autocomplete.

Up next

Tokenize Databricks Agent JSON LogsMedium

Implement a Minimal MLflow Agent Evaluation Runner and Explain Why DatabricksEasy Clean Dataset RecordsEasy

Next question