Clean Dataset Records

Problem

A data team at Acme Analytics stores raw dataset rows as Python dictionaries. Write a function to clean and preprocess these records by normalizing text fields, replacing missing values, and removing duplicate rows.

Task

Given a list of records, where each record is a dictionary with string keys and values that may be strings, numbers, or None, return a new list of cleaned records.

A record should be cleaned using these rules:

Trim leading and trailing spaces from all string values.
Convert empty strings (after trimming) to "UNKNOWN".
Replace None values with "UNKNOWN".
Convert all string values to lowercase. This includes the "UNKNOWN" placeholder, which therefore appears as "unknown" in the output.
Remove duplicate records after cleaning. Two records are duplicates if they contain the same key-value pairs.
Preserve the order of the first occurrence of each unique cleaned record.
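
The rules above can be sketched in Python roughly as follows. This is one possible approach, not a reference solution. The names clean_value, clean_records, and PLACEHOLDER are illustrative and do not come from the problem statement. The sketch assumes that "spaces" means any surrounding whitespace (it uses str.strip()) and that all values are hashable, which holds for strings, numbers, and None.

```python
from typing import Any

# Hypothetical constant name; the value is lowercased with every other string,
# so it appears as "unknown" in the output.
PLACEHOLDER = "UNKNOWN"


def clean_value(value: Any) -> Any:
    """Normalize one value; non-string values such as numbers pass through unchanged."""
    if value is None:
        value = PLACEHOLDER
    if isinstance(value, str):
        # Assumption: strip() removes all surrounding whitespace, not only spaces.
        value = value.strip() or PLACEHOLDER
        value = value.lower()
    return value


def clean_records(records: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Return cleaned records, keeping the first occurrence of each unique one."""
    seen = set()
    cleaned_records = []
    for record in records:
        cleaned = {key: clean_value(value) for key, value in record.items()}
        # A frozenset of (key, value) pairs is hashable and ignores key order,
        # so two records with the same key-value pairs share a fingerprint.
        fingerprint = frozenset(cleaned.items())
        if fingerprint not in seen:
            seen.add(fingerprint)
            cleaned_records.append(cleaned)
    return cleaned_records
```

Tracking fingerprints in a set keeps the duplicate check at roughly constant time per record, and appending only on first sight preserves the original order.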

Input/Output

Input: records, a list of dictionaries.
Output: A list of cleaned dictionaries.

Examples

Example 1
Input: records = [{"name": " Alice ", "city": " New York ", "age": 30}, {"name": "alice", "city": "new york", "age": 30}]
Output: [{"name": "alice", "city": "new york", "age": 30}]
Explanation: After trimming and lowercasing, both records become identical, so only the first is kept.

Example 2
Input: records = [{"name": " ", "city": None}, {"name": "Bob", "city": " Seattle "}]
Output: [{"name": "unknown", "city": "unknown"}, {"name": "bob", "city": "seattle"}]
Explanation: The blank string and the None value are replaced with "UNKNOWN", which is then lowercased along with the other strings.

Constraints

1 <= len(records) <= 10^4
Each record has 1 to 20 keys
Keys are non-empty strings
String values have length at most 200
