Clean and Summarize Event Records

Easy
Coding
Arrays, Hash Tables, Sorting

Problem

A data team at Acme Analytics receives raw event records as Python dictionaries. Implement a function that cleans these records and returns a per-user summary suitable for analysis.

Given a list of event dictionaries, process each record as follows:

  1. Ignore records missing user_id, event, or value.
  2. Normalize event by trimming whitespace and converting to lowercase.
  3. Keep only records whose normalized event is one of "click", "view", or "purchase".
  4. Convert value to an integer if possible; otherwise ignore the record.
  5. Aggregate valid records by user_id.
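
Steps 1 to 4 can be sketched as a single per-record helper. This is a minimal sketch, not a reference solution: the name clean_record and the constant VALID_EVENTS are hypothetical, and it assumes that a field set to None counts as missing and that a non-string event is invalid. Python's int() also truncates floats and accepts booleans, which the problem statement does not address.

```python
VALID_EVENTS = {"click", "view", "purchase"}  # hypothetical constant name


def clean_record(record):
    """Return (user_id, event, value) for a valid record, otherwise None."""
    # Step 1: all three fields must be present (assumption: None means missing).
    if any(record.get(key) is None for key in ("user_id", "event", "value")):
        return None

    # Step 2: normalize the event name (assumption: non-strings are invalid).
    event = record["event"]
    if not isinstance(event, str):
        return None
    event = event.strip().lower()

    # Step 3: keep only the supported event types.
    if event not in VALID_EVENTS:
        return None

    # Step 4: int() accepts ints and numeric strings such as "3";
    # it raises ValueError for strings like "bad" or "3.5".
    try:
        value = int(record["value"])
    except (TypeError, ValueError):
        return None

    return record["user_id"], event, value
```

Returning None for rejected records lets the caller filter with a simple "is not None" check before aggregating.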

For each user, return:

  • user_id
  • total_value: sum of valid values
  • event_counts: a dictionary mapping each valid event type that appeared for that user to its count
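
One way to build these per-user fields is shown below. It is a sketch under assumptions: the function name aggregate is hypothetical, and its input is assumed to be an iterable of already-cleaned (user_id, event, value) tuples rather than raw records.

```python
from collections import Counter, defaultdict


def aggregate(cleaned):
    """Group cleaned (user_id, event, value) tuples into per-user summaries."""
    totals = defaultdict(int)      # user_id -> running sum of values
    counts = defaultdict(Counter)  # user_id -> Counter of event types

    for user_id, event, value in cleaned:
        totals[user_id] += value
        counts[user_id][event] += 1

    # Only event types that actually appeared end up in event_counts.
    return [
        {
            "user_id": user_id,
            "total_value": totals[user_id],
            "event_counts": dict(counts[user_id]),
        }
        for user_id in totals
    ]
```

The summaries come back in first-seen order, so they still need to be sorted before being returned from the final function.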

Return the final result as a list of dictionaries sorted by:

  1. total_value descending
  2. user_id ascending
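
Both sort criteria fit into a single key: negating total_value makes larger totals sort first, while user_id keeps its natural ascending string order. The helper name sort_summaries is hypothetical, and the sketch assumes total_value is always an integer, since negation would not work on a string.

```python
def sort_summaries(summaries):
    """Sort by total_value descending, then user_id ascending."""
    return sorted(summaries, key=lambda s: (-s["total_value"], s["user_id"]))
```

For example, summaries for u2 (total 5), u1 (total 5), and u3 (total 9) would come back in the order u3, u1, u2.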

Input/Output

  • Input: records, a list of dictionaries
  • Output: a list of summary dictionaries

Examples

Example 1
Input: records = [{"user_id":"u1","event":" Click ","value":"3"},{"user_id":"u1","event":"view","value":2},{"user_id":"u2","event":"purchase","value":"5"}]
Output: [{"user_id":"u1","total_value":5,"event_counts":{"click":1,"view":1}},{"user_id":"u2","total_value":5,"event_counts":{"purchase":1}}]
Explanation: After normalization and conversion, all three records are valid. Both users have a total value of 5, so the tie is broken by ascending user_id and u1 comes first.

Example 2
Input: records = [{"user_id":"u1","event":"signup","value":4},{"user_id":"u2","event":"view","value":"bad"}]
Output: []
Explanation: The first record has an unsupported event ("signup") and the second has a non-numeric value ("bad"), so no valid records remain.

Constraints

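  • 1 <= len(records) <= 10^4
  • Each record is a Python dictionary with string keys
  • event may contain extra whitespace or mixed case
  • Only click, view, and purchase are valid events
  • A valid value is an integer or a numeric string; records with any other value are ignored
  • user_id is compared lexicographically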

Function Signature

def summarize_records(records):
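
A possible implementation of this signature is sketched below. It is one reading of the statement rather than an official solution: it assumes a field set to None counts as missing, that a non-string event is invalid, and that int() is an acceptable conversion (so a float value would be truncated rather than rejected).

```python
def summarize_records(records):
    valid_events = {"click", "view", "purchase"}
    summaries = {}  # user_id -> summary dictionary

    for record in records:
        user_id = record.get("user_id")
        event = record.get("event")
        raw_value = record.get("value")

        # Skip records with missing fields (assumption: None means missing).
        if user_id is None or raw_value is None or not isinstance(event, str):
            continue

        # Normalize and filter the event type.
        event = event.strip().lower()
        if event not in valid_events:
            continue

        # Skip values that cannot be converted to an integer.
        try:
            value = int(raw_value)
        except (TypeError, ValueError):
            continue

        summary = summaries.setdefault(
            user_id, {"user_id": user_id, "total_value": 0, "event_counts": {}}
        )
        summary["total_value"] += value
        summary["event_counts"][event] = summary["event_counts"].get(event, 0) + 1

    # total_value descending, then user_id ascending.
    return sorted(summaries.values(), key=lambda s: (-s["total_value"], s["user_id"]))
```

The loop makes a single pass over the records and the final sort covers only the distinct users, so the sketch runs in O(n + u log u) time for n records and u users.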
