Real-Time Chat Event Deduplicator

Problem

You’re building the message rendering pipeline for a large consumer chat app (millions of DAUs). Mobile networks can deliver events out of order, with duplicates, and sometimes with missing sequence numbers due to packet loss. The UI must render messages in a stable order without stalling forever.

Each incoming event is a tuple (msg_id, seq, text):

msg_id is a globally unique identifier for a logical message (retries may resend the same msg_id).
seq is a server-assigned sequence number intended to define the display order.
text is the message payload.

You will ingest events one-by-one. After each ingest, return the list of events that can now be emitted (rendered) immediately.

Rules

Deduplicate by msg_id: if a msg_id was seen before, ignore the event.
First-wins per seq: if multiple different msg_ids arrive with the same seq, keep only the first one ever seen for that seq.
Maintain an internal pointer expected_seq (start at 1).
After each ingest, emit as many buffered events as possible:
- If expected_seq is present in the buffer, emit it and increment expected_seq.
- Otherwise, let smallest_buffered be the smallest seq currently buffered. If smallest_buffered > expected_seq + max_gap, treat expected_seq..smallest_buffered-1 as lost, set expected_seq = smallest_buffered, and continue draining.
- If neither condition holds, stop (wait for more events).

Task

Implement:

def dedupe_and_emit(events: list[tuple[str,int,str]], max_gap: int) -> list[list[tuple[str,int,str]]]

It should return a list where the i-th element is the list of events emitted immediately after ingesting events[i].

Examples

Example 1

Input: max_gap = 2, events = [("a",1,"hi"),("b",3,"are you there?"),("c",2,"hello")]
Output per ingest: [[ ("a",1,"hi") ], [], [ ("c",2,"hello"), ("b",3,"are you there?") ]]
Explanation: seq 3 is buffered until seq 2 arrives; then 2 and 3 drain in order.

Example 2

Input: max_gap = 1, events = [("x",1,"m1"),("y",4,"m4")]
Output per ingest: [[ ("x",1,"m1") ], [ ("y",4,"m4") ]]
Explanation: after receiving seq 4, expected_seq=2 and 4 > 2+1, so we skip 2..3 as lost and emit 4.

Notes

max_gap can be 0, meaning you may skip forward immediately when the next available buffered seq is greater than expected_seq.
Your implementation should be efficient for up to hundreds of thousands of events.

Problem

Each incoming event is a tuple (msg_id, seq, text):

msg_id is a globally unique identifier for a logical message (retries may resend the same msg_id).
seq is a server-assigned sequence number intended to define the display order.
text is the message payload.

You will ingest events one-by-one. After each ingest, return the list of events that can now be emitted (rendered) immediately.

Rules

Deduplicate by msg_id: if a msg_id was seen before, ignore the event.
First-wins per seq: if multiple different msg_ids arrive with the same seq, keep only the first one ever seen for that seq.
Maintain an internal pointer expected_seq (start at 1).
After each ingest, emit as many buffered events as possible:
- If expected_seq is present in the buffer, emit it and increment expected_seq.
- Otherwise, let smallest_buffered be the smallest seq currently buffered. If smallest_buffered > expected_seq + max_gap, treat expected_seq..smallest_buffered-1 as lost, set expected_seq = smallest_buffered, and continue draining.
- If neither condition holds, stop (wait for more events).

Task

Implement:

def dedupe_and_emit(events: list[tuple[str,int,str]], max_gap: int) -> list[list[tuple[str,int,str]]]

It should return a list where the i-th element is the list of events emitted immediately after ingesting events[i].

Examples

Example 1

Input: max_gap = 2, events = [("a",1,"hi"),("b",3,"are you there?"),("c",2,"hello")]
Output per ingest: [[ ("a",1,"hi") ], [], [ ("c",2,"hello"), ("b",3,"are you there?") ]]
Explanation: seq 3 is buffered until seq 2 arrives; then 2 and 3 drain in order.

Example 2

Input: max_gap = 1, events = [("x",1,"m1"),("y",4,"m4")]
Output per ingest: [[ ("x",1,"m1") ], [ ("y",4,"m4") ]]
Explanation: after receiving seq 4, expected_seq=2 and 4 > 2+1, so we skip 2..3 as lost and emit 4.

Notes

max_gap can be 0, meaning you may skip forward immediately when the next available buffered seq is greater than expected_seq.
Your implementation should be efficient for up to hundreds of thousands of events.

Problem

Each incoming event is a tuple (msg_id, seq, text):

msg_id is a globally unique identifier for a logical message (retries may resend the same msg_id).
seq is a server-assigned sequence number intended to define the display order.
text is the message payload.

You will ingest events one-by-one. After each ingest, return the list of events that can now be emitted (rendered) immediately.

Rules

Deduplicate by msg_id: if a msg_id was seen before, ignore the event.
First-wins per seq: if multiple different msg_ids arrive with the same seq, keep only the first one ever seen for that seq.
Maintain an internal pointer expected_seq (start at 1).
After each ingest, emit as many buffered events as possible:
- If expected_seq is present in the buffer, emit it and increment expected_seq.
- Otherwise, let smallest_buffered be the smallest seq currently buffered. If smallest_buffered > expected_seq + max_gap, treat expected_seq..smallest_buffered-1 as lost, set expected_seq = smallest_buffered, and continue draining.
- If neither condition holds, stop (wait for more events).

Task

Implement:

def dedupe_and_emit(events: list[tuple[str,int,str]], max_gap: int) -> list[list[tuple[str,int,str]]]

It should return a list where the i-th element is the list of events emitted immediately after ingesting events[i].

Examples

Example 1

Input: max_gap = 2, events = [("a",1,"hi"),("b",3,"are you there?"),("c",2,"hello")]
Output per ingest: [[ ("a",1,"hi") ], [], [ ("c",2,"hello"), ("b",3,"are you there?") ]]
Explanation: seq 3 is buffered until seq 2 arrives; then 2 and 3 drain in order.

Example 2

Input: max_gap = 1, events = [("x",1,"m1"),("y",4,"m4")]
Output per ingest: [[ ("x",1,"m1") ], [ ("y",4,"m4") ]]
Explanation: after receiving seq 4, expected_seq=2 and 4 > 2+1, so we skip 2..3 as lost and emit 4.

Notes

max_gap can be 0, meaning you may skip forward immediately when the next available buffered seq is greater than expected_seq.
Your implementation should be efficient for up to hundreds of thousands of events.

Problem

Each incoming event is a tuple (msg_id, seq, text):

msg_id is a globally unique identifier for a logical message (retries may resend the same msg_id).
seq is a server-assigned sequence number intended to define the display order.
text is the message payload.

You will ingest events one-by-one. After each ingest, return the list of events that can now be emitted (rendered) immediately.

Rules

Deduplicate by msg_id: if a msg_id was seen before, ignore the event.
First-wins per seq: if multiple different msg_ids arrive with the same seq, keep only the first one ever seen for that seq.
Maintain an internal pointer expected_seq (start at 1).
After each ingest, emit as many buffered events as possible:
- If expected_seq is present in the buffer, emit it and increment expected_seq.
- Otherwise, let smallest_buffered be the smallest seq currently buffered. If smallest_buffered > expected_seq + max_gap, treat expected_seq..smallest_buffered-1 as lost, set expected_seq = smallest_buffered, and continue draining.
- If neither condition holds, stop (wait for more events).

Task

Implement:

def dedupe_and_emit(events: list[tuple[str,int,str]], max_gap: int) -> list[list[tuple[str,int,str]]]

It should return a list where the i-th element is the list of events emitted immediately after ingesting events[i].

Examples

Example 1

Input: max_gap = 2, events = [("a",1,"hi"),("b",3,"are you there?"),("c",2,"hello")]
Output per ingest: [[ ("a",1,"hi") ], [], [ ("c",2,"hello"), ("b",3,"are you there?") ]]
Explanation: seq 3 is buffered until seq 2 arrives; then 2 and 3 drain in order.

Example 2

Input: max_gap = 1, events = [("x",1,"m1"),("y",4,"m4")]
Output per ingest: [[ ("x",1,"m1") ], [ ("y",4,"m4") ]]
Explanation: after receiving seq 4, expected_seq=2 and 4 > 2+1, so we skip 2..3 as lost and emit 4.

Notes

max_gap can be 0, meaning you may skip forward immediately when the next available buffered seq is greater than expected_seq.
Your implementation should be efficient for up to hundreds of thousands of events.

Interview Guides

Problem

Rules

Task

Examples

Notes

Real-Time Chat Event Deduplicator

Problem

Rules

Task

Examples

Notes

Real-Time Chat Event Deduplicator

Problem

Rules

Task

Examples

Notes

Real-Time Chat Event Deduplicator

Problem

Rules

Task

Examples

Notes