Deduplicate Fintech Transaction Logs | Dataford Interview Questions - Dataford - Ace your Interview

Deduplicate Fintech Transaction Logs

Medium

Coding

Problem

You’re on the fraud and reliability team at a high-volume fintech processor that ingests tens of millions of card transactions per day. Because of retries, network partitions, and at-least-once delivery, the same transaction can be logged more than once. Duplicates inflate revenue reporting, trigger false fraud alerts, and create audit issues.

You are given a list of transaction log entries. Each entry is a dictionary with:

tx_id (string): globally unique transaction identifier; every log entry recorded for the same transaction carries the same tx_id
timestamp (int): Unix epoch seconds when the log was written
amount_cents (int): transaction amount in cents

A duplicate is defined as an entry with the same tx_id as another entry. When duplicates exist for a tx_id, keep exactly one entry for it, chosen by the following rules applied in order:

1. Keep the entry with the earliest timestamp.
2. If the timestamps tie, keep the entry with the smaller amount_cents.
3. If both timestamp and amount_cents tie, keep the entry that appears earlier in the input list.
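Taken together, the three rules amount to a lexicographic comparison: the preferred entry is the one with the smallest tuple (timestamp, amount_cents, position in the input). A minimal sketch of that key follows; `preference_key` and the sample tx_id "Z" are hypothetical, not part of the required interface.

```python
def preference_key(entry: dict, position: int) -> tuple[int, int, int]:
    """Smaller tuples win: earliest timestamp, then smaller amount_cents,
    then earlier position in the input list."""
    return (entry["timestamp"], entry["amount_cents"], position)


# Four logged copies of the same (hypothetical) transaction "Z".
copies = [
    {"tx_id": "Z", "timestamp": 105, "amount_cents": 5},  # position 0: later timestamp
    {"tx_id": "Z", "timestamp": 100, "amount_cents": 7},  # position 1
    {"tx_id": "Z", "timestamp": 100, "amount_cents": 7},  # position 2: ties position 1 on both fields
    {"tx_id": "Z", "timestamp": 100, "amount_cents": 9},  # position 3: larger amount
]

# min() over positions compares the key tuples element by element.
best = min(range(len(copies)), key=lambda i: preference_key(copies[i], i))
print(best)  # 1
```

Tuple comparison stops at the first differing element, so position only decides the outcome when both timestamp and amount_cents tie, as between positions 1 and 2 here.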

Return the deduplicated logs as a list of entries sorted by timestamp ascending, and for ties by tx_id ascending.
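After deduplication every tx_id is unique, so (timestamp, tx_id) gives a total order for the output. A sketch using `sorted` with a tuple key; `order_output` is a hypothetical helper name, and it assumes "ascending" for tx_id means Python's default string comparison (by Unicode code point, so uppercase letters sort before lowercase).

```python
def order_output(entries: list[dict]) -> list[dict]:
    # Primary key: timestamp ascending; secondary key: tx_id ascending.
    return sorted(entries, key=lambda e: (e["timestamp"], e["tx_id"]))


survivors = [
    {"tx_id": "B", "timestamp": 10, "amount_cents": 300},
    {"tx_id": "A", "timestamp": 10, "amount_cents": 100},
    {"tx_id": "C", "timestamp": 5, "amount_cents": 200},
]
print([e["tx_id"] for e in order_output(survivors)])  # ['C', 'A', 'B']
```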

Function Task

Implement dedupe_transaction_logs(logs).

Examples

Example 1

Input:
- logs = [ {"tx_id":"A", "timestamp":100, "amount_cents":500}, {"tx_id":"B", "timestamp":101, "amount_cents":700}, {"tx_id":"A", "timestamp": 99, "amount_cents":500} ]
Output:
- [ {"tx_id":"A", "timestamp": 99, "amount_cents":500}, {"tx_id":"B", "timestamp":101, "amount_cents":700} ]
Explanation: tx_id="A" appears twice; keep the earliest timestamp (99). Then sort by timestamp.

Example 2

Input:
- logs = [ {"tx_id":"X", "timestamp":200, "amount_cents":1000}, {"tx_id":"X", "timestamp":200, "amount_cents": 999}, {"tx_id":"Y", "timestamp":199, "amount_cents": 500} ]
Output:
- [ {"tx_id":"Y", "timestamp":199, "amount_cents":500}, {"tx_id":"X", "timestamp":200, "amount_cents": 999} ]
Explanation: For X, the timestamps tie, so keep the smaller amount (999). Y (timestamp 199) then sorts before X (timestamp 200).

Notes

You may return new dictionaries or the original ones; correctness is based on field values.
Assume all entries have the required keys.

Constraints

1 <= len(logs) <= 2 * 10^5
1 <= len(tx_id) <= 64
0 <= timestamp <= 2 * 10^9
-10^9 <= amount_cents <= 10^9
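With up to 2 * 10^5 entries, comparing every entry against every other one takes on the order of n^2 steps (about 4 * 10^10 pairs in the worst case), which rules out a pairwise approach at full scale. A literal, quadratic translation of the rules can still serve as a test oracle on small inputs. The sketch below is one such checker; `dedupe_reference` is a hypothetical name.

```python
def dedupe_reference(logs: list[dict]) -> list[dict]:
    """Quadratic reference: keep an entry only if no other entry with the
    same tx_id beats it under the tie-break rules. For small inputs only."""
    def key(i: int) -> tuple[int, int, int]:
        e = logs[i]
        return (e["timestamp"], e["amount_cents"], i)

    kept = [
        logs[i]
        for i in range(len(logs))
        if not any(
            logs[j]["tx_id"] == logs[i]["tx_id"] and key(j) < key(i)
            for j in range(len(logs))
        )
    ]
    # Final output order: timestamp ascending, then tx_id ascending.
    return sorted(kept, key=lambda e: (e["timestamp"], e["tx_id"]))


example_2 = [
    {"tx_id": "X", "timestamp": 200, "amount_cents": 1000},
    {"tx_id": "X", "timestamp": 200, "amount_cents": 999},
    {"tx_id": "Y", "timestamp": 199, "amount_cents": 500},
]
print(dedupe_reference(example_2))
```

On Example 2 this returns Y (199, 500) followed by X (200, 999), matching the expected output.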

Function Signature

def dedupe_transaction_logs(logs: list[dict]) -> list[dict]:
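One way to fill in this signature, sketched under the assumption that a single pass with a dict keyed by tx_id is acceptable: keep the currently preferred entry per tx_id and replace it only when a later entry is strictly smaller on (timestamp, amount_cents). Entries are scanned in input order, so on a full tie the earlier entry stays, which covers the third rule without storing positions. The survivors are then sorted by (timestamp, tx_id).

```python
def dedupe_transaction_logs(logs: list[dict]) -> list[dict]:
    # tx_id -> preferred entry seen so far
    best: dict[str, dict] = {}
    for entry in logs:
        current = best.get(entry["tx_id"])
        if current is None or (
            (entry["timestamp"], entry["amount_cents"])
            < (current["timestamp"], current["amount_cents"])
        ):
            # Strictly better on (timestamp, amount_cents): replace.
            # On a full tie the earlier entry stays, per the third rule.
            best[entry["tx_id"]] = entry
    # One entry per tx_id remains; order by timestamp, then tx_id.
    return sorted(best.values(), key=lambda e: (e["timestamp"], e["tx_id"]))


logs = [
    {"tx_id": "A", "timestamp": 100, "amount_cents": 500},
    {"tx_id": "B", "timestamp": 101, "amount_cents": 700},
    {"tx_id": "A", "timestamp": 99, "amount_cents": 500},
]
print(dedupe_transaction_logs(logs))
```

For n entries and u distinct tx_ids, the pass takes O(n) expected time and the sort O(u log u). The function returns the original dictionaries, which the Notes permit.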