You’re on the fraud engineering team at a large fintech processing tens of millions of card transactions per day. A streaming job emits “suspicious transaction IDs” from multiple detectors, and downstream systems must deduplicate these IDs quickly to avoid double-counting fraud alerts (which impacts analyst workload and regulatory reporting).
A teammate shipped the following Python function, but production metrics show both incorrect results and high CPU usage:
def unique_suspicious_ids(ids):
    unique = []
    for x in ids:
        if ids.count(x) == 1:  # keep IDs that appear once
            unique.append(x)
    return unique
Refactor the function to match this signature:

def unique_suspicious_ids(ids: list[int]) -> list[int]:
    ...
Return only the IDs that appear exactly once in ids, preserving their original order. Aim for O(n) time, where n = len(ids).

Example 1
Input: ids = [42, 17, 42, 9]
Output: [17, 9]
Explanation: 42 appears twice, so it's excluded. 17 and 9 appear once and remain in original order.

Example 2
Input: ids = [5, 5, 5]
Output: []

Constraints:
0 <= len(ids) <= 2 * 10^5
-10^9 <= ids[i] <= 10^9
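One possible refactor (a sketch, not the only valid answer) counts every ID in a single pass with collections.Counter, then filters in a second pass. This removes the repeated ids.count(x) calls that make the original O(n^2):

```python
from collections import Counter

def unique_suspicious_ids(ids: list[int]) -> list[int]:
    # One O(n) pass to count occurrences, instead of calling
    # ids.count(x) inside the loop (which is O(n) per element).
    counts = Counter(ids)
    # Second O(n) pass: keep IDs seen exactly once, in input order.
    return [x for x in ids if counts[x] == 1]

print(unique_suspicious_ids([42, 17, 42, 9]))  # [17, 9]
print(unique_suspicious_ids([5, 5, 5]))        # []
```

This runs in O(n) time and O(n) extra space, comfortably within the 2 * 10^5 length bound, and the list comprehension preserves the original ordering required by Example 1.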