You’re on-call for a fintech payments platform processing millions of API calls per minute. During an incident, you need to compute dashboard metrics from a raw log stream to quickly identify whether failures are spiking and which endpoints are slowing down.
Each log line is a space-separated record:
<timestamp_ms> <service> <endpoint> <status_code> <latency_ms>
Example:
"1500 payments /refund 200 300"
Malformed lines (wrong token count or non-integer numeric fields) must be ignored.
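The validation rule above can be sketched as a small helper. This is a minimal illustration rather than a prescribed implementation: the name parse_line is hypothetical, it assumes splitting on any run of whitespace is acceptable for "space-separated", and it relies on Python's int(), which also accepts forms such as "+5" or "1_000" that a stricter parser might want to reject.

```python
def parse_line(line):
    """Return (timestamp_ms, service, endpoint, status_code, latency_ms),
    or None if the line is malformed. Hypothetical helper for illustration."""
    tokens = line.split()  # assumption: any run of whitespace separates fields
    if len(tokens) != 5:
        return None  # wrong token count
    ts_raw, service, endpoint, status_raw, latency_raw = tokens
    try:
        timestamp_ms = int(ts_raw)
        status_code = int(status_raw)
        latency_ms = int(latency_raw)
    except ValueError:
        return None  # a numeric field is not an integer
    return timestamp_ms, service, endpoint, status_code, latency_ms


print(parse_line("1500 payments /refund 200 300"))
print(parse_line("oops"))
```

Returning None for bad input keeps the caller's loop simple: it can skip any line for which the helper yields nothing.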
Implement extract_metrics(lines, window_start_ms, window_end_ms, k) that considers only valid lines whose timestamp_ms is within the inclusive window [window_start_ms, window_end_ms] and returns a dictionary with:
total_requests: number of valid requests in the window
error_rate: fraction of windowed requests with status_code >= 500
p95_latency_ms: the nearest-rank 95th percentile latency among windowed requests
  m = total_requests. Sort latencies ascending. rank = ceil(0.95 * m) (1-indexed), and p95 is the element at index rank - 1.
top_k_slowest_endpoints: list of up to k endpoints with the highest average latency (average over windowed requests for that endpoint)
If total_requests == 0, return zeros and an empty list.
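The nearest-rank percentile rule can be sketched as follows. The function name nearest_rank_p95 is hypothetical, and the sketch assumes "zeros" means returning 0 for an empty window. It computes ceil(0.95 * m) with integer arithmetic, which is mathematically the same as ceil(95 * m / 100) and sidesteps any floating-point rounding concerns.

```python
def nearest_rank_p95(latencies):
    """Nearest-rank 95th percentile of a list of integer latencies.
    Hypothetical helper; returns 0 for an empty list (assumed zero case)."""
    if not latencies:
        return 0
    ordered = sorted(latencies)
    m = len(ordered)
    rank = (95 * m + 99) // 100  # integer form of ceil(0.95 * m), 1-indexed
    return ordered[rank - 1]


print(nearest_rank_p95([100, 300, 50]))
```

For small windows the rank equals m, so the result is simply the maximum latency; the percentile only drops below the maximum once m reaches 20.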
Example 1
Input: lines = ["900 payments /charge 200 10", "1000 payments /charge 500 100", "1500 payments /refund 200 300", "1800 payments /charge 200 50", "2100 payments /charge 200 999"], window_start_ms = 1000, window_end_ms = 2000, k = 2
Output: {'total_requests': 3, 'error_rate': 0.3333333333333333, 'p95_latency_ms': 300, 'top_k_slowest_endpoints': ['/refund', '/charge']}
Explanation: Windowed latencies [100, 300, 50] → sorted [50, 100, 300] → nearest-rank p95 is 300. Endpoint averages: /refund=300, /charge=(100+50)/2=75.
Example 2
Input: lines = ["1000 payments /charge 200 10", "oops", "1200 payments /charge two 10"], window_start_ms = 1000, window_end_ms = 1300, k = 5
Output: {'total_requests': 1, 'error_rate': 0.0, 'p95_latency_ms': 10, 'top_k_slowest_endpoints': ['/charge']}
Constraints
1 <= len(lines) <= 2 * 10^5
0 <= window_start_ms <= window_end_ms <= 10^13
0 <= status_code <= 999
0 <= latency_ms <= 10^7
1 <= k <= 100
The order of top_k_slowest_endpoints is part of the contract: if two endpoints have the same average latency, return the lexicographically smaller endpoint first.
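The ordering rule for top_k_slowest_endpoints can be sketched as a sort on a composite key. This is one possible approach, not a required one: the name top_k_slowest and its input shape (an iterable of (endpoint, latency_ms) pairs already filtered to the window) are assumptions, and "lexicographically smaller" is taken to mean Python's default string ordering. Averages are compared as exact fractions so that ties are detected without depending on float rounding.

```python
from collections import defaultdict
from fractions import Fraction


def top_k_slowest(samples, k):
    """Return up to k endpoints ordered by average latency (descending),
    breaking ties by endpoint name (ascending). Hypothetical helper."""
    totals = defaultdict(lambda: [0, 0])  # endpoint -> [latency sum, count]
    for endpoint, latency_ms in samples:
        totals[endpoint][0] += latency_ms
        totals[endpoint][1] += 1

    def sort_key(endpoint):
        latency_sum, count = totals[endpoint]
        # Negate the exact average so larger averages sort first.
        return (-Fraction(latency_sum, count), endpoint)

    return sorted(totals, key=sort_key)[:k]


samples = [("/charge", 100), ("/refund", 300), ("/charge", 50)]
print(top_k_slowest(samples, 2))
```

With at most 2 * 10^5 lines, sorting all distinct endpoints is inexpensive; a heap-based selection would only matter with far more endpoints than k.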