In a Databricks operations script, you need to process exported log lines that may be either JSON or CSV, keep only valid records for a target service, and compute summary metrics.
Write a function that reads a list of log lines and returns aggregated metrics for records that match the requested service and have status >= min_status.
Implement aggregate_log_metrics(lines, service, min_status).
lines: list of strings, where each string is either:
- a JSON object with the keys timestamp, service, status, latency_ms, or
- a CSV record in the form timestamp,service,status,latency_ms
service: string target service name
min_status: integer threshold
Return a dictionary with:
- count: number of matching valid records
- total_latency: sum of latency_ms over matching records
- avg_latency: floor-division average latency (total_latency // count), or 0 if no records match
- status_counts: dictionary mapping each matching status code (as a string, per the examples) to its frequency
- latest_timestamp: lexicographically largest timestamp among matching records, or "" if none
Ignore malformed lines, records with missing fields, non-integer status or latency_ms values, and records with negative latency.
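The rules above fit in a single pass over lines. A minimal sketch, assuming JSON parsing is attempted first and a comma-split fallback handles CSV; the _to_int helper is illustrative, not part of the required interface:

```python
import json


def _to_int(value):
    # Illustrative helper: accept ints and integer-valued strings,
    # reject bools, floats, None, and anything unparsable.
    if isinstance(value, bool):
        return None
    if isinstance(value, int):
        return value
    if isinstance(value, str):
        try:
            return int(value)
        except ValueError:
            return None
    return None


def aggregate_log_metrics(lines, service, min_status):
    count = 0
    total_latency = 0
    status_counts = {}
    latest_timestamp = ""

    for line in lines:
        record = None
        # Try JSON first; anything that does not parse to an object
        # falls through to the CSV path.
        try:
            parsed = json.loads(line)
            if isinstance(parsed, dict):
                record = parsed
        except ValueError:
            pass
        if record is None:
            parts = line.split(",")
            if len(parts) != 4:
                continue  # malformed line: wrong field count
            record = dict(zip(("timestamp", "service", "status", "latency_ms"),
                              (p.strip() for p in parts)))

        # Skip records with missing fields.
        if any(k not in record for k in ("timestamp", "service", "status", "latency_ms")):
            continue

        status = _to_int(record["status"])
        latency = _to_int(record["latency_ms"])
        if status is None or latency is None or latency < 0:
            continue  # non-integer status/latency or negative latency

        ts = record["timestamp"]
        if not isinstance(ts, str):
            continue  # timestamps compare lexicographically, so require str

        if record["service"] != service or status < min_status:
            continue

        count += 1
        total_latency += latency
        key = str(status)  # the examples key status_counts by string
        status_counts[key] = status_counts.get(key, 0) + 1
        latest_timestamp = max(latest_timestamp, ts)

    return {
        "count": count,
        "total_latency": total_latency,
        "avg_latency": total_latency // count if count else 0,
        "status_counts": status_counts,
        "latest_timestamp": latest_timestamp,
    }
```

This runs in O(total input size) and constant extra state beyond status_counts, which satisfies the constraints below.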
Example 1
Input: lines = ['{"timestamp":"2024-01-01T00:00:00Z","service":"jobs","status":200,"latency_ms":120}', '2024-01-01T00:01:00Z,jobs,500,300', '2024-01-01T00:02:00Z,sql,200,50'], service = 'jobs', min_status = 400
Output: {'count': 1, 'total_latency': 300, 'avg_latency': 300, 'status_counts': {'500': 1}, 'latest_timestamp': '2024-01-01T00:01:00Z'}
Explanation: Only the CSV jobs record with status 500 matches the filter.
Example 2
Input: lines = ['bad line', '2024-01-01T00:00:00Z,jobs,200,-5', '{"timestamp":"2024-01-01T00:03:00Z","service":"jobs","status":404,"latency_ms":80}'], service = 'jobs', min_status = 200
Output: {'count': 1, 'total_latency': 80, 'avg_latency': 80, 'status_counts': {'404': 1}, 'latest_timestamp': '2024-01-01T00:03:00Z'}
Explanation: Malformed input and negative latency are ignored.
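As a quick sanity check, the sketch above reproduces both expected outputs (hypothetical test snippet):

```python
lines1 = ['{"timestamp":"2024-01-01T00:00:00Z","service":"jobs","status":200,"latency_ms":120}',
          '2024-01-01T00:01:00Z,jobs,500,300',
          '2024-01-01T00:02:00Z,sql,200,50']
assert aggregate_log_metrics(lines1, 'jobs', 400) == {
    'count': 1, 'total_latency': 300, 'avg_latency': 300,
    'status_counts': {'500': 1}, 'latest_timestamp': '2024-01-01T00:01:00Z'}

lines2 = ['bad line',
          '2024-01-01T00:00:00Z,jobs,200,-5',
          '{"timestamp":"2024-01-01T00:03:00Z","service":"jobs","status":404,"latency_ms":80}']
assert aggregate_log_metrics(lines2, 'jobs', 200) == {
    'count': 1, 'total_latency': 80, 'avg_latency': 80,
    'status_counts': {'404': 1}, 'latest_timestamp': '2024-01-01T00:03:00Z'}
```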
Constraints
1 <= len(lines) <= 10^5
0 <= len(lines[i]) <= 10^4
100 <= min_status <= 599