In Databricks, unit-testing a Spark transformation often means verifying that an output collection of rows matches the expected result even when row order is not guaranteed. Implement a Python function that compares two datasets and reports whether they are equivalent after applying deterministic normalization rules.
Write a function assert_spark_rows_equal(actual_rows, expected_rows, key_columns).
Parameters:
- actual_rows: list of dictionaries representing transformed output rows
- expected_rows: list of dictionaries representing expected rows
- key_columns: list of column names used to sort rows deterministically before comparison

The function should return True if:
- both lists contain the same number of rows, and
- after sorting each list by key_columns, the rows at each position are equal.

Normalization rules:
- A column missing from a row is treated as None.
Input: actual_rows = [{"id": 2, "value": "b"}, {"id": 1, "value": "a"}], expected_rows = [{"id": 1, "value": "a"}, {"id": 2, "value": "b"}], key_columns = ["id"]
Output: True
Explanation: The rows are identical after sorting by id.
Example 2
Input: actual_rows = [{"id": 1, "value": "a"}], expected_rows = [{"id": 1, "value": "x"}], key_columns = ["id"]
Output: False
Explanation: The value field differs.
Constraints:
- 0 <= len(actual_rows), len(expected_rows) <= 10^4
- 0 <= len(key_columns) <= 10
- all elements of key_columns are strings
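A minimal reference sketch of one way to implement this. It assumes that a column absent from a row normalizes to None, and that key values within a column are mutually comparable (the sort key wraps values so None sorts before non-None instead of raising a TypeError):

```python
def assert_spark_rows_equal(actual_rows, expected_rows, key_columns):
    """Return True when both row lists match after deterministic sorting."""
    if len(actual_rows) != len(expected_rows):
        return False

    # Collect every column name seen, so sparse rows normalize consistently.
    all_columns = set()
    for row in actual_rows + expected_rows:
        all_columns.update(row)

    def normalize(row):
        # Assumed rule: a column missing from a row is treated as None.
        return {col: row.get(col) for col in all_columns}

    def sort_key(row):
        # (is-not-None, value) pairs let None sort deterministically
        # before any non-None value of the same column.
        return tuple((row.get(col) is not None, row.get(col))
                     for col in key_columns)

    actual_sorted = sorted((normalize(r) for r in actual_rows), key=sort_key)
    expected_sorted = sorted((normalize(r) for r in expected_rows), key=sort_key)
    return actual_sorted == expected_sorted
```

Sorting both sides by the same key makes the comparison order-insensitive, which mirrors how Spark actions like collect() give no ordering guarantee without an explicit orderBy.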