At Stripe, raw event records often contain missing values represented by None. Implement a function that cleans a list of row records by applying simple missing-data rules per column.
Given records, a 2D list where each inner list is a row, and strategies, a list describing how to handle each column, return a new 2D list with missing values filled. A missing value is any element equal to None.
For each column j, strategies[j] is one of:
"zero": replace missing values with 0"empty": replace missing values with """mode": replace missing values with the most frequent non-missing value in that column; if tied, use the smallest value in sorted order; if the column has no non-missing values, use None"drop": remove any row that contains a missing value in that columnApply all column rules to the original dataset. If a row should be dropped because of any "drop" column, remove it entirely. For remaining rows, fill missing values according to the other strategies.
Example 1
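Input: records = [[1, None, "a"], [2, 5, None], [3, None, "a"]], strategies = ["zero", "mode", "empty"]
Output: [[1, 5, "a"], [2, 5, ""], [3, 5, "a"]]
Explanation: column 0 has no missing values; column 1 fills missing values with its mode, 5; column 2 replaces missing with an empty string.

Example 2
Input: records = [[1, None], [2, 3], [None, 4]], strategies = ["drop", "zero"]
Output: [[1, 0], [2, 3]]
Explanation: the third row is removed because its value in column 0 is missing and that column's strategy is "drop". The missing value in column 1 of the first row becomes 0.

Constraints
- 1 <= len(records) <= 10^4
- 1 <= len(records[0]) <= 50
- strategies has one entry per column
- Each value is None, an integer, or a string
- In any "mode" column, all non-missing values are mutually comparable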
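One possible way to put the rules together is sketched below: first resolve a fill value per column, then make a single pass that drops or fills each row. The function name `clean_records` is an assumption, and so is the reading that a "mode" value is computed over the original column, including rows that a "drop" column later removes.

```python
from collections import Counter


def clean_records(records, strategies):
    """Sketch of the cleaning rules; the function name is an assumption."""
    fill = {}        # column index -> replacement value for non-"drop" columns
    drop_cols = []   # indices of columns whose strategy is "drop"
    for j, strategy in enumerate(strategies):
        if strategy == "drop":
            drop_cols.append(j)
        elif strategy == "zero":
            fill[j] = 0
        elif strategy == "empty":
            fill[j] = ""
        elif strategy == "mode":
            # Assumption: the mode is taken over the original column,
            # before any rows are dropped.
            counts = Counter(row[j] for row in records if row[j] is not None)
            fill[j] = min(counts, key=lambda v: (-counts[v], v)) if counts else None
        else:
            raise ValueError(f"unknown strategy: {strategy!r}")

    cleaned = []
    for row in records:
        # A missing value in any "drop" column removes the whole row.
        if any(row[j] is None for j in drop_cols):
            continue
        # In a kept row, a None can only sit in a non-"drop" column.
        cleaned.append([fill[j] if v is None else v for j, v in enumerate(row)])
    return cleaned
```

Calling it on the inputs of Example 1 and Example 2 should reproduce the listed outputs. The sketch builds new row lists rather than modifying `records`, and it visits each cell a constant number of times, so the work is proportional to the number of rows times the number of columns.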