HighlyNullDataCheck.validate
Checks if there are any highly-null columns in the input.
Parameters
X (ww.DataTable, pd.DataFrame, np.ndarray) – Features.
y (ww.DataColumn, pd.Series, np.ndarray) – Ignored.
Returns
A dict with a DataCheckWarning if there are any highly-null columns.
Return type
dict
Example
>>> import pandas as pd
>>> # Import path assumes the check is provided by the evalml package.
>>> from evalml.data_checks import HighlyNullDataCheck
>>> df = pd.DataFrame({
...     'lots_of_null': [None, None, None, None, 5],
...     'no_null': [1, 2, 3, 4, 5]
... })
>>> null_check = HighlyNullDataCheck(pct_null_threshold=0.8)
>>> assert null_check.validate(df) == {
...     "errors": [],
...     "warnings": [{
...         "message": "Column 'lots_of_null' is 80.0% or more null",
...         "data_check_name": "HighlyNullDataCheck",
...         "level": "warning",
...         "code": "HIGHLY_NULL",
...         "details": {"column": "lots_of_null"}
...     }]
... }
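To make the threshold comparison concrete, here is a minimal pandas-only sketch of the kind of test such a check performs. It is not the library's implementation. The helper name `find_highly_null_columns` and the `null_fraction` field are hypothetical, and the "at or above the threshold" comparison is an assumption inferred from the "80.0% or more null" message in the example.

```python
import pandas as pd


def find_highly_null_columns(df, pct_null_threshold):
    """Hypothetical helper: flag columns whose null fraction meets the threshold."""
    # isnull() marks missing cells; the column-wise mean of those booleans
    # is the fraction of missing values in each column.
    null_fraction = df.isnull().mean()
    warnings = []
    for column, fraction in null_fraction.items():
        # Assumption: a column is flagged when its fraction is at or above
        # the threshold, matching the "or more null" wording of the message.
        if fraction >= pct_null_threshold:
            warnings.append({"column": column, "null_fraction": float(fraction)})
    # Mirror the documented return shape: a dict with "errors" and "warnings".
    return {"errors": [], "warnings": warnings}


df = pd.DataFrame({
    "lots_of_null": [None, None, None, None, 5],
    "no_null": [1, 2, 3, 4, 5],
})
result = find_highly_null_columns(df, pct_null_threshold=0.8)
flagged = [w["column"] for w in result["warnings"]]
```

Here four of the five values in `lots_of_null` are missing, so its null fraction is 0.8 and it is flagged at a threshold of 0.8, while `no_null` has no missing values and is not.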