TargetLeakageDataCheck.
validate
Check if any of the features are highly correlated with the target.
Currently only supports binary and numeric targets and features.
X (ww.DataTable, pd.DataFrame, np.ndarray) – The input features to check
y (ww.DataColumn, pd.Series, np.ndarray) – The target data
dict with a DataCheckWarning if target leakage is detected.
dict (DataCheckWarning)
Example
>>> import pandas as pd >>> X = pd.DataFrame({ ... 'leak': [10, 42, 31, 51, 61], ... 'x': [42, 54, 12, 64, 12], ... 'y': [12, 5, 13, 74, 24], ... }) >>> y = pd.Series([10, 42, 31, 51, 40]) >>> target_leakage_check = TargetLeakageDataCheck(pct_corr_threshold=0.8) >>> assert target_leakage_check.validate(X, y) == {"warnings": [{"message": "Column 'leak' is 80.0% or more correlated with the target", "data_check_name": "TargetLeakageDataCheck", "level": "warning", "code": "TARGET_LEAKAGE", "details": {"column": "leak"}}], "errors": []}