evalml.data_checks.LabelLeakageDataCheck.validate
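LabelLeakageDataCheck.validate(X, y)
Check whether any of the features are highly correlated with the target, that is, whether a feature's correlation with the target is at or above the check's pct_corr_threshold.
Currently, only binary and numeric targets and features are supported.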
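The idea behind the check can be sketched in a few lines of plain Python. This is a minimal sketch under stated assumptions, not evalml's implementation: it assumes the absolute Pearson correlation (the default method of pandas' `corr`) is compared against the threshold with `>=`, and it returns plain message strings in the format shown in the example below instead of DataCheckWarning objects. `pearson` and `label_leakage_messages` are hypothetical names, not part of the evalml API.

```python
import math
from typing import Dict, List, Sequence


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation of two equal-length numeric sequences.

    Returns NaN when either sequence is constant (zero variance),
    so a constant column can never be flagged.
    """
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return math.nan
    return cov / math.sqrt(var_a * var_b)


def label_leakage_messages(features: Dict[str, Sequence[float]],
                           target: Sequence[float],
                           pct_corr_threshold: float) -> List[str]:
    """Hypothetical helper: one message per column whose |r| with the
    target is at or above pct_corr_threshold; empty list means no leakage."""
    messages = []
    for name, values in features.items():
        r = pearson(values, target)
        # abs(nan) >= threshold is False, so constant columns are skipped.
        if abs(r) >= pct_corr_threshold:
            messages.append(
                f"Column '{name}' is {pct_corr_threshold * 100}% "
                "or more correlated with the target"
            )
    return messages


# Same data as the doctest example below, as plain lists.
X = {
    "leak": [10, 42, 31, 51, 61],
    "x": [42, 54, 12, 64, 12],
    "y": [12, 5, 13, 74, 24],
}
y = [10, 42, 31, 51, 40]
print(label_leakage_messages(X, y, pct_corr_threshold=0.8))
# ["Column 'leak' is 80.0% or more correlated with the target"]
```

Taking the absolute value means a strongly negative correlation is treated as leakage too, since it predicts the target just as well; whether evalml does the same is an assumption here.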
- Parameters
X (pd.DataFrame) – The input features to check
y (pd.Series) – The labels
- Returns
A list containing a DataCheckWarning for each feature that shows label leakage; the list is empty if no leakage is detected.
- Return type
list (DataCheckWarning)
Example
>>> import pandas as pd
>>> from evalml.data_checks import DataCheckWarning, LabelLeakageDataCheck
>>> X = pd.DataFrame({
...     'leak': [10, 42, 31, 51, 61],
...     'x': [42, 54, 12, 64, 12],
...     'y': [12, 5, 13, 74, 24],
... })
>>> y = pd.Series([10, 42, 31, 51, 40])
>>> label_leakage_check = LabelLeakageDataCheck(pct_corr_threshold=0.8)
>>> assert label_leakage_check.validate(X, y) == [DataCheckWarning("Column 'leak' is 80.0% or more correlated with the target", "LabelLeakageDataCheck")]
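Working the correlations by hand for the example data shows why only 'leak' is flagged at a threshold of 0.8. This assumes the check uses Pearson correlation; note that the column named 'y' in X is an ordinary feature, distinct from the target y.

```latex
r = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_i (x_i - \bar{x})^2 \,\sum_i (y_i - \bar{y})^2}}

\text{leak: } \bar{x} = 39,\; \bar{y} = 34.8,\quad
r = \frac{1080}{\sqrt{1542 \times 970.8}} \approx \frac{1080}{1223.5} \approx 0.88 \ge 0.8

\text{x: } r \approx 0.27 < 0.8, \qquad \text{y: } r \approx 0.58 < 0.8
```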