evalml.guardrails.detect_label_leakage

evalml.guardrails.detect_label_leakage(X, y, threshold=0.95)
- Check if any of the features are highly correlated with the target.
- Currently only binary and numeric targets and features are supported.
- Parameters
- X (pd.DataFrame) – The input features to check.
- y (pd.Series) – The target labels.
- threshold (float) – The correlation threshold for a feature to be considered leakage. Defaults to 0.95.
 
- Returns
- leakage, a dictionary mapping each feature with leakage to its correlation with the target
- Example
    >>> import pandas as pd
    >>> from evalml.guardrails import detect_label_leakage
    >>> X = pd.DataFrame({
    ...     'leak': [10, 42, 31, 51, 61],
    ...     'x': [42, 54, 12, 64, 12],
    ...     'y': [12, 5, 13, 74, 24],
    ... })
    >>> y = pd.Series([10, 42, 31, 51, 40])
    >>> detect_label_leakage(X, y, threshold=0.8)
    {'leak': 0.8827072320669518}
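
To make the check concrete, here is a minimal sketch of how a correlation-based leakage test of this kind could be written with plain pandas. It is not evalml's implementation: the function name `sketch_label_leakage` is hypothetical, and the use of Pearson correlation, the absolute value, and the `>=` comparison against the threshold are all assumptions made for illustration.

```python
import pandas as pd


def sketch_label_leakage(X, y, threshold=0.95):
    """Hypothetical stand-in for a label-leakage check (not evalml's code).

    Assumptions: Pearson correlation, absolute value, and an inclusive
    comparison against the threshold.
    """
    # Only numeric and boolean columns can be correlated with the target.
    numeric = X.select_dtypes(include=["number", "bool"])
    target = y.astype(float)

    leakage = {}
    for column in numeric.columns:
        # Series.corr defaults to Pearson correlation.
        corr = abs(target.corr(numeric[column].astype(float)))
        # A NaN correlation (e.g. a constant column) fails this comparison
        # and is therefore never reported.
        if corr >= threshold:
            leakage[column] = float(corr)
    return leakage


X = pd.DataFrame({
    "leak": [10, 42, 31, 51, 61],
    "x": [42, 54, 12, 64, 12],
    "y": [12, 5, 13, 74, 24],
})
y = pd.Series([10, 42, 31, 51, 40])
print(sketch_label_leakage(X, y, threshold=0.8))
```

On the example data, only the `leak` column clears the 0.8 threshold under these assumptions, with the same correlation value as the documented output. Lowering the threshold would begin to report the weaker correlations of the other columns as well.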