OutliersDataCheck.validate
Checks whether a dataframe contains outliers. An Isolation Forest assigns an anomaly score to each row (index), and the interquartile range (IQR) of those scores is then used to find anomalous scores. Rows with anomalous scores are considered outliers.
Parameters
X (pd.DataFrame) – Features.
y – Ignored.
Returns
A list of DataCheckWarning objects identifying the row indices that may have outlier data.
Example
>>> import pandas as pd
>>> df = pd.DataFrame({
...     'x': [1, 2, 3, 40, 5],
...     'y': [6, 7, 8, 990, 10],
...     'z': [-1, -2, -3, -1201, -4]
... })
>>> outliers_check = OutliersDataCheck()
>>> assert outliers_check.validate(df) == [DataCheckWarning("Row '3' is likely to have outlier data", "OutliersDataCheck")]
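
The procedure described above (Isolation Forest scores followed by an IQR cutoff) can be sketched roughly as follows. This is an illustration, not the library's actual implementation: it assumes scikit-learn's `IsolationForest` as the scorer, a conventional multiplier of 1.5 on the IQR, and a lower-bound-only cutoff. The helper names `low_score_mask` and `find_outlier_rows` are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest


def low_score_mask(scores, k=1.5):
    """Return a boolean mask marking scores below Q1 - k * IQR.

    The multiplier k=1.5 is an assumed, conventional choice.
    """
    scores = np.asarray(scores, dtype=float)
    q1, q3 = np.percentile(scores, [25, 75])
    lower_bound = q1 - k * (q3 - q1)
    return scores < lower_bound


def find_outlier_rows(X, random_state=0):
    """Score each row with an Isolation Forest, then flag unusually low scores.

    Returns the index labels of the rows whose score falls below the IQR cutoff.
    """
    forest = IsolationForest(random_state=random_state)
    forest.fit(X)
    # decision_function: lower (more negative) values mean more anomalous rows.
    scores = forest.decision_function(X)
    return list(X.index[low_score_mask(scores)])


df = pd.DataFrame({
    'x': [1, 2, 3, 40, 5],
    'y': [6, 7, 8, 990, 10],
    'z': [-1, -2, -3, -1201, -4],
})
print(find_outlier_rows(df))
```

Only the lower bound is checked here because, for this scorer, anomalous rows receive low scores. Which rows are flagged depends on the random seed and on the forest's settings, so results on very small dataframes may vary.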