API Reference
Demo Datasets
Load credit card fraud dataset.
Load wine dataset.
Load breast cancer dataset.
Load diabetes dataset.
Preprocessing
Utilities to preprocess data before using evalml.
Drops rows in X and y where the target y has a value of NaN.
Get the label distributions.
Load features and labels from a file.
Get the number of features for specific dtypes.
Splits data into train and test sets.
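The row-dropping and label-distribution utilities above can be illustrated with a minimal pure-Python sketch. It assumes X is a list of rows and y a list of target values. The helper names `drop_nan_target_rows_sketch` and `label_distribution_sketch` are hypothetical. evalml's own utilities work on pandas objects and may differ in signature and output format.

```python
import math
from collections import Counter


def drop_nan_target_rows_sketch(X, y):
    """Drop every (row, target) pair whose target is NaN (hypothetical helper)."""
    keep = [i for i, target in enumerate(y)
            if not (isinstance(target, float) and math.isnan(target))]
    return [X[i] for i in keep], [y[i] for i in keep]


def label_distribution_sketch(labels):
    """Return each label's share of all labels as a fraction (hypothetical helper)."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


X = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
y = [0, float("nan"), 1]
X_clean, y_clean = drop_nan_target_rows_sketch(X, y)
print(X_clean, y_clean)  # [[1.0, 2.0], [5.0, 6.0]] [0, 1]
print(label_distribution_sketch(["a", "a", "b", "a"]))  # {'a': 0.75, 'b': 0.25}
```

X and y are filtered with the same index list. This keeps each row aligned with its target.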
AutoML
AutoML Search Classes
Automated Pipeline search.
AutoML Algorithm Classes
Base class for the automl algorithms which power evalml.
An automl algorithm which first fits a base round of pipelines with default parameters, then does a round of parameter tuning on each pipeline in order of performance.
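The two-phase strategy described for the iterative algorithm can be sketched as a batch generator. This is not evalml's algorithm interface. The function `iterative_batches_sketch` is hypothetical, each "pipeline" is only a name, scores come from a caller-supplied function where higher is assumed to be better, and tuning is represented only by the order in which pipelines are revisited.

```python
def iterative_batches_sketch(pipeline_names, score_default, tuning_rounds=1):
    """Yield batches of pipeline names in the order an iterative search would try them.

    Batch 0 holds every pipeline with default parameters. Later batches revisit
    the pipelines one at a time, best default score first.
    """
    # Round 1: every pipeline is fitted once with its default parameters.
    yield list(pipeline_names)
    scores = {name: score_default(name) for name in pipeline_names}
    # Rank by the default-parameter score (assumption: higher is better).
    ranked = sorted(pipeline_names, key=lambda name: scores[name], reverse=True)
    # Tuning rounds: one batch per pipeline, in order of performance.
    for _ in range(tuning_rounds):
        for name in ranked:
            yield [name]


default_scores = {"random_forest": 0.81, "logistic_regression": 0.78, "xgboost": 0.84}
batches = list(iterative_batches_sketch(list(default_scores), default_scores.get))
print(batches)
# [['random_forest', 'logistic_regression', 'xgboost'], ['xgboost'], ['random_forest'], ['logistic_regression']]
```

Tuning in order of performance means the limited search budget is spent first on the pipelines that already scored best with default parameters.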
Pipelines
Pipeline Base Classes
Base class for all pipelines.
Pipeline subclass for all classification pipelines.
Pipeline subclass for all binary classification pipelines.
Pipeline subclass for all multiclass classification pipelines.
Pipeline subclass for all regression pipelines.
Classification Pipelines
CatBoost Pipeline for binary classification.
CatBoost Pipeline for multiclass classification.
Elastic Net Pipeline for binary classification problems.
Elastic Net Pipeline for multiclass classification problems.
Extra Trees Pipeline for binary classification.
Extra Trees Pipeline for multiclass classification.
Logistic Regression Pipeline for binary classification.
Logistic Regression Pipeline for multiclass classification.
Random Forest Pipeline for binary classification.
Random Forest Pipeline for multiclass classification.
XGBoost Pipeline for binary classification.
XGBoost Pipeline for multiclass classification.
Baseline Pipeline for binary classification.
Baseline Pipeline for multiclass classification.
Mode Baseline Pipeline for binary classification.
Mode Baseline Pipeline for multiclass classification.
Regression Pipelines
Random Forest Pipeline for regression problems.
CatBoost Pipeline for regression problems.
Elastic Net Pipeline for regression problems.
Extra Trees Pipeline for regression problems.
Linear Regression Pipeline for regression problems.
XGBoost Pipeline for regression problems.
Baseline Pipeline for regression problems.
Baseline Pipeline for regression problems.
Pipeline Utils
Returns a complete list of all supported pipeline classes.
Returns the pipelines allowed for a particular problem type.
Returns the estimators allowed for a particular problem type.
Given input data, target data, an estimator class and the problem type,
List model type for a particular problem type.
Pipeline Graph Utils
Given labels and binary classifier predicted probabilities, compute and return the data representing a precision-recall curve.
Generate and display a precision-recall plot.
Given labels and classifier predicted probabilities, compute and return the data representing a Receiver Operating Characteristic (ROC) curve.
Generate and display a Receiver Operating Characteristic (ROC) plot.
Confusion matrix for binary and multiclass classification.
Normalizes a confusion matrix.
Generate and display a confusion matrix plot.
Calculates permutation importance for features.
Generate a bar graph of the pipeline’s permutation importance.
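The confusion-matrix and normalization utilities above can be illustrated without any plotting. This sketch puts true labels on rows and predicted labels on columns. It normalizes by row (over true labels), which is one common choice; evalml may offer other normalization modes. Both helper names are hypothetical.

```python
def confusion_matrix_sketch(y_true, y_pred, labels):
    """Count predictions: rows are true labels, columns are predicted labels."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for actual, predicted in zip(y_true, y_pred):
        matrix[index[actual]][index[predicted]] += 1
    return matrix


def normalize_rows_sketch(matrix):
    """Divide each row by its sum, showing how each true class was predicted."""
    normalized = []
    for row in matrix:
        total = sum(row)
        # Leave an all-zero row as zeros instead of dividing by zero.
        normalized.append([count / total if total else 0.0 for count in row])
    return normalized


y_true = ["cat", "cat", "dog", "dog", "dog"]
y_pred = ["cat", "dog", "dog", "dog", "cat"]
cm = confusion_matrix_sketch(y_true, y_pred, labels=["cat", "dog"])
print(cm)  # [[1, 1], [1, 2]]
print(normalize_rows_sketch(cm))  # [[0.5, 0.5], [0.3333333333333333, 0.6666666666666666]]
```

After row normalization, the diagonal holds each class's recall.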
Components
Component Base Classes
Components represent a step in a pipeline.
Base class for all components.
A component that may or may not need fitting that transforms data.
A component that fits and predicts given data.
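The split between transformers (fit, then transform) and estimators (fit, then predict) can be shown with a small class hierarchy. Every class here is a hypothetical stand-in written for illustration and is not evalml's component API. The default `fit` on the transformer reflects the note that a transformer "may or may not need fitting".

```python
from abc import ABC, abstractmethod


class ComponentSketch(ABC):
    """A single pipeline step (hypothetical stand-in for a component base class)."""

    @abstractmethod
    def fit(self, X, y=None):
        ...


class TransformerSketch(ComponentSketch):
    """A step that maps input data to transformed data."""

    def fit(self, X, y=None):
        # Stateless transformers need no fitting; subclasses may override this.
        return self

    @abstractmethod
    def transform(self, X):
        ...


class EstimatorSketch(ComponentSketch):
    """A step that learns from (X, y) and produces predictions."""

    @abstractmethod
    def predict(self, X):
        ...


class AddConstant(TransformerSketch):
    def __init__(self, constant):
        self.constant = constant

    def transform(self, X):
        return [[value + self.constant for value in row] for row in X]


class MeanPredictor(EstimatorSketch):
    def fit(self, X, y=None):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]


X = [[1, 2], [3, 4]]
X_t = AddConstant(10).fit(X).transform(X)
preds = MeanPredictor().fit(X_t, [2.0, 4.0]).predict(X_t)
print(X_t, preds)  # [[11, 12], [13, 14]] [3.0, 3.0]
```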
Transformers
Transformers are components that take in data as input and output transformed data.
Drops specified columns in input data.
Selects specified columns in input data.
One-hot encoder to encode non-numeric data.
Imputes missing data according to a specified imputation strategy per column.
Imputes missing data according to a specified imputation strategy.
Standardizes features: removes the mean and scales to unit variance.
Selects top features based on importance weights using a Random Forest regressor.
Selects top features based on importance weights using a Random Forest classifier.
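The standardization transformer above ("removes the mean and scales to unit variance") computes z = (x - mean) / std for each column. This is a pure-Python sketch with a hypothetical function name. It assumes the population standard deviation and maps constant columns to zero instead of dividing by zero. A library scaler may handle these edge cases differently.

```python
import math


def standardize_sketch(X):
    """Column-wise standardization: z = (x - mean) / std, using the population std."""
    n_rows = len(X)
    columns = list(zip(*X))
    means = [sum(col) / n_rows for col in columns]
    stds = [math.sqrt(sum((v - m) ** 2 for v in col) / n_rows)
            for col, m in zip(columns, means)]
    # Constant columns have zero std; divide by 1.0 so they become all zeros.
    stds = [s if s > 0 else 1.0 for s in stds]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in X]


print(standardize_sketch([[1.0, 10.0], [3.0, 10.0]]))  # [[-1.0, 0.0], [1.0, 0.0]]
```

In a real pipeline, the means and standard deviations are learned during fit on the training data and reused at transform time. This keeps test data from leaking into the scaling.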
Estimators
Classifiers
Classifiers are components that output a predicted class label.
CatBoost Classifier, a classifier that uses gradient-boosting on decision trees.
Elastic Net Classifier.
Extra Trees Classifier.
Random Forest Classifier.
Logistic Regression Classifier.
XGBoost Classifier.
Classifier that predicts using the specified strategy.
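A baseline classifier "predicts using the specified strategy" and never looks at the features. This sketch supports two strategies, "mode" and "random". The class name is hypothetical, and the choice of strategy names is an assumption that may not match evalml's own.

```python
import random
from collections import Counter


class BaselineClassifierSketch:
    """Predicts without using the features (hypothetical sketch).

    strategy="mode"   -> always predict the most frequent training label.
    strategy="random" -> predict a class drawn uniformly from the training classes.
    """

    def __init__(self, strategy="mode", seed=0):
        if strategy not in ("mode", "random"):
            raise ValueError(f"unknown strategy: {strategy!r}")
        self.strategy = strategy
        self._rng = random.Random(seed)

    def fit(self, X, y):
        self.classes_ = sorted(set(y))
        self.mode_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        if self.strategy == "mode":
            return [self.mode_ for _ in X]
        return [self._rng.choice(self.classes_) for _ in X]


clf = BaselineClassifierSketch(strategy="mode").fit([[0], [1], [2]], ["no", "yes", "no"])
print(clf.predict([[5], [6]]))  # ['no', 'no']
```

A real model is only worth keeping if it clearly beats this floor on the chosen objective.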
Regressors
Regressors are components that output a predicted target value.
CatBoost Regressor, a regressor that uses gradient-boosting on decision trees.
Elastic Net Regressor.
Linear Regressor.
Extra Trees Regressor.
Random Forest Regressor.
XGBoost Regressor.
Regressor that predicts using the specified strategy.
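The regression counterpart predicts a constant learned from the training target. This sketch assumes "mean" and "median" strategies and uses a hypothetical class name. Other strategies may exist in evalml.

```python
import statistics


class BaselineRegressorSketch:
    """Predicts a constant learned from the training target (hypothetical sketch)."""

    _STRATEGIES = {"mean": statistics.mean, "median": statistics.median}

    def __init__(self, strategy="mean"):
        if strategy not in self._STRATEGIES:
            raise ValueError(f"unknown strategy: {strategy!r}")
        self.strategy = strategy

    def fit(self, X, y):
        # The features are ignored; only the target's summary statistic is kept.
        self.constant_ = self._STRATEGIES[self.strategy](y)
        return self

    def predict(self, X):
        return [self.constant_ for _ in X]


y_train = [1.0, 2.0, 9.0]
print(BaselineRegressorSketch("mean").fit(None, y_train).predict([0, 0]))    # [4.0, 4.0]
print(BaselineRegressorSketch("median").fit(None, y_train).predict([0, 0]))  # [2.0, 2.0]
```

The median strategy is less affected by the outlying 9.0, which is why it can be a fairer floor for skewed targets.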
Objective Functions
Objective Base Classes
Base class for all objectives.
Base class for all binary classification objectives.
Base class for all multiclass classification objectives.
Base class for all regression objectives.
Domain-Specific Objectives
Scores the percentage of the total transaction amount processed that is lost due to fraud.
Lead scoring.
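The fraud objective's core ratio can be sketched as the value of missed fraud divided by the total value processed. This is a deliberate simplification. evalml's fraud objective may weigh costs differently (for example, by also charging for legitimate transactions that are flagged). The function name is hypothetical.

```python
def fraud_loss_fraction_sketch(y_true, y_pred, amounts):
    """Fraction of the total transaction value lost to fraud that went undetected.

    y_true / y_pred: 1 marks a fraudulent transaction, 0 a legitimate one.
    amounts: transaction values, aligned with the labels.
    """
    total = sum(amounts)
    missed = sum(amount for actual, predicted, amount in zip(y_true, y_pred, amounts)
                 if actual == 1 and predicted == 0)
    return missed / total if total else 0.0


# Two frauds (100 and 300); only the 300 one is caught.
print(fraud_loss_fraction_sketch([1, 0, 1, 0], [0, 0, 1, 0], [100, 200, 300, 400]))  # 0.1
```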
Classification Objectives
Accuracy score for binary classification.
Accuracy score for multiclass classification.
AUC score for binary classification.
AUC score for multiclass classification using macro averaging.
AUC score for multiclass classification using micro averaging.
AUC score for multiclass classification using weighted averaging.
Balanced accuracy score for binary classification.
Balanced accuracy score for multiclass classification.
F1 score for binary classification.
F1 score for multiclass classification using micro averaging.
F1 score for multiclass classification using macro averaging.
F1 score for multiclass classification using weighted averaging.
Log Loss for binary classification.
Log Loss for multiclass classification.
Matthews correlation coefficient for binary classification.
Matthews correlation coefficient for multiclass classification.
Precision score for binary classification.
Precision score for multiclass classification using micro averaging.
Precision score for multiclass classification using macro averaging.
Precision score for multiclass classification using weighted averaging.
Recall score for binary classification.
Recall score for multiclass classification using micro averaging.
Recall score for multiclass classification using macro averaging.
Recall score for multiclass classification using weighted averaging.
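Several objectives above differ only in how per-class scores are averaged. This sketch uses F1 to show the three averaging modes with the standard definitions. The helper name is hypothetical, and evalml's objectives may delegate to a library implementation with its own edge-case handling.

```python
from collections import Counter


def f1_scores_sketch(y_true, y_pred):
    """Return micro-, macro- and weighted-averaged F1 for a multiclass problem."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for actual, predicted in zip(y_true, y_pred):
        if actual == predicted:
            tp[actual] += 1
        else:
            fp[predicted] += 1
            fn[actual] += 1

    def f1(t, p, n):
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when there is nothing to score.
        denom = 2 * t + p + n
        return 2 * t / denom if denom else 0.0

    per_class = {label: f1(tp[label], fp[label], fn[label]) for label in labels}
    support = Counter(y_true)
    return {
        # Micro: pool TP/FP/FN over all classes, then compute one F1.
        "micro": f1(sum(tp.values()), sum(fp.values()), sum(fn.values())),
        # Macro: unweighted mean of the per-class F1 scores.
        "macro": sum(per_class.values()) / len(labels),
        # Weighted: per-class F1 weighted by each class's count in y_true.
        "weighted": sum(per_class[label] * support[label] for label in labels) / len(y_true),
    }


scores = f1_scores_sketch([0, 0, 0, 1, 1, 2], [0, 0, 1, 1, 1, 0])
print({name: round(value, 3) for name, value in scores.items()})
# {'micro': 0.667, 'macro': 0.489, 'weighted': 0.6}
```

Macro averaging lets the never-predicted class 2 (F1 of 0) pull the score down. Weighted averaging discounts that class by its small support.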
Regression Objectives
Coefficient of determination for regression.
Mean absolute error for regression.
Mean squared error for regression.
Mean squared log error for regression.
Median absolute error for regression.
Maximum residual error for regression.
Explained variance score for regression.
Root mean squared error for regression.
Root mean squared log error for regression.
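Each regression objective above reduces to a short formula over the residuals y_true - y_pred. This sketch computes all of them with their textbook definitions. The function name is hypothetical. It assumes y_true is not constant and, for the log-based errors, that all values are greater than -1.

```python
import math
import statistics


def regression_metrics_sketch(y_true, y_pred):
    """Textbook definitions of the regression objectives listed above."""
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    abs_residuals = [abs(r) for r in residuals]
    mean_true = sum(y_true) / n
    ss_res = sum(r ** 2 for r in residuals)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    mse = ss_res / n
    # Log errors compare log(1 + value) rather than the raw values.
    msle = sum((math.log1p(t) - math.log1p(p)) ** 2 for t, p in zip(y_true, y_pred)) / n
    mean_res = sum(residuals) / n
    var_res = sum((r - mean_res) ** 2 for r in residuals) / n
    return {
        "r2": 1 - ss_res / ss_tot,
        "mae": sum(abs_residuals) / n,
        "mse": mse,
        "msle": msle,
        "median_ae": statistics.median(abs_residuals),
        "max_error": max(abs_residuals),
        # Like R2, but a constant offset (bias) in the residuals is not penalized.
        "explained_variance": 1 - var_res / (ss_tot / n),
        "rmse": math.sqrt(mse),
        "rmsle": math.sqrt(msle),
    }


metrics = regression_metrics_sketch([3.0, 5.0, 2.0, 7.0], [2.5, 5.0, 3.0, 8.0])
print(metrics["mae"], metrics["rmse"], metrics["max_error"])  # 0.625 0.75 1.0
```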
Problem Types
Enum for type of machine learning problem: BINARY, MULTICLASS, or REGRESSION.
Handles problem_type by either returning the ProblemTypes or converting from a str.
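The "return the enum or convert from a str" behavior can be sketched with Python's standard `enum` module. `ProblemTypesSketch` and `handle_problem_type_sketch` are hypothetical stand-ins. The lowercase string values and the case-insensitive matching are assumptions, not a description of evalml's own enum.

```python
from enum import Enum


class ProblemTypesSketch(Enum):
    """Hypothetical stand-in for a problem-type enum."""
    BINARY = "binary"
    MULTICLASS = "multiclass"
    REGRESSION = "regression"


def handle_problem_type_sketch(problem_type):
    """Return an enum member given either a member or a case-insensitive string."""
    if isinstance(problem_type, ProblemTypesSketch):
        return problem_type
    if isinstance(problem_type, str):
        try:
            return ProblemTypesSketch(problem_type.lower())
        except ValueError:
            raise ValueError(f"unknown problem type: {problem_type!r}") from None
    raise TypeError("problem_type must be a str or a ProblemTypesSketch member")


print(handle_problem_type_sketch("Binary"))  # ProblemTypesSketch.BINARY
```

Normalizing at the API boundary lets the rest of the code compare enum members instead of raw strings.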
Model Family
Enum for family of machine learning models.
Tuners
Defines API for Tuners.
Bayesian Optimizer.
Grid Search Optimizer.
Random Search Optimizer.
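Grid search and random search differ only in how they propose candidate parameter sets. This sketch uses a space of discrete value lists. That is an assumption: real tuners may also search continuous ranges. The generator names are hypothetical, and Bayesian optimization, which uses past scores to choose the next candidate, is not shown.

```python
import itertools
import random


def grid_search_candidates(space):
    """Yield every combination of the listed values (exhaustive grid search)."""
    names = list(space)
    for values in itertools.product(*(space[name] for name in names)):
        yield dict(zip(names, values))


def random_search_candidates(space, n, seed=0):
    """Yield n combinations, drawing each value independently and uniformly."""
    rng = random.Random(seed)
    for _ in range(n):
        yield {name: rng.choice(options) for name, options in space.items()}


space = {"max_depth": [3, 6], "n_estimators": [10, 100]}
print(list(grid_search_candidates(space)))
# [{'max_depth': 3, 'n_estimators': 10}, {'max_depth': 3, 'n_estimators': 100},
#  {'max_depth': 6, 'n_estimators': 10}, {'max_depth': 6, 'n_estimators': 100}]
```

The grid grows multiplicatively with each added parameter. Random search caps the number of candidates at n, whatever the size of the space.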
Data Checks
Data Check Classes
Base class for all data checks.
Checks if the target labels contain missing or invalid data.
Checks if there are any highly-null columns in the input.
Checks if any of the features are likely to be ID columns.
Checks if any of the features are highly correlated with the target.
Checks if there are any outliers in the input data by using an Isolation Forest to obtain an anomaly score for each index and then using the interquartile range (IQR) of those scores to flag anomalous ones.
A collection of data checks.
A collection of basic data checks that is used by AutoML by default.
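The outlier check's second step flags anomaly scores that fall outside interquartile-range fences. This sketch covers only that step, applied to scores that are assumed to be already computed. The Isolation Forest scoring itself is not reproduced. The multiplier k = 1.5 is the conventional Tukey fence and an assumption, and evalml's multiplier and which side it flags may differ. The function name is hypothetical.

```python
import statistics


def iqr_outlier_indices(scores, k=1.5):
    """Return indices whose score lies outside [Q1 - k*IQR, Q3 + k*IQR]."""
    # "inclusive" uses linear interpolation between the sorted data points.
    q1, _, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [i for i, score in enumerate(scores) if score < lower or score > upper]


# Scores as an anomaly detector might produce them; the last one stands out.
anomaly_scores = [0.10, 0.12, 0.11, 0.13, 0.09, -0.5]
print(iqr_outlier_indices(anomaly_scores))  # [5]
```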
Data Check Messages
Base class for all DataCheckMessages.
DataCheckMessage subclass for errors returned by data checks.
DataCheckMessage subclass for warnings returned by data checks.
Data Check Message Types
Enum for type of data check message: WARNING or ERROR.
Utils
Attempts to import the requested library by name.
Converts a string describing a length of time to its length in seconds.
Generates a numpy.random.RandomState instance using seed.
Given a numpy.random.RandomState object, generate an int representing a seed value for another random number generator.
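Two of the utilities above, converting a time string to seconds and deriving a seed from an existing generator, are sketched below. The accepted unit spellings are assumptions, not evalml's actual parsing rules. The standard library's `random.Random` stands in for `numpy.random.RandomState` so that the example has no dependencies. Both function names are hypothetical.

```python
import random
import re

# Assumed unit spellings; a real parser may accept a different set.
_UNIT_SECONDS = {
    "s": 1, "sec": 1, "second": 1, "seconds": 1,
    "m": 60, "min": 60, "minute": 60, "minutes": 60,
    "h": 3600, "hr": 3600, "hour": 3600, "hours": 3600,
}


def convert_to_seconds_sketch(text):
    """Parse strings such as '30 seconds', '2 min' or '1.5 hours' into seconds."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([a-zA-Z]+)\s*", text)
    if match is None or match.group(2).lower() not in _UNIT_SECONDS:
        raise ValueError(f"unrecognized duration: {text!r}")
    return float(match.group(1)) * _UNIT_SECONDS[match.group(2).lower()]


def derive_seed_sketch(rng, min_bound=0, max_bound=2**31 - 1):
    """Draw an integer seed for another generator from an existing one."""
    return rng.randint(min_bound, max_bound)


print(convert_to_seconds_sketch("2 min"))      # 120.0
print(convert_to_seconds_sketch("1.5 hours"))  # 5400.0
seed = derive_seed_sketch(random.Random(0))
```

Deriving child seeds from one seeded generator keeps a whole run reproducible from a single initial seed.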