API Reference

Demo Datasets

load_fraud

Load credit card fraud dataset.

load_wine

Load wine dataset.

load_breast_cancer

Load breast cancer dataset.

load_diabetes

Load diabetes dataset.

Preprocessing

Utilities to preprocess data before using evalml.

drop_nan_target_rows

Drops rows from X and y wherever the target y is NaN.
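
As a rough illustration of the behavior described above, the following is a minimal sketch on plain Python lists. The name drop_nan_target_rows_sketch is hypothetical and is not the library's signature; it only assumes that X and y are aligned row by row.

```python
import math


def drop_nan_target_rows_sketch(X, y):
    """Return copies of X and y without the rows whose target is NaN.

    Hypothetical stand-in for illustration; not the evalml implementation.
    """
    keep = [
        i for i, target in enumerate(y)
        if not (isinstance(target, float) and math.isnan(target))
    ]
    return [X[i] for i in keep], [y[i] for i in keep]


X = [[1, 2], [3, 4], [5, 6]]
y = [0, float("nan"), 1]
X_clean, y_clean = drop_nan_target_rows_sketch(X, y)
```

Here the middle row is dropped from both X and y because its target is NaN, so the two stay aligned.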

label_distribution

Get the label distributions
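
One plausible reading of "label distribution" is the fraction of rows carrying each label. The sketch below follows that assumption using only the standard library; label_distribution_sketch is a hypothetical name, and the real function's return type may differ.

```python
from collections import Counter


def label_distribution_sketch(labels):
    """Map each label to the fraction of rows that carry it (assumed meaning)."""
    counts = Counter(labels)
    total = len(labels)
    return {label: count / total for label, count in counts.items()}


dist = label_distribution_sketch(["a", "b", "a", "a"])
```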

load_data

Load features and labels from file.

number_of_features

Get the number of features for specific dtypes

split_data

Splits data into train and test sets.
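
For orientation, a train/test split can be sketched as a seeded shuffle followed by a cut. The function name, the test_size and seed parameters, and the absence of stratification are all assumptions of this sketch, not statements about the library's behavior.

```python
import random


def split_data_sketch(X, y, test_size=0.2, seed=0):
    """Shuffle row indices with a fixed seed, then cut off a test portion.

    Hypothetical helper; no stratification is applied here.
    """
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)
    n_test = round(len(X) * test_size)
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    X_train = [X[i] for i in train_idx]
    X_test = [X[i] for i in test_idx]
    y_train = [y[i] for i in train_idx]
    y_test = [y[i] for i in test_idx]
    return X_train, X_test, y_train, y_test


X = [[i] for i in range(10)]
y = [i % 2 for i in range(10)]
X_train, X_test, y_train, y_test = split_data_sketch(X, y, test_size=0.2, seed=0)
```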

AutoML

AutoClassificationSearch

Automatic pipeline search class for classification problems

AutoRegressionSearch

Automatic pipeline search class for regression problems

Pipelines

Pipeline Base Classes

PipelineBase

Base class for all pipelines.

ClassificationPipeline

Pipeline subclass for all classification pipelines.

BinaryClassificationPipeline

Pipeline subclass for all binary classification pipelines.

MulticlassClassificationPipeline

Pipeline subclass for all multiclass classification pipelines.

RegressionPipeline

Pipeline subclass for all regression pipelines.

Classification Pipelines

CatBoostBinaryClassificationPipeline

CatBoost Pipeline for binary classification.

CatBoostMulticlassClassificationPipeline

CatBoost Pipeline for multiclass classification.

LogisticRegressionBinaryPipeline

Logistic Regression Pipeline for binary classification

LogisticRegressionMulticlassPipeline

Logistic Regression Pipeline for multiclass classification

RFBinaryClassificationPipeline

Random Forest Pipeline for binary classification

RFMulticlassClassificationPipeline

Random Forest Pipeline for multiclass classification

XGBoostBinaryPipeline

XGBoost Pipeline for binary classification

XGBoostMulticlassPipeline

XGBoost Pipeline for multiclass classification

Regression Pipelines

RFRegressionPipeline

Random Forest Pipeline for regression problems

CatBoostRegressionPipeline

CatBoost Pipeline for regression problems.

LinearRegressionPipeline

Linear Regression Pipeline for regression problems

XGBoostRegressionPipeline

XGBoost Pipeline for regression problems

Pipeline Utils

all_pipelines

Returns a complete list of all supported pipeline classes.

get_pipelines

Returns the pipelines allowed for a particular problem type.

list_model_families

Lists the model families available for a particular problem type

Pipeline Plot Utils

roc_curve

Receiver Operating Characteristic curve for binary classification.

confusion_matrix

Confusion matrix for binary and multiclass classification.

normalize_confusion_matrix

Normalizes a confusion matrix.
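
A confusion matrix can be normalized in more than one way. The sketch below assumes row-wise normalization, where each row (one true class) is divided by its total so the entries become per-class rates. The name normalize_rows is hypothetical, and the library function may offer other modes.

```python
def normalize_rows(matrix):
    """Divide each row by its sum; an all-zero row is left as zeros."""
    normalized = []
    for row in matrix:
        total = sum(row)
        normalized.append([value / total if total else 0.0 for value in row])
    return normalized


# Rows are true classes, columns are predicted classes (assumed layout).
cm = [[8, 2], [1, 9]]
norm = normalize_rows(cm)
```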

Components

Transformers

Encoders

Encoders convert categorical or non-numerical features into numerical features.

OneHotEncoder

One-hot encoder to encode non-numeric data
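
To make the idea concrete, here is a minimal sketch of one-hot encoding a single categorical column. The function one_hot_encode is hypothetical, and ordering the categories alphabetically is an assumption of this example.

```python
def one_hot_encode(values):
    """Return the sorted categories and one indicator row per input value."""
    categories = sorted(set(values))
    rows = [[1 if value == category else 0 for category in categories]
            for value in values]
    return categories, rows


categories, encoded = one_hot_encode(["red", "blue", "red"])
```

Each output row contains exactly one 1, in the column of that row's category.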

Imputers

Imputers fill in missing data.

SimpleImputer

Imputes missing data according to a specified imputation strategy

Scalers

Scalers transform and standardize the range of data.

StandardScaler

Standardizes features by removing the mean and scaling to unit variance
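
Standardization of one numeric column can be sketched as follows. The helper standardize is hypothetical, and dividing by the population standard deviation is an assumption of this sketch.

```python
from statistics import mean, pstdev


def standardize(column):
    """Subtract the mean and divide by the population standard deviation."""
    center = mean(column)
    spread = pstdev(column)
    if spread == 0:
        # A constant column has no variance to scale; map it to zeros.
        return [0.0 for _ in column]
    return [(value - center) / spread for value in column]


scaled = standardize([1.0, 2.0, 3.0, 4.0])
```

After scaling, the column has mean 0 and unit variance, up to floating-point error.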

Feature Selectors

Feature selectors select a subset of relevant features for the model.

RFRegressorSelectFromModel

Selects top features based on importance weights using a Random Forest regressor

RFClassifierSelectFromModel

Selects top features based on importance weights using a Random Forest classifier

Estimators

Classifiers

Classifiers are models which can be trained to predict a class label from input data.

CatBoostClassifier

CatBoost Classifier, a classifier that uses gradient-boosting on decision trees.

RandomForestClassifier

Random Forest Classifier

LogisticRegressionClassifier

Logistic Regression Classifier

XGBoostClassifier

XGBoost Classifier

Regressors

Regressors are models which can be trained to predict a target value from input data.

CatBoostRegressor

CatBoost Regressor, a regressor that uses gradient-boosting on decision trees.

LinearRegressor

Linear Regressor

RandomForestRegressor

Random Forest Regressor

XGBoostRegressor

XGBoost Regressor

Objective Functions

Domain-Specific Objectives

FraudCost

Scores the percentage of the total transaction amount processed that is lost to fraud

LeadScoring

Lead scoring

Classification Objectives

AccuracyBinary

Accuracy score for binary classification

AccuracyMulticlass

Accuracy score for multiclass classification

AUC

AUC score for binary classification

AUCMacro

AUC score for multiclass classification using macro averaging

AUCMicro

AUC score for multiclass classification using micro averaging

AUCWeighted

AUC score for multiclass classification using weighted averaging

BalancedAccuracyBinary

Balanced accuracy score for binary classification

BalancedAccuracyMulticlass

Balanced accuracy score for multiclass classification

F1

F1 score for binary classification

F1Micro

F1 score for multiclass classification using micro averaging

F1Macro

F1 score for multiclass classification using macro averaging

F1Weighted

F1 score for multiclass classification using weighted averaging
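
The micro, macro, and weighted variants differ only in how per-class results are combined. The sketch below illustrates the three averaging schemes for F1 under their usual definitions. The function f1_averages is hypothetical and is not the library's objective API.

```python
from collections import Counter


def f1_averages(y_true, y_pred):
    """Compute micro, macro, and weighted F1 from per-class counts."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    def f1(t, p, n):
        denom = 2 * t + p + n
        return 2 * t / denom if denom else 0.0

    per_class = {label: f1(tp[label], fp[label], fn[label]) for label in labels}
    support = Counter(y_true)
    return {
        # Micro: pool the counts over all classes, then compute one score.
        "micro": f1(sum(tp.values()), sum(fp.values()), sum(fn.values())),
        # Macro: unweighted mean of the per-class scores.
        "macro": sum(per_class.values()) / len(labels),
        # Weighted: mean of the per-class scores weighted by true-class frequency.
        "weighted": sum(per_class[l] * support[l] for l in labels) / len(y_true),
    }


scores = f1_averages([0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 2, 0])
```

In this example every class has the same number of true samples, so the macro and weighted scores coincide. With imbalanced classes they generally differ.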

LogLossBinary

Log Loss for binary classification

LogLossMulticlass

Log Loss for multiclass classification

MCCBinary

Matthews correlation coefficient for binary classification

MCCMulticlass

Matthews correlation coefficient for multiclass classification

Precision

Precision score for binary classification

PrecisionMicro

Precision score for multiclass classification using micro averaging

PrecisionMacro

Precision score for multiclass classification using macro averaging

PrecisionWeighted

Precision score for multiclass classification using weighted averaging

Recall

Recall score for binary classification

RecallMicro

Recall score for multiclass classification using micro averaging

RecallMacro

Recall score for multiclass classification using macro averaging

RecallWeighted

Recall score for multiclass classification using weighted averaging

Regression Objectives

R2

Coefficient of determination for regression

MAE

Mean absolute error for regression

MSE

Mean squared error for regression

MedianAE

Median absolute error for regression

MaxError

Maximum residual error for regression

ExpVariance

Explained variance score for regression
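
The regression objectives listed above can be sketched from their textbook definitions. The function regression_metrics is hypothetical, and it assumes a non-constant target so that the denominators are nonzero.

```python
from statistics import mean, median, pvariance


def regression_metrics(y_true, y_pred):
    """Compute common regression scores from their standard formulas."""
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    abs_residuals = [abs(r) for r in residuals]
    y_mean = mean(y_true)
    ss_res = sum(r * r for r in residuals)
    ss_tot = sum((t - y_mean) ** 2 for t in y_true)
    return {
        "R2": 1 - ss_res / ss_tot,
        "MAE": mean(abs_residuals),
        "MSE": ss_res / len(residuals),
        "MedianAE": median(abs_residuals),
        "MaxError": max(abs_residuals),
        "ExpVariance": 1 - pvariance(residuals) / pvariance(y_true),
    }


metrics = regression_metrics([3.0, 5.0, 7.0, 9.0], [2.0, 5.0, 8.0, 9.0])
```

R2 and explained variance agree here because the residuals average to zero. They diverge when predictions are biased.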

Problem Types

ProblemTypes

Enum for type of machine learning problem: BINARY, MULTICLASS, or REGRESSION

handle_problem_types

Handles problem_type by returning it unchanged if it is already a ProblemTypes value, or by converting it from a str
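
The pass-through-or-convert pattern described above can be sketched with a stand-in enum. ProblemTypesSketch and handle_problem_types_sketch are hypothetical names, and the case-insensitive lookup and the ValueError are assumptions of this sketch rather than documented behavior.

```python
from enum import Enum


class ProblemTypesSketch(Enum):
    """Stand-in enum with the three members named in this reference."""
    BINARY = "binary"
    MULTICLASS = "multiclass"
    REGRESSION = "regression"


def handle_problem_types_sketch(problem_type):
    """Return an enum member, converting from a str when necessary."""
    if isinstance(problem_type, ProblemTypesSketch):
        return problem_type
    if isinstance(problem_type, str):
        try:
            return ProblemTypesSketch[problem_type.upper()]
        except KeyError:
            raise ValueError(f"Unknown problem type: {problem_type!r}") from None
    raise ValueError(f"Unsupported problem type input: {problem_type!r}")


from_str = handle_problem_types_sketch("binary")
from_enum = handle_problem_types_sketch(ProblemTypesSketch.REGRESSION)
```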

Model Family

ModelFamily

Enum for family of machine learning models.

Tuners

Tuner

Defines the API for tuners

SKOptTuner

Bayesian Optimizer

GridSearchTuner

Grid Search Optimizer

RandomSearchTuner

Random Search Optimizer

Guardrails

detect_highly_null

Checks if there are any highly-null columns in a dataframe.
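
A highly-null check amounts to comparing each column's missing fraction against a threshold. The sketch below represents a table as a dict of column lists with None for missing values. The function name and the 0.95 default threshold are assumptions of this example.

```python
def highly_null_columns(columns, threshold=0.95):
    """Return {column name: missing fraction} for columns at or above threshold."""
    flagged = {}
    for name, values in columns.items():
        if not values:
            continue  # skip empty columns to avoid dividing by zero
        missing = sum(value is None for value in values) / len(values)
        if missing >= threshold:
            flagged[name] = missing
    return flagged


data = {"mostly_missing": [None, None, None, 1], "complete": [1, 2, 3, 4]}
flagged = highly_null_columns(data, threshold=0.75)
```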

detect_label_leakage

Check if any of the features are highly correlated with the target.

detect_outliers

Checks for outliers in a dataframe by first using Isolation Forest to obtain an anomaly score for each index, then applying the IQR to those scores to flag anomalies.
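
The second stage of that procedure, applying the IQR to a list of scores, can be sketched on its own. The scores below are made-up numbers standing in for Isolation Forest output, which this sketch does not compute. The function name, the multiplier k=1.5, and flagging both tails are assumptions.

```python
from statistics import quantiles


def iqr_outlier_indices(scores, k=1.5):
    """Return indices of scores outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(scores, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [i for i, score in enumerate(scores) if score < lower or score > upper]


# Illustrative anomaly scores only; not produced by a real model.
scores = [0.10, 0.12, 0.11, 0.13, 0.12, 0.90]
outliers = iqr_outlier_indices(scores)
```

Only the last score lies far outside the interquartile fence, so only its index is returned.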

detect_id_columns

Check if any of the features are ID columns.

Utils

import_or_raise

Attempts to import the requested library by name.

convert_to_seconds

get_random_state

Generates a numpy.random.RandomState instance using seed.

get_random_seed

Given a numpy.random.RandomState object, generate an int representing a seed value for another random number generator.