load_fraud
Load credit card fraud dataset.
load_wine
Load wine dataset.
load_breast_cancer
Load breast cancer dataset.
load_diabetes
Load diabetes dataset.
load_churn
Load churn dataset.
Utilities to preprocess data before using evalml.
load_data
Load features and target from file.
drop_nan_target_rows
Drops rows in X and y when row in the target y has a value of NaN.
target_distribution
Get the target distributions.
number_of_features
Get the number of features of each specific dtype in a DataFrame.
split_data
Splits data into train and test sets.
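The two preprocessing steps described above (dropping rows whose target is NaN, then splitting into train and test sets) can be sketched in plain Python. This is a minimal illustration rather than the library's implementation: the `_sketch` function names, the list-based inputs, the `test_size` and `seed` parameters, and the return order are all assumptions made for the example.

```python
import math
import random


def drop_nan_target_rows_sketch(X, y):
    """Keep only the rows whose target value is not NaN (hypothetical helper)."""
    kept = [
        (row, target)
        for row, target in zip(X, y)
        if not (isinstance(target, float) and math.isnan(target))
    ]
    X_clean = [row for row, _ in kept]
    y_clean = [target for _, target in kept]
    return X_clean, y_clean


def split_data_sketch(X, y, test_size=0.2, seed=0):
    """Shuffle row indices and split them into train and test partitions."""
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)
    n_test = int(round(len(indices) * test_size))
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    def pick(data, idx):
        return [data[i] for i in idx]

    # Assumed return order: X_train, X_test, y_train, y_test
    return pick(X, train_idx), pick(X, test_idx), pick(y, train_idx), pick(y, test_idx)


X = [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10], [11, 12]]
y = [0, 1, float("nan"), 1, 0, 1]

X_clean, y_clean = drop_nan_target_rows_sketch(X, y)
X_train, X_test, y_train, y_test = split_data_sketch(X_clean, y_clean, test_size=0.2)
```

Dropping NaN targets before splitting keeps the two partitions aligned, since every remaining row has a usable label.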
AutoMLSearch
Automated pipeline search.
get_default_primary_search_objective
Get the default primary search objective for a problem type.
AutoMLAlgorithm
Base class for the automl algorithms which power evalml.
IterativeAlgorithm
An automl algorithm which first fits a base round of pipelines with default parameters, then does a round of parameter tuning on each pipeline in order of performance.
silent_error_callback
No-op.
log_error_callback
Logs the exception thrown as an error.
raise_error_callback
Raises the exception thrown by the AutoMLSearch object.
log_and_save_error_callback
Logs the exception thrown by the AutoMLSearch object as a warning and adds the exception to the ‘errors’ list in AutoMLSearch object results.
raise_and_save_error_callback
Raises the exception thrown by the AutoMLSearch object, logs it as an error, and adds the exception to the ‘errors’ list in AutoMLSearch object results.
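The difference between the error callbacks above is easiest to see in a toy search loop. The sketch below is not the AutoMLSearch API: the `(exception, results)` callback signature, the `run_search_sketch` loop, and the `_sketch` names are hypothetical stand-ins that only mirror the behaviors described in the list.

```python
import logging

logger = logging.getLogger("automl_sketch")


def silent_error_callback_sketch(exception, results):
    """No-op: ignore the exception and keep searching."""


def log_and_save_error_callback_sketch(exception, results):
    """Log the exception as a warning and record it in results['errors']."""
    logger.warning("Pipeline evaluation failed: %s", exception)
    results["errors"].append(exception)


def raise_error_callback_sketch(exception, results):
    """Re-raise the exception so the search stops immediately."""
    raise exception


def run_search_sketch(candidates, error_callback):
    """Evaluate each candidate and hand any failure to error_callback."""
    results = {"scores": {}, "errors": []}
    for name, evaluate in candidates.items():
        try:
            results["scores"][name] = evaluate()
        except Exception as exc:
            error_callback(exc, results)
    return results


def failing_candidate():
    raise ValueError("bad parameters")


candidates = {"good": lambda: 0.9, "bad": failing_candidate}

saved = run_search_sketch(candidates, log_and_save_error_callback_sketch)
silent = run_search_sketch(candidates, silent_error_callback_sketch)
```

With the saving callback the search finishes and the failure stays inspectable afterwards; with the raising callback the first failure aborts the loop.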
PipelineBase
Base class for all pipelines.
ClassificationPipeline
Pipeline subclass for all classification pipelines.
BinaryClassificationPipeline
Pipeline subclass for all binary classification pipelines.
MulticlassClassificationPipeline
Pipeline subclass for all multiclass classification pipelines.
RegressionPipeline
Pipeline subclass for all regression pipelines.
TimeSeriesRegressionPipeline
Pipeline base class for time series regression problems.
BaselineBinaryPipeline
Baseline Pipeline for binary classification.
BaselineMulticlassPipeline
Baseline Pipeline for multiclass classification.
ModeBaselineBinaryPipeline
Mode Baseline Pipeline for binary classification.
ModeBaselineMulticlassPipeline
Mode Baseline Pipeline for multiclass classification.
BaselineRegressionPipeline
Baseline Pipeline for regression problems.
MeanBaselineRegressionPipeline
Mean Baseline Pipeline for regression problems.
make_pipeline
Given input data, target data, an estimator class and the problem type, generates a pipeline class with a preprocessing chain recommended based on the inputs.
make_pipeline_from_components
Given a list of component instances and the problem type, generates a pipeline instance with the component instances.
generate_pipeline_code
Creates and returns a string that contains the Python imports and code required for running the EvalML pipeline.
Components represent a step in a pipeline.
ComponentBase
Base class for all components.
Transformer
A component that may or may not need fitting that transforms data.
Estimator
A component that fits and predicts given data.
allowed_model_families
List the model types allowed for a particular problem type.
get_estimators
Returns the estimators allowed for a particular problem type.
generate_component_code
Creates and returns a string that contains the Python imports and code required for running the EvalML component.
Transformers are components that take in data as input and output transformed data.
DropColumns
Drops specified columns in input data.
SelectColumns
Selects specified columns in input data.
OneHotEncoder
One-hot encoder to encode non-numeric data.
TargetEncoder
Target encoder to encode categorical data.
PerColumnImputer
Imputes missing data according to a specified imputation strategy per column.
Imputer
Imputes missing data according to a specified imputation strategy.
SimpleImputer
Imputes missing data according to a specified imputation strategy.
StandardScaler
Standardize features: removes mean and scales to unit variance.
RFRegressorSelectFromModel
Selects top features based on importance weights using a Random Forest regressor.
RFClassifierSelectFromModel
Selects top features based on importance weights using a Random Forest classifier.
DropNullColumns
Transformer to drop features whose percentage of NaN values exceeds a specified threshold.
DateTimeFeaturizer
Transformer that can automatically featurize DateTime columns.
TextFeaturizer
Transformer that can automatically featurize text columns.
DelayedFeatureTransformer
Transformer that delays input features and target variable for time series problems.
Classifiers are components that output a predicted class label.
CatBoostClassifier
CatBoost Classifier, a classifier that uses gradient-boosting on decision trees.
ElasticNetClassifier
Elastic Net Classifier.
ExtraTreesClassifier
Extra Trees Classifier.
RandomForestClassifier
Random Forest Classifier.
LightGBMClassifier
LightGBM Classifier.
LogisticRegressionClassifier
Logistic Regression Classifier.
XGBoostClassifier
XGBoost Classifier.
BaselineClassifier
Classifier that predicts using the specified strategy.
StackedEnsembleClassifier
Stacked Ensemble Classifier.
DecisionTreeClassifier
Decision Tree Classifier.
Regressors are components that output a predicted target value.
CatBoostRegressor
CatBoost Regressor, a regressor that uses gradient-boosting on decision trees.
ElasticNetRegressor
Elastic Net Regressor.
LinearRegressor
Linear Regressor.
ExtraTreesRegressor
Extra Trees Regressor.
RandomForestRegressor
Random Forest Regressor.
XGBoostRegressor
XGBoost Regressor.
BaselineRegressor
Regressor that predicts using the specified strategy.
StackedEnsembleRegressor
Stacked Ensemble Regressor.
DecisionTreeRegressor
Decision Tree Regressor.
confusion_matrix
Confusion matrix for binary and multiclass classification.
normalize_confusion_matrix
Normalizes a confusion matrix.
precision_recall_curve
Given labels and binary classifier predicted probabilities, compute and return the data representing a precision-recall curve.
graph_precision_recall_curve
Generate and display a precision-recall plot.
roc_curve
Given labels and classifier predicted probabilities, compute and return the data representing a Receiver Operating Characteristic (ROC) curve.
graph_roc_curve
Generate and display a Receiver Operating Characteristic (ROC) plot for binary and multiclass classification problems.
graph_confusion_matrix
Generate and display a confusion matrix plot.
calculate_permutation_importance
Calculates permutation importance for features.
graph_permutation_importance
Generate a bar graph of the pipeline’s permutation importance.
binary_objective_vs_threshold
Computes objective score as a function of potential binary classification decision thresholds.
graph_binary_objective_vs_threshold
Generates a plot graphing objective score vs. decision thresholds.
graph_prediction_vs_actual
Generate a scatter plot comparing the true and predicted values.
explain_prediction
Creates a table summarizing the top_k positive and top_k negative contributing features to the prediction of a single datapoint.
explain_predictions
Creates a report summarizing the top contributing features for each data point in the input features.
explain_predictions_best_worst
Creates a report summarizing the top contributing features for the best and worst points in the dataset as measured by error to true labels.
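The "top_k positive and top_k negative contributing features" idea behind the explanation functions above can be sketched as follows. The per-feature contribution values are made up for illustration, and `top_contributions_sketch` is a hypothetical helper; how the library actually computes contributions is not shown here.

```python
def top_contributions_sketch(contributions, top_k=2):
    """Return the top_k most positive and top_k most negative contributions.

    `contributions` maps feature name -> signed contribution to one prediction.
    """
    positive = sorted(
        ((name, value) for name, value in contributions.items() if value > 0),
        key=lambda item: item[1],
        reverse=True,  # largest positive contribution first
    )[:top_k]
    negative = sorted(
        ((name, value) for name, value in contributions.items() if value < 0),
        key=lambda item: item[1],  # most negative contribution first
    )[:top_k]
    return positive, negative


# Illustrative (invented) contributions for a single data point
contributions = {
    "age": 0.30,
    "income": -0.45,
    "tenure": 0.05,
    "balance": -0.10,
    "region": 0.12,
}

positive, negative = top_contributions_sketch(contributions, top_k=2)

for name, value in positive + negative:
    print(f"{name:>8}: {value:+.2f}")
```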
ObjectiveBase
Base class for all objectives.
BinaryClassificationObjective
Base class for all binary classification objectives.
MulticlassClassificationObjective
Base class for all multiclass classification objectives.
RegressionObjective
Base class for all regression objectives.
FraudCost
Score the percentage of the total processed transaction amount that is lost to fraud.
LeadScoring
Lead scoring.
CostBenefitMatrix
Score using a cost-benefit matrix.
AccuracyBinary
Accuracy score for binary classification.
AccuracyMulticlass
Accuracy score for multiclass classification.
AUC
AUC score for binary classification.
AUCMacro
AUC score for multiclass classification using macro averaging.
AUCMicro
AUC score for multiclass classification using micro averaging.
AUCWeighted
AUC score for multiclass classification using weighted averaging.
BalancedAccuracyBinary
Balanced accuracy score for binary classification.
BalancedAccuracyMulticlass
Balanced accuracy score for multiclass classification.
F1
F1 score for binary classification.
F1Micro
F1 score for multiclass classification using micro averaging.
F1Macro
F1 score for multiclass classification using macro averaging.
F1Weighted
F1 score for multiclass classification using weighted averaging.
LogLossBinary
Log Loss for binary classification.
LogLossMulticlass
Log Loss for multiclass classification.
MCCBinary
Matthews correlation coefficient for binary classification.
MCCMulticlass
Matthews correlation coefficient for multiclass classification.
Precision
Precision score for binary classification.
PrecisionMicro
Precision score for multiclass classification using micro averaging.
PrecisionMacro
Precision score for multiclass classification using macro averaging.
PrecisionWeighted
Precision score for multiclass classification using weighted averaging.
Recall
Recall score for binary classification.
RecallMicro
Recall score for multiclass classification using micro averaging.
RecallMacro
Recall score for multiclass classification using macro averaging.
RecallWeighted
Recall score for multiclass classification using weighted averaging.
R2
Coefficient of determination for regression.
MAE
Mean absolute error for regression.
MSE
Mean squared error for regression.
MeanSquaredLogError
Mean squared log error for regression.
MedianAE
Median absolute error for regression.
MaxError
Maximum residual error for regression.
ExpVariance
Explained variance score for regression.
RootMeanSquaredError
Root mean squared error for regression.
RootMeanSquaredLogError
Root mean squared log error for regression.
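For reference, several of the regression objectives above reduce to short formulas. The sketch below computes them in plain Python from their standard definitions; the lowercase function names are chosen for this example and are not the library's objective classes.

```python
import math
from statistics import mean, median


def mae(y_true, y_pred):
    """Mean absolute error."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


def mse(y_true, y_pred):
    """Mean squared error."""
    return mean((t - p) ** 2 for t, p in zip(y_true, y_pred))


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(mse(y_true, y_pred))


def median_ae(y_true, y_pred):
    """Median absolute error."""
    return median(abs(t - p) for t, p in zip(y_true, y_pred))


def max_error(y_true, y_pred):
    """Maximum residual error."""
    return max(abs(t - p) for t, p in zip(y_true, y_pred))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_mean = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.0, 5.0, 8.0, 11.0]

scores = {
    "MAE": mae(y_true, y_pred),
    "MSE": mse(y_true, y_pred),
    "RMSE": rmse(y_true, y_pred),
    "MedianAE": median_ae(y_true, y_pred),
    "MaxError": max_error(y_true, y_pred),
    "R2": r2(y_true, y_pred),
}
```

Note that R2 is a score where higher is better, while the error metrics are better when lower.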
get_all_objective_names
Get a list of the names of all objectives.
get_core_objectives
Returns all core objective instances associated with the given problem type.
get_core_objective_names
Get a list of all valid core objectives.
get_non_core_objectives
Get non-core objective classes.
get_objective
Returns the Objective class corresponding to a given objective name.
handle_problem_types
Handles problem_type by either returning the ProblemTypes or converting from a str.
detect_problem_type
Determine the type of problem being solved based on the targets (binary vs. multiclass classification, regression).
ProblemTypes
Enum defining the supported types of machine learning problems.
handle_model_family
Handles model_family by either returning the ModelFamily or converting from a string.
ModelFamily
Enum for family of machine learning models.
Tuner
Defines API for Tuners.
SKOptTuner
Bayesian Optimizer.
GridSearchTuner
Grid Search Optimizer.
RandomSearchTuner
Random Search Optimizer.
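To illustrate how a grid search and a random search propose parameters differently, here is a minimal sketch over a toy search space. It does not use the Tuner API: the generator functions, the `space` layout (name mapped to a `(low, high)` range), and the toy objective are assumptions made for this example.

```python
import itertools
import random


def grid_search_sketch(space, points_per_dim=3):
    """Yield every combination on an evenly spaced grid over each range."""
    axes = []
    for low, high in space.values():
        step = (high - low) / (points_per_dim - 1)
        axes.append([low + i * step for i in range(points_per_dim)])
    for combo in itertools.product(*axes):
        yield dict(zip(space.keys(), combo))


def random_search_sketch(space, n_samples, seed=0):
    """Yield n_samples parameter sets drawn uniformly from each range."""
    rng = random.Random(seed)
    for _ in range(n_samples):
        yield {name: rng.uniform(low, high) for name, (low, high) in space.items()}


def objective(params):
    """Toy objective, maximized at learning_rate=0.5 and subsample=1.0."""
    return -((params["learning_rate"] - 0.5) ** 2) - (params["subsample"] - 1.0) ** 2


space = {"learning_rate": (0.0, 1.0), "subsample": (0.5, 1.0)}

grid_candidates = list(grid_search_sketch(space))
best_grid = max(grid_candidates, key=objective)

random_candidates = list(random_search_sketch(space, n_samples=10))
best_random = max(random_candidates, key=objective)
```

The grid covers the space exhaustively but grows exponentially with the number of parameters; random sampling keeps the budget fixed at `n_samples` regardless of dimensionality.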
DataCheck
Base class for all data checks.
InvalidTargetDataCheck
Checks if the target data contains missing or invalid values.
HighlyNullDataCheck
Checks if there are any highly-null columns in the input.
IDColumnsDataCheck
Check if any of the features are likely to be ID columns.
TargetLeakageDataCheck
Check if any of the features are highly correlated with the target.
OutliersDataCheck
Checks if there are any outliers in input data by using IQR to determine score anomalies.
NoVarianceDataCheck
Check if the target or any of the features have no variance.
ClassImbalanceDataCheck
Checks if any target labels are imbalanced beyond a threshold.
DataChecks
A collection of data checks.
DefaultDataChecks
A collection of basic data checks that is used by AutoML by default.
DataCheckMessage
Base class for all DataCheckMessages.
DataCheckError
DataCheckMessage subclass for errors returned by data checks.
DataCheckWarning
DataCheckMessage subclass for warnings returned by data checks.
DataCheckMessageType
Enum for type of data check message: WARNING or ERROR.
DataCheckMessageCode
Enum for data check message code.
import_or_raise
Attempts to import the requested library by name.
convert_to_seconds
Converts a string describing a length of time to its length in seconds.
get_random_state
Generates a numpy.random.RandomState instance using seed.
get_random_seed
Given a numpy.random.RandomState object, generate an int representing a seed value for another random number generator.
pad_with_nans
Pad the beginning num_to_pad rows with nans.
drop_rows_with_nans
Drop rows that have any NaNs in both pd_data_1 and pd_data_2.
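As an illustration of what converting "a string describing a length of time" to seconds (convert_to_seconds, listed above) can look like, here is a minimal sketch. The accepted input form (`"<number> <unit>"`) and the unit spellings are assumptions for this example; the library's own function may accept a different set of formats.

```python
# Assumed unit spellings for this sketch, mapped to their length in seconds
_UNIT_SECONDS = {
    "s": 1, "sec": 1, "second": 1, "seconds": 1,
    "m": 60, "min": 60, "minute": 60, "minutes": 60,
    "h": 3600, "hr": 3600, "hour": 3600, "hours": 3600,
}


def convert_to_seconds_sketch(text):
    """Convert a string such as '10 minutes' to a number of seconds."""
    parts = text.strip().split()
    if len(parts) != 2:
        raise ValueError(f"Expected '<number> <unit>', got {text!r}")
    value, unit = parts
    unit = unit.lower()
    if unit not in _UNIT_SECONDS:
        raise ValueError(f"Unknown time unit: {unit!r}")
    # float() raises ValueError itself if the number part is malformed
    return float(value) * _UNIT_SECONDS[unit]


ten_minutes = convert_to_seconds_sketch("10 minutes")
two_hours = convert_to_seconds_sketch("2 h")
```

A helper like this is typically used to turn a human-readable time budget into a numeric limit.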