Automated Machine Learning (AutoML) Search

Background

Machine Learning

Machine learning (ML) is the process of constructing a mathematical model of a system based on a sample dataset collected from that system.

One of the main goals of training an ML model is to teach the model to separate the signal present in the data from the noise inherent in the system and in the data collection process. If this is done effectively, the model can then be used to make accurate predictions about the system when presented with new, similar data. Additionally, introspecting on an ML model can reveal key information about the system being modeled, such as which inputs and transformations of the inputs are most useful to the ML model for learning the signal in the data, and are therefore the most predictive.

There are a variety of ML problem types. Supervised learning describes the case where the collected data contains an output value to be modeled and a set of inputs with which to train the model. EvalML focuses on training supervised learning models.

EvalML supports three common supervised ML problem types. The first is regression, where the target value to model is a continuous numeric value. Next are binary and multiclass classification, where the target consists of discrete values or categories: exactly two for binary classification, and more than two for multiclass classification. The choice of which supervised ML problem type is most appropriate depends on domain expertise and on how the model will be evaluated and used.

AutoML and Search

AutoML is the process of automating the construction, training and evaluation of ML models. Given a dataset and some configuration, AutoML searches for the most effective and accurate ML model or models to fit the dataset. During the search, AutoML explores different combinations of model type, model parameters and model architecture.

An effective AutoML solution offers several advantages over constructing and tuning ML models by hand. AutoML can assist with many of the difficult aspects of ML, such as avoiding overfitting and underfitting, handling imbalanced data, detecting data leakage and other potential issues with the problem setup, and automatically applying best-practice data cleaning, feature engineering, feature selection and various modeling techniques. AutoML can also use search algorithms to sweep the hyperparameter search space efficiently, often reaching model performance that would be difficult to achieve by manual tuning.

AutoML in EvalML

EvalML supports all of the above and more.

In its simplest usage, the AutoML search interface requires only the input data, the target data and a problem_type specifying what kind of supervised ML problem to model.

Note: Graphing methods, like those in AutoMLSearch, require ipywidgets to be installed when running in Jupyter Notebook or JupyterLab.

Note: Graphing in JupyterLab also requires the jupyterlab-plotly extension, which needs npm to install.

To provide data to EvalML, it is recommended that you create a DataTable object using the Woodwork project.

EvalML also accepts and works well with pandas DataFrames, but a DataTable makes it easy to control how EvalML treats each feature: as a numeric feature, a categorical feature, a text feature or another type of feature. Woodwork’s DataTable also includes features like inferring when a categorical feature should be treated as a text feature. For this reason, EvalML raises a warning if you don’t provide Woodwork objects.

[1]:
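import evalml
import woodwork as ww

# Load a demo binary classification dataset as pandas objects
X, y = evalml.demos.load_breast_cancer()

# Wrap the data in Woodwork structures so EvalML knows how to treat each feature
X_dt = ww.DataTable(X)
y_dc = ww.DataColumn(y)

# Search over binary classification pipelines using the default settings
automl = evalml.automl.AutoMLSearch(problem_type='binary')
automl.search(X_dt, y_dc)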
Using default limit of max_iterations=5.

Generating pipelines to search over...
*****************************
* Beginning pipeline search *
*****************************

Optimizing for Log Loss Binary.
Lower score is better.

Searching up to 5 pipelines.
Allowed model families: catboost, decision_tree, lightgbm, linear_model, xgboost, extra_trees, random_forest

Batch 1: (1/5) Mode Baseline Binary Classification P... Elapsed:00:00
        Starting cross validation
        Finished cross validation - mean Log Loss Binary: 12.868
2/5) Decision Tree Classifier w/ Imputer      Elapsed:00:00
        Starting cross validation
        Finished cross validation - mean Log Loss Binary: 1.965
High coefficient of variation (cv >= 0.2) within cross validation scores. Decision Tree Classifier w/ Imputer may not perform as estimated on unseen data.
3/5) LightGBM Classifier w/ Imputer           Elapsed:00:00
        Starting cross validation
[LightGBM] [Warning] bagging_fraction is set=0.9, subsample=1.0 will be ignored. Current value: bagging_fraction=0.9
[LightGBM] [Warning] bagging_freq is set=0, subsample_freq=0 will be ignored. Current value: bagging_freq=0
[LightGBM] [Warning] bagging_fraction is set=0.9, subsample=1.0 will be ignored. Current value: bagging_fraction=0.9
[LightGBM] [Warning] bagging_freq is set=0, subsample_freq=0 will be ignored. Current value: bagging_freq=0
[LightGBM] [Warning] bagging_fraction is set=0.9, subsample=1.0 will be ignored. Current value: bagging_fraction=0.9
[LightGBM] [Warning] bagging_freq is set=0, subsample_freq=0 will be ignored. Current value: bagging_freq=0
        Finished cross validation - mean Log Loss Binary: 0.130
High coefficient of variation (cv >= 0.2) within cross validation scores. LightGBM Classifier w/ Imputer may not perform as estimated on unseen data.
4/5) Extra Trees Classifier w/ Imputer        Elapsed:00:00
        Starting cross validation
        Finished cross validation - mean Log Loss Binary: 0.146
5/5) Elastic Net Classifier w/ Imputer + S... Elapsed:00:02
        Starting cross validation
        Finished cross validation - mean Log Loss Binary: 0.504

Search finished after 00:02
Best pipeline: LightGBM Classifier w/ Imputer
Best pipeline Log Loss Binary: 0.130426

The AutoML search will log its progress, reporting each pipeline and parameter set evaluated during the search.

There are a number of mechanisms to control the AutoML search time. One way is to set the maximum number of candidate models to evaluate during the search using max_iterations. By default, AutoML evaluates a fixed number of pipeline and parameter combinations (max_iterations=5). The first pipeline evaluated is always a baseline model representing a trivial solution.
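The budget and baseline behavior described above can be pictured with a small, self-contained sketch. It is not EvalML's implementation: `run_search`, the candidate names and the `evaluate` callback are hypothetical, and the toy scores are rounded from the log above. It shows only two documented behaviors: the trivial baseline is evaluated first, and the search stops once `max_iterations` pipelines have been scored.

```python
from typing import Callable, Dict, List


def run_search(
    candidates: List[str],
    evaluate: Callable[[str], float],
    max_iterations: int = 5,
    baseline: str = "Mode Baseline",
) -> List[Dict[str, object]]:
    """Hypothetical search loop: baseline first, then candidates, capped at max_iterations."""
    results = []
    # The baseline always takes id 0, as in the search log.
    for pipeline_id, name in enumerate([baseline] + candidates):
        if pipeline_id >= max_iterations:
            break  # budget exhausted; remaining candidates are never evaluated
        results.append({"id": pipeline_id, "pipeline_name": name, "score": evaluate(name)})
    return results


# Toy scores standing in for cross-validated Log Loss Binary (lower is better).
toy_scores = {"Mode Baseline": 12.87, "Decision Tree": 1.97, "LightGBM": 0.13, "Extra Trees": 0.15}
results = run_search(["Decision Tree", "LightGBM", "Extra Trees"], toy_scores.__getitem__, max_iterations=3)
print([r["pipeline_name"] for r in results])  # ['Mode Baseline', 'Decision Tree', 'LightGBM']
```

With a budget of 3, the baseline consumes one slot, so only two real candidates are evaluated; this is why a very small max_iterations leaves little room for search.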

The AutoML interface supports a variety of other parameters. For a comprehensive list, please refer to the API reference.

Detecting Problem Type

EvalML includes a simple method, detect_problem_type, to help determine the problem type given the target data.

This function returns the predicted problem type as a ProblemTypes enum member: ProblemTypes.BINARY, ProblemTypes.MULTICLASS or ProblemTypes.REGRESSION. If the target data is invalid (for instance, when there is only 1 unique label), the function raises an error instead.

[2]:
import pandas as pd
from evalml.problem_types import detect_problem_type

y = pd.Series([0, 1, 1, 0, 1, 1])
detect_problem_type(y)
[2]:
<ProblemTypes.BINARY: 'binary'>
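To make the idea concrete, here is a simplified heuristic along the same lines. It is an illustrative assumption, not EvalML's actual rule: `guess_problem_type` is hypothetical, and treating a target with non-integer float values as regression is just one reasonable criterion.

```python
from typing import Sequence


def guess_problem_type(y: Sequence) -> str:
    """Hypothetical heuristic; EvalML's detect_problem_type may use different rules."""
    n_unique = len(set(y))
    if n_unique < 2:
        # Mirrors the documented behavior: a single label is not a usable target.
        raise ValueError("Target has fewer than 2 unique values")
    if n_unique == 2:
        return "binary"
    # Assumption: non-integer floats indicate a continuous target.
    if any(isinstance(v, float) and not v.is_integer() for v in y):
        return "regression"
    return "multiclass"


print(guess_problem_type([0, 1, 1, 0, 1, 1]))      # binary
print(guess_problem_type(["a", "b", "c", "a"]))    # multiclass
print(guess_problem_type([0.5, 1.25, 3.0, 2.75]))  # regression
```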

Objective parameter

AutoMLSearch takes an objective parameter that determines which objective to optimize for. By default, this parameter is set to auto, which allows AutoML to choose LogLossBinary for binary classification problems, LogLossMulticlass for multiclass classification problems, and R2 for regression problems.

Note that the objective is used only to rank pipelines and to help choose which pipelines to iterate over; it is not used to optimize each individual pipeline at fit time.
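Because the objective drives the ranking, its direction matters: the search log reports "Lower score is better" for Log Loss Binary, whereas a higher R2 is better. A minimal sketch of direction-aware ranking follows, assuming a hypothetical `rank` helper with a `greater_is_better` flag; the scores are the mean Log Loss Binary values from the search above, rounded to three decimals.

```python
from typing import Dict, List, Tuple


def rank(scores: Dict[str, float], greater_is_better: bool) -> List[Tuple[str, float]]:
    """Sort pipelines best-first according to the objective's direction (hypothetical helper)."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=greater_is_better)


log_loss = {
    "Mode Baseline": 12.868,
    "Decision Tree": 1.965,
    "LightGBM": 0.130,
    "Extra Trees": 0.146,
    "Elastic Net": 0.504,
}
ranking = rank(log_loss, greater_is_better=False)  # log loss: lower is better
print(ranking[0])  # ('LightGBM', 0.13)
```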

To get the default objective for each problem type, you can use the get_default_primary_search_objective function.

[3]:
from evalml.automl import get_default_primary_search_objective

binary_objective = get_default_primary_search_objective("binary")
multiclass_objective = get_default_primary_search_objective("multiclass")
regression_objective = get_default_primary_search_objective("regression")

print(binary_objective.name)
print(multiclass_objective.name)
print(regression_objective.name)
Log Loss Binary
Log Loss Multiclass
R2
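Log Loss Binary penalizes confident mistakes heavily, which helps explain why the mode baseline scores close to 12.9. The sketch below assumes probabilities are clipped to [1e-15, 1 - 1e-15] before taking logs (scikit-learn's `log_loss` historically used this `eps` default) and that the baseline assigns probability 0 to the minority class on every row. The fold composition, 119 majority and 71 minority rows out of 190, is inferred from the baseline's Accuracy Binary of 0.6263 and is an assumption. Under these assumptions the result matches the baseline's first-fold validation_score of 12.906595.

```python
import math
from typing import Sequence


def binary_log_loss(y_true: Sequence[int], p_positive: Sequence[float], eps: float = 1e-15) -> float:
    """Mean negative log-likelihood of binary labels, with probabilities clipped to [eps, 1 - eps]."""
    total = 0.0
    for label, p in zip(y_true, p_positive):
        p = min(max(p, eps), 1 - eps)
        total += -math.log(p) if label == 1 else -math.log(1 - p)
    return total / len(y_true)


# Mode baseline on one fold: 119 majority rows (label 0) and 71 minority rows (label 1),
# with every row given probability 0 of belonging to class 1.
y_fold = [0] * 119 + [1] * 71
baseline_probs = [0.0] * len(y_fold)
print(round(binary_log_loss(y_fold, baseline_probs), 4))  # 12.9066
```

Each misclassified row contributes -ln(1e-15), about 34.54, so the score is essentially 34.54 times the fraction of minority-class rows.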

Using custom pipelines

EvalML’s AutoML algorithm generates a set of pipelines to search over. To provide a custom set instead, set allowed_pipelines to a list of custom pipeline classes. Note: this will prevent AutoML from generating other pipelines to search over.

[4]:
from evalml.pipelines import MulticlassClassificationPipeline

class CustomMulticlassClassificationPipeline(MulticlassClassificationPipeline):
    component_graph = ['Simple Imputer', 'Random Forest Classifier']

automl_custom = evalml.automl.AutoMLSearch(problem_type='multiclass', allowed_pipelines=[CustomMulticlassClassificationPipeline])
Using default limit of max_iterations=5.

Stopping the search early

To stop the search early, hit Ctrl-C. This will bring up a prompt asking for confirmation. Responding with y will immediately stop the search. Responding with n will continue the search.

[Animation: Interrupting Search Demo]
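The confirm-on-interrupt pattern can be sketched in plain Python. This is not EvalML's code: `search_with_interrupt`, `ask_user` and the `confirm` callback are hypothetical, and the prompt is injected as a function so the loop can be exercised without a terminal. Skipping the interrupted pipeline when the user answers n is an assumption of this sketch.

```python
from typing import Callable, Iterable, List


def ask_user() -> bool:
    """Hypothetical default prompt: 'y' stops the search, anything else continues."""
    return input("Stop the search? (y/n) ").strip().lower() == "y"


def search_with_interrupt(
    pipelines: Iterable[str],
    evaluate: Callable[[str], float],
    confirm: Callable[[], bool] = ask_user,
) -> List[float]:
    scores = []
    for name in pipelines:
        try:
            scores.append(evaluate(name))
        except KeyboardInterrupt:
            # Ctrl-C raises KeyboardInterrupt; ask before abandoning the search.
            if confirm():
                break  # 'y': stop immediately, keeping the scores gathered so far
            # 'n': this sketch skips the interrupted pipeline and keeps searching
    return scores


def flaky_evaluate(name: str) -> float:
    if name == "LightGBM":
        raise KeyboardInterrupt  # simulate Ctrl-C while this pipeline runs
    return 0.5


print(search_with_interrupt(["Decision Tree", "LightGBM", "Extra Trees"], flaky_evaluate, confirm=lambda: True))  # [0.5]
```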

View Rankings

The rankings property returns a summary of all the pipelines built as a pandas DataFrame sorted by score. The score column contains the average score across all cross-validation folds, while the validation_score column is computed from the first cross-validation fold.

[5]:
automl.rankings
[5]:
   id  pipeline_name                                      score      validation_score  percent_better_than_baseline  high_variance_cv  parameters
0  2   LightGBM Classifier w/ Imputer                     0.130426   0.171504          98.986464                     True              {'Imputer': {'categorical_impute_strategy': 'm...
1  3   Extra Trees Classifier w/ Imputer                  0.146243   0.150788          98.863555                     False             {'Imputer': {'categorical_impute_strategy': 'm...
2  4   Elastic Net Classifier w/ Imputer + Standard S...  0.504486   0.505864          96.079663                     False             {'Imputer': {'categorical_impute_strategy': 'm...
3  1   Decision Tree Classifier w/ Imputer                1.965214   1.508027          84.728426                     True              {'Imputer': {'categorical_impute_strategy': 'm...
4  0   Mode Baseline Binary Classification Pipeline       12.868443  12.906595         0.000000                      False             {'Baseline Classifier': {'strategy': 'mode'}}
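The score, validation_score and percent_better_than_baseline columns can be reproduced from the per-fold scores in the raw results. The sketch below assumes the percentage is the improvement over the baseline divided by the absolute baseline score, signed by the objective's direction. That formula is consistent with the table (84.728426 for the decision tree) but is an assumption, not taken from EvalML's source; `summarize` and `percent_better` are hypothetical helpers. A baseline score of exactly 0, as for MCC Binary, makes the ratio undefined, which matches the nan entries in the raw results.

```python
from statistics import mean
from typing import List, Tuple


def summarize(fold_scores: List[float]) -> Tuple[float, float]:
    """Return (score, validation_score): the mean over folds and the first fold's score."""
    return mean(fold_scores), fold_scores[0]


def percent_better(score: float, baseline: float, greater_is_better: bool) -> float:
    """Relative improvement over the baseline, in percent (assumed formula)."""
    if baseline == 0:
        return float("nan")  # undefined when the baseline scores exactly 0
    diff = score - baseline if greater_is_better else baseline - score
    return 100 * diff / abs(baseline)


# Per-fold Log Loss Binary for pipeline 1 (decision tree) and pipeline 0 (baseline).
tree_score, tree_validation = summarize([1.5080271465056452, 2.547421139007695, 1.8401931612853524])
baseline_score, _ = summarize([12.906595389677152, 12.906595389677152, 12.792139405522477])
print(round(tree_score, 6), round(tree_validation, 6))                                # 1.965214 1.508027
print(round(percent_better(tree_score, baseline_score, greater_is_better=False), 4))  # 84.7284
```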

Describe Pipeline

Each pipeline is given an id. We can get more information about any particular pipeline using that id. Here, we will get more information about the pipeline with id = 1.

[6]:
automl.describe_pipeline(1)
***************************************
* Decision Tree Classifier w/ Imputer *
***************************************

Problem Type: binary
Model Family: Decision Tree

Pipeline Steps
==============
1. Imputer
         * categorical_impute_strategy : most_frequent
         * numeric_impute_strategy : mean
         * categorical_fill_value : None
         * numeric_fill_value : None
2. Decision Tree Classifier
         * criterion : gini
         * max_features : auto
         * max_depth : 6
         * min_samples_split : 2
         * min_weight_fraction_leaf : 0.0

Training
========
Training for binary problems.
Total training time (including CV): 0.2 seconds

Cross Validation
----------------
             Log Loss Binary  MCC Binary   AUC  Precision    F1  Balanced Accuracy Binary  Accuracy Binary # Training # Testing
0                      1.508       0.854 0.936      0.903 0.909                     0.928            0.932    379.000   190.000
1                      2.547       0.843 0.910      0.901 0.901                     0.921            0.926    379.000   190.000
2                      1.840       0.886 0.903      0.941 0.928                     0.940            0.947    380.000   189.000
mean                   1.965       0.861 0.916      0.915 0.913                     0.930            0.935          -         -
std                    0.531       0.023 0.018      0.023 0.013                     0.010            0.011          -         -
coef of var            0.270       0.026 0.019      0.025 0.015                     0.010            0.012          -         -
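The std and coef of var rows above, and the high_variance_cv flag, can be reproduced from the three fold scores. The sketch assumes the sample standard deviation (n - 1 denominator), which matches the 0.531 shown above, and uses the cv >= 0.2 threshold quoted in the search log; `cv_check` is a hypothetical helper.

```python
from statistics import mean, stdev
from typing import List, Tuple


def cv_check(fold_scores: List[float], threshold: float = 0.2) -> Tuple[float, bool]:
    """Coefficient of variation (sample std / mean) and whether it reaches the threshold."""
    cv = stdev(fold_scores) / mean(fold_scores)
    return cv, cv >= threshold


# Log Loss Binary per fold for pipeline 1 (decision tree) and pipeline 3 (extra trees).
tree_cv, tree_flag = cv_check([1.5080271465056452, 2.547421139007695, 1.8401931612853524])
extra_cv, extra_flag = cv_check([0.1507883046281384, 0.14112945598043936, 0.14681069635182795])
print(round(tree_cv, 3), tree_flag)    # 0.27 True
print(round(extra_cv, 3), extra_flag)  # 0.033 False
```

These flags agree with the high_variance_cv column in the rankings: the decision tree is flagged, the extra trees pipeline is not.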

Get Pipeline

We can also retrieve any pipeline object by its id:

[7]:
pipeline = automl.get_pipeline(1)
print(pipeline.name)
print(pipeline.parameters)
Decision Tree Classifier w/ Imputer
{'Imputer': {'categorical_impute_strategy': 'most_frequent', 'numeric_impute_strategy': 'mean', 'categorical_fill_value': None, 'numeric_fill_value': None}, 'Decision Tree Classifier': {'criterion': 'gini', 'max_features': 'auto', 'max_depth': 6, 'min_samples_split': 2, 'min_weight_fraction_leaf': 0.0}}

Get best pipeline

To retrieve the best pipeline directly, use the best_pipeline property:

[8]:
best_pipeline = automl.best_pipeline
print(best_pipeline.name)
print(best_pipeline.parameters)
LightGBM Classifier w/ Imputer
{'Imputer': {'categorical_impute_strategy': 'most_frequent', 'numeric_impute_strategy': 'mean', 'categorical_fill_value': None, 'numeric_fill_value': None}, 'LightGBM Classifier': {'boosting_type': 'gbdt', 'learning_rate': 0.1, 'n_estimators': 100, 'max_depth': 0, 'num_leaves': 31, 'min_child_samples': 20, 'n_jobs': -1, 'bagging_freq': 0, 'bagging_fraction': 0.9}}

Access raw results

AutoMLSearch records detailed information under its results attribute, including each pipeline's cross-validation scores and parameters.

[9]:
automl.results
[9]:
{'pipeline_results': {0: {'id': 0,
   'pipeline_name': 'Mode Baseline Binary Classification Pipeline',
   'pipeline_class': evalml.pipelines.classification.baseline_binary.ModeBaselineBinaryPipeline,
   'pipeline_summary': 'Baseline Classifier',
   'parameters': {'Baseline Classifier': {'strategy': 'mode'}},
   'score': 12.868443394958925,
   'high_variance_cv': False,
   'training_time': 0.0713205337524414,
   'cv_data': [{'all_objective_scores': OrderedDict([('Log Loss Binary',
                   12.906595389677152),
                  ('MCC Binary', 0.0),
                  ('AUC', 0.5),
                  ('Precision', 0.0),
                  ('F1', 0.0),
                  ('Balanced Accuracy Binary', 0.5),
                  ('Accuracy Binary', 0.6263157894736842),
                  ('# Training', 379),
                  ('# Testing', 190)]),
     'score': 12.906595389677152,
     'binary_classification_threshold': 0.5},
    {'all_objective_scores': OrderedDict([('Log Loss Binary',
                   12.906595389677152),
                  ('MCC Binary', 0.0),
                  ('AUC', 0.5),
                  ('Precision', 0.0),
                  ('F1', 0.0),
                  ('Balanced Accuracy Binary', 0.5),
                  ('Accuracy Binary', 0.6263157894736842),
                  ('# Training', 379),
                  ('# Testing', 190)]),
     'score': 12.906595389677152,
     'binary_classification_threshold': 0.5},
    {'all_objective_scores': OrderedDict([('Log Loss Binary',
                   12.792139405522477),
                  ('MCC Binary', 0.0),
                  ('AUC', 0.5),
                  ('Precision', 0.0),
                  ('F1', 0.0),
                  ('Balanced Accuracy Binary', 0.5),
                  ('Accuracy Binary', 0.6296296296296297),
                  ('# Training', 380),
                  ('# Testing', 189)]),
     'score': 12.792139405522477,
     'binary_classification_threshold': 0.5}],
   'percent_better_than_baseline_all_objectives': {'Log Loss Binary': 0,
    'MCC Binary': nan,
    'AUC': 0,
    'Precision': nan,
    'F1': nan,
    'Balanced Accuracy Binary': 0,
    'Accuracy Binary': 0},
   'percent_better_than_baseline': 0,
   'validation_score': 12.906595389677152},
  1: {'id': 1,
   'pipeline_name': 'Decision Tree Classifier w/ Imputer',
   'pipeline_class': evalml.pipelines.utils.make_pipeline.<locals>.GeneratedPipeline,
   'pipeline_summary': 'Decision Tree Classifier w/ Imputer',
   'parameters': {'Imputer': {'categorical_impute_strategy': 'most_frequent',
     'numeric_impute_strategy': 'mean',
     'categorical_fill_value': None,
     'numeric_fill_value': None},
    'Decision Tree Classifier': {'criterion': 'gini',
     'max_features': 'auto',
     'max_depth': 6,
     'min_samples_split': 2,
     'min_weight_fraction_leaf': 0.0}},
   'score': 1.9652138155995642,
   'high_variance_cv': True,
   'training_time': 0.23694443702697754,
   'cv_data': [{'all_objective_scores': OrderedDict([('Log Loss Binary',
                   1.5080271465056452),
                  ('MCC Binary', 0.8542965880445006),
                  ('AUC', 0.9362646467037519),
                  ('Precision', 0.9027777777777778),
                  ('F1', 0.9090909090909091),
                  ('Balanced Accuracy Binary', 0.928334714167357),
                  ('Accuracy Binary', 0.9315789473684211),
                  ('# Training', 379),
                  ('# Testing', 190)]),
     'score': 1.5080271465056452,
     'binary_classification_threshold': 0.5},
    {'all_objective_scores': OrderedDict([('Log Loss Binary',
                   2.547421139007695),
                  ('MCC Binary', 0.8425849212924607),
                  ('AUC', 0.9101077050538526),
                  ('Precision', 0.9014084507042254),
                  ('F1', 0.9014084507042254),
                  ('Balanced Accuracy Binary', 0.9212924606462303),
                  ('Accuracy Binary', 0.9263157894736842),
                  ('# Training', 379),
                  ('# Testing', 190)]),
     'score': 2.547421139007695,
     'binary_classification_threshold': 0.5},
    {'all_objective_scores': OrderedDict([('Log Loss Binary',
                   1.8401931612853524),
                  ('MCC Binary', 0.8861141678760591),
                  ('AUC', 0.9030012004801921),
                  ('Precision', 0.9411764705882353),
                  ('F1', 0.9275362318840579),
                  ('Balanced Accuracy Binary', 0.9403361344537815),
                  ('Accuracy Binary', 0.9470899470899471),
                  ('# Training', 380),
                  ('# Testing', 189)]),
     'score': 1.8401931612853524,
     'binary_classification_threshold': 0.5}],
   'percent_better_than_baseline_all_objectives': {'Log Loss Binary': 84.72842631169038,
    'MCC Binary': nan,
    'AUC': 83.29157014918644,
    'Precision': nan,
    'F1': nan,
    'Balanced Accuracy Binary': 85.99755395115791,
    'Accuracy Binary': 49.022073618179675},
   'percent_better_than_baseline': 84.72842631169038,
   'validation_score': 1.5080271465056452},
  2: {'id': 2,
   'pipeline_name': 'LightGBM Classifier w/ Imputer',
   'pipeline_class': evalml.pipelines.utils.make_pipeline.<locals>.GeneratedPipeline,
   'pipeline_summary': 'LightGBM Classifier w/ Imputer',
   'parameters': {'Imputer': {'categorical_impute_strategy': 'most_frequent',
     'numeric_impute_strategy': 'mean',
     'categorical_fill_value': None,
     'numeric_fill_value': None},
    'LightGBM Classifier': {'boosting_type': 'gbdt',
     'learning_rate': 0.1,
     'n_estimators': 100,
     'max_depth': 0,
     'num_leaves': 31,
     'min_child_samples': 20,
     'n_jobs': -1,
     'bagging_freq': 0,
     'bagging_fraction': 0.9}},
   'score': 0.13042625711564854,
   'high_variance_cv': True,
   'training_time': 0.4964878559112549,
   'cv_data': [{'all_objective_scores': OrderedDict([('Log Loss Binary',
                   0.17150430072453213),
                  ('MCC Binary', 0.8871869342405617),
                  ('AUC', 0.9849686353414605),
                  ('Precision', 0.9552238805970149),
                  ('F1', 0.927536231884058),
                  ('Balanced Accuracy Binary', 0.938099183335306),
                  ('Accuracy Binary', 0.9473684210526315),
                  ('# Training', 379),
                  ('# Testing', 190)]),
     'score': 0.17150430072453213,
     'binary_classification_threshold': 0.5},
    {'all_objective_scores': OrderedDict([('Log Loss Binary',
                   0.13426632483302797),
                  ('MCC Binary', 0.932536394839626),
                  ('AUC', 0.9924251390697124),
                  ('Precision', 0.9577464788732394),
                  ('F1', 0.9577464788732394),
                  ('Balanced Accuracy Binary', 0.966268197419813),
                  ('Accuracy Binary', 0.968421052631579),
                  ('# Training', 379),
                  ('# Testing', 190)]),
     'score': 0.13426632483302797,
     'binary_classification_threshold': 0.5},
    {'all_objective_scores': OrderedDict([('Log Loss Binary',
                   0.08550814578938552),
                  ('MCC Binary', 0.9431710402960837),
                  ('AUC', 0.9953181272509004),
                  ('Precision', 0.9710144927536232),
                  ('F1', 0.9640287769784173),
                  ('Balanced Accuracy Binary', 0.9701680672268908),
                  ('Accuracy Binary', 0.9735449735449735),
                  ('# Training', 380),
                  ('# Testing', 189)]),
     'score': 0.08550814578938552,
     'binary_classification_threshold': 0.5}],
   'percent_better_than_baseline_all_objectives': {'Log Loss Binary': 98.98646438335548,
    'MCC Binary': nan,
    'AUC': 98.18079344413822,
    'Precision': nan,
    'F1': nan,
    'Balanced Accuracy Binary': 91.63569653213398,
    'Accuracy Binary': 53.50337318025803},
   'percent_better_than_baseline': 98.98646438335548,
   'validation_score': 0.17150430072453213},
  3: {'id': 3,
   'pipeline_name': 'Extra Trees Classifier w/ Imputer',
   'pipeline_class': evalml.pipelines.utils.make_pipeline.<locals>.GeneratedPipeline,
   'pipeline_summary': 'Extra Trees Classifier w/ Imputer',
   'parameters': {'Imputer': {'categorical_impute_strategy': 'most_frequent',
     'numeric_impute_strategy': 'mean',
     'categorical_fill_value': None,
     'numeric_fill_value': None},
    'Extra Trees Classifier': {'n_estimators': 100,
     'max_features': 'auto',
     'max_depth': 6,
     'min_samples_split': 2,
     'min_weight_fraction_leaf': 0.0,
     'n_jobs': -1}},
   'score': 0.14624281898680191,
   'high_variance_cv': False,
   'training_time': 1.178330659866333,
   'cv_data': [{'all_objective_scores': OrderedDict([('Log Loss Binary',
                   0.1507883046281384),
                  ('MCC Binary', 0.9097672817424011),
                  ('AUC', 0.9904130666351048),
                  ('Precision', 0.9565217391304348),
                  ('F1', 0.9428571428571428),
                  ('Balanced Accuracy Binary', 0.9521836903775595),
                  ('Accuracy Binary', 0.9578947368421052),
                  ('# Training', 379),
                  ('# Testing', 190)]),
     'score': 0.1507883046281384,
     'binary_classification_threshold': 0.5},
    {'all_objective_scores': OrderedDict([('Log Loss Binary',
                   0.14112945598043936),
                  ('MCC Binary', 0.8757606542930872),
                  ('AUC', 0.9931352822819268),
                  ('Precision', 0.9411764705882353),
                  ('F1', 0.920863309352518),
                  ('Balanced Accuracy Binary', 0.9338975026630371),
                  ('Accuracy Binary', 0.9421052631578948),
                  ('# Training', 379),
                  ('# Testing', 190)]),
     'score': 0.14112945598043936,
     'binary_classification_threshold': 0.5},
    {'all_objective_scores': OrderedDict([('Log Loss Binary',
                   0.14681069635182795),
                  ('MCC Binary', 0.9110001426138854),
                  ('AUC', 0.9895558223289316),
                  ('Precision', 1.0),
                  ('F1', 0.9393939393939393),
                  ('Balanced Accuracy Binary', 0.9428571428571428),
                  ('Accuracy Binary', 0.9576719576719577),
                  ('# Training', 380),
                  ('# Testing', 189)]),
     'score': 0.14681069635182795,
     'binary_classification_threshold': 0.5}],
   'percent_better_than_baseline_all_objectives': {'Log Loss Binary': 98.86355470900163,
    'MCC Binary': nan,
    'AUC': 98.20694474973088,
    'Precision': nan,
    'F1': nan,
    'Balanced Accuracy Binary': 88.5958890598493,
    'Accuracy Binary': 51.821221446325005},
   'percent_better_than_baseline': 98.86355470900163,
   'validation_score': 0.1507883046281384},
  4: {'id': 4,
   'pipeline_name': 'Elastic Net Classifier w/ Imputer + Standard Scaler',
   'pipeline_class': evalml.pipelines.utils.make_pipeline.<locals>.GeneratedPipeline,
   'pipeline_summary': 'Elastic Net Classifier w/ Imputer + Standard Scaler',
   'parameters': {'Imputer': {'categorical_impute_strategy': 'most_frequent',
     'numeric_impute_strategy': 'mean',
     'categorical_fill_value': None,
     'numeric_fill_value': None},
    'Elastic Net Classifier': {'alpha': 0.5,
     'l1_ratio': 0.5,
     'n_jobs': -1,
     'max_iter': 1000,
     'penalty': 'elasticnet',
     'loss': 'log'}},
   'score': 0.5044863636894502,
   'high_variance_cv': False,
   'training_time': 0.16756534576416016,
   'cv_data': [{'all_objective_scores': OrderedDict([('Log Loss Binary',
                   0.5058642653768668),
                  ('MCC Binary', 0.5494527701280131),
                  ('AUC', 0.9874541365842111),
                  ('Precision', 1.0),
                  ('F1', 0.5800000000000001),
                  ('Balanced Accuracy Binary', 0.704225352112676),
                  ('Accuracy Binary', 0.7789473684210526),
                  ('# Training', 379),
                  ('# Testing', 190)]),
     'score': 0.5058642653768668,
     'binary_classification_threshold': 0.5},
    {'all_objective_scores': OrderedDict([('Log Loss Binary',
                   0.5093270551457649),
                  ('MCC Binary', 0.6151945500355619),
                  ('AUC', 0.9827198485027814),
                  ('Precision', 1.0),
                  ('F1', 0.660377358490566),
                  ('Balanced Accuracy Binary', 0.7464788732394366),
                  ('Accuracy Binary', 0.8105263157894737),
                  ('# Training', 379),
                  ('# Testing', 190)]),
     'score': 0.5093270551457649,
     'binary_classification_threshold': 0.5},
    {'all_objective_scores': OrderedDict([('Log Loss Binary',
                   0.49826777054571886),
                  ('MCC Binary', 0.6324555320336759),
                  ('AUC', 0.9876350540216087),
                  ('Precision', 1.0),
                  ('F1', 0.6792452830188679),
                  ('Balanced Accuracy Binary', 0.7571428571428571),
                  ('Accuracy Binary', 0.8201058201058201),
                  ('# Training', 380),
                  ('# Testing', 189)]),
     'score': 0.49826777054571886,
     'binary_classification_threshold': 0.5}],
   'percent_better_than_baseline_all_objectives': {'Log Loss Binary': 96.07966287603148,
    'MCC Binary': nan,
    'AUC': 97.18726927390674,
    'Precision': nan,
    'F1': nan,
    'Balanced Accuracy Binary': 47.18980549966465,
    'Accuracy Binary': 28.015149721860578},
   'percent_better_than_baseline': 96.07966287603148,
   'validation_score': 0.5058642653768668}},
 'search_order': [0, 1, 2, 3, 4]}
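The nested results dictionary can be queried with ordinary Python. The sketch below uses a trimmed copy of the structure shown above (only pipeline_name and score are kept, with values copied from the output) and assumes Log Loss Binary, where lower is better.

```python
# Trimmed copy of automl.results from the output above.
results = {
    "pipeline_results": {
        0: {"pipeline_name": "Mode Baseline Binary Classification Pipeline", "score": 12.868443394958925},
        1: {"pipeline_name": "Decision Tree Classifier w/ Imputer", "score": 1.9652138155995642},
        2: {"pipeline_name": "LightGBM Classifier w/ Imputer", "score": 0.13042625711564854},
        3: {"pipeline_name": "Extra Trees Classifier w/ Imputer", "score": 0.14624281898680191},
        4: {"pipeline_name": "Elastic Net Classifier w/ Imputer + Standard Scaler", "score": 0.5044863636894502},
    },
    "search_order": [0, 1, 2, 3, 4],
}

# Visit pipelines in the order they were evaluated.
for pipeline_id in results["search_order"]:
    entry = results["pipeline_results"][pipeline_id]
    print(pipeline_id, entry["pipeline_name"], round(entry["score"], 3))

# Lower Log Loss Binary is better, so the best pipeline has the minimum score.
best_id = min(results["pipeline_results"], key=lambda i: results["pipeline_results"][i]["score"])
print("best:", best_id, results["pipeline_results"][best_id]["pipeline_name"])  # best: 2 LightGBM Classifier w/ Imputer
```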