Exploring search results

After finishing a pipeline search, we can inspect the results. First, let’s run a search over 5 pipelines so that we have results to explore.

[1]:
import evalml
from evalml import AutoMLSearch

X, y = evalml.demos.load_breast_cancer()

automl = AutoMLSearch(problem_type='binary',
                      objective="f1",
                      max_pipelines=5)

automl.search(X, y)
Generating pipelines to search over...
*****************************
* Beginning pipeline search *
*****************************

Optimizing for F1.
Greater score is better.

Searching up to 5 pipelines.
Allowed model families: xgboost, catboost, random_forest, linear_model

✔ Mode Baseline Binary Classification...     0%|          | Elapsed:00:00
✔ CatBoost Classifier w/ Simple Imput...    20%|██        | Elapsed:00:22
✔ Logistic Regression Classifier w/ S...    40%|████      | Elapsed:00:23
✔ Random Forest Classifier w/ Simple ...    60%|██████    | Elapsed:00:25
▹ XGBoost Classifier w/ Simple Imputer:     80%|████████  | Elapsed:00:25
[21:04:10] WARNING: ../src/learner.cc:1061: Starting in XGBoost 1.3.0, the default evaluation metric used with the objective 'binary:logistic' was changed from 'error' to 'logloss'. Explicitly set eval_metric if you'd like to restore the old behavior.
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-evalml/envs/v0.11.0/lib/python3.7/site-packages/xgboost/sklearn.py:888: UserWarning:

The use of label encoder in XGBClassifier is deprecated and will be removed in a future release. To remove this warning, do the following: 1) Pass option use_label_encoder=False when constructing XGBClassifier object; and 2) Encode your labels (y) as integers starting with 0, i.e. 0, 1, 2, ..., [num_class - 1].

✔ XGBoost Classifier w/ Simple Imputer:     80%|████████  | Elapsed:00:25
✔ Optimization finished                     80%|████████  | Elapsed:00:25

View Rankings

A summary of all the pipelines built can be returned as a pandas DataFrame. It is sorted by score, best first; EvalML determines from the objective function whether a higher or lower score is better.

[2]:
automl.rankings
[2]:
   id  pipeline_name                                      score     high_variance_cv  parameters
0  2   Logistic Regression Classifier w/ Simple Imput...  0.980447  False             {'Simple Imputer': {'impute_strategy': 'most_f...
1  1   CatBoost Classifier w/ Simple Imputer              0.976333  False             {'Simple Imputer': {'impute_strategy': 'most_f...
2  4   XGBoost Classifier w/ Simple Imputer               0.970577  False             {'Simple Imputer': {'impute_strategy': 'most_f...
3  3   Random Forest Classifier w/ Simple Imputer         0.966629  False             {'Simple Imputer': {'impute_strategy': 'most_f...
4  0   Mode Baseline Binary Classification Pipeline       0.771060  False             {'Baseline Classifier': {'strategy': 'random_w...
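
The ordering rule can be illustrated outside EvalML. The sketch below is not EvalML's implementation: it re-sorts the id and score pairs from the table above with a hypothetical `rank_pipelines` helper, where the `greater_is_better` flag is an assumed stand-in for the direction implied by the objective (true for F1, false for a loss such as log loss).

```python
def rank_pipelines(scores, greater_is_better=True):
    """Return pipeline ids ordered from best to worst score.

    Hypothetical helper for illustration; not part of EvalML.
    """
    return sorted(scores, key=scores.get, reverse=greater_is_better)


# Scores copied from the rankings table above (pipeline id -> mean F1).
f1_scores = {0: 0.771060, 1: 0.976333, 2: 0.980447, 3: 0.966629, 4: 0.970577}

ranked_ids = rank_pipelines(f1_scores, greater_is_better=True)
print(ranked_ids)  # [2, 1, 4, 3, 0], the same id order as the table
```

With `greater_is_better=False` the same scores would be listed in the opposite order, which is how a lower-is-better objective would need to be ranked.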

Describe Pipeline

Each pipeline is given an id. We can get more information about any particular pipeline using that id. Here, we will get more information about the pipeline with id = 1.

[3]:
automl.describe_pipeline(1)
*****************************************
* CatBoost Classifier w/ Simple Imputer *
*****************************************

Problem Type: Binary Classification
Model Family: CatBoost

Pipeline Steps
==============
1. Simple Imputer
         * impute_strategy : most_frequent
         * fill_value : None
2. CatBoost Classifier
         * n_estimators : 1000
         * eta : 0.03
         * max_depth : 6
         * bootstrap_type : None

Training
========
Training for Binary Classification problems.
Total training time (including CV): 22.9 seconds

Cross Validation
----------------
               F1  Accuracy Binary  Balanced Accuracy Binary  Precision   AUC  Log Loss Binary  MCC Binary # Training # Testing
0           0.967            0.958                     0.949      0.951 0.995            0.106       0.910      379.0     190.0
1           0.983            0.979                     0.975      0.975 0.994            0.082       0.955      379.0     190.0
2           0.979            0.974                     0.976      0.991 0.990            0.093       0.944      380.0     189.0
mean        0.976            0.970                     0.967      0.973 0.993            0.094       0.936          -         -
std         0.008            0.011                     0.015      0.020 0.003            0.012       0.024          -         -
coef of var 0.009            0.011                     0.016      0.021 0.003            0.128       0.025          -         -
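
The summary rows of the cross-validation table can be reproduced by hand. The sketch below uses only the Python standard library and the per-fold F1 scores reported for this pipeline in the raw results on this page. It assumes that the `std` row is the sample standard deviation (dividing by n - 1) and that `coef of var` is the standard deviation divided by the mean; both assumptions agree with the rounded values in the table, but neither is confirmed against EvalML's source here.

```python
import statistics

# Per-fold F1 scores for pipeline 1, copied from the raw results on this page.
fold_f1 = [0.9669421487603305, 0.9833333333333334, 0.9787234042553192]

mean_f1 = statistics.mean(fold_f1)
std_f1 = statistics.stdev(fold_f1)   # sample standard deviation (n - 1), an assumption
coef_of_var = std_f1 / mean_f1       # assumed definition: std / mean

print(f"mean={mean_f1:.3f} std={std_f1:.3f} coef of var={coef_of_var:.3f}")
# mean=0.976 std=0.008 coef of var=0.009
```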

Get Pipeline

We can also retrieve the pipeline object itself by its id:

[4]:
automl.get_pipeline(1)
[4]:
<evalml.pipelines.utils.make_pipeline.<locals>.GeneratedPipeline at 0x7f26597c0850>

Get Best Pipeline

If we specifically want the best pipeline, there is a convenient accessor:

[5]:
automl.best_pipeline
[5]:
<evalml.pipelines.utils.make_pipeline.<locals>.GeneratedPipeline at 0x7f26d8cfc390>

Feature Importance

We can get the importance associated with each feature of the resulting pipeline. The pipeline must be fitted first:

[6]:
pipeline = automl.get_pipeline(1)
pipeline.fit(X, y)
pipeline.feature_importance
[6]:
    feature                  importance
 0  worst texture             11.284484
 1  worst concave points       8.105820
 2  worst perimeter            7.989750
 3  worst radius               7.872959
 4  worst area                 7.851859
 5  mean concave points        7.098117
 6  mean texture               5.919744
 7  worst smoothness           4.831183
 8  worst concavity            4.665609
 9  area error                 4.204315
10  compactness error          2.616379
11  worst symmetry             2.088257
12  mean concavity             2.015011
13  radius error               1.948857
14  concave points error       1.875384
15  mean compactness           1.824212
16  perimeter error            1.705293
17  mean smoothness            1.697390
18  worst fractal dimension    1.629284
19  mean radius                1.585845
20  mean area                  1.479429
21  smoothness error           1.460370
22  fractal dimension error    1.370231
23  mean perimeter             1.306400
24  texture error              1.065599
25  mean fractal dimension     1.000343
26  worst compactness          0.967447
27  mean symmetry              0.960967
28  symmetry error             0.912462
29  concavity error            0.666999

We can also create a bar plot of the feature importances:

[7]:
pipeline.graph_feature_importance()

Precision-Recall Curve

For binary classification, you can view the precision-recall curve of a classifier:

[8]:
# get the predicted probabilities associated with the "true" label
y_pred_proba = pipeline.predict_proba(X)[1]
evalml.pipelines.graph_utils.graph_precision_recall_curve(y, y_pred_proba)
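
To make clear what such a curve plots, here is a minimal standard-library sketch that computes precision and recall at each candidate threshold. It is an illustration of the general technique, not EvalML's plotting code; the helper name `precision_recall_at` and the toy labels and probabilities are made up for this example.

```python
def precision_recall_at(y_true, y_scores, threshold):
    """Precision and recall when scores >= threshold are predicted positive.

    Hypothetical helper for illustration; not part of EvalML.
    """
    predicted = [score >= threshold for score in y_scores]
    tp = sum(1 for t, p in zip(y_true, predicted) if t == 1 and p)
    fp = sum(1 for t, p in zip(y_true, predicted) if t == 0 and p)
    fn = sum(1 for t, p in zip(y_true, predicted) if t == 1 and not p)
    precision = tp / (tp + fp) if (tp + fp) else 1.0  # convention when nothing is predicted positive
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Made-up labels and predicted probabilities, for illustration only.
y_true = [0, 0, 1, 1]
y_scores = [0.1, 0.4, 0.35, 0.8]

# One (precision, recall) point per distinct score, from loosest to strictest threshold.
curve = [precision_recall_at(y_true, y_scores, t) for t in sorted(set(y_scores))]
for threshold, (precision, recall) in zip(sorted(set(y_scores)), curve):
    print(f"threshold={threshold:.2f} precision={precision:.2f} recall={recall:.2f}")
```

Raising the threshold generally trades recall for precision, which is the trade-off the plotted curve makes visible.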

ROC Curve

For binary and multiclass classification, you can view the ROC curve of a classifier:

[9]:
# get the predicted probabilities associated with the "true" label
y_pred_proba = pipeline.predict_proba(X)[1]
evalml.pipelines.graph_utils.graph_roc_curve(y, y_pred_proba)
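
As a rough sketch of what a ROC curve contains, the standard-library code below computes the false positive rate and true positive rate at each distinct threshold and then estimates the area under the curve with the trapezoidal rule. The helpers `roc_points` and `trapezoid_auc` and the toy data are hypothetical and only illustrate the general technique for the binary case; they are not how EvalML computes its plot.

```python
def roc_points(y_true, y_scores):
    """Return (fpr, tpr) pairs, starting at (0, 0), from strictest to loosest threshold.

    Hypothetical helper for illustration; assumes binary labels 0 and 1.
    """
    positives = sum(y_true)
    negatives = len(y_true) - positives
    points = [(0.0, 0.0)]
    for threshold in sorted(set(y_scores), reverse=True):
        tp = sum(1 for t, s in zip(y_true, y_scores) if t == 1 and s >= threshold)
        fp = sum(1 for t, s in zip(y_true, y_scores) if t == 0 and s >= threshold)
        points.append((fp / negatives, tp / positives))
    return points


def trapezoid_auc(points):
    """Area under a piecewise-linear curve given as (x, y) points sorted by x."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Made-up labels and predicted probabilities, for illustration only.
y_true = [0, 0, 1, 1]
y_scores = [0.1, 0.4, 0.35, 0.8]

points = roc_points(y_true, y_scores)
auc = trapezoid_auc(points)
print(points)  # [(0.0, 0.0), (0.0, 0.5), (0.5, 0.5), (0.5, 1.0), (1.0, 1.0)]
print(auc)     # 0.75
```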

Confusion Matrix

For binary or multiclass classification, you can view a confusion matrix of the classifier’s predictions:

[10]:
y_pred = pipeline.predict(X)
evalml.pipelines.graph_utils.graph_confusion_matrix(y, y_pred)
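
The counts behind a confusion matrix are simple to compute directly. The following standard-library sketch tallies (actual, predicted) pairs into a matrix whose rows are actual labels and whose columns are predicted labels. The helper `confusion_counts` and the toy labels are hypothetical, and the row/column orientation is an assumption of this sketch rather than a statement about EvalML's plot.

```python
from collections import Counter


def confusion_counts(y_true, y_pred, labels):
    """Return a matrix of counts: rows are actual labels, columns are predicted labels.

    Hypothetical helper for illustration; not part of EvalML.
    """
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(actual, predicted)] for predicted in labels] for actual in labels]


# Made-up labels and predictions, for illustration only.
y_true = [0, 0, 1, 1, 1]
y_pred = [0, 1, 1, 1, 0]

matrix = confusion_counts(y_true, y_pred, labels=[0, 1])
print(matrix)  # [[1, 1], [1, 2]]
```

Because `labels` is passed explicitly, the same helper works unchanged for more than two classes.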

Access Raw Results

You can also access all the underlying data for every pipeline, including the per-fold cross-validation scores, through the results attribute:

[11]:
automl.results
[11]:
{'pipeline_results': {0: {'id': 0,
   'pipeline_name': 'Mode Baseline Binary Classification Pipeline',
   'pipeline_class': evalml.pipelines.classification.baseline_binary.ModeBaselineBinaryPipeline,
   'pipeline_summary': 'Baseline Classifier',
   'parameters': {'Baseline Classifier': {'strategy': 'random_weighted'}},
   'score': 0.7710601157203097,
   'high_variance_cv': False,
   'training_time': 0.03474116325378418,
   'cv_data': [{'all_objective_scores': OrderedDict([('F1',
                   0.7702265372168284),
                  ('Accuracy Binary', 0.6263157894736842),
                  ('Balanced Accuracy Binary', 0.5),
                  ('Precision', 0.6263157894736842),
                  ('AUC', 0.5),
                  ('Log Loss Binary', 0.6608932451679239),
                  ('MCC Binary', 0.0),
                  ('# Training', 379),
                  ('# Testing', 190)]),
     'score': 0.7702265372168284,
     'binary_classification_threshold': 0.5},
    {'all_objective_scores': OrderedDict([('F1', 0.7702265372168284),
                  ('Accuracy Binary', 0.6263157894736842),
                  ('Balanced Accuracy Binary', 0.5),
                  ('Precision', 0.6263157894736842),
                  ('AUC', 0.5),
                  ('Log Loss Binary', 0.6608932451679239),
                  ('MCC Binary', 0.0),
                  ('# Training', 379),
                  ('# Testing', 190)]),
     'score': 0.7702265372168284,
     'binary_classification_threshold': 0.5},
    {'all_objective_scores': OrderedDict([('F1', 0.7727272727272727),
                  ('Accuracy Binary', 0.6296296296296297),
                  ('Balanced Accuracy Binary', 0.5),
                  ('Precision', 0.6296296296296297),
                  ('AUC', 0.5),
                  ('Log Loss Binary', 0.6591759924082952),
                  ('MCC Binary', 0.0),
                  ('# Training', 380),
                  ('# Testing', 189)]),
     'score': 0.7727272727272727,
     'binary_classification_threshold': 0.5}]},
  1: {'id': 1,
   'pipeline_name': 'CatBoost Classifier w/ Simple Imputer',
   'pipeline_class': evalml.pipelines.utils.make_pipeline.<locals>.GeneratedPipeline,
   'pipeline_summary': 'CatBoost Classifier w/ Simple Imputer',
   'parameters': {'Simple Imputer': {'impute_strategy': 'most_frequent',
     'fill_value': None},
    'CatBoost Classifier': {'n_estimators': 1000,
     'eta': 0.03,
     'max_depth': 6,
     'bootstrap_type': None}},
   'score': 0.9763329621163277,
   'high_variance_cv': False,
   'training_time': 22.883776903152466,
   'cv_data': [{'all_objective_scores': OrderedDict([('F1',
                   0.9669421487603305),
                  ('Accuracy Binary', 0.9578947368421052),
                  ('Balanced Accuracy Binary', 0.9493431175287016),
                  ('Precision', 0.9512195121951219),
                  ('AUC', 0.9945555687063559),
                  ('Log Loss Binary', 0.10583268649418161),
                  ('MCC Binary', 0.909956827190137),
                  ('# Training', 379),
                  ('# Testing', 190)]),
     'score': 0.9669421487603305,
     'binary_classification_threshold': 0.5},
    {'all_objective_scores': OrderedDict([('F1', 0.9833333333333334),
                  ('Accuracy Binary', 0.9789473684210527),
                  ('Balanced Accuracy Binary', 0.9746715587643509),
                  ('Precision', 0.9752066115702479),
                  ('AUC', 0.9943188543022844),
                  ('Log Loss Binary', 0.08186397218927995),
                  ('MCC Binary', 0.955011564828661),
                  ('# Training', 379),
                  ('# Testing', 190)]),
     'score': 0.9833333333333334,
     'binary_classification_threshold': 0.5},
    {'all_objective_scores': OrderedDict([('F1', 0.9787234042553192),
                  ('Accuracy Binary', 0.9735449735449735),
                  ('Balanced Accuracy Binary', 0.9760504201680673),
                  ('Precision', 0.9913793103448276),
                  ('AUC', 0.9899159663865547),
                  ('Log Loss Binary', 0.09296190874649334),
                  ('MCC Binary', 0.9443109474170326),
                  ('# Training', 380),
                  ('# Testing', 189)]),
     'score': 0.9787234042553192,
     'binary_classification_threshold': 0.5}]},
  2: {'id': 2,
   'pipeline_name': 'Logistic Regression Classifier w/ Simple Imputer + Standard Scaler',
   'pipeline_class': evalml.pipelines.utils.make_pipeline.<locals>.GeneratedPipeline,
   'pipeline_summary': 'Logistic Regression Classifier w/ Simple Imputer + Standard Scaler',
   'parameters': {'Simple Imputer': {'impute_strategy': 'most_frequent',
     'fill_value': None},
    'Logistic Regression Classifier': {'penalty': 'l2',
     'C': 1.0,
     'n_jobs': -1}},
   'score': 0.9804468499596668,
   'high_variance_cv': False,
   'training_time': 1.0504083633422852,
   'cv_data': [{'all_objective_scores': OrderedDict([('F1',
                   0.9831932773109243),
                  ('Accuracy Binary', 0.9789473684210527),
                  ('Balanced Accuracy Binary', 0.9775121316132087),
                  ('Precision', 0.9831932773109243),
                  ('AUC', 0.9936087110900698),
                  ('Log Loss Binary', 0.09347817517438463),
                  ('MCC Binary', 0.9550242632264173),
                  ('# Training', 379),
                  ('# Testing', 190)]),
     'score': 0.9831932773109243,
     'binary_classification_threshold': 0.5},
    {'all_objective_scores': OrderedDict([('F1', 0.9794238683127572),
                  ('Accuracy Binary', 0.9736842105263158),
                  ('Balanced Accuracy Binary', 0.9647887323943662),
                  ('Precision', 0.9596774193548387),
                  ('AUC', 0.9975144987572493),
                  ('Log Loss Binary', 0.08320464479579018),
                  ('MCC Binary', 0.9445075449666159),
                  ('# Training', 379),
                  ('# Testing', 190)]),
     'score': 0.9794238683127572,
     'binary_classification_threshold': 0.5},
    {'all_objective_scores': OrderedDict([('F1', 0.9787234042553192),
                  ('Accuracy Binary', 0.9735449735449735),
                  ('Balanced Accuracy Binary', 0.9760504201680673),
                  ('Precision', 0.9913793103448276),
                  ('AUC', 0.9906362545018007),
                  ('Log Loss Binary', 0.09680859555948443),
                  ('MCC Binary', 0.9443109474170326),
                  ('# Training', 380),
                  ('# Testing', 189)]),
     'score': 0.9787234042553192,
     'binary_classification_threshold': 0.5}]},
  3: {'id': 3,
   'pipeline_name': 'Random Forest Classifier w/ Simple Imputer',
   'pipeline_class': evalml.pipelines.utils.make_pipeline.<locals>.GeneratedPipeline,
   'pipeline_summary': 'Random Forest Classifier w/ Simple Imputer',
   'parameters': {'Simple Imputer': {'impute_strategy': 'most_frequent',
     'fill_value': None},
    'Random Forest Classifier': {'n_estimators': 100,
     'max_depth': 6,
     'n_jobs': -1}},
   'score': 0.9666291581686476,
   'high_variance_cv': False,
   'training_time': 1.4155998229980469,
   'cv_data': [{'all_objective_scores': OrderedDict([('F1',
                   0.9543568464730291),
                  ('Accuracy Binary', 0.9421052631578948),
                  ('Balanced Accuracy Binary', 0.9338975026630371),
                  ('Precision', 0.9426229508196722),
                  ('AUC', 0.9893478518167831),
                  ('Log Loss Binary', 0.13984688783161608),
                  ('MCC Binary', 0.8757606542930872),
                  ('# Training', 379),
                  ('# Testing', 190)]),
     'score': 0.9543568464730291,
     'binary_classification_threshold': 0.5},
    {'all_objective_scores': OrderedDict([('F1', 0.9709543568464729),
                  ('Accuracy Binary', 0.9631578947368421),
                  ('Balanced Accuracy Binary', 0.9563853710498283),
                  ('Precision', 0.9590163934426229),
                  ('AUC', 0.989347851816783),
                  ('Log Loss Binary', 0.12010721015394274),
                  ('MCC Binary', 0.9211492315750531),
                  ('# Training', 379),
                  ('# Testing', 190)]),
     'score': 0.9709543568464729,
     'binary_classification_threshold': 0.5},
    {'all_objective_scores': OrderedDict([('F1', 0.9745762711864406),
                  ('Accuracy Binary', 0.9682539682539683),
                  ('Balanced Accuracy Binary', 0.9689075630252101),
                  ('Precision', 0.9829059829059829),
                  ('AUC', 0.9927971188475391),
                  ('Log Loss Binary', 0.10765634363120971),
                  ('MCC Binary', 0.9325680982740896),
                  ('# Training', 380),
                  ('# Testing', 189)]),
     'score': 0.9745762711864406,
     'binary_classification_threshold': 0.5}]},
  4: {'id': 4,
   'pipeline_name': 'XGBoost Classifier w/ Simple Imputer',
   'pipeline_class': evalml.pipelines.utils.make_pipeline.<locals>.GeneratedPipeline,
   'pipeline_summary': 'XGBoost Classifier w/ Simple Imputer',
   'parameters': {'Simple Imputer': {'impute_strategy': 'most_frequent',
     'fill_value': None},
    'XGBoost Classifier': {'eta': 0.1,
     'max_depth': 6,
     'min_child_weight': 1,
     'n_estimators': 100}},
   'score': 0.9705772481706093,
   'high_variance_cv': False,
   'training_time': 0.4369237422943115,
   'cv_data': [{'all_objective_scores': OrderedDict([('F1',
                   0.9666666666666667),
                  ('Accuracy Binary', 0.9578947368421052),
                  ('Balanced Accuracy Binary', 0.9521836903775595),
                  ('Precision', 0.9586776859504132),
                  ('AUC', 0.9915966386554622),
                  ('Log Loss Binary', 0.11449876085695762),
                  ('MCC Binary', 0.9097672817424011),
                  ('# Training', 379),
                  ('# Testing', 190)]),
     'score': 0.9666666666666667,
     'binary_classification_threshold': 0.5},
    {'all_objective_scores': OrderedDict([('F1', 0.979253112033195),
                  ('Accuracy Binary', 0.9736842105263158),
                  ('Balanced Accuracy Binary', 0.9676293052432241),
                  ('Precision', 0.9672131147540983),
                  ('AUC', 0.9959758551307847),
                  ('Log Loss Binary', 0.07421583775339011),
                  ('MCC Binary', 0.943843520216036),
                  ('# Training', 379),
                  ('# Testing', 190)]),
     'score': 0.979253112033195,
     'binary_classification_threshold': 0.5},
    {'all_objective_scores': OrderedDict([('F1', 0.9658119658119659),
                  ('Accuracy Binary', 0.9576719576719577),
                  ('Balanced Accuracy Binary', 0.9605042016806722),
                  ('Precision', 0.9826086956521739),
                  ('AUC', 0.9885954381752701),
                  ('Log Loss Binary', 0.11418110851220609),
                  ('MCC Binary', 0.9112159507396058),
                  ('# Training', 380),
                  ('# Testing', 189)]),
     'score': 0.9658119658119659,
     'binary_classification_threshold': 0.5}]}},
 'search_order': [0, 1, 2, 3, 4]}
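
Since the results are a plain nested dictionary, they can be traversed with ordinary Python. The sketch below works on a hand-trimmed copy of the structure printed above (two pipelines, only the fields it needs), so it runs without EvalML; with a real search you would use `automl.results` instead. It also checks an observation rather than a documented guarantee: in this output, each pipeline's top-level `score` matches the mean of its per-fold scores.

```python
import statistics

# Trimmed copy of the structure shown above; values are copied from this page.
results = {
    "pipeline_results": {
        1: {
            "id": 1,
            "pipeline_name": "CatBoost Classifier w/ Simple Imputer",
            "score": 0.9763329621163277,
            "cv_data": [
                {"score": 0.9669421487603305},
                {"score": 0.9833333333333334},
                {"score": 0.9787234042553192},
            ],
        },
        2: {
            "id": 2,
            "pipeline_name": "Logistic Regression Classifier w/ Simple Imputer + Standard Scaler",
            "score": 0.9804468499596668,
            "cv_data": [
                {"score": 0.9831932773109243},
                {"score": 0.9794238683127572},
                {"score": 0.9787234042553192},
            ],
        },
    },
    "search_order": [1, 2],  # trimmed to match the two entries kept here
}

pipeline_results = results["pipeline_results"]

# Collect the per-fold objective scores for each pipeline id.
fold_scores = {
    pid: [fold["score"] for fold in entry["cv_data"]]
    for pid, entry in pipeline_results.items()
}

# Compare each reported score with the mean of its folds.
for pid in results["search_order"]:
    reported = pipeline_results[pid]["score"]
    print(pid, reported, statistics.mean(fold_scores[pid]))

# F1 is a greater-is-better objective, so the best pipeline has the highest score.
best_id = max(pipeline_results, key=lambda pid: pipeline_results[pid]["score"])
print(pipeline_results[best_id]["pipeline_name"])
```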