Exploring search results
After a pipeline search finishes, we can inspect the results. First, let's run a search of up to 5 pipelines to explore.
[1]:
import evalml
from evalml import AutoMLSearch
X, y = evalml.demos.load_breast_cancer()
automl = AutoMLSearch(problem_type='binary',
                      objective="f1",
                      max_pipelines=5)
automl.search(X, y)
Generating pipelines to search over...
*****************************
* Beginning pipeline search *
*****************************
Optimizing for F1.
Greater score is better.
Searching up to 5 pipelines.
Allowed model families: xgboost, catboost, random_forest, linear_model
✔ Mode Baseline Binary Classification... 0%| | Elapsed:00:00
✔ CatBoost Classifier w/ Simple Imput... 20%|██ | Elapsed:00:22
✔ Logistic Regression Classifier w/ S... 40%|████ | Elapsed:00:23
✔ Random Forest Classifier w/ Simple ... 60%|██████ | Elapsed:00:25
[21:04:10] WARNING: ../src/learner.cc:1061: Starting in XGBoost 1.3.0, the default evaluation metric used with the objective 'binary:logistic' was changed from 'error' to 'logloss'. Explicitly set eval_metric if you'd like to restore the old behavior.
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-evalml/envs/v0.11.0/lib/python3.7/site-packages/xgboost/sklearn.py:888: UserWarning:
The use of label encoder in XGBClassifier is deprecated and will be removed in a future release. To remove this warning, do the following: 1) Pass option use_label_encoder=False when constructing XGBClassifier object; and 2) Encode your labels (y) as integers starting with 0, i.e. 0, 1, 2, ..., [num_class - 1].
✔ XGBoost Classifier w/ Simple Imputer: 80%|████████ | Elapsed:00:25
✔ Optimization finished 80%|████████ | Elapsed:00:25
View Rankings
A summary of all the pipelines built is available as a pandas DataFrame, sorted by score. EvalML determines from the objective function whether a higher or lower score is better.
[2]:
automl.rankings
[2]:
| | id | pipeline_name | score | high_variance_cv | parameters |
|---|---|---|---|---|---|
| 0 | 2 | Logistic Regression Classifier w/ Simple Imput... | 0.980447 | False | {'Simple Imputer': {'impute_strategy': 'most_f... |
| 1 | 1 | CatBoost Classifier w/ Simple Imputer | 0.976333 | False | {'Simple Imputer': {'impute_strategy': 'most_f... |
| 2 | 4 | XGBoost Classifier w/ Simple Imputer | 0.970577 | False | {'Simple Imputer': {'impute_strategy': 'most_f... |
| 3 | 3 | Random Forest Classifier w/ Simple Imputer | 0.966629 | False | {'Simple Imputer': {'impute_strategy': 'most_f... |
| 4 | 0 | Mode Baseline Binary Classification Pipeline | 0.771060 | False | {'Baseline Classifier': {'strategy': 'random_w... |
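The ordering rule described above can be sketched in plain Python. This is only an illustration, not EvalML's implementation: the `rank_pipelines` helper and its `greater_is_better` flag are hypothetical names, and the scores are the rounded values copied from the rankings table.

```python
# Hypothetical helper, not part of EvalML: order pipeline ids from best
# to worst, depending on whether the objective treats larger scores as better.
def rank_pipelines(scores, greater_is_better=True):
    """Return pipeline ids sorted from best score to worst score."""
    return sorted(scores, key=scores.get, reverse=greater_is_better)


# Mean cross-validation F1 per pipeline id, copied from the rankings table.
f1_scores = {0: 0.771060, 1: 0.976333, 2: 0.980447, 3: 0.966629, 4: 0.970577}

# F1 is maximised, so the highest score comes first.
ranked_ids = rank_pipelines(f1_scores, greater_is_better=True)
print(ranked_ids)
```

For an objective that is minimised, such as log loss, the same helper would be called with `greater_is_better=False`, which reverses the order.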
Describe Pipeline
Each pipeline is given an id. We can get more information about any particular pipeline using that id. Here, we will get more information about the pipeline with id = 1.
[3]:
automl.describe_pipeline(1)
*****************************************
* CatBoost Classifier w/ Simple Imputer *
*****************************************
Problem Type: Binary Classification
Model Family: CatBoost
Pipeline Steps
==============
1. Simple Imputer
* impute_strategy : most_frequent
* fill_value : None
2. CatBoost Classifier
* n_estimators : 1000
* eta : 0.03
* max_depth : 6
* bootstrap_type : None
Training
========
Training for Binary Classification problems.
Total training time (including CV): 22.9 seconds
Cross Validation
----------------
                F1  Accuracy Binary  Balanced Accuracy Binary  Precision    AUC  Log Loss Binary  MCC Binary  # Training  # Testing
0            0.967            0.958                     0.949      0.951  0.995            0.106       0.910       379.0      190.0
1            0.983            0.979                     0.975      0.975  0.994            0.082       0.955       379.0      190.0
2            0.979            0.974                     0.976      0.991  0.990            0.093       0.944       380.0      189.0
mean         0.976            0.970                     0.967      0.973  0.993            0.094       0.936           -          -
std          0.008            0.011                     0.015      0.020  0.003            0.012       0.024           -          -
coef of var  0.009            0.011                     0.016      0.021  0.003            0.128       0.025           -          -
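To make the `mean`, `std` and `coef of var` rows concrete, here is a minimal sketch that recomputes them for the F1 column with the standard library. The fold values are the unrounded F1 scores for pipeline 1 as they appear in the raw results on this page. The use of the sample standard deviation (n - 1 denominator) is an assumption, chosen because it reproduces the printed numbers.

```python
import statistics

# Per-fold F1 scores for the CatBoost pipeline (id 1), unrounded.
fold_f1 = [0.9669421487603305, 0.9833333333333334, 0.9787234042553192]

mean_f1 = statistics.mean(fold_f1)
# Sample standard deviation (n - 1 denominator); assumed, see the note above.
std_f1 = statistics.stdev(fold_f1)
# Coefficient of variation: spread relative to the mean.
coef_of_var = std_f1 / mean_f1

print(f"mean={mean_f1:.3f} std={std_f1:.3f} coef of var={coef_of_var:.3f}")
```

A small coefficient of variation, as here, indicates that the score is stable across the cross-validation folds.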
Get Pipeline
We can also retrieve any pipeline object by its id:
[4]:
automl.get_pipeline(1)
[4]:
<evalml.pipelines.utils.make_pipeline.<locals>.GeneratedPipeline at 0x7f26597c0850>
Get best pipeline
If we specifically want the best pipeline, there is a convenient accessor:
[5]:
automl.best_pipeline
[5]:
<evalml.pipelines.utils.make_pipeline.<locals>.GeneratedPipeline at 0x7f26d8cfc390>
Feature Importance
We can get the importance associated with each feature of the resulting pipeline:
[6]:
pipeline = automl.get_pipeline(1)
pipeline.fit(X, y)
pipeline.feature_importance
[6]:
| | feature | importance |
|---|---|---|
| 0 | worst texture | 11.284484 |
| 1 | worst concave points | 8.105820 |
| 2 | worst perimeter | 7.989750 |
| 3 | worst radius | 7.872959 |
| 4 | worst area | 7.851859 |
| 5 | mean concave points | 7.098117 |
| 6 | mean texture | 5.919744 |
| 7 | worst smoothness | 4.831183 |
| 8 | worst concavity | 4.665609 |
| 9 | area error | 4.204315 |
| 10 | compactness error | 2.616379 |
| 11 | worst symmetry | 2.088257 |
| 12 | mean concavity | 2.015011 |
| 13 | radius error | 1.948857 |
| 14 | concave points error | 1.875384 |
| 15 | mean compactness | 1.824212 |
| 16 | perimeter error | 1.705293 |
| 17 | mean smoothness | 1.697390 |
| 18 | worst fractal dimension | 1.629284 |
| 19 | mean radius | 1.585845 |
| 20 | mean area | 1.479429 |
| 21 | smoothness error | 1.460370 |
| 22 | fractal dimension error | 1.370231 |
| 23 | mean perimeter | 1.306400 |
| 24 | texture error | 1.065599 |
| 25 | mean fractal dimension | 1.000343 |
| 26 | worst compactness | 0.967447 |
| 27 | mean symmetry | 0.960967 |
| 28 | symmetry error | 0.912462 |
| 29 | concavity error | 0.666999 |
We can also create a bar plot of the feature importances:
[7]:
pipeline.graph_feature_importance()
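When only the strongest features matter, the importance table can be reduced to its top entries. The sketch below works on a handful of rows copied from the table above, deliberately listed out of order. `top_features` is a hypothetical helper written for this illustration, not an EvalML function.

```python
import heapq

# A few (feature, importance) rows copied from the table, in arbitrary order.
importances = [
    ("mean radius", 1.585845),
    ("worst texture", 11.284484),
    ("concavity error", 0.666999),
    ("worst perimeter", 7.989750),
    ("worst concave points", 8.105820),
]


def top_features(rows, k):
    """Return the k rows with the largest importance, largest first."""
    return heapq.nlargest(k, rows, key=lambda row: row[1])


top3 = top_features(importances, 3)
for name, value in top3:
    print(f"{name}: {value:.2f}")
```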
Precision-Recall Curve
For binary classification, you can view the precision-recall curve of a classifier:
[8]:
# get the predicted probabilities associated with the "true" label
y_pred_proba = pipeline.predict_proba(X)[1]
evalml.pipelines.graph_utils.graph_precision_recall_curve(y, y_pred_proba)
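Each point on a precision-recall curve comes from thresholding the predicted probabilities. The following sketch computes one such point in plain Python. The labels and probabilities are made-up toy values, not output from the pipeline above, and `precision_recall_at` is a hypothetical helper rather than part of EvalML.

```python
def precision_recall_at(y_true, y_proba, threshold):
    """Precision and recall when predicting positive for proba >= threshold."""
    pairs = list(zip(y_true, y_proba))
    tp = sum(1 for t, p in pairs if p >= threshold and t == 1)
    fp = sum(1 for t, p in pairs if p >= threshold and t == 0)
    fn = sum(1 for t, p in pairs if p < threshold and t == 1)
    # Convention chosen for this sketch: precision is 1.0 when nothing
    # is predicted positive.
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Toy data for illustration only.
y_true_toy = [0, 0, 1, 1, 1, 0]
y_proba_toy = [0.1, 0.6, 0.4, 0.8, 0.9, 0.2]

for threshold in (0.3, 0.5, 0.7):
    precision, recall = precision_recall_at(y_true_toy, y_proba_toy, threshold)
    print(f"threshold={threshold}: precision={precision:.2f} recall={recall:.2f}")
```

Raising the threshold in this toy example trades recall for precision, which is the trade-off the curve visualises.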
ROC Curve
For binary and multiclass classification, you can view the ROC curve of a classifier:
[9]:
# get the predicted probabilities associated with the "true" label
y_pred_proba = pipeline.predict_proba(X)[1]
evalml.pipelines.graph_utils.graph_roc_curve(y, y_pred_proba)
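A ROC curve plots the true positive rate against the false positive rate as the decision threshold varies. As a rough sketch of how a single point is obtained, the code below uses made-up toy labels and probabilities. `roc_point` is a hypothetical helper, not an EvalML function, and it assumes both classes are present in the labels.

```python
def roc_point(y_true, y_proba, threshold):
    """Return (false positive rate, true positive rate) at a threshold."""
    positives = sum(1 for t in y_true if t == 1)
    negatives = len(y_true) - positives
    pairs = list(zip(y_true, y_proba))
    tp = sum(1 for t, p in pairs if p >= threshold and t == 1)
    fp = sum(1 for t, p in pairs if p >= threshold and t == 0)
    return fp / negatives, tp / positives


# Toy data for illustration only.
y_true_toy = [0, 0, 1, 1, 1, 0]
y_proba_toy = [0.1, 0.6, 0.4, 0.8, 0.9, 0.2]

# Sweep from a threshold above every score down to zero.
curve = [roc_point(y_true_toy, y_proba_toy, t) for t in (1.1, 0.7, 0.5, 0.3, 0.0)]
print(curve)
```

The sweep starts at (0.0, 0.0), where nothing is predicted positive, and ends at (1.0, 1.0), where everything is.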
Confusion Matrix
For binary or multiclass classification, you can view a confusion matrix of the classifier's predictions:
[10]:
y_pred = pipeline.predict(X)
evalml.pipelines.graph_utils.graph_confusion_matrix(y, y_pred)
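A confusion matrix counts how often each actual label was paired with each predicted label. The sketch below builds one by hand for made-up toy labels. `confusion_counts` is a hypothetical helper, and the layout (rows are actual labels, columns are predicted labels) is an assumption of this example.

```python
from collections import Counter


def confusion_counts(y_true, y_pred):
    """Count occurrences of each (actual, predicted) label pair."""
    return Counter(zip(y_true, y_pred))


# Toy data for illustration only.
y_true_toy = [0, 0, 1, 1, 1, 0]
y_pred_toy = [0, 1, 0, 1, 1, 0]

counts = confusion_counts(y_true_toy, y_pred_toy)
labels = (0, 1)
# Rows: actual label; columns: predicted label (assumed layout).
matrix = [[counts[(actual, predicted)] for predicted in labels] for actual in labels]
print(matrix)
```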
Access raw results
You can also access all of the underlying data:
[11]:
automl.results
[11]:
{'pipeline_results': {0: {'id': 0,
'pipeline_name': 'Mode Baseline Binary Classification Pipeline',
'pipeline_class': evalml.pipelines.classification.baseline_binary.ModeBaselineBinaryPipeline,
'pipeline_summary': 'Baseline Classifier',
'parameters': {'Baseline Classifier': {'strategy': 'random_weighted'}},
'score': 0.7710601157203097,
'high_variance_cv': False,
'training_time': 0.03474116325378418,
'cv_data': [{'all_objective_scores': OrderedDict([('F1',
0.7702265372168284),
('Accuracy Binary', 0.6263157894736842),
('Balanced Accuracy Binary', 0.5),
('Precision', 0.6263157894736842),
('AUC', 0.5),
('Log Loss Binary', 0.6608932451679239),
('MCC Binary', 0.0),
('# Training', 379),
('# Testing', 190)]),
'score': 0.7702265372168284,
'binary_classification_threshold': 0.5},
{'all_objective_scores': OrderedDict([('F1', 0.7702265372168284),
('Accuracy Binary', 0.6263157894736842),
('Balanced Accuracy Binary', 0.5),
('Precision', 0.6263157894736842),
('AUC', 0.5),
('Log Loss Binary', 0.6608932451679239),
('MCC Binary', 0.0),
('# Training', 379),
('# Testing', 190)]),
'score': 0.7702265372168284,
'binary_classification_threshold': 0.5},
{'all_objective_scores': OrderedDict([('F1', 0.7727272727272727),
('Accuracy Binary', 0.6296296296296297),
('Balanced Accuracy Binary', 0.5),
('Precision', 0.6296296296296297),
('AUC', 0.5),
('Log Loss Binary', 0.6591759924082952),
('MCC Binary', 0.0),
('# Training', 380),
('# Testing', 189)]),
'score': 0.7727272727272727,
'binary_classification_threshold': 0.5}]},
1: {'id': 1,
'pipeline_name': 'CatBoost Classifier w/ Simple Imputer',
'pipeline_class': evalml.pipelines.utils.make_pipeline.<locals>.GeneratedPipeline,
'pipeline_summary': 'CatBoost Classifier w/ Simple Imputer',
'parameters': {'Simple Imputer': {'impute_strategy': 'most_frequent',
'fill_value': None},
'CatBoost Classifier': {'n_estimators': 1000,
'eta': 0.03,
'max_depth': 6,
'bootstrap_type': None}},
'score': 0.9763329621163277,
'high_variance_cv': False,
'training_time': 22.883776903152466,
'cv_data': [{'all_objective_scores': OrderedDict([('F1',
0.9669421487603305),
('Accuracy Binary', 0.9578947368421052),
('Balanced Accuracy Binary', 0.9493431175287016),
('Precision', 0.9512195121951219),
('AUC', 0.9945555687063559),
('Log Loss Binary', 0.10583268649418161),
('MCC Binary', 0.909956827190137),
('# Training', 379),
('# Testing', 190)]),
'score': 0.9669421487603305,
'binary_classification_threshold': 0.5},
{'all_objective_scores': OrderedDict([('F1', 0.9833333333333334),
('Accuracy Binary', 0.9789473684210527),
('Balanced Accuracy Binary', 0.9746715587643509),
('Precision', 0.9752066115702479),
('AUC', 0.9943188543022844),
('Log Loss Binary', 0.08186397218927995),
('MCC Binary', 0.955011564828661),
('# Training', 379),
('# Testing', 190)]),
'score': 0.9833333333333334,
'binary_classification_threshold': 0.5},
{'all_objective_scores': OrderedDict([('F1', 0.9787234042553192),
('Accuracy Binary', 0.9735449735449735),
('Balanced Accuracy Binary', 0.9760504201680673),
('Precision', 0.9913793103448276),
('AUC', 0.9899159663865547),
('Log Loss Binary', 0.09296190874649334),
('MCC Binary', 0.9443109474170326),
('# Training', 380),
('# Testing', 189)]),
'score': 0.9787234042553192,
'binary_classification_threshold': 0.5}]},
2: {'id': 2,
'pipeline_name': 'Logistic Regression Classifier w/ Simple Imputer + Standard Scaler',
'pipeline_class': evalml.pipelines.utils.make_pipeline.<locals>.GeneratedPipeline,
'pipeline_summary': 'Logistic Regression Classifier w/ Simple Imputer + Standard Scaler',
'parameters': {'Simple Imputer': {'impute_strategy': 'most_frequent',
'fill_value': None},
'Logistic Regression Classifier': {'penalty': 'l2',
'C': 1.0,
'n_jobs': -1}},
'score': 0.9804468499596668,
'high_variance_cv': False,
'training_time': 1.0504083633422852,
'cv_data': [{'all_objective_scores': OrderedDict([('F1',
0.9831932773109243),
('Accuracy Binary', 0.9789473684210527),
('Balanced Accuracy Binary', 0.9775121316132087),
('Precision', 0.9831932773109243),
('AUC', 0.9936087110900698),
('Log Loss Binary', 0.09347817517438463),
('MCC Binary', 0.9550242632264173),
('# Training', 379),
('# Testing', 190)]),
'score': 0.9831932773109243,
'binary_classification_threshold': 0.5},
{'all_objective_scores': OrderedDict([('F1', 0.9794238683127572),
('Accuracy Binary', 0.9736842105263158),
('Balanced Accuracy Binary', 0.9647887323943662),
('Precision', 0.9596774193548387),
('AUC', 0.9975144987572493),
('Log Loss Binary', 0.08320464479579018),
('MCC Binary', 0.9445075449666159),
('# Training', 379),
('# Testing', 190)]),
'score': 0.9794238683127572,
'binary_classification_threshold': 0.5},
{'all_objective_scores': OrderedDict([('F1', 0.9787234042553192),
('Accuracy Binary', 0.9735449735449735),
('Balanced Accuracy Binary', 0.9760504201680673),
('Precision', 0.9913793103448276),
('AUC', 0.9906362545018007),
('Log Loss Binary', 0.09680859555948443),
('MCC Binary', 0.9443109474170326),
('# Training', 380),
('# Testing', 189)]),
'score': 0.9787234042553192,
'binary_classification_threshold': 0.5}]},
3: {'id': 3,
'pipeline_name': 'Random Forest Classifier w/ Simple Imputer',
'pipeline_class': evalml.pipelines.utils.make_pipeline.<locals>.GeneratedPipeline,
'pipeline_summary': 'Random Forest Classifier w/ Simple Imputer',
'parameters': {'Simple Imputer': {'impute_strategy': 'most_frequent',
'fill_value': None},
'Random Forest Classifier': {'n_estimators': 100,
'max_depth': 6,
'n_jobs': -1}},
'score': 0.9666291581686476,
'high_variance_cv': False,
'training_time': 1.4155998229980469,
'cv_data': [{'all_objective_scores': OrderedDict([('F1',
0.9543568464730291),
('Accuracy Binary', 0.9421052631578948),
('Balanced Accuracy Binary', 0.9338975026630371),
('Precision', 0.9426229508196722),
('AUC', 0.9893478518167831),
('Log Loss Binary', 0.13984688783161608),
('MCC Binary', 0.8757606542930872),
('# Training', 379),
('# Testing', 190)]),
'score': 0.9543568464730291,
'binary_classification_threshold': 0.5},
{'all_objective_scores': OrderedDict([('F1', 0.9709543568464729),
('Accuracy Binary', 0.9631578947368421),
('Balanced Accuracy Binary', 0.9563853710498283),
('Precision', 0.9590163934426229),
('AUC', 0.989347851816783),
('Log Loss Binary', 0.12010721015394274),
('MCC Binary', 0.9211492315750531),
('# Training', 379),
('# Testing', 190)]),
'score': 0.9709543568464729,
'binary_classification_threshold': 0.5},
{'all_objective_scores': OrderedDict([('F1', 0.9745762711864406),
('Accuracy Binary', 0.9682539682539683),
('Balanced Accuracy Binary', 0.9689075630252101),
('Precision', 0.9829059829059829),
('AUC', 0.9927971188475391),
('Log Loss Binary', 0.10765634363120971),
('MCC Binary', 0.9325680982740896),
('# Training', 380),
('# Testing', 189)]),
'score': 0.9745762711864406,
'binary_classification_threshold': 0.5}]},
4: {'id': 4,
'pipeline_name': 'XGBoost Classifier w/ Simple Imputer',
'pipeline_class': evalml.pipelines.utils.make_pipeline.<locals>.GeneratedPipeline,
'pipeline_summary': 'XGBoost Classifier w/ Simple Imputer',
'parameters': {'Simple Imputer': {'impute_strategy': 'most_frequent',
'fill_value': None},
'XGBoost Classifier': {'eta': 0.1,
'max_depth': 6,
'min_child_weight': 1,
'n_estimators': 100}},
'score': 0.9705772481706093,
'high_variance_cv': False,
'training_time': 0.4369237422943115,
'cv_data': [{'all_objective_scores': OrderedDict([('F1',
0.9666666666666667),
('Accuracy Binary', 0.9578947368421052),
('Balanced Accuracy Binary', 0.9521836903775595),
('Precision', 0.9586776859504132),
('AUC', 0.9915966386554622),
('Log Loss Binary', 0.11449876085695762),
('MCC Binary', 0.9097672817424011),
('# Training', 379),
('# Testing', 190)]),
'score': 0.9666666666666667,
'binary_classification_threshold': 0.5},
{'all_objective_scores': OrderedDict([('F1', 0.979253112033195),
('Accuracy Binary', 0.9736842105263158),
('Balanced Accuracy Binary', 0.9676293052432241),
('Precision', 0.9672131147540983),
('AUC', 0.9959758551307847),
('Log Loss Binary', 0.07421583775339011),
('MCC Binary', 0.943843520216036),
('# Training', 379),
('# Testing', 190)]),
'score': 0.979253112033195,
'binary_classification_threshold': 0.5},
{'all_objective_scores': OrderedDict([('F1', 0.9658119658119659),
('Accuracy Binary', 0.9576719576719577),
('Balanced Accuracy Binary', 0.9605042016806722),
('Precision', 0.9826086956521739),
('AUC', 0.9885954381752701),
('Log Loss Binary', 0.11418110851220609),
('MCC Binary', 0.9112159507396058),
('# Training', 380),
('# Testing', 189)]),
'score': 0.9658119658119659,
'binary_classification_threshold': 0.5}]}},
'search_order': [0, 1, 2, 3, 4]}
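The nesting shown above can be walked with ordinary dictionary access. The following sketch uses a trimmed excerpt with the same key structure, keeping only the keys it needs and using plain dicts in place of `OrderedDict`. It pulls out the per-fold values of one objective. `fold_scores` is a hypothetical helper, not part of EvalML.

```python
# Trimmed excerpt of the structure above; values copied from pipeline id 1.
results_excerpt = {
    "pipeline_results": {
        1: {
            "id": 1,
            "pipeline_name": "CatBoost Classifier w/ Simple Imputer",
            "cv_data": [
                {"all_objective_scores": {"F1": 0.9669421487603305,
                                          "AUC": 0.9945555687063559}},
                {"all_objective_scores": {"F1": 0.9833333333333334,
                                          "AUC": 0.9943188543022844}},
                {"all_objective_scores": {"F1": 0.9787234042553192,
                                          "AUC": 0.9899159663865547}},
            ],
        },
    },
}


def fold_scores(results, pipeline_id, objective_name):
    """Return one objective's score for each cross-validation fold."""
    cv_data = results["pipeline_results"][pipeline_id]["cv_data"]
    return [fold["all_objective_scores"][objective_name] for fold in cv_data]


auc_per_fold = fold_scores(results_excerpt, 1, "AUC")
print(auc_per_fold)
```

With a real search, the same function could be given `automl.results` directly, assuming the structure matches the output shown above.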