Regression Example
[1]:
from evalml import AutoRegressionSearch
from evalml.demos import load_diabetes
# Load the diabetes demo dataset as features X and target y
X, y = load_diabetes()
# Search up to 5 regression pipelines, ranking them by R2
automl = AutoRegressionSearch(objective="R2", max_pipelines=5)
automl.search(X, y)
*****************************
* Beginning pipeline search *
*****************************
Optimizing for R2. Greater score is better.
Searching up to 5 pipelines.
Possible model types: random_forest, catboost, linear_model
✔ Random Forest Regressor w/ One Hot ...    20%|██        | Elapsed:00:09
✔ Random Forest Regressor w/ One Hot ...    40%|████      | Elapsed:00:16
✔ Linear Regressor w/ One Hot Encoder...    60%|██████    | Elapsed:00:16
✔ Random Forest Regressor w/ One Hot ...    80%|████████  | Elapsed:00:25
✔ CatBoost Regressor w/ Simple Imputer:    100%|██████████| Elapsed:00:26
✔ Optimization finished                    100%|██████████| Elapsed:00:26
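The log reports that the search optimizes for R2, where a greater score is better. For reference, here is a minimal pure-Python sketch of what that score measures, the coefficient of determination R2 = 1 - SS_res / SS_tot. The helper name `r2_score` is a hypothetical illustration, not evalml's own implementation.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot (hypothetical helper)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal length")
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: error left over by the model's predictions
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean of y_true
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when y_true is constant")
    return 1.0 - ss_res / ss_tot

y = [3.0, 5.0, 7.0, 9.0]
print(r2_score(y, y))                     # 1.0: perfect predictions
print(r2_score(y, [6.0, 6.0, 6.0, 6.0]))  # 0.0: always predicting the mean
```

R2 has no lower bound: predictions that are worse than always guessing the mean produce a negative score.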
[2]:
automl.rankings
[2]:
| | id | pipeline_class_name | score | high_variance_cv | parameters |
|---|---|---|---|---|---|
| 0 | 2 | LinearRegressionPipeline | 0.488703 | False | {'impute_strategy': 'mean', 'normalize': True,... |
| 1 | 0 | RFRegressionPipeline | 0.422322 | False | {'n_estimators': 569, 'max_depth': 22, 'impute... |
| 2 | 3 | RFRegressionPipeline | 0.383134 | False | {'n_estimators': 609, 'max_depth': 7, 'impute_... |
| 3 | 1 | RFRegressionPipeline | 0.381204 | False | {'n_estimators': 369, 'max_depth': 10, 'impute... |
| 4 | 4 | CatBoostRegressionPipeline | 0.250449 | False | {'impute_strategy': 'most_frequent', 'n_estima... |
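`automl.rankings` lists the searched pipelines from best to worst. Because R2 is greater-is-better, this amounts to sorting by `score` in descending order, which is why `automl.best_pipeline` in the next cell is the LinearRegressionPipeline. The sketch below reproduces that ordering from the (id, class name, score) values in the table, using plain Python tuples. It illustrates the ordering only and is not how evalml builds the table internally.

```python
# (id, pipeline_class_name, score) copied from the rankings table
results = [
    (0, "RFRegressionPipeline", 0.422322),
    (1, "RFRegressionPipeline", 0.381204),
    (2, "LinearRegressionPipeline", 0.488703),
    (3, "RFRegressionPipeline", 0.383134),
    (4, "CatBoostRegressionPipeline", 0.250449),
]

# R2 is greater-is-better, so the best pipeline comes first in a descending sort
ranked = sorted(results, key=lambda row: row[2], reverse=True)
for position, (pipeline_id, name, score) in enumerate(ranked):
    print(position, pipeline_id, name, score)
```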
[3]:
automl.best_pipeline
[3]:
<evalml.pipelines.regression.linear_regression.LinearRegressionPipeline at 0x7fc084451cc0>
[4]:
automl.get_pipeline(0)
[4]:
<evalml.pipelines.regression.random_forest.RFRegressionPipeline at 0x7fc08558bf60>
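Note that `automl.get_pipeline(0)` returned an RFRegressionPipeline even though the first row of the rankings is the LinearRegressionPipeline. The output shows that the argument matches the `id` column (the order in which pipelines were searched), not the row position in the rankings; `describe_pipeline(0)` below confirms this, since it reports `n_estimators : 569` and `max_depth : 22`, the parameters of id 0. The hypothetical dictionary and helpers below mimic the two kinds of lookup using the ids and class names from the rankings table. They are illustrations, not evalml functions.

```python
# Hypothetical stand-ins: pipeline id -> class name, from the rankings table
pipelines_by_id = {
    0: "RFRegressionPipeline",
    1: "RFRegressionPipeline",
    2: "LinearRegressionPipeline",
    3: "RFRegressionPipeline",
    4: "CatBoostRegressionPipeline",
}
# Ids in rankings order (best score first), as listed in the table's id column
ranked_ids = [2, 0, 3, 1, 4]

def get_by_id(pipeline_id):
    """Look a pipeline up by its search id (the behavior seen with get_pipeline)."""
    return pipelines_by_id[pipeline_id]

def get_by_rank(position):
    """Look a pipeline up by its row position in the rankings instead."""
    return pipelines_by_id[ranked_ids[position]]

print(get_by_id(0))    # RFRegressionPipeline, matching automl.get_pipeline(0)
print(get_by_rank(0))  # LinearRegressionPipeline, matching automl.best_pipeline
```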
[5]:
automl.describe_pipeline(0)
************************************************************************************************
* Random Forest Regressor w/ One Hot Encoder + Simple Imputer + RF Regressor Select From Model *
************************************************************************************************
Problem Types: Regression
Model Type: Random Forest
Objective to Optimize: R2 (greater is better)
Number of features: 8
Pipeline Steps
==============
1. One Hot Encoder
2. Simple Imputer
         * impute_strategy : most_frequent
3. RF Regressor Select From Model
         * percent_features : 0.8593661614465293
         * threshold : -inf
4. Random Forest Regressor
         * n_estimators : 569
         * max_depth : 22
Training
========
Training for Regression problems.
Total training time (including CV): 10.0 seconds
Cross Validation
----------------
               R2    MAE      MSE  MedianAE  MaxError  ExpVariance # Training # Testing
0           0.427 46.033 3276.018    39.699   161.858        0.428    294.000   148.000
1           0.450 48.953 3487.566    44.344   160.513        0.451    295.000   147.000
2           0.390 47.401 3477.117    41.297   171.420        0.390    295.000   147.000
mean        0.422 47.462 3413.567    41.780   164.597        0.423          -         -
std         0.031  1.461  119.235     2.360     5.947        0.031          -         -
coef of var 0.072  0.031    0.035     0.056     0.036        0.073          -         -
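The `mean`, `std`, and `coef of var` rows summarize the three cross-validation folds. Recomputing them from the MAE column suggests that `std` is the sample standard deviation (n - 1 denominator) and `coef of var` is `std / mean`. That reading is inferred from the printed numbers, not taken from evalml's source. Because the fold values are printed rounded to three decimals, recomputing other columns (R2, for example) can differ from the table in the last digit.

```python
import statistics

# MAE for each of the three cross-validation folds, copied from the table above
mae_folds = [46.033, 48.953, 47.401]

mean = statistics.mean(mae_folds)
std = statistics.stdev(mae_folds)   # sample standard deviation (n - 1 denominator)
coef_of_var = std / mean            # spread relative to the mean

print(f"mean {mean:.3f}  std {std:.3f}  coef of var {coef_of_var:.3f}")
# mean 47.462  std 1.461  coef of var 0.031
```

The fold sizes in the last two columns also add up to the same total in every fold (294 + 148 and 295 + 147 are both 442), as expected when each row of the dataset lands in exactly one test fold.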