Changelog

Future Releases

Enhancements
Fixes
Changes
Documentation Changes
Testing Changes

v0.11.0 June 30, 2020

Enhancements
- Added multiclass support for ROC curve graphing #832
- Added preprocessing component to drop features whose percentage of NaN values exceeds a specified threshold #834
- Added data check to check for problematic target labels #814
- Added PerColumnImputer that allows imputation strategies per column #824
- Added transformer to drop specific columns #827
- Added support for categories, handle_error, and drop parameters in OneHotEncoder #830 #897
- Added preprocessing component to handle DateTime columns featurization #838
- Added ability to clone pipelines and components #842
- Define getter method for component parameters #847
- Added utility methods to calculate and graph permutation importances #860, #880
- Added new utility functions necessary for generating dynamic preprocessing pipelines #852
- Added kwargs to all components #863
- Updated AutoSearchBase to use dynamically generated preprocessing pipelines #870
- Added SelectColumns transformer #873
- Added ability to evaluate additional pipelines for automl search #874
- Added default_parameters class property to components and pipelines #879
- Added better support for disabling data checks in automl search #892
- Added ability to save and load AutoML objects to file #888
- Updated AutoSearchBase.get_pipelines to return an untrained pipeline instance #876
- Saved learned binary classification thresholds in automl results cv data dict #876
Fixes
- Fixed bug where SimpleImputer cannot handle dropped columns #846
- Fixed bug where PerColumnImputer cannot handle dropped columns #855
- Enforce requirement that builtin components save all inputted values in their parameters dict #847
- Don’t list base classes in all_components output #847
- Standardize all components to output pandas data structures, and accept either pandas or numpy #853
- Fixed rankings and full_rankings error when search has not been run #894
Changes
- Update all_pipelines and all_components to try initializing pipelines/components, and on failure exclude them #849
- Refactor handle_component to handle_component_class, standardize to ComponentBase subclass instead of instance #850
- Refactor “blacklist”/“whitelist” to “allow”/“exclude” lists #854
- Replaced AutoClassificationSearch and AutoRegressionSearch with AutoMLSearch #871
- Renamed feature_importances and permutation_importances methods to use singular names (feature_importance and permutation_importance) #883
- Updated automl default data splitter to train/validation split for large datasets #877
- Added open source license, update some repo metadata #887
Documentation Changes
- Fix some typos and update the EvalML logo #872
Testing Changes
- Update the changelog check job to expect the new branching pattern for the deps update bot #836
- Check that all components output pandas data structures, and can accept either pandas or numpy #853
- Replaced AutoClassificationSearch and AutoRegressionSearch with AutoMLSearch #871

Warning

Breaking Changes

Pipelines’ static component_graph field must contain either ComponentBase subclasses or str, instead of ComponentBase subclass instances #850
Rename handle_component to handle_component_class. Now standardizes to ComponentBase subclasses instead of ComponentBase subclass instances #850
Renamed automl’s cv argument to data_split #877
Pipelines’ and classifiers’ feature_importances is renamed feature_importance, graph_feature_importances is renamed graph_feature_importance #883
Passing data_checks=None to automl search will not perform any data checks as opposed to default checks. #892
Pipelines to search for in AutoML are now determined automatically, rather than using the statically-defined pipeline classes. #870
Updated AutoSearchBase.get_pipelines to return an untrained pipeline instance, instead of one which happened to be trained on the final cross-validation fold #876
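
The component_graph and handle_component_class changes above can be illustrated with a minimal, self-contained sketch. It does not import EvalML: ComponentBase, Imputer, RandomForest, _REGISTRY, and resolve_component_class are hypothetical stand-ins written for this example, and the real handle_component_class may behave differently.

```python
import inspect


class ComponentBase:
    """Hypothetical stand-in for a component base class (not EvalML's)."""
    name = None


class Imputer(ComponentBase):
    name = "Imputer"


class RandomForest(ComponentBase):
    name = "Random Forest"


# Hypothetical lookup table from display name to component class.
_REGISTRY = {cls.name: cls for cls in (Imputer, RandomForest)}


def resolve_component_class(component):
    """Return a ComponentBase subclass for a class or a str name; reject instances."""
    if inspect.isclass(component) and issubclass(component, ComponentBase):
        return component
    if isinstance(component, str):
        if component not in _REGISTRY:
            raise ValueError(f"Unknown component name: {component!r}")
        return _REGISTRY[component]
    raise TypeError(
        "component_graph entries must be ComponentBase subclasses or str, not instances"
    )


# A component graph may mix classes and string names, but not instances.
component_graph = [Imputer, "Random Forest"]
resolved = [resolve_component_class(c) for c in component_graph]
```

Under these assumptions, passing Imputer() (an instance) raises a TypeError, which mirrors the intent of the breaking change: the graph describes which component classes to build, and instantiation happens later.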

v0.10.0 May 29, 2020

Enhancements
- Added baseline models for classification and regression, add functionality to calculate baseline models before searching in AutoML #746
- Port over highly-null guardrail as a data check and define DefaultDataChecks and DisableDataChecks classes #745
- Update Tuner classes to work directly with pipeline parameters dicts instead of flat parameter lists #779
- Add Elastic Net as a pipeline option #812
- Added new Pipeline option ExtraTrees #790
- Added precision-recall curve metrics and plot for binary classification problems in evalml.pipeline.graph_utils #794
- Update the default automl algorithm to search in batches, starting with default parameters for each pipeline and iterating from there #793
- Added AutoMLAlgorithm class and IterativeAlgorithm impl, separated from AutoSearchBase #793
Fixes
- Update pipeline score to return nan score for any objective which throws an exception during scoring #787
- Fixed bug introduced in #787 where binary classification metrics requiring predicted probabilities error in scoring #798
- CatBoost and XGBoost classifiers and regressors can no longer have a learning rate of 0 #795
Changes
- Cleanup pipeline score code, and cleanup codecov #711
- Remove pass for abstract methods for codecov #730
- Added __str__ for AutoSearch object #675
- Add util methods to graph ROC and confusion matrix #720
- Refactor AutoBase to AutoSearchBase #758
- Updated AutoBase with data_checks parameter, removed previous detect_label_leakage parameter, and added functionality to run data checks before search in AutoML #765
- Updated our logger to use Python’s logging utils #763
- Refactor most of AutoSearchBase._do_iteration impl into AutoSearchBase._evaluate #762
- Port over all guardrails to use the new DataCheck API #789
- Expanded import_or_raise to catch all exceptions #759
- Adds RMSE, MSLE, RMSLE as standard metrics #788
- Don’t allow Recall to be used as an objective for AutoML #784
- Removed feature selection from pipelines #819
- Update default estimator parameters to make automl search faster and more accurate #793
Documentation Changes
- Add instructions to freeze master on release.md #726
- Update release instructions with more details #727 #733
- Add objective base classes to API reference #736
- Fix components API to match other modules #747
Testing Changes
- Delete codecov yml, use codecov.io’s default #732
- Added unit tests for fraud cost, lead scoring, and standard metric objectives #741
- Update codecov client #782
- Updated AutoBase __str__ test to include no parameters case #783
- Added unit tests for ExtraTrees pipeline #790
- If codecov fails to upload, fail build #810
- Updated Python version of dependency action #816
- Update the dependency update bot to use a suffix when creating branches #817

Warning

Breaking Changes

The detect_label_leakage parameter for AutoML classes has been removed and replaced by a data_checks parameter #765
Moved ROC and confusion matrix methods from evalml.pipeline.plot_utils to evalml.pipeline.graph_utils #720
Tuner classes require a pipeline hyperparameter range dict as an init arg instead of a space definition #779
Tuner.propose and Tuner.add work directly with pipeline parameters dicts instead of flat parameter lists #779
PipelineBase.hyperparameters and custom_hyperparameters use pipeline parameters dict format instead of being represented as a flat list #779
All guardrail functions previously under evalml.guardrails.utils will be removed and replaced by data checks #789
Recall disallowed as an objective for AutoML #784
AutoSearchBase parameter tuner has been renamed to tuner_class #793
AutoSearchBase parameter possible_pipelines and possible_model_families have been renamed to allowed_pipelines and allowed_model_families #793
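
The move from flat parameter lists to pipeline parameters dicts (#779) can be pictured with the sketch below. It assumes a nested layout of {component name: {parameter name: value}}; the helper names flatten_parameters and unflatten_parameters and the example values are hypothetical and are not part of EvalML.

```python
def flatten_parameters(pipeline_parameters):
    """Turn {component: {param: value}} into (keys, flat list of values)."""
    keys, values = [], []
    for component_name, params in pipeline_parameters.items():
        for param_name, value in params.items():
            keys.append((component_name, param_name))
            values.append(value)
    return keys, values


def unflatten_parameters(keys, values):
    """Rebuild the nested dict from (component, param) keys and flat values."""
    result = {}
    for (component_name, param_name), value in zip(keys, values):
        result.setdefault(component_name, {})[param_name] = value
    return result


# Hypothetical pipeline parameters in the nested dict format.
pipeline_parameters = {
    "Imputer": {"strategy": "mean"},
    "Random Forest": {"n_estimators": 100, "max_depth": 6},
}

keys, flat = flatten_parameters(pipeline_parameters)
round_trip = unflatten_parameters(keys, flat)
```

The flat list loses the association between a value and its component, which is why the nested dict is the more self-describing format to pass between a tuner and a pipeline.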

v0.9.0 Apr. 27, 2020

Enhancements
- Added accuracy as a standard objective #624
- Added verbose parameter to load_fraud #560
- Added Balanced Accuracy metric for binary, multiclass #612 #661
- Added XGBoost regressor and XGBoost regression pipeline #666
- Added Accuracy metric for multiclass #672
- Added objective name in AutoBase.describe_pipeline #686
- Added DataCheck and DataChecks, Message classes and relevant subclasses #739
Fixes
- Removed direct access to cls.component_graph #595
- Add testing files to .gitignore #625
- Remove circular dependencies from Makefile #637
- Add error case for normalize_confusion_matrix() #640
- Fixed XGBoostClassifier and XGBoostRegressor bug with feature names that contain [, ], or < #659
- Update make_pipeline_graph to not accidentally create empty file when testing if path is valid #649
- Fix pip installation warning about docsutils version, from boto dependency #664
- Removed zero division warning for F1/precision/recall metrics #671
- Fixed summary for pipelines without estimators #707
Changes
- Updated default objective for binary/multiclass classification to log loss #613
- Created classification and regression pipeline subclasses and removed objective as an attribute of pipeline classes #405
- Changed the output of score to return one dictionary #429
- Created binary and multiclass objective subclasses #504
- Updated objectives API #445
- Removed call to get_plot_data from AutoML #615
- Set raise_error to default to True for AutoML classes #638
- Remove unnecessary “u” prefixes on some unicode strings #641
- Changed one-hot encoder to return uint8 dtypes instead of ints #653
- Pipeline _name field changed to custom_name #650
- Removed graphs.py and moved methods into PipelineBase #657, #665
- Remove s3fs as a dev dependency #664
- Changed requirements-parser to be a core dependency #673
- Replace supported_problem_types field on pipelines with problem_type attribute on base classes #678
- Changed AutoML to only show best results for a given pipeline template in rankings, added full_rankings property to show all #682
- Update ModelFamily values: don’t list xgboost/catboost as classifiers now that we have regression pipelines for them #677
- Changed AutoML’s describe_pipeline to get problem type from pipeline instead #685
- Standardize import_or_raise error messages #683
- Updated argument order of objectives to align with sklearn’s #698
- Renamed pipeline.feature_importance_graph to pipeline.graph_feature_importances #700
- Moved ROC and confusion matrix methods to evalml.pipelines.plot_utils #704
- Renamed MultiClassificationObjective to MulticlassClassificationObjective, to align with pipeline naming scheme #715
Documentation Changes
- Fixed some sphinx warnings #593
- Fixed docstring for AutoClassificationSearch with correct command #599
- Limit readthedocs formats to pdf, not htmlzip and epub #594 #600
- Clean up objectives API documentation #605
- Fixed function on Exploring search results page #604
- Update release process doc #567
- AutoClassificationSearch and AutoRegressionSearch show inherited methods in API reference #651
- Fixed improperly formatted code in breaking changes for changelog #655
- Added configuration to treat Sphinx warnings as errors #660
- Removed separate plotting section for pipelines in API reference #657, #665
- Have leads example notebook load S3 files using https, so we can delete s3fs dev dependency #664
- Categorized components in API reference and added descriptions for each category #663
- Fixed Sphinx warnings about BalancedAccuracy objective #669
- Updated API reference to include missing components and clean up pipeline docstrings #689
- Reorganize API ref, and clarify pipeline sub-titles #688
- Add and update preprocessing utils in API reference #687
- Added inheritance diagrams to API reference #695
- Documented which default objective AutoML optimizes for #699
- Create separate install page #701
- Include more utils in API ref, like import_or_raise #704
- Add more color to pipeline documentation #705
Testing Changes
- Matched install commands of check_latest_dependencies test and its GitHub action #578
- Added GitHub app to auto assign PR author as assignee #477
- Removed unneeded conda installation of xgboost in windows checkin tests #618
- Update graph tests to always use tmpfile dir #649
- Changelog checkin test workaround for release PRs: If ‘future release’ section is empty of PR refs, pass check #658
- Add changelog checkin test exception for dep-update branch #723

Warning

Breaking Changes

Pipelines no longer take an objective parameter during instantiation, and no longer have an objective attribute.
fit() and predict() now use an optional objective parameter, which is only used in binary classification pipelines to fit for a specific objective.
score() will now use a required objectives parameter that is used to determine all the objectives to score on. This differs from the previous behavior, where the pipeline’s objective was scored on regardless.
score() will now return one dictionary of all objective scores.
ROC and ConfusionMatrix plot methods via Auto(*).plot have been removed by #615 and are replaced by roc_curve and confusion_matrix in evalml.pipelines.plot_utils in #704
normalize_confusion_matrix has been moved to evalml.pipelines.plot_utils #704
Pipelines’ _name field changed to custom_name
Pipelines’ supported_problem_types field is removed because it is no longer necessary #678
Updated argument order of objectives’ objective_function to align with sklearn #698
pipeline.feature_importance_graph has been renamed to pipeline.graph_feature_importances in #700
Removed unsupported MSLE objective #704
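
The new score() behavior described above (a required set of objectives, one dictionary of scores returned) can be sketched as follows. This is a simplified illustration, not EvalML's implementation: the function signature, the objective names, and the use of plain functions as objectives are assumptions made for the example. The metric functions take (y_true, y_pred), in line with the sklearn-style argument order mentioned above.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def recall(y_true, y_pred):
    """Fraction of actual positives (label 1) that were predicted positive."""
    predictions_for_positives = [p for t, p in zip(y_true, y_pred) if t == 1]
    return sum(1 for p in predictions_for_positives if p == 1) / len(predictions_for_positives)


def score(y_true, y_pred, objectives):
    """Score on every requested objective and return a single dictionary."""
    return {name: fn(y_true, y_pred) for name, fn in objectives.items()}


y_true = [1, 0, 1, 1]
y_pred = [1, 0, 0, 1]
scores = score(y_true, y_pred, {"Accuracy": accuracy, "Recall": recall})
```

Here scores has one entry per requested objective, so callers choose what to evaluate at scoring time rather than when the pipeline is constructed.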

v0.8.0 Apr. 1, 2020

Enhancements
- Add normalization option and information to confusion matrix #484
- Add util function to drop rows with NaN values #487
- Renamed PipelineBase.name as PipelineBase.summary and redefined PipelineBase.name as class property #491
- Added access to parameters in Pipelines with PipelineBase.parameters (used to be return of PipelineBase.describe) #501
- Added fill_value parameter for SimpleImputer #509
- Added functionality to override component hyperparameters and made pipelines take hyperparameters from components #516
- Allow numpy.random.RandomState for random_state parameters #556
Fixes
- Removed unused dependency matplotlib, and move category_encoders to test reqs #572
Changes
- Undo version cap in XGBoost placed in #402 and allowed all releases of XGBoost #407
- Support pandas 1.0.0 #486
- Made all references to the logger static #503
- Refactored model_type parameter for components and pipelines to model_family #507
- Refactored problem_types for pipelines and components into supported_problem_types #515
- Moved pipelines/utils.save_pipeline and pipelines/utils.load_pipeline to PipelineBase.save and PipelineBase.load #526
- Limit number of categories encoded by OneHotEncoder #517
Documentation Changes
- Updated API reference to remove PipelinePlot and added moved PipelineBase plotting methods #483
- Add code style and github issue guides #463 #512
- Updated API reference to surface class variables for pipelines and components #537
- Fixed README documentation link #535
- Unhid PR references in changelog #656
Testing Changes
- Added automated dependency check PR #482, #505
- Updated automated dependency check comment #497
- Have build_docs job use python executor, so that env vars are set properly #547
- Added simple test to make sure OneHotEncoder’s top_n works with large number of categories #552
- Run windows unit tests on PRs #557

Warning

Breaking Changes

AutoClassificationSearch and AutoRegressionSearch’s model_types parameter has been refactored into allowed_model_families
ModelTypes enum has been changed to ModelFamily
Components and Pipelines now have a model_family field instead of model_type
get_pipelines utility function now accepts model_families as an argument instead of model_types
PipelineBase.name no longer returns structure of pipeline and has been replaced by PipelineBase.summary
PipelineBase.problem_types and Estimator.problem_types have been renamed to supported_problem_types
pipelines/utils.save_pipeline and pipelines/utils.load_pipeline moved to PipelineBase.save and PipelineBase.load

v0.7.0 Mar. 9, 2020

Enhancements
- Added emacs buffers to .gitignore #350
- Add CatBoost (gradient-boosted trees) classification and regression components and pipelines #247
- Added Tuner abstract base class #351
- Added n_jobs as parameter for AutoClassificationSearch and AutoRegressionSearch #403
- Changed colors of confusion matrix to shades of blue and updated axis order to match scikit-learn’s #426
- Added PipelineBase graph and feature_importance_graph methods, moved from previous location #423
- Added support for python 3.8 #462
Fixes
- Fixed ROC and confusion matrix plots not being calculated if user passed own additional_objectives #276
- Fixed ReadtheDocs FileNotFoundError exception for fraud dataset #439
Changes
- Added n_estimators as a tunable parameter for XGBoost #307
- Remove unused parameter ObjectiveBase.fit_needs_proba #320
- Remove extraneous parameter component_type from all components #361
- Remove unused rankings.csv file #397
- Downloaded demo and test datasets so unit tests can run offline #408
- Remove _needs_fitting attribute from Components #398
- Changed plot.feature_importance to show only non-zero feature importances by default, added optional parameter to show all #413
- Refactored PipelineBase to take in parameter dictionary and moved pipeline metadata to class attribute #421
- Dropped support for Python 3.5 #438
- Removed unused apply.py file #449
- Clean up requirements.txt to remove unused deps #451
- Support installation without all required dependencies #459
Documentation Changes
- Update release.md with instructions to release to internal license key #354
Testing Changes
- Added tests for utils (and moved current utils to gen_utils) #297
- Moved XGBoost install into its own separate step on Windows using Conda #313
- Rewind pandas version to before 1.0.0, to diagnose test failures for that version #325
- Added dependency update checkin test #324
- Rewind XGBoost version to before 1.0.0 to diagnose test failures for that version #402
- Update dependency check to use a whitelist #417
- Update unit test jobs to not install dev deps #455

Warning

Breaking Changes

Python 3.5 will not be actively supported.

v0.6.0 Dec. 16, 2019

Enhancements
- Added ability to create a plot of feature importances #133
- Add early stopping to AutoML using patience and tolerance parameters #241
- Added ROC and confusion matrix metrics and plot for classification problems and introduce PipelineSearchPlots class #242
- Enhanced AutoML results with search order #260
- Added utility function to show system and environment information #300
Fixes
- Lower botocore requirement #235
- Fixed decision_function calculation for FraudCost objective #254
- Fixed return value of Recall metrics #264
- Components return self on fit #289
Changes
- Renamed automl classes to AutoRegressionSearch and AutoClassificationSearch #287
- Updating demo datasets to retain column names #223
- Moving pipeline visualization to PipelinePlots class #228
- Standardizing inputs as pd.DataFrame / pd.Series #130
- Enforcing that pipelines must have an estimator as last component #277
- Added ipywidgets as a dependency in requirements.txt #278
- Added Random and Grid Search Tuners #240
Documentation Changes
- Adding class properties to API reference #244
- Fix and filter FutureWarnings from scikit-learn #249, #257
- Adding Linear Regression to API reference and cleaning up some Sphinx warnings #227
Testing Changes
- Added support for testing on Windows with CircleCI #226
- Added support for doctests #233

Warning

Breaking Changes

The fit() method for AutoClassifier and AutoRegressor has been renamed to search().
AutoClassifier has been renamed to AutoClassificationSearch
AutoRegressor has been renamed to AutoRegressionSearch
AutoClassificationSearch.results and AutoRegressionSearch.results are now dictionaries with pipeline_results and search_order keys. pipeline_results holds a dictionary identical to the old .results dictionary, while search_order is a list of pipeline_id values in the order the pipelines were searched.
Pipelines now require an estimator as the last component in component_list. Slicing pipelines now throws a NotImplementedError to avoid returning pipelines without an estimator.
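
The new shape of the .results dictionary can be sketched as below. Only the pipeline_results and search_order keys come from the notes above; the per-pipeline fields (pipeline_name, score), their values, and the helper scores_in_search_order are hypothetical placeholders for illustration.

```python
# Old layout (hypothetical contents): results keyed directly by pipeline id.
old_results = {
    0: {"pipeline_name": "Logistic Regression", "score": 0.81},
    1: {"pipeline_name": "Random Forest", "score": 0.86},
}

# New layout: the old dictionary lives under "pipeline_results", and
# "search_order" lists pipeline ids in the order they were evaluated.
results = {
    "pipeline_results": old_results,
    "search_order": [0, 1],
}


def scores_in_search_order(results):
    """Return each pipeline's score, following the recorded search order."""
    return [
        results["pipeline_results"][pipeline_id]["score"]
        for pipeline_id in results["search_order"]
    ]


ordered_scores = scores_in_search_order(results)
```

Code that previously indexed .results by pipeline id would, under this layout, index results["pipeline_results"] instead.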

v0.5.2 Nov. 18, 2019

Enhancements
- Adding basic pipeline structure visualization #211
Documentation Changes
- Added notebooks to build process #212

v0.5.1 Nov. 15, 2019

Enhancements
- Added basic outlier detection guardrail #151
- Added basic ID column guardrail #135
- Added support for unlimited pipelines with a max_time limit #70
- Updated .readthedocs.yaml to successfully build #188
Fixes
- Removed MSLE from default additional objectives #203
- Fixed random_state passed in pipelines #204
- Fixed slow down in RFRegressor #206
Changes
- Pulled information for describe_pipeline from pipeline’s new describe method #190
- Refactored pipelines #108
- Removed guardrails from Auto(*) #202, #208
Documentation Changes
- Updated documentation to show max_time enhancements #189
- Updated release instructions for RTD #193
- Added notebooks to build process #212
- Added contributing instructions #213
- Added new content #222

v0.5.0 Oct. 29, 2019

Enhancements
- Added basic one hot encoding #73
- Use enums for model_type #110
- Support for splitting regression datasets #112
- Auto-infer multiclass classification #99
- Added support for other units in max_time #125
- Detect highly null columns #121
- Added additional regression objectives #100
- Show an interactive iteration vs. score plot when using fit() #134
Fixes
- Reordered describe_pipeline #94
- Added type check for model_type #109
- Fixed s units when setting string max_time #132
- Fix objectives not appearing in API documentation #150
Changes
- Reorganized tests #93
- Moved logging to its own module #119
- Show progress bar history #111
- Using cloudpickle instead of pickle to allow unloading of custom objectives #113
- Removed render.py #154
Documentation Changes
- Update release instructions #140
- Include additional_objectives parameter #124
- Added Changelog #136
Testing Changes
- Code coverage #90
- Added CircleCI tests for other Python versions #104
- Added doc notebooks as tests #139
- Test metadata for CircleCI and 2 core parallelism #137

v0.4.1 Sep. 16, 2019

Enhancements
- Added AutoML for classification and regression using AutoBase and Skopt #7 #9
- Implemented standard classification and regression metrics #7
- Added logistic regression, random forest, and XGBoost pipelines #7
- Implemented support for custom objectives #15
- Feature importance for pipelines #18
- Serialization for pipelines #19
- Allow fitting on objectives for optimal threshold #27
- Added detect label leakage #31
- Implemented callbacks #42
- Allow for multiclass classification #21
- Added support for additional objectives #79
Fixes
- Fixed feature selection in pipelines #13
- Made random_seed usage consistent #45
Documentation Changes
- Added docstrings #6
- Created notebooks for docs #6
- Initialized readthedocs EvalML #6
- Added favicon #38
Testing Changes
- Added testing for loading data #39

v0.2.0 Aug. 13, 2019

Enhancements
- Created fraud detection objective #4

v0.1.0 July 31, 2019

First Release
Enhancements
- Added lead scoring objective #1
- Added basic classifier #1
Documentation Changes
- Initialized Sphinx for docs #1