Enhancements
Fixes
Changes
Documentation Changes
Testing Changes
Updated pipelines and make_pipeline to accept Woodwork inputs #1393
Updated components to accept Woodwork inputs #1423
Added ability to freeze hyperparameters for AutoMLSearch #1284
Added Target Encoder into transformer components #1401
Added callback for error handling in AutoMLSearch #1403
Added the index id to the explain_predictions_best_worst output to help users identify which rows in their data are included #1365
The top_k features displayed in explain_predictions_* functions are now determined by the magnitude of SHAP values, rather than by the top_k largest and smallest SHAP values. #1374
Added a problem type for time series regression #1386
Added an is_defined_for_problem_type method to ObjectiveBase #1386
Added a random_state parameter to make_pipeline_from_components function #1411
Added DelayedFeaturesTransformer #1396
Added a TimeSeriesRegressionPipeline class #1418
Removed core-requirements.txt from the package distribution #1429
Updated data check messages to include “code” and “details” fields #1451, #1462
Added a TimeSeriesSplit data splitter for time series problems #1441
Added a problem_configuration parameter to AutoMLSearch #1457
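The DelayedFeaturesTransformer added above (#1396) targets time series problems. The idea of delayed (lagged) features can be sketched in plain Python; this is an illustration only, not evalml's implementation, and the helper name make_delayed_features, its max_delay parameter and the _delay_N column naming are hypothetical.

```python
from typing import Dict, List, Optional


def make_delayed_features(
    columns: Dict[str, List[float]], max_delay: int
) -> Dict[str, List[Optional[float]]]:
    """Return the input columns plus copies shifted down by 1..max_delay rows.

    Hypothetical helper: rows that have no history for a given delay get None.
    """
    n_rows = len(next(iter(columns.values()), []))
    output: Dict[str, List[Optional[float]]] = {
        name: list(values) for name, values in columns.items()
    }
    for name, values in columns.items():
        for delay in range(1, max_delay + 1):
            # Row i receives the value observed `delay` rows earlier.
            output[f"{name}_delay_{delay}"] = [
                values[i - delay] if i >= delay else None for i in range(n_rows)
            ]
    return output


features = make_delayed_features({"sales": [10.0, 12.0, 15.0, 11.0]}, max_delay=2)
print(features["sales_delay_1"])  # [None, 10.0, 12.0, 15.0]
print(features["sales_delay_2"])  # [None, None, 10.0, 12.0]
```

Each row only sees past values, so the delayed columns do not leak future information; the leading None rows would typically be dropped or imputed before fitting.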
Fixed IndexError raised in AutoMLSearch when ensembling = True but there is only one pipeline to iterate over #1397
Fixed a stacked ensemble input bug, as well as a LightGBM warning and bug, in AutoMLSearch #1388
Updated enum classes to show possible enum values as attributes #1391
Replaced calls to Woodwork’s to_pandas() with to_series() and to_dataframe() #1428
Fixed bug in OHE where column names were not guaranteed to be unique #1349
Fixed bug with percent improvement of ExpVariance objective on data with highly skewed target #1467
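The OneHotEncoder column-name fix (#1349) addresses collisions such as column "a" with category "b_c" and column "a_b" with category "c", which both yield "a_b_c" under a naive "<column>_<category>" scheme. A minimal sketch of one way to de-duplicate, assuming a hypothetical numeric-suffix convention that evalml may not use:

```python
from typing import Dict, List


def unique_one_hot_names(categories: Dict[str, List[str]]) -> List[str]:
    """Build one-hot column names as '<column>_<category>', de-duplicating collisions.

    Hypothetical scheme: a repeated name gets a numeric suffix ('_1', '_2', ...).
    """
    seen = set()
    names = []
    for column, cats in categories.items():
        for category in cats:
            base = f"{column}_{category}"
            name, suffix = base, 1
            # Keep incrementing until the candidate name has not been used yet.
            while name in seen:
                name = f"{base}_{suffix}"
                suffix += 1
            seen.add(name)
            names.append(name)
    return names


names = unique_one_hot_names({"a": ["b_c"], "a_b": ["c"]})
print(names)  # ['a_b_c', 'a_b_c_1']
```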
Changed OutliersDataCheck to return the list of columns, rather than rows, that contain outliers #1377
Simplified and cleaned output for Code Generation #1371
Reverted changes from #1337 #1409
Updated data checks to return dictionary of warnings and errors instead of a list #1448
Updated AutoMLSearch to pass Woodwork data structures to every pipeline (instead of pandas DataFrames) #1450
Updated AutoMLSearch to default to max_batches=1 instead of max_iterations=5 #1452
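Two of the changes above, OutliersDataCheck reporting columns rather than rows (#1377) and data checks returning a dictionary of warnings and errors (#1448), can be illustrated together. This sketch swaps in the interquartile-range (IQR) rule as the outlier test, and the message layout (message/details keys) is an assumption; neither is taken from evalml's source. statistics.quantiles requires Python 3.8+.

```python
import statistics
from typing import Dict, List


def outlier_columns_check(
    columns: Dict[str, List[float]], k: float = 1.5
) -> Dict[str, List[dict]]:
    """Report columns (not rows) that contain at least one IQR outlier.

    Hypothetical sketch: the IQR rule and the message layout are assumptions.
    """
    flagged = []
    for name, values in columns.items():
        q1, _, q3 = statistics.quantiles(values, n=4)  # needs at least two values
        iqr = q3 - q1
        low, high = q1 - k * iqr, q3 + k * iqr
        if any(v < low or v > high for v in values):
            flagged.append(name)
    warning_messages = [
        {"message": f"Column '{name}' contains outliers", "details": {"column": name}}
        for name in flagged
    ]
    return {"warnings": warning_messages, "errors": []}


result = outlier_columns_check(
    {"income": [30, 32, 31, 29, 33, 500], "age": [25, 30, 35, 40, 45, 50]}
)
print([w["details"]["column"] for w in result["warnings"]])  # ['income']
```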
Added description of CLA to contributing guide, updated description of draft PRs #1402
Updated documentation to include all data checks, DataChecks, and usage of data checks in AutoML #1412
Updated docstrings from np.array to np.ndarray #1417
Added section on stacking ensembles in AutoMLSearch documentation #1425
Removed category_encoders from test-requirements.txt #1373
Tweaked codecov.io settings again to avoid flaky failures #1413
Modified make lint to check notebook versions in the docs #1431
Modified make lint-fix to standardize notebook versions in the docs #1431
Used the new version of the pull request GitHub Action for the dependency check #1443
Reduced number of workers for tests to 4 #1447
Warning
The top_k and top_k_features parameters in explain_predictions_* functions now return k features as opposed to 2 * k features #1374
Renamed problem_type to problem_types in RegressionObjective, BinaryClassificationObjective, and MulticlassClassificationObjective #1319
Data checks now return a dictionary of warnings and errors instead of a list #1448
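To make the top_k change (#1374) concrete, the sketch below contrasts the old selection (k largest plus k smallest SHAP values, up to 2 * k features) with selection by absolute SHAP value (k features). The function names and example values are hypothetical; they illustrate the selection rule only, not the internals of explain_predictions_*.

```python
from typing import Dict, List


def largest_and_smallest_k(shap_values: Dict[str, float], k: int) -> List[str]:
    """Old-style rule: the k largest plus the k smallest SHAP values (up to 2 * k names)."""
    ordered = sorted(shap_values, key=shap_values.get, reverse=True)
    largest = ordered[:k]
    # Slice from len - k (not -k) so that k == 0 returns an empty list.
    smallest = [name for name in ordered[len(ordered) - k:] if name not in largest]
    return largest + smallest


def top_k_by_magnitude(shap_values: Dict[str, float], k: int) -> List[str]:
    """New-style rule: the k features with the largest absolute SHAP values."""
    return sorted(shap_values, key=lambda name: abs(shap_values[name]), reverse=True)[:k]


shap = {"age": 0.40, "income": -0.55, "tenure": 0.05, "region": -0.02, "visits": 0.20}
print(largest_and_smallest_k(shap, 2))  # ['age', 'visits', 'region', 'income']
print(top_k_by_magnitude(shap, 2))  # ['income', 'age']
```

Under the old rule, region (-0.02) is reported despite having almost no effect; the magnitude rule reports the two most influential features regardless of sign.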
Added stacked ensemble component classes (StackedEnsembleClassifier, StackedEnsembleRegressor) #1134
Added stacked ensemble components to AutoMLSearch #1253
Added DecisionTreeClassifier and DecisionTreeRegressor to AutoML #1255
Added graph_prediction_vs_actual in model_understanding for regression problems #1252
Added a parameter to OneHotEncoder to enable filtering which features to encode #1249
Added percent-better-than-baseline for all objectives to automl.results #1244
Added HighVarianceCVDataCheck and replaced synonymous warning in AutoMLSearch #1254
Added PCA Transformer component for dimensionality reduction #1270
Added generate_pipeline_code and generate_component_code to allow for code generation given a pipeline or component instance #1306
Updated AutoMLSearch to support Woodwork data structures #1299
Added cv_folds to ClassImbalanceDataCheck and added this check to DefaultDataChecks #1333
Made the max_batches argument to AutoMLSearch.search public #1320
Added text support to automl search #1062
Added _pipelines_per_batch as a private argument to AutoMLSearch #1355
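HighVarianceCVDataCheck (#1254) warns when scores vary a lot across cross-validation folds. A minimal sketch, assuming the check compares the coefficient of variation (standard deviation divided by the absolute mean) against a threshold; the helper name, the 0.2 default and the message text are assumptions.

```python
import statistics
from typing import List, Optional


def high_variance_cv_warning(
    cv_scores: List[float], threshold: float = 0.2
) -> Optional[str]:
    """Return a warning if the coefficient of variation of CV scores exceeds threshold.

    Hypothetical helper; the threshold and message text are assumptions.
    """
    mean = statistics.mean(cv_scores)
    if mean == 0:
        return None  # coefficient of variation is undefined for a zero mean
    variation = statistics.stdev(cv_scores) / abs(mean)  # sample std; needs >= 2 scores
    if variation > threshold:
        return f"High coefficient of variation ({variation:.2f}) across CV folds"
    return None


print(high_variance_cv_warning([0.81, 0.80, 0.82]))  # None
print(high_variance_cv_warning([0.90, 0.40, 0.85]))  # a warning string
```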
Fixed ML performance issue with ordered datasets: always shuffle data in automl’s default CV splits #1265
Fixed broken evalml info CLI command #1293
Fixed boosting type='rf' for LightGBM Classifier, as well as num_leaves error #1302
Fixed bug in explain_predictions_best_worst where a custom index in the target variable would cause a ValueError #1318
Added stacked ensemble estimators to the evalml.pipelines.__init__ file #1326
Fixed bug in OHE where calls to transform were not deterministic if top_n was less than the number of categories in a column #1324
Fixed LightGBM warning messages during AutoMLSearch #1342
Fixed warnings thrown during AutoMLSearch in HighVarianceCVDataCheck #1346
Fixed bug where TrainingValidationSplit would return invalid location indices for dataframes with a custom index #1348
Fixed bug where the AutoMLSearch random_state was not being passed to the created pipelines #1321
Allowed add_to_rankings to be called before AutoMLSearch is called #1250
Moved Graphviz from test-requirements to requirements.txt #1327
Removed max_pipelines parameter from AutoMLSearch #1264
Included editable installs in all install make targets #1335
Made pip dependencies featuretools and nlp_primitives core dependencies #1062
Removed PartOfSpeechCount from TextFeaturizer transform primitives #1062
Added warning for partial_dependency when the feature includes null values #1352
Fixed and updated code blocks in Release Notes #1243
Added DecisionTree estimators to API Reference #1246
Changed class inheritance display to flow vertically #1248
Updated cost-benefit tutorial to use a holdout/test set #1159
Added evalml info command to documentation #1293
Miscellaneous doc updates #1269
Removed conda pre-release testing from the release process document #1282
Updates to contributing guide #1310
Added Alteryx footer to docs with Twitter and GitHub links #1312
Added documentation for evalml installation for Python 3.6 #1322
Added documentation changes to make the API Docs easier to understand #1323
Fixed documentation for feature_importance #1353
Added tutorial for running AutoML with text data #1357
Added documentation for Woodwork integration with AutoML search #1361
Added tests for jupyter_check to handle IPython #1256
Cleaned up make_pipeline tests to test for all estimators #1257
Added a test to check conda build after merge to main #1247
Removed unnecessary code in __main__.py that lacked codecov coverage #1293
Codecov: round coverage up instead of down #1334
Added DockerHub credentials to CI testing environment #1356
Added DockerHub credentials to conda testing environment #1363
Renamed LabelLeakageDataCheck to TargetLeakageDataCheck #1319
max_pipelines parameter has been removed from AutoMLSearch. Please use max_iterations instead. #1264
AutoMLSearch.search() will now log a warning if the input is not a Woodwork data structure (e.g. a pandas or numpy object) #1299
Removed unused argument feature_types from AutoMLSearch.search #1062
Updated partial dependence methods to support calculating numeric columns in a dataset with non-numeric columns #1150
Added get_feature_names on OneHotEncoder #1193
Added detect_problem_type to problem_type/utils.py to automatically detect the problem type given targets #1194
Added LightGBM to AutoMLSearch #1199
Updated scikit-learn and scikit-optimize to the latest versions, 0.23.2 and 0.8.1 respectively #1141
Added __str__ and __repr__ for pipelines and components #1218
Included internal target check for both training and validation data in AutoMLSearch #1226
Added ProblemTypes.all_problem_types helper to get list of supported problem types #1219
Added DecisionTreeClassifier and DecisionTreeRegressor classes #1223
DataChecks can now be parametrized by passing a list of DataCheck classes and a parameter dictionary #1167
Added first CV fold score as validation score in AutoMLSearch.rankings #1221
Updated flake8 configuration to enable linting on __init__.py files #1234
Refined make_pipeline_from_components implementation #1204
Updated GitHub URL after migration to Alteryx GitHub org #1207
Changed Problem Type enum to be more similar to the string name #1208
Wrapped call to scikit-learn’s partial dependence method in a try/finally block #1232
Added allow_writing_files as a named argument to CatBoost estimators. #1202
Added solver and multi_class as named arguments to LogisticRegressionClassifier #1202
Replaced the pipeline’s ._transform method, which evaluates all of a pipeline’s preprocessing steps, with .compute_estimator_features #1231
Changed default large dataset train/test splitting behavior #1205
Included description of how to access the component instances and features for pipeline user guide #1163
Updated API docs to refer to target as “target” instead of “labels” for non-classification tasks and minor docs cleanup #1160
Added Class Imbalance Data Check to api_reference.rst #1190 #1200
Added pipeline properties to API reference #1209
Clarified what the objective parameter in AutoML is used for in AutoML API reference and AutoML user guide #1222
Updated API docs to include skopt.space.Categorical option for component hyperparameter range definition #1228
Added install documentation for libomp in order to use LightGBM on Mac #1233
Improved description of max_iterations in documentation #1212
Removed unused code from sphinx conf #1235
DefaultDataChecks now accepts a problem_type parameter that must be specified #1167
The pipeline’s ._transform method, which evaluates all of a pipeline’s preprocessing steps, has been replaced with .compute_estimator_features #1231
get_objectives has been renamed to get_core_objectives. This function will now return a list of valid objective instances #1230
Added output_format field to explain predictions functions #1107
Modified get_objective and get_objectives to be able to return any objective in evalml.objectives #1132
Added a return_instance boolean parameter to get_objective #1132
Added ClassImbalanceDataCheck to determine whether target imbalance falls below a given threshold #1135
Added label encoder to LightGBM for binary classification #1152
Added labels for the row index of confusion matrix #1154
Added AutoMLSearch object as another parameter in search callbacks #1156
Added the corresponding probability threshold for each point displayed in graph_roc_curve #1161
Added __eq__ for ComponentBase and PipelineBase #1178
Added support for multiclass classification for roc_curve #1164
Added categories accessor to OneHotEncoder for listing the categories associated with a feature #1182
Added utility function to create pipeline instances from a list of component instances #1176
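Multiclass support for roc_curve (#1164) means results come back per class, as a list of dictionaries. A one-vs-rest sketch of that shape in plain Python follows; the function names and the fpr/tpr/thresholds keys are hypothetical and may differ from evalml's actual output.

```python
from typing import Dict, List, Sequence


def binary_roc(y_true: Sequence[int], scores: Sequence[float]) -> Dict[str, List[float]]:
    """ROC points for 0/1 labels, sweeping thresholds from high to low.

    Assumes y_true contains at least one positive and one negative label.
    """
    positives = sum(y_true)
    negatives = len(y_true) - positives
    fpr, tpr, thresholds = [0.0], [0.0], [float("inf")]
    for threshold in sorted(set(scores), reverse=True):
        predicted = [score >= threshold for score in scores]
        tp = sum(1 for p, y in zip(predicted, y_true) if p and y == 1)
        fp = sum(1 for p, y in zip(predicted, y_true) if p and y == 0)
        tpr.append(tp / positives)
        fpr.append(fp / negatives)
        thresholds.append(threshold)
    return {"fpr": fpr, "tpr": tpr, "thresholds": thresholds}


def one_vs_rest_roc(
    y_true: Sequence[int], proba_by_class: Sequence[Sequence[float]]
) -> List[Dict[str, List[float]]]:
    """One ROC dictionary per class, treating that class as positive and all others as negative."""
    results = []
    for class_index, class_scores in enumerate(proba_by_class):
        binarized = [1 if y == class_index else 0 for y in y_true]
        results.append(binary_roc(binarized, class_scores))
    return results


y = [0, 1, 2, 0, 1, 2]
proba_by_class = [
    [0.8, 0.1, 0.1, 0.6, 0.3, 0.2],  # predicted probability of class 0 for each row
    [0.1, 0.7, 0.2, 0.3, 0.5, 0.1],  # class 1
    [0.1, 0.2, 0.7, 0.1, 0.2, 0.7],  # class 2
]
curves = one_vs_rest_roc(y, proba_by_class)
print(len(curves))  # 3
```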
Fixed XGBoost column names for partial dependence methods #1104
Removed dead code validating column type from TextFeaturizer #1122
Fixed an issue where Imputer could not fit when a categorical or boolean column contained None #1144
OneHotEncoder now preserves the custom index in the input data #1146
Fixed representation for ModelFamily #1165
Removed duplicate nbsphinx dependency in dev-requirements.txt #1168
Users can now pass in any valid kwargs to all estimators #1157
Removed the broken accessor OneHotEncoder.get_feature_names and an unneeded base class #1179
Removed LightGBM Estimator from AutoML models #1186
Pinned scikit-optimize version to 0.7.4 #1136
Removed tqdm as a dependency #1177
Added lightgbm version 3.0.0 to latest_dependency_versions.txt #1185
Renamed max_pipelines to max_iterations #1169
Fixed API docs for AutoMLSearch add_result_callback #1113
Added a step to our release process for pushing our latest version to conda-forge #1118
Added warning for missing ipywidgets dependency when using PipelineSearchPlots in JupyterLab #1145
Updated README.md example to load demo dataset #1151
Swapped mapping of breast cancer targets in model_understanding.ipynb #1170
Added test confirming TextFeaturizer never outputs null values #1122
Changed Python version of Update Dependencies action to 3.8.x #1137
Fixed release notes check-in test for Update Dependencies actions #1172
get_objective will now return a class definition rather than an instance by default #1132
Deleted OPTIONS dictionary in evalml.objectives.utils.py #1132
If specifying an objective by string, the string must now match the objective’s name field, case-insensitive #1132
“Recall”, “Recall Macro”, “Recall Micro”, “Recall Weighted”, or “Root Mean Squared Log Error” to AutoMLSearch will now result in a ValueError rather than an ObjectiveNotFoundError #1132
Search callbacks start_iteration_callback and add_results_callback have changed to include a copy of the AutoMLSearch object as a third parameter #1156
Deleted OneHotEncoder.get_feature_names method which had been broken for a while, in favor of pipelines’ input_feature_names #1179
Deleted empty base class CategoricalEncoder which OneHotEncoder component was inheriting from #1176
Results from roc_curve will now return as a list of dictionaries with each dictionary representing a class #1164
max_pipelines now raises a DeprecationWarning and will be removed in the next release. max_iterations should be used instead. #1169
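The objective lookup changes in #1132 (string names matched case-insensitively against each objective’s name field, and a class returned by default unless return_instance is set) can be sketched with a small registry. Everything here is hypothetical stand-in code, including the class names and the use of LookupError for unknown names; evalml’s own get_objective may differ.

```python
from typing import List, Type, Union


class ObjectiveSketch:
    """Hypothetical stand-in for an objective base class with a `name` field."""

    name = ""


class LogLossSketch(ObjectiveSketch):
    name = "Log Loss Binary"


class F1Sketch(ObjectiveSketch):
    name = "F1"


REGISTRY: List[Type[ObjectiveSketch]] = [LogLossSketch, F1Sketch]


def get_objective_sketch(
    objective_name: str, return_instance: bool = False
) -> Union[Type[ObjectiveSketch], ObjectiveSketch]:
    """Look up an objective by its name field, ignoring case.

    Returns the class by default, or an instance when return_instance=True.
    """
    lookup = {cls.name.lower(): cls for cls in REGISTRY}
    objective_class = lookup.get(objective_name.lower())
    if objective_class is None:
        raise LookupError(f"{objective_name!r} is not a known objective name")
    return objective_class() if return_instance else objective_class


print(get_objective_sketch("log loss binary"))  # the LogLossSketch class itself
print(type(get_objective_sketch("F1", return_instance=True)).__name__)  # F1Sketch
```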
Added Cost-Benefit Matrix objective for binary classification #1038
Split fill_value into categorical_fill_value and numeric_fill_value for Imputer #1019
Added explain_predictions and explain_predictions_best_worst for explaining multiple predictions with SHAP #1016
Added new LSA component for text featurization #1022
Added guide on installing with conda #1041
Added a “cost-benefit curve” util method to graph cost-benefit matrix scores vs. binary classification thresholds #1081
Standardized error when calling transform/predict before fit for pipelines #1048
Added percent_better_than_baseline to AutoML search rankings and full rankings table #1050
Added one-way partial dependence and partial dependence plots #1079
Added “Feature Value” column to prediction explanation reports. #1064
Added LightGBM classification estimator #1082, #1114
Added max_batches parameter to AutoMLSearch #1087
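The Cost-Benefit Matrix objective (#1038) assigns a value to each confusion-matrix outcome. A minimal sketch, assuming the score is the average value per sample (one plausible reading of the normalization in #1099); the function and parameter names are hypothetical.

```python
from typing import Sequence


def cost_benefit_score(
    y_true: Sequence[int],
    y_pred: Sequence[int],
    tp_value: float,
    tn_value: float,
    fp_value: float,
    fn_value: float,
) -> float:
    """Average payoff per sample: each prediction outcome is weighted by its value.

    Hypothetical sketch; dividing by the sample count is one way to normalize.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    total = 0.0
    for actual, predicted in zip(y_true, y_pred):
        if actual == 1 and predicted == 1:
            total += tp_value
        elif actual == 0 and predicted == 0:
            total += tn_value
        elif actual == 0 and predicted == 1:
            total += fp_value
        else:
            total += fn_value
    return total / len(y_true)


# TP=2, TN=1, FP=1, FN=1: (2*10 + 1*0 + 1*(-2) + 1*(-5)) / 5 = 13 / 5
score = cost_benefit_score([1, 1, 0, 0, 1], [1, 1, 0, 1, 0], 10, 0, -2, -5)
print(score)  # 2.6
```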
Updated TextFeaturizer component to no longer require an internet connection to run #1022
Fixed non-deterministic element of TextFeaturizer transformations #1022
Added a StandardScaler to all ElasticNet pipelines #1065
Updated cost-benefit matrix to normalize score #1099
Fixed logic in calculate_percent_difference so that it can handle negative values #1100
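The calculate_percent_difference fix (#1100) concerns negative baselines: dividing by a negative baseline flips the sign, so a real improvement would be reported as a decline. A sketch of the usual remedy, dividing by the absolute baseline; the helper name, the greater_is_better flag and returning nan for a zero baseline are assumptions, not evalml’s code.

```python
import math


def percent_better_than_baseline(
    score: float, baseline: float, greater_is_better: bool = True
) -> float:
    """Percent improvement of score over baseline (hypothetical helper).

    Dividing by abs(baseline) keeps the sign meaningful when the baseline is negative.
    """
    if baseline == 0:
        return math.nan  # relative change is undefined for a zero baseline
    difference = score - baseline if greater_is_better else baseline - score
    return 100 * difference / abs(baseline)


# An R2 of -0.5 beats a baseline of -1.0; without abs() this would read -50.
print(percent_better_than_baseline(-0.5, -1.0))  # 50.0
# For a loss (lower is better), halving it is a 50% improvement.
print(percent_better_than_baseline(0.25, 0.5, greater_is_better=False))  # 50.0
```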
Added needs_fitting property to ComponentBase #1044
Updated references to data types to use datatype lists defined in evalml.utils.gen_utils #1039
Removed the maximum version limit for the SciPy dependency #1051
Moved all_components and other component importers into runtime methods #1045
Consolidated graphing utility methods under evalml.utils.graph_utils #1060
Made slight changes to how TextFeaturizer uses featuretools, and refactored TextFeaturizer and LSA #1090
Changed show_all_features parameter into importance_threshold, which allows for thresholding feature importance #1097, #1103
Updated the setup.py URL to point to the GitHub repo #1037
Added tutorial for using the cost-benefit matrix objective #1088
Updated model_understanding.ipynb to include documentation for using plotly on Jupyter Lab #1108
Refactored CircleCI tests to use matrix jobs #1043
Added a test to check that all test directories are included in evalml package #1054
confusion_matrix and normalize_confusion_matrix have been moved to evalml.utils #1038
All graph utility methods previously under evalml.pipelines.graph_utils have been moved to evalml.utils.graph_utils #1060
Added save/load methods to components #1023
Exposed the pickle protocol as an optional argument to save/load #1023
Updated estimators used in AutoML to include ExtraTrees and ElasticNet estimators #1030
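The component save/load methods with an optional pickle protocol (#1023) follow a common pattern, sketched here with the standard library. The save and load helpers are hypothetical, and a plain dict stands in for a component object.

```python
import os
import pickle
import tempfile
from typing import Any


def save(obj: Any, file_path: str, protocol: int = pickle.DEFAULT_PROTOCOL) -> None:
    """Pickle obj to file_path with the given pickle protocol (hypothetical helper)."""
    with open(file_path, "wb") as f:
        pickle.dump(obj, f, protocol=protocol)


def load(file_path: str) -> Any:
    """Unpickle an object; pickle detects the protocol from the file itself."""
    with open(file_path, "rb") as f:
        return pickle.load(f)


with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "component.pkl")
    # A plain dict stands in for a fitted component object here.
    save({"name": "ExampleComponent", "parameters": {"alpha": 0.5}}, path, protocol=2)
    with open(path, "rb") as f:
        header = f.read(2)  # protocol 2+ files start with the PROTO opcode and version
    restored = load(path)
print(restored["parameters"])  # {'alpha': 0.5}
```

An older protocol trades efficiency for compatibility with older Python versions that need to read the file.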
Removed DeprecationWarning for SimpleImputer #1018
Added a note about version numbers to the release process docs #1034
Test files are now included in the evalml package #1029
Added string and categorical targets support for binary and multiclass pipelines and check for numeric targets for DetectLabelLeakage data check #932
Added clear exception for regression pipelines if target datatype is string or categorical #960
Added target column names and class labels in predict and predict_proba output for pipelines #951
Added _compute_shap_values and normalize_values to pipelines/explanations module #958
Added explain_prediction feature which explains single predictions with SHAP #974
Added Imputer to allow different imputation strategies for numerical and categorical dtypes #991
Added support for configuring the logfile path via an environment variable; the logger is no longer created if there are filesystem errors #975
Updated catboost estimators’ default parameters and automl hyperparameter ranges to speed up fit time #998
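The Imputer added in #991 allows different strategies for numeric and categorical columns. A minimal sketch of that split, assuming None marks missing values, mean or median for numeric columns and the most frequent value for everything else; the function name and strategy strings are hypothetical.

```python
import statistics
from collections import Counter
from typing import Dict, List, Union

Value = Union[float, str, None]


def impute_columns(
    columns: Dict[str, List[Value]], numeric_strategy: str = "mean"
) -> Dict[str, List[Value]]:
    """Fill None values: numeric columns by mean/median, other columns by most frequent value."""
    imputed = {}
    for name, values in columns.items():
        observed = [v for v in values if v is not None]
        if observed and all(isinstance(v, (int, float)) for v in observed):
            if numeric_strategy == "mean":
                fill = statistics.mean(observed)
            elif numeric_strategy == "median":
                fill = statistics.median(observed)
            else:
                raise ValueError(f"Unknown numeric strategy: {numeric_strategy}")
        else:
            # Categorical column: most frequent observed value (None if nothing observed).
            fill = Counter(observed).most_common(1)[0][0] if observed else None
        imputed[name] = [fill if v is None else v for v in values]
    return imputed


result = impute_columns({"age": [20.0, None, 40.0], "color": ["red", None, "red"]})
print(result)  # {'age': [20.0, 30.0, 40.0], 'color': ['red', 'red', 'red']}
```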
Fixed a Read the Docs warning failure regarding an embedded GIF #943
Removed incorrect parameter passed to pipeline classes in _add_baseline_pipelines #941
Added universal error for calling predict, predict_proba, transform, and feature_importances before fitting #969, #994
Made TextFeaturizer component and pip dependencies featuretools and nlp_primitives optional #976
Updated imputation strategy in automl to no longer limit impute strategy to most_frequent for all features if there are any categorical columns #991
Fixed UnboundLocalError for cv_pipeline when automl search errors #996
Fixed Imputer to reset dataframe index to preserve behavior expected from SimpleImputer #1009
Moved get_estimators to evalml.pipelines.components.utils #934
Modified Pipelines to raise PipelineScoreError when they encounter an error during scoring #936
Moved evalml.model_families.list_model_families to evalml.pipelines.components.allowed_model_families #959
Renamed DateTimeFeaturization to DateTimeFeaturizer #977
Added check to stop search and raise an error if all pipelines in a batch return NaN scores #1015
Updated README.md #963
Reworded message when errors are returned from data checks in search #982
Added section on understanding model predictions with explain_prediction to User Guide #981
Added a section to the user guide and api reference about how XGBoost and CatBoost are not fully supported. #992
Added custom components section in user guide #993
Updated FAQ section formatting #997
Updated release process documentation #1003
Moved predict_proba and predict tests regarding string / categorical targets to test_pipelines.py #972
Fixed dependency update bot by updating python version to 3.7 to avoid frequent github version updates #1002
get_estimators has been moved to evalml.pipelines.components.utils (previously was under evalml.pipelines.utils) #934
Removed the raise_errors flag in AutoML search. All errors during pipeline evaluation will be caught and logged. #936
evalml.model_families.list_model_families has been moved to evalml.pipelines.components.allowed_model_families #959
TextFeaturizer: the featuretools and nlp_primitives packages must be installed after installing evalml in order to use this component #976
Added NoVarianceDataCheck to DefaultDataChecks #893
Added text processing and featurization component TextFeaturizer #913, #924
Added additional checks to InvalidTargetDataCheck to handle invalid target data types #929
AutoMLSearch will now handle KeyboardInterrupt and prompt user for confirmation #915
Made AutoML results a read-only property #919
Deleted static pipelines and refactored tests involving static pipelines, removed all_pipelines() and get_pipelines() #904
Moved list_model_families to evalml.model_family.utils #903
Updated all_pipelines, all_estimators, all_components to use the same mechanism for dynamically generating their elements #898
Renamed master branch to main #918
Added a PyPI release GitHub Action #923
Updated AutoMLSearch.search stdout output and logging and removed tqdm progress bar #921
Moved automl config checks previously in search() to init #933
Reorganized and rewrote documentation #937
Updated to use pydata sphinx theme #937
Updated docs to use release_notes instead of changelog #942
Cleaned up fixture names and usages in tests #895
list_model_families has been moved to evalml.model_family.utils (previously was under evalml.pipelines.utils) #903
Static pipeline definitions have been removed, but similar pipelines can still be constructed via creating an instance of PipelineBase #904
all_pipelines() and get_pipelines() utility methods have been removed #904
Added multiclass support for ROC curve graphing #832
Added preprocessing component to drop features whose percentage of NaN values exceeds a specified threshold #834
Added data check to check for problematic target labels #814
Added PerColumnImputer that allows imputation strategies per column #824
Added transformer to drop specific columns #827
Added support for categories, handle_error, and drop parameters in OneHotEncoder #830 #897
Added preprocessing component to handle DateTime columns featurization #838
Added ability to clone pipelines and components #842
Defined a getter method for component parameters #847
Added utility methods to calculate and graph permutation importances #860, #880
Added new utility functions necessary for generating dynamic preprocessing pipelines #852
Added kwargs to all components #863
Updated AutoSearchBase to use dynamically generated preprocessing pipelines #870
Added SelectColumns transformer #873
Added ability to evaluate additional pipelines for automl search #874
Added default_parameters class property to components and pipelines #879
Added better support for disabling data checks in automl search #892
Added ability to save and load AutoML objects to file #888
Updated AutoSearchBase.get_pipelines to return an untrained pipeline instance #876
Saved learned binary classification thresholds in automl results cv data dict #876
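The permutation importance utilities (#860, #880) rest on a simple idea: shuffle one feature column, re-score, and record how much the score drops. A plain-Python sketch, assuming a scoring function where greater is better; predict, score and the function signature are hypothetical and not evalml’s API.

```python
import random
from typing import Callable, List, Sequence


def permutation_importance(
    predict: Callable[[List[List[float]]], List[float]],
    X: List[List[float]],
    y: Sequence[float],
    score: Callable[[Sequence[float], List[float]], float],
    n_repeats: int = 5,
    random_seed: int = 0,
) -> List[float]:
    """Mean drop in score when each feature column is shuffled (greater score = better)."""
    rng = random.Random(random_seed)
    baseline = score(y, predict(X))
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Rebuild each row with only feature j replaced by its shuffled value.
            X_permuted = [row[:j] + [value] + row[j + 1:] for row, value in zip(X, column)]
            drops.append(baseline - score(y, predict(X_permuted)))
        importances.append(sum(drops) / n_repeats)
    return importances


def predict(rows: List[List[float]]) -> List[float]:
    # Toy model: the prediction depends only on feature 0.
    return [2 * row[0] for row in rows]


def neg_mse(y_true: Sequence[float], y_pred: List[float]) -> float:
    return -sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


X = [[float(i), float(i % 3)] for i in range(10)]
y = [2.0 * i for i in range(10)]
importances = permutation_importance(predict, X, y, neg_mse)
print(importances[1])  # 0.0: shuffling the unused feature never changes the predictions
```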
Fixed bug where SimpleImputer cannot handle dropped columns #846
Fixed bug where PerColumnImputer cannot handle dropped columns #855
Enforced the requirement that built-in components save all input values in their parameters dict #847
Base classes are no longer listed in all_components output #847
Standardized all components to output pandas data structures and to accept either pandas or numpy input #853
Fixed rankings and full_rankings error when search has not been run #894
Updated all_pipelines and all_components to try initializing each pipeline/component and to exclude any that fail #849
Refactored handle_components into handle_components_class, standardizing to a ComponentBase subclass instead of an instance #850
Refactored “blacklist”/“whitelist” to “allow”/“exclude” lists #854
Replaced AutoClassificationSearch and AutoRegressionSearch with AutoMLSearch #871
Renamed feature_importances and permutation_importances methods to use singular names (feature_importance and permutation_importance) #883
Updated automl default data splitter to train/validation split for large datasets #877
Added an open source license and updated some repo metadata #887
Removed dead code in _get_preprocessing_components #896
Fixed some typos and updated the EvalML logo #872
Updated the changelog check job to expect the new branching pattern for the deps update bot #836
Checked that all components output pandas data structures and can accept either pandas or numpy input #853
Pipelines’ static component_graph field must contain either ComponentBase subclasses or str, instead of ComponentBase subclass instances #850
Renamed handle_component to handle_component_class; it now standardizes to ComponentBase subclasses instead of ComponentBase subclass instances #850
Renamed automl’s cv argument to data_split #877
Pipelines’ and classifiers’ feature_importances is renamed feature_importance, graph_feature_importances is renamed graph_feature_importance #883
Passing data_checks=None to automl search will not perform any data checks as opposed to default checks. #892
Pipelines to search for in AutoML are now determined automatically, rather than using the statically-defined pipeline classes. #870
Updated AutoSearchBase.get_pipelines to return an untrained pipeline instance, instead of one which happened to be trained on the final cross-validation fold #876
Added baseline models for classification and regression, add functionality to calculate baseline models before searching in AutoML #746
Ported the highly-null guardrail over as a data check and defined DefaultDataChecks and DisableDataChecks classes #745
Updated Tuner classes to work directly with pipeline parameters dicts instead of flat parameter lists #779
Added Elastic Net as a pipeline option #812
Added new Pipeline option ExtraTrees #790
Added precision-recall curve metrics and plot for binary classification problems in evalml.pipeline.graph_utils #794
Updated the default automl algorithm to search in batches, starting with default parameters for each pipeline and iterating from there #793
Added AutoMLAlgorithm class and IterativeAlgorithm implementation, separated from AutoSearchBase #793
Updated pipeline score to return a nan score for any objective which throws an exception during scoring #787
Fixed bug introduced in #787 where binary classification metrics requiring predicted probabilities raised an error during scoring #798
CatBoost and XGBoost classifiers and regressors can no longer have a learning rate of 0 #795
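The change in #787 keeps one failing objective from aborting the whole scoring call. A minimal sketch of that behavior; score_all and both objective functions are hypothetical, and a real implementation would likely also log the exception.

```python
import math
from typing import Callable, Dict, Sequence

Objective = Callable[[Sequence[int], Sequence[int]], float]


def score_all(
    y_true: Sequence[int], y_pred: Sequence[int], objectives: Dict[str, Objective]
) -> Dict[str, float]:
    """Score predictions with each objective; one that raises gets nan instead of aborting."""
    scores = {}
    for name, objective in objectives.items():
        try:
            scores[name] = objective(y_true, y_pred)
        except Exception:
            scores[name] = math.nan
    return scores


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    return sum(a == b for a, b in zip(y_true, y_pred)) / len(y_true)


def needs_probabilities(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    # Hypothetical objective that fails when given hard labels instead of probabilities.
    raise ValueError("this objective requires predicted probabilities")


result = score_all(
    [1, 0, 1, 1], [1, 0, 0, 1], {"accuracy": accuracy, "log loss": needs_probabilities}
)
print(result)  # {'accuracy': 0.75, 'log loss': nan}
```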
Cleaned up pipeline score code and codecov #711
Removed pass from abstract methods for codecov #730
Added __str__ for AutoSearch object #675
Added util methods to graph ROC and confusion matrix #720
Refactored AutoBase to AutoSearchBase #758
Updated AutoBase with data_checks parameter, removed previous detect_label_leakage parameter, and added functionality to run data checks before search in AutoML #765
Updated our logger to use Python’s logging utils #763
Refactored most of the AutoSearchBase._do_iteration implementation into AutoSearchBase._evaluate #762
Ported all guardrails to use the new DataCheck API #789
Expanded import_or_raise to catch all exceptions #759
Added RMSE, MSLE, RMSLE as standard metrics #788
Disallowed Recall as an objective for AutoML #784
Removed feature selection from pipelines #819
Updated default estimator parameters to make automl search faster and more accurate #793
Added instructions to freeze master on release.md #726
Updated release instructions with more details #727 #733
Added objective base classes to API reference #736
Fixed components API to match other modules #747
Deleted codecov yml in favor of codecov.io’s default #732
Added unit tests for fraud cost, lead scoring, and standard metric objectives #741
Updated codecov client #782
Updated AutoBase __str__ test to include no parameters case #783
Added unit tests for ExtraTrees pipeline #790
The build now fails if codecov fails to upload #810
Updated Python version of dependency action #816
Updated the dependency update bot to use a suffix when creating branches #817
The detect_label_leakage parameter for AutoML classes has been removed and replaced by a data_checks parameter #765
Moved ROC and confusion matrix methods from evalml.pipeline.plot_utils to evalml.pipeline.graph_utils #720
Tuner classes require a pipeline hyperparameter range dict as an init arg instead of a space definition #779
Tuner.propose and Tuner.add work directly with pipeline parameters dicts instead of flat parameter lists #779
PipelineBase.hyperparameters and custom_hyperparameters use pipeline parameters dict format instead of being represented as a flat list #779
All guardrail functions previously under evalml.guardrails.utils will be removed and replaced by data checks #789
Recall disallowed as an objective for AutoML #784
AutoSearchBase parameter tuner has been renamed to tuner_class #793
AutoSearchBase parameter possible_pipelines and possible_model_families have been renamed to allowed_pipelines and allowed_model_families #793
Added Accuracy as a standard objective #624
Added verbose parameter to load_fraud #560
Added Balanced Accuracy metric for binary, multiclass #612 #661
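Balanced accuracy is the average of per-class recall, which is why one definition covers both the binary and multiclass cases. A minimal NumPy sketch; the function name is illustrative and not EvalML's API.

```python
import numpy as np


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so every class counts equally regardless of its size."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    # Recall for each class: fraction of that class's rows predicted as that class.
    recalls = [np.mean(y_pred[y_true == label] == label) for label in np.unique(y_true)]
    return float(np.mean(recalls))


# Plain accuracy here is 0.75, but the minority class is never predicted.
print(balanced_accuracy([0, 0, 0, 1], [0, 0, 0, 0]))  # 0.5
```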
Added XGBoost regressor and XGBoost regression pipeline #666
Added Accuracy metric for multiclass #672
Added objective name in AutoBase.describe_pipeline #686
Added DataCheck, DataChecks, and Message classes and relevant subclasses #739
Removed direct access to cls.component_graph #595
Add testing files to .gitignore #625
Remove circular dependencies from Makefile #637
Add error case for normalize_confusion_matrix() #640
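Normalizing a confusion matrix by true label divides each row by its sum, so a row of zeros (a class absent from the true labels) has no valid normalization. A sketch of that error case, assuming row-wise normalization; normalize_rows is a hypothetical name, and EvalML's normalize_confusion_matrix may support other modes and raise with different text.

```python
import numpy as np


def normalize_rows(conf_mat):
    """Normalize a confusion matrix so each row (true class) sums to 1."""
    conf_mat = np.asarray(conf_mat, dtype=float)
    row_sums = conf_mat.sum(axis=1, keepdims=True)
    if np.any(row_sums == 0):
        # Dividing by a zero row sum would silently produce NaN, so fail loudly instead.
        raise ValueError("Sum of a row is zero; cannot normalize by true label.")
    return conf_mat / row_sums


print(normalize_rows([[2, 2], [1, 3]]))  # rows become [0.5, 0.5] and [0.25, 0.75]
```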
Fixed XGBoostClassifier and XGBoostRegressor bug with feature names that contain [, ], or < #659
Update make_pipeline_graph to not accidentally create empty file when testing if path is valid #649
Fix pip installation warning about docutils version, from boto dependency #664
Removed zero division warning for F1/precision/recall metrics #671
Fixed summary for pipelines without estimators #707
Updated default objective for binary/multiclass classification to log loss #613
Created classification and regression pipeline subclasses and removed objective as an attribute of pipeline classes #405
Changed the output of score to return one dictionary #429
Created binary and multiclass objective subclasses #504
Updated objectives API #445
Removed call to get_plot_data from AutoML #615
Set raise_error to default to True for AutoML classes #638
Remove unnecessary “u” prefixes on some unicode strings #641
Changed one-hot encoder to return uint8 dtypes instead of ints #653
Pipeline _name field changed to custom_name #650
Removed graphs.py and moved methods into PipelineBase #657, #665
Remove s3fs as a dev dependency #664
Changed requirements-parser to be a core dependency #673
Replace supported_problem_types field on pipelines with problem_type attribute on base classes #678
Changed AutoML to only show best results for a given pipeline template in rankings, added full_rankings property to show all #682
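Keeping only the best result per pipeline template, while retaining every evaluation in a separate table, can be sketched with pandas. The column names pipeline_name and score and the sample values are hypothetical; EvalML's actual rankings tables carry more columns.

```python
import pandas as pd

# Hypothetical search results: one row per evaluated pipeline.
full_rankings = pd.DataFrame(
    {
        "pipeline_name": ["Random Forest", "Random Forest", "Logistic Regression"],
        "score": [0.81, 0.86, 0.79],
    }
)


def best_per_pipeline(full, higher_is_better=True):
    """Sort best-first, then keep the first (best) row for each pipeline name."""
    ordered = full.sort_values("score", ascending=not higher_is_better)
    return ordered.drop_duplicates(subset="pipeline_name", keep="first").reset_index(drop=True)


rankings = best_per_pipeline(full_rankings)
print(rankings)
```

For objectives where lower is better, such as log loss, pass higher_is_better=False so the sort direction flips.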
Update ModelFamily values: don’t list xgboost/catboost as classifiers now that we have regression pipelines for them #677
Changed AutoML’s describe_pipeline to get problem type from pipeline instead #685
Standardize import_or_raise error messages #683
Updated argument order of objectives to align with sklearn’s #698
Renamed pipeline.feature_importance_graph to pipeline.graph_feature_importances #700
Moved ROC and confusion matrix methods to evalml.pipelines.plot_utils #704
Renamed MultiClassificationObjective to MulticlassClassificationObjective, to align with pipeline naming scheme #715
Fixed some sphinx warnings #593
Fixed docstring for AutoClassificationSearch with correct command #599
Limit readthedocs formats to pdf, not htmlzip and epub #594 #600
Clean up objectives API documentation #605
Fixed function on Exploring search results page #604
Update release process doc #567
AutoClassificationSearch and AutoRegressionSearch show inherited methods in API reference #651
Fixed improperly formatted code in breaking changes for changelog #655
Added configuration to treat Sphinx warnings as errors #660
Removed separate plotting section for pipelines in API reference #657, #665
Have leads example notebook load S3 files using https, so we can delete s3fs dev dependency #664
Categorized components in API reference and added descriptions for each category #663
Fixed Sphinx warnings about BalancedAccuracy objective #669
Updated API reference to include missing components and clean up pipeline docstrings #689
Reorganize API ref, and clarify pipeline sub-titles #688
Add and update preprocessing utils in API reference #687
Added inheritance diagrams to API reference #695
Documented which default objective AutoML optimizes for #699
Create separate install page #701
Include more utils in API ref, like import_or_raise #704
Add more color to pipeline documentation #705
Matched install commands of check_latest_dependencies test and its GitHub action #578
Added Github app to auto assign PR author as assignee #477
Removed unneeded conda installation of xgboost in windows checkin tests #618
Update graph tests to always use tmpfile dir #649
Changelog checkin test workaround for release PRs: If ‘future release’ section is empty of PR refs, pass check #658
Add changelog checkin test exception for dep-update branch #723
Breaking Changes
Pipelines will now no longer take an objective parameter during instantiation, and will no longer have an objective attribute.
fit() and predict() now use an optional objective parameter, which is only used in binary classification pipelines to fit for a specific objective.
score() will now use a required objectives parameter that is used to determine all the objectives to score on. This differs from the previous behavior, where the pipeline’s objective was scored on regardless.
score() will now return one dictionary of all objective scores.
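The new contract (the caller names every objective to score, and gets back one dictionary keyed by objective) can be sketched without EvalML. The plain name-to-function mapping below is a simplification for illustration, not the library's objective API.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def error_rate(y_true, y_pred):
    """Fraction of predictions that do not match the true labels."""
    return 1.0 - accuracy(y_true, y_pred)


def score(y_true, y_pred, objectives):
    """Score predictions on every requested objective and return one flat dict."""
    return {name: fn(y_true, y_pred) for name, fn in objectives.items()}


scores = score([1, 0, 1, 1], [1, 0, 0, 1], {"accuracy": accuracy, "error_rate": error_rate})
print(scores)  # {'accuracy': 0.75, 'error_rate': 0.25}
```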
ROC and ConfusionMatrix plot methods via Auto(*).plot have been removed by #615 and are replaced by roc_curve and confusion_matrix in evalml.pipelines.plot_utils in #704
normalize_confusion_matrix has been moved to evalml.pipelines.plot_utils #704
Pipelines’ _name field has been changed to custom_name
Pipelines’ supported_problem_types field has been removed because it is no longer necessary #678
Updated argument order of objectives’ objective_function to align with sklearn #698
pipeline.feature_importance_graph has been renamed to pipeline.graph_feature_importances in #700
Removed unsupported MSLE objective #704
Add normalization option and information to confusion matrix #484
Add util function to drop rows with NaN values #487
Renamed PipelineBase.name as PipelineBase.summary and redefined PipelineBase.name as class property #491
Added access to parameters in Pipelines with PipelineBase.parameters (used to be return of PipelineBase.describe) #501
Added fill_value parameter for SimpleImputer #509
Added functionality to override component hyperparameters and made pipelines take hyperparameters from components #516
Allow numpy.random.RandomState for random_state parameters #556
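Accepting either an int seed or a numpy.random.RandomState for random_state usually means normalizing the argument once, up front. A sketch with a hypothetical helper name; scikit-learn ships a comparable utility, sklearn.utils.check_random_state.

```python
import numbers

import numpy as np


def get_random_state(seed):
    """Return a numpy RandomState from None, an int seed, or an existing RandomState."""
    if seed is None:
        return np.random.RandomState()
    if isinstance(seed, numbers.Integral):
        return np.random.RandomState(seed)
    if isinstance(seed, np.random.RandomState):
        # Reuse the caller's generator so its state advances as they expect.
        return seed
    raise ValueError(f"Cannot use {seed!r} as a random_state")
```

Passing the same RandomState object through several components shares one stream of randomness, whereas passing an int gives each component its own reproducible stream.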
Removed unused dependency matplotlib, and moved category_encoders to test reqs #572
Undo version cap in XGBoost placed in #402 and allowed all releases of XGBoost #407
Support pandas 1.0.0 #486
Made all references to the logger static #503
Refactored model_type parameter for components and pipelines to model_family #507
Refactored problem_types for pipelines and components into supported_problem_types #515
Moved pipelines/utils.save_pipeline and pipelines/utils.load_pipeline to PipelineBase.save and PipelineBase.load #526
Limit number of categories encoded by OneHotEncoder #517
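Limiting the encoded categories typically means keeping only the most frequent ones and dropping the rest. A pandas sketch that also emits uint8 indicator columns; one_hot_top_n and its column-naming scheme are hypothetical, and the tie-breaking between equally frequent categories is left to pandas.

```python
import numpy as np
import pandas as pd


def one_hot_top_n(series: pd.Series, top_n: int = 10) -> pd.DataFrame:
    """One-hot encode only the top_n most frequent categories, as uint8 columns."""
    # value_counts sorts by frequency, most common first.
    top = series.value_counts().head(top_n).index
    columns = {f"{series.name}_{cat}": (series == cat).astype(np.uint8) for cat in top}
    return pd.DataFrame(columns, index=series.index)


s = pd.Series(["a", "a", "b", "c", "a", "b"], name="col")
print(one_hot_top_n(s, top_n=2))  # columns col_a and col_b; "c" is not encoded
```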
Updated API reference to remove PipelinePlot and add the moved PipelineBase plotting methods #483
Add code style and github issue guides #463 #512
Updated API reference to surface class variables for pipelines and components #537
Fixed README documentation link #535
Unhid PR references in changelog #656
Added automated dependency check PR #482, #505
Updated automated dependency check comment #497
Have build_docs job use python executor, so that env vars are set properly #547
Added simple test to make sure OneHotEncoder’s top_n works with large number of categories #552
Run windows unit tests on PRs #557
AutoClassificationSearch and AutoRegressionSearch’s model_types parameter has been refactored into allowed_model_families
ModelTypes enum has been changed to ModelFamily
Components and Pipelines now have a model_family field instead of model_type
get_pipelines utility function now accepts model_families as an argument instead of model_types
PipelineBase.name no longer returns structure of pipeline and has been replaced by PipelineBase.summary
PipelineBase.problem_types and Estimator.problem_types have been renamed to supported_problem_types
pipelines/utils.save_pipeline and pipelines/utils.load_pipeline moved to PipelineBase.save and PipelineBase.load
Added emacs buffers to .gitignore #350
Add CatBoost (gradient-boosted trees) classification and regression components and pipelines #247
Added Tuner abstract base class #351
Added n_jobs as parameter for AutoClassificationSearch and AutoRegressionSearch #403
Changed colors of confusion matrix to shades of blue and updated axis order to match scikit-learn’s #426
Added PipelineBase .graph and .feature_importance_graph methods, moved from previous location #423
Added support for python 3.8 #462
Fixed ROC and confusion matrix plots not being calculated if user passed own additional_objectives #276
Fixed ReadtheDocs FileNotFoundError exception for fraud dataset #439
Added n_estimators as a tunable parameter for XGBoost #307
Remove unused parameter ObjectiveBase.fit_needs_proba #320
Remove extraneous parameter component_type from all components #361
Remove unused rankings.csv file #397
Downloaded demo and test datasets so unit tests can run offline #408
Remove _needs_fitting attribute from Components #398
Changed plot.feature_importance to show only non-zero feature importances by default, added optional parameter to show all #413
Refactored PipelineBase to take in parameter dictionary and moved pipeline metadata to class attribute #421
Dropped support for Python 3.5 #438
Removed unused apply.py file #449
Clean up requirements.txt to remove unused deps #451
Support installation without all required dependencies #459
Update release.md with instructions to release to internal license key #354
Added tests for utils (and moved current utils to gen_utils) #297
Moved XGBoost install into its own separate step on Windows using Conda #313
Rewind pandas version to before 1.0.0, to diagnose test failures for that version #325
Added dependency update checkin test #324
Rewind XGBoost version to before 1.0.0 to diagnose test failures for that version #402
Update dependency check to use a whitelist #417
Update unit test jobs to not install dev deps #455
Python 3.5 will not be actively supported.
Added ability to create a plot of feature importances #133
Add early stopping to AutoML using patience and tolerance parameters #241
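Patience-based early stopping halts the search once a number of consecutive iterations fail to improve on the best score. The changelog does not spell out the exact rule, so this sketch assumes tolerance is a relative fraction of the best score's magnitude; should_stop is a hypothetical name, not EvalML's implementation.

```python
def should_stop(scores, patience, tolerance=0.0, higher_is_better=True):
    """Return True once `patience` consecutive scores fail to beat the running best.

    A score counts as an improvement only if it beats the best by more than
    tolerance * |best| (a relative-tolerance assumption).
    """
    best = None
    without_improvement = 0
    for score in scores:
        if best is None:
            best = score
            continue
        delta = (score - best) if higher_is_better else (best - score)
        if delta > tolerance * abs(best):
            best = score
            without_improvement = 0
        else:
            without_improvement += 1
            if without_improvement >= patience:
                return True
    return False


print(should_stop([0.5, 0.6, 0.6, 0.6], patience=2))  # True
```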
Added ROC and confusion matrix metrics and plot for classification problems and introduced the PipelineSearchPlots class #242
Enhanced AutoML results with search order #260
Added utility function to show system and environment information #300
Lower botocore requirement #235
Fixed decision_function calculation for FraudCost objective #254
Fixed return value of Recall metrics #264
Components return self on fit #289
Renamed automl classes to AutoRegressionSearch and AutoClassificationSearch #287
Updating demo datasets to retain column names #223
Moving pipeline visualization to PipelinePlot class #228
Standardizing inputs as pd.DataFrame / pd.Series #130
Enforcing that pipelines must have an estimator as last component #277
Added ipywidgets as a dependency in requirements.txt #278
Added Random and Grid Search Tuners #240
Adding class properties to API reference #244
Fix and filter FutureWarnings from scikit-learn #249, #257
Adding Linear Regression to API reference and cleaning up some Sphinx warnings #227
Added support for testing on Windows with CircleCI #226
Added support for doctests #233
The fit() method for AutoClassifier and AutoRegressor has been renamed to search().
AutoClassifier has been renamed to AutoClassificationSearch
AutoRegressor has been renamed to AutoRegressionSearch
AutoClassificationSearch.results and AutoRegressionSearch.results are now dictionaries with pipeline_results and search_order keys. pipeline_results gives access to a dictionary identical to the old .results dictionary, while search_order returns a list of the search order in terms of pipeline_id.
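Walking the results in evaluation order means looking up each pipeline_id from search_order in pipeline_results. A sketch with hypothetical contents; the per-pipeline fields shown (pipeline_name, score) are placeholders, and real entries hold more information.

```python
# Hypothetical contents of the new results dictionary.
results = {
    "pipeline_results": {
        0: {"pipeline_name": "Logistic Regression", "score": 0.79},
        1: {"pipeline_name": "Random Forest", "score": 0.86},
        2: {"pipeline_name": "XGBoost", "score": 0.84},
    },
    "search_order": [0, 1, 2],
}


def results_in_search_order(results):
    """Yield (pipeline_id, result) pairs in the order the pipelines were evaluated."""
    for pipeline_id in results["search_order"]:
        yield pipeline_id, results["pipeline_results"][pipeline_id]


for pipeline_id, result in results_in_search_order(results):
    print(pipeline_id, result["pipeline_name"], result["score"])
```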
Pipelines now require an estimator as the last component in component_list. Slicing pipelines now raises a NotImplementedError to avoid returning pipelines without an estimator.
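Both rules can be enforced in a small container class: validate the last component at construction and refuse slices in __getitem__. A sketch with stand-in classes; the names and the choice of ValueError for a missing estimator are assumptions, only the NotImplementedError on slicing comes from the entry above.

```python
class Transformer:
    """Stand-in base class for preprocessing steps."""


class Estimator:
    """Stand-in base class for the final modeling step."""


class Pipeline:
    def __init__(self, component_list):
        if not component_list or not isinstance(component_list[-1], Estimator):
            raise ValueError("A pipeline must end with an estimator")
        self.component_list = list(component_list)

    def __getitem__(self, index):
        if isinstance(index, slice):
            # A slice could drop the estimator, so slicing is refused outright.
            raise NotImplementedError("Slicing pipelines is not supported")
        # Integer indexing still returns a single component.
        return self.component_list[index]
```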
Adding basic pipeline structure visualization #211
Added notebooks to build process #212
Added basic outlier detection guardrail #151
Added basic ID column guardrail #135
Added support for unlimited pipelines with a max_time limit #70
Updated .readthedocs.yaml to successfully build #188
Removed MSLE from default additional objectives #203
Fixed random_state passed in pipelines #204
Fixed slow down in RFRegressor #206
Pulled information for describe_pipeline from pipeline’s new describe method #190
Refactored pipelines #108
Removed guardrails from Auto(*) #202, #208
Updated documentation to show max_time enhancements #189
Updated release instructions for RTD #193
Added contributing instructions #213
Added new content #222
Added basic one hot encoding #73
Use enums for model_type #110
Support for splitting regression datasets #112
Auto-infer multiclass classification #99
Added support for other units in max_time #125
Detect highly null columns #121
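Detecting highly null columns amounts to comparing each column's fraction of missing values against a threshold. A pandas sketch; the function name and the 0.95 default are assumptions, not EvalML's values.

```python
import pandas as pd


def highly_null_columns(df: pd.DataFrame, pct_null_threshold: float = 0.95):
    """Return names of columns whose fraction of missing values is >= pct_null_threshold."""
    # isnull() gives booleans; their column mean is the fraction of missing values.
    null_fraction = df.isnull().mean()
    return list(null_fraction[null_fraction >= pct_null_threshold].index)


df = pd.DataFrame({"a": [1, None, None, None], "b": [1, 2, 3, None]})
print(highly_null_columns(df, pct_null_threshold=0.5))  # ['a']
```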
Added additional regression objectives #100
Show an interactive iteration vs. score plot when using fit() #134
Reordered describe_pipeline #94
Added type check for model_type #109
Fixed s units when setting string max_time #132
Fix objectives not appearing in API documentation #150
Reorganized tests #93
Moved logging to its own module #119
Show progress bar history #111
Using cloudpickle instead of pickle to allow unloading of custom objectives #113
Removed render.py #154
Update release instructions #140
Include additional_objectives parameter #124
Added Changelog #136
Code coverage #90
Added CircleCI tests for other Python versions #104
Added doc notebooks as tests #139
Test metadata for CircleCI and 2 core parallelism #137
Added AutoML for classification and regression using AutoBase and Skopt #7 #9
Implemented standard classification and regression metrics #7
Added logistic regression, random forest, and XGBoost pipelines #7
Implemented support for custom objectives #15
Feature importance for pipelines #18
Serialization for pipelines #19
Allow fitting on objectives for optimal threshold #27
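Fitting a binary classifier for an objective comes down to choosing the probability cutoff that scores best on that objective. A grid-search sketch using F1 as the example objective; the function names and the 101-point grid are assumptions, and EvalML may use a different optimizer.

```python
import numpy as np


def f1(y_true, y_pred):
    """F1 score for binary 0/1 labels, returning 0.0 when undefined."""
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    denom = 2 * tp + fp + fn
    return 0.0 if denom == 0 else 2 * tp / denom


def optimize_threshold(y_true, proba, objective=f1, candidates=None):
    """Pick the probability cutoff that maximizes `objective` on labelled data."""
    y_true = np.asarray(y_true)
    proba = np.asarray(proba, dtype=float)
    if candidates is None:
        candidates = np.linspace(0.0, 1.0, 101)
    best_threshold, best_score = 0.5, -np.inf
    for threshold in candidates:
        score = objective(y_true, (proba >= threshold).astype(int))
        if score > best_score:  # strict ">" keeps the first best threshold
            best_threshold, best_score = float(threshold), score
    return best_threshold, best_score


print(optimize_threshold([0, 0, 1, 1], [0.1, 0.25, 0.6, 0.8], candidates=[0.2, 0.5, 0.9]))
```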
Added detect label leakage #31
Implemented callbacks #42
Allow for multiclass classification #21
Added support for additional objectives #79
Fixed feature selection in pipelines #13
Made random_seed usage consistent #45
Added docstrings #6
Created notebooks for docs #6
Initialized readthedocs EvalML #6
Added favicon #38
Added testing for loading data #39
Created fraud detection objective #4
First Release
Added lead scoring objective #1
Added basic classifier #1
Initialized Sphinx for docs #1