Release Notes
- Future Releases
Enhancements
Fixes
Changes
Documentation Changes
Testing Changes
Warning
Breaking Changes
- v0.84.0 Jun 6, 2024
- v0.83.0 Feb 2, 2024
Warning
Breaking Changes
- v0.82.0 Nov 3, 2023
Warning
Breaking Changes
- v0.81.1 Oct 16, 2023
Warning
Breaking Changes
- v0.81.0 Oct 5, 2023
- Enhancements
Extended STLDecomposer to support multiseries #4253
Extended TimeSeriesImputer to support multiseries #4291
Added datacheck to check for mismatched series length in multiseries #4296
Added STLDecomposer to multiseries pipelines #4299
Extended DateTimeFormatCheck data check to support multiseries #4300
Extended TimeSeriesRegularizer to support multiseries #4303
- Documentation Changes
Removed LightGBM’s excessive number of warnings #4308
- Testing Changes
Removed old performance testing workflow #4318
Warning
Breaking Changes
- v0.80.0 Aug. 30, 2023
Warning
Breaking Changes
- v0.79.0 Aug. 11, 2023
- Enhancements
Updated regression metrics to handle multioutput dataframes as well as single output series #4233
Added baseline regressor for multiseries time series problems #4246
Added stacking and unstacking utility functions to work with multiseries data #4250
Added multiseries regression pipeline class #4256
Added multiseries VARMAX regressor #4238
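The stacking and unstacking utilities added in v0.79.0 reshape multiseries data between a wide layout (one column per series) and a long layout (one row per series per time step). As a rough illustration only, the plain-Python sketch below performs that round trip. The functions stack and unstack and the dict-of-lists input are hypothetical and do not reproduce EvalML's own API.

```python
from collections import defaultdict

def stack(wide):
    """Turn {series_id: [y_0, y_1, ...]} into long rows of (time_step, series_id, y)."""
    rows = [
        (t, series_id, y)
        for series_id, values in wide.items()
        for t, y in enumerate(values)
    ]
    # Order by time step, then series, so each step's rows sit next to each other.
    rows.sort(key=lambda row: (row[0], row[1]))
    return rows

def unstack(rows):
    """Invert stack(): rebuild {series_id: [y_0, y_1, ...]} from long rows."""
    by_series = defaultdict(dict)
    for t, series_id, y in rows:
        by_series[series_id][t] = y
    return {sid: [steps[t] for t in sorted(steps)] for sid, steps in by_series.items()}

wide = {"store_a": [10, 12, 15], "store_b": [7, 8, 9]}
long_rows = stack(wide)
print(long_rows[:2])  # [(0, 'store_a', 10), (0, 'store_b', 7)]
assert unstack(long_rows) == wide
```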
Documentation Changes
Testing Changes
Warning
Breaking Changes
- v0.78.0 Jul. 10, 2023
Warning
- v0.77.0 Jun. 07, 2023
- Enhancements
Added check_distribution function for determining if the predicted distribution matches the true one #4184
Added get_recommendation_score_breakdown function for insight on the recommendation score #4188
Added excluded_model_families parameter to AutoMLSearch() #4196
Added option to exclude time index in IDColumnsDataCheck #4194
Changes
Documentation Changes
- Testing Changes
Run looking glass performance tests on merge via Airflow #4198
- v0.76.0 May. 09, 2023
- v0.75.0 May. 01, 2023
- Fixes
Fixed bug where resetting the holdout data indices would cause time series predict_in_sample to be wrong #4161
- v0.74.0 Apr. 18, 2023
- Changes
Capped size of seasonal period used for determining whether to include STLDecomposer in pipelines #4147
- v0.73.0 Apr. 10, 2023
- Changes
Removed unnecessary logic from imputer components prior to nullable type handling #4038, #4043
Added calls to _handle_nullable_types in component fit, transform, and predict methods when needed #4046, #4043
Removed existing nullable type handling across AutoMLSearch to use only the new handling #4085, #4043
Handled nullable type incompatibility in Decomposer #4105, #4043
Removed nullable type incompatibility handling for ARIMA and ExponentialSmoothingRegressor #4129
Changed the default value for null_strategy in InvalidTargetDataCheck to drop #4131
Pinned sktime version to 0.17.0 for nullable types support #4137
- Testing Changes
Fixed installation of prophet for linux nightly tests #4114
- v0.72.0 Mar. 27, 2023
- v0.71.0 Mar. 17, 2023
- Fixes
Fixed error in PipelineBase._supports_fast_permutation_importance with stacked ensemble pipelines #4083
- v0.70.0 Mar. 16, 2023
- v0.69.0 Mar. 15, 2023
- Enhancements
Move black to regular dependency and use it for generate_pipeline_code #4005
Implement generate_pipeline_example #4023
Add new downcast utils for component-specific nullable type handling and begin implementation on objective and component base classes #4024
Add nullable type incompatibility properties to the components that need them #4031
Add get_evalml_requirements_file #4034
Pipelines with DFS Transformers will run fast permutation importance if DFS features pre-exist #4037
Add get_prediction_intervals() at the pipeline level #4052
- Changes
Uncapped pmdarima and updated minimum version #4027
Increase min catboost to 1.1.1 and xgboost to 1.7.0 to add nullable type support for those estimators #3996
Unpinned networkx and updated minimum version #4035
Increased scikit-learn version to 1.2.2 #4064
Capped max holidays version to 0.21 #4064
Stop allowing knn as a boolean impute strategy #4058
Capped nbsphinx at < 0.9.0 #4071
- v0.68.0 Feb. 15, 2023
- v0.67.0 Jan. 31, 2023
- v0.66.1 Jan. 26, 2023
- v0.66.0 Jan. 24, 2023
- Enhancements
Improved decomposer determine_periodicity functionality for better period guesses #3912
Added dates_needed_for_prediction for time series pipelines #3906
Added RFClassifierRFESelector and RFRegressorRFESelector components for feature selection using recursive feature elimination #3934
Added dates_needed_for_prediction_range for time series pipelines #3941
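The determine_periodicity improvement concerns guessing a seasonal period from the data. One common heuristic, shown here as a hedged sketch rather than EvalML's actual algorithm, is to pick the lag with the highest sample autocorrelation. The function guess_period and its max_lag argument are hypothetical names.

```python
def guess_period(values, max_lag):
    """Return the lag in [2, max_lag] with the highest sample autocorrelation."""
    n = len(values)
    mean = sum(values) / n
    centered = [v - mean for v in values]
    denom = sum(c * c for c in centered)
    if denom == 0:
        return None  # a constant series has no detectable period
    best_lag, best_acf = None, float("-inf")
    # Lag 1 is skipped because smooth series are almost always highly correlated at lag 1.
    for lag in range(2, min(max_lag, n - 1) + 1):
        acf = sum(centered[i] * centered[i - lag] for i in range(lag, n)) / denom
        if acf > best_acf:
            best_lag, best_acf = lag, acf
    return best_lag

series = [0, 1, 2, 3] * 6  # repeats every 4 steps
print(guess_period(series, max_lag=6))  # 4
```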
- Fixes
- v0.65.0 Jan. 3, 2023
- Changes
Added a threshold to DateTimeFormatDataCheck to account for too many duplicate or nan values #3883
Changed treatment of Boolean columns for SimpleImputer and ClassImbalanceDataCheck to be compatible with new Woodwork inference #3892
Split decomposer seasonal_period parameter into seasonal_smoother and period parameters #3896
Excluded catboost from the broken link checking workflow due to 403 errors #3899
Pinned scikit-learn version below 1.2.0 #3901
Cast newly created one hot encoded columns as bool dtype #3913
- Documentation Changes
Hid non-essential warning messages in time series docs #3890
Testing Changes
- v0.64.0 Dec. 8, 2022
Enhancements
- Changes
Update leaderboard names to show ranking_score instead of validation_score #3878
Remove Int64Index after Pandas 1.5 Upgrade #3825
Reduced the threshold for setting use_covariates to False for ARIMA models in AutoMLSearch #3868
Pinned woodwork version at <=0.19.0 #3871
Updated minimum Pandas version to 1.5.0 #3808
Removed dsherry from automated dependency update reviews and added tamargrey #3870
Documentation Changes
Testing Changes
- v0.63.0 Nov. 23, 2022
- Fixes
Fixed TimeSeriesFeaturizer potentially selecting lags outside of feature engineering window #3773
Fixed bug where TimeSeriesFeaturizer could not encode Ordinal columns with non numeric categories #3812
Updated demo dataset links to point to new endpoint #3826
Updated STLDecomposer to infer the time index frequency if it’s not present #3829
Updated _drop_time_index to move the time index from X to both X.index and y.index #3829
Fixed bug where engineered features lost their origin attribute in partial dependence, causing it to fail #3830
Fixed bug where partial dependence’s fast mode handling for the DFS Transformer wouldn’t work with multi output features #3830
Allowed target to be present and ignored in partial dependence’s DFS Transformer fast mode handling #3830
- Changes
Consolidated decomposition frequency validation logic to Decomposer class #3811
Removed Featuretools version upper bound and prevented Woodwork 0.20.0 from being installed #3813
Updated min Featuretools version to 0.16.0, min nlp-primitives version to 2.9.0 and min Dask version to 2022.2.0 #3823
Rename issue templates config.yaml to config.yml #3844
Reverted change adding a should_skip_featurization flag to time series pipelines #3862
- v0.62.0 Nov. 01, 2022
- v0.61.1 Oct. 27, 2022
- v0.61.0 Oct. 25, 2022
- v0.60.0 Oct. 19, 2022
Warning
- Breaking Changes
TargetLeakageDataCheck now uses argument mutual_info rather than mutual #3728
- v0.59.0 Sept. 27, 2022
- v0.58.0 Sept. 20, 2022
- Enhancements
Defined get_trend_df() for PolynomialDecomposer to allow decomposition of target data into trend, seasonality and residual. #3720
Updated to run with Woodwork >= 0.18.0 #3700
Pass time index column to time series native estimators but drop otherwise #3691
Added errors attribute to AutoMLSearch for useful debugging #3702
- Changes
Bumped up minimum version of sktime to 0.12.0. #3720
Added abstract Decomposer class as a parent to PolynomialDecomposer to support additional decomposers. #3720
Pinned pmdarima < 2.0.0 #3679
Added support for using downcast_nullable_types with Series as well as DataFrames #3697
Added distinction between ranking and optimization objectives #3721
Documentation Changes
- Testing Changes
- v0.57.0 Sept. 6, 2022
- Enhancements
Added KNNImputer class and created new knn parameter for Imputer #3662
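The KNNImputer added in v0.57.0 fills missing values from similar rows. The sketch below shows the general k-nearest-neighbours idea for a single numeric column in plain Python: each missing value becomes the mean of that column over the k closest complete rows. It is an illustrative assumption, not EvalML's or scikit-learn's implementation; knn_impute, the list-of-dicts input, and Euclidean distance over the remaining columns are all hypothetical choices.

```python
import math

def knn_impute(rows, target, k=2):
    """Fill None in rows[i][target] with the mean target of the k nearest complete rows."""
    features = [key for key in rows[0] if key != target]
    complete = [row for row in rows if row[target] is not None]

    def distance(a, b):
        # Euclidean distance over the other (assumed numeric, non-missing) columns.
        return math.dist([a[f] for f in features], [b[f] for f in features])

    filled = []
    for row in rows:
        if row[target] is not None:
            filled.append(dict(row))
            continue
        neighbours = sorted(complete, key=lambda other: distance(row, other))[:k]
        estimate = sum(n[target] for n in neighbours) / len(neighbours)
        filled.append({**row, target: estimate})
    return filled

rows = [
    {"x": 1.0, "y": 10.0},
    {"x": 2.0, "y": 20.0},
    {"x": 10.0, "y": 100.0},
    {"x": 1.5, "y": None},
]
print(knn_impute(rows, "y", k=2)[-1])  # {'x': 1.5, 'y': 15.0}
```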
- Fixes
IDColumnsDataCheck now only returns an action code to set the first column as the primary key if it contains unique values #3639
IDColumnsDataCheck now can handle primary key columns containing “integer” values that are of the double type #3683
Added support for BooleanNullable columns in EvalML pipelines and imputer #3678
Updated StandardScaler to only apply to numeric columns #3686
- v0.56.1 Aug. 19, 2022
- v0.56.0 Aug. 15, 2022
- Enhancements
Add CI testing environment in Mac for install workflow #3646
Updated make_pipeline to only include the Imputer in pipelines if NaNs exist in the data #3657
Updated to run with Woodwork >= 0.17.2 #3626
Add exclude_featurizers parameter to AutoMLSearch to specify featurizers that should be excluded from all pipelines #3631
Add fit_transform method to pipelines and component graphs #3640
Changed default value of data splitting for time series problem holdout set evaluation #3650
- Fixes
Reverted the Woodwork 0.17.x compatibility work due to performance regression #3664
- v0.55.0 Jul. 24, 2022
- Enhancements
Increased the amount of logical type information passed to Woodwork when calling ww.init() in transformers #3604
Added ability to log how long each batch and pipeline take in automl.search() #3577
Added the option to set the sp parameter for ARIMA models #3597
Updated the CV split size of time series problems to match forecast horizon for improved performance #3616
Added holdout set evaluation as part of AutoML search and pipeline ranking #3499
Added Dockerfile.arm and .dockerignore for python version and M1 testing #3609
Added test_gen_utils::in_container_arm64() fixture #3609
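The v0.55.0 change sizing time series CV splits to the forecast horizon can be pictured as expanding-window splits whose validation folds each hold forecast_horizon rows. The sketch below is a hedged illustration under that assumption; expanding_window_splits and its gap handling (skipping gap rows between training and validation) are hypothetical and not EvalML's splitter.

```python
def expanding_window_splits(n_rows, n_splits, forecast_horizon, gap=0):
    """Yield (train_indices, test_indices); every test fold has forecast_horizon rows."""
    # The last fold ends at the final row; earlier folds step back by forecast_horizon.
    for k in range(n_splits, 0, -1):
        test_end = n_rows - (k - 1) * forecast_horizon
        test_start = test_end - forecast_horizon
        train_end = test_start - gap  # leave `gap` rows unused before the test fold
        if train_end <= 0:
            raise ValueError("not enough rows for the requested splits")
        yield list(range(train_end)), list(range(test_start, test_end))

for train, test in expanding_window_splits(n_rows=10, n_splits=3, forecast_horizon=2, gap=1):
    print(train[-1], test)
# 2 [4, 5]
# 4 [6, 7]
# 6 [8, 9]
```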
- Fixes
Fixed iterative graphs not appearing in documentation #3592
Updated the load_diabetes() method to account for scikit-learn 1.1.1 changes to the dataset #3591
Capped woodwork version at < 0.17.0 #3612
Bump minimum scikit-optimize version to 0.9.0 #3614
Invalid target data checks involving regression and unsupported data types now produce a different DataCheckMessageCode #3630
Updated test_data_checks.py::test_data_checks_raises_value_errors_on_init to use a more lenient text check #3609
Documentation Changes
- Testing Changes
Warning
- Breaking Changes
Refactored test cases that iterate over all components to use pytest.mark.parametrize and changed the corresponding if...continue blocks to pytest.mark.xfail #3622
- v0.54.0 Jun. 23, 2022
- Fixes
Updated the Imputer and SimpleImputer to work with scikit-learn 1.1.1. #3525
Bumped the minimum versions of scikit-learn to 1.1.1 and imbalanced-learn to 0.9.1. #3525
Added a clearer error message when describe is called on an un-instantiated ComponentGraph #3569
Added a clearer error message when time series’ predict is called with its X_train or y_train parameter set as None #3579
- v0.53.1 Jun. 9, 2022
- Changes
Set the development status to 4 - Beta in setup.cfg #3550
- v0.53.0 Jun. 9, 2022
- Enhancements
Pass n_jobs to default algorithm #3548
- v0.52.0 May. 12, 2022
- Changes
Added github workflows for featuretools and woodwork to test their main branch against evalml. #3504
Added pmdarima to conda recipe. #3505
Added a threshold for NullDataCheck before a warning is issued for null values #3507
Changed NoVarianceDataCheck to only output warnings #3506
Reverted XGBoost Classifier/Regressor patch for all boolean columns needing to be converted to int. #3503
Updated roc_curve() and conf_matrix() to work with IntegerNullable and BooleanNullable types. #3465
Changed ComponentGraph._transform_features to raise a PipelineError instead of a ValueError. This is not a breaking change because PipelineError is a subclass of ValueError. #3497
Capped sklearn at version 1.1.0 #3518
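The NullDataCheck threshold added in v0.52.0 means a column only triggers a warning once its share of null values is high enough. A minimal plain-Python sketch of that kind of check follows; the function name and the default threshold of 0.95 are assumptions for illustration, not NullDataCheck's actual signature or default.

```python
def columns_over_null_threshold(columns, threshold=0.95):
    """Return names of columns whose fraction of None values is at least threshold."""
    flagged = []
    for name, values in columns.items():
        if not values:
            continue  # skip empty columns rather than divide by zero
        null_fraction = sum(v is None for v in values) / len(values)
        if null_fraction >= threshold:
            flagged.append(name)
    return flagged

data = {"a": [1, None, 3, 4], "b": [None, None, None, 1]}
print(columns_over_null_threshold(data))                 # []
print(columns_over_null_threshold(data, threshold=0.5))  # ['b']
```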
- Documentation Changes
Updated to install prophet extras in Read the Docs. #3509
- Testing Changes
Moved vowpal wabbit in test recipe to evalml package from evalml-core #3502
- v0.51.0 Apr. 28, 2022
- Enhancements
Updated make_pipeline_from_data_check_output to work with time series problems. #3454
- Fixes
Changed PipelineBase.graph_json() to return a python dictionary and renamed as graph_dict() #3463
- Changes
Added vowpalwabbit to local recipe and removed is_using_conda pytest skip markers from relevant tests #3481
- Documentation Changes
Warning
- v0.50.0 Apr. 12, 2022
- Enhancements
Added TimeSeriesImputer component #3374
Replaced pipeline_parameters and custom_hyperparameters with search_parameters in AutoMLSearch #3373, #3427
Added TimeSeriesRegularizer to smooth uninferrable date ranges for time series problems #3376
Enabled ensembling as a parameter for DefaultAlgorithm #3435, #3444
Warning
- v0.49.0 Mar. 31, 2022
- Enhancements
Added use_covariates parameter to ARIMARegressor #3407
AutoMLSearch will set use_covariates to False for ARIMA when dataset is large #3407
Add ability to retrieve logical types to a component in the graph via get_component_input_logical_types #3428
Add ability to get logical types passed to the last component via last_component_input_logical_types #3428
- Fixes
Fix conda build after PR 3407 #3429
Warning
- Breaking Changes
Moved model understanding metrics from graph.py to metrics.py #3417
- v0.48.0 Mar. 25, 2022
- Enhancements
Add support for oversampling in time series classification problems #3387
Warning
- Breaking Changes
Moved partial dependence functions from graph.py to partial_dependence.py #3404
- v0.47.0 Mar. 16, 2022
- v0.46.0 Mar. 03, 2022
- Documentation Changes
Added in-line tabs and copy-paste functionality to documentation, overhauled Install page #3353
- v0.45.0 Feb. 17, 2022
- Testing Changes
Add auto approve dependency workflow schedule for every 30 mins #3312
- v0.44.0 Feb. 04, 2022
- Enhancements
Updated DefaultAlgorithm to also limit estimator usage for long-running multiclass problems #3099
Added make_pipeline_from_data_check_output() utility method #3277
Updated AutoMLSearch to use DefaultAlgorithm as the default automl algorithm #3261, #3304
Added more specific data check errors to DatetimeFormatDataCheck #3288
Added features as a parameter for AutoMLSearch and added DFSTransformer to pipelines when features are present #3309
Warning
- v0.43.0 Jan. 25, 2022
- Enhancements
Updated new NullDataCheck to return a warning and suggest an action to impute columns with null values #3197
Updated make_pipeline_from_actions to handle null column imputation #3237
Updated data check actions API to return options instead of actions and added functionality to suggest and take action on columns with null values #3182
- Changes
Updated DataCheck validate() output to return a dictionary instead of list for actions #3142
Updated DataCheck validate() API to use the new DataCheckActionOption class instead of DataCheckAction #3152
Uncapped numba version and removed it from requirements #3263
Renamed HighlyNullDataCheck to NullDataCheck #3197
Updated data check validate() output to return a list of warnings and errors instead of a dictionary #3244
Capped pandas at < 1.4.0 #3274
- Testing Changes
Bumped minimum IPython version to 7.16.3 in test-requirements.txt based on dependabot feedback #3269
Warning
- Breaking Changes
Renamed HighlyNullDataCheck to NullDataCheck #3197
Updated data check validate() output to return a list of warnings and errors instead of a dictionary. See the Data Check or Data Check Actions pages (under User Guide) for examples. #3244
Removed impute_all and default_impute_strategy parameters from the PerColumnImputer #3267
Updated PerColumnImputer such that columns not specified in impute_strategies dict will not be imputed anymore #3267
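After the v0.43.0 PerColumnImputer change, only columns listed in impute_strategies are imputed and everything else passes through untouched. The sketch below mimics that behaviour in plain Python. The flat {"column": "strategy"} mapping, the strategy names, and per_column_impute itself are simplifying assumptions, not PerColumnImputer's real parameter format.

```python
from statistics import mean, median

# Hypothetical strategy names; a real imputer may support more (e.g. constant fills).
STRATEGIES = {"mean": mean, "median": median}

def per_column_impute(columns, impute_strategies):
    """Impute only the columns named in impute_strategies; pass the rest through."""
    result = {}
    for name, values in columns.items():
        strategy = impute_strategies.get(name)
        if strategy is None:
            # Not listed: keep the column exactly as given, missing values included.
            result[name] = list(values)
            continue
        fill = STRATEGIES[strategy]([v for v in values if v is not None])
        result[name] = [fill if v is None else v for v in values]
    return result

data = {"age": [20, None, 40], "city": ["x", None, "y"]}
print(per_column_impute(data, {"age": "mean"}))
# {'age': [20, 30, 40], 'city': ['x', None, 'y']}
```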
- v0.42.0 Jan. 18, 2022
- Enhancements
Required the separation of training and test data by gap + 1 units to be verified by time_index for time series problems #3208
Added support for boolean features for ARIMARegressor #3187
Updated dependency bot workflow to remove outdated description and add new configuration to delete branches automatically #3212
Added n_obs and n_splits to TimeSeriesParametersDataCheck error details #3246
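The gap + 1 requirement in v0.42.0 can be read as: the first test timestamp must sit exactly gap + 1 periods after the last training timestamp. The sketch below checks that condition with the standard library, assuming a fixed daily frequency; separated_by_gap is a hypothetical helper and not the signature of EvalML's are_datasets_separated_by_gap_time_index.

```python
from datetime import date, timedelta

def separated_by_gap(train_times, test_times, gap, freq=timedelta(days=1)):
    """True if the first test timestamp is exactly (gap + 1) periods after the last training one."""
    return test_times[0] == train_times[-1] + (gap + 1) * freq

train = [date(2022, 1, 1) + timedelta(days=i) for i in range(5)]  # Jan 1 .. Jan 5
test = [date(2022, 1, 8), date(2022, 1, 9)]
print(separated_by_gap(train, test, gap=2))  # True: Jan 5 + 3 days = Jan 8
print(separated_by_gap(train, test, gap=0))  # False: expected Jan 6
```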
- Fixes
Fixed classification pipelines to only accept target data with the appropriate number of classes #3185
Added support for time series in DefaultAlgorithm #3177
Standardized names of featurization components #3192
Removed empty cell in text_input.ipynb #3234
Removed potential prediction explanations failure when pipelines predicted a class with probability 1 #3221
Dropped NaNs before partial dependence grid generation #3235
Allowed prediction explanations to be json-serializable #3262
Fixed bug where InvalidTargetDataCheck would not check time series regression targets #3251
Fixed bug in are_datasets_separated_by_gap_time_index #3256
- Changes
Raised lowest compatible numpy version to 1.21.0 to address security concerns #3207
Changed the default objective to MedianAE from R2 for time series regression #3205
Removed all-nan Unknown to Double logical conversion in infer_feature_types #3196
Checking the validity of holdout data for time series problems can be performed by calling pipelines.utils.validate_holdout_datasets prior to calling predict #3208
- Testing Changes
Update auto approve workflow trigger and delete branch after merge #3265
Warning
- Breaking Changes
Renamed DateTime Featurizer Component to DateTime Featurizer and Natural Language Featurization Component to Natural Language Featurizer #3192
- v0.41.0 Jan. 06, 2022
- v0.40.0 Dec. 22, 2021
- Enhancements
Added TimeSeriesSplittingDataCheck to DefaultDataChecks to verify adequate class representation in time series classification problems #3141
Added the ability to accept serialized features and skip computation in DFSTransformer #3106
Added support for known-in-advance features #3149
Added Holt-Winters ExponentialSmoothingRegressor for time series regression problems #3157
Required the separation of training and test data by gap + 1 units to be verified by time_index for time series problems #3160
- Fixes
Fixed error caused when tuning threshold for time series binary classification #3140
- Changes
TimeSeriesParametersDataCheck was added to DefaultDataChecks for time series problems #3139
Renamed date_index to time_index in problem_configuration for time series problems #3137
Updated nlp-primitives minimum version to 2.1.0 #3166
Updated minimum version of woodwork to v0.11.0 #3171
Reverted #3160 until uninferrable frequency can be addressed earlier in the process #3198
- Documentation Changes
Added comments to provide clarity on doctests #3155
- Testing Changes
Parameterized tests in test_datasets.py #3145
Warning
- Breaking Changes
Renamed date_index to time_index in problem_configuration for time series problems #3137
- v0.39.0 Dec. 9, 2021
- Enhancements
Renamed DelayedFeatureTransformer to TimeSeriesFeaturizer and enhanced it to compute rolling features #3028
Added ability to impute only specific columns in PerColumnImputer #3123
Added TimeSeriesParametersDataCheck to verify the time series parameters are valid given the number of splits in cross validation #3111
- Fixes
Default parameters for RFRegressorSelectFromModel and RFClassifierSelectFromModel have been fixed to avoid selecting all features #3110
- Changes
Removed reliance on a datetime index for ARIMARegressor and ProphetRegressor #3104
Included target leakage check when fitting ARIMARegressor to account for the lack of TimeSeriesFeaturizer in ARIMARegressor based pipelines #3104
Cleaned up and refactored InvalidTargetDataCheck implementation and docstring #3122
Removed indices information from the output of HighlyNullDataCheck’s validate() method #3092
Added ReplaceNullableTypes component to prepare for handling pandas nullable types. #3090
Updated make_pipeline for handling pandas nullable types in preprocessing pipeline. #3129
Removed unused EnsembleMissingPipelinesError exception definition #3131
Warning
- Breaking Changes
Renamed DelayedFeatureTransformer to TimeSeriesFeaturizer #3028
ProphetRegressor now requires a datetime column in X represented by the date_index parameter #3104
Renamed module evalml.data_checks.invalid_target_data_check to evalml.data_checks.invalid_targets_data_check #3122
Removed unused EnsembleMissingPipelinesError exception definition #3131
- v0.38.0 Nov. 27, 2021
- Enhancements
Added data_check_name attribute to the data check action class #3034
Added NumWords and NumCharacters primitives to TextFeaturizer and renamed TextFeaturizer to NaturalLanguageFeaturizer #3030
Added support for scikit-learn > 1.0.0 #3051
Required the date_index parameter to be specified for time series problems in AutoMLSearch #3041
Allowed time series pipelines to predict on test datasets whose length is less than or equal to the forecast_horizon. Also allowed the test set index to start at 0. #3071
Enabled time series pipeline to predict on data with features that are not known-in-advance #3094
- Fixes
Added an error message when fit and predict/predict_proba data types are different #3036
Fixed bug where ensembling components could not get converted to JSON format #3049
Fixed bug where components with tuned integer hyperparameters could not get converted to JSON format #3049
Fixed bug where force plots were not displaying correct feature values #3044
Included confusion matrix at the pipeline threshold for find_confusion_matrix_per_threshold #3080
Fixed bug where One Hot Encoder would error out if a non-categorical feature had a missing value #3083
Fixed bug where features created from categorical columns by Delayed Feature Transformer would be inferred as categorical #3083
- Documentation Changes
Updated docs to use data check action methods rather than manually cleaning data #3050
- Testing Changes
Updated integration tests to use make_pipeline_from_actions instead of private method #3047
Warning
- Breaking Changes
Added data_check_name attribute to the data check action class #3034
Renamed TextFeaturizer to NaturalLanguageFeaturizer #3030
Updated the Pipeline.graph_json function to return a dictionary of “from” and “to” edges instead of tuples #3049
Deleted predict_uses_y estimator attribute #3069
Changed time series problems in AutoMLSearch to need a not-None date_index #3041
Changed the DelayedFeatureTransformer to throw a ValueError during fit if the date_index is None #3041
Passing X=None to DelayedFeatureTransformer is deprecated #3041
- v0.37.0 Nov. 9, 2021
- Enhancements
Added find_confusion_matrix_per_threshold to Model Understanding #2972
Limited computationally-intensive models during AutoMLSearch for certain multiclass problems, allowing opt-in with the parameter allow_long_running_models #2982
Added support for stacked ensemble pipelines to prediction explanations module #2971
Added integration tests for data checks and data checks actions workflow #2883
Added a change in pipeline structure to handle categorical columns separately for pipelines in DefaultAlgorithm #2986
Added an algorithm to DelayedFeatureTransformer to select better lags #3005
Added test to ensure pickling pipelines preserves thresholds #3027
Added AutoML function to access ensemble pipeline’s input pipeline IDs #3011
Added ability to define which class is “positive” for label encoder in binary classification case #3033
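Choosing the “positive” class for a binary label encoder (v0.37.0) decides which label becomes 1. The plain-Python sketch below illustrates the idea; encode_binary, its positive_label argument, and the default of treating the last class in sorted order as positive are all assumptions for illustration and are not the LabelEncoder component's actual interface.

```python
def encode_binary(labels, positive_label=None):
    """Map a two-class target to 0/1, with positive_label encoded as 1."""
    classes = sorted(set(labels))
    if len(classes) != 2:
        raise ValueError(f"expected exactly two classes, got {classes}")
    if positive_label is None:
        positive_label = classes[1]  # hypothetical default: last class in sorted order
    elif positive_label not in classes:
        raise ValueError(f"{positive_label!r} is not one of {classes}")
    mapping = {label: int(label == positive_label) for label in classes}
    return [mapping[y] for y in labels], mapping

y = ["spam", "ham", "spam"]
print(encode_binary(y))                        # ([1, 0, 1], {'ham': 0, 'spam': 1})
print(encode_binary(y, positive_label="ham"))  # ([0, 1, 0], {'ham': 1, 'spam': 0})
```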
- Fixes
Fixed bug where Oversampler didn’t consider boolean columns to be categorical #2980
Fixed permutation importance failing when target is categorical #3017
Updated estimator and pipelines’ predict, predict_proba, transform, inverse_transform methods to preserve input indices #2979
Updated demo dataset link for daily min temperatures #3023
- Changes
Updated OutliersDataCheck and UniquenessDataCheck to allow for the suspension of the Nullable types error #3018
- v0.36.0 Oct. 27, 2021
- Enhancements
Added LIME as an algorithm option for explain_predictions and explain_predictions_best_worst #2905
Standardized data check messages and added default “rows” and “columns” to data check message details dictionary #2869
Added rows_of_interest to pipeline utils #2908
Added support for woodwork version 0.8.2 #2909
Enhanced the DateTimeFeaturizer to handle NaNs in date features #2909
Added support for woodwork logical types PostalCode, SubRegionCode, and CountryCode in model understanding tools #2946
Added Vowpal Wabbit regressor and classifiers #2846
Added NoSplit data splitter for future unsupervised learning searches #2958
Added method to convert actions into a preprocessing pipeline #2968
- Fixes
Fixed bug where partial dependence was not respecting the ww schema #2929
Fixed calculate_permutation_importance for datetimes on StandardScaler #2938
Fixed SelectColumns to only select available features for feature selection in DefaultAlgorithm #2944
Fixed DropColumns component not receiving parameters in DefaultAlgorithm #2945
Fixed bug where trained binary thresholds were not being returned by get_pipeline or clone #2948
Fixed bug where Oversampler selected ww logical categorical instead of ww semantic category #2946
Warning
- Breaking Changes
Standardized data check messages and added default “rows” and “columns” to data check message details dictionary. This may change the number of messages returned from a data check. #2869
- v0.35.0 Oct. 14, 2021
- Changes
Updated pipelines to use a label encoder component instead of doing encoding on the pipeline level #2821
Deleted scikit-learn ensembler #2819
Refactored pipeline building logic out of AutoMLSearch and into IterativeAlgorithm #2854
Refactored names for methods in ComponentGraph and PipelineBase #2902
Warning
- Breaking Changes
Updated pipelines to use a label encoder component instead of doing encoding on the pipeline level. This means that pipelines will no longer automatically encode non-numerical targets. Please use a label encoder if working with classification problems and non-numeric targets. #2821
Deleted scikit-learn ensembler #2819
IterativeAlgorithm now requires X, y, problem_type as required arguments as well as sampler_name, allowed_model_families, allowed_component_graphs, max_batches, and verbose as optional arguments #2854
Changed method names of fit_features and compute_final_component_features to fit_and_transform_all_but_final and transform_all_but_final in ComponentGraph, and compute_estimator_features to transform_all_but_final in pipeline classes #2902
- v0.34.0 Sep. 30, 2021
- Enhancements
Updated to work with Woodwork 0.8.1 #2783
Added validation that training_data and training_target are not None in prediction explanations #2787
Added support for training-only components in pipelines and component graphs #2776
Added default argument for the parameters value for ComponentGraph.instantiate #2796
Added TIME_SERIES_REGRESSION to LightGBMRegressor's supported problem types #2793
Provided a JSON representation of a pipeline’s DAG structure #2812
Added validation to holdout data passed to predict and predict_proba for time series #2804
Added information about which row indices are outliers in OutliersDataCheck #2818
Added verbose flag to top level search() method #2813
Added support for linting jupyter notebooks and clearing the executed cells and empty cells #2829 #2837
Added “DROP_ROWS” action to output of OutliersDataCheck.validate() #2820
Added the ability of AutoMLSearch to accept a SequentialEngine instance as engine input #2838
Added new label encoder component to EvalML #2853
Added our own partial dependence implementation #2834
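Partial dependence, which v0.34.0 reimplemented in EvalML, measures a feature's average effect on predictions: for each grid value, the feature is set to that value in every row and the model's outputs are averaged. The sketch below shows that textbook definition in plain Python with a toy model; partial_dependence here is a hypothetical stand-in and not EvalML's function or signature.

```python
from statistics import mean

def partial_dependence(predict, rows, feature, grid):
    """For each grid value, force `feature` to it in every row and average the predictions."""
    averages = []
    for value in grid:
        modified = [{**row, feature: value} for row in rows]
        averages.append(mean(predict(row) for row in modified))
    return averages

def toy_model(row):
    # Stand-in for a fitted pipeline's predict on a single row.
    return 2 * row["x"] + row["z"]

rows = [{"x": 1, "z": 0}, {"x": 3, "z": 4}]
print(partial_dependence(toy_model, rows, "x", [0, 1, 2]))  # [2, 4, 6]
```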
- Fixes
Fixed bug where calculate_permutation_importance was not calculating the right value for pipelines with target transformers #2782
Fixed bug where transformed target values were not used in fit for time series pipelines #2780
Fixed bug where score_pipelines method of AutoMLSearch would not work for time series problems #2786
Removed TargetTransformer class #2833
Added tests to verify ComponentGraph support by pipelines #2830
Fixed incorrect parameter for baseline regression pipeline in AutoMLSearch #2847
Fixed bug where the desired estimator family order was not respected in IterativeAlgorithm #2850
- Changes
Changed woodwork initialization to use partial schemas #2774
Made Transformer.transform() an abstract method #2744
Deleted EmptyDataChecks class #2794
Removed data check for checking log distributions in make_pipeline #2806
Changed the minimum woodwork version to 0.8.0 #2783
Pinned woodwork version to 0.8.0 #2832
Removed model_family attribute from ComponentBase and transformers #2828
Limited scikit-learn until new features and errors can be addressed #2842
Show DeprecationWarning when Sklearn Ensemblers are called #2859
Warning
- v0.33.0 Sep. 15, 2021
- v0.32.1 Sep. 10, 2021
- Enhancements
Added verbose flag to AutoMLSearch to run search in silent mode by default #2645
Added label encoder to XGBoostClassifier to remove the warning #2701
Set eval_metric to logloss for XGBoostClassifier #2741
Added support for woodwork versions 0.7.0 and 0.7.1 #2743
Changed explain_predictions functions to display original feature values #2759
Added X_train and y_train to graph_prediction_vs_actual_over_time and get_prediction_vs_actual_over_time_data #2762
Added forecast_horizon as a required parameter to time series pipelines and AutoMLSearch #2697
Added predict_in_sample and predict_proba_in_sample methods to time series pipelines to predict on data where the target is known, e.g. cross-validation #2697
- Changes
Deleted drop_nan_target_rows utility method #2737
Removed default logging setup and debugging log file #2645
Changed the default n_jobs value for XGBoostClassifier and XGBoostRegressor to 12 #2757
Changed TimeSeriesBaselineEstimator to only work on a time series pipeline with a DelayedFeatureTransformer #2697
Added X_train and y_train as optional parameters to pipeline predict and predict_proba; they are only used for time series pipelines #2697
Added training_data and training_target as optional parameters to explain_predictions and explain_predictions_best_worst to support time series pipelines #2697
Changed time series pipeline predictions to no longer output series/dataframes padded with NaNs. A prediction will be returned for every row in the X input #2697
- Testing Changes
Fixed flaky TargetDistributionDataCheck test for very_lognormal distribution #2748
Warning
- Breaking Changes
Removed default logging setup and debugging log file #2645
Added X_train and y_train to graph_prediction_vs_actual_over_time and get_prediction_vs_actual_over_time_data #2762
Added forecast_horizon as a required parameter to time series pipelines and AutoMLSearch #2697
Changed TimeSeriesBaselineEstimator to only work on a time series pipeline with a DelayedFeatureTransformer #2697
Added X_train and y_train as required parameters for predict and predict_proba in time series pipelines #2697
Added training_data and training_target as required parameters to explain_predictions and explain_predictions_best_worst for time series pipelines #2697
- v0.32.0 Aug. 31, 2021
- Enhancements
Allow string for engine parameter for AutoMLSearch #2667
Add ProphetRegressor to AutoML #2619
Integrated DefaultAlgorithm into AutoMLSearch #2634
Removed SVM “linear” and “precomputed” kernel hyperparameter options, and improved default parameters #2651
Updated ComponentGraph initialization to raise ValueError when user attempts to use .y for a component that does not produce a tuple output #2662
Updated to support Woodwork 0.6.0 #2690
Updated pipeline graph() to distinguish X and y edges #2654
Added DropRowsTransformer component #2692
Added DROP_ROWS to _make_component_list_from_actions and cleaned up metadata #2694
Add new ensembler component #2653
- Fixes
Updated Oversampler logic to select best SMOTE based on component input instead of pipeline input #2695
Added ability to explicitly close DaskEngine resources to improve runtime and reduce Dask warnings #2667
Fixed partial dependence bug for ensemble pipelines #2714
Updated TargetLeakageDataCheck to maintain user-selected logical types #2711
Warning
- Breaking Changes
Renamed the current top level search method to search_iterative and defined a new search method for the DefaultAlgorithm #2634
Replaced SMOTEOversampler, SMOTENOversampler and SMOTENCOversampler with consolidated Oversampler component #2695
Removed LinearRegressor from the list of default AutoMLSearch estimators due to poor performance #2660
- v0.31.0 Aug. 19, 2021
- Enhancements
Updated the high variance check in AutoMLSearch to be robust to a variety of objectives and cv scores #2622
Use Woodwork’s outlier detection for the OutliersDataCheck #2637
Added ability to utilize instantiated components when creating a pipeline #2643
Sped up the all Nan and unknown check in infer_feature_types #2661
Fixes
- Testing Changes
Speed up CI by splitting Prophet tests into a separate workflow in GitHub #2644
Warning
- Breaking Changes
TimeSeriesRegressionPipeline no longer inherits from TimeSeriesRegressionPipeline #2649
- v0.30.2 Aug. 16, 2021
- Fixes
Updated changelog and version numbers to match the release. Release 0.30.1 was released erroneously without a change to the version numbers. 0.30.2 replaces it.
- v0.30.1 Aug. 12, 2021
- Enhancements
Added DatetimeFormatDataCheck for time series problems #2603
Added ProphetRegressor to estimators #2242
Updated ComponentGraph to handle not calling samplers’ transform during predict, and updated samplers’ transform methods so that fit_transform is equivalent to fit(X, y).transform(X, y) #2583
Updated ComponentGraph _validate_component_dict logic to be stricter about input values #2599
Patched bug in xgboost estimators where predicting on a feature matrix of only booleans would throw an exception #2602
Updated ARIMARegressor to use relative forecasting to predict values #2613
Added support for creating pipelines without an estimator as the final component and added transform(X, y) method to pipelines and component graphs #2625
Updated to support Woodwork 0.5.1 #2610
- Fixes
Updated AutoMLSearch to drop ARIMARegressor from allowed_estimators if an incompatible frequency is detected #2632
Updated get_best_sampler_for_data to consider all non-numeric datatypes as categorical for SMOTE #2590
Fixed inconsistent test results from TargetDistributionDataCheck #2608
Adopted vectorized pd.NA checking for Woodwork 0.5.1 support #2626
Pinned upper version of astroid to 2.6.6 to keep ReadTheDocs working. #2638
- Changes
Renamed SMOTE samplers to SMOTE oversampler #2595
Changed partial_dependence and graph_partial_dependence to raise a PartialDependenceError instead of ValueError. This is not a breaking change because PartialDependenceError is a subclass of ValueError #2604
Cleaned up code duplication in ComponentGraph #2612
Stored predict_proba results in .x for intermediate estimators in ComponentGraph #2629
- Documentation Changes
To avoid local docs build error, only add warning disable and download headers on ReadTheDocs builds, not locally #2617
- Testing Changes
Updated partial_dependence tests to change the element-wise comparison per the Plotly 5.2.1 upgrade #2638
Changed the lint CI job to only check against python 3.9 via the -t flag #2586
Installed Prophet in linux nightlies test and fixed test_all_components #2598
Refactored and fixed all make_pipeline tests to assert correct order and address new Woodwork Unknown type inference #2572
Removed component_graphs as a global variable in test_component_graphs.py #2609
Warning
- Breaking Changes
Renamed SMOTE samplers to SMOTE oversamplers. Please use SMOTEOversampler, SMOTENCOversampler, SMOTENOversampler instead of SMOTESampler, SMOTENCSampler, and SMOTENSampler #2595
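The #2583 contract, where a sampler's fit_transform is equivalent to fit(X, y).transform(X, y), can be illustrated with a minimal, stdlib-only sketch. ToyUndersampler and its balancing rule (keep the first n rows of each class, where n is the smallest class count) are hypothetical and do not reflect EvalML's Undersampler or Oversampler internals. Passing rows through untouched when y is None is also an assumption, standing in for ComponentGraph no longer calling samplers' transform during predict.

```python
from collections import Counter


class ToyUndersampler:
    """Hypothetical undersampler: keeps the first n rows of each class,
    where n is the size of the smallest class. Not EvalML's implementation."""

    def fit(self, X, y):
        # Learn the per-class row budget from the training labels only
        self.n_per_class_ = min(Counter(y).values())
        return self

    def transform(self, X, y=None):
        # Assumption: with no target (as at predict time) rows pass through untouched
        if y is None:
            return X, None
        kept, seen = [], Counter()
        for i, label in enumerate(y):
            if seen[label] < self.n_per_class_:
                kept.append(i)
                seen[label] += 1
        return [X[i] for i in kept], [y[i] for i in kept]

    def fit_transform(self, X, y):
        # Defined via fit + transform, so the two code paths cannot diverge
        return self.fit(X, y).transform(X, y)


X = [[0.1], [0.2], [0.3], [0.4], [0.5]]
y = [0, 0, 0, 1, 1]
X_res, y_res = ToyUndersampler().fit_transform(X, y)
print(y_res)  # [0, 0, 1, 1]
```

Defining fit_transform in terms of fit and transform is the simplest way to guarantee the equivalence; an optimized fit_transform would need its own test to stay in sync.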
- v0.30.0 Aug. 3, 2021
- Enhancements
Added LogTransformer and TargetDistributionDataCheck #2487
Issued a warning to users when a pipeline parameter passed in isn’t used in the pipeline #2564
Added Gini coefficient as an objective #2544
Added repr to ComponentGraph #2565
Added components to extract features from URL and EmailAddress Logical Types #2550
Added support for NaN values in TextFeaturizer #2532
Added SelectByType transformer #2531
Added separate thresholds for percent null rows and columns in HighlyNullDataCheck #2562
Added support for NaN natural language values #2577
- Fixes
Raised error message for types URL, NaturalLanguage, and EmailAddress in partial_dependence #2573
- Changes
Updated PipelineBase implementation for creating pipelines from a list of components #2549
Moved get_hyperparameter_ranges to PipelineBase class from automl/utils module #2546
Renamed ComponentGraph’s get_parents to get_inputs #2540
Removed ComponentGraph.linearized_component_graph and ComponentGraph.from_list #2556
Updated ComponentGraph to enforce requiring .x and .y inputs for each component in the graph #2563
Renamed existing ensembler implementation from StackedEnsemblers to SklearnStackedEnsemblers #2578
- Testing Changes
Added test that makes sure split_data does not shuffle for time series problems #2552
Warning
- Breaking Changes
Moved get_hyperparameter_ranges to PipelineBase class from automl/utils module #2546
Renamed ComponentGraph’s get_parents to get_inputs #2540
Removed ComponentGraph.linearized_component_graph and ComponentGraph.from_list #2556
Updated ComponentGraph to enforce requiring .x and .y inputs for each component in the graph #2563
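For reference on the Gini objective (#2544), a common definition for binary classification is the normalized Gini coefficient, Gini = 2 * AUC - 1, which maps AUC from [0, 1] onto [-1, 1]. The sketch below assumes that definition and computes AUC by pairwise comparison in pure Python; roc_auc and gini are hypothetical helpers, not EvalML's objective class.

```python
def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outranks a random
    negative (ties count as half). O(n_pos * n_neg): illustration only."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def gini(y_true, scores):
    """Normalized Gini coefficient: rescales AUC from [0, 1] to [-1, 1]."""
    return 2 * roc_auc(y_true, scores) - 1


print(gini([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.5 (AUC = 0.75)
```

A random ranking scores about 0, a perfect ranking 1, and a perfectly inverted ranking -1.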
- v0.29.0 Jul. 21, 2021
- Enhancements
Updated 1-way partial dependence support for datetime features #2454
Added details on how to fix error caused by broken ww schema #2466
Added ability to use built-in pickle for saving AutoMLSearch #2463
Updated our components and component graphs to use latest features of ww 0.4.1, e.g. concat_columns and drop in-place #2465
Added new, concurrent.futures based engine for parallel AutoML #2506
Added support for new Woodwork Unknown type in AutoMLSearch #2477
Updated our components with an attribute that describes if they modify features or targets and can be used in list API for pipeline initialization #2504
Updated ComponentGraph to accept X and y as inputs #2507
Removed unused TARGET_BINARY_INVALID_VALUES from DataCheckMessageCode enum and fixed formatting of objective documentation #2520
Added EvalMLAlgorithm #2525
Added support for NaN values in TextFeaturizer #2532
- Fixes
Fixed FraudCost objective and reverted threshold optimization method for binary classification to Golden #2450
Added custom exception message for partial dependence on features with scales that are too small #2455
Ensured the typing for Ordinal and Datetime ltypes is passed through _retain_custom_types_and_initalize_woodwork #2461
Updated to work with Pandas 1.3.0 #2442
Updated to work with sktime 0.7.0 #2499
- Testing Changes
Warning
- Breaking Changes
NaN values in the Natural Language type are no longer supported by the Imputer with the pandas upgrade. #2477
- v0.28.0 Jul. 2, 2021
- Fixes
Deleted unreachable line from IterativeAlgorithm #2464
- v0.27.0 Jun. 22, 2021
- Enhancements
Added force plots for prediction explanations #2157
Removed self-reference from AutoMLSearch #2304
Added support for nonlinear pipelines for generate_pipeline_code #2332
Added inverse_transform method to pipelines #2256
Added optional automatic update checker #2350
Added search_order to AutoMLSearch’s rankings and full_rankings tables #2345
Updated threshold optimization method for binary classification #2315
Updated demos to pull data from S3 instead of including demo data in package #2387
Upgraded Woodwork version to v0.4.1 #2379
- Fixes
Preserved user-specified Woodwork types throughout pipeline fit/predict #2297
Fixed ComponentGraph appending target to final_component_features if there is a component that returns both X and y #2358
Fixed partial dependence graph method failing on multiclass problems when the class labels are numeric #2372
Added thresholding_objective argument to AutoMLSearch for binary classification problems #2320
Added change for k_neighbors parameter in SMOTE Oversamplers to automatically handle small samples #2375
Changed naming for Logistic Regression Classifier file #2399
Pinned pytest-timeout to fix minimum dependence checker #2425
Replaced Elastic Net Classifier base class with Logistic Regression to avoid NaN outputs #2420
- Changes
Cleaned up PipelineBase’s component_graph and _component_graph attributes. Updated PipelineBase __repr__ and added __eq__ for ComponentGraph #2332
Added and applied black linting package to the EvalML repo in place of autopep8 #2306
Separated custom_hyperparameters from pipelines and added them as an argument to AutoMLSearch #2317
Replaced allowed_pipelines with allowed_component_graphs #2364
Removed private method _compute_features_during_fit from PipelineBase #2359
Updated compute_order in ComponentGraph to be a read-only property #2408
Unpinned PyZMQ version in requirements.txt #2389
Uncapped LightGBM version in requirements.txt #2405
Updated minimum version of plotly #2415
Removed SensitivityLowAlert objective from core objectives #2418
- Testing Changes
Updated minimum unit tests to run on all pull requests #2314
Passed token to authorize uploading of codecov reports #2344
Added pytest-timeout. All tests that run longer than 6 minutes will fail. #2374
Separated the dask tests out into separate GitHub Action jobs to isolate dask failures #2376
Refactored dask tests #2377
Added the combined dask/non-dask unit tests back and renamed the dask only unit tests #2382
Sped up unit tests and split into separate jobs #2365
Changed CI job names, set lint to run for Python 3.9, and set nightlies to run on Python 3.8 at 3am EST #2395 #2398
Set fail-fast to false for CI jobs that run for PRs #2402
Warning
- Breaking Changes
AutoMLSearch will accept allowed_component_graphs instead of allowed_pipelines #2364
Removed PipelineBase’s _component_graph attribute. Updated PipelineBase __repr__ and added __eq__ for ComponentGraph #2332
pipeline_parameters will no longer accept skopt.space variables since hyperparameter ranges will now be specified through custom_hyperparameters #2317
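The k_neighbors change in #2375 matters because SMOTE interpolates between a minority sample and its nearest minority-class neighbours, so k cannot exceed the number of other minority samples. The sketch below assumes the cap min(k, minority_count - 1); adjusted_k_neighbors is a hypothetical helper, and EvalML's exact rule may differ.

```python
from collections import Counter


def adjusted_k_neighbors(y, k_neighbors=5):
    """Hypothetical helper: cap SMOTE's k_neighbors for small minority classes."""
    minority_count = min(Counter(y).values())
    if minority_count < 2:
        raise ValueError("SMOTE needs at least two samples in every class")
    # Each minority sample needs k *other* minority samples to interpolate
    # towards, so k must stay strictly below the minority class size.
    return min(k_neighbors, minority_count - 1)


print(adjusted_k_neighbors([0] * 10 + [1] * 3))  # 2
```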
- v0.25.0 Jun. 01, 2021
Warning
- v0.24.2 May. 24, 2021
- Fixes
Set default n_jobs to 1 for StackedEnsembleClassifier and StackedEnsembleRegressor until fix for text-based parallelism in sklearn stacking can be found #2295
- Changes
Updated start_iteration_callback to accept a pipeline instance instead of a pipeline class and no longer accept pipeline parameters as a parameter #2290
Refactored calculate_permutation_importance method and added per-column permutation importance method #2302
Updated logging information in AutoMLSearch.__init__ to clarify pipeline generation #2263
- Documentation Changes
Minor changes to the release procedure #2230
Warning
- Breaking Changes
Updated start_iteration_callback to accept a pipeline instance instead of a pipeline class and no longer accept pipeline parameters as a parameter #2290
Moved default_parameters to ComponentGraph from PipelineBase. A pipeline’s default_parameters is now accessible via pipeline.component_graph.default_parameters #2307
- v0.24.1 May. 16, 2021
- Documentation Changes
Capped Sphinx version under 4.0.0 #2244
- v0.24.0 May. 04, 2021
- Enhancements
Added date_index as a required parameter for TimeSeries problems #2217
Changed the OneHotEncoder to return the transformed columns as booleans rather than floats #2170
Added Oversampler transformer component to EvalML #2079
Added Undersampler to AutoMLSearch, as well as arguments _sampler_method and sampler_balanced_ratio #2128
Updated prediction explanations functions to allow pipelines with XGBoost estimators #2162
Added partial dependence for datetime columns #2180
Updated precision-recall curve with positive label index argument, and fixed it for 2D predicted probabilities #2090
Added pct_null_rows to HighlyNullDataCheck #2211
Added a standalone AutoML search method for convenience, which runs data checks and then runs automl #2152
Made the first batch of AutoML have a predefined order, with linear models first and complex models last #2223 #2225
Added sampling dictionary support to BalancedClassificationSampler #2235
- Changes
Deleted baseline pipeline classes #2202
Reverted user specified date feature PR #2155 until pmdarima installation fix is found #2214
Updated pipeline API to accept component graph and other class attributes as instance parameters. Old pipeline API still works but will not be supported long-term. #2091
Removed all old datasplitters from EvalML #2193
Deleted make_pipeline_from_components #2218
- Documentation Changes
- Testing Changes
Use machineFL user token for dependency update bot, and add more reviewers #2189
Warning
- Breaking Changes
All baseline pipeline classes (BaselineBinaryPipeline, BaselineMulticlassPipeline, BaselineRegressionPipeline, etc.) have been deleted #2202
Updated pipeline API to accept component graph and other class attributes as instance parameters. Old pipeline API still works but will not be supported long-term. Pipelines can now be initialized by specifying the component graph as the first parameter, and then passing in optional arguments such as custom_name, parameters, etc. For example, BinaryClassificationPipeline(["Random Forest Classifier"], parameters={}). #2091
Removed all old datasplitters from EvalML #2193
Deleted utility method make_pipeline_from_components #2218
- v0.23.0 Apr. 20, 2021
- Enhancements
Refactored EngineBase and SequentialEngine API and added DaskEngine #1975
Added optional engine argument to AutoMLSearch #1975
Added a warning about how time series support is still in beta when a user passes in a time series problem to AutoMLSearch #2118
Added NaturalLanguageNaNDataCheck data check #2122
Added ValueError to partial_dependence to prevent users from computing partial dependence on columns with all NaNs #2120
Added standard deviation of cv scores to rankings table #2154
- Fixes
Fixed BalancedClassificationDataCVSplit, BalancedClassificationDataTVSplit, and BalancedClassificationSampler to use minority:majority ratio instead of majority:minority #2077
Fixed bug where two-way partial dependence plots with categorical variables were not working correctly #2117
Fixed bug where hyperparameters were not displaying properly for pipelines with a list component_graph and duplicate components #2133
Fixed bug where pipeline_parameters argument in AutoMLSearch was not applied to pipelines passed in as allowed_pipelines #2133
Fixed bug where AutoMLSearch was not applying custom hyperparameters to pipelines with a list component_graph and duplicate components #2133
- Changes
Removed hyperparameter_ranges from Undersampler and renamed balanced_ratio to sampling_ratio for samplers #2113
Renamed TARGET_BINARY_NOT_TWO_EXAMPLES_PER_CLASS data check message code to TARGET_MULTICLASS_NOT_TWO_EXAMPLES_PER_CLASS #2126
Modified one-way partial dependence plots of categorical features to display data with a bar plot #2117
Renamed score column for automl.rankings to mean_cv_score #2135
Removed ‘warning’ from docs tool output #2031
Warning
- Breaking Changes
Renamed balanced_ratio to sampling_ratio for the BalancedClassificationDataCVSplit, BalancedClassificationDataTVSplit, BalancedClassificationSampler, and Undersampler #2113
Deleted the “errors” key from automl results #1975
Deleted the raise_and_save_error_callback and the log_and_save_error_callback #1975
Fixed BalancedClassificationDataCVSplit, BalancedClassificationDataTVSplit, and BalancedClassificationSampler to use minority:majority ratio instead of majority:minority #2077
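The #2077 fix expresses class balance as minority:majority, so the ratio lies in (0, 1] and smaller values mean more severe imbalance. A minimal sketch of that convention follows; the helper names and the 0.25 threshold passed as sampling_ratio are illustrative assumptions, not EvalML defaults.

```python
from collections import Counter


def minority_majority_ratio(y):
    """Size of the smallest class divided by the size of the largest class.

    The result lies in (0, 1]; 1.0 means perfectly balanced classes."""
    counts = Counter(y).values()
    return min(counts) / max(counts)


def is_severely_imbalanced(y, sampling_ratio=0.25):
    # Hypothetical rule: resample when the minority class is smaller than
    # `sampling_ratio` times the majority class.
    return minority_majority_ratio(y) < sampling_ratio


print(minority_majority_ratio([0] * 9 + [1]))  # 0.111..., i.e. 1:9
```

With majority:minority the same data would give 9.0, an unbounded number that grows with imbalance, which is harder to compare against a fixed threshold.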
- v0.22.0 Apr. 06, 2021
- Enhancements
Added a GitHub Action for linux_unit_tests #2013
Added recommended actions for InvalidTargetDataCheck, updated _make_component_list_from_actions to address new action, and added TargetImputer component #1989
Updated AutoMLSearch._check_for_high_variance to not emit RuntimeWarning #2024
Added exception when pipeline passed to explain_predictions is a Stacked Ensemble pipeline #2033
Added sensitivity at low alert rates as an objective #2001
Added Undersampler transformer component #2030
- Fixes
Updated Engine’s train_batch to apply undersampling #2038
Fixed bug where Time Series Classification pipelines were not encoding targets in predict and predict_proba #2040
Fixed data splitting errors if target is float for classification problems #2050
Pinned docutils to <0.17 to fix ReadtheDocs warning issues #2088
Testing Changes
- v0.21.0 Mar. 24, 2021
- Enhancements
Changed AutoMLSearch to default optimize_thresholds to True #1943
Added multiple oversampling and undersampling sampling methods as data splitters for imbalanced classification #1775
Added params to balanced classification data splitters for visibility #1966
Updated make_pipeline to not add Imputer if input data does not have numeric or categorical columns #1967
Updated ClassImbalanceDataCheck to better handle multiclass imbalances #1986
Added recommended actions for the output of data check’s validate method #1968
Added error message for partial_dependence when features are mostly the same value #1994
Updated OneHotEncoder to drop one redundant feature by default for features with two categories #1997
Added a PolynomialDecomposer component #1992
Added DateTimeNaNDataCheck data check #2039
Documentation Changes
Warning
- Breaking Changes
Changed AutoMLSearch to default optimize_thresholds to True #1943
Removed data_checks parameter, data_check_results and data checks logic from AutoMLSearch. To run the data checks which were previously run by default in AutoMLSearch, please call DefaultDataChecks().validate(X_train, y_train) or take a look at our documentation for more examples. #1935
Deleted random_state argument #1985
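With two categories, one one-hot column is always the complement of the other, which is why #1997 drops one of them by default. The sketch below is a hypothetical single-column encoder: dropping the first category in sorted order mirrors scikit-learn's OneHotEncoder(drop="if_binary") behaviour, and the col_<category> column naming is an assumption, not EvalML's output format.

```python
def one_hot(values, drop_binary=True):
    """Hypothetical one-hot encoder for a single categorical column.

    For a feature with exactly two categories, one indicator column is
    redundant (it always equals NOT the other), so it is dropped when
    drop_binary=True."""
    categories = sorted(set(values))
    if drop_binary and len(categories) == 2:
        categories = categories[1:]  # keep a single indicator column
    return {f"col_{cat}": [value == cat for value in values] for cat in categories}


print(one_hot(["no", "yes", "yes"]))  # {'col_yes': [False, True, True]}
```

Columns with three or more categories keep every indicator, because no single column can be derived from one other.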
- v0.20.0 Mar. 10, 2021
- Enhancements
Added a GitHub Action for detecting dependency changes #1933
Created a separate CV split to train stacked ensembler on for AutoMLSearch #1814
Added a GitHub Action for Linux unit tests #1846
Added ARIMARegressor estimator #1894
Added DataCheckAction class and DataCheckActionCode enum #1896
Updated Woodwork requirement to v0.0.10 #1900
Added BalancedClassificationDataCVSplit and BalancedClassificationDataTVSplit to AutoMLSearch #1875
Updated default classification data splitter to use downsampling for highly imbalanced data #1875
Updated describe_pipeline to return more information, including id of pipelines used for ensemble models #1909
Added utility method to create list of components from a list of DataCheckAction #1907
Updated validate method to include an action key in returned dictionary for all DataCheck and DataChecks #1916
Aggregated the SHAP values for predictions whose provenance is known, e.g. OHE, text, and date-time #1901
Improved error message when custom objective is passed as a string in pipeline.score #1941
Added score_pipelines and train_pipelines methods to AutoMLSearch #1913
Added support for pandas version 1.2.0 #1708
Added score_batch and train_batch abstract methods to EngineBase and implementations in SequentialEngine #1913
Added ability to handle index columns in AutoMLSearch and DataChecks #2138
- Fixes
Removed CI check for check_dependencies_updated_linux #1950
Added metaclass for time series pipelines and fixed binary classification pipeline predict not using objective if it is passed as a named argument #1874
Fixed stack trace in prediction explanation functions caused by mixed string/numeric pandas column names #1871
Fixed stack trace caused by passing pipelines with duplicate names to AutoMLSearch #1932
Fixed AutoMLSearch.get_pipelines returning pipelines with the same attributes #1958
- Changes
Reversed GitHub Action for Linux unit tests until a fix for report generation is found #1920
Updated add_results in AutoMLAlgorithm to take in entire pipeline results dictionary from AutoMLSearch #1891
Updated ClassImbalanceDataCheck to look for severe class imbalance scenarios #1905
Deleted the explain_prediction function #1915
Removed HighVarianceCVDataCheck and converted it to an AutoMLSearch method instead #1928
Removed warning in InvalidTargetDataCheck returned when numeric binary classification targets are not (0, 1) #1959
- Documentation Changes
Updated model_understanding.ipynb to demo the two-way partial dependence capability #1919
Testing Changes
Warning
- v0.19.0 Feb. 23, 2021
- Enhancements
Added a GitHub Action for Python windows unit tests #1844
Added a GitHub Action for checking updated release notes #1849
Added a GitHub Action for Python lint checks #1837
Adjusted explain_prediction, explain_predictions and explain_predictions_best_worst to handle time series problems #1818
Updated InvalidTargetDataCheck to check for mismatched indices in target and features #1816
Updated Woodwork structures returned from components to support Woodwork logical type overrides set by the user #1784
Updated estimators to keep track of input feature names during fit() #1794
Updated visualize_decision_tree to include feature names in output #1813
Added is_bounded_like_percentage property for objectives. If true, the calculate_percent_difference method will return the absolute difference rather than relative difference #1809
Added full error traceback to AutoMLSearch logger file #1840
Changed TargetEncoder to preserve custom indices in the data #1836
Refactored explain_predictions and explain_predictions_best_worst to only compute features once for all rows that need to be explained #1843
Added custom random undersampler data splitter for classification #1857
Updated OutliersDataCheck implementation to calculate the probability of having no outliers #1855
Added Engines pipeline processing API #1838
- Fixes
Changed EngineBase random_state arg to random_seed and same for user guide docs #1889
- Changes
Modified calculate_percent_difference so that division by 0 is now inf rather than nan #1809
Removed text_columns parameter from LSA and TextFeaturizer components #1652
Added random_seed as an argument to our automl/pipeline/component API. Using random_state will raise a warning #1798
Added DataCheckError message in InvalidTargetDataCheck if input target is None and removed exception raised #1866
Documentation Changes
Warning
- Breaking Changes
Added a deprecation warning to explain_prediction. It will be deleted in the next release. #1860
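The two #1809 entries describe how calculate_percent_difference handles edge cases. Below is a simplified sketch under stated assumptions: percent_difference is a hypothetical function, the sign follows greater_is_better, a zero baseline yields inf instead of nan, and percentage-like metrics report the absolute difference in percentage points. It is not EvalML's implementation.

```python
import math


def percent_difference(score, baseline, greater_is_better=True,
                       bounded_like_percentage=False):
    """Hypothetical sketch: percent improvement of `score` over `baseline`."""
    if math.isclose(score, baseline, abs_tol=1e-10):
        return 0.0
    # A zero baseline would divide by zero; report inf rather than nan
    if math.isclose(baseline, 0.0, abs_tol=1e-10) and not bounded_like_percentage:
        return math.inf
    improvement = score - baseline if greater_is_better else baseline - score
    if bounded_like_percentage:
        # Metrics already on a 0..1 scale: absolute difference, in percentage points
        return 100 * improvement
    # Otherwise: relative difference with respect to the baseline's magnitude
    return 100 * improvement / abs(baseline)


print(percent_difference(1.5, 1.0))  # 50.0
```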
- v0.18.2 Feb. 10, 2021
- Enhancements
Added uniqueness score data check #1785
Added “dataframe” output format for prediction explanations #1781
Updated LightGBM estimators to handle pandas.MultiIndex #1770
Sped up permutation importance for some pipelines #1762
Added sparsity data check #1797
Confirmed support for threshold tuning for binary time series classification problems #1803
Fixes
Changes
- Documentation Changes
Added section on conda to the contributing guide #1771
Updated release process to reflect freezing main before perf tests #1787
Moved some PRs to the right section of the release notes #1789
Tweaked README.md #1800
Fixed back arrow on install page docs #1795
Fixed docstring for ClassImbalanceDataCheck.validate() #1817
Testing Changes
- v0.18.1 Feb. 1, 2021
- Enhancements
Added graph_t_sne as a visualization tool for high dimensional data #1731
Added the ability to see the linear coefficients of features in linear models terms #1738
Added support for scikit-learn v0.24.0 #1733
Added support for scipy v1.6.0 #1752
Added SVM Classifier and Regressor to estimators #1714 #1761
Testing Changes
Warning
- v0.18.0 Jan. 26, 2021
- Enhancements
Added RMSLE, MSLE, and MAPE to core objectives while checking for negative target values in invalid_targets_data_check #1574
Added validation checks for binary problems with regression-like datasets and multiclass problems without true multiclass targets in invalid_targets_data_check #1665
Added time series support for make_pipeline #1566
Added target name for output of pipeline predict method #1578
Added multiclass check to InvalidTargetDataCheck for two examples per class #1596
Added support for graphviz v0.16 #1657
Enhanced time series pipelines to accept empty features #1651
Added KNN Classifier to estimators. #1650
Added support for list inputs for objectives #1663
Added support for AutoMLSearch to handle time series classification pipelines #1666
Enhanced DelayedFeaturesTransformer to encode categorical features and targets before delaying them #1691
Added 2-way dependence plots. #1690
Added ability to directly iterate through components within Pipelines #1583
- Fixes
Fixed inconsistent attributes and added Exceptions to docs #1673
Fixed TargetLeakageDataCheck to use Woodwork mutual_information rather than using Pandas’ Pearson Correlation #1616
Fixed thresholding for pipelines in AutoMLSearch to only threshold binary classification pipelines #1622 #1626
Updated load_data to return Woodwork structures and updated default parameter value for index to None #1610
Pinned scipy at < 1.6.0 while we work on adding support #1629
Fixed data check message formatting in AutoMLSearch #1633
Addressed stacked ensemble component for scikit-learn v0.24 support by setting shuffle=True for default CV #1613
Fixed bug where Imputer reset the index on X #1590
Fixed AutoMLSearch stacktrace when a custom objective was passed in as a primary objective or additional objective #1575
Fixed custom index bug for MAPE objective #1641
Fixed index bug for TextFeaturizer and LSA components #1644
Limited load_fraud dataset loaded into automl.ipynb #1646
Updated add_to_rankings to update AutoMLSearch.best_pipeline when necessary #1647
Fixed bug where time series baseline estimators were not receiving gap and max_delay in AutoMLSearch #1645
Fixed jupyter notebooks to help the RTD buildtime #1654
Added positive_only objectives to non_core_objectives #1661
Fixed stacking argument n_jobs for IterativeAlgorithm #1706
Updated CatBoost estimators to return self in .fit() rather than the underlying model for consistency #1701
Added ability to initialize pipeline parameters in AutoMLSearch constructor #1676
- Changes
Added labeling to graph_confusion_matrix #1632
Rerunning search for AutoMLSearch now results in a message being thrown rather than failing the search, and removed has_searched property #1647
Changed tuner class to allow and ignore single parameter values as input #1686
Capped LightGBM version limit to remove bug in docs #1711
Removed support for np.random.RandomState in EvalML #1727
- Documentation Changes
Updated Model Understanding in the user guide to include visualize_decision_tree #1678
Updated docs to include information about AutoMLSearch callback parameters and methods #1577
Updated docs to prompt users to install graphviz on Mac #1656
Added infer_feature_types to the start.ipynb guide #1700
Added multicollinearity data check to API reference and docs #1707
Testing Changes
Warning
- Breaking Changes
Removed has_searched property from AutoMLSearch #1647
Components and pipelines return Woodwork data structures instead of pandas data structures #1668
Removed support for np.random.RandomState in EvalML. Rather than passing np.random.RandomState as component and pipeline random_state values, we use int random_seed #1727
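The logarithmic objectives added in #1574 are only defined for non-negative values, which is why invalid_targets_data_check looks for negative targets. Below is a stdlib-only sketch of the standard formulas, MSLE = mean((log(1 + y) - log(1 + y_hat))^2), RMSLE = sqrt(MSLE), and MAPE = 100 * mean(|(y - y_hat) / y|). The function names are hypothetical and this is not EvalML's objective code.

```python
import math


def _check_non_negative(y_true, y_pred):
    # log1p is undefined below -1 and the metric is only meaningful for
    # non-negative data, so reject negative values up front
    if any(v < 0 for v in list(y_true) + list(y_pred)):
        raise ValueError("MSLE/RMSLE require non-negative targets and predictions")


def msle(y_true, y_pred):
    """Mean squared logarithmic error."""
    _check_non_negative(y_true, y_pred)
    return sum((math.log1p(t) - math.log1p(p)) ** 2
               for t, p in zip(y_true, y_pred)) / len(y_true)


def rmsle(y_true, y_pred):
    """Root mean squared logarithmic error."""
    return math.sqrt(msle(y_true, y_pred))


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent; undefined if any y_true is 0."""
    if any(t == 0 for t in y_true):
        raise ValueError("MAPE is undefined when a true value is zero")
    return 100 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


print(mape([100, 200], [110, 180]))  # both predictions are off by 10%
```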
- v0.17.0 Dec. 29, 2020
- Enhancements
Added save_plot that allows for saving figures from different backends #1588
Added LightGBM Regressor to regression components #1459
Added visualize_decision_tree for tree visualization with decision_tree_data_from_estimator and decision_tree_data_from_pipeline to reformat tree structure output #1511
Added DFS Transformer component into transformer components #1454
Added MAPE to the standard metrics for time series problems and updated objectives #1510
Added graph_prediction_vs_actual_over_time and get_prediction_vs_actual_over_time_data to the model understanding module for time series problems #1483
Added a ComponentGraph class that will support future pipelines as directed acyclic graphs #1415
Updated data checks to accept Woodwork data structures #1481
Added parameter to InvalidTargetDataCheck to show only top unique values rather than all unique values #1485
Added multicollinearity data check #1515
Added baseline pipeline and components for time series regression problems #1496
Added more information to users about ensembling behavior in AutoMLSearch #1527
Added Woodwork support for more utility and graph methods #1544
Changed DateTimeFeaturizer to encode features as int #1479
Updated AutoMLSearch.best_pipeline to return trained pipelines #1547
Added utility method so that users can set feature types without having to learn about Woodwork directly #1555
Added Linear Discriminant Analysis transformer for dimensionality reduction #1331
Added multiclass support for partial_dependence and graph_partial_dependence #1554
Added TimeSeriesBinaryClassificationPipeline and TimeSeriesMulticlassClassificationPipeline classes #1528
Added make_data_splitter method for easier automl data split customization #1568
Integrated ComponentGraph class into Pipelines for full non-linear pipeline support #1543
Updated AutoMLSearch constructor to take training data instead of search and add_to_leaderboard #1597
Updated split_data helper args #1597
Added problem type utils is_regression, is_classification, is_timeseries #1597
Renamed AutoMLSearch data_split arg to data_splitter #1569
- Fixes
Fixed AutoML not passing CV folds to DefaultDataChecks for usage by ClassImbalanceDataCheck #1619
Fixed Windows CI jobs: installed numba via conda, required for shap #1490
Added custom-index support for reset-index-get_prediction_vs_actual_over_time_data #1494
Fixed generate_pipeline_code to account for boolean and None differences between Python and JSON #1524 #1531
Set max value for plotly and xgboost versions while we debug CI failures with newer versions #1532
Undid version pinning for plotly #1533
Fixed ReadTheDocs build by updating the version of setuptools #1561
Set random_state of data splitter in AutoMLSearch to take int to keep consistency in the resulting splits #1579
Pinned sklearn version while we work on adding support #1594
Pinned pandas at <1.2.0 while we work on adding support #1609
Pinned graphviz at < 0.16 while we work on adding support #1609
- Changes
Reverted save_graph #1550 to resolve kaleido build issues #1585
Updated circleci badge to apply to main #1489
Added script to generate github markdown for releases #1487
Updated selection using pandas dtypes to selecting using Woodwork logical types #1551
Updated dependencies to fix ImportError: cannot import name 'MaskedArray' from 'sklearn.utils.fixes' error and to address Woodwork and Featuretools dependencies #1540
Made get_prediction_vs_actual_data() a public method #1553
Updated Woodwork version requirement to v0.0.7 #1560
Moved data splitters from evalml.automl.data_splitters to evalml.preprocessing.data_splitters #1597
Renamed “# Testing” in automl log output to “# Validation” #1597
- Testing Changes
Set n_jobs=1 in most unit tests to reduce memory #1505
Warning
- Breaking Changes
Updated minimal dependencies: numpy>=1.19.1, pandas>=1.1.0, scikit-learn>=0.23.1, scikit-optimize>=0.8.1
Updated AutoMLSearch.best_pipeline to return a trained pipeline. Pass in train_best_pipeline=False to AutoMLSearch in order to return an untrained pipeline.
Pipeline component instances can no longer be iterated through using Pipeline.component_graph #1543
Updated AutoMLSearch constructor to take training data instead of search and add_to_leaderboard #1597
Updated split_data helper args #1597
Moved data splitters from evalml.automl.data_splitters to evalml.preprocessing.data_splitters #1597
Renamed AutoMLSearch data_split arg to data_splitter #1569
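The generate_pipeline_code fix (#1524 #1531) concerns a simple mismatch: JSON spells booleans and nulls as true, false, and null, while Python uses True, False, and None, so parameters serialized with json.dumps cannot be pasted into Python source as-is. A minimal illustration follows; the component and parameter names are hypothetical, and using repr to emit valid Python literals is an assumed approach, not necessarily EvalML's.

```python
import json

# Hypothetical component parameters containing a boolean and a None value
parameters = {"My Component": {"use_feature": True, "fill_value": None}}

as_json = json.dumps(parameters)  # JSON spelling: true / null
as_python = repr(parameters)      # Python spelling: True / None


def to_python_source(params):
    """Render parameters as a line of Python source that can be pasted into code."""
    return f"parameters = {params!r}"


print(as_json)    # {"My Component": {"use_feature": true, "fill_value": null}}
print(as_python)  # {'My Component': {'use_feature': True, 'fill_value': None}}
```

Evaluating the JSON text as Python fails with a NameError on true, whereas the repr form round-trips.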
- v0.16.1 Dec. 1, 2020
- v0.16.0 Nov. 24, 2020
- Enhancements
Updated pipelines and make_pipeline to accept Woodwork inputs #1393
Updated components to accept Woodwork inputs #1423
Added ability to freeze hyperparameters for AutoMLSearch #1284
Added Target Encoder into transformer components #1401
Added callback for error handling in AutoMLSearch #1403
Added the index id to the explain_predictions_best_worst output to help users identify which rows in their data are included #1365
The top_k features displayed in explain_predictions_* functions are now determined by the magnitude of shap values as opposed to the top_k largest and smallest shap values. #1374
Added a problem type for time series regression #1386
Added an is_defined_for_problem_type method to ObjectiveBase #1386
Added a random_state parameter to make_pipeline_from_components function #1411
Added DelayedFeaturesTransformer #1396
Added a TimeSeriesRegressionPipeline class #1418
Removed core-requirements.txt from the package distribution #1429
Updated data check messages to include “code” and “details” fields #1451, #1462
Added a TimeSeriesSplit data splitter for time series problems #1441
Added a problem_configuration parameter to AutoMLSearch #1457
- Fixes
Fixed IndexError raised in AutoMLSearch when ensembling = True but only one pipeline to iterate over #1397
Fixed stacked ensemble input bug and LightGBM warning and bug in AutoMLSearch #1388
Updated enum classes to show possible enum values as attributes #1391
Updated calls to Woodwork’s to_pandas() to to_series() and to_dataframe() #1428
Fixed bug in OHE where column names were not guaranteed to be unique #1349
Fixed bug with percent improvement of ExpVariance objective on data with highly skewed target #1467
Fixed SimpleImputer error which occurs when all features are bool type #1215
- Changes
Changed OutliersDataCheck to return the list of columns, rather than rows, that contain outliers #1377
Simplified and cleaned output for Code Generation #1371
Updated data checks to return dictionary of warnings and errors instead of a list #1448
Updated AutoMLSearch to pass Woodwork data structures to every pipeline (instead of pandas DataFrames) #1450
Updated AutoMLSearch to default to max_batches=1 instead of max_iterations=5 #1452
Updated _evaluate_pipelines to consolidate side effects #1410
- Documentation Changes
Added description of CLA to contributing guide, updated description of draft PRs #1402
Updated documentation to include all data checks, DataChecks, and usage of data checks in AutoML #1412
Updated docstrings from np.array to np.ndarray #1417
Added section on stacking ensembles in AutoMLSearch documentation #1425
- Testing Changes
Removed category_encoders from test-requirements.txt #1373
Tweak codecov.io settings again to avoid flakes #1413
Modified make lint to check notebook versions in the docs #1431
Modified make lint-fix to standardize notebook versions in the docs #1431
Use new version of pull request GitHub Action for dependency check #1443
Reduced number of workers for tests to 4 #1447
Warning
- Breaking Changes
The top_k and top_k_features parameters in explain_predictions_* functions now return k features as opposed to 2 * k features #1374
Renamed problem_type to problem_types in RegressionObjective, BinaryClassificationObjective, and MulticlassClassificationObjective #1319
Data checks now return a dictionary of warnings and errors instead of a list #1448
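The change in #1374, from the k largest plus k smallest SHAP values (2 * k features) to the k values of largest magnitude (k features), can be illustrated with a small sketch. Both function names below are hypothetical and this is not EvalML's code; SHAP values are assumed to be given as a plain feature-to-value mapping.

```python
from typing import Dict, List


def top_k_by_magnitude(shap_values: Dict[str, float], k: int) -> List[str]:
    """New-style selection: the k features with the largest |SHAP value|."""
    return sorted(shap_values, key=lambda f: abs(shap_values[f]), reverse=True)[:k]


def top_k_largest_and_smallest(shap_values: Dict[str, float], k: int) -> List[str]:
    """Old-style selection: k most positive plus k most negative (2 * k features)."""
    ordered = sorted(shap_values, key=lambda f: shap_values[f])
    return ordered[-k:][::-1] + ordered[:k]


shap = {"age": 0.40, "income": -0.35, "zip": 0.05, "tenure": -0.02}
print(top_k_by_magnitude(shap, 2))          # ['age', 'income']
print(top_k_largest_and_smallest(shap, 2))  # ['age', 'zip', 'income', 'tenure']
```

Ranking by magnitude keeps strongly negative contributors (income) while dropping weak positive ones (zip), which is why the new output is both shorter and more informative.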
- v0.15.0 Oct. 29, 2020
- Enhancements
Added stacked ensemble component classes (StackedEnsembleClassifier, StackedEnsembleRegressor) #1134
Added stacked ensemble components to AutoMLSearch #1253
Added DecisionTreeClassifier and DecisionTreeRegressor to AutoML #1255
Added graph_prediction_vs_actual in model_understanding for regression problems #1252
Added a parameter to OneHotEncoder to enable filtering which features to encode #1249
Added percent-better-than-baseline for all objectives to automl.results #1244
Added HighVarianceCVDataCheck and replaced synonymous warning in AutoMLSearch #1254
Added PCA Transformer component for dimensionality reduction #1270
Added generate_pipeline_code and generate_component_code to allow for code generation given a pipeline or component instance #1306
Updated AutoMLSearch to support Woodwork data structures #1299
Added cv_folds to ClassImbalanceDataCheck and added this check to DefaultDataChecks #1333
Make max_batches argument to AutoMLSearch.search public #1320
Added text support to automl search #1062
Added _pipelines_per_batch as a private argument to AutoMLSearch #1355
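HighVarianceCVDataCheck (#1254) warns when cross-validation scores vary widely across folds. A minimal sketch of one way to express such a check follows, assuming the coefficient of variation (population standard deviation divided by the absolute mean) is compared against a threshold. The function name and the 0.2 default are assumptions, not details taken from these notes.

```python
import statistics
from typing import Sequence


def has_high_cv_variance(scores: Sequence[float], threshold: float = 0.2) -> bool:
    """Return True when fold scores spread widely relative to their mean.

    Hypothetical check based on the coefficient of variation |pstdev / mean|.
    """
    mean = statistics.mean(scores)
    spread = statistics.pstdev(scores)
    if mean == 0:
        # The ratio is undefined; treat any spread at all as high variance.
        return spread > 0
    return abs(spread / mean) > threshold


print(has_high_cv_variance([0.81, 0.80, 0.82]))  # False
print(has_high_cv_variance([0.9, 0.3, 0.6]))     # True
```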
- Fixes
Fixed ML performance issue with ordered datasets: always shuffle data in automl’s default CV splits #1265
Fixed broken evalml info CLI command #1293
Fixed boosting type='rf' for LightGBM Classifier, as well as num_leaves error #1302
Fixed bug in explain_predictions_best_worst where a custom index in the target variable would cause a ValueError #1318
Added stacked ensemble estimators to evalml.pipelines.__init__ file #1326
Fixed bug in OHE where calls to transform were not deterministic if top_n was less than the number of categories in a column #1324
Fixed LightGBM warning messages during AutoMLSearch #1342
Fix warnings thrown during AutoMLSearch in HighVarianceCVDataCheck #1346
Fixed bug where TrainingValidationSplit would return invalid location indices for dataframes with a custom index #1348
Fixed bug where the AutoMLSearch random_state was not being passed to the created pipelines #1321
- Changes
Allow add_to_rankings to be called before AutoMLSearch is called #1250
Removed Graphviz from test-requirements to add to requirements.txt #1327
Removed max_pipelines parameter from AutoMLSearch #1264
Include editable installs in all install make targets #1335
Made pip dependencies featuretools and nlp_primitives core dependencies #1062
Removed PartOfSpeechCount from TextFeaturizer transform primitives #1062
Added warning for partial_dependency when the feature includes null values #1352
- Documentation Changes
Fixed and updated code blocks in Release Notes #1243
Added DecisionTree estimators to API Reference #1246
Changed class inheritance display to flow vertically #1248
Updated cost-benefit tutorial to use a holdout/test set #1159
Added evalml info command to documentation #1293
Miscellaneous doc updates #1269
Removed conda pre-release testing from the release process document #1282
Updates to contributing guide #1310
Added Alteryx footer to docs with Twitter and Github link #1312
Added documentation for evalml installation for Python 3.6 #1322
Added documentation changes to make the API Docs easier to understand #1323
Fixed documentation for feature_importance #1353
Added tutorial for running AutoML with text data #1357
Added documentation for woodwork integration with automl search #1361
- Testing Changes
Added tests for jupyter_check to handle IPython #1256
Cleaned up make_pipeline tests to test for all estimators #1257
Added a test to check conda build after merge to main #1247
Removed unnecessary code in __main__.py that was lacking codecov #1293
Codecov: round coverage up instead of down #1334
Add DockerHub credentials to CI testing environment #1356
Add DockerHub credentials to conda testing environment #1363
Warning
- Breaking Changes
Renamed LabelLeakageDataCheck to TargetLeakageDataCheck #1319
max_pipelines parameter has been removed from AutoMLSearch. Please use max_iterations instead. #1264
AutoMLSearch.search() will now log a warning if the input is not a Woodwork data structure (pandas, numpy) #1299
Make max_batches argument to AutoMLSearch.search public #1320
Removed unused argument feature_types from AutoMLSearch.search #1062
- v0.14.1 Sep. 29, 2020
- Enhancements
Updated partial dependence methods to support calculating numeric columns in a dataset with non-numeric columns #1150
Added get_feature_names on OneHotEncoder #1193
Added detect_problem_type to problem_type/utils.py to automatically detect the problem type given targets #1194
Added LightGBM to AutoMLSearch #1199
Updated scikit-learn and scikit-optimize to use latest versions (0.23.2 and 0.8.1, respectively) #1141
Added __str__ and __repr__ for pipelines and components #1218
Included internal target check for both training and validation data in AutoMLSearch #1226
Added ProblemTypes.all_problem_types helper to get list of supported problem types #1219
Added DecisionTreeClassifier and DecisionTreeRegressor classes #1223
DataChecks can now be parametrized by passing a list of DataCheck classes and a parameter dictionary #1167
Added first CV fold score as validation score in AutoMLSearch.rankings #1221
Updated flake8 configuration to enable linting on __init__.py files #1234
Refined make_pipeline_from_components implementation #1204
- Changes
Added allow_writing_files as a named argument to CatBoost estimators #1202
Added solver and multi_class as named arguments to LogisticRegressionClassifier #1202
Replaced pipeline’s ._transform method to evaluate all the preprocessing steps of a pipeline with .compute_estimator_features #1231
Changed default large dataset train/test splitting behavior #1205
- Documentation Changes
Included description of how to access the component instances and features for pipeline user guide #1163
Updated API docs to refer to target as “target” instead of “labels” for non-classification tasks and minor docs cleanup #1160
Added Class Imbalance Data Check to api_reference.rst #1190 #1200
Added pipeline properties to API reference #1209
Clarified what the objective parameter in AutoML is used for in AutoML API reference and AutoML user guide #1222
Updated API docs to include skopt.space.Categorical option for component hyperparameter range definition #1228
Added install documentation for libomp in order to use LightGBM on Mac #1233
Improved description of max_iterations in documentation #1212
Removed unused code from sphinx conf #1235
Testing Changes
Warning
- Breaking Changes
DefaultDataChecks now accepts a problem_type parameter that must be specified #1167
Pipeline’s ._transform method to evaluate all the preprocessing steps of a pipeline has been replaced with .compute_estimator_features #1231
get_objectives has been renamed to get_core_objectives. This function will now return a list of valid objective instances #1230
- v0.13.2 Sep. 17, 2020
- Enhancements
Added output_format field to explain predictions functions #1107
Modified get_objective and get_objectives to be able to return any objective in evalml.objectives #1132
Added a return_instance boolean parameter to get_objective #1132
Added ClassImbalanceDataCheck to determine whether target imbalance falls below a given threshold #1135
Added label encoder to LightGBM for binary classification #1152
Added labels for the row index of confusion matrix #1154
Added AutoMLSearch object as another parameter in search callbacks #1156
Added the corresponding probability threshold for each point displayed in graph_roc_curve #1161
Added __eq__ for ComponentBase and PipelineBase #1178
Added support for multiclass classification for roc_curve #1164
Added categories accessor to OneHotEncoder for listing the categories associated with a feature #1182
Added utility function to create pipeline instances from a list of component instances #1176
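ClassImbalanceDataCheck (#1135) determines whether target imbalance falls below a given threshold. A plain-Python sketch of one plausible reading follows, assuming the check reports every class whose share of the target is below the threshold; the helper name and the 0.1 default are assumptions, not the data check's confirmed behavior.

```python
from collections import Counter
from typing import Hashable, List, Sequence


def underrepresented_classes(y: Sequence[Hashable], threshold: float = 0.1) -> List[Hashable]:
    """Return classes whose fraction of the target is below `threshold`.

    Hypothetical helper for illustration; results are ordered from rarest to
    most common class.
    """
    counts = Counter(y)
    total = len(y)
    rarest_first = sorted(counts.items(), key=lambda item: item[1])
    return [label for label, n in rarest_first if n / total < threshold]


y = ["no"] * 95 + ["yes"] * 5
print(underrepresented_classes(y))  # ['yes']
```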
- Fixes
Fixed XGBoost column names for partial dependence methods #1104
Removed dead code validating column type from TextFeaturizer #1122
Fixed issue where Imputer cannot fit when there is None in a categorical or boolean column #1144
OneHotEncoder preserves the custom index in the input data #1146
Fixed representation for ModelFamily #1165
Removed duplicate nbsphinx dependency in dev-requirements.txt #1168
Users can now pass in any valid kwargs to all estimators #1157
Remove broken accessor OneHotEncoder.get_feature_names and unneeded base class #1179
Removed LightGBM Estimator from AutoML models #1186
- Documentation Changes
Fixed API docs for AutoMLSearch add_result_callback #1113
Added a step to our release process for pushing our latest version to conda-forge #1118
Added warning for missing ipywidgets dependency for using PipelineSearchPlots on JupyterLab #1145
Updated README.md example to load demo dataset #1151
Swapped mapping of breast cancer targets in model_understanding.ipynb #1170
Warning
- Breaking Changes
get_objective will now return a class definition rather than an instance by default #1132
Deleted OPTIONS dictionary in evalml.objectives.utils.py #1132
If specifying an objective by string, the string must now match the objective’s name field, case-insensitive #1132
- Passing “Cost Benefit Matrix”, “Fraud Cost”, “Lead Scoring”, “Mean Squared Log Error”, “Recall”, “Recall Macro”, “Recall Micro”, “Recall Weighted”, or “Root Mean Squared Log Error” to AutoMLSearch will now result in a ValueError rather than an ObjectiveNotFoundError #1132
Search callbacks start_iteration_callback and add_results_callback have changed to include a copy of the AutoMLSearch object as a third parameter #1156
Deleted OneHotEncoder.get_feature_names method which had been broken for a while, in favor of pipelines’ input_feature_names #1179
Deleted empty base class CategoricalEncoder which OneHotEncoder component was inheriting from #1176
Results from roc_curve will now return as a list of dictionaries with each dictionary representing a class #1164
max_pipelines now raises a DeprecationWarning and will be removed in the next release. max_iterations should be used instead. #1169
- v0.13.1 Aug. 25, 2020
- Enhancements
Added Cost-Benefit Matrix objective for binary classification #1038
Split fill_value into categorical_fill_value and numeric_fill_value for Imputer #1019
Added explain_predictions and explain_predictions_best_worst for explaining multiple predictions with SHAP #1016
Added new LSA component for text featurization #1022
Added guide on installing with conda #1041
Added a “cost-benefit curve” util method to graph cost-benefit matrix scores vs. binary classification thresholds #1081
Standardized error when calling transform/predict before fit for pipelines #1048
Added percent_better_than_baseline to AutoML search rankings and full rankings table #1050
Added one-way partial dependence and partial dependence plots #1079
Added “Feature Value” column to prediction explanation reports. #1064
Added max_batches parameter to AutoMLSearch #1087
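The Cost-Benefit Matrix objective (#1038), whose score is normalized per the #1099 fix, assigns a value to each cell of the binary confusion matrix. A minimal sketch follows, assuming the score is the total value of all predictions divided by the number of samples. The parameter names mirror the four confusion-matrix cells and are illustrative, not the objective's confirmed signature.

```python
from typing import Sequence


def cost_benefit_score(
    y_true: Sequence[int],
    y_pred: Sequence[int],
    true_positive: float,
    true_negative: float,
    false_positive: float,
    false_negative: float,
) -> float:
    """Average value per sample of binary predictions with labels 0 and 1.

    Hypothetical sketch: each prediction earns the value assigned to its
    confusion-matrix cell, and the total is divided by the number of samples.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    cell_values = {
        (1, 1): true_positive,
        (0, 0): true_negative,
        (0, 1): false_positive,  # actual 0, predicted 1
        (1, 0): false_negative,  # actual 1, predicted 0
    }
    total = sum(cell_values[(int(t), int(p))] for t, p in zip(y_true, y_pred))
    return total / len(y_true)


score = cost_benefit_score(
    y_true=[1, 0, 1, 0],
    y_pred=[1, 1, 0, 0],
    true_positive=10,
    true_negative=0,
    false_positive=-2,
    false_negative=-5,
)
print(score)  # 0.75
```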
- Fixes
Updated TextFeaturizer component to no longer require an internet connection to run #1022
Fixed non-deterministic element of TextFeaturizer transformations #1022
Added a StandardScaler to all ElasticNet pipelines #1065
Updated cost-benefit matrix to normalize score #1099
Fixed logic in calculate_percent_difference so that it can handle negative values #1100
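One common way to make a percent difference behave with negative values (#1100) is to divide by the absolute value of the baseline. That way, improving from a negative baseline is still reported as a positive change for a greater-is-better objective. The stand-alone function below is a sketch of that assumption, not EvalML's calculate_percent_difference. Returning NaN for a zero baseline is also an assumption.

```python
import math


def percent_difference(score: float, baseline_score: float) -> float:
    """Percent change of `score` relative to `baseline_score`.

    Dividing by abs(baseline_score) keeps the sign meaningful when the
    baseline is negative: moving from -1.0 to -0.5 is reported as +50%.
    The result is treated as undefined (NaN) when the baseline is zero.
    """
    if baseline_score == 0:
        return math.nan
    return 100 * (score - baseline_score) / abs(baseline_score)


print(percent_difference(-0.5, -1.0))  # 50.0
print(percent_difference(-2.0, -1.0))  # -100.0
```

Without the abs(), the first call would report -50.0, incorrectly signalling that a better score is worse than the baseline.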
- Changes
Added needs_fitting property to ComponentBase #1044
Updated references to data types to use datatype lists defined in evalml.utils.gen_utils #1039
Remove maximum version limit for SciPy dependency #1051
Moved all_components and other component importers into runtime methods #1045
Consolidated graphing utility methods under evalml.utils.graph_utils #1060
Made slight tweaks to how TextFeaturizer uses featuretools, and did some refactoring of that and of LSA #1090
Changed show_all_features parameter into importance_threshold, which allows for thresholding feature importance #1097, #1103
Warning
- v0.12.2 Aug. 6, 2020
- v0.12.0 Aug. 3, 2020
- Enhancements
Added string and categorical targets support for binary and multiclass pipelines and check for numeric targets for DetectLabelLeakage data check #932
Added clear exception for regression pipelines if target datatype is string or categorical #960
Added target column names and class labels in predict and predict_proba output for pipelines #951
Added _compute_shap_values and normalize_values to pipelines/explanations module #958
Added explain_prediction feature which explains single predictions with SHAP #974
Added Imputer to allow different imputation strategies for numerical and categorical dtypes #991
Added support for configuring logfile path using env var, and don’t create logger if there are filesystem errors #975
Updated catboost estimators’ default parameters and automl hyperparameter ranges to speed up fit time #998
- Fixes
Fixed ReadtheDocs warning failure regarding embedded gif #943
Removed incorrect parameter passed to pipeline classes in _add_baseline_pipelines #941
Added universal error for calling predict, predict_proba, transform, and feature_importances before fitting #969, #994
Made TextFeaturizer component and pip dependencies featuretools and nlp_primitives optional #976
Updated imputation strategy in automl to no longer limit impute strategy to most_frequent for all features if there are any categorical columns #991
Fixed UnboundLocalError for cv_pipeline when automl search errors #996
Fixed Imputer to reset dataframe index to preserve behavior expected from SimpleImputer #1009
- Changes
Moved get_estimators to evalml.pipelines.components.utils #934
Modified Pipelines to raise PipelineScoreError when they encounter an error during scoring #936
Moved evalml.model_families.list_model_families to evalml.pipelines.components.allowed_model_families #959
Renamed DateTimeFeaturization to DateTimeFeaturizer #977
Added check to stop search and raise an error if all pipelines in a batch return NaN scores #1015
- Documentation Changes
Updated README.md #963
Reworded message when errors are returned from data checks in search #982
Added section on understanding model predictions with explain_prediction to User Guide #981
Added a section to the user guide and api reference about how XGBoost and CatBoost are not fully supported. #992
Added custom components section in user guide #993
Updated FAQ section formatting #997
Updated release process documentation #1003
Warning
- Breaking Changes
get_estimators has been moved to evalml.pipelines.components.utils (previously was under evalml.pipelines.utils) #934
Removed the raise_errors flag in AutoML search. All errors during pipeline evaluation will be caught and logged. #936
evalml.model_families.list_model_families has been moved to evalml.pipelines.components.allowed_model_families #959
TextFeaturizer: the featuretools and nlp_primitives packages must be installed after installing evalml in order to use this component #976
Renamed DateTimeFeaturization to DateTimeFeaturizer #977
- v0.11.2 July 16, 2020
- Enhancements
Added NoVarianceDataCheck to DefaultDataChecks #893
Added text processing and featurization component TextFeaturizer #913, #924
Added additional checks to InvalidTargetDataCheck to handle invalid target data types #929
AutoMLSearch will now handle KeyboardInterrupt and prompt user for confirmation #915
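NoVarianceDataCheck (#893) looks for columns that carry no information. A rough sketch follows, assuming a column is flagged when it has at most one distinct non-missing value. The helper name and the treatment of None as missing are assumptions, not the data check's documented rules.

```python
from typing import Dict, List, Sequence


def no_variance_columns(data: Dict[str, Sequence[object]]) -> List[str]:
    """Return names of columns with at most one distinct non-missing value.

    Hypothetical helper; values must be hashable, and None counts as missing.
    """
    flagged = []
    for name, column in data.items():
        distinct = {value for value in column if value is not None}
        if len(distinct) <= 1:
            flagged.append(name)
    return flagged


frame = {"id": [1, 2, 3], "country": ["US", "US", "US"], "notes": [None, None, None]}
print(no_variance_columns(frame))  # ['country', 'notes']
```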
- Fixes
Makes automl results a read-only property #919
- Changes
Deleted static pipelines and refactored tests involving static pipelines, removed all_pipelines() and get_pipelines() #904
Moved list_model_families to evalml.model_family.utils #903
Updated all_pipelines, all_estimators, all_components to use the same mechanism for dynamically generating their elements #898
Rename master branch to main #918
Add pypi release github action #923
Updated AutoMLSearch.search stdout output and logging and removed tqdm progress bar #921
Moved automl config checks previously in search() to init #933
- Testing Changes
Cleaned up fixture names and usages in tests #895
Warning
- Breaking Changes
list_model_families has been moved to evalml.model_family.utils (previously was under evalml.pipelines.utils) #903
get_estimators has been moved to evalml.pipelines.components.utils (previously was under evalml.pipelines.utils) #934
Static pipeline definitions have been removed, but similar pipelines can still be constructed via creating an instance of PipelineBase #904
all_pipelines() and get_pipelines() utility methods have been removed #904
- v0.11.0 June 30, 2020
- Enhancements
Added multiclass support for ROC curve graphing #832
Added preprocessing component to drop features whose percentage of NaN values exceeds a specified threshold #834
Added data check to check for problematic target labels #814
Added PerColumnImputer that allows imputation strategies per column #824
Added transformer to drop specific columns #827
Added support for categories, handle_error, and drop parameters in OneHotEncoder #830 #897
Added preprocessing component to handle DateTime columns featurization #838
Added ability to clone pipelines and components #842
Define getter method for component parameters #847
Added utility methods to calculate and graph permutation importances #860, #880
Added new utility functions necessary for generating dynamic preprocessing pipelines #852
Added kwargs to all components #863
Updated AutoSearchBase to use dynamically generated preprocessing pipelines #870
Added SelectColumns transformer #873
Added ability to evaluate additional pipelines for automl search #874
Added default_parameters class property to components and pipelines #879
Added better support for disabling data checks in automl search #892
Added ability to save and load AutoML objects to file #888
Updated AutoSearchBase.get_pipelines to return an untrained pipeline instance #876
Saved learned binary classification thresholds in automl results cv data dict #876
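The preprocessing component from #834 drops features whose percentage of NaN values exceeds a specified threshold. A plain-Python sketch of that rule follows. The function name, the max_null_fraction parameter, and the convention that the threshold is a fraction between 0 and 1 are all assumptions made for illustration.

```python
import math
from typing import Dict, List, Optional, Sequence


def drop_null_columns(
    data: Dict[str, Sequence[Optional[float]]], max_null_fraction: float = 0.95
) -> Dict[str, List[Optional[float]]]:
    """Keep only columns whose fraction of missing values is <= max_null_fraction.

    Hypothetical helper; both None and float NaN count as missing.
    """
    def is_missing(value: Optional[float]) -> bool:
        return value is None or (isinstance(value, float) and math.isnan(value))

    kept: Dict[str, List[Optional[float]]] = {}
    for name, column in data.items():
        null_fraction = sum(is_missing(v) for v in column) / len(column) if column else 0.0
        if null_fraction <= max_null_fraction:
            kept[name] = list(column)
    return kept


frame = {"a": [1.0, 2.0, 3.0, 4.0], "b": [None, float("nan"), None, 1.0]}
print(drop_null_columns(frame, max_null_fraction=0.5))  # {'a': [1.0, 2.0, 3.0, 4.0]}
```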
- Fixes
Fixed bug where SimpleImputer cannot handle dropped columns #846
Fixed bug where PerColumnImputer cannot handle dropped columns #855
Enforce requirement that builtin components save all inputted values in their parameters dict #847
Don’t list base classes in all_components output #847
Standardize all components to output pandas data structures, and accept either pandas or numpy #853
Fixed rankings and full_rankings error when search has not been run #894
- Changes
Update all_pipelines and all_components to try initializing pipelines/components, and on failure exclude them #849
Refactor handle_components to handle_components_class, standardize to ComponentBase subclass instead of instance #850
Refactor “blacklist”/“whitelist” to “allow”/“exclude” lists #854
Replaced AutoClassificationSearch and AutoRegressionSearch with AutoMLSearch #871
Renamed feature_importances and permutation_importances methods to use singular names (feature_importance and permutation_importance) #883
Updated automl default data splitter to train/validation split for large datasets #877
Added open source license, update some repo metadata #887
Removed dead code in _get_preprocessing_components #896
- Documentation Changes
Fix some typos and update the EvalML logo #872
Warning
- Breaking Changes
Pipelines’ static component_graph field must contain either ComponentBase subclasses or str, instead of ComponentBase subclass instances #850
Rename handle_component to handle_component_class. Now standardizes to ComponentBase subclasses instead of ComponentBase subclass instances #850
Renamed automl’s cv argument to data_split #877
Pipelines’ and classifiers’ feature_importances is renamed feature_importance, graph_feature_importances is renamed graph_feature_importance #883
Passing data_checks=None to automl search will not perform any data checks as opposed to default checks. #892
Pipelines to search for in AutoML are now determined automatically, rather than using the statically-defined pipeline classes. #870
Updated AutoSearchBase.get_pipelines to return an untrained pipeline instance, instead of one which happened to be trained on the final cross-validation fold #876
- v0.10.0 May 29, 2020
- Enhancements
Added baseline models for classification and regression, add functionality to calculate baseline models before searching in AutoML #746
Port over highly-null guardrail as a data check and define DefaultDataChecks and DisableDataChecks classes #745
Update Tuner classes to work directly with pipeline parameters dicts instead of flat parameter lists #779
Add Elastic Net as a pipeline option #812
Added new Pipeline option ExtraTrees #790
Added precision-recall curve metrics and plot for binary classification problems in evalml.pipeline.graph_utils #794
Update the default automl algorithm to search in batches, starting with default parameters for each pipeline and iterating from there #793
Added AutoMLAlgorithm class and IterativeAlgorithm impl, separated from AutoSearchBase #793
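Baseline models (#746) give AutoML a reference score to beat before the search starts. The sketch below shows two common baseline strategies: predicting the most frequent class for classification and the mean target for regression. These notes do not say which strategies EvalML's baselines use, so treat this choice and the function names as assumptions.

```python
import statistics
from collections import Counter
from typing import Hashable, List, Sequence


def mode_baseline(y_train: Sequence[Hashable], n_predictions: int) -> List[Hashable]:
    """Classification baseline: always predict the most frequent training label."""
    most_common_label, _ = Counter(y_train).most_common(1)[0]
    return [most_common_label] * n_predictions


def mean_baseline(y_train: Sequence[float], n_predictions: int) -> List[float]:
    """Regression baseline: always predict the training-set mean."""
    return [statistics.mean(y_train)] * n_predictions


print(mode_baseline(["cat", "dog", "cat"], 2))  # ['cat', 'cat']
print(mean_baseline([1.0, 2.0, 6.0], 2))        # [3.0, 3.0]
```

Any pipeline that cannot beat such a constant predictor is unlikely to be useful, which is what makes the baseline a convenient sanity check.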
- Fixes
Update pipeline score to return nan score for any objective which throws an exception during scoring #787
Fixed bug introduced in #787 where binary classification metrics requiring predicted probabilities would error during scoring #798
CatBoost and XGBoost classifiers and regressors can no longer have a learning rate of 0 #795
- Changes
Cleanup pipeline score code, and cleanup codecov #711
Remove pass for abstract methods for codecov #730
Added __str__ for AutoSearch object #675
Add util methods to graph ROC and confusion matrix #720
Refactor AutoBase to AutoSearchBase #758
Updated AutoBase with data_checks parameter, removed previous detect_label_leakage parameter, and added functionality to run data checks before search in AutoML #765
Updated our logger to use Python’s logging utils #763
Refactor most of AutoSearchBase._do_iteration impl into AutoSearchBase._evaluate #762
Port over all guardrails to use the new DataCheck API #789
Expanded import_or_raise to catch all exceptions #759
Adds RMSE, MSLE, RMSLE as standard metrics #788
Don’t allow Recall to be used as an objective for AutoML #784
Removed feature selection from pipelines #819
Update default estimator parameters to make automl search faster and more accurate #793
- Testing Changes
Delete codecov yml, use codecov.io’s default #732
Added unit tests for fraud cost, lead scoring, and standard metric objectives #741
Update codecov client #782
Updated AutoBase __str__ test to include no parameters case #783
Added unit tests for ExtraTrees pipeline #790
If codecov fails to upload, fail build #810
Updated Python version of dependency action #816
Update the dependency update bot to use a suffix when creating branches #817
Warning
- Breaking Changes
The detect_label_leakage parameter for AutoML classes has been removed and replaced by a data_checks parameter #765
Moved ROC and confusion matrix methods from evalml.pipeline.plot_utils to evalml.pipeline.graph_utils #720
Tuner classes require a pipeline hyperparameter range dict as an init arg instead of a space definition #779
Tuner.propose and Tuner.add work directly with pipeline parameters dicts instead of flat parameter lists #779
PipelineBase.hyperparameters and custom_hyperparameters use pipeline parameters dict format instead of being represented as a flat list #779
All guardrail functions previously under evalml.guardrails.utils will be removed and replaced by data checks #789
Recall disallowed as an objective for AutoML #784
AutoSearchBase parameter tuner has been renamed to tuner_class #793
AutoSearchBase parameters possible_pipelines and possible_model_families have been renamed to allowed_pipelines and allowed_model_families #793
- v0.9.0 Apr. 27, 2020
- Enhancements
Added Accuracy as a standard objective #624
Added verbose parameter to load_fraud #560
Added Balanced Accuracy metric for binary, multiclass #612 #661
Added XGBoost regressor and XGBoost regression pipeline #666
Added Accuracy metric for multiclass #672
Added objective name in AutoBase.describe_pipeline #686
Added DataCheck and DataChecks, Message classes and relevant subclasses #739
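The Balanced Accuracy metric (#612, #661) is commonly defined as the mean of per-class recall, which keeps a majority class from dominating the score. The function below is a stand-alone sketch of that standard definition, not EvalML's implementation.

```python
from collections import defaultdict
from typing import Dict, Hashable, Sequence


def balanced_accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Average recall over the classes present in y_true."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    correct: Dict[Hashable, int] = defaultdict(int)
    total: Dict[Hashable, int] = defaultdict(int)
    for actual, predicted in zip(y_true, y_pred):
        total[actual] += 1
        correct[actual] += int(actual == predicted)
    return sum(correct[c] / total[c] for c in total) / len(total)


y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 1]
print(balanced_accuracy(y_true, y_pred))  # 0.75 (plain accuracy would be 5/6)
```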
- Fixes
Removed direct access to cls.component_graph #595
Add testing files to .gitignore #625
Remove circular dependencies from Makefile #637
Add error case for normalize_confusion_matrix() #640
Fixed XGBoostClassifier and XGBoostRegressor bug with feature names that contain [, ], or < #659
Update make_pipeline_graph to not accidentally create empty file when testing if path is valid #649
Fix pip installation warning about docutils version, from boto dependency #664
Removed zero division warning for F1/precision/recall metrics #671
Fixed summary for pipelines without estimators #707
- Changes
Updated default objective for binary/multiclass classification to log loss #613
Created classification and regression pipeline subclasses and removed objective as an attribute of pipeline classes #405
Changed the output of score to return one dictionary #429
Created binary and multiclass objective subclasses #504
Updated objectives API #445
Removed call to get_plot_data from AutoML #615
Set raise_error to default to True for AutoML classes #638
Remove unnecessary “u” prefixes on some unicode strings #641
Changed one-hot encoder to return uint8 dtypes instead of ints #653
Pipeline _name field changed to custom_name #650
Removed graphs.py and moved methods into PipelineBase #657, #665
Remove s3fs as a dev dependency #664
Changed requirements-parser to be a core dependency #673
Replace supported_problem_types field on pipelines with problem_type attribute on base classes #678
Changed AutoML to only show best results for a given pipeline template in rankings, added full_rankings property to show all #682
Update ModelFamily values: don’t list xgboost/catboost as classifiers now that we have regression pipelines for them #677
Changed AutoML’s describe_pipeline to get problem type from pipeline instead #685
Standardize import_or_raise error messages #683
Updated argument order of objectives to align with sklearn’s #698
Renamed pipeline.feature_importance_graph to pipeline.graph_feature_importances #700
Moved ROC and confusion matrix methods to evalml.pipelines.plot_utils #704
Renamed MultiClassificationObjective to MulticlassClassificationObjective, to align with pipeline naming scheme #715
- Documentation Changes
Fixed some sphinx warnings #593
Fixed docstring for AutoClassificationSearch with correct command #599
Limit readthedocs formats to pdf, not htmlzip and epub #594 #600
Clean up objectives API documentation #605
Fixed function on Exploring search results page #604
Update release process doc #567
AutoClassificationSearch and AutoRegressionSearch show inherited methods in API reference #651
Fixed improperly formatted code in breaking changes for changelog #655
Added configuration to treat Sphinx warnings as errors #660
Removed separate plotting section for pipelines in API reference #657, #665
Have leads example notebook load S3 files using https, so we can delete s3fs dev dependency #664
Categorized components in API reference and added descriptions for each category #663
Fixed Sphinx warnings about BalancedAccuracy objective #669
Updated API reference to include missing components and clean up pipeline docstrings #689
Reorganize API ref, and clarify pipeline sub-titles #688
Add and update preprocessing utils in API reference #687
Added inheritance diagrams to API reference #695
Documented which default objective AutoML optimizes for #699
Create separate install page #701
Include more utils in API ref, like import_or_raise #704
Add more color to pipeline documentation #705
- Testing Changes
Matched install commands of check_latest_dependencies test and its GitHub action #578
Added GitHub app to auto assign PR author as assignee #477
Removed unneeded conda installation of xgboost in windows checkin tests #618
Update graph tests to always use tmpfile dir #649
Changelog checkin test workaround for release PRs: If ‘future release’ section is empty of PR refs, pass check #658
Add changelog checkin test exception for dep-update branch #723
Warning
Breaking Changes
Pipelines will now no longer take an objective parameter during instantiation, and will no longer have an objective attribute.
fit() and predict() now use an optional objective parameter, which is only used in binary classification pipelines to fit for a specific objective.
score() will now use a required objectives parameter that is used to determine all the objectives to score on. This differs from the previous behavior, where the pipeline’s objective was scored on regardless.
score() will now return one dictionary of all objective scores.
ROC and ConfusionMatrix plot methods via Auto(*).plot have been removed by #615 and are replaced by roc_curve and confusion_matrix in evalml.pipelines.plot_utils in #704
normalize_confusion_matrix has been moved to evalml.pipelines.plot_utils #704
Pipelines _name field changed to custom_name
Pipelines supported_problem_types field is removed because it is no longer necessary #678
Updated argument order of objectives’ objective_function to align with sklearn #698
pipeline.feature_importance_graph has been renamed to pipeline.graph_feature_importances in #700
Removed unsupported MSLE objective #704
- v0.8.0 Apr. 1, 2020
- Enhancements
Add normalization option and information to confusion matrix #484
Add util function to drop rows with NaN values #487
Renamed PipelineBase.name as PipelineBase.summary and redefined PipelineBase.name as class property #491
Added access to parameters in Pipelines with PipelineBase.parameters (used to be return of PipelineBase.describe) #501
Added fill_value parameter for SimpleImputer #509
Added functionality to override component hyperparameters and made pipelines take hyperparameters from components #516
Allow numpy.random.RandomState for random_state parameters #556
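The normalization option for the confusion matrix (#484) rescales raw counts into proportions. A minimal sketch of row normalization follows: each row (one true class) is divided by its total, so entries read as per-class rates. These notes do not list EvalML's normalization modes, so the single row-wise mode shown here and the function name are assumptions.

```python
from typing import List, Sequence


def normalize_rows(matrix: Sequence[Sequence[int]]) -> List[List[float]]:
    """Divide each row of a confusion matrix by its sum (rows = true classes)."""
    normalized = []
    for row in matrix:
        row_total = sum(row)
        if row_total == 0:
            # A class with no samples has no defined rate; fail loudly instead of dividing by zero.
            raise ValueError("cannot normalize a row with no samples")
        normalized.append([count / row_total for count in row])
    return normalized


print(normalize_rows([[8, 2], [1, 3]]))  # [[0.8, 0.2], [0.25, 0.75]]
```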
- Fixes
Removed unused dependency matplotlib, and move category_encoders to test reqs #572
- Changes
Undo version cap in XGBoost placed in #402 and allowed all releases of XGBoost #407
Support pandas 1.0.0 #486
Made all references to the logger static #503
Refactored model_type parameter for components and pipelines to model_family #507
Refactored problem_types for pipelines and components into supported_problem_types #515
Moved pipelines/utils.save_pipeline and pipelines/utils.load_pipeline to PipelineBase.save and PipelineBase.load #526
Limit number of categories encoded by OneHotEncoder #517
Warning
Breaking Changes
AutoClassificationSearch and AutoRegressionSearch’s model_types parameter has been refactored into allowed_model_families
ModelTypes enum has been changed to ModelFamily
Components and Pipelines now have a model_family field instead of model_type
get_pipelines utility function now accepts model_families as an argument instead of model_types
PipelineBase.name no longer returns structure of pipeline and has been replaced by PipelineBase.summary
PipelineBase.problem_types and Estimator.problem_types have been renamed to supported_problem_types
pipelines/utils.save_pipeline and pipelines/utils.load_pipeline moved to PipelineBase.save and PipelineBase.load
- v0.7.0 Mar. 9, 2020
- Enhancements
Added emacs buffers to .gitignore #350
Add CatBoost (gradient-boosted trees) classification and regression components and pipelines #247
Added Tuner abstract base class #351
Added
n_jobsas parameter forAutoClassificationSearchandAutoRegressionSearch#403Changed colors of confusion matrix to shades of blue and updated axis order to match scikit-learn’s #426
Added PipelineBase.graph and .feature_importance_graph methods, moved from previous location #423
Added support for Python 3.8 #462
- Changes
Added n_estimators as a tunable parameter for XGBoost #307
Remove unused parameter ObjectiveBase.fit_needs_proba #320
Remove extraneous parameter component_type from all components #361
Remove unused rankings.csv file #397
Downloaded demo and test datasets so unit tests can run offline #408
Remove _needs_fitting attribute from Components #398
Changed plot.feature_importance to show only non-zero feature importances by default, added optional parameter to show all #413
Refactored PipelineBase to take in parameter dictionary and moved pipeline metadata to class attribute #421
Dropped support for Python 3.5 #438
Removed unused apply.py file #449
Clean up requirements.txt to remove unused deps #451
Support installation without all required dependencies #459
- Documentation Changes
Update release.md with instructions to release to internal license key #354
- Testing Changes
Added tests for utils (and moved current utils to gen_utils) #297
Moved XGBoost install into its own separate step on Windows using Conda #313
Rewind pandas version to before 1.0.0, to diagnose test failures for that version #325
Added dependency update checkin test #324
Rewind XGBoost version to before 1.0.0 to diagnose test failures for that version #402
Update dependency check to use a whitelist #417
Update unit test jobs to not install dev deps #455
Warning
Breaking Changes
Python 3.5 will not be actively supported.
- v0.6.0 Dec. 16, 2019
- Enhancements
Added ability to create a plot of feature importances #133
Add early stopping to AutoML using patience and tolerance parameters #241
Added ROC and confusion matrix metrics and plot for classification problems and introduced PipelineSearchPlots class #242
Enhanced AutoML results with search order #260
Added utility function to show system and environment information #300
- Changes
Renamed AutoML classes to AutoRegressionSearch and AutoClassificationSearch #287
Updating demo datasets to retain column names #223
Moving pipeline visualization to PipelinePlot class #228
Standardizing inputs as pd.DataFrame / pd.Series #130
Enforcing that pipelines must have an estimator as last component #277
Added ipywidgets as a dependency in requirements.txt #278
Added Random and Grid Search Tuners #240
Warning
Breaking Changes
The fit() method for AutoClassifier and AutoRegressor has been renamed to search().
AutoClassifier has been renamed to AutoClassificationSearch
AutoRegressor has been renamed to AutoRegressionSearch
AutoClassificationSearch.results and AutoRegressionSearch.results are now dictionaries with pipeline_results and search_order keys. pipeline_results gives access to a dictionary identical to the old .results dictionary, whereas search_order returns a list of the search order in terms of pipeline_id.
Pipelines now require an estimator as the last component in component_list. Slicing pipelines now throws a NotImplementedError to avoid returning pipelines without an estimator.
- v0.5.2 Nov. 18, 2019
- v0.5.1 Nov. 15, 2019
- v0.5.0 Oct. 29, 2019
- Enhancements
Added basic one hot encoding #73
Use enums for model_type #110
Support for splitting regression datasets #112
Auto-infer multiclass classification #99
Added support for other units in max_time #125
Detect highly null columns #121
Added additional regression objectives #100
Show an interactive iteration vs. score plot when using fit() #134
- v0.4.1 Sep. 16, 2019
- Enhancements
Added AutoML for classification and regression using Autobase and Skopt #7 #9
Implemented standard classification and regression metrics #7
Added logistic regression, random forest, and XGBoost pipelines #7
Implemented support for custom objectives #15
Feature importance for pipelines #18
Serialization for pipelines #19
Allow fitting on objectives for optimal threshold #27
Added detect label leakage #31
Implemented callbacks #42
Allow for multiclass classification #21
Added support for additional objectives #79
- Testing Changes
Added testing for loading data #39
- v0.2.0 Aug. 13, 2019
- Enhancements
Created fraud detection objective #4
- v0.1.0 Jul. 31, 2019