Release Notes

Future Releases
  • Enhancements
    • Renamed DelayedFeatureTransformer to TimeSeriesFeaturizer and enhanced it to compute rolling features #3028

    • Added ability to impute only specific columns in PerColumnImputer #3123

    • Added TimeSeriesParametersDataCheck to verify the time series parameters are valid given the number of splits in cross validation #3111

  • Fixes
    • Default parameters for RFRegressorSelectFromModel and RFClassifierSelectFromModel have been fixed to avoid selecting all features #3110

  • Changes
    • Removed reliance on a datetime index for ARIMARegressor and ProphetRegressor #3104

    • Included target leakage check when fitting ARIMARegressor to account for the lack of TimeSeriesFeaturizer in ARIMARegressor based pipelines #3104

    • Cleaned up and refactored InvalidTargetDataCheck implementation and docstring #3122

    • Removed indices information from the output of HighlyNullDataCheck’s validate() method #3092

    • Added ReplaceNullableTypes component to prepare for handling pandas nullable types. #3090

    • Removed unused EnsembleMissingPipelinesError exception definition #3131

  • Documentation Changes

  • Testing Changes
    • Refactored tests to avoid using importorskip #3126

    • Added skip_during_conda test marker to skip tests that are not supposed to run during conda build #3127

    • Added skip_if_39 test marker to skip tests that are not supposed to run during python 3.9 #3133

Warning

Breaking Changes
  • Renamed DelayedFeatureTransformer to TimeSeriesFeaturizer #3028

  • ProphetRegressor now requires a datetime column in X represented by the date_index parameter #3104

  • Renamed module evalml.data_checks.invalid_target_data_check to evalml.data_checks.invalid_targets_data_check #3122

  • Removed unused EnsembleMissingPipelinesError exception definition #3131
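
Code that must run on both sides of the module rename in #3122 can try the new path first and fall back to the old one. The sketch below shows that pattern with a hypothetical helper, import_first_available, which is not part of EvalML. It is demonstrated with standard-library module names so that it runs without EvalML installed.

```python
import importlib


def import_first_available(module_names):
    """Return the first module in module_names that can be imported.

    Hypothetical helper for illustration; not an EvalML function.
    """
    for name in module_names:
        try:
            return importlib.import_module(name)
        except ImportError:
            # ModuleNotFoundError is a subclass of ImportError, so a missing
            # module falls through to the next candidate.
            continue
    raise ImportError(f"none of these modules could be imported: {module_names}")


# Stand-in names: the first does not exist, the second is in the standard library.
module = import_first_available(["a_module_that_does_not_exist", "json"])
print(module.__name__)  # prints "json"
```

For the rename itself, the list would hold evalml.data_checks.invalid_targets_data_check followed by evalml.data_checks.invalid_target_data_check.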

v0.38.0 Nov. 27, 2021
  • Enhancements
    • Added data_check_name attribute to the data check action class #3034

    • Added NumWords and NumCharacters primitives to TextFeaturizer and renamed TextFeaturizer to NaturalLanguageFeaturizer #3030

    • Added support for scikit-learn > 1.0.0 #3051

    • Required the date_index parameter to be specified for time series problems in AutoMLSearch #3041

    • Allowed time series pipelines to predict on test datasets whose length is less than or equal to the forecast_horizon. Also allowed the test set index to start at 0. #3071

    • Enabled time series pipeline to predict on data with features that are not known-in-advance #3094

  • Fixes
    • Added an error message when fit and predict/predict_proba data types are different #3036

    • Fixed bug where ensembling components could not get converted to JSON format #3049

    • Fixed bug where components with tuned integer hyperparameters could not get converted to JSON format #3049

    • Fixed bug where force plots were not displaying correct feature values #3044

    • Included confusion matrix at the pipeline threshold for find_confusion_matrix_per_threshold #3080

    • Fixed bug where One Hot Encoder would error out if a non-categorical feature had a missing value #3083

    • Fixed bug where features created from categorical columns by Delayed Feature Transformer would be inferred as categorical #3083

  • Changes
    • Delete predict_uses_y estimator attribute #3069

    • Change DateTimeFeaturizer to use corresponding Featuretools primitives #3081

    • Updated TargetDistributionDataCheck to return metadata details as floats rather than strings #3085

    • Removed dependency on psutil package #3093

  • Documentation Changes
    • Updated docs to use data check action methods rather than manually cleaning data #3050

  • Testing Changes
    • Updated integration tests to use make_pipeline_from_actions instead of private method #3047

Warning

Breaking Changes
  • Added data_check_name attribute to the data check action class #3034

  • Renamed TextFeaturizer to NaturalLanguageFeaturizer #3030

  • Updated the Pipeline.graph_json function to return a dictionary of “from” and “to” edges instead of tuples #3049

  • Delete predict_uses_y estimator attribute #3069

  • Changed time series problems in AutoMLSearch to need a not-None date_index #3041

  • Changed the DelayedFeatureTransformer to throw a ValueError during fit if the date_index is None #3041

  • Passing X=None to DelayedFeatureTransformer is deprecated #3041
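
The Pipeline.graph_json change in #3049 swaps tuple edges for dictionaries keyed by "from" and "to". These notes do not give the full layout of the returned structure, so the sketch below only illustrates the change in edge representation. The component names and the edges_to_dicts helper are hypothetical.

```python
import json


def edges_to_dicts(edges):
    """Convert (source, target) tuples into {"from": ..., "to": ...} dictionaries."""
    return [{"from": source, "to": target} for source, target in edges]


# Hypothetical component names, used only for illustration.
tuple_edges = [("Imputer", "One Hot Encoder"), ("One Hot Encoder", "Estimator")]
dict_edges = edges_to_dicts(tuple_edges)

# Each edge now names its endpoints explicitly once serialized.
print(json.dumps(dict_edges))
```

Code that previously unpacked each edge as a pair would instead read edge["from"] and edge["to"].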

v0.37.0 Nov. 9, 2021
  • Enhancements
    • Added find_confusion_matrix_per_threshold to Model Understanding #2972

    • Limit computationally-intensive models during AutoMLSearch for certain multiclass problems, allow for opt-in with parameter allow_long_running_models #2982

    • Added support for stacked ensemble pipelines to prediction explanations module #2971

    • Added integration tests for data checks and data checks actions workflow #2883

    • Added a change in pipeline structure to handle categorical columns separately for pipelines in DefaultAlgorithm #2986

    • Added an algorithm to DelayedFeatureTransformer to select better lags #3005

    • Added test to ensure pickling pipelines preserves thresholds #3027

    • Added AutoML function to access ensemble pipeline’s input pipelines IDs #3011

    • Added ability to define which class is “positive” for label encoder in binary classification case #3033

  • Fixes
    • Fixed bug where Oversampler didn’t consider boolean columns to be categorical #2980

    • Fixed permutation importance failing when target is categorical #3017

    • Updated estimator and pipelines’ predict, predict_proba, transform, inverse_transform methods to preserve input indices #2979

    • Updated demo dataset link for daily min temperatures #3023

  • Changes
    • Updated OutliersDataCheck and UniquenessDataCheck and allow for the suspension of the Nullable types error #3018

  • Documentation Changes
    • Fixed cost benefit matrix demo formatting #2990

    • Update ReadMe.md with new badge links and updated installation instructions for conda #2998

    • Added more comprehensive doctests #3002

v0.36.0 Oct. 27, 2021
  • Enhancements
    • Added LIME as an algorithm option for explain_predictions and explain_predictions_best_worst #2905

    • Standardized data check messages and added default “rows” and “columns” to data check message details dictionary #2869

    • Added rows_of_interest to pipeline utils #2908

    • Added support for woodwork version 0.8.2 #2909

    • Enhanced the DateTimeFeaturizer to handle NaNs in date features #2909

    • Added support for woodwork logical types PostalCode, SubRegionCode, and CountryCode in model understanding tools #2946

    • Added Vowpal Wabbit regressor and classifiers #2846

    • Added NoSplit data splitter for future unsupervised learning searches #2958

    • Added method to convert actions into a preprocessing pipeline #2968

  • Fixes
    • Fixed bug where partial dependence was not respecting the ww schema #2929

    • Fixed calculate_permutation_importance for datetimes on StandardScaler #2938

    • Fixed SelectColumns to only select available features for feature selection in DefaultAlgorithm #2944

    • Fixed DropColumns component not receiving parameters in DefaultAlgorithm #2945

    • Fixed bug where trained binary thresholds were not being returned by get_pipeline or clone #2948

    • Fixed bug where Oversampler selected ww logical categorical instead of ww semantic category #2946

  • Changes
    • Changed make_pipeline function to place the DateTimeFeaturizer prior to the Imputer so that NaN dates can be imputed #2909

    • Refactored OutliersDataCheck and HighlyNullDataCheck to add more descriptive metadata #2907

    • Bumped minimum version of dask from 2021.2.0 to 2021.10.0 #2978

  • Documentation Changes
    • Added back Future Release section to release notes #2927

    • Updated CI to run doctest (docstring tests) and apply necessary fixes to docstrings #2933

    • Added documentation for BinaryClassificationPipeline thresholding #2937

  • Testing Changes
    • Fixed dependency checker to catch full names of packages #2930

    • Refactored build_conda_pkg to work from a local recipe #2925

    • Refactored component test for different environments #2957

Warning

Breaking Changes
  • Standardized data check messages and added default “rows” and “columns” to data check message details dictionary. This may change the number of messages returned from a data check. #2869

v0.35.0 Oct. 14, 2021
  • Enhancements
    • Added human-readable pipeline explanations to model understanding #2861

    • Updated to support Featuretools 1.0.0 and nlp-primitives 2.0.0 #2848

  • Fixes
    • Fixed bug where long mode for the top level search method was not respected #2875

    • Pinned cmdstan to 0.28.0 in cmdstan-builder to prevent future breaking of support for Prophet #2880

    • Added Jarque-Bera to the TargetDistributionDataCheck #2891

  • Changes
    • Updated pipelines to use a label encoder component instead of doing encoding on the pipeline level #2821

    • Deleted scikit-learn ensembler #2819

    • Refactored pipeline building logic out of AutoMLSearch and into IterativeAlgorithm #2854

    • Refactored names for methods in ComponentGraph and PipelineBase #2902

  • Documentation Changes
    • Updated install.ipynb to reflect flexibility for cmdstan version installation #2880

    • Updated the conda section of our contributing guide #2899

  • Testing Changes
    • Updated test_all_estimators to account for Prophet being allowed for Python 3.9 #2892

    • Updated linux tests to use cmdstan-builder==0.0.8 #2880

Warning

Breaking Changes
  • Updated pipelines to use a label encoder component instead of doing encoding on the pipeline level. This means that pipelines will no longer automatically encode non-numerical targets. Please use a label encoder if working with classification problems and non-numeric targets. #2821

  • Deleted scikit-learn ensembler #2819

  • IterativeAlgorithm now requires X, y, problem_type as required arguments as well as sampler_name, allowed_model_families, allowed_component_graphs, max_batches, and verbose as optional arguments #2854

  • Changed method names of fit_features and compute_final_component_features to fit_and_transform_all_but_final and transform_all_but_final in ComponentGraph, and compute_estimator_features to transform_all_but_final in pipeline classes #2902
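
Because pipelines no longer encode non-numeric targets automatically (#2821), the target has to pass through a label encoding step. The sketch below shows what such a step does, using a small stand-in class written for this note. SimpleLabelEncoder is hypothetical and is not EvalML's label encoder component, whose parameters are not described here.

```python
class SimpleLabelEncoder:
    """Illustrative stand-in; not EvalML's label encoder component."""

    def fit(self, y):
        # Sort the distinct labels so the integer assignment is deterministic.
        self.classes_ = sorted(set(y))
        self._to_int = {label: i for i, label in enumerate(self.classes_)}
        return self

    def transform(self, y):
        return [self._to_int[label] for label in y]

    def inverse_transform(self, encoded):
        return [self.classes_[i] for i in encoded]


y = ["spam", "ham", "spam", "eggs"]
encoder = SimpleLabelEncoder().fit(y)
encoded = encoder.transform(y)                # [2, 1, 2, 0]
decoded = encoder.inverse_transform(encoded)  # back to the original labels
```

The inverse mapping matters as much as the forward one: predictions come back as integers and need to be translated to the original class labels.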

v0.34.0 Sep. 30, 2021
  • Enhancements
    • Updated to work with Woodwork 0.8.1 #2783

    • Added validation that training_data and training_target are not None in prediction explanations #2787

    • Added support for training-only components in pipelines and component graphs #2776

    • Added default argument for the parameters value for ComponentGraph.instantiate #2796

    • Added TIME_SERIES_REGRESSION to LightGBMRegressor's supported problem types #2793

    • Provided a JSON representation of a pipeline’s DAG structure #2812

    • Added validation to holdout data passed to predict and predict_proba for time series #2804

    • Added information about which row indices are outliers in OutliersDataCheck #2818

    • Added verbose flag to top level search() method #2813

    • Added support for linting jupyter notebooks and clearing the executed cells and empty cells #2829 #2837

    • Added “DROP_ROWS” action to output of OutliersDataCheck.validate() #2820

    • Added the ability of AutoMLSearch to accept a SequentialEngine instance as engine input #2838

    • Added new label encoder component to EvalML #2853

    • Added our own partial dependence implementation #2834

  • Fixes
    • Fixed bug where calculate_permutation_importance was not calculating the right value for pipelines with target transformers #2782

    • Fixed bug where transformed target values were not used in fit for time series pipelines #2780

    • Fixed bug where score_pipelines method of AutoMLSearch would not work for time series problems #2786

    • Removed TargetTransformer class #2833

    • Added tests to verify ComponentGraph support by pipelines #2830

    • Fixed incorrect parameter for baseline regression pipeline in AutoMLSearch #2847

    • Fixed bug where the desired estimator family order was not respected in IterativeAlgorithm #2850

  • Changes
    • Changed woodwork initialization to use partial schemas #2774

    • Made Transformer.transform() an abstract method #2744

    • Deleted EmptyDataChecks class #2794

    • Removed data check for checking log distributions in make_pipeline #2806

    • Changed the minimum woodwork version to 0.8.0 #2783

    • Pinned woodwork version to 0.8.0 #2832

    • Removed model_family attribute from ComponentBase and transformers #2828

    • Limited scikit-learn until new features and errors can be addressed #2842

    • Show DeprecationWarning when Sklearn Ensemblers are called #2859

  • Testing Changes
    • Updated matched assertion message regarding monotonic indices in polynomial detrender tests #2811

    • Added a test to make sure pip versions match conda versions #2851

Warning

Breaking Changes
  • Made Transformer.transform() an abstract method #2744

  • Deleted EmptyDataChecks class #2794

  • Removed data check for checking log distributions in make_pipeline #2806

v0.33.0 Sep. 15, 2021
  • Enhancements

  • Fixes
    • Fixed bug where warnings during make_pipeline were not being raised to the user #2765

  • Changes
    • Refactored and removed SamplerBase class #2775

  • Documentation Changes
    • Added docstring linting packages pydocstyle and darglint to make-lint command #2670

  • Testing Changes

Warning

Breaking Changes

v0.32.1 Sep. 10, 2021
  • Enhancements
    • Added verbose flag to AutoMLSearch to run search in silent mode by default #2645

    • Added label encoder to XGBoostClassifier to remove the warning #2701

    • Set eval_metric to logloss for XGBoostClassifier #2741

    • Added support for woodwork versions 0.7.0 and 0.7.1 #2743

    • Changed explain_predictions functions to display original feature values #2759

    • Added X_train and y_train to graph_prediction_vs_actual_over_time and get_prediction_vs_actual_over_time_data #2762

    • Added forecast_horizon as a required parameter to time series pipelines and AutoMLSearch #2697

    • Added predict_in_sample and predict_proba_in_sample methods to time series pipelines to predict on data where the target is known, e.g. cross-validation #2697

  • Fixes
    • Fixed bug where _catch_warnings assumed all warnings were PipelineNotUsed #2753

    • Fixed bug where Imputer.transform would erase ww typing information prior to handing data to the SimpleImputer #2752

    • Fixed bug where Oversampler could not be copied #2755

  • Changes
    • Deleted drop_nan_target_rows utility method #2737

    • Removed default logging setup and debugging log file #2645

    • Changed the default n_jobs value for XGBoostClassifier and XGBoostRegressor to 12 #2757

    • Changed TimeSeriesBaselineEstimator to only work on a time series pipeline with a DelayedFeatureTransformer #2697

    • Added X_train and y_train as optional parameters to pipeline predict, predict_proba. Only used for time series pipelines #2697

    • Added training_data and training_target as optional parameters to explain_predictions and explain_predictions_best_worst to support time series pipelines #2697

    • Changed time series pipeline predictions to no longer output series/dataframes padded with NaNs. A prediction will be returned for every row in the X input #2697

  • Documentation Changes
    • Specified installation steps for Prophet #2713

    • Added documentation for data exploration on data check actions #2696

    • Added a user guide entry for time series modelling #2697

  • Testing Changes
    • Fixed flaky TargetDistributionDataCheck test for very_lognormal distribution #2748

Warning

Breaking Changes
  • Removed default logging setup and debugging log file #2645

  • Added X_train and y_train to graph_prediction_vs_actual_over_time and get_prediction_vs_actual_over_time_data #2762

  • Added forecast_horizon as a required parameter to time series pipelines and AutoMLSearch #2697

  • Changed TimeSeriesBaselineEstimator to only work on a time series pipeline with a DelayedFeatureTransformer #2697

  • Added X_train and y_train as required parameters for predict and predict_proba in time series pipelines #2697

  • Added training_data and training_target as required parameters to explain_predictions and explain_predictions_best_worst for time series pipelines #2697

v0.32.0 Aug. 31, 2021
  • Enhancements
    • Allow string for engine parameter for AutoMLSearch #2667

    • Add ProphetRegressor to AutoML #2619

    • Integrated DefaultAlgorithm into AutoMLSearch #2634

    • Removed SVM “linear” and “precomputed” kernel hyperparameter options, and improved default parameters #2651

    • Updated ComponentGraph initialization to raise ValueError when user attempts to use .y for a component that does not produce a tuple output #2662

    • Updated to support Woodwork 0.6.0 #2690

    • Updated pipeline graph() to distinguish X and y edges #2654

    • Added DropRowsTransformer component #2692

    • Added DROP_ROWS to _make_component_list_from_actions and clean up metadata #2694

    • Add new ensembler component #2653

  • Fixes
    • Updated Oversampler logic to select best SMOTE based on component input instead of pipeline input #2695

    • Added ability to explicitly close DaskEngine resources to improve runtime and reduce Dask warnings #2667

    • Fixed partial dependence bug for ensemble pipelines #2714

    • Updated TargetLeakageDataCheck to maintain user-selected logical types #2711

  • Changes
    • Replaced SMOTEOversampler, SMOTENOversampler and SMOTENCOversampler with consolidated Oversampler component #2695

    • Removed LinearRegressor from the list of default AutoMLSearch estimators due to poor performance #2660

  • Documentation Changes
    • Added user guide documentation for using ComponentGraph and added ComponentGraph to API reference #2673

    • Updated documentation to make parallelization of AutoML clearer #2667

  • Testing Changes
    • Removes the process-level parallelism from the test_cancel_job test #2666

    • Installed numba 0.53 in windows CI to prevent problems installing version 0.54 #2710

Warning

Breaking Changes
  • Renamed the current top level search method to search_iterative and defined a new search method for the DefaultAlgorithm #2634

  • Replaced SMOTEOversampler, SMOTENOversampler and SMOTENCOversampler with consolidated Oversampler component #2695

  • Removed LinearRegressor from the list of default AutoMLSearch estimators due to poor performance #2660

v0.31.0 Aug. 19, 2021
  • Enhancements
    • Updated the high variance check in AutoMLSearch to be robust to a variety of objectives and cv scores #2622

    • Use Woodwork’s outlier detection for the OutliersDataCheck #2637

    • Added ability to utilize instantiated components when creating a pipeline #2643

    • Sped up the all-NaN and unknown check in infer_feature_types #2661

  • Fixes

  • Changes
    • Deleted _put_into_original_order helper function #2639

    • Refactored time series pipeline code using a time series pipeline base class #2649

    • Renamed dask_tests to parallel_tests #2657

    • Removed commented out code in pipeline_meta.py #2659

  • Documentation Changes
    • Add complete install command to README and Install section #2627

    • Cleaned up documentation for MulticollinearityDataCheck #2664

  • Testing Changes
    • Speed up CI by splitting Prophet tests into a separate workflow in GitHub #2644

Warning

Breaking Changes
  • TimeSeriesRegressionPipeline no longer inherits from RegressionPipeline #2649

v0.30.2 Aug. 16, 2021
  • Fixes
    • Updated changelog and version numbers to match the release. Release 0.30.1 was released erroneously without a change to the version numbers. 0.30.2 replaces it.

v0.30.1 Aug. 12, 2021
  • Enhancements
    • Added DatetimeFormatDataCheck for time series problems #2603

    • Added ProphetRegressor to estimators #2242

    • Updated ComponentGraph to handle not calling samplers’ transform during predict, and updated samplers’ transform methods so that fit_transform is equivalent to fit(X, y).transform(X, y) #2583

    • Updated ComponentGraph _validate_component_dict logic to be stricter about input values #2599

    • Patched bug in xgboost estimators where predicting on a feature matrix of only booleans would throw an exception. #2602

    • Updated ARIMARegressor to use relative forecasting to predict values #2613

    • Added support for creating pipelines without an estimator as the final component and added transform(X, y) method to pipelines and component graphs #2625

    • Updated to support Woodwork 0.5.1 #2610

  • Fixes
    • Updated AutoMLSearch to drop ARIMARegressor from allowed_estimators if an incompatible frequency is detected #2632

    • Updated get_best_sampler_for_data to consider all non-numeric datatypes as categorical for SMOTE #2590

    • Fixed inconsistent test results from TargetDistributionDataCheck #2608

    • Adopted vectorized pd.NA checking for Woodwork 0.5.1 support #2626

    • Pinned upper version of astroid to 2.6.6 to keep ReadTheDocs working. #2638

  • Changes
    • Renamed SMOTE samplers to SMOTE oversampler #2595

    • Changed partial_dependence and graph_partial_dependence to raise a PartialDependenceError instead of ValueError. This is not a breaking change because PartialDependenceError is a subclass of ValueError #2604

    • Cleaned up code duplication in ComponentGraph #2612

    • Stored predict_proba results in .x for intermediate estimators in ComponentGraph #2629

  • Documentation Changes
    • To avoid local docs build error, only add warning disable and download headers on ReadTheDocs builds, not locally #2617

  • Testing Changes
    • Updated partial_dependence tests to change the element-wise comparison per the Plotly 5.2.1 upgrade #2638

    • Changed the lint CI job to only check against python 3.9 via the -t flag #2586

    • Installed Prophet in linux nightlies test and fixed test_all_components #2598

    • Refactored and fixed all make_pipeline tests to assert correct order and address new Woodwork Unknown type inference #2572

    • Removed component_graphs as a global variable in test_component_graphs.py #2609

Warning

Breaking Changes
  • Renamed SMOTE samplers to SMOTE oversampler. Please use SMOTEOversampler, SMOTENCOversampler, SMOTENOversampler instead of SMOTESampler, SMOTENCSampler, and SMOTENSampler #2595
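
The Changes entry for #2604 above says that raising PartialDependenceError is not a breaking change because it subclasses ValueError. The sketch below shows why: an existing except ValueError handler still catches the new exception. The class here is a local stand-in that mirrors that relationship, and compute_partial_dependence and its message are hypothetical.

```python
class PartialDependenceError(ValueError):
    """Local stand-in mirroring the subclass relationship described in the notes."""


def compute_partial_dependence():
    # Hypothetical function that fails with the more specific exception.
    raise PartialDependenceError("illustrative failure")


try:
    compute_partial_dependence()
except ValueError as err:
    # Handlers written against ValueError keep working unchanged.
    caught = type(err).__name__

print(caught)  # prints "PartialDependenceError"
```

Callers that want to treat partial dependence failures separately can catch the subclass before the general ValueError handler.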

v0.30.0 Aug. 3, 2021
  • Enhancements
    • Added LogTransformer and TargetDistributionDataCheck #2487

    • Issue a warning to users when a pipeline parameter passed in isn’t used in the pipeline #2564

    • Added Gini coefficient as an objective #2544

    • Added repr to ComponentGraph #2565

    • Added components to extract features from URL and EmailAddress Logical Types #2550

    • Added support for NaN values in TextFeaturizer #2532

    • Added SelectByType transformer #2531

    • Added separate thresholds for percent null rows and columns in HighlyNullDataCheck #2562

    • Added support for NaN natural language values #2577

  • Fixes
    • Raised error message for types URL, NaturalLanguage, and EmailAddress in partial_dependence #2573

  • Changes
    • Updated PipelineBase implementation for creating pipelines from a list of components #2549

    • Moved get_hyperparameter_ranges to PipelineBase class from automl/utils module #2546

    • Renamed ComponentGraph’s get_parents to get_inputs #2540

    • Removed ComponentGraph.linearized_component_graph and ComponentGraph.from_list #2556

    • Updated ComponentGraph to enforce requiring .x and .y inputs for each component in the graph #2563

    • Renamed existing ensembler implementation from StackedEnsemblers to SklearnStackedEnsemblers #2578

  • Documentation Changes
    • Added documentation for DaskEngine and CFEngine parallel engines #2560

    • Improved detail of TextFeaturizer docstring and tutorial #2568

  • Testing Changes
    • Added test that makes sure split_data does not shuffle for time series problems #2552

Warning

Breaking Changes
  • Moved get_hyperparameter_ranges to PipelineBase class from automl/utils module #2546

  • Renamed ComponentGraph’s get_parents to get_inputs #2540

  • Removed ComponentGraph.linearized_component_graph and ComponentGraph.from_list #2556

  • Updated ComponentGraph to enforce requiring .x and .y inputs for each component in the graph #2563

v0.29.0 Jul. 21, 2021
  • Enhancements
    • Updated 1-way partial dependence support for datetime features #2454

    • Added details on how to fix error caused by broken ww schema #2466

    • Added ability to use built-in pickle for saving AutoMLSearch #2463

    • Updated our components and component graphs to use latest features of ww 0.4.1, e.g. concat_columns and drop in-place. #2465

    • Added new, concurrent.futures based engine for parallel AutoML #2506

    • Added support for new Woodwork Unknown type in AutoMLSearch #2477

    • Updated our components with an attribute that describes if they modify features or targets and can be used in list API for pipeline initialization #2504

    • Updated ComponentGraph to accept X and y as inputs #2507

    • Removed unused TARGET_BINARY_INVALID_VALUES from DataCheckMessageCode enum and fixed formatting of objective documentation #2520

    • Added EvalMLAlgorithm #2525

    • Added support for NaN values in TextFeaturizer #2532

  • Fixes
    • Fixed FraudCost objective and reverted threshold optimization method for binary classification to Golden #2450

    • Added custom exception message for partial dependence on features with scales that are too small #2455

    • Ensures the typing for Ordinal and Datetime ltypes are passed through _retain_custom_types_and_initalize_woodwork #2461

    • Updated to work with Pandas 1.3.0 #2442

    • Updated to work with sktime 0.7.0 #2499

  • Changes
    • Updated XGBoost dependency to >=1.4.2 #2484, #2498

    • Added a DeprecationWarning about deprecating the list API for ComponentGraph #2488

    • Updated make_pipeline for AutoML to create dictionaries, not lists, to initialize pipelines #2504

    • No longer installing graphviz on windows in our CI pipelines because release 0.17 breaks windows 3.7 #2516

  • Documentation Changes
    • Moved docstrings from __init__ to class pages, added missing docstrings for missing classes, and updated missing default values #2452

    • Build documentation with sphinx-autoapi #2458

    • Change autoapi_ignore to only ignore files in evalml/tests/* #2530

  • Testing Changes
    • Fixed flaky dask tests #2471

    • Removed shellcheck action from build_conda_pkg action #2514

    • Added a tmp_dir fixture that deletes its contents after tests run #2505

    • Added a test that makes sure all pipelines in AutoMLSearch get the same data splits #2513

    • Condensed warning output in test logs #2521

Warning

Breaking Changes
  • NaN values in the Natural Language type are no longer supported by the Imputer with the pandas upgrade. #2477

v0.28.0 Jul. 2, 2021
  • Enhancements
    • Added support for showing an Individual Conditional Expectations plot when graphing Partial Dependence #2386

    • Exposed thread_count for Catboost estimators as n_jobs parameter #2410

    • Updated Objectives API to allow for sample weighting #2433

  • Fixes
    • Deleted unreachable line from IterativeAlgorithm #2464

  • Changes
    • Pinned Woodwork version between 0.4.1 and 0.4.2 #2460

    • Updated psutil minimum version in requirements #2438

    • Updated log_error_callback to not include filepath in logged message #2429

  • Documentation Changes
    • Sped up docs #2430

    • Removed mentions of DataTable and DataColumn from the docs #2445

  • Testing Changes
    • Added slack integration for nightlies tests #2436

    • Changed build_conda_pkg CI job to run only when dependencies are updated #2446

    • Updated workflows to store pytest runtimes as test artifacts #2448

    • Added AutoMLTestEnv test fixture for making it easy to mock automl tests #2406

v0.27.0 Jun. 22, 2021
  • Enhancements
    • Adds force plots for prediction explanations #2157

    • Removed self-reference from AutoMLSearch #2304

    • Added support for nonlinear pipelines for generate_pipeline_code #2332

    • Added inverse_transform method to pipelines #2256

    • Add optional automatic update checker #2350

    • Added search_order to AutoMLSearch’s rankings and full_rankings tables #2345

    • Updated threshold optimization method for binary classification #2315

    • Updated demos to pull data from S3 instead of including demo data in package #2387

    • Upgrade woodwork version to v0.4.1 #2379

  • Fixes
    • Preserve user-specified woodwork types throughout pipeline fit/predict #2297

    • Fixed ComponentGraph appending target to final_component_features if there is a component that returns both X and y #2358

    • Fixed partial dependence graph method failing on multiclass problems when the class labels are numeric #2372

    • Added thresholding_objective argument to AutoMLSearch for binary classification problems #2320

    • Added change for k_neighbors parameter in SMOTE Oversamplers to automatically handle small samples #2375

    • Changed naming for Logistic Regression Classifier file #2399

    • Pinned pytest-timeout to fix minimum dependence checker #2425

    • Replaced Elastic Net Classifier base class with Logistic Regression to avoid NaN outputs #2420

  • Changes
    • Cleaned up PipelineBase’s component_graph and _component_graph attributes. Updated PipelineBase __repr__ and added __eq__ for ComponentGraph #2332

    • Added and applied black linting package to the EvalML repo in place of autopep8 #2306

    • Separated custom_hyperparameters from pipelines and added them as an argument to AutoMLSearch #2317

    • Replaced allowed_pipelines with allowed_component_graphs #2364

    • Removed private method _compute_features_during_fit from PipelineBase #2359

    • Updated compute_order in ComponentGraph to be a read-only property #2408

    • Unpinned PyZMQ version in requirements.txt #2389

    • Uncapping LightGBM version in requirements.txt #2405

    • Updated minimum version of plotly #2415

    • Removed SensitivityLowAlert objective from core objectives #2418

  • Documentation Changes
    • Fixed lead scoring weights in the demos documentation #2315

    • Fixed start page code and description dataset naming discrepancy #2370

  • Testing Changes
    • Update minimum unit tests to run on all pull requests #2314

    • Pass token to authorize uploading of codecov reports #2344

    • Add pytest-timeout. All tests that run longer than 6 minutes will fail. #2374

    • Separated the dask tests out into separate github action jobs to isolate dask failures. #2376

    • Refactored dask tests #2377

    • Added the combined dask/non-dask unit tests back and renamed the dask only unit tests. #2382

    • Sped up unit tests and split into separate jobs #2365

    • Change CI job names, run lint for python 3.9, run nightlies on python 3.8 at 3am EST #2395 #2398

    • Set fail-fast to false for CI jobs that run for PRs #2402

Warning

Breaking Changes
  • AutoMLSearch will accept allowed_component_graphs instead of allowed_pipelines #2364

  • Removed PipelineBase’s _component_graph attribute. Updated PipelineBase __repr__ and added __eq__ for ComponentGraph #2332

  • pipeline_parameters will no longer accept skopt.space variables since hyperparameter ranges will now be specified through custom_hyperparameters #2317

v0.25.0 Jun. 01, 2021
  • Enhancements
    • Upgraded minimum woodwork to version 0.3.1. Previous versions will not be supported #2181

    • Added a new callback parameter for explain_predictions_best_worst #2308

  • Fixes

  • Changes
    • Deleted the return_pandas flag from our demo data loaders #2181

    • Moved default_parameters to ComponentGraph from PipelineBase #2307

  • Documentation Changes
    • Updated the release procedure documentation #2230

  • Testing Changes
    • Ignoring test_saving_png_file while building conda package #2323

Warning

Breaking Changes
  • Deleted the return_pandas flag from our demo data loaders #2181

  • Upgraded minimum woodwork to version 0.3.1. Previous versions will not be supported #2181

  • Due to the weak-ref in woodwork, set the result of infer_feature_types to a variable before accessing woodwork #2181
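
The weak-reference note above can be illustrated with a toy example that uses only the Python standard library. This is a sketch of the general pitfall, not Woodwork's implementation: ToyTable, ToyAccessor, the accessor property, and toy_infer_feature_types are hypothetical stand-ins introduced here for illustration.

```python
import gc
import weakref


class ToyAccessor:
    """Hypothetical accessor that only holds a weak reference to its owner."""

    def __init__(self, owner):
        self._owner_ref = weakref.ref(owner)

    def owner(self):
        # Returns None once the owning object has been garbage collected.
        return self._owner_ref()


class ToyTable:
    """Hypothetical stand-in for a data structure returned by a helper."""

    def __init__(self, rows):
        self.rows = rows

    @property
    def accessor(self):
        return ToyAccessor(self)


def toy_infer_feature_types(rows):
    # Hypothetical stand-in for a helper that returns a fresh object per call.
    return ToyTable(rows)


# Chained access: nothing keeps the temporary table alive.
dangling = toy_infer_feature_types([1, 2, 3]).accessor
gc.collect()  # do not rely on reference counting alone
print(dangling.owner())  # None

# Assigning the result to a variable first keeps the table alive.
table = toy_infer_feature_types([1, 2, 3])
kept = table.accessor
print(kept.owner() is table)  # True
```

Holding the result in a named variable is what keeps the weakly referenced object reachable while the accessor is in use.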

v0.24.2 May. 24, 2021
  • Enhancements
    • Added oversamplers to AutoMLSearch #2213 #2286

    • Added dictionary input functionality for Undersampler component #2271

    • Changed the default parameter values for Elastic Net Classifier and Elastic Net Regressor #2269

    • Added dictionary input functionality for the Oversampler components #2288

  • Fixes
    • Set default n_jobs to 1 for StackedEnsembleClassifier and StackedEnsembleRegressor until fix for text-based parallelism in sklearn stacking can be found #2295

  • Changes
    • Updated start_iteration_callback to accept a pipeline instance instead of a pipeline class and no longer accept pipeline parameters as a parameter #2290

    • Refactored calculate_permutation_importance method and added per-column permutation importance method #2302

    • Updated logging information in AutoMLSearch.__init__ to clarify pipeline generation #2263

  • Documentation Changes
    • Minor changes to the release procedure #2230

  • Testing Changes
    • Use codecov action to update coverage reports #2238

    • Removed MarkupSafe dependency version pin from requirements.txt and moved instead into RTD docs build CI #2261

Warning

Breaking Changes
  • Updated start_iteration_callback to accept a pipeline instance instead of a pipeline class and no longer accept pipeline parameters as a parameter #2290

  • Moved default_parameters to ComponentGraph from PipelineBase. A pipeline’s default_parameters is now accessible via pipeline.component_graph.default_parameters #2307

v0.24.1 May. 16, 2021
  • Enhancements
    • Integrated ARIMARegressor into AutoML #2009

    • Updated HighlyNullDataCheck to also perform a null row check #2222

    • Set max_depth to 1 in calls to featuretools dfs #2231

  • Fixes
    • Removed data splitter sampler calls during training #2253

    • Set minimum required version for pyzmq, colorama, and docutils #2254

    • Changed BaseSampler to return None instead of y #2272

  • Changes
    • Removed ensemble split and indices in AutoMLSearch #2260

    • Updated pipeline repr() and generate_pipeline_code to return pipeline instances without generating custom pipeline class #2227

  • Documentation Changes
    • Capped Sphinx version under 4.0.0 #2244

  • Testing Changes
    • Change number of cores for pytest from 4 to 2 #2266

    • Add minimum dependency checker to generate minimum requirement files #2267

    • Add unit tests with minimum dependencies #2277

v0.24.0 May. 04, 2021
  • Enhancements
    • Added date_index as a required parameter for TimeSeries problems #2217

    • Have the OneHotEncoder return the transformed columns as booleans rather than floats #2170

    • Added Oversampler transformer component to EvalML #2079

    • Added Undersampler to AutoMLSearch, as well as arguments _sampler_method and sampler_balanced_ratio #2128

    • Updated prediction explanations functions to allow pipelines with XGBoost estimators #2162

    • Added partial dependence for datetime columns #2180

    • Update precision-recall curve with positive label index argument, and fix for 2d predicted probabilities #2090

    • Add pct_null_rows to HighlyNullDataCheck #2211

    • Added a standalone AutoML search method for convenience, which runs data checks and then runs automl #2152

    • Make the first batch of AutoML have a predefined order, with linear models first and complex models last #2223 #2225

    • Added sampling dictionary support to BalancedClassificationSampler #2235

  • Fixes
    • Fixed partial dependence not respecting grid resolution parameter for numerical features #2180

    • Enable prediction explanations for catboost for multiclass problems #2224

  • Changes
    • Deleted baseline pipeline classes #2202

    • Reverting user specified date feature PR #2155 until pmdarima installation fix is found #2214

    • Updated pipeline API to accept component graph and other class attributes as instance parameters. Old pipeline API still works but will not be supported long-term. #2091

    • Removed all old datasplitters from EvalML #2193

    • Deleted make_pipeline_from_components #2218

  • Documentation Changes
    • Renamed dataset to clarify that it's gzipped but not a tarball #2183

    • Updated documentation to use pipeline instances instead of pipeline subclasses #2195

    • Updated contributing guide with a note about GitHub Actions permissions #2090

    • Updated automl and model understanding user guides #2090

  • Testing Changes
    • Use machineFL user token for dependency update bot, and add more reviewers #2189

Warning

Breaking Changes
  • All baseline pipeline classes (BaselineBinaryPipeline, BaselineMulticlassPipeline, BaselineRegressionPipeline, etc.) have been deleted #2202

  • Updated pipeline API to accept component graph and other class attributes as instance parameters. Old pipeline API still works but will not be supported long-term. Pipelines can now be initialized by specifying the component graph as the first parameter, and then passing in optional arguments such as custom_name, parameters, etc. For example, BinaryClassificationPipeline(["Random Forest Classifier"], parameters={}). #2091

  • Removed all old datasplitters from EvalML #2193

  • Deleted utility method make_pipeline_from_components #2218

v0.23.0 Apr. 20, 2021
  • Enhancements
    • Refactored EngineBase and SequentialEngine API. Added DaskEngine #1975

    • Added optional engine argument to AutoMLSearch #1975

    • Added a warning about how time series support is still in beta when a user passes in a time series problem to AutoMLSearch #2118

    • Added NaturalLanguageNaNDataCheck data check #2122

    • Added ValueError to partial_dependence to prevent users from computing partial dependence on columns with all NaNs #2120

    • Added standard deviation of cv scores to rankings table #2154

  • Fixes
    • Fixed BalancedClassificationDataCVSplit, BalancedClassificationDataTVSplit, and BalancedClassificationSampler to use minority:majority ratio instead of majority:minority #2077

    • Fixed bug where two-way partial dependence plots with categorical variables were not working correctly #2117

    • Fixed bug where hyperparameters were not displaying properly for pipelines with a list component_graph and duplicate components #2133

    • Fixed bug where pipeline_parameters argument in AutoMLSearch was not applied to pipelines passed in as allowed_pipelines #2133

    • Fixed bug where AutoMLSearch was not applying custom hyperparameters to pipelines with a list component_graph and duplicate components #2133

  • Changes
    • Removed hyperparameter_ranges from Undersampler and renamed balanced_ratio to sampling_ratio for samplers #2113

    • Renamed TARGET_BINARY_NOT_TWO_EXAMPLES_PER_CLASS data check message code to TARGET_MULTICLASS_NOT_TWO_EXAMPLES_PER_CLASS #2126

    • Modified one-way partial dependence plots of categorical features to display data with a bar plot #2117

    • Renamed score column for automl.rankings as mean_cv_score #2135

    • Remove ‘warning’ from docs tool output #2031

  • Documentation Changes
    • Fixed conf.py file #2112

    • Added a sentence to the automl user guide stating that our support for time series problems is still in beta. #2118

    • Fixed documentation demos #2139

    • Update test badge in README to use GitHub Actions #2150

  • Testing Changes
    • Fixed test_describe_pipeline for pandas v1.2.4 #2129

    • Added a GitHub Action for building the conda package #1870 #2148

Warning

Breaking Changes
  • Renamed balanced_ratio to sampling_ratio for the BalancedClassificationDataCVSplit, BalancedClassificationDataTVSplit, BalancedClassificationSampler, and Undersampler #2113

  • Deleted the “errors” key from automl results #1975

  • Deleted the raise_and_save_error_callback and the log_and_save_error_callback #1975

  • Fixed BalancedClassificationDataCVSplit, BalancedClassificationDataTVSplit, and BalancedClassificationSampler to use minority:majority ratio instead of majority:minority #2077
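
To make the minority:majority convention concrete, here is a minimal sketch in plain Python. It is not EvalML's implementation: the helper names are hypothetical, the two-class setup is a simplification, and only the sampling_ratio name comes from the notes above.

```python
from collections import Counter


def minority_majority_ratio(y):
    """Return (count of rarest class) / (count of most common class)."""
    counts = Counter(y)
    return min(counts.values()) / max(counts.values())


def majority_size_for_ratio(y, sampling_ratio):
    """Majority rows to keep so that minority:majority reaches sampling_ratio.

    Simplified two-class sketch; never asks for more rows than exist.
    """
    counts = Counter(y)
    minority = min(counts.values())
    majority = max(counts.values())
    return min(majority, int(minority / sampling_ratio))


y = [0] * 900 + [1] * 100
ratio = minority_majority_ratio(y)                       # 100 / 900
keep = majority_size_for_ratio(y, sampling_ratio=0.25)   # 100 / 0.25
print(round(ratio, 3), keep)  # 0.111 400
```

Under this convention the ratio lies between 0 and 1, and a value of 1 means the classes are balanced.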

v0.22.0 Apr. 06, 2021
  • Enhancements
    • Added a GitHub Action for linux_unit_tests #2013

    • Added recommended actions for InvalidTargetDataCheck, updated _make_component_list_from_actions to address new action, and added TargetImputer component #1989

    • Updated AutoMLSearch._check_for_high_variance to not emit RuntimeWarning #2024

    • Added exception when pipeline passed to explain_predictions is a Stacked Ensemble pipeline #2033

    • Added sensitivity at low alert rates as an objective #2001

    • Added Undersampler transformer component #2030

  • Fixes
    • Updated Engine’s train_batch to apply undersampling #2038

    • Fixed bug where Time Series Classification pipelines were not encoding targets in predict and predict_proba #2040

    • Fixed data splitting errors if target is float for classification problems #2050

    • Pinned docutils to <0.17 to fix ReadtheDocs warning issues #2088

  • Changes
    • Removed lists as acceptable hyperparameter ranges in AutoMLSearch #2028

    • Renamed “details” to “metadata” for data check actions #2008

  • Documentation Changes
    • Catch and suppress warnings in documentation #1991 #2097

    • Change spacing in start.ipynb to provide clarity for AutoMLSearch #2078

    • Fixed start code on README #2108

  • Testing Changes

v0.21.0 Mar. 24, 2021
  • Enhancements
    • Changed AutoMLSearch to default optimize_thresholds to True #1943

    • Added multiple oversampling and undersampling sampling methods as data splitters for imbalanced classification #1775

    • Added params to balanced classification data splitters for visibility #1966

    • Updated make_pipeline to not add Imputer if input data does not have numeric or categorical columns #1967

    • Updated ClassImbalanceDataCheck to better handle multiclass imbalances #1986

    • Added recommended actions for the output of data check’s validate method #1968

    • Added error message for partial_dependence when features are mostly the same value #1994

    • Updated OneHotEncoder to drop one redundant feature by default for features with two categories #1997

    • Added a PolynomialDetrender component #1992

    • Added DateTimeNaNDataCheck data check #2039

  • Fixes
    • Changed best pipeline to train on the entire dataset rather than just ensemble indices for ensemble problems #2037

    • Updated binary classification pipelines to use objective decision function during scoring of custom objectives #1934

  • Changes
    • Removed data_checks parameter, data_check_results and data checks logic from AutoMLSearch #1935

    • Deleted random_state argument #1985

    • Updated Woodwork version requirement to v0.0.11 #1996

  • Documentation Changes

  • Testing Changes
    • Removed build_docs CI job in favor of RTD GH builder #1974

    • Added tests to confirm support for Python 3.9 #1724

    • Added tests to support Dask AutoML/Engine #1990

    • Changed build_conda_pkg job to use latest_release_changes branch in the feedstock. #1979

Warning

Breaking Changes
  • Changed AutoMLSearch to default optimize_thresholds to True #1943

  • Removed data_checks parameter, data_check_results and data checks logic from AutoMLSearch. To run the data checks which were previously run by default in AutoMLSearch, please call DefaultDataChecks().validate(X_train, y_train) or take a look at our documentation for more examples. #1935

  • Deleted random_state argument #1985

v0.20.0 Mar. 10, 2021
  • Enhancements
    • Added a GitHub Action for Detecting dependency changes #1933

    • Create a separate CV split to train stacked ensembler on for AutoMLSearch #1814

    • Added a GitHub Action for Linux unit tests #1846

    • Added ARIMARegressor estimator #1894

    • Added DataCheckAction class and DataCheckActionCode enum #1896

    • Updated Woodwork requirement to v0.0.10 #1900

    • Added BalancedClassificationDataCVSplit and BalancedClassificationDataTVSplit to AutoMLSearch #1875

    • Update default classification data splitter to use downsampling for highly imbalanced data #1875

    • Updated describe_pipeline to return more information, including id of pipelines used for ensemble models #1909

    • Added utility method to create list of components from a list of DataCheckAction #1907

    • Updated validate method to include an action key in returned dictionary for all DataCheck and DataChecks #1916

    • Aggregating the shap values for predictions that we know the provenance of, e.g. OHE, text, and date-time. #1901

    • Improved error message when custom objective is passed as a string in pipeline.score #1941

    • Added score_pipelines and train_pipelines methods to AutoMLSearch #1913

    • Added support for pandas version 1.2.0 #1708

    • Added score_batch and train_batch abstract methods to EngineBase and implementations in SequentialEngine #1913

    • Added ability to handle index columns in AutoMLSearch and DataChecks #2138

  • Fixes
    • Removed CI check for check_dependencies_updated_linux #1950

    • Added metaclass for time series pipelines and fixed binary classification pipeline predict not using objective if it is passed as a named argument #1874

    • Fixed stack trace in prediction explanation functions caused by mixed string/numeric pandas column names #1871

    • Fixed stack trace caused by passing pipelines with duplicate names to AutoMLSearch #1932

    • Fixed AutoMLSearch.get_pipelines returning pipelines with the same attributes #1958

  • Changes
    • Reversed GitHub Action for Linux unit tests until a fix for report generation is found #1920

    • Updated add_results in AutoMLAlgorithm to take in entire pipeline results dictionary from AutoMLSearch #1891

    • Updated ClassImbalanceDataCheck to look for severe class imbalance scenarios #1905

    • Deleted the explain_prediction function #1915

    • Removed HighVarianceCVDataCheck and converted it to an AutoMLSearch method instead #1928

    • Removed warning in InvalidTargetDataCheck returned when numeric binary classification targets are not (0, 1) #1959

  • Documentation Changes
    • Updated model_understanding.ipynb to demo the two-way partial dependence capability #1919

  • Testing Changes

Warning

Breaking Changes
  • Deleted the explain_prediction function #1915

  • Removed HighVarianceCVDataCheck and converted it to an AutoMLSearch method instead #1928

  • Added score_batch and train_batch abstract methods to EngineBase. These need to be implemented in Engine subclasses #1913

v0.19.0 Feb. 23, 2021
  • Enhancements
    • Added a GitHub Action for Python windows unit tests #1844

    • Added a GitHub Action for checking updated release notes #1849

    • Added a GitHub Action for Python lint checks #1837

    • Adjusted explain_prediction, explain_predictions and explain_predictions_best_worst to handle timeseries problems. #1818

    • Updated InvalidTargetDataCheck to check for mismatched indices in target and features #1816

    • Updated Woodwork structures returned from components to support Woodwork logical type overrides set by the user #1784

    • Updated estimators to keep track of input feature names during fit() #1794

    • Updated visualize_decision_tree to include feature names in output #1813

    • Added is_bounded_like_percentage property for objectives. If true, the calculate_percent_difference method will return the absolute difference rather than relative difference #1809

    • Added full error traceback to AutoMLSearch logger file #1840

    • Changed TargetEncoder to preserve custom indices in the data #1836

    • Refactored explain_predictions and explain_predictions_best_worst to only compute features once for all rows that need to be explained #1843

    • Added custom random undersampler data splitter for classification #1857

    • Updated OutliersDataCheck implementation to calculate the probability of having no outliers #1855

    • Added Engines pipeline processing API #1838

  • Fixes
    • Changed EngineBase random_state arg to random_seed and same for user guide docs #1889

  • Changes
    • Modified calculate_percent_difference so that division by 0 is now inf rather than nan #1809

    • Removed text_columns parameter from LSA and TextFeaturizer components #1652

    • Added random_seed as an argument to our automl/pipeline/component API. Using random_state will raise a warning #1798

    • Added DataCheckError message in InvalidTargetDataCheck if input target is None and removed exception raised #1866

  • Documentation Changes

  • Testing Changes
    • Added back coverage for _get_feature_provenance in TextFeaturizer after text_columns was removed #1842

    • Pin graphviz version for windows builds #1847

    • Unpin graphviz version for windows builds #1851

Warning

Breaking Changes
  • Added a deprecation warning to explain_prediction. It will be deleted in the next release. #1860
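
The calculate_percent_difference entries for this release (#1809) can be sketched roughly as follows. This is a toy function under stated assumptions, not the library's method: the function name, the scaling by 100, and the handling of identical scores are assumptions made for illustration. Only the two behaviors named in the notes (absolute difference when the objective is bounded like a percentage, and inf instead of nan on division by 0) come from the source.

```python
import math


def percent_difference(score, baseline, bounded_like_percentage=False):
    """Toy percent difference between a score and a baseline score.

    Assumptions (not taken from the release notes): results are scaled by
    100, and identical scores always give 0.
    """
    if score == baseline:
        return 0.0
    if bounded_like_percentage:
        # Absolute difference for objectives bounded like a percentage.
        return 100 * (score - baseline)
    if baseline == 0:
        # Division by zero is reported as infinity rather than NaN.
        return math.inf
    return 100 * (score - baseline) / abs(baseline)


print(percent_difference(0.9, 0.6))                                # relative
print(percent_difference(0.9, 0.6, bounded_like_percentage=True))  # absolute
print(percent_difference(0.5, 0.0))                                # inf
```

The absolute form avoids inflated percentages when a bounded score such as an accuracy is compared against a baseline close to zero.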

v0.18.2 Feb. 10, 2021
  • Enhancements
    • Added uniqueness score data check #1785

    • Added “dataframe” output format for prediction explanations #1781

    • Updated LightGBM estimators to handle pandas.MultiIndex #1770

    • Sped up permutation importance for some pipelines #1762

    • Added sparsity data check #1797

    • Confirmed support for threshold tuning for binary time series classification problems #1803

  • Fixes

  • Changes

  • Documentation Changes
    • Added section on conda to the contributing guide #1771

    • Updated release process to reflect freezing main before perf tests #1787

    • Moving some prs to the right section of the release notes #1789

    • Tweak README.md. #1800

    • Fixed back arrow on install page docs #1795

    • Fixed docstring for ClassImbalanceDataCheck.validate() #1817

  • Testing Changes

v0.18.1 Feb. 1, 2021
  • Enhancements
    • Added graph_t_sne as a visualization tool for high dimensional data #1731

    • Added the ability to see the linear coefficients of features in linear models #1738

    • Added support for scikit-learn v0.24.0 #1733

    • Added support for scipy v1.6.0 #1752

    • Added SVM Classifier and Regressor to estimators #1714 #1761

  • Fixes
    • Addressed bug with partial_dependence and categorical data with more categories than grid resolution #1748

    • Removed random_state arg from get_pipelines in AutoMLSearch #1719

    • Pinned pyzmq at less than 22.0.0 till we add support #1756

  • Changes
    • Updated components and pipelines to return Woodwork data structures #1668

    • Updated clone() for pipelines and components to copy over random state automatically #1753

    • Dropped support for Python version 3.6 #1751

    • Removed deprecated verbose flag from AutoMLSearch parameters #1772

  • Documentation Changes
    • Add Twitter and Github link to documentation toolbar #1754

    • Added Open Graph info to documentation #1758

  • Testing Changes

Warning

Breaking Changes
  • Components and pipelines return Woodwork data structures instead of pandas data structures #1668

  • Python 3.6 will not be actively supported due to discontinued support from EvalML dependencies.

  • Deprecated verbose flag is removed for AutoMLSearch #1772

v0.18.0 Jan. 26, 2021
  • Enhancements
    • Added RMSLE, MSLE, and MAPE to core objectives while checking for negative target values in invalid_targets_data_check #1574

    • Added validation checks for binary problems with regression-like datasets and multiclass problems without true multiclass targets in invalid_targets_data_check #1665

    • Added time series support for make_pipeline #1566

    • Added target name for output of pipeline predict method #1578

    • Added multiclass check to InvalidTargetDataCheck for two examples per class #1596

    • Added support for graphviz v0.16 #1657

    • Enhanced time series pipelines to accept empty features #1651

    • Added KNN Classifier to estimators. #1650

    • Added support for list inputs for objectives #1663

    • Added support for AutoMLSearch to handle time series classification pipelines #1666

    • Enhanced DelayedFeaturesTransformer to encode categorical features and targets before delaying them #1691

    • Added 2-way dependence plots. #1690

    • Added ability to directly iterate through components within Pipelines #1583

  • Fixes
    • Fixed inconsistent attributes and added Exceptions to docs #1673

    • Fixed TargetLeakageDataCheck to use Woodwork mutual_information rather than using Pandas’ Pearson Correlation #1616

    • Fixed thresholding for pipelines in AutoMLSearch to only threshold binary classification pipelines #1622 #1626

    • Updated load_data to return Woodwork structures and updated default parameter value for index to None #1610

    • Pinned scipy at < 1.6.0 while we work on adding support #1629

    • Fixed data check message formatting in AutoMLSearch #1633

    • Addressed stacked ensemble component for scikit-learn v0.24 support by setting shuffle=True for default CV #1613

    • Fixed bug where Imputer reset the index on X #1590

    • Fixed AutoMLSearch stacktrace when a custom objective was passed in as a primary objective or additional objective #1575

    • Fixed custom index bug for MAPE objective #1641

    • Fixed index bug for TextFeaturizer and LSA components #1644

    • Limited load_fraud dataset loaded into automl.ipynb #1646

    • add_to_rankings updates AutoMLSearch.best_pipeline when necessary #1647

    • Fixed bug where time series baseline estimators were not receiving gap and max_delay in AutoMLSearch #1645

    • Fixed jupyter notebooks to help the RTD buildtime #1654

    • Added positive_only objectives to non_core_objectives #1661

    • Fixed stacking argument n_jobs for IterativeAlgorithm #1706

    • Updated CatBoost estimators to return self in .fit() rather than the underlying model for consistency #1701

    • Added ability to initialize pipeline parameters in AutoMLSearch constructor #1676

  • Changes
    • Added labeling to graph_confusion_matrix #1632

    • Rerunning search for AutoMLSearch results in a message thrown rather than failing the search, and removed has_searched property #1647

    • Changed tuner class to allow and ignore single parameter values as input #1686

    • Capped LightGBM version limit to remove bug in docs #1711

    • Removed support for np.random.RandomState in EvalML #1727

  • Documentation Changes
    • Update Model Understanding in the user guide to include visualize_decision_tree #1678

    • Updated docs to include information about AutoMLSearch callback parameters and methods #1577

    • Updated docs to prompt users to install graphviz on Mac #1656

    • Added infer_feature_types to the start.ipynb guide #1700

    • Added multicollinearity data check to API reference and docs #1707

  • Testing Changes

Warning

Breaking Changes
  • Removed has_searched property from AutoMLSearch #1647

  • Components and pipelines return Woodwork data structures instead of pandas data structures #1668

  • Removed support for np.random.RandomState in EvalML. Rather than passing np.random.RandomState as component and pipeline random_state values, we use int random_seed #1727

v0.17.0 Dec. 29, 2020
  • Enhancements
    • Added save_plot that allows for saving figures from different backends #1588

    • Added LightGBM Regressor to regression components #1459

    • Added visualize_decision_tree for tree visualization with decision_tree_data_from_estimator and decision_tree_data_from_pipeline to reformat tree structure output #1511

    • Added DFS Transformer component into transformer components #1454

    • Added MAPE to the standard metrics for time series problems and update objectives #1510

    • Added graph_prediction_vs_actual_over_time and get_prediction_vs_actual_over_time_data to the model understanding module for time series problems #1483

    • Added a ComponentGraph class that will support future pipelines as directed acyclic graphs #1415

    • Updated data checks to accept Woodwork data structures #1481

    • Added parameter to InvalidTargetDataCheck to show only top unique values rather than all unique values #1485

    • Added multicollinearity data check #1515

    • Added baseline pipeline and components for time series regression problems #1496

    • Added more information to users about ensembling behavior in AutoMLSearch #1527

    • Add woodwork support for more utility and graph methods #1544

    • Changed DateTimeFeaturizer to encode features as int #1479

    • Return trained pipelines from AutoMLSearch.best_pipeline #1547

    • Added utility method so that users can set feature types without having to learn about Woodwork directly #1555

    • Added Linear Discriminant Analysis transformer for dimensionality reduction #1331

    • Added multiclass support for partial_dependence and graph_partial_dependence #1554

    • Added TimeSeriesBinaryClassificationPipeline and TimeSeriesMulticlassClassificationPipeline classes #1528

    • Added make_data_splitter method for easier automl data split customization #1568

    • Integrated ComponentGraph class into Pipelines for full non-linear pipeline support #1543

    • Update AutoMLSearch constructor to take training data instead of search and add_to_leaderboard #1597

    • Update split_data helper args #1597

    • Add problem type utils is_regression, is_classification, is_timeseries #1597

    • Rename AutoMLSearch data_split arg to data_splitter #1569

  • Fixes
    • Fix AutoML not passing CV folds to DefaultDataChecks for usage by ClassImbalanceDataCheck #1619

    • Fix Windows CI jobs: install numba via conda, required for shap #1490

    • Added custom-index support for reset-index-get_prediction_vs_actual_over_time_data #1494

    • Fix generate_pipeline_code to account for boolean and None differences between Python and JSON #1524 #1531

    • Set max value for plotly and xgboost versions while we debug CI failures with newer versions #1532

    • Undo version pinning for plotly #1533

    • Fix ReadTheDocs build by updating the version of setuptools #1561

    • Set random_state of data splitter in AutoMLSearch to take int to keep consistency in the resulting splits #1579

    • Pin sklearn version while we work on adding support #1594

    • Pin pandas at <1.2.0 while we work on adding support #1609

    • Pin graphviz at < 0.16 while we work on adding support #1609

  • Changes
    • Reverting save_graph #1550 to resolve kaleido build issues #1585

    • Update circleci badge to apply to main #1489

    • Added script to generate github markdown for releases #1487

    • Updated selection using pandas dtypes to selecting using Woodwork logical types #1551

    • Updated dependencies to fix ImportError: cannot import name 'MaskedArray' from 'sklearn.utils.fixes' error and to address Woodwork and Featuretools dependencies #1540

    • Made get_prediction_vs_actual_data() a public method #1553

    • Updated Woodwork version requirement to v0.0.7 #1560

    • Move data splitters from evalml.automl.data_splitters to evalml.preprocessing.data_splitters #1597

    • Rename “# Testing” in automl log output to “# Validation” #1597

  • Documentation Changes
    • Added partial dependence methods to API reference #1537

    • Updated documentation for confusion matrix methods #1611

  • Testing Changes
    • Set n_jobs=1 in most unit tests to reduce memory #1505

Warning

Breaking Changes
  • Updated minimal dependencies: numpy>=1.19.1, pandas>=1.1.0, scikit-learn>=0.23.1, scikit-optimize>=0.8.1

  • Updated AutoMLSearch.best_pipeline to return a trained pipeline. Pass in train_best_pipeline=False to AutoMLSearch in order to return an untrained pipeline.

  • Pipeline component instances can no longer be iterated through using Pipeline.component_graph #1543

  • Update AutoMLSearch constructor to take training data instead of search and add_to_leaderboard #1597

  • Update split_data helper args #1597

  • Move data splitters from evalml.automl.data_splitters to evalml.preprocessing.data_splitters #1597

  • Rename AutoMLSearch data_split arg to data_splitter #1569

v0.16.1 Dec. 1, 2020
  • Enhancements
    • Pin woodwork version to v0.0.6 to avoid breaking changes #1484

    • Updated Woodwork to >=0.0.5 in core-requirements.txt #1473

    • Removed copy_dataframe parameter for Woodwork, updated Woodwork to >=0.0.6 in core-requirements.txt #1478

    • Updated detect_problem_type to use pandas.api.is_numeric_dtype #1476

  • Changes
    • Changed make clean to delete coverage reports as a convenience for developers #1464

    • Set n_jobs=-1 by default for stacked ensemble components #1472

  • Documentation Changes
    • Updated pipeline and component documentation and demos to use Woodwork #1466

  • Testing Changes
    • Update dependency update checker to use everything from core and optional dependencies #1480

v0.16.0 Nov. 24, 2020
  • Enhancements
    • Updated pipelines and make_pipeline to accept Woodwork inputs #1393

    • Updated components to accept Woodwork inputs #1423

    • Added ability to freeze hyperparameters for AutoMLSearch #1284

    • Added Target Encoder into transformer components #1401

    • Added callback for error handling in AutoMLSearch #1403

    • Added the index id to the explain_predictions_best_worst output to help users identify which rows in their data are included #1365

    • The top_k features displayed in explain_predictions_* functions are now determined by the magnitude of shap values as opposed to the top_k largest and smallest shap values. #1374

    • Added a problem type for time series regression #1386

    • Added an is_defined_for_problem_type method to ObjectiveBase #1386

    • Added a random_state parameter to make_pipeline_from_components function #1411

    • Added DelayedFeaturesTransformer #1396

    • Added a TimeSeriesRegressionPipeline class #1418

    • Removed core-requirements.txt from the package distribution #1429

    • Updated data check messages to include “code” and “details” fields #1451, #1462

    • Added a TimeSeriesSplit data splitter for time series problems #1441

    • Added a problem_configuration parameter to AutoMLSearch #1457

  • Fixes
    • Fixed IndexError raised in AutoMLSearch when ensembling = True but only one pipeline to iterate over #1397

    • Fixed stacked ensemble input bug and LightGBM warning and bug in AutoMLSearch #1388

    • Updated enum classes to show possible enum values as attributes #1391

    • Updated calls to Woodwork’s to_pandas() to to_series() and to_dataframe() #1428

    • Fixed bug in OHE where column names were not guaranteed to be unique #1349

    • Fixed bug with percent improvement of ExpVariance objective on data with highly skewed target #1467

    • Fix SimpleImputer error which occurs when all features are bool type #1215

  • Changes
    • Changed OutliersDataCheck to return the list of columns, rather than rows, that contain outliers #1377

    • Simplified and cleaned output for Code Generation #1371

    • Reverted changes from #1337 #1409

    • Updated data checks to return dictionary of warnings and errors instead of a list #1448

    • Updated AutoMLSearch to pass Woodwork data structures to every pipeline (instead of pandas DataFrames) #1450

    • Update AutoMLSearch to default to max_batches=1 instead of max_iterations=5 #1452

    • Updated _evaluate_pipelines to consolidate side effects #1410

  • Documentation Changes
    • Added description of CLA to contributing guide, updated description of draft PRs #1402

    • Updated documentation to include all data checks, DataChecks, and usage of data checks in AutoML #1412

    • Updated docstrings from np.array to np.ndarray #1417

    • Added section on stacking ensembles in AutoMLSearch documentation #1425

  • Testing Changes
    • Removed category_encoders from test-requirements.txt #1373

    • Tweak codecov.io settings again to avoid flakes #1413

    • Modified make lint to check notebook versions in the docs #1431

    • Modified make lint-fix to standardize notebook versions in the docs #1431

    • Use new version of pull request Github Action for dependency check #1443

    • Reduced number of workers for tests to 4 #1447

Warning

Breaking Changes
  • The top_k and top_k_features parameters in explain_predictions_* functions now return k features as opposed to 2 * k features #1374

  • Renamed problem_type to problem_types in RegressionObjective, BinaryClassificationObjective, and MulticlassClassificationObjective #1319

  • Data checks now return a dictionary of warnings and errors instead of a list #1448
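
The top_k change in #1374 can be illustrated with a minimal sketch that contrasts the two selection rules on made-up SHAP values. The function names and the values below are hypothetical and are not part of the library; they only show why the old rule yields 2 * k features and the new rule yields k.

```python
def top_k_by_magnitude(shap_values, k):
    """New rule (as described above): the k features with the largest absolute SHAP values."""
    ranked = sorted(shap_values.items(), key=lambda item: abs(item[1]), reverse=True)
    return [name for name, _ in ranked[:k]]


def top_k_largest_and_smallest(shap_values, k):
    """Old rule (as described above): the k largest plus the k smallest values, 2 * k features."""
    ranked = sorted(shap_values.items(), key=lambda item: item[1])
    smallest = [name for name, _ in ranked[:k]]
    largest = [name for name, _ in ranked[-k:]]
    return largest + smallest


# Made-up SHAP values for a single prediction (hypothetical feature names).
shap_values = {"age": 0.40, "income": -0.35, "tenure": 0.05, "region": -0.01, "plan": 0.20}

new_selection = top_k_by_magnitude(shap_values, k=2)
old_selection = top_k_largest_and_smallest(shap_values, k=2)
print(new_selection)
print(old_selection)
```

Under the old rule a near-zero value such as "region" can appear in the output simply because it is among the smallest signed values; ranking by magnitude avoids that.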

v0.15.0 Oct. 29, 2020
  • Enhancements
    • Added stacked ensemble component classes (StackedEnsembleClassifier, StackedEnsembleRegressor) #1134

    • Added stacked ensemble components to AutoMLSearch #1253

    • Added DecisionTreeClassifier and DecisionTreeRegressor to AutoML #1255

    • Added graph_prediction_vs_actual in model_understanding for regression problems #1252

    • Added parameter to OneHotEncoder to enable filtering which features to encode #1249

    • Added percent-better-than-baseline for all objectives to automl.results #1244

    • Added HighVarianceCVDataCheck and replaced synonymous warning in AutoMLSearch #1254

    • Added PCA Transformer component for dimensionality reduction #1270

    • Added generate_pipeline_code and generate_component_code to allow for code generation given a pipeline or component instance #1306

    • Updated AutoMLSearch to support Woodwork data structures #1299

    • Added cv_folds to ClassImbalanceDataCheck and added this check to DefaultDataChecks #1333

    • Make max_batches argument to AutoMLSearch.search public #1320

    • Added text support to automl search #1062

    • Added _pipelines_per_batch as a private argument to AutoMLSearch #1355

  • Fixes
    • Fixed ML performance issue with ordered datasets: always shuffle data in automl’s default CV splits #1265

    • Fixed broken evalml info CLI command #1293

    • Fixed boosting type='rf' for LightGBM Classifier, as well as num_leaves error #1302

    • Fixed bug in explain_predictions_best_worst where a custom index in the target variable would cause a ValueError #1318

    • Added stacked ensemble estimators to the evalml.pipelines.__init__ file #1326

    • Fixed bug in OHE where calls to transform were not deterministic if top_n was less than the number of categories in a column #1324

    • Fixed LightGBM warning messages during AutoMLSearch #1342

    • Fix warnings thrown during AutoMLSearch in HighVarianceCVDataCheck #1346

    • Fixed bug where TrainingValidationSplit would return invalid location indices for dataframes with a custom index #1348

    • Fixed bug where the AutoMLSearch random_state was not being passed to the created pipelines #1321

  • Changes
    • Allow add_to_rankings to be called before AutoMLSearch is called #1250

    • Removed Graphviz from test-requirements to add to requirements.txt #1327

    • Removed max_pipelines parameter from AutoMLSearch #1264

    • Include editable installs in all install make targets #1335

    • Made pip dependencies featuretools and nlp_primitives core dependencies #1062

    • Removed PartOfSpeechCount from TextFeaturizer transform primitives #1062

    • Added warning for partial_dependency when the feature includes null values #1352

  • Documentation Changes
    • Fixed and updated code blocks in Release Notes #1243

    • Added DecisionTree estimators to API Reference #1246

    • Changed class inheritance display to flow vertically #1248

    • Updated cost-benefit tutorial to use a holdout/test set #1159

    • Added evalml info command to documentation #1293

    • Miscellaneous doc updates #1269

    • Removed conda pre-release testing from the release process document #1282

    • Updates to contributing guide #1310

    • Added Alteryx footer to docs with Twitter and Github link #1312

    • Added documentation for evalml installation for Python 3.6 #1322

    • Added documentation changes to make the API Docs easier to understand #1323

    • Fixed documentation for feature_importance #1353

    • Added tutorial for running AutoML with text data #1357

    • Added documentation for woodwork integration with automl search #1361

  • Testing Changes
    • Added tests for jupyter_check to handle IPython #1256

    • Cleaned up make_pipeline tests to test for all estimators #1257

    • Added a test to check conda build after merge to main #1247

    • Removed unnecessary code in __main__.py that was lacking codecov coverage #1293

    • Codecov: round coverage up instead of down #1334

    • Add DockerHub credentials to CI testing environment #1356

    • Add DockerHub credentials to conda testing environment #1363

Warning

Breaking Changes
  • Renamed LabelLeakageDataCheck to TargetLeakageDataCheck #1319

  • max_pipelines parameter has been removed from AutoMLSearch. Please use max_iterations instead. #1264

  • AutoMLSearch.search() will now log a warning if the input is not a Woodwork data structure (for example, pandas or numpy) #1299

  • Make max_batches argument to AutoMLSearch.search public #1320

  • Removed unused argument feature_types from AutoMLSearch.search #1062
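
The ordered-dataset fix in #1265 can be motivated with a small, library-independent sketch. The helper k_fold_indices below is hypothetical and much simpler than a real data splitter (it drops any remainder rows); it only shows what happens to validation folds when a class-sorted target is split without shuffling.

```python
import random


def k_fold_indices(n_rows, n_splits, shuffle, seed=0):
    """Split row positions into n_splits validation folds, optionally shuffling first.

    Simplified for illustration: rows left over after integer division are dropped.
    """
    indices = list(range(n_rows))
    if shuffle:
        random.Random(seed).shuffle(indices)
    fold_size = n_rows // n_splits
    return [indices[i * fold_size:(i + 1) * fold_size] for i in range(n_splits)]


# A target sorted by class: the first half is all 0, the second half is all 1.
y = [0] * 50 + [1] * 50

unshuffled = k_fold_indices(len(y), n_splits=2, shuffle=False)
shuffled = k_fold_indices(len(y), n_splits=2, shuffle=True)

# Classes present in each validation fold.
unshuffled_classes = [sorted({y[i] for i in fold}) for fold in unshuffled]
shuffled_classes = [sorted({y[i] for i in fold}) for fold in shuffled]
print(unshuffled_classes)
print(shuffled_classes)
```

Without shuffling, each validation fold contains a single class, so a model trained on the other fold never sees the class it is scored on. Shuffling before splitting mixes both classes into every fold.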

v0.14.1 Sep. 29, 2020
  • Enhancements
    • Updated partial dependence methods to support calculating numeric columns in a dataset with non-numeric columns #1150

    • Added get_feature_names on OneHotEncoder #1193

    • Added detect_problem_type to problem_type/utils.py to automatically detect the problem type given targets #1194

    • Added LightGBM to AutoMLSearch #1199

    • Updated scikit-learn and scikit-optimize to use latest versions - 0.23.2 and 0.8.1 respectively #1141

    • Added __str__ and __repr__ for pipelines and components #1218

    • Included internal target check for both training and validation data in AutoMLSearch #1226

    • Added ProblemTypes.all_problem_types helper to get list of supported problem types #1219

    • Added DecisionTreeClassifier and DecisionTreeRegressor classes #1223

    • DataChecks can now be parametrized by passing a list of DataCheck classes and a parameter dictionary #1167

    • Added first CV fold score as validation score in AutoMLSearch.rankings #1221

    • Updated flake8 configuration to enable linting on __init__.py files #1234

    • Refined make_pipeline_from_components implementation #1204

  • Fixes
    • Updated GitHub URL after migration to Alteryx GitHub org #1207

    • Changed Problem Type enum to be more similar to the string name #1208

    • Wrapped call to scikit-learn’s partial dependence method in a try/finally block #1232

  • Changes
    • Added allow_writing_files as a named argument to CatBoost estimators. #1202

    • Added solver and multi_class as named arguments to LogisticRegressionClassifier #1202

    • Replaced pipeline’s ._transform method to evaluate all the preprocessing steps of a pipeline with .compute_estimator_features #1231

    • Changed default large dataset train/test splitting behavior #1205

  • Documentation Changes
    • Included description of how to access the component instances and features for pipeline user guide #1163

    • Updated API docs to refer to target as “target” instead of “labels” for non-classification tasks and minor docs cleanup #1160

    • Added Class Imbalance Data Check to api_reference.rst #1190 #1200

    • Added pipeline properties to API reference #1209

    • Clarified what the objective parameter in AutoML is used for in AutoML API reference and AutoML user guide #1222

    • Updated API docs to include skopt.space.Categorical option for component hyperparameter range definition #1228

    • Added install documentation for libomp in order to use LightGBM on Mac #1233

    • Improved description of max_iterations in documentation #1212

    • Removed unused code from sphinx conf #1235

  • Testing Changes

Warning

Breaking Changes
  • DefaultDataChecks now accepts a problem_type parameter that must be specified #1167

  • Pipeline’s ._transform method to evaluate all the preprocessing steps of a pipeline has been replaced with .compute_estimator_features #1231

  • get_objectives has been renamed to get_core_objectives. This function will now return a list of valid objective instances #1230

v0.13.2 Sep. 17, 2020
  • Enhancements
    • Added output_format field to explain predictions functions #1107

    • Modified get_objective and get_objectives to be able to return any objective in evalml.objectives #1132

    • Added a return_instance boolean parameter to get_objective #1132

    • Added ClassImbalanceDataCheck to determine whether target imbalance falls below a given threshold #1135

    • Added label encoder to LightGBM for binary classification #1152

    • Added labels for the row index of confusion matrix #1154

    • Added AutoMLSearch object as another parameter in search callbacks #1156

    • Added the corresponding probability threshold for each point displayed in graph_roc_curve #1161

    • Added __eq__ for ComponentBase and PipelineBase #1178

    • Added support for multiclass classification for roc_curve #1164

    • Added categories accessor to OneHotEncoder for listing the categories associated with a feature #1182

    • Added utility function to create pipeline instances from a list of component instances #1176

  • Fixes
    • Fixed XGBoost column names for partial dependence methods #1104

    • Removed dead code validating column type from TextFeaturizer #1122

    • Fixed issue where Imputer cannot fit when there is None in a categorical or boolean column #1144

    • OneHotEncoder preserves the custom index in the input data #1146

    • Fixed representation for ModelFamily #1165

    • Removed duplicate nbsphinx dependency in dev-requirements.txt #1168

    • Users can now pass in any valid kwargs to all estimators #1157

    • Remove broken accessor OneHotEncoder.get_feature_names and unneeded base class #1179

    • Removed LightGBM Estimator from AutoML models #1186

  • Changes
    • Pinned scikit-optimize version to 0.7.4 #1136

    • Removed tqdm as a dependency #1177

    • Added lightgbm version 3.0.0 to latest_dependency_versions.txt #1185

    • Rename max_pipelines to max_iterations #1169

  • Documentation Changes
    • Fixed API docs for AutoMLSearch add_result_callback #1113

    • Added a step to our release process for pushing our latest version to conda-forge #1118

    • Added warning for missing ipywidgets dependency for using PipelineSearchPlots on Jupyterlab #1145

    • Updated README.md example to load demo dataset #1151

    • Swapped mapping of breast cancer targets in model_understanding.ipynb #1170

  • Testing Changes
    • Added test confirming TextFeaturizer never outputs null values #1122

    • Changed Python version of Update Dependencies action to 3.8.x #1137

    • Fixed release notes check-in test for Update Dependencies actions #1172

Warning

Breaking Changes
  • get_objective will now return a class definition rather than an instance by default #1132

  • Deleted OPTIONS dictionary in evalml.objectives.utils.py #1132

  • If specifying an objective by string, the string must now match the objective’s name field, case-insensitive #1132

  • Passing “Cost Benefit Matrix”, “Fraud Cost”, “Lead Scoring”, “Mean Squared Log Error”, “Recall”, “Recall Macro”, “Recall Micro”, “Recall Weighted”, or “Root Mean Squared Log Error” to AutoMLSearch will now result in a ValueError rather than an ObjectiveNotFoundError #1132

  • Search callbacks start_iteration_callback and add_results_callback have changed to include a copy of the AutoMLSearch object as a third parameter #1156

  • Deleted OneHotEncoder.get_feature_names method which had been broken for a while, in favor of pipelines’ input_feature_names #1179

  • Deleted empty base class CategoricalEncoder which OneHotEncoder component was inheriting from #1176

  • Results from roc_curve will now return as a list of dictionaries with each dictionary representing a class #1164

  • max_pipelines now raises a DeprecationWarning and will be removed in the next release. max_iterations should be used instead. #1169
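
The lookup behavior described for #1132 (case-insensitive matching on the name field, and returning a class unless an instance is requested) can be sketched as follows. Everything here is a hypothetical stand-in: the classes, the registry, and the find_objective function are not the library's implementation, and the error type is chosen only for the example.

```python
class LogLossSketch:
    """Hypothetical objective class used only for this example."""
    name = "Log Loss Binary"


class F1Sketch:
    """Hypothetical objective class used only for this example."""
    name = "F1"


# Index objective classes by lower-cased name so lookups ignore case.
_OBJECTIVES_BY_NAME = {cls.name.lower(): cls for cls in (LogLossSketch, F1Sketch)}


def find_objective(name, return_instance=False):
    """Return the objective class matching name, or an instance if requested."""
    key = name.lower()
    if key not in _OBJECTIVES_BY_NAME:
        raise KeyError(f"No objective named {name!r}")
    objective_class = _OBJECTIVES_BY_NAME[key]
    return objective_class() if return_instance else objective_class


objective_class = find_objective("log loss BINARY")
objective_instance = find_objective("f1", return_instance=True)
print(objective_class.__name__, type(objective_instance).__name__)
```

Returning the class by default lets the caller decide how to construct the objective, which matters for objectives that take constructor arguments.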

v0.13.1 Aug. 25, 2020
  • Enhancements
    • Added Cost-Benefit Matrix objective for binary classification #1038

    • Split fill_value into categorical_fill_value and numeric_fill_value for Imputer #1019

    • Added explain_predictions and explain_predictions_best_worst for explaining multiple predictions with SHAP #1016

    • Added new LSA component for text featurization #1022

    • Added guide on installing with conda #1041

    • Added a “cost-benefit curve” util method to graph cost-benefit matrix scores vs. binary classification thresholds #1081

    • Standardized error when calling transform/predict before fit for pipelines #1048

    • Added percent_better_than_baseline to AutoML search rankings and full rankings table #1050

    • Added one-way partial dependence and partial dependence plots #1079

    • Added “Feature Value” column to prediction explanation reports. #1064

    • Added LightGBM classification estimator #1082, #1114

    • Added max_batches parameter to AutoMLSearch #1087

  • Fixes
    • Updated TextFeaturizer component to no longer require an internet connection to run #1022

    • Fixed non-deterministic element of TextFeaturizer transformations #1022

    • Added a StandardScaler to all ElasticNet pipelines #1065

    • Updated cost-benefit matrix to normalize score #1099

    • Fixed logic in calculate_percent_difference so that it can handle negative values #1100

  • Changes
    • Added needs_fitting property to ComponentBase #1044

    • Updated references to data types to use datatype lists defined in evalml.utils.gen_utils #1039

    • Remove maximum version limit for SciPy dependency #1051

    • Moved all_components and other component importers into runtime methods #1045

    • Consolidated graphing utility methods under evalml.utils.graph_utils #1060

    • Made slight tweaks to how TextFeaturizer uses featuretools, and did some refactoring of that and of LSA #1090

    • Changed show_all_features parameter into importance_threshold, which allows for thresholding feature importance #1097, #1103

  • Documentation Changes
    • Update setup.py URL to point to the github repo #1037

    • Added tutorial for using the cost-benefit matrix objective #1088

    • Updated model_understanding.ipynb to include documentation for using plotly on Jupyter Lab #1108

  • Testing Changes
    • Refactor CircleCI tests to use matrix jobs #1043

    • Added a test to check that all test directories are included in evalml package #1054

Warning

Breaking Changes
  • confusion_matrix and normalize_confusion_matrix have been moved to evalml.utils #1038

  • All graph utility methods previously under evalml.pipelines.graph_utils have been moved to evalml.utils.graph_utils #1060
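
The negative-value fix in #1100 concerns percent differences relative to a baseline score. One way to make such a calculation sign-safe is sketched below; the formula, the function name percent_difference, and the greater_is_better flag are assumptions made for illustration and are not confirmed as the library's exact logic.

```python
def percent_difference(score, baseline_score, greater_is_better=True):
    """Percent improvement of score over baseline_score (positive means better).

    Dividing by the absolute value of the baseline keeps the sign meaningful
    when the baseline itself is negative.
    """
    if baseline_score == 0:
        return float("nan")  # undefined relative to a zero baseline
    difference = (score - baseline_score) / abs(baseline_score)
    if not greater_is_better:
        difference = -difference
    return 100 * difference


# A score of -0.5 improves on a baseline of -1.0 when higher is better.
improvement = percent_difference(-0.5, -1.0)
# An error of 0.2 improves on a baseline error of 0.4 when lower is better.
error_improvement = percent_difference(0.2, 0.4, greater_is_better=False)
print(improvement, error_improvement)
```

Dividing by the signed baseline instead would report the first case as a 50 percent regression, which is the kind of sign error a negative baseline can introduce.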

v0.12.2 Aug. 6, 2020
  • Enhancements
    • Add save/load method to components #1023

    • Expose pickle protocol as optional arg to save/load #1023

    • Updated estimators used in AutoML to include ExtraTrees and ElasticNet estimators #1030

  • Fixes

  • Changes
    • Removed DeprecationWarning for SimpleImputer #1018

  • Documentation Changes
    • Add note about version numbers to release process docs #1034

  • Testing Changes
    • Test files are now included in the evalml package #1029

v0.12.0 Aug. 3, 2020
  • Enhancements
    • Added string and categorical targets support for binary and multiclass pipelines and check for numeric targets for DetectLabelLeakage data check #932

    • Added clear exception for regression pipelines if target datatype is string or categorical #960

    • Added target column names and class labels in predict and predict_proba output for pipelines #951

    • Added _compute_shap_values and normalize_values to pipelines/explanations module #958

    • Added explain_prediction feature which explains single predictions with SHAP #974

    • Added Imputer to allow different imputation strategies for numerical and categorical dtypes #991

    • Added support for configuring logfile path using env var, and don’t create logger if there are filesystem errors #975

    • Updated catboost estimators’ default parameters and automl hyperparameter ranges to speed up fit time #998

  • Fixes
    • Fixed ReadtheDocs warning failure regarding embedded gif #943

    • Removed incorrect parameter passed to pipeline classes in _add_baseline_pipelines #941

    • Added universal error for calling predict, predict_proba, transform, and feature_importances before fitting #969, #994

    • Made TextFeaturizer component and pip dependencies featuretools and nlp_primitives optional #976

    • Updated imputation strategy in automl to no longer limit impute strategy to most_frequent for all features if there are any categorical columns #991

    • Fixed UnboundLocalError for cv_pipeline when automl search errors #996

    • Fixed Imputer to reset dataframe index to preserve behavior expected from SimpleImputer #1009

  • Changes
    • Moved get_estimators to evalml.pipelines.components.utils #934

    • Modified Pipelines to raise PipelineScoreError when they encounter an error during scoring #936

    • Moved evalml.model_families.list_model_families to evalml.pipelines.components.allowed_model_families #959

    • Renamed DateTimeFeaturization to DateTimeFeaturizer #977

    • Added check to stop search and raise an error if all pipelines in a batch return NaN scores #1015

  • Documentation Changes
    • Updated README.md #963

    • Reworded message when errors are returned from data checks in search #982

    • Added section on understanding model predictions with explain_prediction to User Guide #981

    • Added a section to the user guide and api reference about how XGBoost and CatBoost are not fully supported. #992

    • Added custom components section in user guide #993

    • Updated FAQ section formatting #997

    • Updated release process documentation #1003

  • Testing Changes
    • Moved predict_proba and predict tests regarding string / categorical targets to test_pipelines.py #972

    • Fixed dependency update bot by updating python version to 3.7 to avoid frequent github version updates #1002

Warning

Breaking Changes
  • get_estimators has been moved to evalml.pipelines.components.utils (previously was under evalml.pipelines.utils) #934

  • Removed the raise_errors flag in AutoML search. All errors during pipeline evaluation will be caught and logged. #936

  • evalml.model_families.list_model_families has been moved to evalml.pipelines.components.allowed_model_families #959

  • TextFeaturizer: the featuretools and nlp_primitives packages must be installed after installing evalml in order to use this component #976

  • Renamed DateTimeFeaturization to DateTimeFeaturizer #977
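
The idea behind the Imputer added in #991, using different strategies for numerical and categorical columns, can be sketched without the library. The function impute_by_dtype and the fixed choice of mean and most-frequent strategies are assumptions for this example only; the real component is configurable and works on pandas data.

```python
from collections import Counter
from statistics import mean


def impute_by_dtype(columns):
    """Fill None values: mean for numeric columns, most frequent value otherwise.

    Simplified for illustration: a column containing only None is not handled.
    """
    imputed = {}
    for name, values in columns.items():
        present = [v for v in values if v is not None]
        if all(isinstance(v, (int, float)) for v in present):
            fill = mean(present)
        else:
            fill = Counter(present).most_common(1)[0][0]
        imputed[name] = [fill if v is None else v for v in values]
    return imputed


# Made-up data with one missing value per column.
data = {"age": [20, None, 40, 30], "color": ["red", None, "red", "blue"]}
result = impute_by_dtype(data)
print(result)
```

Applying most_frequent to every column whenever any categorical column is present, which is the limitation #991 removed from automl, would have filled "age" with an arbitrary observed value rather than the mean.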

v0.11.2 July 16, 2020
  • Enhancements
    • Added NoVarianceDataCheck to DefaultDataChecks #893

    • Added text processing and featurization component TextFeaturizer #913, #924

    • Added additional checks to InvalidTargetDataCheck to handle invalid target data types #929

    • AutoMLSearch will now handle KeyboardInterrupt and prompt user for confirmation #915

  • Fixes
    • Makes automl results a read-only property #919

  • Changes
    • Deleted static pipelines and refactored tests involving static pipelines, removed all_pipelines() and get_pipelines() #904

    • Moved list_model_families to evalml.model_family.utils #903

    • Updated all_pipelines, all_estimators, all_components to use the same mechanism for dynamically generating their elements #898

    • Rename master branch to main #918

    • Add pypi release github action #923

    • Updated AutoMLSearch.search stdout output and logging and removed tqdm progress bar #921

    • Moved automl config checks previously in search() to init #933

  • Documentation Changes
    • Reorganized and rewrote documentation #937

    • Updated to use pydata sphinx theme #937

    • Updated docs to use release_notes instead of changelog #942

  • Testing Changes
    • Cleaned up fixture names and usages in tests #895

Warning

Breaking Changes
  • list_model_families has been moved to evalml.model_family.utils (previously was under evalml.pipelines.utils) #903

  • get_estimators has been moved to evalml.pipelines.components.utils (previously was under evalml.pipelines.utils) #934

  • Static pipeline definitions have been removed, but similar pipelines can still be constructed via creating an instance of PipelineBase #904

  • all_pipelines() and get_pipelines() utility methods have been removed #904

v0.11.0 June 30, 2020
  • Enhancements
    • Added multiclass support for ROC curve graphing #832

    • Added preprocessing component to drop features whose percentage of NaN values exceeds a specified threshold #834

    • Added data check to check for problematic target labels #814

    • Added PerColumnImputer that allows imputation strategies per column #824

    • Added transformer to drop specific columns #827

    • Added support for categories, handle_error, and drop parameters in OneHotEncoder #830 #897

    • Added preprocessing component to handle DateTime columns featurization #838

    • Added ability to clone pipelines and components #842

    • Define getter method for component parameters #847

    • Added utility methods to calculate and graph permutation importances #860, #880

    • Added new utility functions necessary for generating dynamic preprocessing pipelines #852

    • Added kwargs to all components #863

    • Updated AutoSearchBase to use dynamically generated preprocessing pipelines #870

    • Added SelectColumns transformer #873

    • Added ability to evaluate additional pipelines for automl search #874

    • Added default_parameters class property to components and pipelines #879

    • Added better support for disabling data checks in automl search #892

    • Added ability to save and load AutoML objects to file #888

    • Updated AutoSearchBase.get_pipelines to return an untrained pipeline instance #876

    • Saved learned binary classification thresholds in automl results cv data dict #876

  • Fixes
    • Fixed bug where SimpleImputer cannot handle dropped columns #846

    • Fixed bug where PerColumnImputer cannot handle dropped columns #855

    • Enforce requirement that builtin components save all inputted values in their parameters dict #847

    • Don’t list base classes in all_components output #847

    • Standardize all components to output pandas data structures, and accept either pandas or numpy #853

    • Fixed rankings and full_rankings error when search has not been run #894

  • Changes
    • Update all_pipelines and all_components to try initializing pipelines/components, and on failure exclude them #849

    • Refactor handle_component to handle_component_class, standardize to ComponentBase subclass instead of instance #850

    • Refactor “blacklist”/“whitelist” to “allow”/“exclude” lists #854

    • Replaced AutoClassificationSearch and AutoRegressionSearch with AutoMLSearch #871

    • Renamed feature_importances and permutation_importances methods to use singular names (feature_importance and permutation_importance) #883

    • Updated automl default data splitter to train/validation split for large datasets #877

    • Added open source license, update some repo metadata #887

    • Removed dead code in _get_preprocessing_components #896

  • Documentation Changes
    • Fix some typos and update the EvalML logo #872

  • Testing Changes
    • Update the changelog check job to expect the new branching pattern for the deps update bot #836

    • Check that all components output pandas datastructures, and can accept either pandas or numpy #853

    • Replaced AutoClassificationSearch and AutoRegressionSearch with AutoMLSearch #871

Warning

Breaking Changes
  • Pipelines’ static component_graph field must contain either ComponentBase subclasses or str, instead of ComponentBase subclass instances #850

  • Rename handle_component to handle_component_class. Now standardizes to ComponentBase subclasses instead of ComponentBase subclass instances #850

  • Renamed automl’s cv argument to data_split #877

  • Pipelines’ and classifiers’ feature_importances is renamed feature_importance, graph_feature_importances is renamed graph_feature_importance #883

  • Passing data_checks=None to automl search will not perform any data checks as opposed to default checks. #892

  • Pipelines to search for in AutoML are now determined automatically, rather than using the statically-defined pipeline classes. #870

  • Updated AutoSearchBase.get_pipelines to return an untrained pipeline instance, instead of one which happened to be trained on the final cross-validation fold #876
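
The preprocessing step from #834, dropping features whose percentage of NaN values exceeds a threshold, can be sketched in plain Python. The function name, the pct_null_threshold parameter name, and the use of None for missing values are assumptions for this example and do not describe the component's actual interface.

```python
def drop_highly_null_columns(columns, pct_null_threshold=0.95):
    """Keep only columns whose fraction of missing (None) values does not exceed the threshold."""
    kept = {}
    for name, values in columns.items():
        pct_null = sum(v is None for v in values) / len(values)
        if pct_null <= pct_null_threshold:
            kept[name] = values
    return kept


# Made-up data: "a" is 75 percent missing, "b" is 25 percent missing.
data = {"a": [1, None, None, None], "b": [1, 2, None, 4]}
kept = drop_highly_null_columns(data, pct_null_threshold=0.5)
print(list(kept))
```

With a threshold of 0.5, only column "a" exceeds the limit and is dropped.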

v0.10.0 May 29, 2020
  • Enhancements
    • Added baseline models for classification and regression, add functionality to calculate baseline models before searching in AutoML #746

    • Port over highly-null guardrail as a data check and define DefaultDataChecks and DisableDataChecks classes #745

    • Update Tuner classes to work directly with pipeline parameters dicts instead of flat parameter lists #779

    • Add Elastic Net as a pipeline option #812

    • Added new Pipeline option ExtraTrees #790

    • Added precision-recall curve metrics and plot for binary classification problems in evalml.pipelines.graph_utils #794

    • Update the default automl algorithm to search in batches, starting with default parameters for each pipeline and iterating from there #793

    • Added AutoMLAlgorithm class and IterativeAlgorithm impl, separated from AutoSearchBase #793

  • Fixes
    • Update pipeline score to return nan score for any objective which throws an exception during scoring #787

    • Fixed bug introduced in #787 where binary classification metrics requiring predicted probabilities error in scoring #798

    • CatBoost and XGBoost classifiers and regressors can no longer have a learning rate of 0 #795

  • Changes
    • Cleanup pipeline score code, and cleanup codecov #711

    • Remove pass for abstract methods for codecov #730

    • Added __str__ for AutoSearch object #675

    • Add util methods to graph ROC and confusion matrix #720

    • Refactor AutoBase to AutoSearchBase #758

    • Updated AutoBase with data_checks parameter, removed previous detect_label_leakage parameter, and added functionality to run data checks before search in AutoML #765

    • Updated our logger to use Python’s logging utils #763

    • Refactor most of AutoSearchBase._do_iteration impl into AutoSearchBase._evaluate #762

    • Port over all guardrails to use the new DataCheck API #789

    • Expanded import_or_raise to catch all exceptions #759

    • Adds RMSE, MSLE, RMSLE as standard metrics #788

    • Don’t allow Recall to be used as an objective for AutoML #784

    • Removed feature selection from pipelines #819

    • Update default estimator parameters to make automl search faster and more accurate #793

  • Documentation Changes
    • Add instructions to freeze master on release.md #726

    • Update release instructions with more details #727 #733

    • Add objective base classes to API reference #736

    • Fix components API to match other modules #747

  • Testing Changes
    • Delete codecov yml, use codecov.io’s default #732

    • Added unit tests for fraud cost, lead scoring, and standard metric objectives #741

    • Update codecov client #782

    • Updated AutoBase __str__ test to include no parameters case #783

    • Added unit tests for ExtraTrees pipeline #790

    • If codecov fails to upload, fail build #810

    • Updated Python version of dependency action #816

    • Update the dependency update bot to use a suffix when creating branches #817

Warning

Breaking Changes
  • The detect_label_leakage parameter for AutoML classes has been removed and replaced by a data_checks parameter #765

  • Moved ROC and confusion matrix methods from evalml.pipelines.plot_utils to evalml.pipelines.graph_utils #720

  • Tuner classes require a pipeline hyperparameter range dict as an init arg instead of a space definition #779

  • Tuner.propose and Tuner.add work directly with pipeline parameters dicts instead of flat parameter lists #779

  • PipelineBase.hyperparameters and custom_hyperparameters use pipeline parameters dict format instead of being represented as a flat list #779

  • All guardrail functions previously under evalml.guardrails.utils will be removed and replaced by data checks #789

  • Recall disallowed as an objective for AutoML #784

  • AutoSearchBase parameter tuner has been renamed to tuner_class #793

  • AutoSearchBase parameter possible_pipelines and possible_model_families have been renamed to allowed_pipelines and allowed_model_families #793
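
The Tuner change in #779 replaces flat parameter lists with pipeline parameters dicts. The difference between the two shapes can be shown with a short sketch; the helper function, the component names, and the ranges below are illustrative assumptions and not the library's code.

```python
def flat_to_pipeline_parameters(flat_values, hyperparameter_ranges):
    """Map a flat list of values onto a {component: {parameter: value}} dict.

    Relies on dict insertion order (Python 3.7+) to pair values with parameters.
    """
    values = iter(flat_values)
    parameters = {}
    for component_name, ranges in hyperparameter_ranges.items():
        parameters[component_name] = {name: next(values) for name in ranges}
    return parameters


# Hypothetical hyperparameter ranges keyed by component name.
hyperparameter_ranges = {
    "Imputer": {"impute_strategy": ["mean", "median"]},
    "Random Forest Classifier": {"n_estimators": (10, 1000), "max_depth": (1, 10)},
}

# Flat format: the meaning of each position is implicit.
flat_proposal = ["median", 200, 6]

# Dict format: each value is labeled with its component and parameter.
pipeline_parameters = flat_to_pipeline_parameters(flat_proposal, hyperparameter_ranges)
print(pipeline_parameters)
```

The dict format removes the need to track positional order between the tuner and the pipeline, which is the kind of bookkeeping a flat list requires.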

v0.9.0 Apr. 27, 2020
  • Enhancements
    • Added Accuracy as a standard objective #624

    • Added verbose parameter to load_fraud #560

    • Added Balanced Accuracy metric for binary, multiclass #612 #661

    • Added XGBoost regressor and XGBoost regression pipeline #666

    • Added Accuracy metric for multiclass #672

    • Added objective name in AutoBase.describe_pipeline #686

    • Added DataCheck and DataChecks, Message classes and relevant subclasses #739

  • Fixes
    • Removed direct access to cls.component_graph #595

    • Add testing files to .gitignore #625

    • Remove circular dependencies from Makefile #637

    • Add error case for normalize_confusion_matrix() #640

    • Fixed XGBoostClassifier and XGBoostRegressor bug with feature names that contain [, ], or < #659

    • Update make_pipeline_graph to not accidentally create empty file when testing if path is valid #649

    • Fix pip installation warning about docutils version, from boto dependency #664

    • Removed zero division warning for F1/precision/recall metrics #671

    • Fixed summary for pipelines without estimators #707

  • Changes
    • Updated default objective for binary/multiclass classification to log loss #613

    • Created classification and regression pipeline subclasses and removed objective as an attribute of pipeline classes #405

    • Changed the output of score to return one dictionary #429

    • Created binary and multiclass objective subclasses #504

    • Updated objectives API #445

    • Removed call to get_plot_data from AutoML #615

    • Set raise_error to default to True for AutoML classes #638

    • Remove unnecessary “u” prefixes on some unicode strings #641

    • Changed one-hot encoder to return uint8 dtypes instead of ints #653

    • Pipeline _name field changed to custom_name #650

    • Removed graphs.py and moved methods into PipelineBase #657, #665

    • Remove s3fs as a dev dependency #664

    • Changed requirements-parser to be a core dependency #673

    • Replace supported_problem_types field on pipelines with problem_type attribute on base classes #678

    • Changed AutoML to only show best results for a given pipeline template in rankings, added full_rankings property to show all #682

    • Update ModelFamily values: don’t list xgboost/catboost as classifiers now that we have regression pipelines for them #677

    • Changed AutoML’s describe_pipeline to get problem type from pipeline instead #685

    • Standardize import_or_raise error messages #683

    • Updated argument order of objectives to align with sklearn’s #698

    • Renamed pipeline.feature_importance_graph to pipeline.graph_feature_importances #700

    • Moved ROC and confusion matrix methods to evalml.pipelines.plot_utils #704

    • Renamed MultiClassificationObjective to MulticlassClassificationObjective, to align with pipeline naming scheme #715

  • Documentation Changes
    • Fixed some sphinx warnings #593

    • Fixed docstring for AutoClassificationSearch with correct command #599

    • Limit readthedocs formats to pdf, not htmlzip and epub #594 #600

    • Clean up objectives API documentation #605

    • Fixed function on Exploring search results page #604

    • Update release process doc #567

    • AutoClassificationSearch and AutoRegressionSearch show inherited methods in API reference #651

    • Fixed improperly formatted code in breaking changes for changelog #655

    • Added configuration to treat Sphinx warnings as errors #660

    • Removed separate plotting section for pipelines in API reference #657, #665

    • Have leads example notebook load S3 files using https, so we can delete s3fs dev dependency #664

    • Categorized components in API reference and added descriptions for each category #663

    • Fixed Sphinx warnings about BalancedAccuracy objective #669

    • Updated API reference to include missing components and clean up pipeline docstrings #689

    • Reorganize API ref, and clarify pipeline sub-titles #688

    • Add and update preprocessing utils in API reference #687

    • Added inheritance diagrams to API reference #695

    • Documented which default objective AutoML optimizes for #699

    • Create separate install page #701

    • Include more utils in API ref, like import_or_raise #704

    • Add more color to pipeline documentation #705

  • Testing Changes
    • Matched install commands of check_latest_dependencies test and its GitHub action #578

    • Added GitHub app to auto assign PR author as assignee #477

    • Removed unneeded conda installation of xgboost in windows checkin tests #618

    • Update graph tests to always use tmpfile dir #649

    • Changelog checkin test workaround for release PRs: If ‘future release’ section is empty of PR refs, pass check #658

    • Add changelog checkin test exception for dep-update branch #723

Warning

Breaking Changes

  • Pipelines no longer take an objective parameter during instantiation and no longer have an objective attribute.

  • fit() and predict() now use an optional objective parameter, which is only used in binary classification pipelines to fit for a specific objective.

  • score() will now use a required objectives parameter that is used to determine all the objectives to score on. This differs from the previous behavior, where the pipeline’s objective was scored on regardless.

  • score() will now return one dictionary of all objective scores.

  • ROC and ConfusionMatrix plot methods via Auto(*).plot have been removed by #615 and are replaced by roc_curve and confusion_matrix in evalml.pipelines.plot_utils in #704

  • normalize_confusion_matrix has been moved to evalml.pipelines.plot_utils #704

  • Pipelines _name field changed to custom_name

  • Pipelines supported_problem_types field is removed because it is no longer necessary #678

  • Updated argument order of objectives’ objective_function to align with sklearn #698

  • pipeline.feature_importance_graph has been renamed to pipeline.graph_feature_importances in #700

  • Removed unsupported MSLE objective #704
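
As a rough illustration of the new score() contract described above, the following sketch uses a hypothetical stand-in class, not EvalML's actual pipeline implementation. ToyPipeline, accuracy, and error_rate are names invented for this example, and passing objectives as a name-to-function mapping is an assumption of the sketch. Only the shape of the call comes from the notes above: a required objectives argument, one dictionary returned, and objective functions that take y_true before y_predicted, as in sklearn.

```python
# Hypothetical stand-in illustrating the v0.9.0 score() contract; not EvalML code.


def accuracy(y_true, y_predicted):
    """Fraction of predictions that match the true labels."""
    matches = sum(1 for t, p in zip(y_true, y_predicted) if t == p)
    return matches / len(y_true)


def error_rate(y_true, y_predicted):
    """Fraction of predictions that differ from the true labels."""
    return 1.0 - accuracy(y_true, y_predicted)


class ToyPipeline:
    """Always predicts a constant label; it stores no objective attribute."""

    def __init__(self, constant_label):
        self.constant_label = constant_label

    def predict(self, X):
        return [self.constant_label for _ in X]

    def score(self, X, y, objectives):
        # objectives is required here: a mapping of name -> objective function
        # (an assumption of this sketch, not the library's signature).
        y_predicted = self.predict(X)
        return {name: fn(y, y_predicted) for name, fn in objectives.items()}


X = [[0], [1], [2], [3]]
y = [1, 1, 0, 1]
scores = ToyPipeline(constant_label=1).score(
    X, y, objectives={"Accuracy": accuracy, "Error Rate": error_rate}
)
print(scores)
```

The point of the design is that the pipeline no longer carries a single objective; the caller decides at scoring time which objectives to evaluate and receives them together in one dictionary.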

v0.8.0 Apr. 1, 2020
  • Enhancements
    • Add normalization option and information to confusion matrix #484

    • Add util function to drop rows with NaN values #487

    • Renamed PipelineBase.name as PipelineBase.summary and redefined PipelineBase.name as class property #491

    • Added access to parameters in Pipelines with PipelineBase.parameters (used to be return of PipelineBase.describe) #501

    • Added fill_value parameter for SimpleImputer #509

    • Added functionality to override component hyperparameters and made pipelines take hyperparameters from components #516

    • Allow numpy.random.RandomState for random_state parameters #556

  • Fixes
    • Removed unused dependency matplotlib, and move category_encoders to test reqs #572

  • Changes
    • Undo version cap in XGBoost placed in #402 and allowed all releases of XGBoost #407

    • Support pandas 1.0.0 #486

    • Made all references to the logger static #503

    • Refactored model_type parameter for components and pipelines to model_family #507

    • Refactored problem_types for pipelines and components into supported_problem_types #515

    • Moved pipelines/utils.save_pipeline and pipelines/utils.load_pipeline to PipelineBase.save and PipelineBase.load #526

    • Limit number of categories encoded by OneHotEncoder #517

  • Documentation Changes
    • Updated API reference to remove PipelinePlot and added moved PipelineBase plotting methods #483

    • Add code style and github issue guides #463 #512

    • Updated API reference to surface class variables for pipelines and components #537

    • Fixed README documentation link #535

    • Unhid PR references in changelog #656

  • Testing Changes
    • Added automated dependency check PR #482, #505

    • Updated automated dependency check comment #497

    • Have build_docs job use python executor, so that env vars are set properly #547

    • Added simple test to make sure OneHotEncoder’s top_n works with large number of categories #552

    • Run windows unit tests on PRs #557

Warning

Breaking Changes

  • AutoClassificationSearch and AutoRegressionSearch’s model_types parameter has been refactored into allowed_model_families

  • ModelTypes enum has been changed to ModelFamily

  • Components and Pipelines now have a model_family field instead of model_type

  • get_pipelines utility function now accepts model_families as an argument instead of model_types

  • PipelineBase.name no longer returns structure of pipeline and has been replaced by PipelineBase.summary

  • PipelineBase.problem_types and Estimator.problem_types have been renamed to supported_problem_types

  • pipelines/utils.save_pipeline and pipelines/utils.load_pipeline moved to PipelineBase.save and PipelineBase.load

v0.7.0 Mar. 9, 2020
  • Enhancements
    • Added emacs buffers to .gitignore #350

    • Add CatBoost (gradient-boosted trees) classification and regression components and pipelines #247

    • Added Tuner abstract base class #351

    • Added n_jobs as parameter for AutoClassificationSearch and AutoRegressionSearch #403

    • Changed colors of confusion matrix to shades of blue and updated axis order to match scikit-learn’s #426

    • Added PipelineBase .graph and .feature_importance_graph methods, moved from previous location #423

    • Added support for python 3.8 #462

  • Fixes
    • Fixed ROC and confusion matrix plots not being calculated if user passed own additional_objectives #276

    • Fixed ReadtheDocs FileNotFoundError exception for fraud dataset #439

  • Changes
    • Added n_estimators as a tunable parameter for XGBoost #307

    • Remove unused parameter ObjectiveBase.fit_needs_proba #320

    • Remove extraneous parameter component_type from all components #361

    • Remove unused rankings.csv file #397

    • Downloaded demo and test datasets so unit tests can run offline #408

    • Remove _needs_fitting attribute from Components #398

    • Changed plot.feature_importance to show only non-zero feature importances by default, added optional parameter to show all #413

    • Refactored PipelineBase to take in parameter dictionary and moved pipeline metadata to class attribute #421

    • Dropped support for Python 3.5 #438

    • Removed unused apply.py file #449

    • Clean up requirements.txt to remove unused deps #451

    • Support installation without all required dependencies #459

  • Documentation Changes
    • Update release.md with instructions to release to internal license key #354

  • Testing Changes
    • Added tests for utils (and moved current utils to gen_utils) #297

    • Moved XGBoost install into its own separate step on Windows using Conda #313

    • Rewind pandas version to before 1.0.0, to diagnose test failures for that version #325

    • Added dependency update checkin test #324

    • Rewind XGBoost version to before 1.0.0 to diagnose test failures for that version #402

    • Update dependency check to use a whitelist #417

    • Update unit test jobs to not install dev deps #455

Warning

Breaking Changes

  • Python 3.5 will not be actively supported.

v0.6.0 Dec. 16, 2019
  • Enhancements
    • Added ability to create a plot of feature importances #133

    • Add early stopping to AutoML using patience and tolerance parameters #241

    • Added ROC and confusion matrix metrics and plot for classification problems and introduce PipelineSearchPlots class #242

    • Enhanced AutoML results with search order #260

    • Added utility function to show system and environment information #300

  • Fixes
    • Lower botocore requirement #235

    • Fixed decision_function calculation for FraudCost objective #254

    • Fixed return value of Recall metrics #264

    • Components return self on fit #289

  • Changes
    • Renamed automl classes to AutoRegressionSearch and AutoClassificationSearch #287

    • Updating demo datasets to retain column names #223

    • Moving pipeline visualization to PipelinePlot class #228

    • Standardizing inputs as pd.DataFrame / pd.Series #130

    • Enforcing that pipelines must have an estimator as last component #277

    • Added ipywidgets as a dependency in requirements.txt #278

    • Added Random and Grid Search Tuners #240

  • Documentation Changes
    • Adding class properties to API reference #244

    • Fix and filter FutureWarnings from scikit-learn #249, #257

    • Adding Linear Regression to API reference and cleaning up some Sphinx warnings #227

  • Testing Changes
    • Added support for testing on Windows with CircleCI #226

    • Added support for doctests #233

Warning

Breaking Changes

  • The fit() method for AutoClassifier and AutoRegressor has been renamed to search().

  • AutoClassifier has been renamed to AutoClassificationSearch

  • AutoRegressor has been renamed to AutoRegressionSearch

  • AutoClassificationSearch.results and AutoRegressionSearch.results are now dictionaries with pipeline_results and search_order keys. pipeline_results gives access to a dictionary identical to the old .results dictionary, whereas search_order returns a list of pipeline_id values in the order they were searched.

  • Pipelines now require an estimator as the last component in component_list. Slicing pipelines now throws a NotImplementedError to avoid returning pipelines without an estimator.
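
The new shape of .results described above can be pictured with a plain dictionary. This is only a sketch: the pipeline ids and the inner "id" and "score" fields are made up for illustration, and the real per-pipeline entries may hold different fields. Only the two top-level keys, pipeline_results and search_order, come from the notes above.

```python
# Illustrative shape only; the pipeline ids and inner fields are hypothetical.
results = {
    "pipeline_results": {
        0: {"id": 0, "score": 0.81},
        1: {"id": 1, "score": 0.86},
        2: {"id": 2, "score": 0.79},
    },
    "search_order": [0, 1, 2],
}

# Code written against the old .results dictionary can read pipeline_results.
old_style_results = results["pipeline_results"]

# Walk the pipelines in the order they were searched.
scores_in_search_order = [
    old_style_results[pipeline_id]["score"]
    for pipeline_id in results["search_order"]
]
print(scores_in_search_order)
```

Code that previously indexed .results directly by pipeline id would, under this shape, index results["pipeline_results"] instead.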

v0.5.2 Nov. 18, 2019
  • Enhancements
    • Adding basic pipeline structure visualization #211

  • Documentation Changes
    • Added notebooks to build process #212

v0.5.1 Nov. 15, 2019
  • Enhancements
    • Added basic outlier detection guardrail #151

    • Added basic ID column guardrail #135

    • Added support for unlimited pipelines with a max_time limit #70

    • Updated .readthedocs.yaml to successfully build #188

  • Fixes
    • Removed MSLE from default additional objectives #203

    • Fixed random_state passed in pipelines #204

    • Fixed slow down in RFRegressor #206

  • Changes
    • Pulled information for describe_pipeline from pipeline’s new describe method #190

    • Refactored pipelines #108

    • Removed guardrails from Auto(*) #202, #208

  • Documentation Changes
    • Updated documentation to show max_time enhancements #189

    • Updated release instructions for RTD #193

    • Added notebooks to build process #212

    • Added contributing instructions #213

    • Added new content #222

v0.5.0 Oct. 29, 2019
  • Enhancements
    • Added basic one hot encoding #73

    • Use enums for model_type #110

    • Support for splitting regression datasets #112

    • Auto-infer multiclass classification #99

    • Added support for other units in max_time #125

    • Detect highly null columns #121

    • Added additional regression objectives #100

    • Show an interactive iteration vs. score plot when using fit() #134

  • Fixes
    • Reordered describe_pipeline #94

    • Added type check for model_type #109

    • Fixed s units when setting string max_time #132

    • Fix objectives not appearing in API documentation #150

  • Changes
    • Reorganized tests #93

    • Moved logging to its own module #119

    • Show progress bar history #111

    • Using cloudpickle instead of pickle to allow unloading of custom objectives #113

    • Removed render.py #154

  • Documentation Changes
    • Update release instructions #140

    • Include additional_objectives parameter #124

    • Added Changelog #136

  • Testing Changes
    • Code coverage #90

    • Added CircleCI tests for other Python versions #104

    • Added doc notebooks as tests #139

    • Test metadata for CircleCI and 2 core parallelism #137

v0.4.1 Sep. 16, 2019
  • Enhancements
    • Added AutoML for classification and regression using Autobase and Skopt #7 #9

    • Implemented standard classification and regression metrics #7

    • Added logistic regression, random forest, and XGBoost pipelines #7

    • Implemented support for custom objectives #15

    • Feature importance for pipelines #18

    • Serialization for pipelines #19

    • Allow fitting on objectives for optimal threshold #27

    • Added detect label leakage #31

    • Implemented callbacks #42

    • Allow for multiclass classification #21

    • Added support for additional objectives #79

  • Fixes
    • Fixed feature selection in pipelines #13

    • Made random_seed usage consistent #45

  • Documentation Changes
    • Added docstrings #6

    • Created notebooks for docs #6

    • Initialized readthedocs EvalML #6

    • Added favicon #38

  • Testing Changes
    • Added testing for loading data #39

v0.2.0 Aug. 13, 2019
  • Enhancements
    • Created fraud detection objective #4

v0.1.0 July 31, 2019
  • First Release

  • Enhancements
    • Added lead scoring objective #1

    • Added basic classifier #1

  • Documentation Changes
    • Initialized Sphinx for docs #1