EvalML optimizes machine learning pipelines on custom, practical objectives rather than generic machine learning loss functions, so it finds the best pipelines for your specific needs. EvalML pipelines accept many kinds of data (missing values, categorical features, etc.) as long as the data are in a single table. You can also build your own pipelines from existing or custom components for more control over the AutoML process. Finally, EvalML provides data checks that alert you to potential issues your data may cause for machine learning algorithms.
EvalML includes imputation components in its pipelines to handle missing values, and its search optimizes over different types of imputation to find the best possible pipeline. You can find more information in the components documentation and in the API reference.
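To make "different types of imputation" concrete, here is a minimal standard-library sketch of three common strategies. The `impute` helper and the strategy names are illustrative assumptions, not EvalML's component API.

```python
from statistics import mean, median, mode


def impute(column, strategy="mean"):
    """Return a copy of `column` with None replaced by a statistic of the observed values.

    Hypothetical helper for illustration; it is not part of EvalML.
    """
    observed = [value for value in column if value is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    statistic = {"mean": mean, "median": median, "most_frequent": mode}[strategy]
    fill_value = statistic(observed)
    return [fill_value if value is None else value for value in column]


ages = [22, None, 35, 22, None, 58]
mean_filled = impute(ages, "mean")
median_filled = impute(ages, "median")
mode_filled = impute(ages, "most_frequent")
```

Each strategy fills the gaps with a different value, which is why trying several of them during the search can change which pipeline scores best.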
EvalML provides a one-hot-encoding component in its pipelines for categorical variables. EvalML plans to support other encoders in the future.
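As a rough illustration of what one-hot encoding does to a categorical column, consider the plain-Python sketch below. The `one_hot_encode` function is a hypothetical stand-in and does not reflect how EvalML's component is implemented or called.

```python
def one_hot_encode(values):
    """Map each category to its own 0/1 indicator column.

    Hypothetical helper for illustration; it is not part of EvalML.
    """
    categories = sorted(set(values))
    rows = [[1 if value == category else 0 for category in categories] for value in values]
    return categories, rows


colors = ["red", "green", "blue", "green"]
categories, encoded = one_hot_encode(colors)
# `categories` names the new columns; each row of `encoded` has exactly one 1.
```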
EvalML currently uses scikit-learn's SelectFromModel with a Random Forest classifier or regressor to handle feature selection. EvalML plans to support more feature selectors in the future. You can find more information in the API reference.
Feature importance depends on the estimator used. Coefficients are used for linear estimators (Logistic Regression and Linear Regression), and Gini importance is used for tree-based estimators (Random Forest and XGBoost).
EvalML tunes hyperparameters for its pipelines through Bayesian optimization. In the future we plan to support more optimization techniques such as random search.
Yes, you can! You can create your own custom objective so that EvalML optimizes for the model that best fits your needs.
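The sketch below shows why a custom objective can matter, using a made-up fraud-detection scenario. The `fraud_cost` function, the review fee, and the data are all assumptions chosen for illustration; this is plain Python, not EvalML's objective API.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    matches = sum(1 for actual, predicted in zip(y_true, y_pred) if actual == predicted)
    return matches / len(y_true)


def fraud_cost(y_true, y_pred, amounts, review_cost=5.0):
    """Dollars lost: each flagged transaction costs a review fee,
    and each missed fraud costs the full transaction amount.

    Hypothetical objective for illustration; lower is better.
    """
    total = 0.0
    for actual, predicted, amount in zip(y_true, y_pred, amounts):
        if predicted == 1:
            total += review_cost
        elif actual == 1:
            total += amount
    return total


y_true = [0, 1, 1, 0]
amounts = [20.0, 500.0, 30.0, 80.0]
cautious = [1, 1, 1, 0]  # flags one legitimate transaction
lenient = [0, 0, 1, 0]   # misses the 500-dollar fraud

cautious_cost = fraud_cost(y_true, cautious, amounts)
lenient_cost = fraud_cost(y_true, lenient, amounts)
```

Both sets of predictions have the same accuracy, yet their dollar costs differ widely. A generic metric cannot see that gap, while a domain-specific objective can.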
EvalML provides several safeguards against overfitting: data checks that detect label leakage and unstable pipelines, along with hold-out datasets and cross-validation. EvalML defaults to Stratified K-Fold cross-validation for classification problems and K-Fold cross-validation for regression problems, but you can supply your own cross-validation method as well.
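For readers unfamiliar with stratification, the following is a simplified standard-library sketch of the idea behind Stratified K-Fold: each fold keeps roughly the same class proportions as the full dataset. The `stratified_k_fold` function is a hypothetical illustration (it does not shuffle, for instance) and is not EvalML's or scikit-learn's implementation.

```python
from collections import defaultdict


def stratified_k_fold(labels, n_splits=3):
    """Yield (train_indices, test_indices) pairs with similar class balance per fold.

    Hypothetical helper for illustration; real implementations also support shuffling.
    """
    indices_by_class = defaultdict(list)
    for index, label in enumerate(labels):
        indices_by_class[label].append(index)

    folds = [[] for _ in range(n_splits)]
    for indices in indices_by_class.values():
        # Deal each class's rows round-robin so every fold gets a similar share.
        for position, index in enumerate(indices):
            folds[position % n_splits].append(index)

    for fold in folds:
        test = sorted(fold)
        held_out = set(test)
        train = [index for index in range(len(labels)) if index not in held_out]
        yield train, test


labels = [0, 0, 0, 0, 0, 0, 1, 1, 1]  # imbalanced: six negatives, three positives
splits = list(stratified_k_fold(labels, n_splits=3))
```

With plain K-Fold on sorted data like this, a test fold could contain only one class. Stratifying avoids that, which is why it is a sensible default for classification.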
Yes! EvalML allows you to create custom pipelines using modular components. This allows you to customize EvalML pipelines for your own needs or for AutoML.
EvalML is constantly improving and adding new components, and it will allow your own algorithms to be used as components in our pipelines.