Simply examining a model’s performance metrics is not enough to select a model and promote it for use in a production setting. While developing an ML algorithm, it is important to understand how the model behaves on the data, to examine the key factors influencing its predictions, and to consider where it may be deficient. What “success” means for an ML project depends first and foremost on the user’s domain expertise.
EvalML includes a variety of tools for understanding models.
First, let’s train a pipeline on some data.
[1]:
import evalml

class RFBinaryClassificationPipeline(evalml.pipelines.BinaryClassificationPipeline):
    component_graph = ['Simple Imputer', 'Random Forest Classifier']

X, y = evalml.demos.load_breast_cancer()

pipeline = RFBinaryClassificationPipeline({})
pipeline.fit(X, y)
print(pipeline.score(X, y, objectives=['log_loss_binary']))
OrderedDict([('Log Loss Binary', 0.038403828027876195)])
We can get the importance associated with each feature of the resulting pipeline.
[2]:
pipeline.feature_importance
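The property returns the features ranked by importance. Assuming it comes back as a pandas DataFrame with feature and importance columns (as in recent EvalML releases), the top features can be pulled out directly, as in this small sketch:

# Hedged sketch: assumes feature_importance returns a pandas DataFrame
# with "feature" and "importance" columns, as in recent EvalML releases.
importance = pipeline.feature_importance
print(importance.sort_values('importance', ascending=False).head())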
We can also create a bar plot of the feature importances.
[3]:
pipeline.graph_feature_importance()
We can also compute and plot the permutation importance of the pipeline.
[4]:
evalml.pipelines.calculate_permutation_importance(pipeline, X, y, 'log_loss_binary')
[5]:
evalml.pipelines.graph_permutation_importance(pipeline, X, y, 'log_loss_binary')
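Conceptually, permutation importance shuffles one column at a time and measures how much the pipeline’s score degrades; the larger the degradation, the more the pipeline relies on that feature. The following is a minimal hand-rolled illustration of that idea (not EvalML’s implementation), using two columns from this dataset:

import numpy as np

# Minimal illustration of the idea behind permutation importance:
# shuffle one column and see how much the log loss worsens.
rng = np.random.RandomState(0)
baseline = pipeline.score(X, y, objectives=['log_loss_binary'])['Log Loss Binary']
for col in ['worst concave points', 'worst radius']:
    X_shuffled = X.copy()
    X_shuffled[col] = rng.permutation(X_shuffled[col].values)
    shuffled = pipeline.score(X_shuffled, y, objectives=['log_loss_binary'])['Log Loss Binary']
    print(col, shuffled - baseline)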
For binary or multiclass classification, we can view a confusion matrix of the classifier’s predictions.
[6]:
y_pred = pipeline.predict(X)
evalml.pipelines.graph_utils.graph_confusion_matrix(y, y_pred)
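If the raw counts behind the plot are needed, they can also be computed directly with scikit-learn; this is shown purely for illustration and is not part of the EvalML call above:

# Raw confusion-matrix counts via scikit-learn, for comparison with the plot.
from sklearn.metrics import confusion_matrix
print(confusion_matrix(y, y_pred, labels=['malignant', 'benign']))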
For binary classification, we can view the precision-recall curve of the pipeline.
[7]:
# get the predicted probabilities associated with the "true" label y = y.map({'malignant': 0, 'benign': 1}) y_pred_proba = pipeline.predict_proba(X)["benign"] evalml.pipelines.graph_utils.graph_precision_recall_curve(y, y_pred_proba)
For binary and multiclass classification, we can view the Receiver Operating Characteristic (ROC) curve of the pipeline.
[8]:
# get the predicted probabilities associated with the "benign" label y_pred_proba = pipeline.predict_proba(X)["benign"] evalml.pipelines.graph_utils.graph_roc_curve(y, y_pred_proba)
We can explain why the model made an individual prediction with the explain_prediction function. This uses the Shapley Additive Explanations (SHAP) algorithm to identify the top features that explain the predicted value.
This function can explain both classification and regression models - all you need to do is provide the pipeline, the input features (which must correspond to one row of the input data), and the training data. The function returns a table that you can print, summarizing the top 3 most positive and negative contributing features to the predicted value.
In the example below, we explain the prediction for the data point at index 3 in the dataset. We see that the worst concave points feature increased the estimated probability that the tumor is malignant by 20% while the worst radius feature decreased the probability the tumor is malignant by 5%.
[9]:
from evalml.pipelines.prediction_explanations import explain_prediction

table = explain_prediction(pipeline=pipeline, input_features=X.iloc[3:4],
                           training_data=X, include_shap_values=True)
print(table)
Positive Label

    Feature Name           Contribution to Prediction   SHAP Value
    ==============================================================
    worst concave points               ++                  0.200
    mean concave points                 +                  0.110
    mean concavity                      +                  0.080
    worst area                          -                 -0.030
    worst perimeter                     -                 -0.050
    worst radius                        -                 -0.050
The interpretation of the table is the same for regression problems - but the SHAP value now corresponds to the change in the estimated value of the dependent variable rather than a change in probability. For multiclass classification problems, a table will be output for each possible class.
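To explain several predictions, one option is simply to loop over single-row slices with the same explain_prediction call shown above (a small sketch, using only the arguments already demonstrated):

# Print explanations for the first three rows, one at a time.
for i in range(3):
    print(f"Row {i}")
    print(explain_prediction(pipeline=pipeline, input_features=X.iloc[i:i+1], training_data=X))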
This functionality is currently not supported for XGBoost models or CatBoost multiclass classifiers.