Model Understanding#
Simply examining a model’s performance metrics is not enough to select a model and promote it for use in a production setting. While developing an ML algorithm, it is important to understand how the model behaves on the data, to examine the key factors influencing its predictions and to consider where it may be deficient. Determination of what “success” may mean for an ML project depends first and foremost on the user’s domain expertise.
EvalML includes a variety of tools for understanding models, from graphing utilities to methods for explaining predictions.
** Graphing methods in Jupyter Notebook and JupyterLab require ipywidgets to be installed.
** Graphing in JupyterLab also requires jupyterlab-plotly. To install it, make sure you have npm installed.
Explaining Feature Influence#
The EvalML package offers a variety of methods for understanding which features in a dataset have an impact on the output of the model. We can investigate this either through feature importance or through permutation importance, and leverage either in generating more readable explanations.
First, let’s train a pipeline on some data.
[1]:
import evalml
from evalml.pipelines import BinaryClassificationPipeline
X, y = evalml.demos.load_breast_cancer()
X_train, X_holdout, y_train, y_holdout = evalml.preprocessing.split_data(
    X, y, problem_type="binary", test_size=0.2, random_seed=0
)
pipeline_binary = BinaryClassificationPipeline(
    component_graph={
        "Label Encoder": ["Label Encoder", "X", "y"],
        "Imputer": ["Imputer", "X", "Label Encoder.y"],
        "Random Forest Classifier": [
            "Random Forest Classifier",
            "Imputer.x",
            "Label Encoder.y",
        ],
    }
)
pipeline_binary.fit(X_train, y_train)
print(pipeline_binary.score(X_holdout, y_holdout, objectives=["log loss binary"]))
         Number of Features
Numeric                  30
Number of training examples: 569
Targets
benign       62.74%
malignant    37.26%
Name: count, dtype: object
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-evalml/envs/latest/lib/python3.9/site-packages/woodwork/type_sys/utils.py:33: UserWarning: Could not infer format, so each element will be parsed individually, falling back to `dateutil`. To ensure parsing is consistent and as-expected, please specify a format.
  pd.to_datetime(
OrderedDict([('Log Loss Binary', 0.1686746297113362)])
Feature Importance#
We can get the importance associated with each feature of the resulting pipeline.
[2]:
pipeline_binary.feature_importance
[2]:
| | feature | importance |
|---|---|---|
| 0 | mean concave points | 0.138857 | 
| 1 | worst perimeter | 0.137780 | 
| 2 | worst concave points | 0.117782 | 
| 3 | worst radius | 0.100584 | 
| 4 | mean concavity | 0.086402 | 
| 5 | worst area | 0.072027 | 
| 6 | mean perimeter | 0.046500 | 
| 7 | worst concavity | 0.043408 | 
| 8 | mean radius | 0.037664 | 
| 9 | mean area | 0.033683 | 
| 10 | radius error | 0.025036 | 
| 11 | area error | 0.019324 | 
| 12 | worst texture | 0.014754 | 
| 13 | worst compactness | 0.014462 | 
| 14 | mean texture | 0.013856 | 
| 15 | worst smoothness | 0.013710 | 
| 16 | worst symmetry | 0.011395 | 
| 17 | perimeter error | 0.010284 | 
| 18 | mean compactness | 0.008162 | 
| 19 | mean smoothness | 0.008154 | 
| 20 | worst fractal dimension | 0.007034 | 
| 21 | fractal dimension error | 0.005502 | 
| 22 | compactness error | 0.004953 | 
| 23 | smoothness error | 0.004728 | 
| 24 | texture error | 0.004384 | 
| 25 | symmetry error | 0.004250 | 
| 26 | mean fractal dimension | 0.004164 | 
| 27 | concavity error | 0.004089 | 
| 28 | mean symmetry | 0.003997 | 
| 29 | concave points error | 0.003076 | 
We can also create a bar plot of the feature importances.
[3]:
pipeline_binary.graph_feature_importance()
If we have a linear model, we can also view feature importance by simply inspecting the coefficients of the model.
[4]:
from evalml.model_understanding import get_linear_coefficients
pipeline_linear = BinaryClassificationPipeline(
    component_graph={
        "Label Encoder": ["Label Encoder", "X", "y"],
        "Imputer": ["Imputer", "X", "Label Encoder.y"],
        "Logistic Regression Classifier": [
            "Logistic Regression Classifier",
            "Imputer.x",
            "Label Encoder.y",
        ],
    }
)
pipeline_linear.fit(X_train, y_train)
get_linear_coefficients(pipeline_linear.estimator, features=X.columns)
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-evalml/envs/latest/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:469: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
[4]:
Intercept                 -0.339181
worst radius              -1.777283
mean radius               -1.674112
texture error             -0.740383
perimeter error           -0.288266
mean texture              -0.081338
radius error              -0.076170
mean perimeter            -0.069128
mean area                  0.002720
fractal dimension error    0.005759
smoothness error           0.006098
symmetry error             0.019005
mean fractal dimension     0.020053
worst area                 0.021615
concave points error       0.022536
compactness error          0.058227
mean smoothness            0.073213
concavity error            0.084693
mean symmetry              0.086924
worst fractal dimension    0.098952
area error                 0.115528
worst smoothness           0.126151
mean concave points        0.183110
worst texture              0.258570
worst symmetry             0.274830
worst perimeter            0.296383
mean compactness           0.308766
worst concave points       0.348138
mean concavity             0.423376
worst compactness          0.945473
worst concavity            1.189651
dtype: float64
Permutation Importance#
We can also compute and plot the permutation importance of the pipeline.
[5]:
from evalml.model_understanding import calculate_permutation_importance
calculate_permutation_importance(
    pipeline_binary, X_holdout, y_holdout, "log loss binary"
)
[5]:
| | feature | importance |
|---|---|---|
| 0 | worst perimeter | 0.063657 | 
| 1 | worst area | 0.045759 | 
| 2 | worst radius | 0.041926 | 
| 3 | mean concave points | 0.029325 | 
| 4 | worst concave points | 0.021045 | 
| 5 | worst concavity | 0.010105 | 
| 6 | worst texture | 0.010044 | 
| 7 | mean texture | 0.006178 | 
| 8 | mean symmetry | 0.005857 | 
| 9 | mean area | 0.004745 | 
| 10 | worst smoothness | 0.003190 | 
| 11 | area error | 0.003113 | 
| 12 | mean perimeter | 0.002478 | 
| 13 | mean fractal dimension | 0.001981 | 
| 14 | compactness error | 0.001968 | 
| 15 | concavity error | 0.001947 | 
| 16 | texture error | 0.000291 | 
| 17 | smoothness error | -0.000206 | 
| 18 | mean smoothness | -0.000745 | 
| 19 | fractal dimension error | -0.000835 | 
| 20 | worst compactness | -0.002392 | 
| 21 | mean concavity | -0.003188 | 
| 22 | mean compactness | -0.005377 | 
| 23 | radius error | -0.006229 | 
| 24 | mean radius | -0.006870 | 
| 25 | worst fractal dimension | -0.007415 | 
| 26 | symmetry error | -0.008175 | 
| 27 | perimeter error | -0.008980 | 
| 28 | concave points error | -0.010415 | 
| 29 | worst symmetry | -0.018645 | 
[6]:
from evalml.model_understanding import graph_permutation_importance
graph_permutation_importance(pipeline_binary, X_holdout, y_holdout, "log loss binary")
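To make the idea behind permutation importance concrete, here is a minimal pure-Python sketch. It is not EvalML's implementation: the helper names (`permutation_importance_sketch`, `toy_predict`), the toy data, and the use of mean squared error as the loss are all assumptions made for illustration. Each feature column is shuffled in turn, and the importance is the average increase in loss relative to the unshuffled baseline.

```python
import random


def mean_squared_error(y_true, y_pred):
    """Average squared difference between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance_sketch(predict, X, y, n_repeats=5, seed=0):
    """Return, per column, the mean increase in loss after shuffling it."""
    rng = random.Random(seed)
    baseline = mean_squared_error(y, [predict(row) for row in X])
    importances = []
    for col in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            # Shuffle only this column; all other columns stay intact.
            shuffled = [row[col] for row in X]
            rng.shuffle(shuffled)
            X_perm = [
                row[:col] + [value] + row[col + 1:]
                for row, value in zip(X, shuffled)
            ]
            loss = mean_squared_error(y, [predict(row) for row in X_perm])
            increases.append(loss - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


def toy_predict(row):
    """Hypothetical model that only looks at the first feature."""
    return 2.0 * row[0]


X_toy = [[float(i), float((i * 7) % 5)] for i in range(20)]
y_toy = [2.0 * row[0] for row in X_toy]
toy_importances = permutation_importance_sketch(toy_predict, X_toy, y_toy)
print(toy_importances)
```

In this toy setup the second feature is ignored by the model, so shuffling it leaves the loss unchanged and its importance is zero. On a real model, small negative values such as those in the table above can appear when shuffling a feature happens to improve the score slightly.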
Human Readable Importance#
We can generate a more human-comprehensible understanding of either the feature or permutation importance by using readable_explanation(pipeline). This picks out a subset of features that have the highest impact on the output of the model, sorting them into either “heavily” or “somewhat” influential on the model. These features are selected either by feature importance or permutation importance with a given objective. If there are any features that actively decrease the performance of the pipeline, this function highlights those and recommends removal.
Note that permutation importance runs on the original input features, while feature importance runs on the features as they were passed in to the final estimator, having gone through a number of preprocessing steps. The two methods will highlight different features as being important, and feature names may vary as well.
[7]:
from evalml.model_understanding import readable_explanation
readable_explanation(
    pipeline_binary,
    X_holdout,
    y_holdout,
    objective="log loss binary",
    importance_method="permutation",
)
Random Forest Classifier: The output as measured by log loss binary is heavily influenced by worst perimeter, and is somewhat influenced by worst area, worst radius, mean concave points, and worst concave points.
The features smoothness error, mean smoothness, fractal dimension error, worst compactness, mean concavity, mean compactness, radius error, mean radius, worst fractal dimension, symmetry error, perimeter error, concave points error, and worst symmetry detracted from model performance. We suggest removing these features.
[8]:
readable_explanation(
    pipeline_binary, importance_method="feature"
)  # feature importance doesn't require X and y
Random Forest Classifier: The output is somewhat influenced by mean concave points, worst perimeter, worst concave points, worst radius, and mean concavity.
We can adjust the number of most important features visible with the max_features argument, or modify the minimum threshold for “importance” with min_importance_threshold. However, these values will not affect any detrimental features displayed, as this function always displays all of them.
Metrics for Model Understanding#
Confusion Matrix#
For binary or multiclass classification, we can view a confusion matrix of the classifier’s predictions. In the DataFrame output of confusion_matrix(), the column headers represent the predicted labels and the row headers represent the actual labels.
[9]:
from evalml.model_understanding.metrics import confusion_matrix
y_pred = pipeline_binary.predict(X_holdout)
confusion_matrix(y_holdout, y_pred)
[9]:
| | benign | malignant |
|---|---|---|
| benign | 0.930556 | 0.069444 | 
| malignant | 0.023810 | 0.976190 | 
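Each row of the matrix above sums to 1, which is what normalizing over the actual labels produces. The following pure-Python sketch shows that calculation on made-up labels; the function name `row_normalized_confusion_matrix` and the toy data are illustrative assumptions, not part of EvalML.

```python
from collections import Counter


def row_normalized_confusion_matrix(y_true, y_pred, labels):
    """Rows are actual labels, columns are predicted labels, rows sum to 1."""
    counts = Counter(zip(y_true, y_pred))
    matrix = {}
    for actual in labels:
        row_total = sum(counts[(actual, predicted)] for predicted in labels)
        matrix[actual] = {
            predicted: (counts[(actual, predicted)] / row_total if row_total else 0.0)
            for predicted in labels
        }
    return matrix


# Made-up labels for illustration only.
y_true_toy = ["benign"] * 4 + ["malignant"] * 2
y_pred_toy = ["benign", "benign", "benign", "malignant", "malignant", "benign"]
cm_toy = row_normalized_confusion_matrix(
    y_true_toy, y_pred_toy, labels=["benign", "malignant"]
)
for actual, row in cm_toy.items():
    print(actual, row)
```

Read this way, the diagonal entries are the per-class recall, and the off-diagonal entries show which class the misclassified rows were assigned to.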
[10]:
from evalml.model_understanding.metrics import graph_confusion_matrix
y_pred = pipeline_binary.predict(X_holdout)
graph_confusion_matrix(y_holdout, y_pred)
Precision-Recall Curve#
For binary classification, we can view the precision-recall curve of the pipeline.
[11]:
from evalml.model_understanding.metrics import graph_precision_recall_curve
# get the predicted probabilities associated with the positive ("malignant") label
import woodwork as ww
y_encoded = y_holdout.ww.map({"benign": 0, "malignant": 1})
y_pred_proba = pipeline_binary.predict_proba(X_holdout)["malignant"]
graph_precision_recall_curve(y_encoded, y_pred_proba)
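As a rough illustration of what a precision-recall curve plots, the sketch below computes precision and recall at a few thresholds for a tiny made-up set of scores. The helper `precision_recall_at`, the `>=` comparison, and the convention of reporting precision as 1.0 when nothing is predicted positive are assumptions for this example, not EvalML's code.

```python
def precision_recall_at(y_true, scores, threshold):
    """Precision and recall when scores >= threshold are labeled positive."""
    predicted = [score >= threshold for score in scores]
    tp = sum(1 for t, p in zip(y_true, predicted) if t == 1 and p)
    fp = sum(1 for t, p in zip(y_true, predicted) if t == 0 and p)
    fn = sum(1 for t, p in zip(y_true, predicted) if t == 1 and not p)
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Made-up binary labels and predicted probabilities.
pr_labels = [0, 0, 1, 1]
pr_scores = [0.1, 0.4, 0.35, 0.8]
pr_curve = [
    (threshold, *precision_recall_at(pr_labels, pr_scores, threshold))
    for threshold in (0.1, 0.35, 0.4, 0.8)
]
for threshold, precision, recall in pr_curve:
    print(f"threshold={threshold:.2f} precision={precision:.3f} recall={recall:.3f}")
```

Raising the threshold generally trades recall for precision, which is the trade-off the plotted curve makes visible.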
ROC Curve#
For binary and multiclass classification, we can view the Receiver Operating Characteristic (ROC) curve of the pipeline.
[12]:
from evalml.model_understanding.metrics import graph_roc_curve
# get the predicted probabilities associated with the "malignant" label
y_pred_proba = pipeline_binary.predict_proba(X_holdout)["malignant"]
graph_roc_curve(y_encoded, y_pred_proba)
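The points on a ROC curve can be sketched in a few lines of plain Python. The example below is only an illustration under stated assumptions (the helper names `roc_points` and `area_under_curve`, the toy scores, and the `>=` threshold comparison are all hypothetical); it sweeps the distinct scores as thresholds and records the false positive rate and true positive rate at each one.

```python
def roc_points(y_true, scores):
    """Return (false positive rate, true positive rate) pairs, starting at (0, 0)."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    points = [(0.0, 0.0)]
    # Lowering the threshold step by step admits more predicted positives.
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for t, s in zip(y_true, scores) if t == 1 and s >= threshold)
        fp = sum(1 for t, s in zip(y_true, scores) if t == 0 and s >= threshold)
        points.append((fp / negatives, tp / positives))
    return points


def area_under_curve(points):
    """Trapezoidal area under a curve given as ordered (x, y) points."""
    return sum(
        (x2 - x1) * (y1 + y2) / 2
        for (x1, y1), (x2, y2) in zip(points, points[1:])
    )


# Made-up binary labels and predicted probabilities.
roc_labels = [0, 0, 1, 1]
roc_scores = [0.1, 0.4, 0.35, 0.8]
roc_curve_toy = roc_points(roc_labels, roc_scores)
roc_auc_toy = area_under_curve(roc_curve_toy)
print(roc_curve_toy, roc_auc_toy)
```

A curve that hugs the top-left corner (area close to 1) indicates that positives tend to receive higher scores than negatives.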
The ROC curve can also be generated for multiclass classification problems. For multiclass problems, the graph will show a one-vs-many ROC curve for each class.
[13]:
from evalml.pipelines import MulticlassClassificationPipeline
X_multi, y_multi = evalml.demos.load_wine()
pipeline_multi = MulticlassClassificationPipeline(
    ["Simple Imputer", "Random Forest Classifier"]
)
pipeline_multi.fit(X_multi, y_multi)
y_pred_proba = pipeline_multi.predict_proba(X_multi)
graph_roc_curve(y_multi, y_pred_proba)
         Number of Features
Numeric                  13
Number of training examples: 178
Targets
class_1    39.89%
class_0    33.15%
class_2    26.97%
Name: count, dtype: object
Visualizations#
Binary Objective Score vs. Threshold Graph#
Some binary classification objectives (objectives that have score_needs_proba set to False) are sensitive to a decision threshold. For those objectives, we can obtain and graph the scores for thresholds from zero to one, calculated at evenly-spaced intervals determined by steps.
[14]:
from evalml.model_understanding.visualizations import binary_objective_vs_threshold
binary_objective_vs_threshold(pipeline_binary, X_holdout, y_holdout, "f1", steps=10)
[14]:
| | threshold | score |
|---|---|---|
| 0 | 0.0 | 0.538462 | 
| 1 | 0.1 | 0.811881 | 
| 2 | 0.2 | 0.891304 | 
| 3 | 0.3 | 0.901099 | 
| 4 | 0.4 | 0.931818 | 
| 5 | 0.5 | 0.931818 | 
| 6 | 0.6 | 0.941176 | 
| 7 | 0.7 | 0.951220 | 
| 8 | 0.8 | 0.936709 | 
| 9 | 0.9 | 0.923077 | 
| 10 | 1.0 | 0.000000 | 
[15]:
from evalml.model_understanding.visualizations import (
    graph_binary_objective_vs_threshold,
)
graph_binary_objective_vs_threshold(
    pipeline_binary, X_holdout, y_holdout, "f1", steps=100
)
Predicted Vs Actual Values Graph for Regression Problems#
We can also create a scatterplot comparing predicted vs. actual values for regression problems. We can specify an outlier_threshold to color a point differently when the absolute difference between its actual and predicted values exceeds that threshold.
[16]:
from evalml.model_understanding.visualizations import graph_prediction_vs_actual
from evalml.pipelines import RegressionPipeline
X_regress, y_regress = evalml.demos.load_diabetes()
X_train_reg, X_test_reg, y_train_reg, y_test_reg = evalml.preprocessing.split_data(
    X_regress, y_regress, problem_type="regression"
)
pipeline_regress = RegressionPipeline(["One Hot Encoder", "Linear Regressor"])
pipeline_regress.fit(X_train_reg, y_train_reg)
y_pred = pipeline_regress.predict(X_test_reg)
graph_prediction_vs_actual(y_test_reg, y_pred, outlier_threshold=50)
         Number of Features
Numeric                  10
Number of training examples: 442
Targets
72     1.36%
200    1.36%
178    1.13%
71     1.13%
90     1.13%
       ...
136    0.23%
295    0.23%
79     0.23%
25     0.23%
195    0.23%
Name: count, Length: 214, dtype: object
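The outlier rule described for this plot amounts to a simple absolute-error check. The sketch below uses a hypothetical helper, `flag_outliers`, and made-up values; whether the library's comparison is strict (`>`) or inclusive (`>=`) is an assumption here.

```python
def flag_outliers(y_actual, y_predicted, outlier_threshold):
    """True where the absolute prediction error exceeds the threshold."""
    return [
        abs(actual - predicted) > outlier_threshold
        for actual, predicted in zip(y_actual, y_predicted)
    ]


# Made-up regression targets and predictions.
outlier_flags = flag_outliers([150, 200, 90], [140, 120, 95], outlier_threshold=50)
print(outlier_flags)
```

Only the second point, with an absolute error of 80, would be colored as an outlier under a threshold of 50.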
Tree Visualization#
Now let’s train a decision tree on some data. We can visualize the structure of the Decision Tree that was fit to that data, and save it if necessary.
[17]:
pipeline_dt = BinaryClassificationPipeline(
    ["Simple Imputer", "Decision Tree Classifier"]
)
pipeline_dt.fit(X_train, y_train)
[17]:
pipeline = BinaryClassificationPipeline(component_graph={'Simple Imputer': ['Simple Imputer', 'X', 'y'], 'Decision Tree Classifier': ['Decision Tree Classifier', 'Simple Imputer.x', 'y']}, parameters={'Simple Imputer':{'impute_strategy': 'most_frequent', 'fill_value': None}, 'Decision Tree Classifier':{'criterion': 'gini', 'max_features': 'sqrt', 'max_depth': 6, 'min_samples_split': 2, 'min_weight_fraction_leaf': 0.0}}, random_seed=0)
[18]:
from evalml.model_understanding.visualizations import visualize_decision_tree
visualize_decision_tree(
    pipeline_dt.estimator, max_depth=2, rotate=False, filled=True, filepath=None
)
[18]:
Confusion Matrix and Thresholds for Binary Classification Pipelines#
For binary classification pipelines, EvalML also provides the ability to compare the actual positive and actual negative histograms, as well as obtaining the confusion matrices and ideal thresholds per objective.
[19]:
from evalml.model_understanding import find_confusion_matrix_per_thresholds
df, objective_thresholds = find_confusion_matrix_per_thresholds(
    pipeline_binary, X, y, n_bins=10
)
df.head(10)
[19]:
| | true_pos_count | true_neg_count | true_positives | true_negatives | false_positives | false_negatives | data_in_bins |
|---|---|---|---|---|---|---|---|
| 0.1 | 1 | 309 | 211 | 309 | 48 | 1 | [19, 20, 21, 37, 46] | 
| 0.2 | 0 | 35 | 211 | 344 | 13 | 1 | [68, 92, 123, 133, 147] | 
| 0.3 | 0 | 5 | 211 | 349 | 8 | 1 | [112, 157, 484, 491, 505] | 
| 0.4 | 0 | 3 | 211 | 352 | 5 | 1 | [208, 340, 465] | 
| 0.5 | 0 | 0 | 211 | 352 | 5 | 1 | [] | 
| 0.6 | 3 | 2 | 208 | 354 | 3 | 4 | [40, 89, 128, 263, 297] | 
| 0.7 | 2 | 2 | 206 | 356 | 1 | 6 | [13, 81, 385, 421] | 
| 0.8 | 9 | 1 | 197 | 357 | 0 | 15 | [38, 41, 54, 73, 86] | 
| 0.9 | 15 | 0 | 182 | 357 | 0 | 30 | [39, 44, 91, 99, 100] | 
| 1.0 | 182 | 0 | 0 | 357 | 0 | 212 | [0, 1, 2, 3, 4] | 
[20]:
objective_thresholds
[20]:
{'accuracy': {'objective score': 0.9894551845342706, 'threshold value': 0.4},
 'balanced_accuracy': {'objective score': 0.9906387083135141,
  'threshold value': 0.4},
 'precision': {'objective score': 1.0, 'threshold value': 0.8},
 'f1': {'objective score': 0.9859813084112149, 'threshold value': 0.4}}
In the above results, the first dataframe contains the histograms for the actual positive and negative classes, indicated by true_pos_count and true_neg_count. The columns true_positives, true_negatives, false_positives, and false_negatives contain the confusion matrix information for the associated threshold, and data_in_bins holds a random subset of row indices (both positive and negative) that belong in each bin. The index of the dataframe represents the associated threshold. For instance, at index 0.1, 1 positive row and 309 negative rows fall in the interval [0.0, 0.1].
The returned objective_thresholds dictionary has the objective measure as the key, and the dictionary value associated contains both the best objective score and the threshold that results in the associated score.
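The histogram columns can be illustrated with a short stand-alone sketch that bins predicted probabilities by actual class. The function `histogram_by_class`, the toy data, and the treatment of values that fall exactly on a bin edge are assumptions for illustration only; each bin is keyed by its upper edge, mirroring the index of the dataframe above.

```python
def histogram_by_class(y_true, proba, n_bins=10):
    """Count actual positives and negatives per probability bin."""
    bins = {
        (i + 1) / n_bins: {"true_pos_count": 0, "true_neg_count": 0}
        for i in range(n_bins)
    }
    for label, p in zip(y_true, proba):
        # Clamp so that a probability of exactly 1.0 lands in the last bin.
        index = min(int(p * n_bins), n_bins - 1)
        column = "true_pos_count" if label == 1 else "true_neg_count"
        bins[(index + 1) / n_bins][column] += 1
    return bins


# Made-up labels and predicted probabilities of the positive class.
hist_labels = [0, 0, 0, 1, 1]
hist_proba = [0.02, 0.07, 0.55, 0.93, 0.98]
hist_bins = histogram_by_class(hist_labels, hist_proba, n_bins=10)
for upper_edge, row in hist_bins.items():
    print(upper_edge, row)
```

A well-separated classifier puts most negatives in the low bins and most positives in the high bins, which is the pattern visible in the dataframe above.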
Visualize high dimensional data in lower space#
We can use t-SNE to visualize data with many features on a 2D plot, making it easier to see relationships in your data.
[21]:
# Our data is high-dimensional, so we can't plot it directly in a way we can interpret
print(len(X.columns))
30
[22]:
from evalml.model_understanding import graph_t_sne
fig = graph_t_sne(X)
fig
Partial Dependence Plots#
We can calculate the one-way partial dependence of the model's predictions on a single feature, and then plot it.
[23]:
from evalml.model_understanding import partial_dependence
partial_dependence(
    pipeline_binary, X_holdout, features="mean radius", grid_resolution=5
)
[23]:
| | feature_values | partial_dependence | class_label |
|---|---|---|---|
| 0 | 9.69092 | 0.392453 | malignant | 
| 1 | 12.40459 | 0.395962 | malignant | 
| 2 | 15.11826 | 0.417396 | malignant | 
| 3 | 17.83193 | 0.429542 | malignant | 
| 4 | 20.54560 | 0.429717 | malignant | 
[24]:
from evalml.model_understanding import graph_partial_dependence
graph_partial_dependence(
    pipeline_binary, X_holdout, features="mean radius", grid_resolution=5
)
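Conceptually, one-way partial dependence fixes the chosen feature at each grid value for every row, then averages the model's predictions. The sketch below shows that calculation with a hypothetical toy model and helper (`partial_dependence_1d`, `toy_model`); it is a simplified illustration, not EvalML's implementation.

```python
def partial_dependence_1d(predict, X, feature_index, grid):
    """Average prediction when one feature is forced to each grid value."""
    result = []
    for value in grid:
        predictions = []
        for row in X:
            modified = list(row)
            modified[feature_index] = value  # override only the chosen feature
            predictions.append(predict(modified))
        result.append((value, sum(predictions) / len(predictions)))
    return result


def toy_model(row):
    """Hypothetical model: linear in two features."""
    return 0.1 * row[0] + 0.5 * row[1]


X_pd = [[1.0, 0.0], [2.0, 1.0], [3.0, 1.0], [4.0, 0.0]]
pd_toy = partial_dependence_1d(toy_model, X_pd, feature_index=0, grid=[0.0, 10.0])
print(pd_toy)
```

Because the other features keep their observed values, the result reflects the average effect of the chosen feature across the dataset rather than its effect on any single row.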
We can also compute the partial dependence for a categorical feature. We will demonstrate this on the fraud dataset.
[25]:
X_fraud, y_fraud = evalml.demos.load_fraud(100, verbose=False)
X_fraud.ww.init(
    logical_types={
        "provider": "Categorical",
        "region": "Categorical",
        "currency": "Categorical",
        "expiration_date": "Categorical",
    }
)
fraud_pipeline = BinaryClassificationPipeline(
    ["DateTime Featurizer", "One Hot Encoder", "Random Forest Classifier"]
)
fraud_pipeline.fit(X_fraud, y_fraud)
graph_partial_dependence(fraud_pipeline, X_fraud, features="provider")
Two-way partial dependence plots are also possible and invoke the same API.
[26]:
partial_dependence(
    pipeline_binary,
    X_holdout,
    features=("worst perimeter", "worst radius"),
    grid_resolution=5,
)
[26]:
| | 10.6876 | 14.404924999999999 | 18.12225 | 21.839575 | 25.5569 | class_label |
|---|---|---|---|---|---|---|
| 69.140700 | 0.279038 | 0.282898 | 0.435179 | 0.435355 | 0.435355 | malignant | 
| 94.334275 | 0.304335 | 0.308194 | 0.458283 | 0.458458 | 0.458458 | malignant | 
| 119.527850 | 0.464455 | 0.468314 | 0.612137 | 0.616932 | 0.616932 | malignant | 
| 144.721425 | 0.483437 | 0.487297 | 0.631120 | 0.635915 | 0.635915 | malignant | 
| 169.915000 | 0.483437 | 0.487297 | 0.631120 | 0.635915 | 0.635915 | malignant | 
[27]:
graph_partial_dependence(
    pipeline_binary,
    X_holdout,
    features=("worst perimeter", "worst radius"),
    grid_resolution=5,
)
Explaining Predictions#
We can explain why the model made certain predictions with the explain_predictions function. This can use either the Shapley Additive Explanations (SHAP) algorithm or the Local Interpretable Model-agnostic Explanations (LIME) algorithm to identify the top features that explain the predicted value.
This function can explain both classification and regression models. All you need to do is provide the pipeline, the input features, and a list of rows corresponding to the indices of the input features you want to explain. The function will return a table that you can print summarizing the top 3 most positive and negative contributing features to the predicted value.
In the example below, we explain the prediction for the fourth data point in the holdout set (index 3). We see that the worst perimeter feature decreased the estimated probability that the tumor is malignant by 6 percentage points, while the worst radius feature decreased it by 5 percentage points.
[28]:
from evalml.model_understanding.prediction_explanations import explain_predictions
table = explain_predictions(
    pipeline=pipeline_binary,
    input_features=X_holdout,
    y=None,
    indices_to_explain=[3],
    top_k_features=6,
    include_explainer_values=True,
)
print(table)
Random Forest Classifier w/ Label Encoder + Imputer
{'Label Encoder': {'positive_label': None}, 'Imputer': {'categorical_impute_strategy': 'most_frequent', 'numeric_impute_strategy': 'mean', 'boolean_impute_strategy': 'most_frequent', 'categorical_fill_value': None, 'numeric_fill_value': None, 'boolean_fill_value': None}, 'Random Forest Classifier': {'n_estimators': 100, 'max_depth': 6, 'n_jobs': -1}}
        1 of 1
                   Feature Name       Feature Value   Contribution to Prediction   SHAP Value
                =============================================================================
                  worst concavity         0.18                    -                  -0.02
                  mean concavity          0.04                    -                  -0.03
                    worst area           599.50                   -                  -0.03
                   worst radius           14.04                   -                  -0.05
                mean concave points       0.03                    -                  -0.05
                  worst perimeter         92.80                   -                  -0.06
The interpretation of the table is the same for regression problems - but the SHAP value now corresponds to the change in the estimated value of the dependent variable rather than a change in probability. For multiclass classification problems, a table will be output for each possible class.
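To give some intuition for what a per-feature contribution means, here is a brute-force Shapley value calculation on a tiny hypothetical model. This is not the algorithm used by the SHAP library or by explain_predictions: the function `shapley_values`, the toy model, and the choice of a single all-zeros baseline row to stand in for "missing" features are assumptions for illustration. It demonstrates the additivity property: the contributions sum to the difference between the prediction and the baseline prediction.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values by enumerating all feature subsets."""
    n = len(x)

    def value(subset):
        # Features outside the subset are replaced by their baseline values.
        row = [x[i] if i in subset else baseline[i] for i in range(n)]
        return predict(row)

    contributions = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                with_i = value(set(subset) | {i})
                without_i = value(set(subset))
                total += weight * (with_i - without_i)
        contributions.append(total)
    return contributions


def toy_scorer(row):
    """Hypothetical model with one interaction term."""
    return 0.5 * row[0] - 0.25 * row[1] + 0.1 * row[0] * row[2]


x_explained = [2.0, 4.0, 1.0]
x_baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_scorer, x_explained, x_baseline)
print(phi, sum(phi), toy_scorer(x_explained) - toy_scorer(x_baseline))
```

In this toy case the second feature receives a negative contribution because it lowers the prediction, and the interaction term is split evenly between the first and third features.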
Below is an example of how you would explain three predictions with explain_predictions.
[29]:
from evalml.model_understanding.prediction_explanations import explain_predictions
report = explain_predictions(
    pipeline=pipeline_binary,
    input_features=X_holdout,
    y=y_holdout,
    indices_to_explain=[0, 4, 9],
    include_explainer_values=True,
    output_format="text",
)
print(report)
Random Forest Classifier w/ Label Encoder + Imputer
{'Label Encoder': {'positive_label': None}, 'Imputer': {'categorical_impute_strategy': 'most_frequent', 'numeric_impute_strategy': 'mean', 'boolean_impute_strategy': 'most_frequent', 'categorical_fill_value': None, 'numeric_fill_value': None, 'boolean_fill_value': None}, 'Random Forest Classifier': {'n_estimators': 100, 'max_depth': 6, 'n_jobs': -1}}
        1 of 3
                    Feature Name       Feature Value   Contribution to Prediction   SHAP Value
                ==============================================================================
                  worst perimeter         101.20                   -                  -0.04
                worst concave points       0.06                    -                  -0.05
                mean concave points        0.01                    -                  -0.05
        2 of 3
                   Feature Name       Feature Value   Contribution to Prediction   SHAP Value
                =============================================================================
                   worst radius           11.94                   -                  -0.05
                  worst perimeter         80.78                   -                  -0.06
                mean concave points       0.02                    -                  -0.06
        3 of 3
                    Feature Name       Feature Value   Contribution to Prediction   SHAP Value
                ==============================================================================
                worst concave points       0.10                    -                  -0.05
                  worst perimeter          99.21                   -                  -0.06
                mean concave points        0.03                    -                  -0.08
The above examples used the SHAP algorithm, since that is what explain_predictions uses by default. If you would like to use LIME instead, you can change that with the algorithm="lime" argument.
[30]:
from evalml.model_understanding.prediction_explanations import explain_predictions
table = explain_predictions(
    pipeline=pipeline_binary,
    input_features=X_holdout,
    y=None,
    indices_to_explain=[3],
    top_k_features=6,
    include_explainer_values=True,
    algorithm="lime",
)
print(table)
Random Forest Classifier w/ Label Encoder + Imputer
{'Label Encoder': {'positive_label': None}, 'Imputer': {'categorical_impute_strategy': 'most_frequent', 'numeric_impute_strategy': 'mean', 'boolean_impute_strategy': 'most_frequent', 'categorical_fill_value': None, 'numeric_fill_value': None, 'boolean_fill_value': None}, 'Random Forest Classifier': {'n_estimators': 100, 'max_depth': 6, 'n_jobs': -1}}
        1 of 1
                    Feature Name       Feature Value   Contribution to Prediction   LIME Value
                ==============================================================================
                    worst radius           14.04                   +                   0.06
                  worst perimeter          92.80                   +                   0.06
                     worst area           599.50                   +                   0.05
                mean concave points        0.03                    +                   0.04
                worst concave points       0.12                    +                   0.04
                  worst concavity          0.18                    +                   0.03
[31]:
from evalml.model_understanding.prediction_explanations import explain_predictions
report = explain_predictions(
    pipeline=pipeline_binary,
    input_features=X_holdout,
    y=None,
    indices_to_explain=[0, 4, 9],
    include_explainer_values=True,
    output_format="text",
    algorithm="lime",
)
print(report)
Random Forest Classifier w/ Label Encoder + Imputer
{'Label Encoder': {'positive_label': None}, 'Imputer': {'categorical_impute_strategy': 'most_frequent', 'numeric_impute_strategy': 'mean', 'boolean_impute_strategy': 'most_frequent', 'categorical_fill_value': None, 'numeric_fill_value': None, 'boolean_fill_value': None}, 'Random Forest Classifier': {'n_estimators': 100, 'max_depth': 6, 'n_jobs': -1}}
        1 of 3
                 Feature Name     Feature Value   Contribution to Prediction   LIME Value
                =========================================================================
                 worst radius         15.14                   +                   0.06
                worst perimeter      101.20                   +                   0.06
                  worst area         718.90                   +                   0.05
        2 of 3
                 Feature Name     Feature Value   Contribution to Prediction   LIME Value
                =========================================================================
                worst perimeter       80.78                   +                   0.06
                 worst radius         11.94                   +                   0.06
                  worst area         433.10                   +                   0.05
        3 of 3
                 Feature Name     Feature Value   Contribution to Prediction   LIME Value
                =========================================================================
                worst perimeter       99.21                   +                   0.06
                 worst radius         14.42                   +                   0.06
                  worst area         634.30                   +                   0.05
Explaining Best and Worst Predictions#
When debugging machine learning models, it is often useful to analyze the best and worst predictions the model made. The explain_predictions_best_worst function can help us with this.
In the example below, this function displays the output of explain_predictions for the best 2 and worst 2 predictions, as set by num_to_explain=2. By default, the best and worst predictions are determined by the absolute error for regression problems and cross entropy for classification problems.
We can specify our own ranking function by passing in a function to the metric parameter. This function will be called on y_true and y_pred and should return one score per row. By convention, lower scores are better.
At the top of each table, we can see the predicted probabilities, target value, error, and row index for that prediction. For a regression problem, we would see the predicted value instead of predicted probabilities.
[32]:
from evalml.model_understanding.prediction_explanations import (
    explain_predictions_best_worst,
)
shap_report = explain_predictions_best_worst(
    pipeline=pipeline_binary,
    input_features=X_holdout,
    y_true=y_holdout,
    include_explainer_values=True,
    top_k_features=6,
    num_to_explain=2,
)
print(shap_report)
Random Forest Classifier w/ Label Encoder + Imputer
{'Label Encoder': {'positive_label': None}, 'Imputer': {'categorical_impute_strategy': 'most_frequent', 'numeric_impute_strategy': 'mean', 'boolean_impute_strategy': 'most_frequent', 'categorical_fill_value': None, 'numeric_fill_value': None, 'boolean_fill_value': None}, 'Random Forest Classifier': {'n_estimators': 100, 'max_depth': 6, 'n_jobs': -1}}
        Best 1 of 2
                Predicted Probabilities: [benign: 1.0, malignant: 0.0]
                Predicted Value: benign
                Target Value: benign
                Cross Entropy: 0.0
                Index ID: 502
                    Feature Name       Feature Value   Contribution to Prediction   SHAP Value
                ==============================================================================
                   mean concavity          0.06                    -                  -0.03
                     worst area           552.00                   -                  -0.03
                worst concave points       0.08                    -                  -0.05
                    worst radius           13.57                   -                  -0.05
                mean concave points        0.03                    -                  -0.05
                  worst perimeter          86.67                   -                  -0.06
        Best 2 of 2
                Predicted Probabilities: [benign: 1.0, malignant: 0.0]
                Predicted Value: benign
                Target Value: benign
                Cross Entropy: 0.0
                Index ID: 313
                    Feature Name       Feature Value   Contribution to Prediction   SHAP Value
                ==============================================================================
                  worst concavity          0.08                    -                  -0.02
                     worst area           467.80                   -                  -0.03
                    worst radius           12.34                   -                  -0.04
                worst concave points       0.05                    -                  -0.04
                mean concave points        0.01                    -                  -0.05
                  worst perimeter          81.23                   -                  -0.05
        Worst 1 of 2
                Predicted Probabilities: [benign: 0.266, malignant: 0.734]
                Predicted Value: malignant
                Target Value: benign
                Cross Entropy: 1.325
                Index ID: 363
                 Feature Name     Feature Value   Contribution to Prediction   SHAP Value
                =========================================================================
                worst perimeter      117.20                   +                   0.13
                 worst radius         18.13                   +                   0.12
                  worst area         1009.00                  +                   0.11
                   mean area         838.10                   +                   0.06
                  mean radius         16.50                   +                   0.05
                worst concavity       0.17                    -                  -0.05
        Worst 2 of 2
                Predicted Probabilities: [benign: 1.0, malignant: 0.0]
                Predicted Value: benign
                Target Value: malignant
                Cross Entropy: 7.987
                Index ID: 135
                    Feature Name       Feature Value   Contribution to Prediction   SHAP Value
                ==============================================================================
                   mean concavity          0.05                    -                  -0.03
                     worst area           653.60                   -                  -0.04
                worst concave points       0.09                    -                  -0.05
                    worst radius           14.49                   -                  -0.05
                  worst perimeter          92.04                   -                  -0.06
                mean concave points        0.03                    -                  -0.06
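The Cross Entropy values in the report above can be checked by hand. The sketch below assumes the per-row score is the negative log of the probability the model assigned to the true class, which is consistent with the reported numbers; `row_cross_entropy` and its `eps` guard are hypothetical names for illustration, not part of EvalML.

```python
import math


def row_cross_entropy(prob_of_true_class, eps=1e-15):
    """Negative log of the probability assigned to the true class."""
    # eps is an illustrative guard against log(0), not an EvalML setting.
    p = min(max(prob_of_true_class, eps), 1.0)
    return -math.log(p)


# "Worst 1 of 2": the target is benign and the report shows P(benign) = 0.266.
worst_1 = row_cross_entropy(0.266)

# "Worst 2 of 2" reports a cross entropy of 7.987, so the probability given
# to the true class (malignant) must be about exp(-7.987), shown as 0.0.
implied_prob = math.exp(-7.987)

print(round(worst_1, 3), round(implied_prob, 5))
```

The first value comes out close to the reported 1.325; the small difference arises because the probabilities in the report are rounded to three decimals. The second value shows why a prediction displayed as 0.0 can still have a finite error.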
[33]:
lime_report = explain_predictions_best_worst(
    pipeline=pipeline_binary,
    input_features=X_holdout,
    y_true=y_holdout,
    include_explainer_values=True,
    top_k_features=6,
    num_to_explain=2,
    algorithm="lime",
)
print(lime_report)
Random Forest Classifier w/ Label Encoder + Imputer
{'Label Encoder': {'positive_label': None}, 'Imputer': {'categorical_impute_strategy': 'most_frequent', 'numeric_impute_strategy': 'mean', 'boolean_impute_strategy': 'most_frequent', 'categorical_fill_value': None, 'numeric_fill_value': None, 'boolean_fill_value': None}, 'Random Forest Classifier': {'n_estimators': 100, 'max_depth': 6, 'n_jobs': -1}}
        Best 1 of 2
                Predicted Probabilities: [benign: 1.0, malignant: 0.0]
                Predicted Value: benign
                Target Value: benign
                Cross Entropy: 0.0
                Index ID: 502
                    Feature Name       Feature Value   Contribution to Prediction   LIME Value
                ==============================================================================
                    worst radius           13.57                   +                   0.06
                  worst perimeter          86.67                   +                   0.06
                     worst area           552.00                   +                   0.05
                mean concave points        0.03                    +                   0.05
                worst concave points       0.08                    +                   0.05
                  worst concavity          0.19                    +                   0.03
        Best 2 of 2
                Predicted Probabilities: [benign: 1.0, malignant: 0.0]
                Predicted Value: benign
                Target Value: benign
                Cross Entropy: 0.0
                Index ID: 313
                    Feature Name       Feature Value   Contribution to Prediction   LIME Value
                ==============================================================================
                  worst perimeter          81.23                   +                   0.06
                    worst radius           12.34                   +                   0.06
                     worst area           467.80                   +                   0.05
                mean concave points        0.01                    +                   0.04
                worst concave points       0.05                    +                   0.04
                  worst concavity          0.08                    +                   0.03
        Worst 1 of 2
                Predicted Probabilities: [benign: 0.266, malignant: 0.734]
                Predicted Value: malignant
                Target Value: benign
                Cross Entropy: 1.325
                Index ID: 363
                    Feature Name       Feature Value   Contribution to Prediction   LIME Value
                ==============================================================================
                  worst concavity          0.17                    -                  -0.03
                worst concave points       0.09                    -                  -0.04
                mean concave points        0.05                    -                  -0.04
                     worst area           1009.00                  -                  -0.05
                    worst radius           18.13                   -                  -0.06
                  worst perimeter         117.20                   -                  -0.06
        Worst 2 of 2
                Predicted Probabilities: [benign: 1.0, malignant: 0.0]
                Predicted Value: benign
                Target Value: malignant
                Cross Entropy: 7.987
                Index ID: 135
                    Feature Name       Feature Value   Contribution to Prediction   LIME Value
                ==============================================================================
                  worst perimeter          92.04                   +                   0.06
                    worst radius           14.49                   +                   0.06
                     worst area           653.60                   +                   0.05
                mean concave points        0.03                    +                   0.04
                worst concave points       0.09                    +                   0.04
                  worst concavity          0.22                    +                   0.03
We can also use a custom metric, such as hinge loss, to select the best and worst predictions, as in this example:
[34]:
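import numpy as np
def hinge_loss(y_true, y_pred_proba):
    # Clip the positive-class probabilities so the log-odds below stay finite.
    probabilities = np.clip(y_pred_proba.iloc[:, 1], 0.001, 0.999)
    # Map labels from {0, 1} to {-1, 1} on a copy so the caller's y_true is not modified.
    y_true = y_true.copy()
    y_true[y_true == 0] = -1
    # Per-row hinge loss on the log-odds: max(0, 1 - y * log(p / (1 - p))).
    return np.clip(
        1 - y_true * np.log(probabilities / (1 - probabilities)), a_min=0, a_max=None
    )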
report = explain_predictions_best_worst(
    pipeline=pipeline_binary,
    input_features=X,
    y_true=y,
    include_explainer_values=True,
    num_to_explain=5,
    metric=hinge_loss,
)
print(report)
Random Forest Classifier w/ Label Encoder + Imputer
{'Label Encoder': {'positive_label': None}, 'Imputer': {'categorical_impute_strategy': 'most_frequent', 'numeric_impute_strategy': 'mean', 'boolean_impute_strategy': 'most_frequent', 'categorical_fill_value': None, 'numeric_fill_value': None, 'boolean_fill_value': None}, 'Random Forest Classifier': {'n_estimators': 100, 'max_depth': 6, 'n_jobs': -1}}
        Best 1 of 5
                Predicted Probabilities: [benign: 1.0, malignant: 0.0]
                Predicted Value: benign
                Target Value: benign
                hinge_loss: 0.0
                Index ID: 381
                   Feature Name       Feature Value   Contribution to Prediction   SHAP Value
                =============================================================================
                   worst radius           12.09                   -                  -0.04
                mean concave points       0.02                    -                  -0.05
                  worst perimeter         79.73                   -                  -0.06
        Best 2 of 5
                Predicted Probabilities: [benign: 0.0, malignant: 1.0]
                Predicted Value: malignant
                Target Value: malignant
                hinge_loss: 0.0
                Index ID: 373
                    Feature Name       Feature Value   Contribution to Prediction   SHAP Value
                ==============================================================================
                  worst perimeter         166.80                   +                   0.10
                worst concave points       0.21                    +                   0.08
                mean concave points        0.09                    +                   0.08
        Best 3 of 5
                Predicted Probabilities: [benign: 0.999, malignant: 0.001]
                Predicted Value: benign
                Target Value: benign
                hinge_loss: 0.0
                Index ID: 374
                    Feature Name       Feature Value   Contribution to Prediction   SHAP Value
                ==============================================================================
                worst concave points       0.07                    -                  -0.05
                  worst perimeter          99.16                   -                  -0.05
                mean concave points        0.02                    -                  -0.05
        Best 4 of 5
                Predicted Probabilities: [benign: 0.888, malignant: 0.112]
                Predicted Value: benign
                Target Value: benign
                hinge_loss: 0.0
                Index ID: 375
                 Feature Name     Feature Value   Contribution to Prediction   SHAP Value
                =========================================================================
                worst concavity       0.21                    -                  -0.04
                mean concavity        0.07                    -                  -0.05
                 worst texture        19.14                   -                  -0.07
        Best 5 of 5
                Predicted Probabilities: [benign: 0.915, malignant: 0.085]
                Predicted Value: benign
                Target Value: benign
                hinge_loss: 0.0
                Index ID: 376
                 Feature Name     Feature Value   Contribution to Prediction   SHAP Value
                =========================================================================
                  worst area         351.90                   -                  -0.07
                 worst radius         10.85                   -                  -0.07
                worst perimeter       76.51                   -                  -0.10
        Worst 1 of 5
                Predicted Probabilities: [benign: 0.409, malignant: 0.591]
                Predicted Value: malignant
                Target Value: benign
                hinge_loss: 1.369
                Index ID: 128
                    Feature Name       Feature Value   Contribution to Prediction   SHAP Value
                ==============================================================================
                mean concave points        0.09                    +                   0.10
                worst concave points       0.14                    +                   0.09
                   mean concavity          0.11                    +                   0.08
        Worst 2 of 5
                Predicted Probabilities: [benign: 0.39, malignant: 0.61]
                Predicted Value: malignant
                Target Value: benign
                hinge_loss: 1.446
                Index ID: 421
                   Feature Name       Feature Value   Contribution to Prediction   SHAP Value
                =============================================================================
                mean concave points       0.06                    +                   0.08
                  mean concavity          0.14                    +                   0.07
                  worst perimeter        114.10                   +                   0.07
        Worst 3 of 5
                Predicted Probabilities: [benign: 0.343, malignant: 0.657]
                Predicted Value: malignant
                Target Value: benign
                hinge_loss: 1.652
                Index ID: 81
                    Feature Name       Feature Value   Contribution to Prediction   SHAP Value
                ==============================================================================
                worst concave points       0.17                    ++                  0.15
                mean concave points        0.07                    +                   0.11
                 worst compactness         0.48                    +                   0.07
        Worst 4 of 5
                Predicted Probabilities: [benign: 0.266, malignant: 0.734]
                Predicted Value: malignant
                Target Value: benign
                hinge_loss: 2.016
                Index ID: 363
                 Feature Name     Feature Value   Contribution to Prediction   SHAP Value
                =========================================================================
                worst perimeter      117.20                   +                   0.13
                 worst radius         18.13                   +                   0.12
                  worst area         1009.00                  +                   0.11
        Worst 5 of 5
                Predicted Probabilities: [benign: 1.0, malignant: 0.0]
                Predicted Value: benign
                Target Value: malignant
                hinge_loss: 7.907
                Index ID: 135
                   Feature Name       Feature Value   Contribution to Prediction   SHAP Value
                =============================================================================
                   worst radius           14.49                   -                  -0.05
                  worst perimeter         92.04                   -                  -0.06
                mean concave points       0.03                    -                  -0.06
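The hinge_loss values in the report above can be reproduced row by row. The sketch below is a scalar restatement of the same arithmetic using only the standard library; `hinge_loss_row` is a hypothetical helper, and treating malignant as the positive class (column 1 of the predicted probabilities) is an assumption based on the output shown.

```python
import math


def hinge_loss_row(is_positive, prob_positive):
    """Hinge loss on the log-odds for one row, mirroring the custom metric."""
    # Same clipping bounds as the custom metric, to keep the log-odds finite.
    p = min(max(prob_positive, 0.001), 0.999)
    sign = 1 if is_positive else -1
    log_odds = math.log(p / (1 - p))
    return max(0.0, 1 - sign * log_odds)


# "Worst 4 of 5": target benign (negative), P(malignant) = 0.734.
worst_4 = hinge_loss_row(False, 0.734)
# "Worst 5 of 5": target malignant (positive), P(malignant) displayed as 0.0.
worst_5 = hinge_loss_row(True, 0.0)
# "Best 1 of 5": target benign (negative), P(malignant) displayed as 0.0.
best_1 = hinge_loss_row(False, 0.0)

print(round(worst_4, 3), round(worst_5, 3), best_1)
```

The results land close to the reported 2.016, 7.907 and 0.0. Small deviations are expected because the report rounds probabilities to three decimals. The clipping also explains why several confident, correct predictions tie at a loss of exactly 0.0.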
Changing Output Formats#
Instead of getting the prediction explanations as text, you can get the report as a Python dictionary or pandas DataFrame. All you have to do is pass output_format="dict" or output_format="dataframe" to explain_prediction, explain_predictions, or explain_predictions_best_worst.
Single prediction as a dictionary#
[35]:
import json
single_prediction_report = explain_predictions(
    pipeline=pipeline_binary,
    input_features=X_holdout,
    indices_to_explain=[3],
    y=y_holdout,
    top_k_features=6,
    include_explainer_values=True,
    output_format="dict",
)
print(json.dumps(single_prediction_report, indent=2))
{
  "explanations": [
    {
      "explanations": [
        {
          "feature_names": [
            "worst concavity",
            "mean concavity",
            "worst area",
            "worst radius",
            "mean concave points",
            "worst perimeter"
          ],
          "feature_values": [
            0.1791,
            0.038,
            599.5,
            14.04,
            0.034,
            92.8
          ],
          "qualitative_explanation": [
            "-",
            "-",
            "-",
            "-",
            "-",
            "-"
          ],
          "quantitative_explanation": [
            -0.023008481104309524,
            -0.02621982146725469,
            -0.033821592020020774,
            -0.04666659740586632,
            -0.0541511910494414,
            -0.05523688273171911
          ],
          "drill_down": {},
          "class_name": "malignant",
          "expected_value": 0.3711208791208791
        }
      ]
    }
  ]
}
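The nested dictionary is convenient to post-process. As a rough sketch, the code below flattens a report of the shape printed above into one record per feature and picks the feature with the largest absolute contribution. The `report` literal is a shortened, rounded copy of the output above, and `flatten_report` is a hypothetical helper, not an EvalML function.

```python
# Shortened, rounded copy of the dictionary printed above (three of six features).
report = {
    "explanations": [
        {
            "explanations": [
                {
                    "feature_names": ["worst concavity", "mean concavity", "worst perimeter"],
                    "feature_values": [0.1791, 0.038, 92.8],
                    "qualitative_explanation": ["-", "-", "-"],
                    "quantitative_explanation": [-0.0230, -0.0262, -0.0552],
                    "drill_down": {},
                    "class_name": "malignant",
                    "expected_value": 0.3711,
                }
            ]
        }
    ]
}


def flatten_report(report):
    """Return one record per (prediction, class, feature) in the report."""
    rows = []
    for prediction_number, prediction in enumerate(report["explanations"]):
        for class_explanation in prediction["explanations"]:
            triples = zip(
                class_explanation["feature_names"],
                class_explanation["feature_values"],
                class_explanation["quantitative_explanation"],
            )
            for name, value, contribution in triples:
                rows.append(
                    {
                        "prediction_number": prediction_number,
                        "class_name": class_explanation["class_name"],
                        "feature": name,
                        "value": value,
                        "contribution": contribution,
                    }
                )
    return rows


rows = flatten_report(report)
# Feature with the largest absolute contribution for this prediction.
strongest = max(rows, key=lambda row: abs(row["contribution"]))
print(strongest["feature"], strongest["contribution"])
```

The inner "explanations" list holds one entry per explained class, so the same loop should also cover multiclass reports, where each prediction carries several class entries.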
Single prediction as a dataframe#
[36]:
single_prediction_report = explain_predictions(
    pipeline=pipeline_binary,
    input_features=X_holdout,
    indices_to_explain=[3],
    y=y_holdout,
    top_k_features=6,
    include_explainer_values=True,
    output_format="dataframe",
)
single_prediction_report
[36]:
| | feature_names | feature_values | qualitative_explanation | quantitative_explanation | class_name | prediction_number |
|---|---|---|---|---|---|---|
| 0 | worst concavity | 0.1791 | - | -0.023008 | malignant | 0 |
| 1 | mean concavity | 0.0380 | - | -0.026220 | malignant | 0 |
| 2 | worst area | 599.5000 | - | -0.033822 | malignant | 0 |
| 3 | worst radius | 14.0400 | - | -0.046667 | malignant | 0 |
| 4 | mean concave points | 0.0340 | - | -0.054151 | malignant | 0 |
| 5 | worst perimeter | 92.8000 | - | -0.055237 | malignant | 0 |
Best and worst predictions as a dictionary#
[37]:
report = explain_predictions_best_worst(
    pipeline=pipeline_binary,
    input_features=X,
    y_true=y,
    num_to_explain=1,
    top_k_features=6,
    include_explainer_values=True,
    output_format="dict",
)
print(json.dumps(report, indent=2))
{
  "explanations": [
    {
      "rank": {
        "prefix": "best",
        "index": 1
      },
      "predicted_values": {
        "probabilities": {
          "benign": 1.0,
          "malignant": 0.0
        },
        "predicted_value": "benign",
        "target_value": "benign",
        "error_name": "Cross Entropy",
        "error_value": 0.0001970443507070075,
        "index_id": 52
      },
      "explanations": [
        {
          "feature_names": [
            "mean concavity",
            "worst area",
            "worst radius",
            "worst concave points",
            "mean concave points",
            "worst perimeter"
          ],
          "feature_values": [
            0.01972,
            527.2,
            13.1,
            0.06296,
            0.01349,
            83.67
          ],
          "qualitative_explanation": [
            "-",
            "-",
            "-",
            "-",
            "-",
            "-"
          ],
          "quantitative_explanation": [
            -0.024450176040601602,
            -0.03373367604833195,
            -0.042905917251496686,
            -0.04393174846277656,
            -0.050938583943217694,
            -0.06002768963828602
          ],
          "drill_down": {},
          "class_name": "malignant",
          "expected_value": 0.3711208791208791
        }
      ]
    },
    {
      "rank": {
        "prefix": "worst",
        "index": 1
      },
      "predicted_values": {
        "probabilities": {
          "benign": 1.0,
          "malignant": 0.0
        },
        "predicted_value": "benign",
        "target_value": "malignant",
        "error_name": "Cross Entropy",
        "error_value": 7.986911819330411,
        "index_id": 135
      },
      "explanations": [
        {
          "feature_names": [
            "mean concavity",
            "worst area",
            "worst concave points",
            "worst radius",
            "worst perimeter",
            "mean concave points"
          ],
          "feature_values": [
            0.04711,
            653.6,
            0.09331,
            14.49,
            92.04,
            0.02704
          ],
          "qualitative_explanation": [
            "-",
            "-",
            "-",
            "-",
            "-",
            "-"
          ],
          "quantitative_explanation": [
            -0.029936744551331215,
            -0.03748357654576422,
            -0.04553126236476177,
            -0.0483274199182721,
            -0.06039220265366764,
            -0.060441902449258976
          ],
          "drill_down": {},
          "class_name": "malignant",
          "expected_value": 0.3711208791208791
        }
      ]
    }
  ]
}
Best and worst predictions as a dataframe#
[38]:
report = explain_predictions_best_worst(
    pipeline=pipeline_binary,
    input_features=X_holdout,
    y_true=y_holdout,
    num_to_explain=1,
    top_k_features=6,
    include_explainer_values=True,
    output_format="dataframe",
)
report
[38]:
| | feature_names | feature_values | qualitative_explanation | quantitative_explanation | class_name | label_benign_probability | label_malignant_probability | predicted_value | target_value | error_name | error_value | index_id | rank | prefix |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | mean concavity | 0.05928 | - | -0.029022 | malignant | 1.0 | 0.0 | benign | benign | Cross Entropy | 0.000197 | 502 | 1 | best |
| 1 | worst area | 552.00000 | - | -0.034112 | malignant | 1.0 | 0.0 | benign | benign | Cross Entropy | 0.000197 | 502 | 1 | best |
| 2 | worst concave points | 0.08411 | - | -0.046896 | malignant | 1.0 | 0.0 | benign | benign | Cross Entropy | 0.000197 | 502 | 1 | best |
| 3 | worst radius | 13.57000 | - | -0.046928 | malignant | 1.0 | 0.0 | benign | benign | Cross Entropy | 0.000197 | 502 | 1 | best |
| 4 | mean concave points | 0.03279 | - | -0.052902 | malignant | 1.0 | 0.0 | benign | benign | Cross Entropy | 0.000197 | 502 | 1 | best |
| 5 | worst perimeter | 86.67000 | - | -0.064320 | malignant | 1.0 | 0.0 | benign | benign | Cross Entropy | 0.000197 | 502 | 1 | best |
| 6 | mean concavity | 0.04711 | - | -0.029937 | malignant | 1.0 | 0.0 | benign | malignant | Cross Entropy | 7.986912 | 135 | 1 | worst |
| 7 | worst area | 653.60000 | - | -0.037484 | malignant | 1.0 | 0.0 | benign | malignant | Cross Entropy | 7.986912 | 135 | 1 | worst |
| 8 | worst concave points | 0.09331 | - | -0.045531 | malignant | 1.0 | 0.0 | benign | malignant | Cross Entropy | 7.986912 | 135 | 1 | worst |
| 9 | worst radius | 14.49000 | - | -0.048327 | malignant | 1.0 | 0.0 | benign | malignant | Cross Entropy | 7.986912 | 135 | 1 | worst |
| 10 | worst perimeter | 92.04000 | - | -0.060392 | malignant | 1.0 | 0.0 | benign | malignant | Cross Entropy | 7.986912 | 135 | 1 | worst |
| 11 | mean concave points | 0.02704 | - | -0.060442 | malignant | 1.0 | 0.0 | benign | malignant | Cross Entropy | 7.986912 | 135 | 1 | worst |
Force Plots#
Force plots can be generated to explain the predictions for single or multiple rows for binary, multiclass and regression problem types. These use the SHAP algorithm. Here’s an example of explaining the prediction for a single row on a binary classification dataset. The force plots show how strongly each feature pushes the model toward or away from the negative (“Class: 0”) prediction and the positive (“Class: 1”) prediction.
[39]:
from evalml.model_understanding.force_plots import graph_force_plot
rows_to_explain = [0]  # Should be a list of integer indices of the rows to explain.
results = graph_force_plot(
    pipeline_binary,
    rows_to_explain=rows_to_explain,
    training_data=X_holdout,
    y=y_holdout,
)
for result in results:
    for cls in result:
        print("Class:", cls)
        display(result[cls]["plot"])
Class: malignant
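To make the idea behind a force plot concrete, here is a minimal sketch in plain Python, independent of EvalML. It assumes the usual SHAP additivity property: the explained output equals a base (expected) value plus the sum of the per-feature contributions. The helper name `force_plot_summary` and all numbers are hypothetical and chosen only for illustration; they are not taken from the pipeline above.

```python
def force_plot_summary(base_value, contributions):
    """Split per-feature SHAP contributions into the two 'forces' of a force plot."""
    # Features with positive contributions push the output above the base value.
    pushes_up = {f: v for f, v in contributions.items() if v > 0}
    # Features with negative contributions push the output below the base value.
    pushes_down = {f: v for f, v in contributions.items() if v < 0}
    # SHAP additivity: output = base value + sum of all contributions.
    output_value = base_value + sum(contributions.values())
    return {
        "output_value": output_value,
        # Strongest upward push first.
        "pushes_up": sorted(pushes_up, key=pushes_up.get, reverse=True),
        # Strongest downward push first.
        "pushes_down": sorted(pushes_down, key=pushes_down.get),
    }


# Hypothetical values for illustration only.
example = force_plot_summary(
    base_value=0.5,
    contributions={
        "worst area": -0.25,
        "mean concavity": 0.125,
        "worst radius": -0.125,
    },
)
print(example)
```

In an actual force plot, the features in `pushes_up` and `pushes_down` would appear as the two opposing sets of bars that meet at the output value.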
Here’s an example of a force plot explaining multiple predictions on a multiclass problem. Each plot arranges the force plots for the individual rows as consecutive columns, which can be reordered with the dropdown at the top of the plot. Clicking a column shows which row’s explanation it contains.
[40]:
rows_to_explain = [0, 1, 2, 3, 4]  # Should be a list of integer indices of the rows to explain.
results = graph_force_plot(
    pipeline_multi, rows_to_explain=rows_to_explain, training_data=X_multi, y=y_multi
)
for idx, result in enumerate(results):
    print("Row:", idx)
    for cls in result:
        print("Class:", cls)
        display(result[cls]["plot"])
Row: 0
Class: class_0
Class: class_1
Class: class_2
Row: 1
Class: class_0
Class: class_1
Class: class_2
Row: 2
Class: class_0
Class: class_1
Class: class_2
Row: 3
Class: class_0
Class: class_1
Class: class_2
Row: 4
Class: class_0
Class: class_1
Class: class_2
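The reordering of columns in a multi-row force plot can be sketched in plain Python. This assumes that one of the available orderings sorts rows by their explained output value (base value plus the sum of that row's contributions); the exact options offered by the dropdown depend on the SHAP version. The function `order_rows_by_output`, the feature names and the numbers are hypothetical.

```python
def order_rows_by_output(base_value, shap_rows):
    """Return row indices sorted from highest to lowest explained output."""
    # Explained output per row: base value plus that row's contributions.
    outputs = [base_value + sum(row.values()) for row in shap_rows]
    return sorted(range(len(shap_rows)), key=lambda i: outputs[i], reverse=True)


# Hypothetical per-row contributions for one class, for illustration only.
rows = [
    {"feature_a": 0.125, "feature_b": -0.25},  # output 0.375
    {"feature_a": 0.25, "feature_b": 0.125},   # output 0.875
    {"feature_a": -0.25, "feature_b": 0.0},    # output 0.25
]
order = order_rows_by_output(0.5, rows)
print(order)
```

Each index in `order` corresponds to one column of the stacked plot, so the row with the largest explained output would be drawn first.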