A detailed walk-through of SHAP example for interpretable machine learning

Stanley G
7 min read · Sep 8, 2019

Recently, I have been researching feature importance methods that are robust and accurate enough to be trusted for interpretability. In the past, I have used LASSO, Gini importance, and random permutation of features to determine the important features for my machine learning tasks. Surprisingly, there were huge discrepancies between the highly ranked features from each algorithm, casting doubt on which algorithm I should trust, and how much.

On the quest to understand more about this topic, I stumbled upon SHapley Additive exPlanations (SHAP), published by S. Lundberg et al. SHAP is a general framework for interpreting a machine learning model, both for a single prediction (i.e., how each feature affects that particular prediction) and for global feature importance (over a set of samples). The former is especially important for individual-level interpretability and for explaining to stakeholders why your model predicts the way it does. Take income prediction as an example: age might affect the model’s prediction more for certain individuals, whereas education might matter more for others. None of the methods I had been using are suited for this task. This “local interpretability”, together with its theoretical foundation in game theory, makes SHAP a compelling technique for data scientists and machine learning practitioners.

The intention of this post is not to summarize and explain the paper, as this has already been done by many other authors and by S. Lundberg himself. Instead, this post is dedicated to the specific example in the paper “Consistent Individualized Feature Attribution for Tree Ensembles” (Tree SHAP), which shows why traditional metrics for global feature importance, such as split (the number of times a feature is used for splitting) and gain (the reduction in loss due to a feature’s splits), give inconsistent results when a model is changed to rely more on a particular feature. The post walks through most of the calculations involved in the example (also as a reminder for myself whenever I forget how it works in the future!). I hope readers will find it a useful companion to the paper.

Figure 1: Example from the paper “Consistent Individualized Feature Attribution for Tree Ensembles” by S. Lundberg

The example used in the paper predicts a risk score for an illness based on only two factors, Cough and Fever. Figure 1 shows the two models the paper uses to demonstrate the consistency property of SHAP compared to other methods. The consistency property basically says that increasing the impact of a feature on a model should not decrease that feature’s attribution to a prediction. Here, I will only demonstrate the calculations for SHAP, Gain and Split. The output of each tree model is summarized in the equation below the trees; for example, for model A, when Cough and Fever are both “Yes”, the output is [True & True]*80 = 80. As you can see, Cough has a larger impact in model B and the same impact as Fever in model A. Naturally, you would expect a global feature importance method to reflect this. Well, let’s do some calculations to find out.
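
To make the setup concrete, here is a minimal Python sketch of the two models (the function names are mine, not from the paper), taking the two binary features and returning the risk scores given by the equations in Figure 1:

```python
from itertools import product

def model_a(fever: int, cough: int) -> int:
    """Model A: risk score is 80 only when Fever AND Cough are both yes (1)."""
    return 80 * (fever and cough)

def model_b(fever: int, cough: int) -> int:
    """Model B: same as model A, plus an extra 10 whenever Cough is yes."""
    return 80 * (fever and cough) + 10 * cough

# The four possible patients (all combinations of the two binary features).
for fever, cough in product([0, 1], repeat=2):
    print(f"Fever={fever}, Cough={cough}: A={model_a(fever, cough)}, B={model_b(fever, cough)}")
# A outputs: 0, 0, 0, 80    B outputs: 0, 10, 0, 90
```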

Before calculating SHAP, Gain and Split, let’s first calculate the prediction of each model when certain features are removed (Cough, Fever, or both Cough & Fever). First of all, when all features are removed, model A predicts the average of its outputs, A_o = (0+0+0+80)/4 = 20; for model B, B_o = (0+0+10+90)/4 = 25. Figure 2 below depicts the predictions when a single feature is removed.

Figure 2: (Top Left) Prediction of model A when Cough is removed (Top Right) Prediction of model B when Cough is removed (Bottom Left) Prediction of model A when Fever is removed (Bottom Right) Prediction of model B when Fever is removed

Denote A(F=1, C=1) as the prediction of model A when Fever and Cough are both True, and A(F=1 | R(C)) as the output of model A when Fever is True and Cough is removed; similarly for model B. Below is how I obtained the values in Figure 2, using the top-left tree (model A with Cough removed) as an example:

A(F=0| R(C) ) = 1/2*[A(F=0, C=1) + A(F=0, C=0)]

A(F=1| R(C) ) = 1/2*[A(F=1, C=1) + A(F=1, C=0)]
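
As a sketch (the helper name is mine, and it assumes the four patients are equally likely, as in the averages above), here is how these removed-feature predictions and the baselines can be reproduced in Python:

```python
from itertools import product

# Same toy models as in the earlier sketch.
model_a = lambda f, c: 80 * (f and c)
model_b = lambda f, c: 80 * (f and c) + 10 * c

samples = list(product([0, 1], repeat=2))   # the four equally likely patients

def predict_without(model, fever=None, cough=None):
    """Prediction when a feature is 'removed': average the model's output over
    the possible values of the missing feature(s), weighting each patient equally."""
    outputs = [model(f if fever is None else fever,
                     c if cough is None else cough) for f, c in samples]
    return sum(outputs) / len(outputs)

# Baselines with every feature removed: A_o = 20.0, B_o = 25.0
print(predict_without(model_a), predict_without(model_b))

# The Figure 2 values, e.g. A(F=1 | R(C)) and B(C=1 | R(F)):
print(predict_without(model_a, fever=1))   # 40.0
print(predict_without(model_a, fever=0))   # 0.0
print(predict_without(model_b, cough=1))   # 50.0
print(predict_without(model_b, cough=0))   # 0.0
```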

Let’s first calculate Split. For model A, based on Figure 1, Split_A(F) = 1 and Split_A(C) = 2; for model B, Split_B(F) = 2 and Split_B(C) = 1. Here, we can already see that Split does not reflect the larger impact of Cough in model B. Next, let’s calculate Gain. Recall that the gain of a feature is how much it reduces the current loss when we split on it. For model A, before any split (all features removed), the mean squared loss is 1200:

1200 = 1/4*[(0-20)² + (0-20)² + (0-20)² + (80-20)²]

Now, splitting on Fever (the root split of model A), the loss drops to 800:

800 = 1/4*[(0-0)² + (0-0)² + (0-40)² + (80-40)²]

Hence, Gain_A(F) = 1200 - 800 = 400. Next, when we split on Cough, the loss drops to 0, so Gain_A(C) = 800 - 0 = 800. Repeating this for model B, where the root split is on Cough and the initial loss is 1/4*[(0-25)² + (10-25)² + (0-25)² + (90-25)²] = 1425, we get Gain_B(C) = 1425 - 800 = 625 and Gain_B(F) = 800 - 0 = 800. Clearly, Gain doesn’t reflect the larger impact of Cough in model B either; in fact, it shows Cough’s importance decreasing, from 800 in model A to 625 in model B.
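
Here is a small sketch that reproduces these gains, assuming the split orders implied by the split counts above (Fever at the root of model A, Cough at the root of model B); the helper names are mine:

```python
from itertools import product

# Same toy models as before.
model_a = lambda f, c: 80 * (f and c)
model_b = lambda f, c: 80 * (f and c) + 10 * c

samples = list(product([0, 1], repeat=2))   # (fever, cough) pairs

def mse(values):
    """Mean squared error of a group of outputs around the group mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def weighted_loss(model, groups):
    """Mean squared loss over all samples when each group predicts its own mean."""
    return sum(mse([model(*s) for s in g]) * len(g) for g in groups) / len(samples)

def gains(model, split_order):
    """Loss reduction credited to each feature when splitting greedily in the
    given order (index 0 = Fever, 1 = Cough), mimicking the 'gain' metric."""
    groups, result = [samples], {}
    loss = weighted_loss(model, groups)
    for feature in split_order:
        groups = [[s for s in g if s[feature] == v] for g in groups for v in (0, 1)]
        groups = [g for g in groups if g]          # drop empty child nodes
        new_loss = weighted_loss(model, groups)
        result[feature] = loss - new_loss
        loss = new_loss
    return result

# Assumed split orders: Fever at the root of model A, Cough at the root of model B.
print(gains(model_a, split_order=(0, 1)))   # {0: 400.0, 1: 800.0}
print(gains(model_b, split_order=(1, 0)))   # {1: 625.0, 0: 800.0}
```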

Next, let’s calculate the SHAP values. In the TreeSHAP paper, the attribution of feature i for a single sample x (local interpretability) is given by

Figure 3: Excerpt from TreeSHAP paper

where N is the set of all features, M is the size of N, f_x(S ∪ {i}) is the prediction of the model on x when feature i is included, and f_x(S) is the prediction of the model on x when feature i is excluded, for a particular subset of features S. To compute the global importance of feature i over a set of samples X, we simply take the average of the absolute SHAP values over all samples:

Figure 4: Global feature importance
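
For readers without the figure images handy, these two formulas can be written out as follows (my transcription of the standard Shapley-value attribution and of the mean-absolute-SHAP global importance described above, so treat the exact notation as mine rather than the paper’s):

```latex
\phi_i(x) = \sum_{S \subseteq N \setminus \{i\}}
            \frac{|S|!\,(M - |S| - 1)!}{M!}
            \left[ f_x(S \cup \{i\}) - f_x(S) \right]

\text{GlobalImportance}_i = \frac{1}{|X|} \sum_{x \in X} \left| \phi_i(x) \right|
```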

Let’s tweak the equations above to suit our example. First of all, N = {F, C} and M = 2. S can be {} or {F} when computing the attribution of Cough, and {} or {C} when computing the attribution of Fever, and the fraction involving factorials in Figure 3 equals 1/2 for every such S. Hence, for model A, rewriting the equation in Figure 3 with respect to feature F gives the following:

Figure 5: SHAP value for Fever using model A for a single sample F=i, C=j

where i, j ∈ {0, 1}. The first part of the second equation in Figure 5 is the difference between the prediction of model A with Cough removed and the prediction with all features removed, and the second part is the difference between the prediction of model A with all features present and the prediction with Fever removed. Finally, the mean absolute SHAP for feature F under model A is just the average of the absolute values across all four combinations of Fever and Cough:

Figure 6: Global importance using SHAP for feature F in Model A
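
Written out explicitly (again, my reconstruction of the formulas in Figures 5 and 6 from the description above, using the R(·) notation defined earlier):

```latex
\phi_F^{A}(F{=}i,\, C{=}j)
  = \tfrac{1}{2}\big[ A(F{=}i \mid R(C)) - A_o \big]
  + \tfrac{1}{2}\big[ A(F{=}i,\, C{=}j) - A(C{=}j \mid R(F)) \big]

\mathrm{SHAP}_A(F) = \tfrac{1}{4} \sum_{i \in \{0,1\}} \sum_{j \in \{0,1\}}
  \big| \phi_F^{A}(F{=}i,\, C{=}j) \big|
```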

Similarly, we obtain the following formula for the attribution of Cough under model A:

Figure 7: Attribution of Cough in model A

and the attributions of both Fever and Cough for model B:

Figure 8: Attribution of Fever in model B
Figure 9: Attribution of Cough in model B

We have already calculated all the values required in the formulas above (Figures 1 and 2). Substituting them in, we get the global importances SHAP_A(Fever) = 20, SHAP_A(Cough) = 20, SHAP_B(Fever) = 20, SHAP_B(Cough) = 25. As you can see, the attribution of Cough in model B did not decrease, demonstrating that SHAP assigns attributions consistently even when the model relies more on a particular feature, which is not the case for traditional measures such as Gain and Split.
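
To check these numbers, here is a short sketch that evaluates the two-feature Shapley formula for every sample and then averages the absolute values (the function names are mine; the “removed-feature” predictions are computed by averaging over the four equally likely patients, as before):

```python
from itertools import product

# Same toy models as before.
model_a = lambda f, c: 80 * (f and c)
model_b = lambda f, c: 80 * (f and c) + 10 * c

samples = list(product([0, 1], repeat=2))   # the four equally likely patients

def predict_without(model, fever=None, cough=None):
    """Average the model output over the values of any 'removed' feature."""
    outputs = [model(f if fever is None else fever,
                     c if cough is None else cough) for f, c in samples]
    return sum(outputs) / len(outputs)

def shap_values(model, f, c):
    """Exact Shapley values for one (Fever, Cough) sample with M = 2 features:
    1/2 * (marginal contribution to the empty set)
    + 1/2 * (marginal contribution given the other feature)."""
    base = predict_without(model)                     # all features removed
    full = predict_without(model, fever=f, cough=c)   # no feature removed
    phi_f = 0.5 * (predict_without(model, fever=f) - base) \
          + 0.5 * (full - predict_without(model, cough=c))
    phi_c = 0.5 * (predict_without(model, cough=c) - base) \
          + 0.5 * (full - predict_without(model, fever=f))
    return phi_f, phi_c

for name, model in [("A", model_a), ("B", model_b)]:
    phis = [shap_values(model, f, c) for f, c in samples]
    mean_abs_fever = sum(abs(p[0]) for p in phis) / len(phis)
    mean_abs_cough = sum(abs(p[1]) for p in phis) / len(phis)
    print(f"Model {name}: SHAP(Fever) = {mean_abs_fever}, SHAP(Cough) = {mean_abs_cough}")
# Model A: SHAP(Fever) = 20.0, SHAP(Cough) = 20.0
# Model B: SHAP(Fever) = 20.0, SHAP(Cough) = 25.0
```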

Although this is a simple example, the consistency is not a coincidence: it is guaranteed in general by the game-theoretic foundations of Shapley values, which are the optimal way to assign rewards (feature attributions) to each member (feature) of a coalition for producing a certain payout (the prediction). As you have probably noticed, computing Shapley values exactly looks computationally expensive, since the sum runs over all subsets of features. However, the TreeSHAP algorithm runs in O(TLD²), where T is the number of trees, L is the maximum number of leaves in any tree, and D is the maximum depth of any tree. It runs very fast in practice!

I hope this post helps anyone reading the paper to better understand how the different measures are calculated.

References:

  1. Interpretable Machine Learning with XGBoost
  2. Consistent Individualized Feature Attribution for Tree Ensembles
  3. A Unified Approach to Interpreting Model Predictions
  4. https://github.com/slundberg/shap
