I’m trying to use the shap.summary_plot function to produce a graph showing SHAP values for a random forest model.
When the code below is run, shap.summary_plot produces this graph as a popup, but it also produces a few additional ones. This is the full list of graphs produced as popups:
- ‘Figure 1’ – The confusion matrix
- ‘Figure 2’ – The ROC curve, with an additional line showing chance level (AUC 0.5)
- ‘Figure 3’ – The ROC curve
- ‘Figure 4’ – The precision-recall curve, with an additional line showing chance level (AP 0.33)
- ‘Figure 5’ – The shap graph
The expected result was that shap.summary_plot would only produce the shap graph.
Does anyone know why the additional graphs are being produced? My best guess is that the figures saved via matplotlib are somehow being displayed again when shap.summary_plot runs, but that doesn’t explain where the chance level lines come from (they don’t appear in the image files saved via matplotlib).
Here is my code:
#Import required modules
import os
import sqlalchemy
import pandas as pd
import matplotlib.pyplot as plt
import sklearn
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay, RocCurveDisplay, PrecisionRecallDisplay, f1_score
from joblib import dump, load
import shap
#Some code has been removed here to make the example simpler
#Data is read into a Pandas dataframe; this is then split into a training set and a testing set
#Select & train the model
model = RandomForestClassifier()
model.fit(X = X_train, y = Y_train)
#Make test predictions using the trained model
test_prediction = model.predict(X = X_test)
#Plot the confusion matrix and save to file
cm = confusion_matrix(Y_test, test_prediction)
cm_display = ConfusionMatrixDisplay(confusion_matrix = cm)
cm_display.plot()
plt.savefig('Confusion matrix.png')
#Plot the ROC curve and save to file
ROC = RocCurveDisplay.from_estimator(model, X_test, Y_test, plot_chance_level = True)
ROC.plot()
plt.savefig('ROC curve.png')
#Plot the precision-recall curve and save to file
PR = PrecisionRecallDisplay.from_estimator(model, X_test, Y_test, plot_chance_level = True)
PR.plot()
plt.savefig('Precision recall curve.png')
#Calculate the model's Shapley values
explainer = shap.Explainer(model)
shap_values = explainer(X_test[0:100])
shap.summary_plot(shap_values[:,:,1])
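One way to test the guess above is to count how many matplotlib figures are still open just before the shap call. The snippet below is only a minimal sketch, not the script itself: it uses two plain matplotlib figures as stand-ins for the scikit-learn displays, assumes the non-interactive Agg backend so that it runs without a display, and saves to an in-memory buffer rather than to disk. The helper name open_figure_count is hypothetical.

```python
import io

import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, so no popup windows are needed
import matplotlib.pyplot as plt


def open_figure_count():
    """Return the number of figures pyplot is currently keeping open."""
    return len(plt.get_fignums())


plt.close("all")  # start from a clean state

# Two stand-in figures (placeholders for the confusion matrix / ROC plots)
fig1, ax1 = plt.subplots()
ax1.plot([0, 1], [0, 1])

fig2, ax2 = plt.subplots()
ax2.plot([0, 1], [1, 0])

# Saving a figure writes the image but does not close the figure
buffer = io.BytesIO()
fig2.savefig(buffer, format="png")
count_after_saving = open_figure_count()

# Explicitly closing the figures removes them from pyplot's list
plt.close("all")
count_after_closing = open_figure_count()

print(count_after_saving, count_after_closing)
```

If the count is still non-zero after saving, any later plt.show() call would display those figures too. As far as I know, shap.summary_plot takes a show argument that defaults to True and calls plt.show() in that case, so printing plt.get_fignums() just before the shap line in the script above, or calling plt.close() after each plt.savefig, may be a quick way to confirm or rule this out.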