I am working on a binary classification problem using PyTorch and I am trying to interpret my model using SHAP. I have successfully obtained the SHAP values for each class using KernelExplainer.
I would like to combine these per-class arrays into a single matrix of SHAP values that represents both classes, but I am not sure how to do this. Here is my code:
import numpy as np
from shap import KernelExplainer
np.random.seed(0)
train_data, train_labels = next(iter(train_loader))
train_data = to_numpy(train_data)
exp = KernelExplainer(model.predict_shap, train_data)
test_data, test_labels = next(iter(test_loader))
test_data = to_numpy(test_data)
shap_values = exp.shap_values(test_data)
class_label1_shap_values = shap_values[0]
class_label2_shap_values = shap_values[1]
The shap_values variable is a list of arrays, where each array corresponds to a class label. I would like to combine these arrays to get a single array of SHAP values. I tried adding the arrays together, but I am not sure if this is the correct approach.
combined_shap_values = class_label1_shap_values + class_label2_shap_values
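To make the options concrete, here is a minimal sketch that uses synthetic arrays in place of the real SHAP output. It rests on assumptions not stated above: each per-class array has shape (n_samples, n_features), and model.predict_shap returns two class probabilities that sum to 1. Under that second assumption, exact Shapley values for the two classes are mirror images of each other, so the element-wise sum cancels to roughly zero. KernelExplainer only approximates Shapley values, so the real arrays may not cancel exactly. The names class0, class1, stacked, mean_abs, and global_importance are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for shap_values[1]: shape (n_samples, n_features).
class1 = rng.normal(size=(5, 3))

# Assumption: the two outputs are probabilities summing to 1, so the
# class-0 attributions are the negation of the class-1 attributions.
class0 = -class1

# Option 1: element-wise sum. Under the assumption above this cancels out.
summed = class0 + class1

# Option 2: keep both classes in one array, shape (n_classes, n_samples, n_features).
stacked = np.stack([class0, class1], axis=0)

# Option 3: average the magnitudes across classes (sign information is lost).
mean_abs = np.abs(stacked).mean(axis=0)

# One importance score per feature, averaged over samples.
global_importance = mean_abs.mean(axis=0)
```

If the real arrays behave like this, a single class (for example shap_values[1] for the positive class) already carries the full picture, and the magnitude-based average is an alternative when only feature importance matters.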
Any guidance on how to correctly combine these SHAP values would be greatly appreciated.