I have a neural network classifier built with TensorFlow for numerical data with multiclass labels. I explained it using DeepSHAP, and now I would like to do the same with LIME so I can compare the feature explanations.
This is my first time using LIME, so I don't fully understand the output, but I am trying to format it the way DeepSHAP does, i.e. a list of arrays, where each item in the list corresponds to a class and each array has the shape (samples, features).
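To be concrete, the structure I want to end up with looks like this (illustrative sizes only):

import numpy as np

num_classes, n_samples, n_features = 4, 3, 5  # illustrative sizes
# One array per class; values_by_class[c][i, j] is the score of
# feature j for sample i toward class c
values_by_class = [np.zeros((n_samples, n_features)) for _ in range(num_classes)]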
This is what I’ve done so far to get the top 5 features for each class:
import numpy as np
import tensorflow as tf
from lime import lime_tabular

# Create an explainer
explainer = lime_tabular.LimeTabularExplainer(X_train, mode='classification',
                                              feature_names=gene_symbols,
                                              class_names=label_encoder.classes_)

# Convert the model prediction to a NumPy array
def predict_fn(x):
    return combined_model.predict(x).astype(np.float64)

# Wrap the model so that LIME sees a binary problem for a single class
def predict_proba_fn_of_class(label):
    def rewrapped_predict(x):
        # Assuming combined_model(x) returns the probability distribution
        # directly; keep only the column for the chosen class (label)
        preds = tf.keras.backend.eval(combined_model(x))[:, label].reshape(-1, 1)
        # Return (P(not class), P(class)) columns
        return np.hstack((1 - preds, preds))
    return rewrapped_predict

explanations_by_class = []

# Iterate over each class (num_classes is the total number of classes)
for class_index in range(num_classes):
    # Create a predict function specific to the chosen class
    predict_fn_class = predict_proba_fn_of_class(class_index)
    # Explain one instance for the current class, keeping the top 5 features
    exp_class = explainer.explain_instance(X_test[2], predict_fn_class,
                                           num_features=5).as_list()
    # Store the explanation for the current class
    explanations_by_class.append(exp_class)
The output looks like this:
[[('CYB561 <= -0.71', -0.031659175210558284),
('EPHA8 <= -0.54', -0.031116198601930344),
('ATP8B1 > 0.71', 0.029454332316088773),
('PPP2R3A <= -0.60', 0.029452940856413576),
('ZMYND12 <= -0.54', 0.029274620014499073)],
[('CYB561 <= -0.71', 0.040768483174686225),
('JADE2 <= -0.64', 0.03176854016642045),
('ACOX3 <= -0.64', 0.031385325370634085),
('PLEKHB1 <= -0.56', 0.030865445689519894),
('TMEM14A <= -0.55', 0.027479891979293455)],
[('LAMA3 > 0.55', 0.017852322063052317),
('DEFB127 > 0.14', -0.01721149720006829),
('PNPLA4 <= -0.73', -0.0157699714516676),
('ATP9A <= -0.73', -0.01565054173505174),
('GYG2 <= -0.53', -0.01553913061323871)],
[('ELAC2 <= -0.68', -0.011963967913564178),
('MTA3 <= -0.67', -0.011740218141244027),
('GLRX2 <= -0.69', -0.011713361607066134),
('NUDCD3 <= -0.65', -0.011306011791352906),
('MYO15A <= -0.68', -0.011004570024089241)]]
My first question is: is it valid to get per-class scores this way?
The second question: is there a way to get a score for every feature, without ordering them, so that I can concatenate them into an array of shape (samples, features) by looping over multiple samples? That would match the DeepSHAP output.
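This is roughly what I'm imagining (an untested sketch; I'm assuming exp.as_map() returns {label: [(feature_index, weight), ...]}, and I'm using the original multiclass predict_fn instead of the per-class wrappers):

import numpy as np

num_features = X_train.shape[1]
# One (samples, features) array per class, mirroring the DeepSHAP layout
lime_values = [np.zeros((len(X_test), num_features)) for _ in range(num_classes)]

for i, sample in enumerate(X_test):
    # Request all labels and all features in one call
    exp = explainer.explain_instance(sample, predict_fn,
                                     labels=range(num_classes),
                                     num_features=num_features)
    for class_index in range(num_classes):
        # as_map() keys weights by feature index, so ordering doesn't matter
        for feature_index, weight in exp.as_map()[class_index]:
            lime_values[class_index][i, feature_index] = weight

Is this the right approach, or is there a cleaner built-in way?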