I built a soft voting model and want to calculate the feature contributions for an individual prediction. However, the contributions I calculate myself do not match the contributions table in the Explainer Dashboard, even though I use the same fitted model and the same preprocessed data. Is there any way to resolve this?
These are the feature contributions I get from the Explainer Dashboard:
Below is the code I use to calculate the feature contributions manually with LIME. For now, I hardcoded average_population_effect to the value shown in the Explainer Dashboard so I could compare the results, but they still differ. I also tried changing the LIME configuration (kernel_width, feature_selection, num_samples), but the results barely changed.
import numpy as np
import pandas as pd
import lime.lime_tabular

# risk_factors_test (scaled data) and soft_voting_model (fitted soft voting classifier) are defined earlier in my code
custom_input = np.array([[-1.1146465042123073, -1.1269069848628173, -0.5204541934529272, -0.38306820131416996, -1.8520200101947943, 0.18595422584978583, -1.181043961181701, -0.9282866963312929]])
explainer = lime.lime_tabular.LimeTabularExplainer(
    training_data=risk_factors_test,  # Scaled data
    feature_names=['sex', 'age', 'hypertension', 'heart_disease', 'ever_married', 'avg_glucose_level', 'bmi', 'smoking_status'],
    class_names=['No Stroke', 'Stroke'],
    mode='classification',
    # kernel_width=kernel_width,
    # feature_selection=feature_selection
)
def predict_proba_fn(X):
    # LIME passes a 2-D array of perturbed samples; return class probabilities for each row
    return soft_voting_model.predict_proba(X)
# Explain the custom input prediction
exp = explainer.explain_instance(
    custom_input[0],
    predict_proba_fn,
    num_features=8,
    # num_samples=num_samples
)
# Get the explanation as a dataframe
explanation_list = exp.as_list()
explanation_df = pd.DataFrame(explanation_list, columns=['Reason', 'Effect'])
try:
    explanation_df['Effect'] = explanation_df['Effect'].astype(float)
except ValueError:
    # If 'Effect' values are percentages in string format, convert them to numerical values
    explanation_df['Effect'] = explanation_df['Effect'].str.rstrip('%').astype('float') / 100.0
# Average of population (base value), copied from the Explainer Dashboard, in percent
average_population_effect = 52.21
# Calculate the final prediction
effect_values = explanation_df['Effect']
contributions_sum = effect_values.sum() * 100 # Convert back to percentage points
final_prediction = average_population_effect + contributions_sum
# This is the result
    Reason                                    Effect
0   Average of population                     52.21%
1   hypertension = -0.5204541934529272       -18.24%
2   bmi = -1.181043961181701                 -12.62%
3   avg_glucose_level = 0.18595422584978583    1.35%
4   sex = -1.1146465042123073                 -1.33%
5   Other features combined                    +0.0%
6   age = -1.1269069848628173                  0.97%
7   heart_disease = -0.38306820131416996       0.87%
8   smoking_status = -0.9282866963312929      -0.33%
9   ever_married = -1.8520200101947943         0.00%
10  Final prediction                          22.88%
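For comparison, the table above relies on additivity: Average of population + contributions = Final prediction. That is the efficiency property of Shapley values, which SHAP-based tools report. As far as I know, explainerdashboard builds its contributions table from SHAP values (via the shap library) rather than from LIME. LIME weights are instead the coefficients of a weighted linear surrogate fitted around the instance, with their own intercept, so they are not guaranteed to sum to "prediction minus population average". The sketch below is only an illustration of that property, not the dashboard's implementation. It computes exact interventional Shapley values by brute force, roughly what shap's KernelExplainer approximates, for a hypothetical three-feature toy_model (a stand-in, not soft_voting_model) and a made-up four-row background set. It then checks that base value + contributions equals the prediction.

```python
from itertools import combinations
from math import factorial


def toy_model(row):
    """Hypothetical stand-in for P(stroke); NOT the real soft_voting_model."""
    x0, x1, x2 = row
    return 0.2 + 0.3 * x0 + 0.1 * x1 * x2


def coalition_value(subset, instance, background, model):
    """Average model output when features in `subset` are fixed to the instance's
    values and all other features are taken from each background row."""
    total = 0.0
    for bg_row in background:
        row = [instance[j] if j in subset else bg_row[j] for j in range(len(instance))]
        total += model(row)
    return total / len(background)


def exact_shapley(instance, background, model):
    """Brute-force Shapley values: weighted marginal contributions over all subsets."""
    n = len(instance)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):  # subset sizes 0 .. n-1
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for combo in combinations(others, size):
                s = set(combo)
                with_i = coalition_value(s | {i}, instance, background, model)
                without_i = coalition_value(s, instance, background, model)
                phi[i] += weight * (with_i - without_i)
    return phi


# Made-up background rows and instance, for illustration only
background = [[0, 0, 0], [1, 1, 0], [0, 1, 1], [1, 0, 1]]
instance = [1, 1, 1]

base = coalition_value(set(), instance, background, toy_model)  # "Average of population"
phi = exact_shapley(instance, background, toy_model)            # per-feature contributions
pred = toy_model(instance)                                       # "Final prediction"

print(f"base value       = {base:.4f}")
for name, contribution in zip(["x0", "x1", "x2"], phi):
    print(f"contribution {name}  = {contribution:+.4f}")
# By the efficiency property, base + sum(phi) equals the model prediction
print(f"base + sum(phi)  = {base + sum(phi):.4f}")
print(f"model prediction = {pred:.4f}")
```

To try the same idea on the real model, model would be a function returning something like soft_voting_model.predict_proba(np.array([row]))[0, 1], and background would be a sample of training rows. This is only a sanity check of the additivity property. The dashboard's own numbers may come from a sampled approximation and a different background set, so they need not match a brute-force result exactly.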