I have trained an XGBoost model using caret, and now I am calculating the mean SHAP value of each predictor with the SHAPforxgboost package, using the following code:
library(dplyr)          # provides %>%, select() and all_of()
library(SHAPforxgboost)
# Predictors used in training; the first column of trainingData is the outcome, so drop it
to_select <- names(caret.xgb$trainingData)[-1]
# Build the predictor matrix once and reuse it in both calls
X_train <- data_train %>%
  select(all_of(to_select)) %>%
  as.matrix()
shap_values <- shap.values(xgb_model = caret.xgb$finalModel, X_train = X_train)
shap_long <- shap.prep(shap_contrib = shap_values$shap_score, X_train = X_train)
However, I get the following error:
Error in predict.xgb.Booster(xgb_model, (X_train), predcontrib = TRUE) :
Feature names stored in `object` and `newdata` are different!
However, I am already selecting the same features as in the model's training set, and when I compare them with identical(), the output is TRUE.
Thank you!
I tried selecting the same features as in the model's training set, in case the order of the variables was different, but the error stayed the same. I also checked the intersection of colnames() for both datasets, and it contained every column.
I found the error!
The xgboost function had internally changed one of the column names in my dataset, which is why I got the error.
Here is the code I used to look for it:
caret.xgb$coefnames %>% as_tibble() %>% filter(!value %in% to_select)
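For anyone hitting the same message, here is a rough sketch of the same check in base R that reports the mismatch in both directions. The helper compare_feature_names() and the example column names are hypothetical and only for illustration. The commented call at the end assumes, as in my case, that caret.xgb$coefnames holds the names the model was actually trained with.

```r
# Hypothetical helper: list the names that differ between the model and the new data.
compare_feature_names <- function(model_names, data_names) {
  list(
    only_in_model = setdiff(model_names, data_names),  # names the model expects but the data lacks
    only_in_data  = setdiff(data_names, model_names),  # names in the data the model does not know
    same_order    = identical(model_names, data_names) # TRUE only if names and order both match
  )
}

# Illustrative names only, not taken from the original dataset.
model_names <- c("age", "body.mass", "height")
data_names  <- c("age", "body mass", "height")

name_diff <- compare_feature_names(model_names, data_names)
print(name_diff)

# With the real objects the call might look like this (assumption: coefnames
# stores the predictor names used during training):
# compare_feature_names(caret.xgb$coefnames, to_select)
```

Once the renamed column is identified, renaming it in the data before building the matrix (so that the column names match what the model stored) should let shap.values() run, provided the columns are otherwise the same.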