My team and I are fairly new to SHAP, and we are trying to run some tests in which we apply SHAP to synthetic data that we created. It is a regression task with 5 numeric independent variables and 1 numeric dependent variable.
I know that we would normally do something like the following to get the SHAP values for a linear regression:
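```python
import shap
from sklearn.linear_model import LinearRegression

X_train = data[['x1', 'x2', 'x3', 'x4', 'x5']]  # `data`: DataFrame with x1..x5 and y
y_train = data['y']  # 1-D target
model = LinearRegression()
model.fit(X_train, y_train)
explainer = shap.LinearExplainer(model, X_train)  # X_train serves as background data
shap_values = explainer.shap_values(X_train)  # array of shape (n_samples, 5)
```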
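For a linear model whose features are treated as independent (which, to my understanding, is what `LinearExplainer` assumes by default when given background data), the SHAP value of feature i for a sample x reduces to coef_i * (x_i - mean(x_i)). The sketch below checks that closed form with plain NumPy instead of shap and scikit-learn; the synthetic coefficients, the noise level, and the `predict` helper are made up for illustration, and `np.linalg.lstsq` stands in for `LinearRegression`.

```python
import numpy as np

# Synthetic data in the spirit of the question: 5 numeric features, 1 numeric target.
# The "true" coefficients and intercept are illustrative assumptions.
rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 5))
true_coef = np.array([3.0, -2.0, 0.5, 0.0, 1.5])
y = X @ true_coef + 4.0 + rng.normal(scale=0.1, size=n)

# Ordinary least squares with an intercept column (what LinearRegression fits).
A = np.column_stack([np.ones(n), X])
params, *_ = np.linalg.lstsq(A, y, rcond=None)
intercept, coef = params[0], params[1:]


def predict(X_):
    """Predictions of the fitted linear model."""
    return intercept + X_ @ coef


# Closed-form SHAP values for a linear model, using the training mean as reference.
background_mean = X.mean(axis=0)
shap_values = (X - background_mean) * coef          # shape (n, 5)
base_value = predict(background_mean[None, :])[0]   # equals the mean prediction here

# Local accuracy: base value + sum of a row's SHAP values = that row's prediction.
reconstructed = base_value + shap_values.sum(axis=1)
print(np.allclose(reconstructed, predict(X)))  # True
```

Because the fitted coefficients land close to `true_coef`, SHAP here mostly recovers the known generating model, which is the "benefit of knowing" the true model mentioned below.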
In the example above, I specify a linear regression model on my data before running SHAP. However, my manager asked me to “rather than running the true models (which we have the benefit of knowing), let’s instead run SHAP and have it try to predict y, using x1 through x5 as the data.”
Is there a way to calculate SHAP values without specifying a model, such as linear regression, and instead feed SHAP only the independent and dependent variables?
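For context on what SHAP computes: Shapley values are defined with respect to a prediction function f, because each value measures how much a feature moves f's output away from a baseline (here, the average prediction over a background set). The brute-force sketch below is not the shap library's implementation; `exact_shapley` is a hypothetical helper that accepts any callable, and the nonlinear `f` is an arbitrary stand-in for whatever predictor would be fitted on x1 through x5. It enumerates all 2^5 feature subsets, which is only feasible for a handful of features.

```python
import itertools
import math

import numpy as np


def exact_shapley(f, x, background):
    """Brute-force interventional Shapley values of f at one sample x.

    f: any callable mapping an (m, d) array to an (m,) array of predictions.
    x: (d,) array, the sample being explained.
    background: (k, d) array; "absent" features are filled in from these rows.
    """
    d = x.shape[0]

    def value(subset):
        # Fix the features in `subset` to x and average f over the background rows.
        Z = background.copy()
        idx = list(subset)
        Z[:, idx] = x[idx]
        return f(Z).mean()

    phi = np.zeros(d)
    for i in range(d):
        others = [j for j in range(d) if j != i]
        for size in range(d):
            # Shapley weight |S|! (d - |S| - 1)! / d! for coalitions of this size.
            weight = math.factorial(size) * math.factorial(d - size - 1) / math.factorial(d)
            for S in itertools.combinations(others, size):
                phi[i] += weight * (value(S + (i,)) - value(S))
    return phi


# Hypothetical stand-in for a model fitted on x1..x5; any predictor would do.
def f(Z):
    return Z[:, 0] ** 2 + Z[:, 1] * Z[:, 2] + np.sin(Z[:, 3]) + 0.5 * Z[:, 4]


rng = np.random.default_rng(1)
background = rng.normal(size=(50, 5))
x = rng.normal(size=5)

phi = exact_shapley(f, x, background)
base_value = f(background).mean()
# Efficiency: the Shapley values add up to f(x) minus the average background prediction.
print(np.isclose(base_value + phi.sum(), f(x[None, :])[0]))  # True
```

Passing a linear function as `f` reproduces coef_i * (x_i - mean(x_i)), while a nonlinear `f` gives values that depend on interactions; either way, some function of x1 through x5 has to be supplied before any SHAP value can be computed.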