I am trying to apply BART for classification in a problem where the predictors are dummy variables, as is the y variable. I know this is an uncommon setup, but unfortunately it is the one I have. The 0 and 1 values were obtained from a categorical variable ranging from -4 to 4, setting the negative values to 0 and the positive values to 1. I also have the categorical version of the data in case it is useful.
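A minimal sketch of that recoding, on made-up values rather than my real data. The vector `scores` is hypothetical, and the handling of an exact 0 (left as NA here) is an assumption, since the rule above only covers negative and positive values:

```r
# Hypothetical categorical scores on the -4..4 scale, including a missing value.
scores <- c(-4, -1, 0, 2, 4, NA)

# Negative -> 0, positive -> 1; NA stays NA.
# Assumption: an exact 0 falls on neither side, so it is left as NA here.
dummy <- ifelse(scores < 0, 0L, ifelse(scores > 0, 1L, NA_integer_))

print(dummy)
```

Collapsing nine levels into two discards the magnitude of each score, which is part of why I ask below whether the categorical version would be more useful.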
My predictors form a 648×48 matrix of 0-1 dummies, and about 70% of its entries are NA. My y variable has 648 values and contains no missing values.
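To make the amount of missingness concrete, it can be summarised overall, per predictor, and per row. The sketch below uses a simulated stand-in with the same shape and roughly the same NA share; the object `predictors` and its contents here are made up, not my actual data:

```r
set.seed(1)

# Hypothetical stand-in for the real 648 x 48 predictor table (~70% NA).
cells <- sample(c(0L, 1L, NA), size = 648 * 48, replace = TRUE,
                prob = c(0.15, 0.15, 0.70))
predictors <- as.data.frame(matrix(cells, nrow = 648, ncol = 48))

# Overall share of missing cells.
overall_na <- mean(is.na(predictors))

# Share of missing values in each predictor column.
na_by_column <- colMeans(is.na(predictors))

# Number of rows in which no predictor at all is observed.
all_na_rows <- sum(rowSums(!is.na(predictors)) == 0)

print(round(overall_na, 2))
print(summary(na_by_column))
print(all_na_rows)
```

Columns or rows that are almost entirely NA carry little information beyond their missingness indicator, so these summaries help judge how much signal is actually left.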
I am currently working in R with RStudio. When I run the code below, however, the results are disappointing:
bart_machine = build_bart_machine(predictors, response_var, use_missing_data = TRUE, use_missing_data_dummies_as_covars = TRUE)
bart_machine$confusion_matrix
Namely, I obtain a NULL confusion matrix and the following summary:
bartMachine v1.3.4.1 for regression
Missing data feature ON
training data size: n = 638 and p = 96
built in 5.8 secs on 8 cores, 50 trees, 250 burn-in and 1000 post. samples
sigsq est for y beforehand: 0.016
avg sigsq estimate after burn-in: 0.00314
in-sample statistics:
L1 = 9.91
L2 = 1.01
rmse = 0.04
Pseudo-Rsq = 0.9547
p-val for shapiro-wilk test of normality of residuals: 0
p-val for zero-mean noise: 0.99451
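The header above says "for regression", which suggests that y was passed as a numeric 0/1 vector. As far as I understand the bartMachine documentation, a classification model is only built when y is a factor with two levels. Below is a sketch of that check and recoding on a made-up response; `response_factor` is a hypothetical name, and the refit call is left commented out because it needs the bartMachine package and the real predictors:

```r
# Hypothetical 0/1 response standing in for response_var.
response_var <- c(0, 1, 1, 0, 1)

# A numeric response is what leads to a regression fit.
print(class(response_var))

# Recode as a two-level factor (hypothetical name: response_factor).
response_factor <- factor(response_var, levels = c(0, 1))
print(levels(response_factor))

# Assumption: refitting with the factor response yields a classifier,
# in which case bart_machine$confusion_matrix should no longer be NULL.
# bart_machine <- bartMachine::build_bart_machine(
#   predictors, response_factor,
#   use_missing_data = TRUE,
#   use_missing_data_dummies_as_covars = TRUE
# )
```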
Now my questions are: 1) Do I have too many NA values to fit a BART model? 2) Should my setup at least produce a confusion matrix? 3) Would the categorical version of the data be more helpful here, or are the disappointing results above due to something more fundamental?
Thank you