I have agricultural practice adoption data for a set of plots, along with demographic and other survey variables for each farmer.
One point to note: a few farmers have multiple plots listed, so for those farmers the practice information varies across plots while the demographic data is repeated.
I am trying two main types of analysis. The first is a random forest approach: I ran a classification random forest for each of the agricultural practices and looked at the permutation importances to see which explanatory variables were the most important. My OOB errors are not great: 36%, 14% and 21%.
The second approach is one I found in the literature: a multivariate probit model as in Kassie et al. (2009): https://onlinelibrary.wiley.com/doi/epdf/10.1111/j.1477-8947.2009.01224.x (sorry, I think it is not open access!).
I removed any highly correlated variables, made all my categorical variables into dummy variables and used the mvProbit package to run the analysis.
An example:
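library(tibble)    # tibble()
library(mvProbit)  # mvProbit()

# Toy data: three binary practice outcomes plus farmer-level regressors
df <- tibble(AgPrac = c(1, 0, 1, 0, 0, 0, 0, 1),
             AgPrac2 = c(1, 0, 0, 0, 1, 1, 1, 1),
             AgPrac3 = c(1, 1, 0, 0, 1, 1, 0, 0),
             Farmer_Woman = c(0, 0, 0, 0, 1, 0, 0, 1),
             District2 = c(0, 0, 0, 1, 1, 0, 0, 1),
             Farmer_age = c(41, 31, 61, 39, 50, 54, 60, 55),
             Farmer_educ_primary = c(0, 1, 1, 1, 0, 0, 0, 1),
             Farmer_educ_secondary = c(1, 0, 0, 0, 0, 0, 0, 0))

# Multivariate probit: all three practices modelled jointly
prob_mod <- mvProbit(cbind(AgPrac, AgPrac2, AgPrac3) ~ Farmer_Woman + District2 +
                       Farmer_age + Farmer_educ_primary + Farmer_educ_secondary,
                     data = df)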
I have about 9 such regular (non-dummy) and dummy variables and would like to add more, and I have around 1100 plot observations.
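For reference, one way to check the regressors for exact or near-collinearity is to compare the rank of the design matrix with its number of columns and to look at its condition number. The sketch below is only an illustration in base R on the toy regressors from the example above, not on my real data; `Farmer_educ_none` is a hypothetical extra dummy added purely to show what a redundant category looks like.

```r
# Toy regressors copied from the example above (not the real survey data).
X_df <- data.frame(
  Farmer_Woman          = c(0, 0, 0, 0, 1, 0, 0, 1),
  District2             = c(0, 0, 0, 1, 1, 0, 0, 1),
  Farmer_age            = c(41, 31, 61, 39, 50, 54, 60, 55),
  Farmer_educ_primary   = c(0, 1, 1, 1, 0, 0, 0, 1),
  Farmer_educ_secondary = c(1, 0, 0, 0, 0, 0, 0, 0)
)

# Design matrix with intercept, as a model formula would build it.
X <- model.matrix(~ ., data = X_df)

# Full column rank means no exact linear dependence among the columns.
full_rank <- qr(X)$rank == ncol(X)

# Condition number of the column-scaled matrix; large values hint at
# near-collinearity even when the rank is full.
cond_num <- kappa(scale(X, center = FALSE), exact = TRUE)

# Largest absolute pairwise correlation among the regressors.
cors <- cor(X_df)
max_abs_cor <- max(abs(cors[upper.tri(cors)]))

# Hypothetical redundant dummy: with an intercept, the three education
# dummies sum to 1 in every row, so the matrix loses a rank.
X_trap <- cbind(
  X,
  Farmer_educ_none = 1 - X_df$Farmer_educ_primary - X_df$Farmer_educ_secondary
)
trap_rank_deficient <- qr(X_trap)$rank < ncol(X_trap)
```

On the toy regressors `full_rank` is `TRUE`, while the version with the extra dummy is rank deficient. I can report the same quantities for the real design matrix if that helps.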
As seen in this Posit forum question, I get many (50+) warnings that the correlation matrix is not positive definite. I tried removing variables from the model but it did not help. I do not think I have a dummy variable trap. Could anyone explain why I am getting these warnings and how to address them? Should I just stick to running 3 univariate probit models? Does this have something to do with the farmer demographics being repeated for some plots? If so, should I select the plot with maximum adoption for each farmer and keep only unique farmer IDs?
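To make the fallback concrete, here is a minimal sketch of what I mean by keeping one plot per farmer and then running separate univariate probit models. It uses simulated data with a hypothetical `farmer_id` column and made-up coefficients, and base R `glm()` with a probit link rather than mvProbit, so it ignores any correlation between the practices.

```r
set.seed(1)

# Simulated stand-in for the plot-level data (not the real survey):
# 300 farmers, 60 of whom contribute a second plot.
n_farmers <- 300
farmers <- data.frame(
  farmer_id    = seq_len(n_farmers),   # hypothetical ID column
  Farmer_Woman = rbinom(n_farmers, 1, 0.3),
  Farmer_age   = round(runif(n_farmers, 25, 70))
)
plot_ids <- c(farmers$farmer_id, sample(farmers$farmer_id, 60))
plots <- farmers[match(plot_ids, farmers$farmer_id), ]

# Latent-index probit outcomes for three practices (coefficients made up).
latent <- function(b0, b1, b2) {
  b0 + b1 * plots$Farmer_Woman + b2 * (plots$Farmer_age - 45) / 10 +
    rnorm(nrow(plots))
}
plots$AgPrac  <- as.integer(latent(-0.3,  0.5,  0.2) > 0)
plots$AgPrac2 <- as.integer(latent( 0.2, -0.4,  0.1) > 0)
plots$AgPrac3 <- as.integer(latent( 0.0,  0.3, -0.3) > 0)

# Keep one plot per farmer: the plot with the most practices adopted.
plots$n_adopted <- plots$AgPrac + plots$AgPrac2 + plots$AgPrac3
ord <- order(plots$farmer_id, -plots$n_adopted)
one_per_farmer <- plots[ord, ][!duplicated(plots$farmer_id[ord]), ]

# Three separate univariate probit models on the deduplicated data.
outcomes <- c("AgPrac", "AgPrac2", "AgPrac3")
fits <- lapply(outcomes, function(y) {
  glm(reformulate(c("Farmer_Woman", "Farmer_age"), response = y),
      family = binomial(link = "probit"), data = one_per_farmer)
})
names(fits) <- outcomes

# Coefficients side by side: rows are terms, columns are practices.
coef_table <- sapply(fits, coef)
```

I realise that picking the maximum-adoption plot shifts the outcome distribution towards adoption, so this is only one possible selection rule, and I am not sure it is a defensible one.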
I am not supposed to share the actual data but can provide any other information…