I suspect that there are parallels with other questions but I haven’t been able to find a combination which works in this situation.
In essence I am trying to use a for loop to do multiple imputation (I am aware that there are packages which can do the whole analysis but I need to extract interim stages).
I am starting with a data set like this, where y is the outcome, x has the missing values, and the imputed values for those missing entries are in a1 to a3:
dat <- data.frame(y  = c(1, 0, 0, 1, 0),
                  x  = c(1, NA, 2, NA, 1),
                  a1 = c(NA, 1, NA, 2, NA),
                  a2 = c(NA, 1, NA, 1, NA),
                  a3 = c(NA, 2, NA, 1, NA))
I thought that I could loop over a1 to a3, coalescing each in turn into the x variable and then running the analysis, but this doesn’t seem to work. I’ve tried:
for (i in 1:3) {
  dat$x <- coalesce(dat$x,dat$a[i])
  z <- glm(y ~ x, data=dat, family="binomial")
  res <- summary(z)$coefficient
  print(res)
}
But the first line, dat$x <- coalesce(dat$x,dat$a[i]), clearly isn’t doing what I think it should: all three “res” outputs are the same, and if I print dat$x after this line it hasn’t been updated with the values from a1-3.
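A minimal sketch of what may be going on here, and one possible way around it. As far as I can tell, dat$a[i] does not build the name "a1": it looks up a column called "a" (which does not exist, so the result is NULL) and then indexes that. Separately, overwriting dat$x inside the loop would fill the NAs on the first pass and leave nothing to coalesce on later passes. The sketch below assumes dplyr is loaded for coalesce(); the names col, imp and results are hypothetical and only there for illustration.

```r
library(dplyr)  # provides coalesce()

dat <- data.frame(y  = c(1, 0, 0, 1, 0),
                  x  = c(1, NA, 2, NA, 1),
                  a1 = c(NA, 1, NA, 2, NA),
                  a2 = c(NA, 1, NA, 1, NA),
                  a3 = c(NA, 2, NA, 1, NA))

results <- list()  # hypothetical container, one coefficient table per imputation

for (i in 1:3) {
  col <- paste0("a", i)                  # build the column name: "a1", "a2", "a3"
  imp <- dat                             # work on a copy so dat$x keeps its NAs
  imp$x <- coalesce(dat$x, dat[[col]])   # [[ ]] looks the column up by its string name
  z <- glm(y ~ x, data = imp, family = "binomial")
  results[[col]] <- summary(z)$coefficients
  print(results[[col]])
}
```

Keeping the imputed copy in imp (rather than writing back to dat) means each pass starts from the original x, and imp itself is available as an interim stage. With a toy data set this small, glm() may emit warnings about the fit.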
I’d thought that using mutate from dplyr might work as in:
for (i in 1:3) {
  dat %>%
    mutate(x = coalesce(x,a[i]))
  z <- glm(y ~ x, data=dat, family="binomial")
  res <- summary(z)$coefficient
  print(res)
}
But that gives the error “object ‘a’ not found”. I’ve attempted writing a function and then using apply instead of trying to loop but didn’t get any further with that approach. Based on similar questions, I guess this is all to do with how the loop is using the column names but I’ve not managed to implement any of the solutions successfully.
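For the mutate() and apply-style route, a hedged sketch under two assumptions: that the error arises because mutate() looks for a column literally named a, and that the result of the pipe has to be assigned to something before glm() sees it. Here the column is referred to by its string name through dplyr's .data pronoun, and lapply() collects one coefficient table per imputed column. The names imp_cols and fits are hypothetical.

```r
library(dplyr)

dat <- data.frame(y  = c(1, 0, 0, 1, 0),
                  x  = c(1, NA, 2, NA, 1),
                  a1 = c(NA, 1, NA, 2, NA),
                  a2 = c(NA, 1, NA, 1, NA),
                  a3 = c(NA, 2, NA, 1, NA))

imp_cols <- paste0("a", 1:3)  # hypothetical: names of the imputation columns

fits <- lapply(imp_cols, function(col) {
  # .data[[col]] picks the column whose name is stored in the string col
  imp <- dat %>%
    mutate(x = coalesce(x, .data[[col]]))
  z <- glm(y ~ x, data = imp, family = "binomial")
  summary(z)$coefficients   # returned value: the coefficient table for this imputation
})
names(fits) <- imp_cols

print(fits)
```

Because the function returns a value instead of modifying dat, the original data frame stays untouched, and the function body could return imp or z instead if those interim stages are what is needed.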
Any hints gratefully received.