I’ve run into a problem trying to access other columns/variables when setting a post-processing rule using the mice package in R.
The simplified data I have are structured as:
participant_id
date
lab_measurement
covariate_1
covariate_2
Each participant has multiple measurements (but all on different dates). I’ve got the data in long format, so each participant has multiple rows, with each representing a different date. Naturally, some measurements are missing, and I’m using MICE to impute only the missing values in lab_measurement.
The difficulty is that, within each participant, the measurements are correlated with the values at the previous date. To account for this, I’ve created an additional column, “previous_lab”, that is based on the values in lab_measurement. I then specify the regression model used for imputation of lab_measurement as: lab_measurement ~ previous_lab + covariates. I would like previous_lab to update after each iteration in the imputation algorithm. The obvious issue is that the first lab_measurement cannot have a previous_lab, but it is safe to assume that everyone comes in with a pre-study measurement of 100.
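For reference, here is a minimal sketch of how such a previous_lab column can be built with base R's ave(), using the same lag function as in the post-processing rule. The toy data frame, its values, and the helper name lag_with_baseline are made up purely for illustration, and the sketch assumes the rows are already sorted by date within each participant.

```r
# Toy long-format data (hypothetical values), sorted by date within participant
toy <- data.frame(
  participant_id  = factor(c("a", "a", "a", "b", "b")),
  date            = as.Date(c("2024-01-01", "2024-02-01", "2024-03-01",
                              "2024-01-15", "2024-02-15")),
  lab_measurement = c(110, NA, 95, 102, 98)
)

# Shift each participant's measurements down by one row;
# the first row gets the assumed pre-study value of 100
lag_with_baseline <- function(x) c(100, x[-length(x)])

toy$previous_lab <- ave(toy$lab_measurement, toy$participant_id,
                        FUN = lag_with_baseline)
```

Note that a missing lab_measurement carries over as a missing previous_lab on the participant's next row, which is why previous_lab would need to be recomputed after each imputation step.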
To accomplish this, I tried the following:
imp <- mice(data, maxit = 0)
imp$post["previous_lab"] <- "imp[[j]][, 'previous_lab'] <- ave(imp[[j]][, 'lab_measurement'], imp[[j]][, 'participant_id'], FUN = function(x) {c(100, x[-length(x)])})"
predictor_matrix <- matrix(0, nrow=ncol(data), ncol=ncol(data))
rownames(predictor_matrix) <- colnames(data)
colnames(predictor_matrix) <- colnames(data)
predictor_matrix["lab_measurement", c("previous_lab", "covariate_1", "covariate_2")] <- 1
imp <- mice(data, m = 5, predictorMatrix = predictor_matrix, post = imp$post, maxit = 10, seed = 123)
Unfortunately, R spits out the following error:
Error in `[.data.frame`(imp[[j]], , "participant_id") :
  undefined columns selected
A couple of things that I’ve done to trouble-shoot:
I’ve asked the post-processing rule to print the column names, by doing the following:
imp$post["previous_lab"] <- "
print(colnames(imp[[j]]))
imp[[j]][, 'previous_lab'] <- ave(imp[[j]][, 'lab_measurement'], imp[[j]][, 'participant_id'], FUN = function(x) {c(100, x[-length(x)])})"
And then I get the following output:
iter imp variable
1 1 lab_measurement previous_lab[1] "1" "2" "3" "4" "5" "6" "7" "8" "9" "10"
Error in `[.data.frame`(imp[[j]], , "participant_id") :
  undefined columns selected
From this it seems that R has converted the column names to numbers, but then the numbers should go up to 5, not 10. When I change the number of imputations, the print-out of column names/numbers goes up to the number of imputations requested.
I’ve double-checked that participant_id has been converted from a String to a factor variable.
Last, I’ve had a look at the vignette written by Gerko Vink and Stef van Buuren (https://www.gerkovink.com/miceVignettes/Passive_Post_processing/Passive_imputation_post_processing.html). Here they applied post-processing by accessing a variable using the undefined index i:
post["tv"] <- "imp[[j]][, i] <- squeeze(imp[[j]][, i], c(1, 25))"
I’m not sure how exactly R knows that “i” needs to get to the “tv” variable.
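The printed column names are consistent with how the mice documentation describes the imp component of a mids object: for each variable, a data frame with one row per missing value and one column per imputation, with the columns named "1" to m. If imp[[j]] inside the post-processing rule has the same shape (an assumption, not verified against the mice source here), then i would be the imputation number rather than a variable, and selecting 'participant_id' fails simply because no such column exists there. The sketch below mocks that structure in base R with made-up numbers; mock_imp is a hypothetical stand-in, not an object that mice creates.

```r
# Mock of the assumed shape of imp[[j]]:
# rows = missing cells of the variable, columns = imputations
m    <- 5   # number of imputations
nmis <- 3   # number of missing values in the variable
mock_imp <- as.data.frame(matrix(c(98, 101, 97), nrow = nmis, ncol = m))
colnames(mock_imp) <- as.character(seq_len(m))

# Column names are the imputation numbers, not the variables of the data
col_names <- colnames(mock_imp)

# Selecting by imputation number works, as in the vignette's imp[[j]][, i]
i <- 2
second_imputation <- mock_imp[, i]

# Selecting a column of the original data raises the error from the question
err <- tryCatch(mock_imp[, "participant_id"],
                error = function(e) conditionMessage(e))
```

Under that assumption, a rule written against imp[[j]] can only see the imputed values of the variable currently being processed.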
It looks like post-processing can only be applied “within” a variable, and one cannot access other variables during post-processing. I would greatly appreciate it if anyone who has grappled with this knows a solution!