I want to perform multiple imputation with chained equations, using the package ‘mice’ in R. One of the variables that has missing values is categorical with three levels (say A, B, and C). The values that are missing from this variable can in reality only be A or B, but not C. How can I restrict the imputation such that level C is never imputed for this variable?
Toy example here:
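library(mice)
set.seed(12345)
N <- 50
# Missingness indicators: about 10% missing in x2, about 60% missing in x3
miss <- sample(0:1, N, prob = c(0.9, 0.1), replace = TRUE)
miss2 <- sample(0:1, N, prob = c(0.4, 0.6), replace = TRUE)
df <- data.frame(
  x1 = rnorm(N),
  x2 = factor(sample(LETTERS[1:3], N, replace = TRUE)),
  x3 = rnorm(N)
)
df$x2[miss == 1] <- NA
df$x3[miss2 == 1] <- NA
# Impute with the default methods and extract both completed data sets
imp <- mice(df, m = 2)
imp1 <- complete(imp, 1)
imp2 <- complete(imp, 2)
# Compare the original x2 with its two imputed versions, missing rows only
dfx2 <- data.frame(df$x2, imp1$x2, imp2$x2)
dfx2[is.na(dfx2$df.x2), ]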
   df.x2 imp1.x2 imp2.x2
10  <NA>       C       C
20  <NA>       A       C
23  <NA>       B       B
38  <NA>       C       C
43  <NA>       B       B
So mice imputes x2 as either A, B, or C, as it “should” – but I’d like to restrict the imputation to only either A or B.
I’ve looked at this: https://github.com/amices/mice/issues/224#issuecomment-693935305 – but there the restriction is conditional on another variable in the data set. In my case, the restriction applies for all missing values of x2.
This question, “Imputing a categorical variable with MICE but restricting the possible values”, is very similar to mine. The solution proposed there was to remove from the data frame all rows where the variable to be imputed has the category we don’t want to impute (i.e., in my toy example, all rows where x2 == "C"), then impute, and afterwards add those rows back to the data frame. This does not work in my case, however, because my data also contain other variables with missing values, which would then not be imputed in the removed rows (cf. variable x3 in my toy data set).
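One direction that might avoid dropping rows is a user-defined imputation method for x2 only. The sketch below is just that, a sketch under stated assumptions: the method name `ab_only` is made up for illustration, it relies on mice calling a function named `mice.impute.<method>` for custom methods, and it assumes that with only A and B allowed a logistic model (via the exported `mice.impute.logreg`) is an acceptable replacement for the default polytomous model. Rows with an observed C stay in the data, so x3 is still imputed for them, but they are left out when the x2 model is fitted.

```r
library(mice)

# Hypothetical custom method; "ab_only" is an illustrative name, not part of mice.
mice.impute.ab_only <- function(y, ry, x, wy = NULL, ...) {
  if (is.null(wy)) wy <- !ry
  allowed <- c("A", "B")                              # levels that may be imputed
  y_ab <- factor(as.character(y), levels = allowed)   # "C" becomes NA here
  ry_ab <- ry & !is.na(y_ab)                          # fit on observed A/B rows only
  draws <- mice.impute.logreg(y_ab, ry_ab, x, wy = wy, ...)
  # Return the draws with the original levels so they fit back into x2
  factor(as.character(draws), levels = levels(y))
}

# Same toy data as above
set.seed(12345)
N <- 50
miss <- sample(0:1, N, prob = c(0.9, 0.1), replace = TRUE)
miss2 <- sample(0:1, N, prob = c(0.4, 0.6), replace = TRUE)
df <- data.frame(
  x1 = rnorm(N),
  x2 = factor(sample(LETTERS[1:3], N, replace = TRUE)),
  x3 = rnorm(N)
)
df$x2[miss == 1] <- NA
df$x3[miss2 == 1] <- NA

# Start from the default methods and override only x2
meth <- make.method(df)
meth["x2"] <- "ab_only"
imp <- mice(df, m = 2, method = meth, printFlag = FALSE)

# Tabulate the imputed x2 values per imputation; the intent is that C never appears
lapply(imp$imp$x2, function(col) table(as.character(col)))
```

The initial fill-in that mice does before the first iteration is a random draw from the observed values and can contain C, but those starting values are overwritten once the chained equations run, so this would only matter with `maxit = 0`.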