I have a dataset with time-ordered variables in which I distinguish two kinds of missingness: monotone missing values, which form a continuous run of missing values that includes the final variable, and intermittent missing values, which are separated from the final variable by at least one non-missing value.
I would like to use multiple imputation to get several versions of a dataset where the intermittent missing values are imputed, but the monotone missing values remain missing (to be imputed later with a slightly different strategy).
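To make the distinction concrete, here is a minimal sketch in base R of how the intermittent cells could be flagged automatically. It assumes the columns are already in time order; `intermittent_mask` and `toy` are hypothetical names used only for illustration and are not part of mice.

```r
# Toy data with the same values as the example in this question.
toy <- data.frame(
  v1 = c(5, 6, 2, 4, 7),
  v2 = c(NA, 6, NA, 8, 4),
  v3 = c(3, NA, 2, 7, 2),
  v4 = c(9, 1, NA, NA, 5),
  v5 = c(NA, 1, NA, NA, 4)
)

# Hypothetical helper: TRUE where a value is missing but at least one
# later value in the same row is observed (intermittent missing).
# Assumes the columns of `data` are ordered in time.
intermittent_mask <- function(data) {
  miss <- is.na(as.matrix(data))
  # Column index of the last observed value in each row (0 if none).
  last_obs <- apply(!miss, 1, function(r) if (any(r)) max(which(r)) else 0L)
  # A missing cell is intermittent if it lies before the last observed
  # value; last_obs is recycled down the rows of the column-index matrix.
  miss & (col(miss) < last_obs)
}

toy_mask <- intermittent_mask(toy)
toy_mask
```

Missing cells that are not flagged by this helper are the monotone ones, i.e. those from the trailing run of NAs at the end of a row.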
Below is a data frame with several missing values, together with a mask that marks the intermittent values I would like to impute. When I run this, only the second row, which has an intermittent missing value and no monotone missing values, gets imputed.
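library(dplyr)  # re-exports tribble() from tibble
library(mice)

set.seed(2024)

# Example data: columns v1..v5 are in time order
df <- tribble(
  ~v1, ~v2, ~v3, ~v4, ~v5,
    5,  NA,   3,   9,  NA,
    6,   6,  NA,   1,   1,
    2,  NA,   2,  NA,  NA,
    4,   8,   7,  NA,  NA,
    7,   4,   2,   5,   4
)

# TRUE marks the intermittent missing values that should be imputed
mask <- tribble(
    ~v1,   ~v2,   ~v3,   ~v4,   ~v5,
  FALSE,  TRUE, FALSE, FALSE, FALSE,
  FALSE, FALSE,  TRUE, FALSE, FALSE,
  FALSE,  TRUE, FALSE, FALSE, FALSE,
  FALSE, FALSE, FALSE, FALSE, FALSE,
  FALSE, FALSE, FALSE, FALSE, FALSE
)

# Impute only the cells flagged in the mask
mi <- mice(data = df,
           where = mask)

# Stack the imputed datasets in long format
complete(mi, action = "long")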
What I get is
   .imp .id v1 v2 v3 v4 v5
1     1   1  5 NA  3  9 NA
2     1   2  6  6  3  1  1
3     1   3  2 NA  2 NA NA
4     1   4  4  8  7 NA NA
5     1   5  7  4  2  5  4
6     2   1  5 NA  3  9 NA
7     2   2  6  6  3  1  1
8     2   3  2 NA  2 NA NA
9     2   4  4  8  7 NA NA
10    2   5  7  4  2  5  4
But I want the intermittent values to be imputed while the monotone values stay missing, so something like
   .imp .id v1 v2 v3 v4 v5
1     1   1  5  2  3  9 NA
2     1   2  6  6  3  1  1
3     1   3  2  4  2 NA NA
4     1   4  4  8  7 NA NA
5     1   5  7  4  2  5  4
6     2   1  5  5  3  9 NA
7     2   2  6  6  3  1  1
8     2   3  2  7  2 NA NA
9     2   4  4  8  7 NA NA
10    2   5  7  4  2  5  4