My data has a significant number of missing values, so I can't rely on the default na.omit() approach for downstream analysis, since it removes an entire row if even a single value is absent. My understanding is that the mice package is a robust way to perform multiple imputations of the data, but after I call the complete() function, my number of observations jumps from 58 to 290.
Here is my code where the jump seems to occur:
library(mice)

# Impute missing values using mice. mydata: 58 obs. of 30 variables
imputed_data <- mice(mydata, m = 5, method = "pmm", seed = 123)

# Pool the imputed datasets -- here is where my data frame changes
# considerably in structure: pooled_data becomes 290 obs. of 32 variables
pooled_data <- complete(imputed_data, "long", include = FALSE)
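For reference, here is how I've been inspecting the result. As far as I can tell from the documentation, the "long" format stacks all m = 5 completed datasets on top of each other (5 × 58 = 290 rows) and prepends .imp and .id columns (30 + 2 = 32 variables); the lines below are just my sanity check of that, using the same object names as above:

# Check the shape of the stacked "long" output
dim(pooled_data)         # 290 x 32: 5 imputations x 58 obs, plus .imp and .id
table(pooled_data$.imp)  # 58 rows for each imputation number 1..5

# Extracting a single completed dataset instead keeps the original shape
completed_1 <- complete(imputed_data, 1)
dim(completed_1)         # 58 x 30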
Why is this the case, and can I still perform downstream dimensionality reduction and statistical testing on this data frame and have the results be representative of the original dataset? That is my goal with the imputation. If there are superior methods for this kind of imputation, I'm also very interested to learn about them.
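From the mice vignettes, my understanding is that for statistical testing you are meant to fit the model within each imputed dataset and then pool the estimates, rather than analyzing the stacked data frame directly. Something like the sketch below, where y, x1, and x2 are placeholders for columns from my actual data:

# Fit the model separately within each of the m imputed datasets,
# then pool the results across imputations (Rubin's rules).
# y, x1, x2 are hypothetical placeholders for my real variables.
fit <- with(imputed_data, lm(y ~ x1 + x2))
pooled_fit <- pool(fit)
summary(pooled_fit)

I'm less sure how this pattern carries over to dimensionality reduction, which is part of what I'm asking. Thanks in advance!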