Using na.locf to impute dataset with multiple timepoints in long format dataset

I have a dataset like this:

structure(list(study_id = structure(c("P005", "P005", "P005",
"P008", "P008", "P008", "P021", "P021", "P021", "P028", "P028",
"P028", "P032", "P032", "P032", "P036", "P036", "P036", "P037",
"P037", "P037", "P049", "P049", "P049", "P053", "P053", "P053",
"P069", "P069", "P069", "P079", "P079", "P079", "P089", "P089",
"P089", "P093", "P093", "P093", "P096", "P096", "P096", "P104",
"P104", "P104", "P105", "P105", "P105"), label = "ISMART Study ID", format.stata = "%9s"),
    phase = structure(c(1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L,
    2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L,
    2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L,
    2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L), levels = c("Baseline", "Midterm",
    "Final"), class = "factor"), selfeff1 = structure(c(3L, 3L,
    3L, 3L, 3L, 3L, 2L, 3L, 3L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
    3L, 3L, 3L, 3L, 3L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, NA, 3L,
    3L, 3L, 3L, 3L, 3L, NA, 3L, 3L, 3L, 3L, 3L, 3L, 3L, NA, 3L,
    2L), levels = c("Not confident", "Somewhat confident", "Very confident"
    ), class = "factor"), selfeff3 = structure(c(3L, 3L, 3L,
    3L, 3L, 3L, 3L, 3L, 3L, 2L, 3L, 3L, 2L, 3L, 2L, 3L, 3L, 3L,
    3L, 3L, NA, 3L, 2L, 3L, 2L, 3L, 3L, 3L, 3L, 3L, NA, 3L, 3L,
    3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 2L
    ), levels = c("Not confident", "Somewhat confident", "Very confident"
    ), class = "factor")), class = "data.frame", row.names = c(NA,
-48L))

This is a long-format (pivoted) dataset: each study_id has three rows, for the Baseline, Midterm and Final values. I want to impute the missing values by carrying values forward or back. But since these are repeated measures, I also want to apply rules like:

If Baseline is missing but Midterm is present: carry back (replace Baseline with Midterm);
If Midterm is missing but Final is present: carry back (replace Midterm with Final);
If Final is missing but Midterm is present: carry forward (replace Final with Midterm);
If both Baseline and Final are missing: carry Midterm forward and back (replace both with Midterm).

I tried to write a function for this, since my real dataset has selfeff1 through selfeff13. The code is:

impute_values <- function(x, phase) {
  # Carryback: Replace baseline with midterm if baseline is missing but midterm is available
  if (phase == "Baseline" & is.na(x) & phase == "Midterm" & !is.na(x)) {
    x <- na.locf(x)
  }
  # Carryback: Replace midterm with final if midterm is missing but final is available
  # Carryforward: Replace final with midterm if final is missing but midterm is available
  else if (phase == "Midterm" & is.na(x) & phase == "Final" & !is.na(x[3])) {
    x <- na.locf(x)
  } else if (phase == "Midterm" & !is.na(x) & phase == "Final" & is.na(x[3])) {
    x <- na.locf(x, option="nocb")
  }
  # For the case where both baseline and final are missing but midterm is available, 
  # we can simply carry forward the missing values from midterm
  else if (phase == "Baseline" & is.na(x) & phase == "Final" & is.na(x) & 
           phase == "Midterm" & !is.na(x)) {
    x <- na.locf(x)
  }
  return(x)
}

But when I test this function on one variable, say selfeff1, with this code:

df2 <- df %>%
  mutate(selfeff1=impute_values(selfeff1, phase))

summary(is.na(df2$selfeff1))

I get this error:

Error in if (...) : the condition has length > 1

Could someone show me how to fix it and make it work for my case?
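For reference, a minimal sketch of why the error occurs, using made-up stand-in vectors (phase, x) rather than the real data. Inside mutate(), phase and selfeff1 are passed as whole columns, so each comparison returns one TRUE/FALSE per row, while if () expects exactly one. Conditions such as phase == "Baseline" & phase == "Midterm" also compare the same element against two different labels, so they can never be TRUE. (Separately, zoo's na.locf() carries values backward with fromLast = TRUE; option = "nocb" is an argument of imputeTS::na_locf(), not of zoo.)

```r
# Made-up stand-ins for two subjects' phase and selfeff1 columns
phase <- factor(rep(c("Baseline", "Midterm", "Final"), times = 2),
                levels = c("Baseline", "Midterm", "Final"))
x <- c(NA, 3, 3, 3, 3, NA)

# One TRUE/FALSE per row, not a single value
cond <- phase == "Baseline" & is.na(x)
cond  # TRUE FALSE FALSE FALSE FALSE FALSE

# The same element cannot equal both labels, so this is FALSE for every row
any(phase == "Baseline" & phase == "Midterm")  # FALSE

# if () needs a length-one condition; since R 4.2.0 a longer one is an error
msg <- tryCatch(if (cond) "ran", error = function(e) conditionMessage(e))
msg
```

So the conditions have to be evaluated per subject on single values (or written in a vectorised way), which is what the answers below do.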

You could use a function, prepl, that pastes the is.na() pattern of each subject's three values into a binary string, e.g. "001" for study_id P037 in selfeff3 (only Final is missing). Each case can then be matched with grepl() and given its own replacement rule. Apply it to each selfeff* column with by() (think of it as split() plus lapply()), then unsplit() the results back into the original row order. This makes it clear at a glance what happens in each case, and it is easy to extend.
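The pattern strings that prepl relies on can be checked on their own. A small sketch; na_pattern() is a hypothetical helper name, and the inputs are made up:

```r
# Hypothetical helper: one subject's Baseline/Midterm/Final values -> "0"/"1" string
na_pattern <- function(x) paste(+is.na(x), collapse = "")

p1 <- na_pattern(c("Very confident", "Very confident", NA))
p1  # "001": only Final is missing

p2 <- na_pattern(c(NA, "Very confident", NA))
p2  # "101": Baseline and Final are missing

# In a regular expression "." matches any single character,
# so one pattern can cover several missingness cases
hits <- grepl("10.", c("100", "101", "010"))
hits  # TRUE TRUE FALSE
```

Because "101" also matches "10.", the order of the checks matters: the "1.1" case has to be tested before "10." for the both-missing rule to apply.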

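> prepl <- \(x) {
+   p <- paste(+is.na(x), collapse='')
+   if (grepl('1.1', p)) {         # Baseline and Final missing: use Midterm for both
+     x[c(1, 3)] <- x[2]
+     x
+   } else if (grepl('10.', p)) {  # Baseline missing: carry Midterm back
+     x[1] <- x[2]
+     x
+   } else if (grepl('.10', p)) {  # Midterm missing: carry Final back
+     x[2] <- x[3]
+     x
+   } else if (grepl('.01', p)) {  # Final missing: carry Midterm forward
+     x[3] <- x[2]
+     x
+   } else {
+     x
+   }
+ }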

> icl <- grep('^selfeff\\d+$', names(df))
> df[icl] <- lapply(df[icl], \(x) by(x, df$study_id, prepl) |> unsplit(df$study_id))
> df
   study_id    phase           selfeff1           selfeff3
1      P005 Baseline     Very confident     Very confident
2      P005  Midterm     Very confident     Very confident
3      P005    Final     Very confident     Very confident
4      P008 Baseline     Very confident     Very confident
5      P008  Midterm     Very confident     Very confident
6      P008    Final     Very confident     Very confident
7      P021 Baseline Somewhat confident     Very confident
8      P021  Midterm     Very confident     Very confident
9      P021    Final     Very confident     Very confident
10     P028 Baseline Somewhat confident Somewhat confident
11     P028  Midterm     Very confident     Very confident
12     P028    Final     Very confident     Very confident
13     P032 Baseline     Very confident Somewhat confident
14     P032  Midterm     Very confident     Very confident
15     P032    Final     Very confident Somewhat confident
16     P036 Baseline     Very confident     Very confident
17     P036  Midterm     Very confident     Very confident
18     P036    Final     Very confident     Very confident
19     P037 Baseline     Very confident     Very confident
20     P037  Midterm     Very confident     Very confident
21     P037    Final     Very confident     Very confident
22     P049 Baseline     Very confident     Very confident
23     P049  Midterm Somewhat confident Somewhat confident
24     P049    Final     Very confident     Very confident
25     P053 Baseline     Very confident Somewhat confident
26     P053  Midterm     Very confident     Very confident
27     P053    Final     Very confident     Very confident
28     P069 Baseline     Very confident     Very confident
29     P069  Midterm     Very confident     Very confident
30     P069    Final     Very confident     Very confident
31     P079 Baseline     Very confident     Very confident
32     P079  Midterm     Very confident     Very confident
33     P079    Final     Very confident     Very confident
34     P089 Baseline     Very confident     Very confident
35     P089  Midterm     Very confident     Very confident
36     P089    Final     Very confident     Very confident
37     P093 Baseline     Very confident     Very confident
38     P093  Midterm     Very confident     Very confident
39     P093    Final     Very confident     Very confident
40     P096 Baseline     Very confident     Very confident
41     P096  Midterm     Very confident     Very confident
42     P096    Final     Very confident     Very confident
43     P104 Baseline     Very confident     Very confident
44     P104  Midterm     Very confident     Very confident
45     P104    Final     Very confident     Very confident
46     P105 Baseline     Very confident     Very confident
47     P105  Midterm     Very confident     Very confident
48     P105    Final Somewhat confident Somewhat confident
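If you prefer to skip the by()/unsplit() round trip, base R's ave() applies a function within groups and writes the results back in the original row order. A sketch under the same assumptions as prepl (exactly three rows per study_id, already sorted Baseline, Midterm, Final); fill_rules() is a hypothetical rule function and toy is made-up data:

```r
# Hypothetical rule function; x is one subject's Baseline, Midterm, Final values
fill_rules <- function(x) {
  p <- paste(+is.na(x), collapse = "")
  if (grepl("1.1", p)) {
    x[c(1, 3)] <- x[2]  # Baseline and Final missing: use Midterm for both
  } else if (grepl("10.", p)) {
    x[1] <- x[2]        # Baseline missing: carry Midterm back
  } else if (grepl(".10", p)) {
    x[2] <- x[3]        # Midterm missing: carry Final back
  } else if (grepl(".01", p)) {
    x[3] <- x[2]        # Final missing: carry Midterm forward
  }
  x
}

lv <- c("Not confident", "Somewhat confident", "Very confident")
toy <- data.frame(
  study_id = rep(c("A", "B", "C"), each = 3),
  phase    = factor(rep(c("Baseline", "Midterm", "Final"), times = 3),
                    levels = c("Baseline", "Midterm", "Final")),
  selfeff1 = factor(c(NA, "Very confident", NA,
                      "Somewhat confident", NA, "Very confident",
                      "Very confident", "Somewhat confident", NA),
                    levels = lv)
)

# ave() splits each column by study_id, applies fill_rules() to each piece,
# and puts the results back into the original row positions
cols <- grep("^selfeff\\d+$", names(toy))
toy[cols] <- lapply(toy[cols], \(x) ave(x, toy$study_id, FUN = fill_rules))
toy
```

The columns stay factors, and the same lapply() line would cover selfeff1 through selfeff13 without an unsplit() step.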

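There may be specific reasons to write your own function for your actual data, but for your example an approach based on vctrs::vec_fill_missing() may be more straightforward. Within each study_id, direction = "updown" first fills each missing value from the next available one (carry back) and then fills any remaining gaps from the previous one (carry forward), which covers the four rules in the question: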

library(dplyr)
library(vctrs)

df <- structure(list(study_id = structure(c("P005", "P005", "P005",
"P008", "P008", "P008", "P021", "P021", "P021", "P028", "P028",
"P028", "P032", "P032", "P032", "P036", "P036", "P036", "P037",
"P037", "P037", "P049", "P049", "P049", "P053", "P053", "P053",
"P069", "P069", "P069", "P079", "P079", "P079", "P089", "P089",
"P089", "P093", "P093", "P093", "P096", "P096", "P096", "P104",
"P104", "P104", "P105", "P105", "P105"), label = "ISMART Study ID", format.stata = "%9s"),
    phase = structure(c(1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L,
    2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L,
    2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L,
    2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L), levels = c("Baseline", "Midterm",
    "Final"), class = "factor"), selfeff1 = structure(c(3L, 3L,
    3L, 3L, 3L, 3L, 2L, 3L, 3L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
    3L, 3L, 3L, 3L, 3L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, NA, 3L,
    3L, 3L, 3L, 3L, 3L, NA, 3L, 3L, 3L, 3L, 3L, 3L, NA, 3L,
    2L), levels = c("Not confident", "Somewhat confident", "Very confident"
    ), class = "factor"), selfeff3 = structure(c(3L, 3L, 3L,
    3L, 3L, 3L, 3L, 3L, 3L, 2L, 3L, 3L, 2L, 3L, 2L, 3L, 3L, 3L,
    3L, 3L, NA, 3L, 2L, 3L, 2L, 3L, 3L, 3L, 3L, 3L, NA, 3L, 3L,
    3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 2L
    ), levels = c("Not confident", "Somewhat confident", "Very confident"
    ), class = "factor")), class = "data.frame", row.names = c(NA,
-48L))

df2 <- df %>%
  mutate(selfeff1 = vec_fill_missing(selfeff1, direction = "updown"), .by = study_id)

df2
#>    study_id    phase           selfeff1           selfeff3
#> 1      P005 Baseline     Very confident     Very confident
#> 2      P005  Midterm     Very confident     Very confident
#> 3      P005    Final     Very confident     Very confident
#> 4      P008 Baseline     Very confident     Very confident
#> 5      P008  Midterm     Very confident     Very confident
#> 6      P008    Final     Very confident     Very confident
#> 7      P021 Baseline Somewhat confident     Very confident
#> 8      P021  Midterm     Very confident     Very confident
#> 9      P021    Final     Very confident     Very confident
#> 10     P028 Baseline Somewhat confident Somewhat confident
#> 11     P028  Midterm     Very confident     Very confident
#> 12     P028    Final     Very confident     Very confident
#> 13     P032 Baseline     Very confident Somewhat confident
#> 14     P032  Midterm     Very confident     Very confident
#> 15     P032    Final     Very confident Somewhat confident
#> 16     P036 Baseline     Very confident     Very confident
#> 17     P036  Midterm     Very confident     Very confident
#> 18     P036    Final     Very confident     Very confident
#> 19     P037 Baseline     Very confident     Very confident
#> 20     P037  Midterm     Very confident     Very confident
#> 21     P037    Final     Very confident               <NA>
#> 22     P049 Baseline     Very confident     Very confident
#> 23     P049  Midterm Somewhat confident Somewhat confident
#> 24     P049    Final     Very confident     Very confident
#> 25     P053 Baseline     Very confident Somewhat confident
#> 26     P053  Midterm     Very confident     Very confident
#> 27     P053    Final     Very confident     Very confident
#> 28     P069 Baseline     Very confident     Very confident
#> 29     P069  Midterm     Very confident     Very confident
#> 30     P069    Final     Very confident     Very confident
#> 31     P079 Baseline     Very confident               <NA>
#> 32     P079  Midterm     Very confident     Very confident
#> 33     P079    Final     Very confident     Very confident
#> 34     P089 Baseline     Very confident     Very confident
#> 35     P089  Midterm     Very confident     Very confident
#> 36     P089    Final     Very confident     Very confident
#> 37     P093 Baseline     Very confident     Very confident
#> 38     P093  Midterm     Very confident     Very confident
#> 39     P093    Final     Very confident     Very confident
#> 40     P096 Baseline     Very confident     Very confident
#> 41     P096  Midterm     Very confident     Very confident
#> 42     P096    Final     Very confident     Very confident
#> 43     P104 Baseline     Very confident     Very confident
#> 44     P104  Midterm     Very confident     Very confident
#> 45     P104    Final     Very confident     Very confident
#> 46     P105 Baseline     Very confident     Very confident
#> 47     P105  Midterm     Very confident     Very confident
#> 48     P105    Final Somewhat confident Somewhat confident

Created on 2024-04-24 with reprex v2.1.0
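The reprex above fills only selfeff1, which is why selfeff3 still shows <NA> in rows 21 and 31. A sketch extending it to every selfeff column with across(), assuming dplyr >= 1.1.0 (for .by) and R >= 4.1 (for \(x)); toy is made-up data in which numeric codes stand in for the factor levels:

```r
library(dplyr)
library(vctrs)

# Made-up example: two subjects, rows already in Baseline/Midterm/Final order
toy <- data.frame(
  study_id = rep(c("A", "B"), each = 3),
  phase    = rep(c("Baseline", "Midterm", "Final"), times = 2),
  selfeff1 = c(NA, 3, 2, 3, NA, 1),
  selfeff2 = c(2, 2, NA, NA, 3, NA)
)

# Fill every selfeff* column within each study_id:
# "updown" carries later values back first, then earlier values forward
filled <- toy %>%
  mutate(
    across(starts_with("selfeff"), \(x) vec_fill_missing(x, direction = "updown")),
    .by = study_id
  )
filled
```

Note that "updown" also fills cases the four rules do not mention: for example, if Baseline and Midterm are both missing, both take the Final value.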

