I have a dataframe where missingness in indicated by “Z” (there may also be some “z” and NA
entries present in the data), and values are entered as characters (“0”, “1”, etc). I need to create scores (“updrs1”, “updrs2″, updrs3”) that add up the non-missing values across columns selected by colname prefix (“NP1”, “NP2”, “NP3”).
dummy data:
dummy_df <- data.frame(
subject_id = seq(1,6,1),
OTHERV1 = c(1,1,0,0,1,1),
NP1VAR1 = c("Z","0","Z","Z","Z","Z"),
NP1VAR2 = c("Z","0","Z","Z","Z","Z"),
NP1VAR3 = c("Z","3","Z","Z","Z","Z"),
NP2VAR1 = c("Z","2","Z","Z","Z","Z"),
NP2VAR2 = c("Z","0","Z","Z","Z","Z"),
NP2VAR3 = c("Z","0","Z","Z","Z","Z"),
NP3VAR1 = c("Z","4","Z","Z","Z","Z"),
NP3VAR2 = c("Z","0","Z","Z","z","Z"),
NP3VAR3 = c("Z","0","Z","Z","Z",NA),
OTHERV2 = c(NA,NA,NA,NA,NA,NA)
)
desired output:
subject_id updrs1 updrs2 updrs3
1 1 Z Z Z
2 2 3 2 4
3 3 Z Z Z
4 4 Z Z Z
5 5 Z Z Z
6 6 Z Z Z
I’ve tried variations on this answer, with no luck.
I think I want to use a combination of tidyr::pivot_longer()
, dplyr::group_by()
, and sum()
.
jlbro is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.