I’ve reviewed many ‘do X using dplyr
based on condition’ SO questions, and I believe only this one comes close to what I’m doing. I’ve tried but cannot manage to adapt @divibisan’s answer there.
My data looks like:
df <- tibble(
Group_ID = c(1,2,3,4,5,6),
statusA = c("NEW", "NEW", "OLD","NEW", "OLD","OLD"),
statusB = c("BONUS", NA, "BONUS", NA, NA,"ROLLBACK"),
someVar = c("Thanks", "for", "your", "help","!","!"))
When statusB
is any non-NA value, I want to ‘split’ (as it were) that row such that for the Group_ID
there exists a distinct row for its statusA
and its statusB
. When rows are ‘split’, the row with a non-NA value for statusA
should show NA for statusB
, and the row with a non-NA value for statusB
should show NA for statusA
. All remaining variables (i.e., other than statusA
and statusB
) should appear, per each Group_ID
, on each line.
desired_df <- tibble(
Group_ID = c(1,1,2,3,3,4,5,6,6),
statusA = c("NEW",NA,"NEW","OLD",NA,"NEW","OLD","OLD",NA),
statusB = c(NA,"BONUS",NA,NA,"BONUS",NA,NA,NA,"ROLLBACK"),
someVar = c("Thanks","Thanks", "for", "your","your","help","!","!","!"))
I strongly wish to avoid using a loop. Can this be accomplished using dplyr?
You can try:
library(dplyr)
library(tidyr)
df |>
reframe(statusA = c(statusA, NA[!is.na(statusB)]),
statusB = c(NA[!is.na(statusB)], statusB), .by = -c(statusA, statusB)) |>
relocate(c(statusA, statusB), .after = Group_ID)
# A tibble: 9 × 4
Group_ID statusA statusB someVar
<dbl> <chr> <chr> <chr>
1 1 NEW NA Thanks
2 1 NA BONUS Thanks
3 2 NEW NA for
4 3 OLD NA your
5 3 NA BONUS your
6 4 NEW NA help
7 5 OLD NA !
8 6 OLD NA !
9 6 NA ROLLBACK !
1
Bind the rows together while blanking each of the other values to NA
:
df %>% mutate(statusB = NA) %>%
bind_rows(df %>% filter(!is.na(statusB)) %>% mutate(statusA = NA)) %>%
arrange(Group_ID)
## Group_ID statusA statusB someVar
##1 1 NEW <NA> Thanks
##2 1 <NA> BONUS Thanks
##3 2 NEW <NA> for
##4 3 OLD <NA> your
##5 3 <NA> BONUS your
##6 4 NEW <NA> help
##7 5 OLD <NA> !
##8 6 OLD <NA> !
##9 6 <NA> ROLLBACK !