I want to create a group variable based on the values of GT. If GT is ‘I’, each such row should get its own group number. If GT is ‘G’, the following rules should apply:
- For the first four consecutive ‘G’s, assign the same group number, and for the next four consecutive ‘G’s, assign the next group number.
- If there are five consecutive ‘G’s, assign the same group number to the first three ‘G’s and a different group number to the remaining two ‘G’s.
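To make the two rules concrete, here is a minimal sketch of how they could be read for a single run of n consecutive ‘G’s. The helper name chunk_run is hypothetical, and its behaviour for run lengths not covered by the rules (anything other than blocks of four and a leftover of five) is an assumption, not a requirement:

```r
# Hypothetical helper: chunk index within one run of n consecutive "G"s.
# Assumption: take blocks of 4, except that when exactly 5 rows remain
# they are split as 3 + 2 instead of 4 + 1.
chunk_run <- function(n) {
  sizes <- integer(0)
  while (n > 0) {
    take <- if (n == 5) 3L else min(4L, n)
    sizes <- c(sizes, take)
    n <- n - take
  }
  # Repeat each chunk index as many times as its block size
  rep(seq_along(sizes), times = sizes)
}

chunk_run(9)
#> [1] 1 1 1 1 2 2 2 3 3
chunk_run(5)
#> [1] 1 1 1 2 2
```

With this reading, a run of nine ‘G’s is split as 4 + 3 + 2, which is the pattern wanted for the first nine rows of the sample data.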
My Attempt:
library(tidyverse)
df1 <-
  data.frame(
    GT = c(rep("G", 9), rep("I", 2), rep("G", 2), rep("I", 1))
  )
df2 <-
  df1 %>%
  mutate(
    grp1 = case_when(
      GT == "I" ~ row_number()
      , .default = consecutive_id(GT)
    )
  ) %>%
  group_by(GT, grp1) %>%
  mutate(count = n()) %>%
  ungroup() %>%
  mutate(
    grp2 = case_when(
      GT == "G" & count %/% 4 > 0 ~ row_number() %/% 5 + row_number(1)
      , .default = grp1
    )
    , grp3 = case_when(
      GT == "G" ~ (cumsum(GT == "G") - 1) %/% 4 + 1
      , .default = grp1
    )
  )
df2
#> # A tibble: 14 × 5
#>    GT     grp1 count  grp2  grp3
#>    <chr> <int> <int> <dbl> <dbl>
#>  1 G         1     9     1     1
#>  2 G         1     9     1     1
#>  3 G         1     9     1     1
#>  4 G         1     9     1     1
#>  5 G         1     9     2     2
#>  6 G         1     9     2     2
#>  7 G         1     9     2     2
#>  8 G         1     9     2     2
#>  9 G         1     9     2     3
#> 10 I        10     1    10    10
#> 11 I        11     1    11    11
#> 12 G         3     2     3     3
#> 13 G         3     2     3     3
#> 14 I        14     1    14    14
The group variable grp2 is very close to the requirements. However, I want df2[8:9, 4] to have a different group number than df2[5:7, 4]. Any hints, please?
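One direction that might work is to compute the position of each row within its run of identical GT values and derive the chunk from that position. This is only a sketch under assumptions: the column names run, pos, len, chunk and grp are made up for illustration, the final numbering is sequential (so the ‘I’ rows get 4, 5 and 7 rather than 10, 11 and 14), and the 3 + 2 split is applied whenever a run of ‘G’s would otherwise end in a single leftover row:

```r
library(dplyr)  # consecutive_id() and .default need dplyr >= 1.1.0

df1 <- data.frame(
  GT = c(rep("G", 9), rep("I", 2), rep("G", 2), rep("I", 1))
)

df3 <- df1 %>%
  mutate(run = consecutive_id(GT)) %>%   # id of each run of identical GT
  group_by(run) %>%
  mutate(
    pos = row_number(),                  # position within the run
    len = n(),                           # length of the run
    chunk = case_when(
      GT == "I" ~ pos,                   # every "I" row is its own chunk
      # Blocks of 4; if the run would end in a lone leftover row, move the
      # second-to-last row into the last chunk, turning 4 + 1 into 3 + 2.
      .default = (pos - 1L) %/% 4L + 1L +
        as.integer(len %% 4L == 1L & len > 1L & pos == len - 1L)
    )
  ) %>%
  ungroup() %>%
  mutate(grp = consecutive_id(run, chunk))  # sequential id over run + chunk

df3$grp
#> [1] 1 1 1 1 2 2 2 3 3 4 5 6 6 7
```

Here rows 5 to 7 and rows 8 to 9 end up in different groups, as required, but the group numbers themselves differ from those in grp2.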