I want to create a group variable whose value depends on the values of GT. If GT is ‘I’, the row should get its own distinct number (its row number). If GT is ‘G’, the following rules should apply:
- For the first four consecutive ‘G’s, assign the same number.
- If there are five consecutive ‘G’s, assign the same number to the first three ‘G’s and a different number to the remaining two ‘G’s.
My Attempt:
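So far I have the following, which groups the ‘G’s in fixed batches of four and so does not match the required output at row 8: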
library(tidyverse)
df1 <- data.frame(
  GT = c(rep("G", 9), rep("I", 2), rep("G", 2), rep("I", 1))
)
df1
#> GT
#> 1 G
#> 2 G
#> 3 G
#> 4 G
#> 5 G
#> 6 G
#> 7 G
#> 8 G
#> 9 G
#> 10 I
#> 11 I
#> 12 G
#> 13 G
#> 14 I
# Add the group variable based on the conditions
df1 <-
  df1 %>%
  mutate(
    group = case_when(
      GT == "I" ~ row_number(), # Each 'I' gets a unique number
      GT == "G" ~ (cumsum(GT == "G") - 1) %/% 4 + 1 # Group G's in batches of 4
    )
  )
df1
#> GT group
#> 1 G 1
#> 2 G 1
#> 3 G 1
#> 4 G 1
#> 5 G 2
#> 6 G 2
#> 7 G 2
#> 8 G 2
#> 9 G 3
#> 10 I 10
#> 11 I 11
#> 12 G 3
#> 13 G 3
#> 14 I 14
Required Output
GT group
1 G 1
2 G 1
3 G 1
4 G 1
5 G 2
6 G 2
7 G 2
8 G 3
9 G 3
10 I 10
11 I 11
12 G 3
13 G 3
14 I 14
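For reference, here is a minimal sketch of one reading of the rules that reproduces the required output above. It rests on an assumption that is not stated in the rules themselves: the ‘G’ rows are counted cumulatively over the whole column (an ‘I’ does not reset the count) and are batched in alternating sizes of 4 and 3, i.e. a repeating cycle of 7. The helper columns g_idx, cycle and pos are names made up for illustration.

```r
library(dplyr)

df1 <- data.frame(
  GT = c(rep("G", 9), rep("I", 2), rep("G", 2), rep("I", 1))
)

# ASSUMPTION: 'G' rows are counted cumulatively across the whole column
# and batched in alternating sizes of 4 and 3 (a repeating cycle of 7).
res <- df1 %>%
  mutate(
    g_idx = cumsum(GT == "G"),   # running count of 'G' rows seen so far
    cycle = (g_idx - 1) %/% 7,   # number of completed 7-row cycles
    pos   = (g_idx - 1) %% 7,    # 0-based position inside the current cycle
    group = if_else(
      GT == "I",
      row_number(),              # each 'I' keeps its row number
      # first 4 positions of a cycle get one number, the next 3 the following one
      as.integer(2 * cycle + if_else(pos < 4, 1, 2))
    )
  ) %>%
  select(GT, group)

res
```

With the sample data this gives 1, 1, 1, 1, 2, 2, 2, 3, 3 for rows 1 to 9, keeps 10, 11 and 14 for the ‘I’ rows, and assigns 3 to rows 12 and 13. Whether the 4/3 alternation is the intended general rule for longer runs of ‘G’ is an open question; the sketch only shows that this reading is consistent with the example.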