I have an input dataset that contains partially redundant information (i.e. `info`) per group (i.e. `ID`) spread across several rows. Some rows within a group are more complete than others, and some groups never have the full information. I would like to retain the most complete line per group and information type and discard the others. I would also like to do this in a single `dplyr` pipe without intermediate variables.
Input example:
data.frame(ID = c(1, 1, 2, 3, 3, 3, 3),
           info = c("info1#info1_detail", "info1",
                    "info1",
                    "info1#info1_detail", "info1",
                    "info2#info2_detail", "info2"))
#>   ID               info
#> 1  1 info1#info1_detail
#> 2  1              info1
#> 3  2              info1
#> 4  3 info1#info1_detail
#> 5  3              info1
#> 6  3 info2#info2_detail
#> 7  3              info2
Desired output example:
data.frame(ID = c(1, 2, 3, 3),
           info = c("info1#info1_detail",
                    "info1",
                    "info1#info1_detail",
                    "info2#info2_detail"))
#>   ID               info
#> 1  1 info1#info1_detail
#> 2  2              info1
#> 3  3 info1#info1_detail
#> 4  3 info2#info2_detail
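To make the requirement concrete, here is a minimal sketch of one possible single-pipe approach. It rests on two assumptions that the examples suggest but do not state: the information type is the text before the first `#`, and the "most complete" line is the longest string within each `ID` and type. The names `df`, `result`, and `info_type` are made up for illustration.

```r
library(dplyr)

# Input example from above
df <- data.frame(ID = c(1, 1, 2, 3, 3, 3, 3),
                 info = c("info1#info1_detail", "info1",
                          "info1",
                          "info1#info1_detail", "info1",
                          "info2#info2_detail", "info2"))

result <- df %>%
  # Assumption: the information type is everything before the first '#'
  mutate(info_type = sub("#.*$", "", info)) %>%
  group_by(ID, info_type) %>%
  # Assumption: "most complete" means the longest string in the group;
  # with_ties = FALSE keeps exactly one row per ID and type
  slice_max(nchar(info), n = 1, with_ties = FALSE) %>%
  ungroup() %>%
  select(ID, info)

result
```

On the input example this keeps one row per `ID` and information type, matching the desired output. If completeness should be measured differently (for example by counting `#`-separated fields), only the expression passed to `slice_max()` would need to change.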