I have a data frame with a variable that contains strings:
```r
df <- data.frame(ID = 1:5,
                 string = c("blah, F21, blah",
                            "woop, woop, F25",
                            "G1, yes, yes",
                            "hey, hey F23",
                            "how, G2, how"))
```
I also have a character vector of codes that I want to search for in my data frame:
```r
check <- c("F21", "F23", "G1")
```
I would like to check each value of `string` and determine whether it contains any of the codes in `check`. The output data frame should look like this:
| ID | string | test |
|---|---|---|
| 1 | blah, F21, blah | in check |
| 2 | woop, woop, F25 | not in check |
| 3 | G1, yes, yes | in check |
| 4 | hey, hey F23 | in check |
| 5 | how, G2, how | not in check |
A tidyverse solution would be very much appreciated.
The struggle bus is camped out in my driveway.
You could build a regex alternation from your `check` values, wrapped in `\b` word boundaries so that only whole codes match, and then use `grepl()` to test each string for it:
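```r
check <- c("F21", "F23", "G1")
# Builds the pattern \b(?:F21|F23|G1)\b; the backslashes must be doubled in an R string
regex <- paste0("\\b(?:", paste(check, collapse = "|"), ")\\b")
# perl = TRUE so that \b and the non-capturing group (?:...) are handled by PCRE
df$test <- ifelse(grepl(regex, df$string, perl = TRUE), "in check", "not in check")
df
```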
```
  ID          string         test
1  1 blah, F21, blah     in check
2  2 woop, woop, F25 not in check
3  3    G1, yes, yes     in check
4  4    hey, hey F23     in check
5  5    how, G2, how not in check
```
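Since the question asks for a tidyverse approach, here is a sketch of the same idea using `dplyr::mutate()`, `dplyr::if_else()` and `stringr::str_detect()`, assuming the dplyr and stringr packages are installed. It reuses the word-bounded pattern; `str_detect()` uses ICU regular expressions, which support `\b` and `(?:...)`, so no `perl = TRUE` flag is needed.

```r
library(dplyr)
library(stringr)

df <- data.frame(ID = 1:5,
                 string = c("blah, F21, blah",
                            "woop, woop, F25",
                            "G1, yes, yes",
                            "hey, hey F23",
                            "how, G2, how"),
                 stringsAsFactors = FALSE)
check <- c("F21", "F23", "G1")

# Same word-bounded alternation as above: \b(?:F21|F23|G1)\b
pattern <- paste0("\\b(?:", paste(check, collapse = "|"), ")\\b")

# str_detect() returns TRUE/FALSE per row; if_else() maps that to the labels
df <- df %>%
  mutate(test = if_else(str_detect(string, pattern),
                        "in check", "not in check"))
df
```

`if_else()` is used instead of base `ifelse()` because it checks that both branches have the same type.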
Note: If you want to match the codes anywhere in the string, including inside longer words (for example `F21` within `F210`), rather than only as standalone words, use this pattern instead:
```r
regex <- paste0("(?:", paste(check, collapse = "|"), ")")
```
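To see the difference between the two patterns, here is a small comparison on made-up strings (the vector `x` is hypothetical, chosen only for illustration). The word-bounded pattern rejects `F21` inside `F210` and `G1` inside `codeG1x`, while the substring pattern accepts both.

```r
check <- c("F21", "F23", "G1")
x <- c("code F21 here", "code F210 here", "codeG1x")  # hypothetical test strings

word_regex <- paste0("\\b(?:", paste(check, collapse = "|"), ")\\b")  # whole words only
sub_regex  <- paste0("(?:", paste(check, collapse = "|"), ")")        # any substring

grepl(word_regex, x, perl = TRUE)
# [1]  TRUE FALSE FALSE
grepl(sub_regex, x, perl = TRUE)
# [1] TRUE TRUE TRUE
```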