I have the following data frame:
Naam <- c("Wheels", "Set")
Omschrijving <- c("Ah", "ALBERT")
data <- data.frame(Naam, Omschrijving)
data$Label <- ""
I have set up a labeler that labels a row if the column Omschrijving contains one of the items in word_list_11:
for (i in 1:nrow(data)) {
  # 1.1 Food - Lunch
  word_list_11 <- c("Albert")
  sentence <- data$Omschrijving[i]
  # TRUE if any word in the list occurs in this row's Omschrijving (case-insensitive)
  is_present_11 <- any(sapply(word_list_11, function(word) grepl(word, sentence, ignore.case = TRUE)))
  if (is_present_11) {
    data$Label[i] <- "Horeca"
  }
}
However, I would like to add a rule that searches for words in the column “Naam”. Ideally I would have two lists, so that I can classify a row if the column “Omschrijving” has a hit in list 1 or the column “Naam” has a hit in list 2.
Any feedback on how I can improve my code to make this happen?
I am trying to fully understand your goal. If I understand correctly, you want to put a string as a classification label in the Label column of a row if either of two conditions is met:
- The Omschrijving column contains a word that exists in the word_list_11 list, or
- The Naam column contains a word that exists in another list, say word_list_12.
If that’s correct, I’d suggest small modifications to your code, as follows:
Naam <- c("Wheels", "Set", "Tafel")
Omschrijving <- c("Ah", "ALBERT", "Rood")
data <- data.frame(Naam, Omschrijving)
data$Label <- ""
View(data)
for (i in 1:nrow(data)) {
  # 1.1 Food - Lunch
  word_list_11 <- c("Albert")  # words to look for in Omschrijving
  word_list_12 <- c("Wheels")  # words to look for in Naam
  sentence1 <- data$Omschrijving[i]
  sentence2 <- data$Naam[i]
  is_present_11 <- any(sapply(word_list_11, function(word) grepl(word, sentence1, ignore.case = TRUE)))
  is_present_12 <- any(sapply(word_list_12, function(word) grepl(word, sentence2, ignore.case = TRUE)))
  # Label the row if either column has a hit in its list
  if (is_present_11 || is_present_12) {
    data$Label[i] <- "Horeca"
  }
}
Here is the resulting data:
#   Naam   Omschrijving Label
# 1 Wheels Ah           Horeca
# 2 Set    ALBERT       Horeca
# 3 Tafel  Rood
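The same rule can also be written without a row-by-row loop, since grepl is vectorized over the text it searches. The following is only a minimal sketch under the assumptions of this answer: matches_any is a hypothetical helper name (not something from base R), and the words are still treated as regular expressions, exactly as in the loop version.

```r
# Example data from the answer above
Naam <- c("Wheels", "Set", "Tafel")
Omschrijving <- c("Ah", "ALBERT", "Rood")
data <- data.frame(Naam, Omschrijving)
data$Label <- ""

word_list_11 <- c("Albert")  # matched against Omschrijving
word_list_12 <- c("Wheels")  # matched against Naam

# Hypothetical helper: for each element of `text`, TRUE if it matches
# at least one pattern in `words` (case-insensitive regex match).
matches_any <- function(text, words) {
  per_word <- lapply(words, function(w) grepl(w, text, ignore.case = TRUE))
  # Combine the per-word results with OR; start from all FALSE so an
  # empty word list matches nothing.
  Reduce(`|`, per_word, rep(FALSE, length(text)))
}

# A row is labelled if either column has a hit in its own list
hit <- matches_any(data$Omschrijving, word_list_11) |
  matches_any(data$Naam, word_list_12)
data$Label[hit] <- "Horeca"

print(data$Label)
# [1] "Horeca" "Horeca" ""
```

With more labels, each one could get its own pair of word lists and a call like the one above, which keeps the lists outside any loop and makes them easier to maintain.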