I have a dataset with over 6000 observations and over 300 variables. Some of these variables are character variables containing strings. Some of these strings contain swear words, which I want to replace with the expression “[incivility]”, and some contain misspellings, which I want to correct.
Here is an example data frame:
df <- data.frame(
  id = 1:6,
  date = seq.Date(as.Date("2024-12-01"), as.Date("2024-12-06"), "day"),
  group = rep(LETTERS[1:2], 3),
  tex1 = c("First strings", "second strings", "third strings", "fourth string",
           "fift strings", "sixth strigs"),
  text2 = c("first example", "second example", "third example", "forth example",
            "fift example", "sixth example")
)
And here is a data frame containing the original strings and their replacements:
df_replace <- data.frame(
  original = c("first", "second", "example", "fift"),
  replacement = c("[incivility]", "[incivility]", "[incivility]", "fifth")
)
I tried to solve this problem using the following code:
df1 <- data.frame(lapply(df, function(x) {
  gsub(df_replace$original, df_replace$replacement, x)
}))
As you can see, it did not work. Is there an easy way to do this using dplyr? To be honest, I could not figure out how to use functions like mutate or across to solve this problem.
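For reference, here is a minimal sketch of the kind of result I am after. As far as I understand, gsub only uses the first element of a pattern vector, so the pairs probably have to be applied one at a time. The helper name replace_pairs and the result name df_clean are made up for illustration, and the sketch assumes R 4.0 or later (so strings are not converted to factors) and dplyr 1.0.0 or later (for across and where). Matching here is literal and case-sensitive, so "First" would stay unchanged.

```r
library(dplyr)

# Same example data as above
df <- data.frame(
  id = 1:6,
  date = seq.Date(as.Date("2024-12-01"), as.Date("2024-12-06"), "day"),
  group = rep(LETTERS[1:2], 3),
  tex1 = c("First strings", "second strings", "third strings", "fourth string",
           "fift strings", "sixth strigs"),
  text2 = c("first example", "second example", "third example", "forth example",
            "fift example", "sixth example")
)

df_replace <- data.frame(
  original = c("first", "second", "example", "fift"),
  replacement = c("[incivility]", "[incivility]", "[incivility]", "fifth")
)

# Hypothetical helper: apply each original/replacement pair in turn.
# fixed = TRUE treats both the pattern and the replacement as literal text.
replace_pairs <- function(x, original, replacement) {
  for (i in seq_along(original)) {
    x <- gsub(original[i], replacement[i], x, fixed = TRUE)
  }
  x
}

# Apply the helper to every character column and leave the others untouched
df_clean <- df %>%
  mutate(across(
    where(is.character),
    ~ replace_pairs(.x, df_replace$original, df_replace$replacement)
  ))

df_clean$text2
```

With the example data, text2 should come out as "[incivility] [incivility]", "[incivility] [incivility]", "third [incivility]", "forth [incivility]", "fifth [incivility]", "sixth [incivility]". Because the pairs are applied in order, an earlier replacement could in principle be altered by a later pattern, so the order of rows in df_replace may matter for real data.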