I’m basically brand new to R. I’m trying to write a more streamlined version of my current code, which just repeats a bunch of str_replace_all calls. I feel like there has to be a better way of doing this.
I’m cleaning up addresses and trying to match a number of typos and replace them with the proper state name.
A clunky way I’ve gotten this to work is the following:
library(stringr)

# Regex alternations of known typos for each state
CA_messy <- "CAtypo1|CAtypo2|CAtypo3|CAtypo4"
LA_messy <- "LAtypo1|LAtypo2"

# Each pass reads the output of the previous one
alldata_cleaned$StateClean <- str_replace_all(alldata$State, CA_messy, "CALIFORNIA")
alldata_cleaned$StateClean <- str_replace_all(alldata_cleaned$StateClean, LA_messy, "LOUISIANA")
alldata_cleaned$StateClean <- str_replace_all(alldata_cleaned$StateClean, "NEVEDA", "NEVADA")
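The repeated passes above can likely be collapsed into one call: stringr’s `str_replace_all()` also accepts a named character vector as `pattern`, where each name is a regex and each value is its replacement, applied in order. Anchoring each pattern with `^` and `$` means it only matches when the whole field is the typo, which should also avoid the partial-word problem (e.g. “CA” inside “CALIFORNIA”). This is a minimal sketch: the `states` vector and the typo strings are hypothetical placeholders, and it assumes each `State` entry holds only the state name with no extra text or whitespace.

```r
library(stringr)

# Hypothetical sample data standing in for alldata$State
states <- c("CAtypo1", "CALIFORNIA", "CA", "LAtypo2", "NEVEDA", "TEXAS")

# Names are regex patterns, values are replacements (applied in order).
# ^ and $ anchor each match to the whole string, so the "CA" inside
# an already-correct "CALIFORNIA" is left alone.
fixes <- c(
  "^(CAtypo1|CAtypo2|CAtypo3|CAtypo4|CA)$" = "CALIFORNIA",
  "^(LAtypo1|LAtypo2)$"                    = "LOUISIANA",
  "^NEVEDA$"                               = "NEVADA"
)

# One call replaces all three separate str_replace_all() passes
cleaned <- str_replace_all(states, fixes)
print(cleaned)
```

On the real data this would presumably become `alldata_cleaned$StateClean <- str_replace_all(alldata$State, fixes)`. Because the replacements run sequentially, the anchors also keep a later pattern from matching text that an earlier one produced.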
But I don’t love that this rewrites the column over and over. str_replace_all also matches parts of words: for example, if I replace “CA” with “CALIFORNIA”, it also replaces the CA inside the full name, giving “CALIFORNIALIFORNIA”. I also tried case_when, but couldn’t get it to work:
alldata_cleaned2 <- alldata_cleaned %>%
  mutate(StateClean = case_when(
    State == CA_messy ~ "CALIFORNIA",
    State == LA_messy ~ "LOUISIANA",
    State == "NEVEDA" ~ "NEVADA",
    TRUE ~ State
  ))
Only the last replacement, for NEVEDA, works. Is there a way to use case_when with a vector of typos as the condition? I’ve been fiddling with the syntax but can’t get it to work. I’ve seen lots of similar questions on SO and tried to implement their answers, with no success. Thank you!
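The `case_when()` attempt most likely fails because `==` compares `State` element by element against `CA_messy`, and `CA_messy` is a single string. So `State == CA_messy` is only TRUE for rows whose value is literally `"CAtypo1|CAtypo2|CAtypo3|CAtypo4"`; the `|` has no regex meaning for `==`. “NEVEDA” works because it is an exact literal. A sketch of one fix keeps the typos as a character vector and tests membership with `%in%`; if regex matching is wanted, `str_detect()` can serve as the condition instead. The data frame and typo values below are hypothetical placeholders.

```r
library(dplyr)
library(stringr)

# Hypothetical stand-in for alldata_cleaned
alldata_cleaned <- data.frame(
  State = c("CAtypo1", "CAtypo3", "LAtypo2", "NEVEDA", "TEXAS"),
  stringsAsFactors = FALSE
)

# Keep each typo as its own string, not one "a|b|c" pattern
CA_messy <- c("CAtypo1", "CAtypo2", "CAtypo3", "CAtypo4")
LA_messy <- c("LAtypo1", "LAtypo2")

alldata_cleaned2 <- alldata_cleaned %>%
  mutate(StateClean = case_when(
    State %in% CA_messy ~ "CALIFORNIA",  # exact match against any CA typo
    State %in% LA_messy ~ "LOUISIANA",
    State == "NEVEDA"   ~ "NEVADA",
    TRUE                ~ State          # leave everything else unchanged
  ))

# Regex variant: str_detect() with an anchored alternation pattern
CA_pattern <- "^(CAtypo1|CAtypo2|CAtypo3|CAtypo4)$"
alldata_cleaned3 <- alldata_cleaned %>%
  mutate(StateClean = case_when(
    str_detect(State, CA_pattern) ~ "CALIFORNIA",
    TRUE                          ~ State
  ))

print(alldata_cleaned2)
```

With many typos, a lookup table of typo/correct pairs joined onto the data (for example with dplyr’s `left_join()`) may scale better than a growing `case_when()`.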