I’m working with a large data set in R in which the * character denotes cells with missing values. I am trying to replace every cell that contains * with NA. To do this, I am trying to iterate over every row (per column) using the following code:
for (i in 1:nrow(mydata)) {
  if (i == "*") {
    mydata[i, ] <- NA
  }
}
The code runs but the data frame remains unchanged. Can someone help me understand why it doesn’t work and help with different ways to get the intended result?
You can just do this:
mydata[mydata == '*'] <- NA
It is generally bad form in R to loop over a data frame, because doing so is computationally inefficient. Loop only when a vectorized alternative is not available; most operations on a data frame can be done without looping.
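To see why the one-liner works, here is a minimal sketch on a made-up two-column data frame (the column names and values are illustrative assumptions, not the asker's data). Comparing a data frame with "*" yields a logical matrix of the same shape, and indexing with that matrix assigns NA to every matching cell at once.

```r
# Hypothetical example data; the real mydata will differ.
mydata <- data.frame(
  A = c("a", "b", "*"),
  B = c("*", "b", "c"),
  stringsAsFactors = FALSE
)

# Element-wise comparison returns a logical matrix with one entry per cell.
is_star <- mydata == "*"

# Logical-matrix indexing replaces every TRUE cell in a single step.
mydata[is_star] <- NA

# Count the missing values per column to confirm the replacement.
colSums(is.na(mydata))
```

Storing the comparison in is_star is only for illustration; the single-line form does the same thing without the intermediate variable.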
Also, your code doesn’t work because you’re checking whether the iterator i equals "*", not whether the value in the cell does. Your code is checking
1 == "*"
2 == "*"
3 == "*"
and so on, each of which returns FALSE, so no changes are made. To loop, you’d need to iterate over both rows and columns and check the value mydata[i, j] == '*', where i is your row index and j is your column index.
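The comparison can be checked directly at the console. This short sketch uses a hypothetical three-row index to show that a row number never equals "*": R coerces the number to a character string before comparing, so every test is FALSE.

```r
# seq_len(3) stands in for 1:nrow(mydata) on a hypothetical three-row data frame.
idx <- seq_len(3)

# Each number is coerced to "1", "2", "3" and then compared with "*".
hits <- idx == "*"
hits

# No index ever matches, so the body of the if statement never runs.
any(hits)
```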
Looping looks a bit inefficient; consider using dplyr::na_if() instead. The following code should work:
library(dplyr, warn.conflicts = FALSE)

# Replace "*" with NA in every column; assumes all columns are character.
mydata <- mydata %>%
  mutate(across(everything(), ~ na_if(.x, "*")))
Go with what user @Mako212 suggests here. It’s highly recommended and the idiomatic R approach.
This is just to demonstrate how you would need to modify your for-loop logic:
mydata <- data.frame(
  A = c("a", "b", "*"),
  B = c("*", "b", "c"),
  C = c("*", "b", "c"))

# Visit every cell by row index i and column index j.
for (i in seq_len(nrow(mydata))) {
  for (j in seq_len(ncol(mydata))) {
    if (mydata[i, j] == "*") {
      mydata[i, j] <- NA
    }
  }
}
mydata
#>      A    B    C
#> 1    a <NA> <NA>
#> 2    b    b    b
#> 3 <NA>    c    c