I am trying to write a filter pipe using dplyr the way I always do, but it's returning an empty dataframe. If I change the name of the variable I am filtering on, it works. I can’t find anything special about the variable name I was using.
Here is some code to illustrate the problem:
#read in the data
x <- read.csv("output_data/final_data.csv")
#x has a variable in it called proj.id, and I want to filter for matches with proj.ids in the variable multiphase
#Confirm that some of the values in x$proj.id are in multiphase
> any(x$proj.id %in% multiphase)
[1] TRUE
#Try to use filter as I normally would - yields an empty dataframe
> x %>% filter(proj.id %in% multiphase) %>% dim()
[1] 0 13
#Try and do it without dplyr - yields a dataframe with 2 rows (as expected)
> x[x$proj.id %in% multiphase,] %>% dim()
[1] 2 13
#Checking there isn't some datatype issue, nothing shows up:
> class(multiphase) == class(x$proj.id)
[1] TRUE
Notably, other uses of filter on the same data set work fine:
#I filter on another vector ("delayed") and get a dataframe with 13 rows, as I expect
> x %>% filter(proj.id %in% delayed) %>% dim()
[1] 13 13
I tried looking for differences, but there are none:
> class(delayed) == class(multiphase)
[1] TRUE
I tried renaming multiphase (by assigning it to a new name) and things work??!!
> multiphase_1 <- multiphase
> x %>% filter(proj.id %in% multiphase_1) %>% dim()
[1] 2 13
When I print(multiphase)
I get what I expect, and
?multiphase
yields nothing.
So I can fix this by renaming multiphase to something else, but why doesn’t “multiphase” work as a variable name?
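
One thing that may be worth checking, although nothing in the output above confirms it: if x itself has a column named multiphase, dplyr's data masking would make filter() use that column instead of the vector in the global environment. Base R indexing and the renamed copy would not be affected, which would match the symptoms. Below is a minimal sketch of that situation. The data frame x_demo and its values are made up for illustration and are not the real data.

```r
library(dplyr)

# Hypothetical toy data: note the column that happens to be named `multiphase`.
x_demo <- data.frame(
  proj.id    = c("a", "b", "c"),
  multiphase = c("no", "no", "no"),
  stringsAsFactors = FALSE
)

# Global vector with the same name as the column.
multiphase <- c("a", "b")

# Check for a name clash between the data frame's columns and the vector.
has_clash <- "multiphase" %in% names(x_demo)

# Inside filter(), `multiphase` resolves to the column first, so nothing matches.
masked <- x_demo %>% filter(proj.id %in% multiphase)

# Base R indexing evaluates `multiphase` in the calling environment instead.
base_rows <- x_demo[x_demo$proj.id %in% multiphase, ]

# Unquoting with !! inlines the global vector, bypassing the column.
unmasked <- x_demo %>% filter(proj.id %in% !!multiphase)

dim(masked)
dim(base_rows)
dim(unmasked)
```

If "multiphase" %in% names(x) turns out to be TRUE for the real data, then either unquoting with !! or keeping a distinct name such as multiphase_1 should avoid the clash. If it is FALSE, the cause lies elsewhere.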