I’m having a strange problem when I read a .csv file using read_csv. I don’t think I can produce a reproducible example, because the issue may depend on my current R/RStudio session and how it interacts with the file, so reproducing it would require the file and a similar setup. I’m not sure, but I suspect a text encoding mismatch. I know little about this, so I’m seeking advice from the Stack Overflow hive mind.
In any case, here’s the behavior I see.
The tibble is named `fl` and contains the FIPS codes for all the administrative districts in the U.S. I’m using a ‘Census Area’ in Alaska as the example, but there are other similar cases in the same tibble.
The following produces sensible output.
> fl %>% filter(str_detect(fl$NAME, 'Hoonah'))
# A tibble: 1 × 6
FIPS NAME STATEFPn COUNTYFPn STATEFP COUNTYFP
<chr> <chr> <int> <int> <chr> <chr>
1 02105 Hoonah–Angoon Census Area 2 105 02 105
But if I type the whole NAME at the console prompt, I get no rows.
> fl %>% filter(NAME=='Hoonah-Angoon Census Area')
# A tibble: 0 × 6
# … with 6 variables: FIPS <chr>, NAME <chr>, STATEFPn <int>, COUNTYFPn <int>, STATEFP <chr>,
# COUNTYFP <chr>
# ℹ Use `colnames()` to see all variable names
However, if I copy and paste the name from the first output, it works and I get this.
> fl %>% filter(NAME=='Hoonah–Angoon Census Area')
# A tibble: 1 × 6
FIPS NAME STATEFPn COUNTYFPn STATEFP COUNTYFP
<chr> <chr> <int> <int> <chr> <chr>
1 02105 Hoonah–Angoon Census Area 2 105 02 105
I suspect this is some sort of character encoding mismatch between my RStudio session and what’s in the file, even though, to the best of my knowledge, both the file (as checked by guess_encoding()) and my session (as set in ‘file:save with encoding’ and then ‘file:reopen with encoding’) report ‘UTF-8’.
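One way to narrow this down would be to compare the typed string and the pasted string code point by code point. Below is a minimal base R sketch of that check. The two literals are stand-ins for what I typed and what I pasted; the `\u2013` escape is an assumption about the pasted character, based on how the dash renders in the output above. In the real session the value from `fl$NAME` would be compared instead.

```r
# Typed at the keyboard vs. copied from the tibble output.
# Assumption: "\u2013" stands in for the dash character in the pasted name.
typed  <- "Hoonah-Angoon Census Area"
pasted <- "Hoonah\u2013Angoon Census Area"

# Convert each string to its vector of Unicode code points.
typed_cp  <- utf8ToInt(typed)
pasted_cp <- utf8ToInt(pasted)

# Positions where the two strings differ (they have the same length here).
diff_pos <- which(typed_cp != pasted_cp)

# Show the differing positions with the code points in hex.
data.frame(
  position = diff_pos,
  typed    = sprintf("U+%04X", typed_cp[diff_pos]),
  pasted   = sprintf("U+%04X", pasted_cp[diff_pos])
)
```

If any positions are reported, the two strings are not the same sequence of characters even when both are valid UTF-8, which would explain why `==` returns no rows.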
Any ideas about what is happening?