I’m working with a dataset that uses long strings of digits as identifiers. I always convert this id column to class character because the digits do not actually represent numbers. I saved and then reloaded the dataset using write_csv and read_csv, and noticed that those functions seem to be changing the values in the data frame, which is really troubling. Here’s a reprex:
library(readr)
# example from my actual dataset
dat <- data.frame(id = c("196307010100010157", "196307010100020158", "196307010100030163",
                         "196307010100040161", "196307010100050162", "196307010200010159",
                         "196307010200020160", "196307010200030164", "196307010200040165",
                         "196307010200050173"))
class(dat$id) # definitely characters
write_csv(dat, file="dat.csv")
dat <- read_csv("dat.csv")
class(dat$id) # now it's numeric, which is issue #1 -- why change the class?
sprintf("%.0f", dat$id) # issue #2, and much more serious: the values are different!
I’m pretty alarmed by this because I use these functions all the time. Does anyone know why this behavior arises in this case?