I’m saving a large data.table object as a compressed file using data.table::fwrite(data, 'data.csv.gz'). However, when I try to extract this file with the Ubuntu file manager, I get an “empty archive” error. Extracting it from the command line works (gunzip data.csv.gz).
Listing the gzip archive contents reports a bogus compression ratio (reprex below). Is this a bug in data.table::fwrite()?
library(data.table)
dt <- data.table(a = 1:100000, b = runif(100000))
fwrite(dt, 'teste.csv.gz')
# Seemingly malformed gzip file (see the negative ratio below);
# the Ubuntu file manager can't extract it.
res <- system('gunzip -l teste.csv.gz', intern = TRUE)
res
#> [1] " compressed uncompressed ratio uncompressed_name"
#> [2] " 1055898 85906 -1129.1% teste.csv"
# but I can extract the file from the command line
system('gunzip teste.csv.gz')
# and if I compress the file through the CLI it works (see ratio):
# the Ubuntu file manager can now handle the archive
system('gzip teste.csv')
res <- system('gunzip -l teste.csv.gz', intern = TRUE)
res
#> [1] " compressed uncompressed ratio uncompressed_name"
#> [2] " 1054997 2388946 55.8% teste.csv"
sessionInfo()
#> R version 4.4.0 (2024-04-24)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 20.04.6 LTS
#>
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/liblapack.so.3; LAPACK version 3.9.0
#>
#> locale:
#> [1] LC_CTYPE=pt_BR.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=C
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: America/Sao_Paulo
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] data.table_1.15.4
#>
#> loaded via a namespace (and not attached):
#> [1] styler_1.10.3 digest_0.6.35 fastmap_1.2.0 xfun_0.44
#> [5] magrittr_2.0.3 glue_1.7.0 R.utils_2.12.3 knitr_1.47
#> [9] htmltools_0.5.8.1 rmarkdown_2.27 lifecycle_1.0.4 cli_3.6.2
#> [13] R.methodsS3_1.8.2 vctrs_0.6.5 reprex_2.1.0 withr_3.0.0
#> [17] compiler_4.4.0 R.oo_1.26.0 R.cache_0.16.0 purrr_1.0.2
#> [21] rstudioapi_0.16.0 tools_4.4.0 evaluate_0.23 yaml_2.3.8
#> [25] rlang_1.1.3 fs_1.6.4
Created on 2024-05-29 with reprex v2.1.0
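
In case it helps narrow this down, here is a rough sketch of how the two sizes could be compared directly from the shell. It relies on two things that are my assumptions rather than something confirmed for this file: that the gzip trailer stores the uncompressed size in its last 4 bytes, little-endian (the ISIZE field of RFC 1952), and that older gzip versions take the "uncompressed" column of gunzip -l from that field instead of decompressing. The sample file and the variable names (tmp, isize, actual) are made up for illustration, and od is assumed to run on a little-endian host such as x86_64.

```shell
# Build a small sample file and compress it with the gzip CLI,
# which serves as a known-good reference archive.
tmp=$(mktemp -d)
seq 1 1000 > "$tmp/sample.csv"
gzip -c "$tmp/sample.csv" > "$tmp/sample.csv.gz"

# 1. Integrity check: decompresses the whole stream and verifies it.
gzip -t "$tmp/sample.csv.gz" && echo "integrity: ok"

# 2. Size recorded in the trailer: the last 4 bytes of the file,
#    read as an unsigned 32-bit integer (host byte order, assumed little-endian).
isize=$(tail -c 4 "$tmp/sample.csv.gz" | od -An -tu4 | tr -d ' ')

# 3. Size actually produced by decompressing the stream.
actual=$(gzip -dc "$tmp/sample.csv.gz" | wc -c | tr -d ' ')

echo "trailer size: $isize, decompressed size: $actual"
```

For an archive written by the gzip CLI the two numbers should agree. Pointing the last three steps at teste.csv.gz as written by fwrite() would show whether the data itself is intact (step 1) and whether only the recorded size disagrees with the real one (steps 2 and 3).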