I’m working with a directory that has close to 90,000 files inside 200 folders. With such a large number of files, both the list.files() and list.dirs() functions are incredibly slow. The thing is, I’m not even interested in the files, just in extracting the names of the folders and their paths.
For instance, instead of returning
#images in 'folder1'
'/Users/name/Documents/project/folder1/Img_001.jpg'
'/Users/name/Documents/project/folder1/Img_002.jpg'
#images in 'folder2'
'/Users/name/Documents/project/folder2/Img_001.jpg'
'/Users/name/Documents/project/folder2/Img_002.jpg'
I just want a list up to the individual folders, like so:
'/Users/name/Documents/project/folder1/'
'/Users/name/Documents/project/folder2/'
Is there a way to do this upfront to cut down on time?
This will list the full paths of the jpg files in folders directly under project, take just the directory part of each, and keep the unique ones:
Sys.glob("/Users/name/Documents/project/*/*.jpg") |> dirname() |> unique()
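To see what the glob approach returns, here is a minimal sketch on a throwaway tree under tempdir(). The folder and file names are made up for illustration, and nested calls are used instead of |> so it also runs on R older than 4.1. Note that a folder with no matching .jpg directly inside it will not show up in the result.

```r
# Hypothetical demo tree: two folders with images, one folder without any
project <- file.path(tempdir(), "project_glob_demo")
for (folder in c("folder1", "folder2")) {
  dir.create(file.path(project, folder), recursive = TRUE, showWarnings = FALSE)
  file.create(file.path(project, folder, c("Img_001.jpg", "Img_002.jpg")))
}
dir.create(file.path(project, "empty_folder"), showWarnings = FALSE)

# Match every jpg exactly one level below project ...
jpg_paths <- Sys.glob(file.path(project, "*", "*.jpg"))

# ... then reduce the file paths to the unique folders that contain them
jpg_dirs <- unique(dirname(jpg_paths))

# Only folder1 and folder2 are found; empty_folder has no jpg to match
print(basename(jpg_dirs))
```

If folders that contain no images also need to be listed, a directory-based listing is the safer choice.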
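This will list directories (not files) under project regardless of what they contain or do not contain. Note that list.dirs() recurses by default, so any nested subdirectories are listed too. The -1 is to omit project itself, which is returned as the first element.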
list.dirs("/Users/Louis/Downloads/project/")[-1]
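If only the folders directly under project are wanted, list.dirs() also accepts recursive = FALSE. The sketch below uses a made-up tree under tempdir() to show the difference. With recursive = FALSE the starting path is not part of the result, so the [-1] should be dropped in that case. It may also be quicker on a large tree since it does not need to descend into each folder, though that is an assumption worth timing on the real data.

```r
# Hypothetical demo tree: folder1 has a nested subfolder and a file, folder2 is empty
project <- file.path(tempdir(), "project_listdirs_demo")
dir.create(file.path(project, "folder1", "nested"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(project, "folder2"), recursive = TRUE, showWarnings = FALSE)
file.create(file.path(project, "folder1", "Img_001.jpg"))

# Default (recursive = TRUE): project itself plus every directory below it
all_dirs <- list.dirs(project)

# recursive = FALSE: only the direct children, without project itself
top_dirs <- list.dirs(project, recursive = FALSE)

print(length(all_dirs))    # project, folder1, folder1/nested, folder2
print(basename(top_dirs))
```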
fs::dir_ls() comes with a type arg and at least on Windows it’s ~6x faster than list.dirs() when testing on a conda cache directory:
p_ = "D:/pkg_cache/micromamba/"
tictoc::tic("fs::dir_ls")
fs_dir_ls <- fs::dir_ls(path = p_, type= "directory", recurse = TRUE, all = TRUE)
tictoc::toc()
#> fs::dir_ls: 0.36 sec elapsed
str(fs_dir_ls)
#> 'fs_path' Named chr [1:7234] "D:/pkg_cache/micromamba/.micromamba" ...
#> - attr(*, "names")= chr [1:7234] "D:/pkg_cache/micromamba/.micromamba" "D:/pkg_cache/micromamba/.micromamba/condabin" "D:/pkg_cache/micromamba/.micromamba/envs" "D:/pkg_cache/micromamba/.micromamba/envs/d2147d81" ...
tictoc::tic("list.dirs")
lst_dirs <- list.dirs(path = p_)
tictoc::toc()
#> list.dirs: 2.33 sec elapsed
str(lst_dirs)
#> chr [1:7235] "D:/pkg_cache/micromamba/" ...
The difference in result lengths comes from the first element: list.dirs() includes the starting path while fs::dir_ls() does not.
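A rough way to check that the two listings otherwise agree is to drop the starting path from the list.dirs() result and compare the normalized paths. This is a sketch on a made-up tree under tempdir(). The fs part only runs if the fs package is installed, and the variable names are illustrative.

```r
# Hypothetical demo tree with one nested directory: compare_demo/a/b
p <- file.path(tempdir(), "compare_demo")
dir.create(file.path(p, "a", "b"), recursive = TRUE, showWarnings = FALSE)

# list.dirs() returns the starting path first, so drop it before comparing
base_dirs <- list.dirs(p)[-1]

if (requireNamespace("fs", quietly = TRUE)) {
  # all = TRUE mirrors list.dirs(), which always includes hidden directories
  fs_dirs <- as.character(
    fs::dir_ls(path = p, type = "directory", recurse = TRUE, all = TRUE)
  )
  # normalizePath() irons out separator differences between the two results
  same_dirs <- setequal(normalizePath(base_dirs), normalizePath(fs_dirs))
  print(same_dirs)
}
```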