I have a JSON file with links to around 320k PDF files. I need to acquire all of these files. Through my work we have a large amount of storage in NextCloud as well as access to Google Colab Pro+.
I have mounted Google Colab to NextCloud using this tutorial: https://rizkyrajitha.hashnode.dev/connect-nextcloud-with-google-colab
I run the code through Google Colab to spare my shitty Mac. I loop through the JSON file, download the files, and store them in a NextCloud folder with no problem. It goes quickly for the first 15k or so files, but then it slows down and eventually almost stops completely. It is a large amount of data, so running it locally isn't really an option.
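For reference, my loop looks roughly like this (a minimal sketch; the actual JSON structure is assumed here to be a flat list of URL strings, and the NextCloud mount path is a placeholder). It skips files that already exist so an interrupted run can resume:

```python
import json
import os
import urllib.request
from urllib.parse import urlparse

def filename_from_url(url):
    """Derive a local filename from the last segment of the URL path."""
    name = os.path.basename(urlparse(url).path)
    return name or "unnamed.pdf"

def download_all(json_path, out_dir):
    """Loop over the JSON list of links and fetch each PDF sequentially,
    skipping files that are already present so the run can resume."""
    os.makedirs(out_dir, exist_ok=True)
    with open(json_path) as f:
        links = json.load(f)  # assumed: a flat list of URL strings
    for url in links:
        dest = os.path.join(out_dir, filename_from_url(url))
        if os.path.exists(dest):  # resume support after a crash/slowdown
            continue
        try:
            urllib.request.urlretrieve(url, dest)
        except OSError as err:
            print(f"failed: {url} ({err})")

if __name__ == "__main__":
    # placeholder paths: adjust to your JSON file and NextCloud mount
    download_all("links.json", "/content/nextcloud/pdfs")
```

It's entirely sequential, one request at a time, which is part of why I suspect it could be optimized.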
Can anyone think of a better solution to retrieve and store this large amount of data in a fairly convenient way, or a way to optimize my current approach?