Issue:
While scraping a dynamic website, I wanted to download files directly to my Google Cloud Storage (GCS) bucket instead of a local directory, since the files need to be moved to a GCS bucket for further processing anyway and the intermediate transfer costs time. When I searched for how to download files directly to a GCS bucket while scraping with Selenium, I found that Selenium doesn't support direct downloads to a cloud storage path.
Example:
Imagine you are running a web scraper to download a large number of files from a dynamic website. These files need to be processed further using cloud-based tools and services hosted on Google Cloud Platform (GCP).
Currently, the workflow involves downloading files to a local directory and then uploading them to a Google Cloud Storage (GCS) bucket. This two-step process introduces delays and increases complexity, especially if the local storage is limited or if the files are large.
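For context, here is a minimal sketch of the current two-step workaround, assuming Chrome, the google-cloud-storage client library, and hypothetical page, element, and bucket names:

```python
import os
import tempfile

from google.cloud import storage
from selenium import webdriver
from selenium.webdriver.common.by import By

# Step 1: download with Selenium into an intermediate local directory.
download_dir = tempfile.mkdtemp()
options = webdriver.ChromeOptions()
options.add_experimental_option(
    "prefs", {"download.default_directory": download_dir}
)
driver = webdriver.Chrome(options=options)

driver.get("https://example.com/reports")            # hypothetical page
driver.find_element(By.ID, "download-link").click()  # hypothetical element
# ... wait until the download has finished ...
driver.quit()

# Step 2: upload each downloaded file to GCS, then remove the local copy.
client = storage.Client()
bucket = client.bucket("my-scraper-bucket")          # hypothetical bucket
for name in os.listdir(download_dir):
    local_path = os.path.join(download_dir, name)
    bucket.blob(f"scraped/{name}").upload_from_filename(local_path)
    os.remove(local_path)
```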
With the proposed feature of direct downloads to a GCS bucket, you could streamline this workflow (a hypothetical sketch follows this list):
- Set up your web scraper using Selenium.
- Configure the download destination to point directly to your GCS bucket.
- Run your scraper. Files will be downloaded directly to the GCS bucket, eliminating the need for local storage.
- Process the files immediately using cloud-based tools and services available on GCP.
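A hypothetical sketch of what the configuration step could look like if the browser download preference accepted a GCS path. This does not work today; the `gs://` value below is an illustrative, unsupported placeholder and is exactly the behaviour this issue requests:

```python
from selenium import webdriver

# Hypothetical: point the browser's download directory at a GCS path.
# Neither Selenium nor Chrome supports this today.
options = webdriver.ChromeOptions()
options.add_experimental_option(
    "prefs", {"download.default_directory": "gs://my-scraper-bucket/scraped"}
)
driver = webdriver.Chrome(options=options)
driver.get("https://example.com/reports")  # files would land directly in GCS
```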
This would save time, reduce the need for intermediate local storage, and simplify the overall data processing pipeline. It would be particularly beneficial for scenarios involving large datasets, limited local storage, or high-frequency scraping tasks where quick processing is essential.
Expectation:
Files should be downloaded directly to the GCS bucket instead of a local directory.