The code below was successful for uploading CSVs and an XLSX file to Google Cloud Storage, as expected since it is taken from the "Upload objects from a file system" sample.
However, a CSV of around 2,000 KB returned the error message:
('Connection aborted.', TimeoutError('The write operation timed out'))
There are other functions that accept a timeout option, but that is not the case here. Based on the reading list below, I have considered amending the bucket method.
from google.cloud.storage import Client, transfer_manager

def upload_many_blobs_with_transfer_manager(input_logger,
        bucket_name, filenames, source_directory="", workers=8):
    """Upload every file in a list to a bucket, concurrently in a process pool.

    Each blob name is derived from the filename, not including the
    `source_directory` parameter. For complete control of the blob name for each
    file (and other aspects of individual blob metadata), use
    transfer_manager.upload_many() instead.
    """
    storage_client = Client()
    bucket = storage_client.bucket(bucket_name)

    results = transfer_manager.upload_many_from_filenames(
        bucket, filenames, source_directory=source_directory, max_workers=workers
    )

    for name, result in zip(filenames, results):
        # The results list is either `None` or an exception for each filename in
        # the input list, in order.
        if isinstance(result, Exception):
            input_logger.info("Failed to upload {} due to exception: {}".format(name, result))
        else:
            input_logger.info("Uploaded {} to {}.".format(name, bucket_name))
All help appreciated.
NateBI
I have read:

- Why does upload_from_file Google Cloud Storage Function Throws timeout error?
- https://googleapis.dev/python/storage/latest/retry_timeout.html

Based on these, I am considering changing

bucket = storage_client.bucket(bucket_name)

to

bucket = client.get_bucket(BUCKET_NAME, timeout=300.0)  # five minutes

but I am asking here first.
I’m considering changing the

bucket = storage_client.bucket(bucket_name)

to

bucket = client.get_bucket(BUCKET_NAME, timeout=300.0)  # five minutes
Yes, you are correct. Based on the Stack Overflow link you shared, the ultimate solution should be to define a timeout when creating the bucket client, as also suggested in the Documentation:

You can pass a single integer or float which functions as the timeout for the entire request. E.g.:

bucket = client.get_bucket(BUCKET_NAME, timeout=300.0)  # five minutes
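As a sketch of how that could look in your transfer-manager function (assuming your google-cloud-storage version's upload_many_from_filenames accepts an upload_kwargs argument that is forwarded to each blob's upload call; the 300-second value is only an example):

from google.cloud.storage import Client, transfer_manager

def upload_many_blobs_with_transfer_manager(input_logger,
        bucket_name, filenames, source_directory="", workers=8):
    storage_client = Client()
    # the timeout here only covers the get_bucket request itself
    bucket = storage_client.get_bucket(bucket_name, timeout=300.0)  # five minutes

    results = transfer_manager.upload_many_from_filenames(
        bucket, filenames, source_directory=source_directory, max_workers=workers,
        # assumption: upload_kwargs is passed through to each blob upload,
        # so the per-request write timeout is raised as well
        upload_kwargs={"timeout": 300.0},
    )

    for name, result in zip(filenames, results):
        if isinstance(result, Exception):
            input_logger.info("Failed to upload {} due to exception: {}".format(name, result))
        else:
            input_logger.info("Uploaded {} to {}.".format(name, bucket_name))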
You can also try configuring the retry mechanism as suggested in the Documentation.
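A minimal sketch of what that retry tuning could look like, assuming the DEFAULT_RETRY object and its with_deadline / with_delay modifiers described on that page (the bucket and file names are placeholders):

from google.cloud.storage import Client
from google.cloud.storage.retry import DEFAULT_RETRY

client = Client()
bucket = client.get_bucket("your-bucket-name", timeout=300.0)

# stretch the overall retry deadline to five minutes and slow the backoff
modified_retry = DEFAULT_RETRY.with_deadline(300.0)
modified_retry = modified_retry.with_delay(initial=1.5, multiplier=1.2, maximum=45.0)

# per-call override: blob uploads accept both retry and timeout arguments
blob = bucket.blob("my_large_file.csv")
blob.upload_from_filename("my_large_file.csv", retry=modified_retry, timeout=300.0)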
Also have a look at your network speed relative to the file size; you can check it with an upload time calculator.
If you have a bad internet connection, you can also play around with the chunk size of the uploads (although it is not recommended). Have a look at the GitHub link for reference.
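If you do try that, a rough sketch (the chunk size must be a multiple of 256 KiB; the bucket and file names are placeholders):

from google.cloud.storage import Client

client = Client()
bucket = client.get_bucket("your-bucket-name", timeout=300.0)

blob = bucket.blob("my_large_file.csv")
# smaller chunks mean each individual write finishes sooner on a slow link,
# at the cost of more requests; the value must be a multiple of 256 KiB (262144 bytes)
blob.chunk_size = 5 * 1024 * 1024  # 5 MiB
blob.upload_from_filename("my_large_file.csv", timeout=300.0)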