I am using an AWS Glue notebook to download Excel files from a site and save them to an S3 folder.
Here are the steps I carry out within a single session:
1. I use boto3.client("s3").list_objects_v2() to list the names of the files already saved in the S3 folder. It lists 10 files.
2. I use pd.read_excel() to read one randomly chosen file out of these 10 files in the S3 folder.
3. I use boto3.client("s3").put_object() to put 4 new files into the S3 folder.
4. I use boto3.client("s3").list_objects_v2() again to list the names of the files now saved in the S3 folder. It lists 14 files (10 + 4).
5. I use pd.read_excel() to read each of the 4 new files saved in the S3 folder.
The last step throws an error: 'No such file found: "name of one of the 4 new excel files saved".'
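The five steps can be sketched roughly as follows. This is a minimal sketch, not the exact notebook code: the function names, the `bucket` and `prefix` arguments, and the `new_files` mapping are placeholders, and it assumes `pd.read_excel()` is given `s3://` URLs. The S3 client and the reader are passed in as arguments so the flow is easy to follow in isolation.

```python
import random


def list_keys(s3_client, bucket, prefix):
    """Steps 1 and 4: list the object keys currently under the prefix.

    A single list_objects_v2 call returns at most 1000 keys, which is
    enough for the 10 to 14 files described here.
    """
    resp = s3_client.list_objects_v2(Bucket=bucket, Prefix=prefix)
    return [obj["Key"] for obj in resp.get("Contents", [])]


def reproduce(s3_client, read_excel, bucket, prefix, new_files):
    """Run the five steps in order and return both listings.

    read_excel is expected to be pandas.read_excel; new_files maps an
    object key to the bytes of an Excel workbook (placeholder data).
    """
    before = list_keys(s3_client, bucket, prefix)          # step 1
    read_excel(f"s3://{bucket}/{random.choice(before)}")   # step 2
    for key, body in new_files.items():                    # step 3
        s3_client.put_object(Bucket=bucket, Key=key, Body=body)
    after = list_keys(s3_client, bucket, prefix)           # step 4
    for key in new_files:                                  # step 5: the read that fails
        read_excel(f"s3://{bucket}/{key}")
    return before, after
```

In the notebook this would be called as something like `reproduce(boto3.client("s3"), pd.read_excel, "my-bucket", "my-folder/", new_files)`, where the bucket and folder names are hypothetical.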
If I skip step 2, then step 5 does not throw an error.
So I assume that performing a read on the S3 folder in step 2 (when it held only 10 files) somehow "freezes" the list of readable Excel files, even though the list operation in step 4 shows 14 files.
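One way to test this assumption is to read a new file without going through a path at all: download its bytes with the same boto3 client that can list it, then hand the in-memory buffer to `pd.read_excel()`. The sketch below assumes the failing reads use `s3://` URLs (which pandas resolves through a separate filesystem layer rather than through boto3 directly); the helper name `fetch_excel_bytes` is hypothetical. If this read succeeds while the URL-based read fails, the stale view lives in that layer and not in S3 itself.

```python
import io


def fetch_excel_bytes(s3_client, bucket, key):
    """Download one object with boto3 and wrap it in an in-memory buffer."""
    resp = s3_client.get_object(Bucket=bucket, Key=key)
    return io.BytesIO(resp["Body"].read())
```

Usage would look like `pd.read_excel(fetch_excel_bytes(boto3.client("s3"), "my-bucket", key))`, with the bucket name again a placeholder.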
Could someone please explain why I cannot read the 4 new files when I can see them in the listing?