I’m facing a problem in Chapter 3 of the book Data Engineering with AWS by Gareth Eagar, regarding the Lambda function that converts CSV to Parquet.
The function wrote the Parquet file, but:
- The table is not created in the Glue Data Catalog. Also, which part of the code adds a “table” to the catalog? For the database, it is done in these lines:
    if db_name not in current_databases.values:
        print(f'- Database {db_name} does not exist ... creating')
        wr.catalog.create_database(db_name)
    else:
        print(f'- Database {db_name} already exists')
- For one of the six CSV file uploads, the Lambda function did not trigger.
- Even with a 2-minute timeout, the “results” are never printed. I think it gets stuck writing the Parquet file:
    result = wr.s3.to_parquet(
        df=input_df,
        path=output_path,
        dataset=True,
        database=db_name,
        table=table_name,
        mode="append")
    print("RESULT: ")  # THIS IS NOT GETTING PRINTED.
    print(f'{result}')  # NOR THIS.
2024-05-01T16:41:02.026Z 7bf8244b-ff47-49e6-b296-09c65da7b43c Task timed out after 122.07 seconds
- The logs say “Task timed out”. Where can I see the logs for this particular task?
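One way to confirm which call hangs is to log before and after each step, so that the last START line without a matching END line points at the stuck call. The following is only a minimal sketch: `timed_step` and `remaining_seconds` are hypothetical helper names that are not part of the book's code, and the only Lambda API assumed is the context object's `get_remaining_time_in_millis()` method.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed_step(name, log=print):
    """Log when a step starts and, if it returns or raises, how long it took.

    If Lambda stops the invocation on timeout, the END line is never
    written, so a START line with no END line marks the stuck step.
    """
    start = time.monotonic()
    log(f"START {name}")
    try:
        yield
    finally:
        elapsed = time.monotonic() - start
        log(f"END {name} after {elapsed:.2f}s")


def remaining_seconds(context):
    """Return the seconds left before Lambda stops this invocation.

    `context` is the second argument passed to the Lambda handler.
    """
    return context.get_remaining_time_in_millis() / 1000.0


# Hypothetical usage inside the handler (not run here):
# print(f"{remaining_seconds(context):.1f}s left before the write")
# with timed_step("to_parquet"):
#     result = wr.s3.to_parquet(...)
```

Each `print` from the handler ends up in the same log stream as the "Task timed out" line, tagged with the same request ID, so the START and END lines can be read next to it.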
I’m using AWS Wrangler 3.73 with Python 3.9 in the Lambda layer.
The full code is available on GitHub.
Thanks in advance.