I was working on an AWS Lambda function in Python that simply converts a JSON file to Parquet. I set a 5-minute timeout and invoked the function. Although the Parquet file was created successfully, the function still timed out. When I debugged it, everything ran as expected until the `to_parquet` call, where execution hung. The same code runs without problems in my local environment. Does anyone have suggestions for resolving this?
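Because the hang only appears inside Lambda, one way to see where the 5-minute budget goes is to log the remaining time before and after each step. This is a minimal sketch assuming the standard Lambda `context` object, whose `get_remaining_time_in_millis()` method reports the milliseconds left before the timeout. `log_remaining` and `FakeContext` are hypothetical helpers added only so the snippet can also run outside Lambda.

```python
import time


def log_remaining(context, label):
    """Print and return how many milliseconds remain before the Lambda timeout."""
    remaining_ms = context.get_remaining_time_in_millis()
    print(f"## {label}: {remaining_ms} ms remaining")
    return remaining_ms


class FakeContext:
    """Hypothetical stand-in for the Lambda context object, for local runs only."""

    def __init__(self, timeout_s):
        # Deadline measured with a monotonic clock so wall-clock changes don't matter.
        self._deadline = time.monotonic() + timeout_s

    def get_remaining_time_in_millis(self):
        # Mirror Lambda's behaviour of returning an int, never negative here.
        return max(0, int((self._deadline - time.monotonic()) * 1000))


if __name__ == "__main__":
    ctx = FakeContext(timeout_s=300)  # same 5-minute timeout as in the question
    log_remaining(ctx, "before to_parquet")
```

Inside `lambda_handler`, calling `log_remaining(context, "before to_parquet")` just before `wr.s3.to_parquet(...)` and again right after it would show whether that call alone consumes the budget.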
Python function
```python
import os
import urllib.parse

import awswrangler as wr
import pandas as pd

# AWS settings, read from the Lambda function's environment variables
os_input_s3_cleansed_layer = os.environ['s3_cleansed_layer']
os_input_glue_catalog_db_name = os.environ['glue_catalog_db_name']
os_input_glue_catalog_table_name = os.environ['glue_catalog_table_name']
os_input_write_data_operation = os.environ['write_data_operation']


def lambda_handler(event, context):
    print('## EVENT')
    print(event)

    # Get the bucket name and the URL-decoded object key from the S3 event
    bucket = event['Records'][0]['s3']['bucket']['name']
    key = urllib.parse.unquote_plus(event['Records'][0]['s3']['object']['key'], encoding='utf-8')
    try:
        # Create a DataFrame from the JSON file
        df_raw = wr.s3.read_json('s3://{}/{}'.format(bucket, key))
        print('## DF RAW')
        print(df_raw.shape)

        # Extract the required columns by flattening the nested 'items' records
        df_step_1 = pd.json_normalize(df_raw['items'])
        print('## DF CLEANED')
        print(df_step_1.shape)

        # Write to S3 as a Parquet dataset and register it in the Glue Data Catalog
        wr_response = wr.s3.to_parquet(
            df=df_step_1,
            path=os_input_s3_cleansed_layer,
            dataset=True,
            database=os_input_glue_catalog_db_name,
            table=os_input_glue_catalog_table_name,
            mode=os_input_write_data_operation,
        )
        print('## RESPONSE')
        print(wr_response)
        return wr_response
    except Exception as e:
        print(e)
        print('Error getting object {} from bucket {}. Make sure they exist and your bucket is in the same region as this function.'.format(key, bucket))
        raise
```
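To exercise the handler's event parsing locally without AWS, a minimal S3 put event shaped like the `s3-put` test event can be built by hand. This sketch mirrors the bucket/key extraction in `lambda_handler`; `make_s3_put_event` and `extract_bucket_and_key` are hypothetical helper names, and only the fields the handler actually reads are included.

```python
import urllib.parse


def make_s3_put_event(bucket, key):
    """Build a minimal S3 ObjectCreated:Put event with only the fields the handler reads."""
    return {
        "Records": [
            {
                "eventName": "ObjectCreated:Put",
                "s3": {
                    "bucket": {"name": bucket},
                    # S3 URL-encodes object keys in event notifications (spaces become '+').
                    "object": {"key": urllib.parse.quote_plus(key, safe="/")},
                },
            }
        ]
    }


def extract_bucket_and_key(event):
    """Same extraction logic as lambda_handler: bucket name plus decoded object key."""
    record = event["Records"][0]["s3"]
    bucket = record["bucket"]["name"]
    key = urllib.parse.unquote_plus(record["object"]["key"], encoding="utf-8")
    return bucket, key
```

Keep in mind that a local run uses the machine's own memory and network, so a passing local test does not exercise the limits configured on the Lambda function.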
AWS Lambda Output
Test event name: s3-put

Response:

```json
{
  "errorMessage": "2024-07-31T07:11:36.885Z 7a24ab3f-13c8-4284-980f-6b705d64273a Task timed out after 307.11 seconds"
}
```
Function logs:

```
START RequestId: 7a24ab3f-13c8-4284-980f-6b705d64273a Version: $LATEST
## EVENT
{'Records': [{'eventVersion': '2.0', 'eventSource': 'aws:s3', 'awsRegion': 'us-east-1', 'eventTime': '1970-01-01T00:00:00.000Z', 'eventName': 'ObjectCreated:Put', 'userIdentity': {'principalId': 'EXAMPLE'}, 'requestParameters': {'sourceIPAddress': '127.0.0.1'}, 'responseElements': {'x-amz-request-id': 'EXAMPLE123456789', 'x-amz-id-2': 'EXAMPLE123/5678abcdefghijklambdaisawesome/mnopqrstuvwxyzABCDEFGH'}, 's3': {'s3SchemaVersion': '1.0', 'configurationId': 'testConfigRule', 'bucket': {'name': 'youtubepipeline-raw-useast1-dev', 'ownerIdentity': {'principalId': 'EXAMPLE'}, 'arn': 'arn:aws:s3:::youtubepipeline-raw-useast1-dev'}, 'object': {'key': 'youtube/raw_statistics_reference_data/CA_category_id.json', 'size': 1024, 'eTag': '0123456789abcdef0123456789abcdef', 'sequencer': '0A1B2C3D4E5F678901'}}}]}
## DF RAW
(31, 3)
## DF CLEANED
(31, 6)
2024-07-31T07:11:36.885Z 7a24ab3f-13c8-4284-980f-6b705d64273a Task timed out after 307.11 seconds
END RequestId: 7a24ab3f-13c8-4284-980f-6b705d64273a
REPORT RequestId: 7a24ab3f-13c8-4284-980f-6b705d64273a Duration: 307106.33 ms Billed Duration: 300000 ms Memory Size: 128 MB Max Memory Used: 128 MB Init Duration: 4043.97 ms
```
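The REPORT line shows `Max Memory Used: 128 MB` against a configured `Memory Size: 128 MB`, so peak memory reached the configured size during this invocation, which may be related to the hang. A small parser can flag that condition when scanning logs; this is a sketch with a hypothetical `parse_report_memory` helper, assuming the REPORT line format shown above.

```python
import re

# Matches the memory fields in a Lambda REPORT log line, e.g.
# "... Memory Size: 128 MB Max Memory Used: 128 MB ..."
_MEMORY_RE = re.compile(r"Memory Size: (\d+) MB\s+Max Memory Used: (\d+) MB")


def parse_report_memory(report_line):
    """Return (configured_mb, max_used_mb) from a REPORT line, or None if absent."""
    match = _MEMORY_RE.search(report_line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def memory_exhausted(report_line):
    """True when the peak memory used reached the configured memory size."""
    parsed = parse_report_memory(report_line)
    return parsed is not None and parsed[1] >= parsed[0]
```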
I’m seeking advice on resolving this issue.