My end goal is to trigger a Step Functions flow where the first step is a Map state and its input request is larger than 256KB.
[Image: my current Step Functions flow]
{
  "file_secrets": [
    {
      "sector": "composition",
      "id": "i232kj34nk2dajv7",
      "category": "Continuation of the composition"
    },
    {
      "sector": "tech",
      "id": "i2823kj42k3v3hdjv7",
      "category": "Continuation of the composition"
    },
    {
      "sector": "finance",
      "id": "i28d9asd93dhv3hdjv7",
      "category": "Continuation of the composition"
    },
    ...
  ],
  "file_id": "23472837fd2837f8fgd28",
  "admin": true
}
That is how my flow looks and how the input request JSON file is structured.
- I want the initial Map state to receive an input to iterate over that is larger than 256KB (the payload limit).
- I want the Map state to iterate over each element of the file_secrets array and run its child/next states for each of those elements, in parallel.
What I've understood from the official Step Functions/Lambda payload-limit workaround in the AWS documentation is to use S3 with a Distributed Map state, where the flow basically goes as follows (a rough sketch is below, after the input example):
- Instead of triggering the state machine with the large payload, you first upload the payload to an S3 bucket.
- You then trigger the state machine with an input request containing only the bucket name and the object key of the file.
- The Map state retrieves the file from the S3 bucket and is configured to iterate over a specific field.
- The Map state then passes each element of its S3 input, one at a time, as the input to its child/next states.
So the input to trigger the state machine becomes just a pointer to the file:
{
  "s3": {
    "bucket": "my-test-state-bucket",
    "key": "big-input.json"
  }
}
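Here is a rough sketch of how I understood that setup in ASL (state and field names are my own placeholders; ProcessSecret stands in for the real child states):

{
  "Comment": "Sketch: Distributed Map that reads the big JSON file straight from S3",
  "StartAt": "MapOverSecrets",
  "States": {
    "MapOverSecrets": {
      "Type": "Map",
      "ItemReader": {
        "Resource": "arn:aws:states:::s3:getObject",
        "ReaderConfig": { "InputType": "JSON" },
        "Parameters": {
          "Bucket.$": "$.s3.bucket",
          "Key.$": "$.s3.key"
        }
      },
      "ItemProcessor": {
        "ProcessorConfig": { "Mode": "DISTRIBUTED", "ExecutionType": "STANDARD" },
        "StartAt": "ProcessSecret",
        "States": {
          "ProcessSecret": { "Type": "Pass", "End": true }
        }
      },
      "End": true
    }
  }
}

As far as I can tell, with InputType set to JSON the ItemReader expects the S3 object itself to be a top-level JSON array, which turns out to be exactly where things went wrong for me.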
But… it didn't work.
I encountered an issue when trying to access the file_secrets array from the JSON file retrieved from S3. Specifically, I attempted to configure the Map state's InputPath using various paths such as $.file_secrets, file_secrets[*], and $, but I consistently received errors like Attempting to map over non-iterable node and Invalid path '$.file_secrets': No results for path: $['file_secrets'].
To troubleshoot this, I explored the AWS documentation further and discovered the ItemsPath field, which is designed to specify the JSON key field that contains the array to iterate over. However, there’s an important limitation: according to the documentation, ItemsPath can only be used in a Distributed Map
state if the JSON input is passed from a previous state in the workflow.
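For context, ItemsPath is the mechanism you would normally use when the array already arrives in the state input from a previous state; a minimal sketch of that case, again with placeholder names:

"IterateSecrets": {
  "Type": "Map",
  "Comment": "Sketch: inline Map over an array that is already in the state input",
  "ItemsPath": "$.file_secrets",
  "ItemProcessor": {
    "StartAt": "ProcessSecret",
    "States": {
      "ProcessSecret": { "Type": "Pass", "End": true }
    }
  },
  "End": true
}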
This presents a significant challenge:
- When using an S3 JSON file as input directly, it seems that the InputPath field does not support selecting and iterating over a nested array like file_secrets.
- If I wanted to use ItemsPath to target the nested array, I would need to pass the JSON data from a previous state in the workflow. However, doing so is subject to the 256KB payload limitation, which defeats the purpose of using the S3 workaround in the first place.
It works now, but… | Is this really all AWS has to offer?
My unwanted workaround for this limitation was to save to S3 a JSON file containing only the array itself, raw:
[
  {
    "sector": "composition",
    "id": "i232kj34nk2dajv7",
    "category": "Continuation of the composition"
  },
  {
    "sector": "tech",
    "id": "i2823kj42k3v3hdjv7",
    "category": "Continuation of the composition"
  },
  {
    "sector": "finance",
    "id": "i28d9asd93dhv3hdjv7",
    "category": "Continuation of the composition"
  },
  ...
]
Because of that, I also had to change the state machine's input request and pass the extra fields from the original input (the file_id and admin keys) into the input of each of the child/next states. I did it that way because I saw it was possible with the ResultPath field.
Here is my current input request:
{
  "s3": {
    "bucket": "my-test-state-bucket",
    "key": "big-input.json"
  },
  "file_id": "23472837fd2837f8fgd28",
  "admin": true
}
This then gets manipulated and passed into the child/next states. Here is an example of the input for a single iteration of the mapped next states:
{
  "file_id": "23472837fd2837f8fgd28",
  "admin": true,
  "Payload": {
    "Body": [
      {
        "sector": "composition",
        "id": "i232kj34nk2dajv7",
        "category": "Continuation of the composition"
      }
    ]
  }
}
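For what it's worth, the Map state also has an ItemSelector field that can merge the Map state's own input with the current array element, which might be a cleaner way to fan out file_id and admin to every iteration than the ResultPath manipulation I did. Something like this fragment would sit inside the Map state definition (the secret key name is just a placeholder):

"ItemSelector": {
  "file_id.$": "$.file_id",
  "admin.$": "$.admin",
  "secret.$": "$$.Map.Item.Value"
}

Each iteration would then receive an object with file_id, admin, and one element of the array under secret. This is only a sketch, though, not something I have verified end to end.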
So now that it's working, I still don't know how I feel about it.
Is this really all the AWS direct integrations have to offer, or is there something I'm not seeing or don't know about here?
Would love a second opinion.