I have been writing a CloudFormation stack in YAML and deploying it to AWS (for legacy reasons I cannot switch to CDK, unfortunately ;)).
The following YAML is part of that CloudFormation stack. It creates a Glue job that loads its ETL script (transform_json_to_parquet.py) from an S3 bucket (see the ScriptLocation line below).
A major limitation of this approach is that it expects transform_json_to_parquet.py to already be present in S3-bucket-name-1, so I have to upload the file to that bucket manually.
Is there any way to have transform_json_to_parquet.py uploaded automatically when I deploy the CloudFormation stack to AWS?
TransformJsonDataJob:
  Type: "AWS::Glue::Job"
  Properties:
    Role: !Ref AWSGlueETLJobRole
    Name: "TransformJsonToParquet"
    Description: "Transform JSON to Parquet"
    Timeout: 5
    WorkerType: G.1X
    NumberOfWorkers: 2
    MaxRetries: 0
    Command:
      "Name": "glueetl"
      "ScriptLocation": !Sub s3://<S3-bucket-name-1>/transform_json_to_parquet.py
    DefaultArguments:
      "--s3_json_path": !Sub s3://<S3-bucket-name-2>/
      "--s3_parquet_path": !Sub s3://<S3-bucket-name-3>/
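
One possible direction, sketched under assumptions rather than tested against this stack: the AWS CLI's `aws cloudformation package` command uploads local artifacts referenced in a template and rewrites those references to S3 URLs, and the `Command.ScriptLocation` property of `AWS::Glue::Job` is among the properties it handles. The file names `template.yaml` and `packaged.yaml`, the `<stack-name>` placeholder, and the relative script path below are hypothetical.

```yaml
# Sketch only: ScriptLocation points at a local file instead of an S3 URL.
# Assumed workflow (file and stack names are hypothetical):
#   aws cloudformation package --template-file template.yaml \
#       --s3-bucket <S3-bucket-name-1> --output-template-file packaged.yaml
#   aws cloudformation deploy --template-file packaged.yaml \
#       --stack-name <stack-name>
# (add --capabilities to the deploy command if the stack creates IAM resources)
TransformJsonDataJob:
  Type: "AWS::Glue::Job"
  Properties:
    Role: !Ref AWSGlueETLJobRole
    Name: "TransformJsonToParquet"
    Command:
      Name: "glueetl"
      # Local path, assumed relative to the template file;
      # `package` replaces it with the s3:// URL of the uploaded script.
      ScriptLocation: ./transform_json_to_parquet.py
```

With this layout the script would live next to the template in source control, and the manual upload would be replaced by the `package` step run before each deploy.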