I have AWS Glue shell script (python) which is copying data from S3 into dataframe, transform it and then push it to RedShift. This is how i am writing to redshift
<code>glueContext.write_dynamic_frame.from_jdbc_conf(
frame = DynamicFrame.fromDF(df_subset, glueContext, "zendesk_tickets_data"),
catalog_connection = "redshift_connection",
connection_options = {"dbtable": "public.zendesk_tickets_data", "database": "<db_name>"},
redshift_tmp_dir = "s3://<s3_bucket_for_glue>/<etl_folder_name>/"
)
</code>
<code>glueContext.write_dynamic_frame.from_jdbc_conf(
frame = DynamicFrame.fromDF(df_subset, glueContext, "zendesk_tickets_data"),
catalog_connection = "redshift_connection",
connection_options = {"dbtable": "public.zendesk_tickets_data", "database": "<db_name>"},
redshift_tmp_dir = "s3://<s3_bucket_for_glue>/<etl_folder_name>/"
)
</code>
glueContext.write_dynamic_frame.from_jdbc_conf(
frame = DynamicFrame.fromDF(df_subset, glueContext, "zendesk_tickets_data"),
catalog_connection = "redshift_connection",
connection_options = {"dbtable": "public.zendesk_tickets_data", "database": "<db_name>"},
redshift_tmp_dir = "s3://<s3_bucket_for_glue>/<etl_folder_name>/"
)
When i look at the logs i see
<code>24/08/07 23:43:30 INFO RedshiftWriter: Loading new Redshift data to: "public"."zendesk_tickets_data"
24/08/07 23:43:30 INFO RedshiftWriter: Creating table within Redshift: "public"."zendesk_tickets_data"
</code>
<code>24/08/07 23:43:30 INFO RedshiftWriter: Loading new Redshift data to: "public"."zendesk_tickets_data"
24/08/07 23:43:30 INFO RedshiftWriter: Creating table within Redshift: "public"."zendesk_tickets_data"
</code>
24/08/07 23:43:30 INFO RedshiftWriter: Loading new Redshift data to: "public"."zendesk_tickets_data"
24/08/07 23:43:30 INFO RedshiftWriter: Creating table within Redshift: "public"."zendesk_tickets_data"
But this table already exist. After this try, the glue job fails and the error in STL_LOAD_ERRORS
says that String length exceeds DDL length...
little big more context
- The data in S3 has been exported via AppFlow in JSON Format.
- I have verify
redshift_connection
from Glue UI and it tests out okay. - The crawler has created respective tables in the Glue Database.