I’m trying to take data from a few sources, run some transformations on it, and load it into Kinesis using AWS Glue and Scala. The data comes from static sources like tables and S3 buckets, so it’s not a streaming ETL job. Currently I’m working with a DynamicFrame: I set up my data sink and simply call writeDynamicFrame, like so:
// some logic to set up a source and run transformations,
// ending up with a DynamicFrame called myDynamicFrame

val kinesis = glueContext.getSinkWithFormat(
  connectionType = "aws-kinesis",
  options = JsonOptions(
    Map(
      "streamArn" -> "arn:aws:kinesis:xxxxxxxxxxx/sink-stream",
      "startingPosition" -> "TRIM_HORIZON",
      "inferSchema" -> "true"
    )
  )
)
kinesis.writeDynamicFrame(myDynamicFrame)
My thought was that this would take the data from the DynamicFrame and push it into Kinesis; instead, I get this error:
Exception in User Class: java.lang.IllegalStateException : org.apache.spark.sql.connector.kinesis.KinesisV2TableProvider does not allow create table as select.
I’m using Glue version 4, and the documentation says you can specify Kinesis as a sink: https://docs.aws.amazon.com/glue/latest/dg/glue-etl-scala-apis-glue-gluecontext.html#glue-etl-scala-apis-glue-gluecontext-defs-getSinkWithFormat
There is some other documentation that talks about creating a writer from a DataFrame and using a foreachBatch method, but those examples look like they refer to jobs where Kinesis is the source and the job is a streaming ETL job. I wouldn’t think mine is, since we’re getting the data in batches from S3, and if you attempt writeStream on a static DataFrame it throws an error because it expects a streaming DataFrame.
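For reference, the pattern I saw in those docs looks roughly like this (my paraphrase, not my actual code; the option names and stream source are assumptions from the streaming examples). As far as I can tell, writeStream/foreachBatch only compile and run against a streaming DataFrame, which is why it fails on my static one:

```scala
import org.apache.spark.sql.DataFrame

// Roughly the streaming-docs pattern (paraphrased). streamingDF here
// would come from a streaming source such as glueContext.getSource on
// Kinesis; calling .writeStream on a static (batch) DataFrame instead
// throws an AnalysisException saying writeStream can only be called on
// a streaming Dataset/DataFrame.
streamingDF.writeStream
  .foreachBatch { (batchDF: DataFrame, batchId: Long) =>
    // per-micro-batch write logic for the sink goes here
    batchDF.write.format("console").save()
  }
  .start()
  .awaitTermination()
```

So this route seems to assume a streaming job, which is what makes me unsure it applies to my batch case.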
Also, if it helps: Scala 2.12.19, Spark 3.3, and Glue v4.