I have a Kinesis stream with EventBridge as the producer and Kinesis Data Firehose as the consumer. Firehose converts the event JSON into Parquet files, which can be queried through an Athena table.
This is done with aws_cdk for Python.
I also want to enable the s3_backup option to keep the raw data. However, the JSON records in the backup bucket are not separated by newlines.
I have tried:
from aws_cdk import aws_kinesisfirehose

# Processor that should append a newline delimiter to every record
processing_configuration = aws_kinesisfirehose.CfnDeliveryStream.ProcessingConfigurationProperty(
    enabled=True,
    processors=[
        aws_kinesisfirehose.CfnDeliveryStream.ProcessorProperty(
            type="AppendDelimiterToRecord",
            parameters=[
                aws_kinesisfirehose.CfnDeliveryStream.ProcessorParameterProperty(
                    parameter_name="Delimiter",
                    parameter_value="\n",
                ),
            ],
        )
    ],
)
This goes in the processing configuration of the extended_s3_destination config:
extended_s3_destination_config = aws_kinesisfirehose.CfnDeliveryStream.ExtendedS3DestinationConfigurationProperty(
    bucket_arn=destination_bucket.bucket_arn,
    role_arn=firehose_role.role_arn,
    prefix="events/year=!{timestamp:yyyy}/month=!{timestamp:MM}/day=!{timestamp:dd}/",
    error_output_prefix="errors/",
    buffering_hints=buffering_hints_config,
    data_format_conversion_configuration=data_format_conversion_config,
    processing_configuration=processing_configuration,
    s3_backup_mode="Enabled",
    s3_backup_configuration=backup_configuration,
)
And this is the backup configuration:
backup_configuration = aws_kinesisfirehose.CfnDeliveryStream.S3DestinationConfigurationProperty(
    bucket_arn=backup_raw_bucket.bucket_arn,
    buffering_hints=buffering_hints_config,
    compression_format="UNCOMPRESSED",
    role_arn=firehose_role.role_arn,
    prefix="backup/year=!{timestamp:yyyy}/month=!{timestamp:MM}/day=!{timestamp:dd}/",
    error_output_prefix="errors/",
)
Is it possible to add a newline character at the end of every record in backup mode? The Parquet output works perfectly and I can query it with Athena, but the “raw” backup records are concatenated one after the other.
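For reference, the concatenated backup can still be parsed on the reading side, so the data is not lost. Below is a minimal sketch of that stopgap, assuming a backup object is a plain UTF-8 string of back-to-back JSON documents. The function name `split_concatenated_json` and the sample payload are hypothetical and not part of my stack:

```python
import json


def split_concatenated_json(blob: str) -> list:
    """Split a string of back-to-back JSON documents into Python objects."""
    decoder = json.JSONDecoder()
    records = []
    pos = 0
    while pos < len(blob):
        # raw_decode does not skip leading whitespace, so do it manually
        while pos < len(blob) and blob[pos].isspace():
            pos += 1
        if pos >= len(blob):
            break
        # raw_decode returns the parsed object and the index where it ended
        obj, pos = decoder.raw_decode(blob, pos)
        records.append(obj)
    return records


# Hypothetical sample mimicking two events written without a delimiter
sample = '{"id": 1, "detail": {"a": "x"}}{"id": 2, "detail": {"a": "y"}}'
records = split_concatenated_json(sample)

# Re-serialize as newline-delimited JSON, one record per line
ndjson = "\n".join(json.dumps(r) for r in records) + "\n"
```

This only works around the symptom after the fact. What I actually want is for Firehose to write the delimiter itself.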
Why is the processing configuration not splitting by newline?