I'm on an old Flink release (1.71), but I don't think this is version-related. I'm using the streaming-API FileSink to write Avro GenericRecords to Parquet files. It works, but I get one file per record (4 files for 4 records), even though the checkpoint interval is 10 s and all records arrive at the same time. How do I control the number of records in each file?
import org.apache.avro.generic.GenericRecord;
import org.apache.flink.connector.file.sink.FileSink;
import org.apache.flink.formats.parquet.avro.AvroParquetWriters;
import org.apache.flink.streaming.api.functions.sink.filesystem.rollingpolicies.OnCheckpointRollingPolicy;

// Bulk-encoded Parquet sink; OnCheckpointRollingPolicy rolls the
// in-progress part file on every checkpoint.
FileSink<GenericRecord> sink = FileSink
        .forBulkFormat(outputPath, AvroParquetWriters.forGenericRecord(avroSchema))
        .withRollingPolicy(OnCheckpointRollingPolicy.build())
        //.withBucketAssigner(...)
        .build();
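
For context, this is roughly how I attach the sink and enable checkpointing (a simplified sketch: mySource() is a placeholder for my real source that produces the 4 records, and the job name is arbitrary):

import org.apache.avro.generic.GenericRecord;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.enableCheckpointing(10_000L); // 10 s interval, so files should roll at most every 10 s

DataStream<GenericRecord> records = mySource(env); // placeholder for my actual source
records.sinkTo(sink);
env.execute("parquet-sink-job");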