In spark streaming (using Java), I am trying to print the start offset and and last offset of a specific batch taken from Kafka along with the batch related metrics (like processingStartTime
, processingEndTime
etc.).
Example log below
Batch Id: <someBatchId>, Beginning Offset: <someValue>, Last
Offset: <someValue>, Start Time: <processingStartTime>, End Time:
<processingEndTime>
My plan is use to use the StreamingListener
and print the offsets in onBatchCompleted
but later realized that this has BatchInfo
which has nothing to do with the Kafka offsets.
If there was a batch id or any other unique identifier for the batch, then having different listeners (one which prints the BatchInfo
and the other that prints the offsets
but both the listeners would print the same batch id) would also do the trick for me. However, I can’t seem to find any such connecting id anywhere.
I have put the MCVE code here – https://github.com/ravitechy/spark-kafka-tutorial/. The initializer class is https://github.com/ravitechy/spark-kafka-tutorial/blob/main/src/main/java/org/ravi/starter/StartKafkaStreaming.java
To push events to Kafka topic, I am using the kafka-console-producer.sh
bundled with the Kafka installation. For now, I have only partition in the kafka topic.