Flink: Performance Difference between Table- and DataStream-API
Let us assume that I have an operation that I can write easily with either API in PyFlink (e.g. a sum of a column over a TumblingWindow). Are there any performance differences between using the predefined Table API commands and manually implementing the sum in Python as a ProcessWindowFunction?
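Both variants compute the same result; what differs is where the work runs. A built-in Table API aggregate such as `sum` is executed by Flink's own operators inside the JVM, whereas a Python `ProcessWindowFunction` runs in a separate Python worker process, so records have to be serialized between the JVM and Python. A `ProcessWindowFunction` on its own also buffers every element of a window until the window fires, while a built-in sum can be updated incrementally. For these reasons the built-in aggregate is generally expected to be faster, although the actual gap depends on the job and is worth measuring. The window semantics both APIs implement can be sketched in plain Python (this is not PyFlink code; the `(key, timestamp_ms, value)` event layout and the window size are assumptions for illustration):

```python
from collections import defaultdict


def tumbling_window_sum(events, size):
    """Sum values per (key, window) over fixed, non-overlapping windows.

    events: iterable of (key, timestamp_ms, value) tuples (assumed layout)
    size:   window length in milliseconds
    Returns a dict mapping (key, window_start) to the summed value.
    """
    sums = defaultdict(int)
    for key, ts, value in events:
        # Each event falls into exactly one window [start, start + size)
        start = ts - (ts % size)
        sums[(key, start)] += value
    return dict(sums)


events = [("a", 1_000, 3), ("a", 4_000, 4), ("b", 2_000, 1), ("a", 5_000, 10)]
print(tumbling_window_sum(events, 5_000))
# {('a', 0): 7, ('b', 0): 1, ('a', 5000): 10}
```

In Flink, the Table API version keeps the running sum per key and window in JVM state, which is roughly what the `sums` dictionary models here.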
How to build a JsonRowSerializationSchema in PyFlink?
I’m new to Flink and Kafka, and I’m trying to create a Kafka sink to send a string in JSON format to Kafka. To achieve this, I need to build a JsonRowSerializationSchema. Here’s what I’ve tried so far:
java.lang.ClassCastException: class [B cannot be cast to class org.apache.flink.types.Row
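`[B` is the JVM's name for `byte[]`. `JsonRowSerializationSchema` expects every stream element to be a `Row` matching the type passed to `with_type_info(...)`. When a PyFlink transformation such as `map` is called without an `output_type`, its results are pickled and reach the JVM as byte arrays, which is a likely cause of this exception; passing an `output_type` built with `Types.ROW_NAMED(...)` that matches the schema's type info is the usual fix. If the records are plain Python values anyway, an alternative is to build the JSON string yourself in a `map` with `output_type=Types.STRING()` and send it with `SimpleStringSchema`. The helper below sketches only that string-building step; `row_to_json` and its field names are hypothetical, and it merely mirrors the general shape (one JSON object keyed by field name) of what a row-based JSON serializer produces:

```python
import json


def row_to_json(field_names, values):
    """Render one record as a compact JSON object keyed by field name.

    Hypothetical helper: its output string could be sent to Kafka with a
    plain string serializer instead of a row-based JSON schema.
    """
    if len(field_names) != len(values):
        raise ValueError("field_names and values must have the same length")
    # Compact separators avoid extra whitespace in the Kafka payload
    return json.dumps(dict(zip(field_names, values)), separators=(",", ":"))


print(row_to_json(["id", "name"], [1, "alice"]))
# {"id":1,"name":"alice"}
```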
I use Apache PyFlink 1.18.1. The input data type from the Apache Flink Kafka source looks like this: