I have a question about Apache Flink's data processing.
I use Flink SQL to merge heterogeneous sensor data by leveraging the LISTAGG function.
I ran into trouble when I increased the parallelism of the SQL operator:
I encountered an out-of-order changelog problem, where unexpected records are produced from intermediate aggregation states.
If the data were numeric, these records could be ignored.
But since the data is a JSON string, the number of keys in the JSON changes, which makes the result hard to use.
For example, I expect three member keys in the JSON, but some records contain only two members because of -U (UPDATE_BEFORE) records.
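To illustrate with made-up ids (sensorA, sensorB and sensorC are placeholders, not my real keys):

expected final value (all three ids merged): {"sensorA": 1.0, "sensorB": 2.0, "sensorC": 3.0}
value that sometimes reaches the sink (an intermediate aggregate): {"sensorA": 1.0, "sensorB": 2.0}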
Is this an inevitable part of stream processing?
Or, could you suggest a way to filter out these unexpected results?
Or, should I keep the parallelism at 1 so that the changelog stays strictly ordered?
SQL QUERY
CREATE TABLE `issue-table` (
    ...
) WITH (
    'connector' = 'upsert-kafka',
    'key.format' = 'json',
    'value.format' = 'json'
    ...
);

INSERT INTO `issue-table`
SELECT
    uuid(),
    MAX(time) AS time,
    '{' || LISTAGG(CONCAT('"' || id || '"', ':', data)) || '}' AS data,
    CASE WHEN ...
         THEN ...
         WHEN ...
         ELSE ...
    END
FROM (
    ...
) WHERE row = 1
GROUP BY id, count;
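In case it is relevant: I came across the sink upsert materialization option, and I am not sure whether it applies here. This is roughly how I would try it (a sketch only, assuming the 'table.exec.sink.upsert-materialize' option is available in my Flink version):

-- sketch: ask the planner to materialize and reorder the upsert changelog at the sink
SET 'table.exec.sink.upsert-materialize' = 'FORCE';
-- then run the same INSERT INTO `issue-table` ... as above

Would forcing this be the right direction, or does it not help with intermediate aggregation results?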
Thanks for your advice
I tried to implement a filtering function, but I cannot find a common feature to filter on.
The -U records appear at unexpected times.
If I keep the parallelism at 1, it limits scalability.
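To show roughly what I mean by a filtering function (a sketch only; merged_view and the sensor ids are placeholders, not my real identifiers), I tried checks along these lines:

-- sketch: only forward rows whose JSON already contains every expected id
SELECT *
FROM merged_view
WHERE data LIKE '%"sensorA"%'
  AND data LIKE '%"sensorB"%'
  AND data LIKE '%"sensorC"%';

But I could not find a condition like this that reliably separates the intermediate results from the final ones.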