So I need some advice on how to do this. I have two streams:
stream_1
- node_id
- price
- timestamp
stream_2
- node_id
- value
- timestamp
these two streams are independent of each other, only the node_id
is shared. I would like to join the streams together by node_id
, then for each interval (that can be weeks or longer) I would like to calculate price
x value
x delta_t
(where the delta_t
is the amount of seconds between each the start of the interval and the end of the interval).
What is the best way to achieve this?
I tried Kafka Streams using JoinWindows.ofTimeDifferenceAndGrace
but this needs a duration, but the corresponding message might be very late.
I also tried KSQLDB using a stream-stream join (using within
), but then it calculates all permutations between two timestamp, but I only need to calculate it once per interval.
I also tried to use a stream-table join with LATEST_BY_OFFSET
of the stream2 in order to calculate it based on the latest available price per node_id, but this seems to add delays I think.
Here’s a visualization:
stream1: ---x-----x-------x--- --> 3 messages
stream2: ------x-----x-------x --> 3 messages
result0: ---I--I--I--I----I--I --> 5 intervals
Do I need to use Apache Flink to use count-based windows (using 2 as the limit)?
Any pointers appreciated! Thanks.