With Apache Beam I am trying to create one window which slides over an bounded dataset. The dataset contains, together with other fields like Country, a timestamp field and is sorted on this timestamp.
The goal is to determine, for every event in the dataset, the amount of events to the same country in the last hour.
My current approach would be to have a some dictionary which is updated when an event enters the window, e.g. my_dict["usa"] += 1
. And when this particular event leaves the window the dictionary is also updated. How would one do this in Apache Beam?
I tried using the different Windows in Apache Beam. But none of them seem to fit to this task. Mainly due to the fact that the windows group based on specified timestamps instead of event timestamps.
stacksper is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.