Considering a pipeline Apache Beam with 2 parallel transformations
# Transformation 1
p | read_from_pubsub_subscription_1 | save_current_state | write_to_pubsub
# Transformation 2
p | read_from_pubsub_subscription_2 | enrich_output_with_last_STATE | write_to_pubsub
Transformation 1 just pull messages from a queue and save last one as current state. This state should be a sort of a cache with key/values. The last state is just one per type (around 1000 in total).
Transformation 2 groups messages by their type, and look if the shared state contains that type and enrich the message with the latest state.
“Tranformation 2” should group messages by their type and enrich them in a time window considering the state (The current state written by Transformation 1).
Is it possible to consider a key/value map that is shared between those flows of execution?
Is there a pattern for this part or examples?
How would I astract this?
Is Apache Beam the correct tool for pulling messages from pubsub, share a state and push the result in another queue?