I would like to display some events in ‘real time’. However, I must fetch the data from another source. I can request the last X minutes of data, but the source is only updated approximately every 5 minutes, so there will be a delay between the most recent data retrieved and the point in time at which I make the request.
Second, because I will be receiving a batch of data, I don’t want to just fire all the events down a socket as soon as my fetcher has retrieved them: I would like to spread the events out so that they are both accurately spaced relative to each other and in sync with their original occurrences (e.g. an event is always displayed 6 minutes after it actually happened).
My thought is to fetch the data every 5 minutes from the source, accepting that I won’t get the very latest data. Each event would then be queued to be sent down the socket 7.5 minutes after its original timestamp – that is, at least ~2.5 minutes after its batch was fetched and at most 7.5 minutes after.
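A minimal sketch of that queueing scheme (the heap-based queue and the names here are my own illustration, not a prescribed implementation):

```python
import heapq

# Assumed constants from the description above.
FETCH_INTERVAL = 5 * 60   # fetch a new batch every 5 minutes
DELAY = 7.5 * 60          # emit each event 7.5 minutes after its own timestamp

event_queue = []          # min-heap ordered by scheduled emit time

def enqueue_batch(events):
    """events: iterable of (original_timestamp, payload) pairs from one fetch."""
    for ts, payload in events:
        heapq.heappush(event_queue, (ts + DELAY, payload))

def emit_due(now, send):
    """Send every queued event whose scheduled emit time has arrived."""
    while event_queue and event_queue[0][0] <= now:
        _, payload = heapq.heappop(event_queue)
        send(payload)
```

A loop would then call `emit_due(time.time(), socket_send)` frequently (say, once a second), while the fetcher calls `enqueue_batch` every 5 minutes.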
My question is this: is this the best way to approach the problem? Does this problem have any standard approaches or associated literature related to implementation best-practices and edge cases?
I am a bit worried that the frequency of my fetches and the frequency with which the source is updated will drift out of sync, leading to points where no data is retrieved from the source. However, since my socket delay (7.5 minutes) is greater than my fetch interval (5 minutes), the subsequent fetch should retrieve newer data before the socket queue is empty.
Is that correct? Am I missing something?
Thanks!
This seems similar to something I have been dealing with. In my case: queries that can take minutes to finish (due to complexity), and requests that should get a workable response as soon as possible.
It seems you have a deliberate delay before broadcasting the data, probably to cater to the ‘real-time’ expectation despite the source’s infrequent updates.
We went with a “give me what you got” solution, where the server would put the request in a holding state until at least one record was available.
Then we would process the result (one or more records) and poll for the next available set of records, which might or might not be streamed from another source to that server.
Problems solved on our side:
1: Slow connections / a lot of time before a connection is actually established
2: Slow response / a lot of time before one or more records are available.
If establishing the connection takes a long time and the source has streamed some number of records in the meantime (one to N), we get whatever is there.
If the connection is established quickly, we get at least one and maybe N records.
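For what it’s worth, the holding pattern described above might be sketched like this (a condition-variable version; the buffer and function names are just illustrative):

```python
import threading

buffer = []                      # records delivered by the upstream source
cond = threading.Condition()

def producer_add(record):
    """Called whenever the source delivers a record."""
    with cond:
        buffer.append(record)
        cond.notify_all()

def give_me_what_you_got(timeout=None):
    """Block until at least one record exists, then drain the buffer."""
    with cond:
        while not buffer:
            if not cond.wait(timeout=timeout):
                return []        # timed out with nothing available
        got = buffer[:]
        buffer.clear()
        return got
```

A caller that connects slowly simply finds more records waiting; a fast caller blocks until the first record arrives.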
In your case, you might like to build an internal timer and an internal Data Store that you can query, and then clean the Data Store of all records older than X minutes.
You fill the Store with whatever your Data Source provides, then retrieve and clear whatever matches your criteria.
Any objects / records that are still “too fresh” are simply ignored, and you can run this process without hard dependencies between the retrieval and broadcasting sides.
Simply get what you can get, and broadcast anything that passes your filter (in two separate processes).
Since your records are time-stamped and already provide hard data on their point of creation, you can keep this quite simple.
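As a rough sketch of that store-and-filter split (the age thresholds here are assumptions you would tune to your 7.5-minute delay):

```python
MIN_AGE = 7.5 * 60   # only broadcast records at least this old
MAX_AGE = 60 * 60    # anything older than this is treated as stale

store = []           # (original_timestamp, record) pairs

def fill(records):
    """Retrieval side: dump in whatever the Data Source provided."""
    store.extend(records)

def drain(now):
    """Broadcast side: take records matching the age filter.

    Records still too fresh stay in the store; stale records are dropped.
    """
    ready = [r for ts, r in store if MIN_AGE <= now - ts <= MAX_AGE]
    store[:] = [(ts, r) for ts, r in store if now - ts < MIN_AGE]
    return ready
```

Because the records carry their own creation timestamps, `fill` and `drain` can run on independent schedules without coordinating with each other.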