How can I handle timeouts in a network application?
I’m implementing a provisioning system on a Linux server. The code is huge, so I’ll describe the algorithm instead. It works like this:
- Read provisioning commands from file
- Send it to another server using TCP
- Save the request in a hash
- Receive the response, then:
  - if a successful response is received, remove the request from the hash
  - if a failed response is received, retry the message
The problem is that when the program receives no response at all (a timeout), the request waits for a response forever and is never retried.
Please note that I’ll be sending hundreds of commands, and I have to monitor timeouts for all of them.
I tried using timers, but that didn’t help because I’ll end up with so many waiting timers and I’m not sure if this is a good way of doing this.
The question is how can I save the message in some data structure and check to remove or retry it later when there is no response from the other end?
Please note that I’m willing to change the algorithm to anything you suggest that could deal with the timeouts.
I’ll end up with so many waiting timers and I’m not sure if this is a good way of doing this.
Lots of timers are generally not a problem. Internally, your OS is “doing the right thing”: it sets one timer for the next event and keeps the rest on a sorted list.
If you want to reduce the number of timers, round your timeouts to the nearest second (or N seconds) and re-use the timers you’ve already set. But this requires a more complex data structure.
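One way to picture the rounding idea is to group requests into buckets, one bucket (and one timer) per rounded deadline. This is only a sketch under assumptions: `GRANULARITY`, `schedule` and `expire` are made-up names, deadlines are plain numbers in seconds, and I round up rather than to the nearest second so that a timeout never fires early.

```python
import math
from collections import defaultdict

GRANULARITY = 1.0  # bucket width in seconds (assumed value)

# bucket number -> set of request ids that time out in that bucket
buckets = defaultdict(set)

def bucket_for(deadline):
    # Round the deadline up to the next bucket boundary.
    return math.ceil(deadline / GRANULARITY)

def schedule(req_id, deadline):
    """Put a request in its bucket.

    Returns (bucket, need_timer). need_timer is True only for the first
    request in a bucket, i.e. when the caller has to arm a new timer.
    """
    bucket = bucket_for(deadline)
    need_timer = bucket not in buckets
    buckets[bucket].add(req_id)
    return bucket, need_timer

def expire(bucket):
    """Called when a bucket's timer fires; returns the ids that timed out."""
    return buckets.pop(bucket, set())
```

When a response arrives you would also have to remove the id from its bucket (or check the hash before retrying), which is part of why this is more bookkeeping than a plain sweep.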
I’m using hash, sweeping structure here doesn’t work
Huh? Your hash should allow you to enumerate the keys. Then you look up each key in the hash, looking for things that haven’t been retried in a while (informally called “sweeping”). If you only have “hundreds” of events, sweeping the list won’t be a performance problem.
(In fact, if we’re only talking “hundreds”, I’d skip all the timer stuff entirely. Just set up one timer to fire every 5 seconds. Sweep the structure and do any retries needed. It’s going to be less code, and not be a performance problem.)
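A rough sketch of that single-timer sweep, in Python for brevity. Everything named here is an assumption for illustration: the `pending` dict stands in for your hash, `TIMEOUT` and `MAX_RETRIES` are arbitrary values, and `resend` is whatever callback actually writes the message to the TCP socket.

```python
import time

TIMEOUT = 5.0     # seconds to wait for a response (assumed value)
MAX_RETRIES = 3   # give up after this many retries (assumed value)

# request id -> {"msg": ..., "sent_at": ..., "retries": ...}
pending = {}

def track(req_id, msg, now=None):
    """Record a request right after sending it."""
    sent_at = time.monotonic() if now is None else now
    pending[req_id] = {"msg": msg, "sent_at": sent_at, "retries": 0}

def sweep(resend, now=None):
    """Walk the whole hash once and retry or drop anything overdue.

    Meant to be called from one periodic timer. Returns the ids that
    were retried and the ids that were dropped.
    """
    now = time.monotonic() if now is None else now
    retried, dropped = [], []
    # Copy the items so entries can be deleted while iterating.
    for req_id, entry in list(pending.items()):
        if now - entry["sent_at"] < TIMEOUT:
            continue  # still within its timeout
        if entry["retries"] >= MAX_RETRIES:
            del pending[req_id]
            dropped.append(req_id)
        else:
            entry["retries"] += 1
            entry["sent_at"] = now
            resend(req_id, entry["msg"])
            retried.append(req_id)
    return retried, dropped
```

When a successful response arrives you just delete the id from `pending`, exactly as you do now; the sweep never sees it again.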
Alternatively, keep a sorted list of your events (in addition to the hash). The list can contain just the keys to the hash, or pointers into the hash. Then when a timer hits, pop off any events that need to be re-sent. Stop popping when the next event isn’t due yet, and set another timer for that event’s time-out.
Be sure to think about how you will gracefully shut down the service.
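The sorted-list variant could look roughly like this, using a min-heap as the sorted structure. Again a sketch, not a drop-in implementation: `pending`, `deadlines`, `track`, `acknowledge` and `pop_due` are names I made up, and times are passed in explicitly to keep the example simple.

```python
import heapq

TIMEOUT = 5.0    # seconds to wait for a response (assumed value)

pending = {}     # the hash: request id -> (msg, deadline)
deadlines = []   # min-heap of (deadline, request id), earliest first

def track(req_id, msg, now):
    """Record a request (or a retry) with a fresh deadline."""
    deadline = now + TIMEOUT
    pending[req_id] = (msg, deadline)
    heapq.heappush(deadlines, (deadline, req_id))

def acknowledge(req_id):
    """A response arrived: remove from the hash only.

    The heap entry is left behind and skipped later, which avoids
    searching the heap.
    """
    pending.pop(req_id, None)

def pop_due(now):
    """Pop everything whose deadline has passed.

    Returns (due_ids, next_deadline). next_deadline is when the single
    timer should fire next, or None if nothing is waiting. It may point
    at a stale entry, in which case the timer just fires once for nothing.
    """
    due = []
    while deadlines and deadlines[0][0] <= now:
        deadline, req_id = heapq.heappop(deadlines)
        entry = pending.get(req_id)
        if entry is None or entry[1] != deadline:
            continue  # already answered, or re-sent with a newer deadline
        due.append(req_id)
    next_deadline = deadlines[0][0] if deadlines else None
    return due, next_deadline
```

For each id in `due_ids` you would re-send the message and call `track` again, which pushes a new deadline; only one timer is ever armed, for `next_deadline`.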