I’m currently writing an email synchronizer application that synchronizes email to a SQL Server database.
One direction is not a problem: fetching folders (with their messages) from the server and writing them into my database.
What I’m missing, though, is the “reverse” direction: how can I detect which folders or messages have been deleted on the server (e.g. by an application other than mine) but are still present in my database?
How can I create such a “two-way” synchronization? I’ve thought of iterating through my current dataset and trying to find each message and folder on the server. If an item exists in my database but not on the server, I’d delete it from my database. But that doesn’t seem like the right way to me.
What is the common approach to this? It seems to be a rather language-agnostic task.
You are confused. Synchronizing two different instances of similar data requires knowing which version is correct if there is a discrepancy.
If an email message is on one side and not on the other, how do you know whether you should replay a “delete” on the one side, or an “add” on the one side? If email messages with the same ID differ in their content, which version should be propagated to the other side? Until you find a deterministic way of deciding, you can’t program a computer to solve it.
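One common way to make that decision deterministic (an assumption here, not something stated in the question) is to keep a snapshot of the IDs that were present on both sides after the last successful sync and compare all three sets. The sketch below uses plain Python sets and a hypothetical `plan_sync` function; it only classifies messages by the presence or absence of their IDs and does not resolve content conflicts for an ID that exists on both sides.

```python
from dataclasses import dataclass


@dataclass
class SyncPlan:
    delete_locally: set[str]
    delete_on_server: set[str]
    upload: set[str]
    download: set[str]


def plan_sync(base: set[str], local: set[str], server: set[str]) -> SyncPlan:
    """Hypothetical three-way comparison of message IDs.

    base   -- IDs present on both sides after the previous successful sync
    local  -- IDs currently in the local database
    server -- IDs currently on the mail server
    """
    # Present at the last sync but now gone from the server: deleted remotely.
    delete_locally = (base - server) & local
    # Present at the last sync but now gone locally: deleted locally.
    delete_on_server = (base - local) & server
    # Not in the snapshot at all: genuinely new on one side.
    upload = local - server - base
    download = server - local - base
    return SyncPlan(delete_locally, delete_on_server, upload, download)
```

With the snapshot, an ID missing from the server counts as a server-side delete only if it was there at the last sync; otherwise it is a new local message to upload.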
I think there are only two categories of algorithms:
- Those that rely on versioning and change logs. Basically: “give me all of the changes that happened since version #1354.”
- Those that perform an exhaustive comparison of each side’s data. This can be done in O(n) if both sides are sorted in the same order (e.g. by message ID), so that they can be walked together in a single merge-style pass.
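The single-pass comparison from the second category can be sketched as a merge over two ascending, duplicate-free ID lists. This is a minimal illustration with a hypothetical `diff_sorted` function; if the server happens to be IMAP (an assumption, the question does not say), message UIDs within one folder are assigned in ascending order and make a natural sort key.

```python
from typing import Iterator


def diff_sorted(local: list[int], server: list[int]) -> Iterator[tuple[str, int]]:
    """Walk two ascending, duplicate-free ID lists once: O(n + m)."""
    i = j = 0
    while i < len(local) and j < len(server):
        if local[i] == server[j]:
            # Present on both sides: nothing to report for this ID.
            i += 1
            j += 1
        elif local[i] < server[j]:
            # The server list has already passed this ID, so it is local-only.
            yield ("only_local", local[i])
            i += 1
        else:
            yield ("only_server", server[j])
            j += 1
    # Whatever remains in either list has no counterpart on the other side.
    for x in local[i:]:
        yield ("only_local", x)
    for x in server[j:]:
        yield ("only_server", x)
```

On its own this only reports differences; deciding whether an `only_local` ID was deleted on the server or added locally still needs a rule like the one discussed above.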
For a client-server problem the former will usually require less bandwidth.
Have the delete operation record a deleted flag (or date) rather than actually deleting the data on the server, so that clients can check whether a message has been deleted. Then have a periodic task that purges data that has been deleted for more than Y days, or track when each known client last synced and purge a deleted message only after every client that might be interested in it has synced.
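A minimal sketch of this soft-delete ("tombstone") approach with both purge policies, assuming a hypothetical in-memory `MessageStore` that stands in for a server-side table with a nullable deleted-at column. Timestamps are passed in explicitly so the behavior is easy to test; all names are illustrative.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Message:
    id: str
    body: str
    # Tombstone: set on delete instead of removing the row.
    deleted_at: Optional[datetime] = None


class MessageStore:
    """Hypothetical in-memory stand-in for the server-side message table."""

    def __init__(self) -> None:
        self._rows: dict[str, Message] = {}

    def add(self, msg_id: str, body: str) -> None:
        self._rows[msg_id] = Message(msg_id, body)

    def delete(self, msg_id: str, now: datetime) -> None:
        # Soft delete: keep the row and record when it was deleted.
        self._rows[msg_id].deleted_at = now

    def deleted_since(self, since: datetime) -> list[str]:
        # What a client asks for on sync: IDs deleted after its previous sync.
        return sorted(
            m.id for m in self._rows.values()
            if m.deleted_at is not None and m.deleted_at > since
        )

    def _purge_before(self, cutoff: datetime) -> int:
        # Physically remove tombstones recorded before the cutoff.
        stale = [k for k, m in self._rows.items()
                 if m.deleted_at is not None and m.deleted_at < cutoff]
        for k in stale:
            del self._rows[k]
        return len(stale)

    def purge(self, now: datetime, retention: timedelta) -> int:
        # Policy 1: drop tombstones older than a fixed window ("Y days").
        return self._purge_before(now - retention)

    def purge_seen_by_all(self, last_sync_by_client: dict[str, datetime]) -> int:
        # Policy 2: drop a tombstone once every known client has synced after it.
        if not last_sync_by_client:
            return 0
        return self._purge_before(min(last_sync_by_client.values()))
```

With the fixed window, a client that stays offline longer than the retention period can miss a delete and would need a full comparison instead; the per-client policy avoids that at the cost of tracking every client.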
The other way to do it is to act like version control: keep the current state plus a log of changes, where a delete is just another change. Then, when a client syncs, it says “give me all changes since this revision.”
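A minimal sketch of the change-log approach, assuming a hypothetical `VersionedStore` with a single, contiguous revision counter on the server and a client replica that remembers the last revision it applied. Log compaction and conflict handling are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    rev: int
    op: str        # "add" or "delete"
    msg_id: str
    body: str = ""


class VersionedStore:
    """Hypothetical server store: current state plus an append-only change log."""

    def __init__(self) -> None:
        self.current: dict[str, str] = {}
        self.log: list[Change] = []

    def _record(self, op: str, msg_id: str, body: str = "") -> int:
        rev = len(self.log) + 1  # revisions start at 1 and have no gaps
        self.log.append(Change(rev, op, msg_id, body))
        return rev

    def add(self, msg_id: str, body: str) -> int:
        self.current[msg_id] = body
        return self._record("add", msg_id, body)

    def delete(self, msg_id: str) -> int:
        del self.current[msg_id]
        return self._record("delete", msg_id)

    def changes_since(self, rev: int) -> list[Change]:
        # Revision r sits at index r - 1, so everything after rev starts at index rev.
        return self.log[rev:]


def apply_changes(replica: dict[str, str], changes: list[Change], last_rev: int) -> int:
    """Replay changes on the client and return the new last-applied revision."""
    for c in changes:
        if c.op == "add":
            replica[c.msg_id] = c.body
        else:
            replica.pop(c.msg_id, None)  # a delete is just another change
    return changes[-1].rev if changes else last_rev
```

A client starts from revision 0, calls `changes_since` with the revision it last stored, applies the result, and saves the returned revision for the next sync.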