I am working on a system that has many remote laptops all connected to the internet through cellular data connections.
The application will synchronize periodically to a central database. The problem is that, due to factors outside our control, moving data across the cellular networks is spectacularly expensive.
Currently we are sending a compressed XML file across the wire, where it is processed and various things are done with it (mainly stuffing it into a database).
My first thought was to convert the XML doc to JSON just prior to transmission and convert it back to XML just after receipt on the other end, to get some extra compression for free without changing much. Another thought was to test various other compression algorithms to determine which one produces the smallest output. However, I am not entirely sure how much difference JSON vs. XML would make once the data is compressed.
I thought that there must be resources available that address this problem from an information theory perspective. Does anyone know of any such resources, or have suggestions on what direction to go in? For reference, this is developed on the MS .NET stack on Windows.
The very best compression comes from leaving information out, because information that is not transmitted takes 0 bytes.
As a first step, keep a record of when things were last changed and when the application last synchronized with the database. Your first saving is then every record that didn’t change since the last synchronization, because those don’t need to be sent at all.
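A minimal sketch of that first step, written in Python purely for illustration (the same idea carries over to .NET). The record layout, the `modified_at` field, and the `changed_since` helper are all hypothetical names, not something from your application:

```python
from datetime import datetime, timezone

def changed_since(records, last_sync):
    """Return only the records modified after the last successful sync."""
    return [r for r in records if r["modified_at"] > last_sync]

# Hypothetical local records, each carrying a last-modified timestamp.
records = [
    {"id": 1, "modified_at": datetime(2024, 1, 1, tzinfo=timezone.utc)},
    {"id": 2, "modified_at": datetime(2024, 1, 5, tzinfo=timezone.utc)},
    {"id": 3, "modified_at": datetime(2024, 1, 9, tzinfo=timezone.utc)},
]

# Timestamp stored by the client after its previous synchronization.
last_sync = datetime(2024, 1, 4, tzinfo=timezone.utc)

# Only these records need to go over the wire.
to_send = changed_since(records, last_sync)
```

Update the stored `last_sync` only after the server has confirmed receipt, otherwise a failed upload would silently drop changes.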
A second step to reduce the amount of data transmitted is to send only the differences during the synchronization, but that will require additional storage to keep a history of how things changed.
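Sending only the differences could look roughly like this, again as a Python sketch under assumptions: each record is a flat dictionary, the client keeps a copy of the last version it synchronized, and `record_delta` and `apply_delta` are hypothetical helper names:

```python
def record_delta(previous, current):
    """Describe how `current` differs from the last synced `previous` copy."""
    changed = {
        key: value
        for key, value in current.items()
        if key not in previous or previous[key] != value
    }
    removed = [key for key in previous if key not in current]
    return {"changed": changed, "removed": removed}

def apply_delta(previous, delta):
    """Rebuild the current record on the receiving side from a delta."""
    result = {k: v for k, v in previous.items() if k not in delta["removed"]}
    result.update(delta["changed"])
    return result

# Hypothetical record as it looked at the last sync, and as it looks now.
previous = {"id": 7, "name": "Pump A", "status": "ok", "note": "checked"}
current = {"id": 7, "name": "Pump A", "status": "fault"}

# Only the changed field and the removed key are transmitted.
delta = record_delta(previous, current)
rebuilt = apply_delta(previous, delta)
```

In practice the delta also needs the record's key so the server knows what to patch, and both sides must agree on which version the delta is relative to.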
After you have minimised the amount of information that needs to be transmitted, then you can start to think about compression with gzip, bzip2, or some similar algorithm.
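To pick an algorithm, measure on your own payloads rather than guessing. The sketch below uses Python's standard `zlib`, `bz2`, and `lzma` modules as stand-ins for whatever compressors are available on your platform. The sample payload is invented, so the numbers it gives say nothing about your real data:

```python
import bz2
import lzma
import zlib

def compressed_sizes(payload: bytes) -> dict:
    """Return the size in bytes of `payload` under several compressors."""
    return {
        "raw": len(payload),
        "zlib": len(zlib.compress(payload, 9)),
        "bz2": len(bz2.compress(payload, 9)),
        "lzma": len(lzma.compress(payload, preset=9)),
    }

# Invented, highly repetitive XML; replace with a real sync payload.
sample = b"<record><id>1</id><status>ok</status></record>" * 200

sizes = compressed_sizes(sample)
for name, size in sizes.items():
    print(f"{name}: {size} bytes")
```

Running the same function on the XML and the JSON form of one real payload would also answer the question of how much the format matters after compression.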
As a completely unrelated method to keep the costs of the synchronizations down, add an option to your application to trigger a synchronization manually and encourage users to use that option when they have their device connected to a cheap network.