I maintain a tool that syncs files between a client computer and a document management server (using the CMIS protocol). Version history (on the CMIS server) is important.
When saving a document locally, MS Word:
1. Writes to a temp file, ~wrdxxxx.tmp
2. Deletes the original file, Example.doc
3. Renames ~wrdxxxx.tmp to Example.doc
PROBLEM: At step 2, my sync tool deletes the file on the server, losing all version history.
QUESTION: From the point of view of my sync tool, is there a way to know whether the file has really been deleted, or whether it is just being saved?
Notes:
- Waiting is probably not a good solution, as I can’t be sure how much time steps 2 and 3 take.
- Checking for the presence of ~wrdxxxx.tmp files does not work when several documents are being edited in the same folder at the same time.
- The tool works like Dropbox: Users don’t have to “commit” changes, files are synchronized automatically as soon as possible.
4
This is not Word-specific: the same problem can occur when someone does this manually (makes a backup of the file, renames or deletes the original, restores a backup, etc.). Your sync tool cannot know which of these operations create new documents and which merely continue existing ones, so it cannot easily manage the version history correctly; doing so would require it to “mind-read” what the author or editor considers “one” document.
Thus, all VCS tools I know of expect the user to explicitly create a new revision, and to delete or rename files explicitly in the repository. If you are looking for a user-friendly solution, look at how tools like TortoiseSVN or TortoiseGit have solved that problem.
“At step 2, my sync tool deletes the file on the server, losing all version history.”
In case you still want to try an “automatic” approach, ignoring what I wrote above: when a file on the client is deleted, why is it really important to delete the history on the server? Can’t you just mark it “deleted” as a special state, but still keep the history? When the file “reappears” in the next sync cycle at the same place on the client, you have to “revive” the file, undo the “deleted” state and continue the history.
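The “mark as deleted, then revive” idea can be sketched roughly as follows. This is a minimal in-memory illustration, not a CMIS client: `ServerDocument`, `Repository`, and their methods are hypothetical names invented for the example, and a real server would need its own way to store the deleted state.

```python
from dataclasses import dataclass, field


@dataclass
class ServerDocument:
    """Hypothetical server-side record: a path, its versions, a tombstone flag."""
    path: str
    versions: list = field(default_factory=list)
    deleted: bool = False


class Repository:
    """In-memory stand-in for the document server (an assumption, not CMIS)."""

    def __init__(self):
        self.docs = {}

    def upload(self, path, content):
        # A file that reappears at the same path revives the tombstoned
        # record, so the new content continues the old history.
        doc = self.docs.setdefault(path, ServerDocument(path))
        doc.deleted = False
        doc.versions.append(content)
        return doc

    def delete(self, path):
        # Mark as deleted instead of dropping the record and its versions.
        doc = self.docs.get(path)
        if doc is not None:
            doc.deleted = True


repo = Repository()
repo.upload("Example.doc", b"v1")
repo.delete("Example.doc")               # Word's step 2
doc = repo.upload("Example.doc", b"v2")  # Word's step 3
```

After the last call, the record is no longer marked deleted and still carries both versions, which is the behaviour the paragraph above asks for.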
4
You have two things working against you:
First is a race condition, where the outcome of your sync operation depends on the timing of Word and that of your sync process. (If you’re using something event-driven, the timing is a bit more predictable, but it just means that you’re going to delete your history all the time instead of just sometimes as you would if you’re polling.) As you observed, there’s no way to know how much time will elapse between steps 2 and 3, so waiting to see if the file returns would create another race condition.
Second is uncertainty about whether a file you see as deleted will reappear: the OS can’t predict what an application will do in the future. It also can’t tell you which application deleted the file, information you could otherwise use to infer that the file is likely to reappear.
If Word actually saves files in the way Microsoft describes, that’s a flaw in the implementation. Windows has API calls to do atomic renames that would make the process go like this:
1. Create temporary file ~wrdxxxx.tmp
2. Atomically rename ~wrdxxxx.tmp to Example.doc
Because step 2 is atomic, your sync program would never see that Example.doc disappeared, just that its attributes or content changed.
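For illustration, here is what a write-then-rename save could look like from an application's side. This is a sketch in Python rather than anything Word does: `save_atomically` is a hypothetical helper, and it relies on `os.replace`, which renames over an existing target in a single call. Python documents that rename as atomic on POSIX; treat atomicity on Windows as an assumption to verify for your setup.

```python
import os
import tempfile


def save_atomically(path, data):
    """Write data to a temp file in the same directory, then swap it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix="~wrd", suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())
        # Single rename-over-existing step: the target path never goes missing.
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temp file if anything failed before the swap.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

A watcher observing the folder would see the temp file appear and the target change, but no separate delete of the target.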
Word’s behavior isn’t something you can change, so waiting is really your only option.
If the goal is not to lose version history, it’s better to make deletions provisional and wait a fairly long time (say, 30 seconds) to make them permanent. This will prevent most — but not all — of the problems caused by not knowing how long a save takes. Obviously, it’s possible for a save to take that long, but it’s probably rare. The bigger pitfall is that a long enough provisional period will treat a delete/create sequence done by a human in that time as a version change rather than what it actually was. This doesn’t reflect the reality of what happened, but it will preserve the version history.
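A provisional-deletion tracker along these lines might look like the sketch below. The class name `ProvisionalDeletes`, its methods, and the injectable `clock` are all assumptions made for the example; how the sync tool learns about delete and create events is left out.

```python
import time


class ProvisionalDeletes:
    """Holds local deletions for a grace period before they become permanent."""

    def __init__(self, grace_seconds=30.0, clock=time.monotonic):
        self.grace_seconds = grace_seconds
        self.clock = clock
        self.pending = {}  # path -> time the deletion was observed

    def on_deleted(self, path):
        self.pending[path] = self.clock()

    def on_created(self, path):
        """Return True if this creation cancels a pending deletion (a save)."""
        return self.pending.pop(path, None) is not None

    def expired(self):
        """Pop and return paths whose grace period has run out."""
        now = self.clock()
        due = [p for p, t in self.pending.items() if now - t >= self.grace_seconds]
        for p in due:
            del self.pending[p]
        return due


# Usage with a fake clock so the example does not have to sleep.
now = [0.0]
tracker = ProvisionalDeletes(grace_seconds=30.0, clock=lambda: now[0])
tracker.on_deleted("Example.doc")
tracker.on_deleted("Old.doc")
now[0] = 2.0
is_save = tracker.on_created("Example.doc")  # reappeared: upload as a new version
now[0] = 31.0
to_delete = tracker.expired()  # only Old.doc gets deleted on the server
```

The sync loop would call `expired()` periodically and only then issue real deletions to the server. A human who deletes and recreates a file within the grace period is still treated as a save, which is the trade-off described above.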
The question only you can answer is whether having reality distorted like that on relatively rare occasions is acceptable in exchange for not having your synchronizer blow away the history.
12