The good folks who created the SVN version control system use a structure they refer to as “skip deltas” to store the revision history of files internally. A revision is stored as a delta against an earlier revision. However, revision N is not necessarily stored as a delta against revision N-1, like this:
0 <- 1 <- 2 <- 3 <- 4 <- 5 <- 6 <- 7 <- 8 <- 9
Instead, revision N is stored as a delta against N-f(N), where f(N) is the greatest power of two that divides N:
0 <- 1 2 <- 3 4 <- 5 6 <- 7
0 <------ 2 4 <------ 6
0 <---------------- 4
0 <------------------------------------ 8 <- 9
(Superficially it looks like a skip list but really it’s not that similar – for instance, skip deltas are not interested in supporting insertion in the middle of the list.) You can read more about it here.
My question is: Do other systems use skip deltas? Were skip deltas known/used/published before SVN, or did the creators of SVN invent it themselves?
1
Starting from your link Skip-Deltas in Subversion I found this note Notes on keeping version histories of files and there it’s written that:
The author does not know of papers or other references describing the
technique [skip-deltas]. .. This document was written by Greg Hudson
[email protected] on 2002-06-24. It was last updated on 2002-10-03.
Skip-deltas seem to have been introduced to SVN in rev. 842883 mostly in file “/subversion/trunk/subversion/libsvn_fs/tree.c” by Greg Hudson on July 31, 2002 with the log message “Implement skip-deltas, in a low-overhead way.” and in that revision in the file tree.c a comment was added “We do this using a technique derived from skip lists”. Indeed skip lists have similarities to skip-deltas.
I also found two mailing list discussion from July 2002 where Greg Hudson discusses the potential impact and various other people discuss the patch that was committed a few days later.
So my guess is that skip deltas were invented by the SVN team, especially by Greg Hudson from the MIT in 2002, independently from anyone else drawing inspirations from skip lists while wanting to increase the information retrieval performance of their repository “file system”.
I also searched for delta compression and didn’t find any earlier reference. I think it would make sense, because this skip-delta technique is specialized on fast retrieval and maybe before SVN nobody needed fast retrieval of delta compressed data. Delta compression itself is in the literature since 70s/80s. One of the earliest papers is Experiences with delta compression of data produced by DIII.
2