Background:
I have a system that is air-gapped from our primary development machines. I develop software on the networked machines, and updates from the primary git repository must then be manually patched into the repository maintained on the other system. We do this by applying just the diff for each patch to the air-gapped system each time, rather than recreating the entire repository every time. A repository must be maintained on the air-gapped system as well, since there are development branches specific to that machine.
For the most part this serves my purposes, but there is potential for human error at several points along the patching process, and errors have occurred in the past. I want to re-transfer the entire repository to eliminate issues from portions of diffs that failed to apply in the past.
The problem:
How can I transfer branches that are specific to the air-gapped system from the old repository to the new one? The repositories are extremely similar in content (not identical – probably 98% of files are identical but 2% contain differences) but have entirely different commit histories. Changing origin and fetching, for example, just complains that there are no commits in common.
The branch history is not particularly important; if I could reapply just the changes, that alone might be sufficient, but I am not sure what my options are in this situation.
git bundle is the easy way to sneakernet history across an air gap. Bundle up your histories, put them on a thumb drive, and walk it across; fetch from the bundle, and Git will notice and skip any full duplicates, eventually squeezing out any partial-content repetitions as usual.
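For instance (paths and branch names here are placeholders for your own), the round trip could look like:

```shell
# On the networked machine: bundle up the branches to ship.
git bundle create /path/to/thumbdrive/repo.bundle shipbranch1 shipbranch2

# On the air-gapped machine: check the bundle, then fetch from it as if it
# were a remote; the shipped histories land under refs/remotes/old/.
git bundle verify /path/to/thumbdrive/repo.bundle
git fetch /path/to/thumbdrive/repo.bundle 'refs/heads/*:refs/remotes/old/*'
```

Note that fetching needs no common history, so the "no commits in common" complaint never comes up; that only matters when you try to merge unrelated branches.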
If data size is a concern and the bundle is painfully large, you can get intrusive with your checking. git bundle is just a convenience command managing a(n arbitrary but handy) single-file way of packaging refs and objects; you can pack them up yourself, and for a one-time use like this, seeing how it works can even be helpful.
To do a full duplicate-object-elimination before packing you need a list of objects already at the destination, so
- Get a list of objects already on your air-gapped system: in your air-gapped repo,

        git cat-file --batch-all-objects --batch-check='%(objectname)' \
            > /path/to/thumbdrive/objects-at-destination
- Pack up only what's new: on your source system, in your source repo, with that thumb drive mounted,

        branches="shipbranch1 shipbranch2 etc"
        git rev-list --objects --no-object-names $branches | sort \
            | join -v1 - /path/to/thumbdrive/objects-at-destination \
            | git pack-objects /path/to/thumbdrive/pack
        git for-each-ref --format='create %(refname) %(objectname)' \
            $(git rev-parse --symbolic-full-name $branches) \
            > /path/to/thumbdrive/new-tips
- Bring in the new history: on your air-gapped system, in the repo there,

        cp /path/to/thumbdrive/pack-* .git/objects/pack   # or objects/pack if the repo's bare
        git update-ref --stdin < /path/to/thumbdrive/new-tips
- If the added packs are annoyingly large you can do a full reconsider-my-life-choices-while-I-wait-for-this repack:

        git repack -adf --window 500 --window-memory 1G

  which, with any history large enough to actually need it, will take a very long time on a weak system (and a good few minutes even on a sort-of-decent one) but will also likely save a lot of space. If you've already done it once, you can leave off the -f flag to save much time and still get almost all of the available benefit.
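Strung together, the steps above amount to something like this (every path and branch name is a placeholder for your own):

```shell
# In the air-gapped repo: list what it already has (the output is sorted
# by object name, which is the order join needs).
cd /path/to/airgapped-repo
git cat-file --batch-all-objects --batch-check='%(objectname)' \
    > /path/to/thumbdrive/objects-at-destination

# In the source repo: pack only the objects the destination lacks,
# and record the tips of the branches being shipped.
cd /path/to/source-repo
branches="shipbranch1 shipbranch2"
git rev-list --objects --no-object-names $branches | sort \
    | join -v1 - /path/to/thumbdrive/objects-at-destination \
    | git pack-objects /path/to/thumbdrive/pack
git for-each-ref --format='create %(refname) %(objectname)' \
    $(git rev-parse --symbolic-full-name $branches) \
    > /path/to/thumbdrive/new-tips

# Back in the air-gapped repo: install the pack and create the refs.
cd /path/to/airgapped-repo
cp /path/to/thumbdrive/pack-* .git/objects/pack
git update-ref --stdin < /path/to/thumbdrive/new-tips
```

Afterwards, git fsck --connectivity-only in the air-gapped repo is a cheap way to confirm the imported tips are fully connected.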
I’ve smoketested this (and done similar things irl) so it’s at least close to right.