When developing on a long-running branch (here defined as one that lives longer than a single release), what is the most accepted practice for keeping the branch current with its origin while keeping the history relatively clean prior to merging back?
As an example, consider the origin branch (m) and a topic branch (t) with the following history graph:
A[1]--------F[2]--H[3]--J[4] (m)
B----C-D-E--G-----I-----K (t)
[1] - Release 1.0
[2] - Release 2.0
[3] - Hotfix 2.1
[4] - Hotfix 2.2
In branch t I’ve been merging from master (m) periodically to keep my topic branch up to date, but there are many small commits between B and K that I’d like to squash with a rebase.
I’m concerned that squashing those commits will change the hashes of the commits merged in from master and create problems for others when I merge my branch back into master for Release 3.0.
If I rebase -i the topic branch, I expect the graph would look like this:
A---------F---H---J (m)
B-----------------K (t)
where K contains all the squashed commits between B and K, including the merged changes from master (F and H). If I then merge t back to master m, I’m wondering whether there are likely to be conflicts from F and H. The graph should then look like this:
A---------F---H---J---L (m)
                    /
B-----------------K (t)
t can now be deleted because it’s no longer used. I’d like to keep the history of where t was branched from. Rebasing B onto J would probably solve my problem, but it would lose that information, and the graph would look like:
A---------F---H---J---K' (m)
Is this a valid concern? What is the commonly-accepted practice to keep one’s feature branch current with its origin while keeping a relatively clean history?
Note: the example here is simplified. There are actually hundreds of commits (over 300) from the branch point to HEAD on the topic branch, many of which could be squashed out of existence.
1
The best practice is to keep an accurate history and just do a straight merge. That way, if you introduce a bug at C, someone investigating it can say, “Oh, that makes sense because it was done before we put feature F into origin, but not merged until today.” It also has the benefit of not changing any hashes. There are flags you can use to hide the topic branch history when it doesn’t interest you.
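One flag I know of that hides topic history is --first-parent on git log. Here is a minimal sketch of it, run in a throwaway repository. The branch names m and t follow the question; the file names, commit messages, and identity settings are made up for illustration.

```shell
#!/bin/sh
set -e
# Throwaway repository; all file names and messages here are hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "dev@example.com"
git config user.name "Dev"
git symbolic-ref HEAD refs/heads/m      # name the initial branch "m"

echo a > a.txt && git add a.txt && git commit -q -m "A"

# Topic branch with two small commits.
git checkout -q -b t
echo b > b.txt && git add b.txt && git commit -q -m "B"
echo c > c.txt && git add c.txt && git commit -q -m "C"

# Meanwhile m moves on, then takes t in with a real merge commit.
git checkout -q m
echo f > f.txt && git add f.txt && git commit -q -m "F"
git merge -q --no-ff -m "L: merge t" t

# Full history: everything reachable from m, including B and C.
full=$(git log --format=%s)

# First-parent history: only follows the first parent of each merge,
# so the topic commits B and C are not listed.
mainline=$(git log --first-parent --format=%s)
printf '%s\n' "$mainline"
```

The topic commits are still in the repository with their original hashes; --first-parent only changes what the log walks through.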
If you must squash your commits, for example if origin is controlled by another organization, then it depends on how many people use the topic branch, and how many of them need to do merges from your origin. You’re not going to be able to do it without inconveniencing someone. Some teams just work with the original history, and one person deals with all the merges. Some teams start over with a new topic branch every time it gets merged into origin. Some teams continually rebase their topic branch to origin, and just make it common practice to always do a force pull. You’ll have to figure out what works for your team.
2
You have a couple of options depending on branch t.
If branch t is also a remote branch with multiple developers working on it, I think the best way is to keep doing the merges you describe. Rebasing and force-pushing seems like a risky way to do things; I try to avoid changing pushed history.
If the branch is local, then you can do a git rebase origin/master. This brings in all the changes that have occurred in master and then replays your commits after the last of them. The advantage is that you can easily find all your commits for squashing. It also keeps the changes related to the project in your own commits, rather than putting conflict resolution in a merge commit, and it keeps the branch clean when you merge it into master.
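As a rough sketch of the local-branch case, the script below rebases a topic branch and then squashes it. It makes two assumptions: a local master branch stands in for origin/master (the throwaway repository has no remote), and the squash uses git reset --soft plus a fresh commit instead of an interactive rebase so that it runs unattended. File names and messages are invented.

```shell
#!/bin/sh
set -e
# Throwaway repository; local "master" stands in for origin/master.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "dev@example.com"
git config user.name "Dev"
git symbolic-ref HEAD refs/heads/master

echo a > a.txt && git add a.txt && git commit -q -m "A"

git checkout -q -b t
echo b > b.txt && git add b.txt && git commit -q -m "B"
echo c > c.txt && git add c.txt && git commit -q -m "C"

git checkout -q master
echo f > f.txt && git add f.txt && git commit -q -m "F"
master_tip=$(git rev-parse master)

# Replay the topic commits on top of the current master tip.
git checkout -q t
git rebase -q master

# After the rebase the topic commits form one contiguous run after master.
topic_commits=$(git rev-list --count master..t)

# Non-interactive squash: move t back to master but keep the combined
# changes staged, then record them as a single commit.
git reset -q --soft master
git commit -q -m "Topic work, squashed"
squashed=$(git rev-list --count master..t)

git log --graph --format=%s
```

Only the commits on t get new hashes here; master itself is never rewritten, which is why this is reasonable for a branch nobody else has pulled.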
1
After several experiments on a clone of the repository, and given the sheer number of commits to be squashed, I’ve determined that it is rather impractical to squash them while retaining the original branch commit just for some traceability.
The cleanest/simplest solution seems to be what @Karl suggested – just merge the branch into master and be done with it. While it does introduce a lot of unnecessary and sometimes spurious commit messages throughout the history, it is clear where the branch was created, refreshed from master, and finally merged back in.
In the end, traceability wins out over a pristine history.
The final graph looks like the following (using --no-ff on the merge):
A[1]--------F[2]--H[3]--J[4]--L (m)
                            /
B----C-D-E--G-----I-------K (t)
[1] - Release 1.0
[2] - Release 2.0
[3] - Hotfix 2.1
[4] - Hotfix 2.2
And from a git log standpoint it looks something like:
commit L (HEAD, m, origin/master)
merge parents K, J
commit K (t)
commit J
commit I
merge parents H, G
commit H
commit G
merge parents F, E
commit F
commit E
commit D
commit C
commit B
commit A
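For anyone who wants to try this shape out first, here is a scaled-down sketch of the same workflow in a throwaway repository: one refresh merge from m into t, then a --no-ff merge back. The make_commit helper, the file names, and the messages are hypothetical; the letters only loosely mirror the graph above.

```shell
#!/bin/sh
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "dev@example.com"
git config user.name "Dev"
git symbolic-ref HEAD refs/heads/m

# Hypothetical helper: write one file and commit it under the given name.
make_commit() {
  echo "$1" > "$1.txt"
  git add "$1.txt"
  git commit -q -m "$1"
}

make_commit A
git checkout -q -b t
make_commit B

git checkout -q m
make_commit F
f_hash=$(git rev-parse HEAD)            # remember F's hash on m

# Refresh the topic branch from m (the "G" style merge).
git checkout -q t
git merge -q --no-ff -m "G: refresh from m" m
make_commit K

# m moves on, then takes the topic branch back in with a merge commit.
git checkout -q m
make_commit J
git merge -q --no-ff -m "L: merge t" t

# A true merge commit has two parents (three words: itself + 2 parents).
parent_count=$(( $(git rev-list --parents -n 1 HEAD | wc -w) - 1 ))

git log --graph --format=%s
```

F keeps the hash it had before any merging, and it is still the second commit back along m’s first-parent line, so nothing that others have based on master is disturbed.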