When working with git in a team using feature branches, I often find it
difficult to understand the branch structure in history.
Example:
Let’s say there was a feature branch feature/make-coffee, and
bugfixing continued on master in parallel to the feature branch.
History might look like this:
* merge feature/make-coffee
|
| * small bugfix
| |
* | fix bug #1234
| |
| * add milk and sugar
| |
* | improve comments
| |
* | fix bug #9434
| |
| * make coffee (without milk or sugar)
| |
* | improve comments
|/
*
Problem
At first glance, I find it difficult to tell which side is the feature
branch. I usually need to browse several comments on both sides to get
an idea of which is which. This gets more complicated if there are
multiple feature branches in parallel (particularly if they are for
closely related features), or if there was merging in both directions
between feature branch and master.
As a contrast, in Subversion, this is significantly easier because the
branch name is part of the history – so I can tell right away that a
commit was originally made on “feature/make-coffee”.
Git could make this easier by including the name of the current branch
in the commit metadata when creating a commit (along with author, date
etc.). However, git does not do this.
Is there some fundamental reason why this is not done? Or is it just
that nobody wanted the feature? If it’s the latter, are there other ways
of understanding the purpose of historical branches without seeing the
name?
3
Probably because branch names are only meaningful within one repository. If I’m on the make-coffee team and submit all my changes through a gatekeeper, my master branch might get pulled to the gatekeeper’s make-coffee-gui branch, which he will merge with his make-coffee-backend branch, before rebasing to a make-coffee branch that gets merged into the central master branch.
Even within one repository, branch names can and do change as your workflow evolves. master might later be renamed to development, for example.
As CodeGnome alluded, git has a strong design philosophy of not baking something into the data if it isn’t needed. Users are expected to use options in git log and git diff to output the data to their liking after the fact. I recommend trying some out until you find a format that works better for you. git log --first-parent, for example, won’t show the commits from a branch being merged in.
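To see what --first-parent does to a history like the one in the question, here is a minimal sketch that builds a throwaway repository and prints only the mainline. The file names, the user identity and the temporary directory are assumptions made up for the illustration.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master    # fix the branch name regardless of local defaults
git config user.name "Example User"        # hypothetical identity, local to this repo
git config user.email "user@example.com"
git config commit.gpgsign false

echo readme > README
git add README
git commit -q -m "initial commit"

git checkout -q -b feature/make-coffee
echo coffee > coffee.txt
git add coffee.txt
git commit -q -m "make coffee (without milk or sugar)"
echo "milk, sugar" >> coffee.txt
git commit -q -am "add milk and sugar"

git checkout -q master
echo fix > bugfix.txt
git add bugfix.txt
git commit -q -m "fix bug #9434"
git merge -q --no-ff -m "merge feature/make-coffee" feature/make-coffee

# Full history: commits from both sides appear.
git log --oneline --graph

# Mainline only: follow just the first parent of every merge.
mainline=$(git log --first-parent --format=%s)
printf '%s\n' "$mainline"
```

In the second listing the feature commits disappear and only the merge commit stands in for them, which makes it easier to tell which side was master.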
2
Track Branch History with --no-ff
In Git, a commit has ancestors, but a “branch” is really just the current head of some line of development. In other words, a commit is a snapshot of the working tree at some point in time, and can belong to any number of branches at once. This is part of what makes Git branching so lightweight when compared to other DVCSes.
Git commits don’t carry branch information because it isn’t necessary for Git history. However, merges that are not fast-forwards can certainly carry additional information about which branch was merged. You can ensure that your history contains this information by always passing the --no-ff flag to your merges. git-merge(1) says:
--no-ff
Create a merge commit even when the merge resolves as a
fast-forward. This is the default behaviour when merging an
annotated (and possibly signed) tag.
This will generally create a merge message similar to Merge branch 'foo', so you can typically find information about “ancestor branches” with a line similar to the following:
$ git log --regexp-ignore-case --grep 'merge branch'
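As a rough illustration of the above, the following sketch merges a branch that could have been fast-forwarded and checks that --no-ff still leaves a merge commit to search for. The branch name foo comes from the example message; the file name and identity are assumptions.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master    # fix the branch name regardless of local defaults
git config user.name "Example User"        # hypothetical identity, local to this repo
git config user.email "user@example.com"
git config commit.gpgsign false

echo base > file.txt
git add file.txt
git commit -q -m "base"

git checkout -q -b foo
echo work >> file.txt
git commit -q -am "work on foo"

git checkout -q master
# master has not moved, so a plain merge would fast-forward and leave no trace of foo.
# --no-ff forces a merge commit; --no-edit accepts the default message without an editor.
git merge -q --no-ff --no-edit foo

found=$(git log --regexp-ignore-case --grep 'merge branch' --format=%s)
printf '%s\n' "$found"
```

Note that the recorded name is only whatever the branch was called in the repository where the merge happened.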
3
Why don’t git commits contain the name of the branch they were created on?
Just a design decision. If you need that information, you may add a prepare-commit-msg or commit-msg hook to enforce that on a single repository.
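A minimal sketch of such a prepare-commit-msg hook might look like the following. The "Branch:" line format is an assumption chosen for the example, not a git convention, and the script installs the hook into a throwaway repository so it can be tried safely. Hooks live in .git/hooks and are not transferred by clone or push, so every clone would need its own copy.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/feature/make-coffee
git config user.name "Example User"        # hypothetical identity, local to this repo
git config user.email "user@example.com"
git config commit.gpgsign false

mkdir -p .git/hooks
cat > .git/hooks/prepare-commit-msg <<'EOF'
#!/bin/sh
# $1 = path to the commit message file, $2 = source of the message (may be empty).
case "$2" in
    merge|squash|commit) exit 0 ;;    # leave merges, squashes and amends alone
esac
# On a detached HEAD there is no branch name to record, so do nothing.
branch=$(git symbolic-ref --quiet --short HEAD) || exit 0
printf '\nBranch: %s\n' "$branch" >> "$1"
EOF
chmod +x .git/hooks/prepare-commit-msg

echo coffee > coffee.txt
git add coffee.txt
git commit -q -m "make coffee (without milk or sugar)"

# The body of the new commit now carries the branch name.
recorded=$(git log -1 --format=%b)
printf '%s\n' "$recorded"
```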
After the BitKeeper disaster, Linus Torvalds developed git to match the Linux kernel development process. There, until a patch is accepted, it is revised, edited and polished several times (all through mail), signed off (which means editing the commit), cherry-picked, passed around to lieutenants’ branches, possibly rebased often, then edited and cherry-picked some more, until it is finally merged upstream by Linus.
In such a workflow there may be no specific origin of a commit, and often a commit is just created in some temporary debugging branch in some unimportant private random repository with funny branch names, as git is distributed.
Maybe that design decision just happened coincidentally, but it exactly matches the Linux kernel development process.
The simplest answer is that names of branches are ephemeral. What if you were to, say, rename a branch (git branch -m <oldname> <newname>)? What would happen to all the commits against that branch? Or what if two people have branches that have the same name on different local repositories?
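A small sketch makes the rename case concrete: the commit ID is the same before and after the rename, because the branch name is not part of the commit. The new name feature/brew-coffee and the identity are assumptions for the example.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/feature/make-coffee
git config user.name "Example User"        # hypothetical identity, local to this repo
git config user.email "user@example.com"
git config commit.gpgsign false

echo coffee > coffee.txt
git add coffee.txt
git commit -q -m "make coffee (without milk or sugar)"

before=$(git rev-parse HEAD)
git branch -m feature/make-coffee feature/brew-coffee    # hypothetical new name
after=$(git rev-parse feature/brew-coffee)

# Same commit, different label pointing at it.
echo "before rename: $before"
echo "after rename:  $after"
```

Had the old name been stored in the commit, the rename would either leave stale names behind or force every commit to be rewritten.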
The only thing that has meaning in git is the checksum of the commit itself. This is the basic unit of tracking.
The information is there, you’re just not asking for it, and it’s not stored in exactly that form.
Git stores the checksum of the head of each branch in .git/refs/heads. For example:
.../.git/refs/heads$ ls test
-rw-rw-r-- 1 michealt michealt 41 Sep 30 11:50 test
.../.git/refs/heads$ cat test
87111111111111111111111111111111111111d4
(Yes, I’m clobbering my git checksums. It’s nothing special, but these are private repositories that I’m looking at at the moment, and nothing in them needs to be accidentally leaked.)
By starting from this commit and following its parents, the parents’ parents, and so on, you can find out which branches a given commit is in. It’s just a big graph to render.
Storing the name of the branch a commit is in (which can change, as mentioned above) in the commit itself is extra overhead on the commit that doesn’t help it. Store the parent(s) of the commit and it’s good.
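The graph walk described above can be sketched with git merge-base --is-ancestor, which succeeds when one commit is reachable from another by following parents. The branch names test, test1 and test3 mirror this answer; the file contents and identity are assumptions.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/test
git config user.name "Example User"        # hypothetical identity, local to this repo
git config user.email "user@example.com"
git config commit.gpgsign false

echo base > file.txt
git add file.txt
git commit -q -m "base"

git checkout -q -b test1
echo stuff >> file.txt
git commit -q -am "update: add stuff"
target=$(git rev-parse HEAD)               # the commit we want to locate

git checkout -q test
git checkout -q -b test3
echo other > other.txt
git add other.txt
git commit -q -m "unrelated work"

git checkout -q test
git merge -q --no-ff --no-edit test1

# For every branch head, ask whether the target is among its ancestors.
containing=""
for branch in $(git for-each-ref --format='%(refname:short)' refs/heads); do
    if git merge-base --is-ancestor "$target" "$branch"; then
        containing="$containing $branch"
    fi
done
echo "branches containing $target:$containing"
```

After the merge the commit is reachable from both test and test1 but not from test3, so no single branch name could describe it.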
You can see the branches by running the git log command with the appropriate flag.
git log --branches --source --pretty=oneline --graph
git log --all --source --pretty=oneline --graph
There are many different ways to pick what you want for that – look at the git log documentation – the --all section and the few options that follow.
* 87111111111111111111111111111111111111d4 test Merge branch 'test1' into test
|\
| * 42111111111111111111111111111111111111e8 test1 update: add stuff
* | dd11111111111111111111111111111111111159 test Merge branch 'test3' into test
|\
The names after each hash (‘test’, ‘test1’, ‘test2’ and so on) are the branches through which each commit was reached.
Note that the git log documentation is huge and it is quite possible to get almost anything you want out of it.
Because in Git commits like “make coffee” are considered part of both branches when the feature branch is merged. There’s even a command to check that:
% git branch --contains @
feature/make-coffee
master
Also, branches are cheap, local, and ethereal; they can be easily added, removed, renamed, pushed, and deleted from the server.
Git follows the principle that everything can be done, which is why you can easily remove a branch from a remote server, and you can easily rewrite history.
Let’s say that I was in the ‘master’ branch, and I made two commits: ‘make coffee’ and ‘add milk and sugar’, then I decide to use a new branch for that, so I reset ‘master’ back to ‘origin/master’. Not a problem, all the history is still clean: ‘make coffee’ and ‘add milk and sugar’ are not part of master, only feature/make-coffee.
Mercurial is the opposite; it doesn’t want to change things. Branches are permanent, rewriting history is frowned upon, you can’t just delete a branch from the server.
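The scenario above can be tried out with the following sketch. There is no real remote here, so the saved commit ID in the variable upstream is an assumed stand-in for origin/master; the file name and identity are also made up for the example.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master    # fix the branch name regardless of local defaults
git config user.name "Example User"        # hypothetical identity, local to this repo
git config user.email "user@example.com"
git config commit.gpgsign false

echo readme > README
git add README
git commit -q -m "initial commit"
upstream=$(git rev-parse HEAD)             # stand-in for origin/master

# Two commits made on master by mistake.
echo coffee > coffee.txt
git add coffee.txt
git commit -q -m "make coffee"
echo "milk, sugar" >> coffee.txt
git commit -q -am "add milk and sugar"

# Keep the commits reachable under a new name, then move master back.
git branch feature/make-coffee
git reset -q --hard "$upstream"

on_master=$(git log --format=%s master)
on_feature=$(git log --format=%s feature/make-coffee)
printf 'master:\n%s\n\nfeature/make-coffee:\n%s\n' "$on_master" "$on_feature"
```

The two commits were created while master was checked out, yet afterwards they belong only to feature/make-coffee, which is why a recorded "created on" name would be misleading here.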
In short: Git allows you to do everything, Mercurial wants to make things easy (and makes it really difficult to let you do what you want).