In a git environment, where we have modularized most projects, we’re facing the one-project-per-repository versus multiple-projects-per-repository design issue. Let’s consider a modularized project:
myProject/
+-- gui
+-- core
+-- api
+-- implA
+-- implB
Today we have one project per repository. It gives us the freedom to
- release individual components
- tag individual components
But it’s also cumbersome to branch components, as branching api often requires equivalent branches in core, and perhaps in other components.
Given that we want to release individual components, can we still get similar flexibility with a multiple-projects-per-repository design?
What experiences are there and how/why did you address these issues?
6
There are three major disadvantages to “one project per repository”, the way you’ve described it above. These are less true if they are truly distinct projects, but from the sounds of it changes to one often require changes to another, which can really exacerbate these problems:
- It’s harder to discover when bugs were introduced. Tools like git bisect become much more difficult to use when you fracture your repository into sub-repositories. It’s possible, it’s just not as easy, meaning bug-hunting in times of crisis is that much harder.
- Tracking the entire history of a feature is much more difficult. History-traversing commands like git log just don’t output history as meaningfully with fractured repository structures. You can get some useful output with submodules or subtrees, or through other scriptable methods, but it’s just not the same as typing tig --grep=<caseID> or git log --grep=<caseID> and scanning all the commits you care about. Your history becomes harder to understand, which makes it less useful when you really need it.
- New developers spend more time learning the version control structure before they can start coding. Every new job requires picking up procedures, but fracturing a project repository means they have to pick up the VC structure in addition to the code’s architecture. In my experience, this is particularly difficult for developers new to git who come from more traditional, centralized shops that use a single repository.
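To make the second point concrete, here is a rough sketch of the kind of "scriptable method" you end up writing once history is spread over several clones. The function name search_all_repos, the directory layout (one clone per component under a common parent directory) and the case ID format are all assumptions for illustration, not anything from a real setup:

```shell
#!/bin/sh
set -eu

# search_all_repos <parent-dir> <pattern>
# Hypothetical helper: runs "git log --grep" in every clone found directly
# under <parent-dir> and merges the hits into one date-sorted list.
search_all_repos() {
    for repo in "$1"/*/; do
        # Skip anything that is not a git working tree.
        [ -d "$repo/.git" ] || continue
        # Put the commit date first so the combined output can be sorted,
        # then the repository name so the hits can be told apart.
        git -C "$repo" log --all --grep="$2" --date=iso-strict \
            --format="%ad $(basename "$repo") %h %s"
    done | sort
}
```

With a single repository the whole function collapses to git log --grep=<caseID>, and the commits already come out as one ordered history.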
In the end, it’s an opportunity cost calculation. At one former employer, we had our primary application divided into 35 different sub-repositories. On top of them we used a complicated set of scripts to search history, make sure state (i.e. production vs. development branches) was the same across them, and deploy them individually or en masse.
It was just too much; too much for us at least. The management overhead made our features less nimble, made deployments much harder, made teaching new devs take too much time, and by the end of it, we could barely recall why we fractured the repository in the first place. One beautiful spring day, I spent $10 for an afternoon of cluster compute time in EC2. I wove the repos back together with a couple dozen git filter-branch calls. We never looked back.
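For anyone wanting to try the same kind of consolidation on a small scale, here is a minimal sketch. Note that this is not the filter-branch procedure described above: it uses the subtree-merge recipe instead (merge -s ours followed by read-tree --prefix), which keeps each component's commits reachable without rewriting them. The function name weave_repo and the subdirectory names are made up for illustration, and the combined repository is assumed to already contain at least one commit:

```shell
#!/bin/sh
set -eu

# weave_repo <path-or-url-of-component-repo> <subdirectory>
# Hypothetical helper, to be run from the top level of the combined repository.
weave_repo() {
    src=$1
    subdir=$2
    # Fetch the component's current HEAD; it becomes FETCH_HEAD locally.
    git fetch -q "$src" HEAD
    # Record a merge with the component's history but keep our own tree as-is.
    git merge -q -s ours --no-commit --allow-unrelated-histories FETCH_HEAD
    # Place the component's files under <subdirectory>/ in index and worktree.
    git read-tree --prefix="$subdir/" -u FETCH_HEAD
    git commit -q -m "Merge $subdir as a subdirectory"
}
```

Calling weave_repo ../api api and then weave_repo ../core core from inside the combined repository would leave both trees side by side with their commits in git log. One trade-off of this approach is that older commits still record the original paths, so path-limited history (git log -- api/) only starts at the merge.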
11
Christopher did a very good job of enumerating the disadvantages of a one-project-per-repository model. I would like to discuss some of the reasons you might consider a multiple-repository approach. In many environments I have worked in, a multi-repository approach has been a reasonable solution, but the decision of how many repositories to have, and where to make the cuts has not always been an easy one to make.
In my current position, I migrated a behemoth single-repository CVS repository with over ten years of history into a number of git repositories. Since that initial decision, the number of repositories has grown (through the actions of other teams), to the point where I suspect we have more than would be optimal. Some new-hires have suggested merging the repositories but I have argued against it. The Wayland project has a similar experience. In a talk I saw recently, they had, at one point, over 200 git repositories, for which the lead apologized. Looking at their website, I see now they are at 5, which seems reasonable. It’s important to observe that joining and splitting repositories is a manageable task, and it’s okay to experiment (within reason).
So when might you want multiple repositories?
1. A single repository would be too large to be efficient.
2. Your repositories are loosely coupled, or decoupled.
3. A developer typically only needs one, or a small subset of your repositories to develop.
4. You typically want to develop the repositories independently, and only need to synchronize them occasionally.
5. You want to encourage more modularity.
6. Different teams work on different repositories.
Points 2 and 3 are only significant if point 1 holds. By splitting our repositories, I significantly decreased the delays suffered by our offsite colleagues, reduced disk consumption, and improved network traffic.
Points 4 and 5 are more subtle. When you split the repos of, say, a client and a server, it becomes more costly to coordinate changes between the client and server code. This can be a positive, in that it encourages a decoupled interface between the two.
Even with the downsides of multi-repository projects, a lot of respectable work is done that way: Wayland and Boost come to mind. I don’t believe a consensus regarding best practices has evolved yet, and some judgement is required. Tools for working with multiple repositories (git-subtree, git-submodule and others) are still being developed and experimented with. My advice is to experiment and be pragmatic.
4
As we use GitHub, we actually have multiple projects in one repo but ensure that those projects/modules are properly modularised (we use -api and -core conventions + Maven + static and runtime checking and might even go to OSGi one day to boot).
What does it save on? Well we don’t have to issue multiple Pull Requests if we’re changing something small across multiple projects. Issues and Wiki are kept centralised etc.
We still treat each module/project as a proper independent project and build and integrate them separately in our CI server etc.
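Since the question was about still being able to release and tag individual components, here is one way that could look in a single repository. This is only a sketch under assumptions: the tag naming scheme (module name, dash, version) and the two helper names are invented for illustration, and each module is assumed to live in a top-level directory named after it:

```shell
#!/bin/sh
set -eu

# release_module <module-dir> <version>
# Hypothetical helper: tags the current commit as "<module>-<version>".
release_module() {
    git tag -a "$1-$2" -m "Release $1 $2"
}

# changes_since <module-dir> <previous-version>
# Lists only the commits that touched that module since its last release tag.
changes_since() {
    git log --oneline "$1-$2"..HEAD -- "$1/"
}
```

For example, release_module api 1.0.0 followed later by changes_since api 1.0.0 shows what would go into the next api release, ignoring commits that only touched core or gui.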
6
For me, the main difference in using one or more than one repository are the answers to the following questions:
- Are the multiple parts developed by the same team, and do they have the same release cycle and the same customer? Then there are fewer reasons to split the one repository.
- Are the multiple parts highly dependent on each other? Splitting model, controller and UI (even when they are different parts) is not very sensible, due to their high dependency on each other. But if 2 parts only have a small dependency, implemented by a stable interface that is only changed every few years, then it would be wise to divide the 2 parts into 2 repositories.
Just as an example, I have a small application (client only), that checks the “quality” of a Subversion repository. There is the core implementation, that could be started from the command line, and works well with Java 6. But I have started to implement a UI, that uses JavaFX as part of Java 8. So I have split the 2, and created a second repository (with a second build process), with different schedule, …
I like the answers above (voted them up), but I think they are not the whole story. So I wanted to add the arguments for splitting repositories as well. So the real answer (when to split) may be somewhere in the middle …
It might be that git-subtree (see Atlassian blog, medium blog, or kernel link) would be a good fit for what you have. Each of your top-level projects would then use a set of subtrees, possibly at different versions.
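As a rough illustration of the subtree idea, assuming the git subtree command is installed (it ships in git's contrib area, so not every installation has it): the helper names, the vendor/ prefix and the commit message below are assumptions for the example, not part of any of the linked write-ups.

```shell
#!/bin/sh
set -eu

# add_component <repository> <ref> <prefix>
# Hypothetical helper: imports a component repository under <prefix>,
# squashing its history into a single commit in the top-level project.
add_component() {
    git subtree add --prefix="$3" --squash "$1" "$2"
}

# update_component <repository> <ref> <prefix>
# Moves the embedded copy forward to whatever <ref> currently points at.
update_component() {
    git subtree pull --prefix="$3" --squash -m "Update $3" "$1" "$2"
}
```

A gui project could then run add_component ../api v1.2 vendor/api while another project pins a different ref of the same component, which is what "possibly different versions" comes down to in practice. Both commands need to be run from the top level of a clean working tree.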
From your example, the repositories should be setup in terms of how interdependent they are. All the reasoning about designing MicroServices and Domain Driven Design apply here: in some cases duplicate code is acceptable, work with interfaces, don’t break compatibility unless you really have to, etc.
Now in my view a UI should be independent of the backend. So a UI project repository should typically contain the UI code and the Client Controller. The Client Controller will connect with Service Controllers in an abstract manner. They will use a service client/api abstraction that is versioned separately from the service, so that a service can be updated without breaking the client(s) (there could be several different clients).
So a service itself should be its own repository. In my view, the service is just a wrapper of some single-point-of-truth business logic. So the business logic should typically be separate from the service technology that hosts it. On the other hand, the repository implementation is typically so tightly connected to the business logic that it could be integrated in the same repository. But even there your mileage may vary.
Of course, simple projects that are unlikely to change much in terms of technology or supporting multiple stacks, where all UI can be hosted from the same source as the backend and the backend services are typically only used by that same client, can benefit from more tightly integrated repositories.
In that case you would probably be fine with just having the full vertical in one repository, and focus on just making sure your functional domains are properly stand-alone in their own repository. You then still have most advantages of smaller repositories, and little overhead otherwise.