Choosing between single or multiple projects in a git repository?

In a git environment where we have modularized most projects, we face a design choice: one project per repository, or multiple projects per repository. Consider a modularized project:

myProject/
   +-- gui
   +-- core
   +-- api
   +-- implA
   +-- implB

Today we have one project per repository. This gives us the freedom to:

release individual components
tag individual components

But it is also cumbersome to branch components, because branching api often requires equivalent branches in core and perhaps in other components.

Given that we want to release individual components, can we still get similar flexibility with a multiple-projects-per-repository design?

What experiences are there and how/why did you address these issues?

There are three major disadvantages to “one project per repository”, the way you’ve described it above. These are less true if they are truly distinct projects, but from the sounds of it changes to one often require changes to another, which can really exacerbate these problems:

It’s harder to discover when bugs were introduced. Tools like git bisect become much more difficult to use when you fracture your repository into sub-repositories. It’s possible, it’s just not as easy, meaning bug-hunting in times of crisis is that much harder.
Tracking the entire history of a feature is much more difficult. History traversing commands like git log just don’t output history as meaningfully with fractured repository structures. You can get some useful output with submodules or subtrees, or through other scriptable methods, but it’s just not the same as typing tig --grep=<caseID> or git log --grep=<caseID> and scanning all the commits you care about. Your history becomes harder to understand, which makes it less useful when you really need it.
New developers spend more time learning the version control structure before they can start coding. Every new job requires picking up procedures, but fracturing a project repository means they have to pick up the VC structure in addition to the code’s architecture. In my experience, this is particularly difficult for developers new to git who come from more traditional, centralized shops that use a single repository.
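
To make the history-search point concrete, here is a rough sketch of what one of those "scriptable methods" might look like once the code is split across repositories. The function name grep_all_repos, the case ID and the repository paths are made up for illustration; in a single repository the whole thing is just git log --grep=<caseID>.

```shell
#!/bin/sh
# Hypothetical helper: search commit messages for a case ID in several
# repositories and print the hits oldest first.
grep_all_repos() {
    case_id=$1
    shift
    for repo in "$@"; do
        name=$(basename "$repo")
        # -C runs git as if it had been started inside "$repo".
        # %ct = committer date as a Unix timestamp, %h = short hash, %s = subject.
        git -C "$repo" log --grep="$case_id" --format="%ct $name %h %s"
    done | sort -n   # numeric sort on the timestamp interleaves the repositories
}

# Usage (placeholder ID and paths):
#   grep_all_repos CASE-123 ./gui ./core ./api
```

Even this small helper only interleaves commit messages; it does not give you a single graph you can bisect or walk with the usual tools.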

In the end, it’s an opportunity cost calculation. At one former employer, we had our primary application divided into 35 different sub-repositories. On top of them we used a complicated set of scripts to search history, make sure state (i.e. production vs. development branches) was the same across them, and deploy them individually or en masse.
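
The scripts themselves are not shown here. Purely as an illustration of the kind of glue this implies, a minimal branch-consistency check could look like the following; the function name check_same_branch and the repository paths are hypothetical.

```shell
#!/bin/sh
# Hypothetical check: succeed only if every listed repository has the
# expected branch checked out.
check_same_branch() {
    expected=$1
    shift
    status=0
    for repo in "$@"; do
        # Prints the short branch name; prints nothing on a detached HEAD.
        current=$(git -C "$repo" symbolic-ref --short -q HEAD)
        if [ "$current" != "$expected" ]; then
            echo "$repo is on '$current', expected '$expected'" >&2
            status=1
        fi
    done
    return "$status"
}

# Usage (placeholder paths):
#   check_same_branch production ./gui ./core ./api || echo "repositories out of sync"
```

Multiply this by every operation that has to stay in step (tagging, deploying, searching) and the maintenance cost becomes visible.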

It was just too much; too much for us at least. The management overhead made our features less nimble, made deployments much harder, made teaching new devs take too much time, and by the end of it, we could barely recall why we fractured the repository in the first place. One beautiful spring day, I spent $10 for an afternoon of cluster compute time in EC2. I wove the repos back together with a couple dozen git filter-branch calls. We never looked back.
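
The exact filter-branch invocations are not given above. As a rough sketch of a different and simpler technique, the subtree-merge recipe from the Git documentation imports one repository into a subdirectory of another while keeping its commits. Unlike a filter-branch rewrite, the imported commits keep their original paths, and only the merge commit places the files under the subdirectory. The function name import_repo and its arguments are assumptions, and the target repository is assumed to have at least one commit and a clean working tree.

```shell
#!/bin/sh
# Hypothetical helper: import another repository into a subdirectory of the
# current repository, preserving the imported commits.
import_repo() {
    name=$1     # subdirectory (and remote name) to create, e.g. core
    src=$2      # path or URL of the repository to import
    branch=$3   # branch of that repository to import

    # Fetch the other repository's history under a remote named after it.
    git remote add -f "$name" "$src" &&
    # Record a merge of the unrelated history without touching our files yet.
    git merge -s ours --no-commit --allow-unrelated-histories "$name/$branch" &&
    # Place the imported tree under "$name/" in the index and working tree.
    git read-tree --prefix="$name/" -u "$name/$branch" &&
    git commit -q -m "Import $name into $name/"
}

# Usage, run from the top level of the combined repository (placeholder values):
#   import_repo core ../core master
```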

Christopher did a very good job of enumerating the disadvantages of a one-project-per-repository model. I would like to discuss some of the reasons you might consider a multiple-repository approach. In many environments I have worked in, a multi-repository approach has been a reasonable solution, but the decision of how many repositories to have, and where to make the cuts has not always been an easy one to make.

In my current position, I migrated a behemoth CVS repository with over ten years of history into a number of git repositories. Since that initial decision, the number of repositories has grown (through the actions of other teams), to the point where I suspect we have more than would be optimal. Some new hires have suggested merging the repositories, but I have argued against it. The Wayland project had a similar experience. According to a talk I saw recently, they had at one point over 200 git repositories, for which the lead apologized. Looking at their website, I see they are now at 5, which seems reasonable. It’s important to observe that joining and splitting repositories is a manageable task, and it’s okay to experiment (within reason).

So when might you want multiple repositories?

1. A single repository would be too large to be efficient.
2. Your repositories are loosely coupled, or decoupled.
3. A developer typically only needs one, or a small subset of your repositories to develop.
4. You typically want to develop the repositories independently, and only need to synchronize them occasionally.
5. You want to encourage more modularity.
6. Different teams work on different repositories.

Points 2 and 3 are only significant if point 1 holds. By splitting our repositories, I significantly decreased the delays suffered by our offsite colleagues, reduced disk consumption, and reduced network traffic.

Points 4 and 5 are more subtle. When you split the repos of, say, a client and a server, it becomes more costly to coordinate changes between the client and server code. This can be a positive, in that it encourages a decoupled interface between the two.

Even with the downsides of multi-repository projects, a lot of respectable work is done that way; Wayland and Boost come to mind. I don’t believe a consensus regarding best practices has evolved yet, and some judgement is required. Tools for working with multiple repositories (git-subtree, git-submodule and others) are still being developed and experimented with. My advice is to experiment and be pragmatic.

As we use GitHub, we actually have multiple projects in one repo but ensure that those projects/modules are properly modularised (we use -api and -core conventions + Maven + static and runtime checking and might even go to OSGi one day to boot).

What does it save on? Well we don’t have to issue multiple Pull Requests if we’re changing something small across multiple projects. Issues and Wiki are kept centralised etc.

We still treat each module/project as a proper independent project and build and integrate them separately in our CI server etc.

For me, the choice between one repository and several depends on the answers to the following questions:

Are the multiple parts developed by the same team, with the same release cycle and the same customer? Then there are fewer reasons to split the single repository.
Are the multiple parts highly dependent on each other? Splitting model, controller and UI (even when they are different parts) is not very sensible, because they depend heavily on each other. But if two parts share only a small dependency, implemented through a stable interface that changes only every few years, it would be wise to put them in two repositories.

Just as an example: I have a small application (client only) that checks the “quality” of a Subversion repository. There is the core implementation, which can be started from the command line and works well with Java 6. But I have started to implement a UI that uses JavaFX as part of Java 8. So I have split the two and created a second repository (with a second build process), with a different schedule, …

I like the answers above (voted them up), but I think they are not the whole story. So I wanted to add the arguments for splitting repositories as well. The real answer (when to split) may be somewhere in the middle …

It might be that git-subtree (see Atlassian blog, Medium blog, or kernel link) would be a good fit for what you have. Each of your top-level projects would then use a set of subtrees, possibly at different versions.
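
A minimal sketch of that workflow, assuming git subtree is installed (it lives in Git's contrib directory and is bundled with many, but not all, Git packages). The helper names add_component and update_component, the prefix and the repository location are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers: vendor a shared component into a top-level project
# at a chosen version, and move it to a newer version later.
# Both must be run from the top level of a repository with a clean working
# tree and at least one commit.
add_component() {
    prefix=$1   # directory inside the current repository, e.g. api
    repo=$2     # path or URL of the component repository
    ref=$3      # branch or tag to pin, e.g. a release tag
    # --squash records the component as one commit instead of its full history.
    git subtree add --squash --prefix="$prefix" "$repo" "$ref"
}

update_component() {
    # Same arguments as add_component; merges the newer version into the prefix.
    git subtree pull --squash --prefix="$1" "$2" "$3"
}

# Usage (placeholder values):
#   add_component api ../api v1.0
#   update_component api ../api v1.1
```

Each top-level project decides when to call update_component, which is how different projects can end up on different versions of the same component.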

From your example, the repositories should be set up according to how interdependent they are. All the reasoning behind designing microservices and Domain-Driven Design applies here: in some cases duplicate code is acceptable, work with interfaces, don’t break compatibility unless you really have to, etc.

Now in my view a UI should be independent of the backend. So a UI project repository should typically contain the UI code and the Client Controller. The Client Controller will connect with Service Controllers in an abstract manner. They will use a service client/api abstraction that is versioned separately from the service, so that a service can be updated without breaking the client(s) (there could be several different clients).

So a service itself should be its own repository. In my view, the service is just a wrapper of some single-point-of-truth business logic. So the business logic should typically be separate from the service technology that hosts it. On the other hand, the repository implementation is typically so tightly connected to the business logic that it could be integrated in the same repository. But even there your mileage may vary.

Of course, simple projects that are unlikely to change much in terms of technology or supporting multiple stacks, where all UI can be hosted from the same source as the backend and the backend services are typically only used by that same client, can benefit from more tightly integrated repositories.

In that case you would probably be fine with just having the full vertical in one repository, and focus on just making sure your functional domains are properly stand-alone in their own repository. You then still have most advantages of smaller repositories, and little overhead otherwise.


Filed under: softwareengineering - @ 09:57

Tags: git, java, maven, programming-practices, version-control
