We are an organisation of around 200 developers who work continuously on a single product (using Git for revision control) that is planned to be released on a certain date.
Due to the huge number of developers, we are trying to create “cross-functional” teams of around 10 developers each, resulting in around 20 development teams in the organisation.
Since we would like to maintain a continuously “high standard” for the product in the main repository (meaning that when a developer does a pull, the product should at least be compilable, etc.), we would like to use some kind of quality gates.
I am a bit unsure how to phrase the question, but I am wondering if I could get some advice on development methodologies for such a large group of developers working on a single product.
In our opinion, one end of the spectrum is to allow each developer to commit directly to the main repository. However, we fear that, given the high number of developers and commits, the main repository would constantly be in a broken state, because we cannot enforce a demanding “quality gate” on every commit.
The other end of the spectrum might be a tree or pyramid structure (which we believe is how Linus Torvalds runs Linux), where the main repository has only three pull sources, those three each have only a handful of trusted pull sources, and so on. However, we feel that with such a structure changes have a long chain to climb before they reach the main repository. In addition, if a merge conflict occurs, the problem lands on a different developer than the original author of the change.
With this background and these opinions stated, where can we read about recommended development methodologies for this many developers? How do large organisations (Microsoft, Facebook, Ubuntu, etc.) structure their development?
You should certainly consider splitting the product into modules, with interface team(s) bringing those constituent modules together into a product. This in turn would mean splitting the repositories to match the module partitioning and hierarchy. If it turns out that you can’t do this, the project will probably grind to a merge-induced halt, given the number of developers contributing.
If you are planning to use Git for version control, then I would recommend using a code review system (such as Gerrit) to improve transparency and ensure quality in each repository. That way, all work has to be approved before being merged into any authoritative repository. In this scenario it makes sense to grant certain trusted individuals permission to push from a repository under a code review system to another repository (possibly also under a code review system). Used correctly, this is a fast and greatly beneficial process that does not hinder development.
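As a rough illustration of that flow (the remote and branch names here are assumptions, not part of Gerrit itself, apart from the refs/for convention), developers push proposed commits to a review ref rather than to the branch, and only approved changes are promoted onwards:

```
# Propose a commit for review on the target branch (Gerrit's refs/for/<branch> convention)
git push origin HEAD:refs/for/master

# After the change has been reviewed and submitted in Gerrit, a trusted integrator
# can promote the reviewed branch to another authoritative repository
git push central-repo master
```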
Regarding build verification, you would need a continuous integration (CI) server whose purpose is to automatically build and verify code; by “verify” I mean that the code compiles successfully and the tests pass. In fact, Jenkins (a CI server) can be linked to the Gerrit code review system as part of Gerrit’s verification stage, fully automating the process.
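A minimal sketch of what such a verification job might run, assuming a Maven build; the Gerrit host and the change number below are made up, and in practice the Jenkins/Gerrit integration posts the vote for you:

```
# Build and run the tests for the patch set under review
mvn -B clean verify

# On success, a Verified vote goes back to the change in Gerrit.
# The Jenkins/Gerrit integration does this automatically; done by hand it amounts to:
ssh -p 29418 jenkins@gerrit.example.org gerrit review --verified +1 12345,2
```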
In addition to these integration tools, it is important to strive for frequent integration as part of the development methodology, to minimise the time spent merging.
It may be worth considering an agile development process such as Scrum, which breaks work on a complex product into manageable chunks of product increment, delivered in time-boxed iterations (called sprints). These would provide regular integration opportunities between repositories.
Clearly, with a 200-person development team, you must have some sort of hierarchical structure. An individual or a small group of people is making decisions about the design of the software product. Your development process should reflect this: you need code reviews and testing in place to make sure the software being created actually matches what you wanted to create (as well as for quality purposes).
Even the small teams need leaders to guide them and review their work as they develop individual components. There should be quality control processes in place at the team level as well.
So, yes, you should follow a hierarchical structure with regard to the repository. This is to match the hierarchical structure of the project overall.
Individual components should be built and tested to a certain level of adequacy before you even think about putting them all together. Allowing 200 people to commit directly to the main project would be chaos. You should have separate areas for each group where individuals can commit their changes on a day-to-day basis, without affecting the main build of the project.
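In Git terms, those separate areas are typically per-team repositories or long-lived team integration branches; a sketch with invented names:

```
# Day-to-day work goes to a team-owned integration branch and repository,
# not to the main product repository (all names here are hypothetical)
git checkout -b team-checkout/integration
git push team-repo team-checkout/integration

# Once the branch builds cleanly, ask the upstream integrator to pull it
git request-pull v1.4 ssh://git.example.org/team-checkout/product.git team-checkout/integration
```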
It is a very good thing if “changes have a long chain to climb in order to come into the main repository”, because this chain allows you to ensure quality. It may seem faster to have all changes apply immediately to the main repository, but in practice this will just be a huge headache, as you will have a constantly buggy and unusable main build of your software.
It is also a good thing that “if a merge conflict occurs, the problem lands on another developer”: specifically, a higher-level developer should be the one to decide how to resolve the conflict.
When you have something big and, as a consequence, unmanageable, the way out is to divide it into smaller, manageable pieces.
There are several steps that would help you manage the team and the project better:
- Divide the functionality into modules. The functionality should be split into maximally independent modules using the high cohesion, low coupling and dependency inversion principles. The first principle will help you create logically consistent modules. The second will help you keep those modules as independent as possible. The third will help you develop dependent modules simultaneously: if module A depends on module B, then B should provide an interface that A can build against even while B is not completely ready (a small build-level sketch of this idea follows the list).
- Have clear documentation. When this many people work together, things are easily forgotten or misunderstood, so pay special attention to all documentation, from requirements to architectural decisions.
- Assign people to tasks (never tasks to people). After dividing the functionality into smaller sets, create teams to work on those sets. Creating teams will be easier at this stage, because you already know what each team is to work on. Tasks such as code reviews can then be done inside each team.
- Have a clear system of tasks. Each of the 200 developers should know exactly what to work on. This helps you keep track of what is already done, what each person is working on, and how much work is left.
- Use source control. (I think this is covered quite well in the other answers.)
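As a rough build-level sketch of that “interface first” idea (the module names and the Maven multi-module layout below are assumptions for illustration, not a prescribed structure), module B’s interface can be published as its own small artifact so that module A compiles against it long before B’s implementation is finished:

```
# Hypothetical multi-module layout:
#   product/module-b-api    -- only the interfaces module B promises to provide
#   product/module-b-impl   -- module B's implementation, possibly still incomplete
#   product/module-a        -- depends on module-b-api, never on module-b-impl

# Team B publishes the interface artifact early
mvn -B -pl module-b-api install

# Team A can already build and test against it
mvn -B -pl module-a -am clean package
```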
And finally, try to create as simple a structure of teams and modules as possible. You can’t afford complexity with such a huge project.
In addition to the other answers suggesting a hierarchical structure: that implies you will have to schedule ‘integration’ points in time where the focus is entirely on moving the code up the hierarchy and ‘putting it all together’. This is not much different from smaller projects having a final phase in which no work is done other than testing and bug fixing; it just happens more frequently. Since you are working in a large group striving for high standards, most of that frame of mind will probably be in place already.
In addition to hotpotato’s answer (which is directly on the mark IMHO), I would also suggest implementing some source control gates, as you suggest. When we moved a large team and code base to git for SCM, we decided to use what is called the “benevolent dictator” method, similar to the model you described.
In this scenario there are many branches of the full code base that are regularly updated from their source branch, but responsibility for promoting code into more visible/public areas lies with a single person (or small group of people), and is typically tied to a code review process. With a well-organized branching structure, this can work REALLY well. For more information, check out this link.
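A rough sketch of the promotion step in that model from the integrator’s point of view (the remote names and URLs are invented for illustration):

```
# The integrator's clone tracks each lieutenant's published repository
git remote add lieutenant-ui  ssh://git.example.org/ui-team/product.git
git remote add lieutenant-api ssh://git.example.org/api-team/product.git

# Pull in reviewed work; merge conflicts are resolved here, at the integration level
git fetch lieutenant-ui
git merge --no-ff lieutenant-ui/master

# Publish the result to the blessed, public repository
git push blessed master
```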
I have worked on an enormous system which had several hundred developers working on it simultaneously with about 150M SLOC. This was on a mainframe, so we’re not talking Visual Studio, but the principles can still be adopted.
First of all, if you’re using Java, I would definitely say use Maven. If you’re using Visual Studio, you could use NuGet, although I’m not sure whether it is quite on par with Maven yet (it is also somewhat different). A system like this lets you pull in your dependencies and lets the modules function individually. You’d have a build script pull the relevant dependencies and build as a batch.
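For a Maven build, that batch step can be as small as the commands below (a sketch only; a real build script would add whatever profiles and module lists you actually use):

```
# Download every dependency declared in the POMs
mvn -B dependency:resolve

# Build the whole reactor, run the tests and install the artifacts locally
mvn -B clean install
```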
Given that you’re not directly asking a question but asking for a methodology, I’ll tell you how my previous employer handled it.
The system was broken into clusters. Clusters represented business areas and system infrastructure areas. I’m not going to name them, but for a huge retail business you could think of things such as marketing, retail operations, online operations, procurement and distribution. System infrastructure covered things such as customers and security. Within each cluster there were components. Using the previous analogy, you could consider the components of security, for example: single sign-on, directory services, auditing, reporting, etc. Each component had its related routines stored within it.
As a namespace or package you’d have Organisation.Security.DirectoryServices, for example. By confining all logic to its relevant area, the teams worked fairly autonomously. Obviously, large projects requiring input from multiple teams happened, but they were largely smooth operations.
I hope this helps.