We ran into an unfortunate situation at work recently and I’ve been wondering what we can do to avoid similar problems in the future.
We make embedded systems. The FPGA code is in one SVN repository while the firmware & software code is in a different repository. Building the firmware requires the built FPGA images, so these binary files are also stored in the firmware/software repo. The FPGA build can take 2-3 hours and may randomly fail. To my understanding, these late failures occur because of the complex timing issues in the FPGA logic. Developer errors such as bad syntax will cause the build to fail within minutes.
Because the builds take so long, the FPGA developers generally make their changes, build, give the new image to the firmware developers, and then commit to SVN. A few weeks ago, this caused a lot of issues as we were preparing for a release. One FPGA developer made a small change and started a build. Another FPGA developer made a commit (after he had been told that his changes did not need to be included in this release and that he should wait to commit). After this commit, the initial build failed. The first FPGA developer then had to revert the other’s changes, commit his changes, and spend a significant amount of time inspecting the code in SVN to make sure it reflected the code used to build the image required for the firmware build before we tagged for the release.
If we were using git or another DVCS this wouldn’t be a problem, since each FPGA developer could commit locally before building and push to a server after a successful build. Switching our version control is currently not an option. We’re in the process of throwing more hardware at the problem of long FPGA builds and we’re investigating whether the build time can be improved by changing settings in the build tools.
Is there anything else we can do to improve our processes so we don’t repeat this situation or discover other problems with versioning and conflicts?
One FPGA developer made a small change and started a build. Another FPGA developer made a commit (after he had been told that his changes did not need to be included in this release and that he should wait to commit). After this commit, the initial build failed. The first FPGA developer then had to revert the other’s changes, commit his changes, and spend a significant amount of time inspecting the code in SVN to make sure it reflected the code used to build the image required for the firmware build before we tagged for the release.
The problem is that you are using a single branch for multiple conflicting purposes. You are doing release packaging, low-risk fixes, and high-risk fixes in the same branch.
There is nothing fundamentally wrong with checking in code; it's just that it needs to be checked in to the right spot. Yes, a DVCS makes some aspects of this easier, though it is not the right answer for every problem.
Consider making the trunk or mainline (where you currently check everything in) the spot for low-risk fixes and low-risk development: don't break things for other people. Have another branch be the release branch. Once the release is ready, merge all the changes from the mainline into the release branch and build from there. You can still build from the mainline, but that is a dev/nightly build, not one that is ready for release.
If the release build fails, either check the fix into the trunk and cherry-pick that change into the release branch (this is key, because the second developer's code then doesn't need to be reverted), or check it into the release branch and merge it back into the mainline. Different people will have different opinions on which of these is the best practice.
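As a rough sketch of the two steps above (syncing the mainline into the release branch, then cherry-picking a single fix from the trunk), the svn commands might look like the following. The repository URL, the `trunk` and `branches/release` layout, the working-copy path, and the revision number are all assumptions made up for illustration, not details from the question. The script only builds and prints the command lines; it does not run them.

```python
import shlex
from typing import List


def release_commands(repo_url: str, fix_revision: int, wc_path: str = ".") -> List[List[str]]:
    """Build (but do not run) svn command lines for a release-branch workflow.

    Assumes a hypothetical layout with <repo_url>/trunk and
    <repo_url>/branches/release; adjust to your own repository.
    """
    trunk = f"{repo_url}/trunk"
    release = f"{repo_url}/branches/release"
    return [
        # Point the working copy at the release branch.
        ["svn", "switch", release, wc_path],
        # Bring all eligible trunk changes into the release branch.
        ["svn", "merge", trunk, wc_path],
        ["svn", "commit", wc_path, "-m", "Merge trunk into release branch"],
        # Later: pick only the one fix from trunk, leaving other trunk
        # commits (e.g. another developer's work) out of the release.
        ["svn", "merge", "-c", str(fix_revision), trunk, wc_path],
        ["svn", "commit", wc_path, "-m",
         f"Cherry-pick r{fix_revision} from trunk into release branch"],
    ]


if __name__ == "__main__":
    # Hypothetical URL and revision number, for illustration only.
    for cmd in release_commands("https://svn.example.com/fpga", 1234):
        print(shlex.join(cmd))
```

Because the release branch only receives what is explicitly merged into it, a commit that lands on the trunk at the wrong moment never has to be reverted to get the release out.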
Advanced SCM Branching Strategies is a good read on this topic.
In the environment that I work in, there is a branch for the release (release), a branch for the dev build (the trunk), and a branch (or two or three, as necessary) for various high-risk changes. Each of these branches has a separate build on the build server.
I would recommend using a code review system in addition to your source control system. This would require each commit to be reviewed and approved by other members of the team before it is accepted into the repository. Other benefits would include improved knowledge sharing and development process transparency.
edit:
Slowing things down is a worry, but if you choose the right software review tool and use it correctly, it can be fast and greatly beneficial. Slightly slowing the development process down in the short term in this manner would spare you long debugging/fixing sessions when something silly gets into the central repository, saving time in the long term.