Are there any studies that aggregate data over a wide population of contributed code and establish a correlation between the amount of code written in a commit and the number of bugs discovered in that code? It’d be hard to do on GitHub without knowing whether a change was due to new functionality or a bug fix, but you could determine a relation between lines of code per commit and how much thrashing eventually goes on in that code.
2
It just depends.
If all your program does is Console.WriteLine over and over, chances are it won’t have any bugs no matter how big it gets. If you’re writing the next great document database, chances are you’ll have a lot of bugs.
You couldn’t scrape this information from GitHub because you don’t know how hard the problems are that people are trying to solve. If most projects on GitHub are about as complex as a tic-tac-toe game, again, you probably won’t see a ton of bugs. Your analysis could fool you into saying “Wow, codebases can expand with relatively few bugs or none at all!”
Bugs are more related to complexity, is what I’m getting at.
The only metric that I’m familiar with that tries to relate possible defects to program size is one of Halstead’s complexity measures. The figure used is B = (E^(2/3))/3000
or B = V/3000
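where B is the number of delivered bugs, E is the amount of effort, and V is the program volume. Effort is difficulty times volume, E = D * V, with D = (n1/2) * (N2/n2) and V = (N1 + N2) * log2(n1 + n2). If you expand down to the counted values, these equate to either B = (((n1/2) * (N2/n2) * (N1 + N2) * log2(n1 + n2))^(2/3))/3000
or B = ((N1 + N2) * log2(n1 + n2))/3000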
where n1 is the number of distinct operators, n2 is the number of distinct operands, N1 is the total number of operators, and N2 is the total number of operands.
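To make the arithmetic concrete, here is a minimal Python sketch of both estimates. It assumes the standard Halstead definitions of volume, difficulty, and effort; the names `HalsteadEstimate` and `halstead_estimate` are my own, and counting the operators and operands in real source code is left out entirely.

```python
import math
from dataclasses import dataclass


@dataclass
class HalsteadEstimate:
    volume: float            # V = (N1 + N2) * log2(n1 + n2)
    difficulty: float        # D = (n1 / 2) * (N2 / n2)
    effort: float            # E = D * V
    bugs_from_effort: float  # B = E^(2/3) / 3000
    bugs_from_volume: float  # B = V / 3000


def halstead_estimate(n1: int, n2: int, N1: int, N2: int) -> HalsteadEstimate:
    """Estimate delivered bugs from Halstead's four basic counts.

    n1, n2: distinct operators and distinct operands.
    N1, N2: total operators and total operands.
    """
    if n1 < 1 or n2 < 1:
        raise ValueError("need at least one distinct operator and one distinct operand")
    volume = (N1 + N2) * math.log2(n1 + n2)
    difficulty = (n1 / 2) * (N2 / n2)
    effort = difficulty * volume
    return HalsteadEstimate(
        volume=volume,
        difficulty=difficulty,
        effort=effort,
        bugs_from_effort=effort ** (2 / 3) / 3000,
        bugs_from_volume=volume / 3000,
    )


# Made-up counts, chosen so that log2(n1 + n2) is a whole number.
est = halstead_estimate(n1=8, n2=8, N1=20, N2=12)
print(est.volume)  # 128.0
print(est.bugs_from_effort, est.bugs_from_volume)
```

Note that the two formulas generally give different numbers for the same counts, so pick one and stick with it if you compare snapshots of a codebase.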
The number of bugs a commit introduces could then be estimated as the difference between B computed before the commit and B computed after it.
However, the validity of Halstead’s metrics has been questioned – if you search for academic studies, you’ll find papers that support them as well as papers that seem to indicate little to no validity. To the best of my knowledge, they are not widely accepted, nor is there overwhelming evidence that they are empirically valid.
1
It’s proportional to the number of functions/methods not covered by unit tests.
Bugs = K + M * <functions that are not tested> - N * <Integration Test Coverage>
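As a rough sketch, the formula above translates directly into a function. The coefficients K, M, and N are not given anywhere in this answer, so they are plain parameters here and the values in the example call are invented purely for illustration; you would have to fit them to your own project’s history.

```python
def estimate_bugs(untested_functions: int,
                  integration_coverage: float,
                  k: float, m: float, n: float) -> float:
    """Linear estimate: Bugs = K + M * untested_functions - N * integration_coverage.

    k, m, n are project-specific constants (hypothetical, not given in the answer).
    integration_coverage is assumed here to be a fraction between 0 and 1.
    """
    return k + m * untested_functions - n * integration_coverage


# Invented coefficients, only to show the shape of the calculation.
print(estimate_bugs(untested_functions=10, integration_coverage=0.5,
                    k=2.0, m=0.5, n=4.0))  # 5.0
```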
17