I have the suspicion that many bug-fixes carried out by our developers sooner or later cause another bug, simply because the product is too complex.
I’d like to improve the quality of bug-fixes, i.e. make sure bug-fixes do not produce additional errors in the code. For that, I’d like to be able to measure how the quality changes as I apply various countermeasures.
Is it a good idea to instruct bug-fixers to always look for the cause of their defect, and then evaluate the ratio of previous bug-fixes being such cause? I’m hesitant about this because of the obviously higher cost and the unwillingness of developers to blame themselves or their colleagues.
Is there a better approach?
Ideally you will have as many useful unit tests as possible, testing every combination of inputs to each class.
Then when fixing a bug, check the unit test that should have caught it. Maybe there is an edge condition or combination of inputs not previously encountered (add a new unit test). Maybe an existing test erroneously passed (fix the test).
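To make the "add a new unit test" step concrete, here is a minimal sketch using Python's standard `unittest` module. The function `average` and the empty-list bug are hypothetical and only stand in for whatever was actually reported; the point is that the fix ships together with a test that pins down the edge condition.

```python
import unittest


def average(values):
    """Hypothetical function under test (not from the original post)."""
    if not values:
        # The fix: before it, an empty list raised ZeroDivisionError.
        return 0.0
    return sum(values) / len(values)


class AverageRegressionTest(unittest.TestCase):
    def test_typical_input(self):
        # Existing behaviour that the fix must not break.
        self.assertEqual(average([2, 4, 6]), 4.0)

    def test_empty_input_regression(self):
        # New test added with the fix: the edge condition from the bug report.
        self.assertEqual(average([]), 0.0)


def run_suite():
    """Run the tests above and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(AverageRegressionTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Calling `run_suite()` runs both tests. If a later change reintroduces the bug, the regression test fails immediately instead of the defect resurfacing in production.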
If a bug fix causes another unit test to fail, you now have a measurable result that tells you about the quality of the bug fix. Of course, the cause could simply be that the application is too tightly coupled, so that code changes in one area affect code in another area when they should not.
If you combine this with a continuous integration system that will recompile your code and run tests on a regular basis, you will have very quick feedback on the quality of bug fixes and the frequency with which they break other code.
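As a rough sketch of how that feedback could become a number you track over time: assume the CI system can tell you, for each bug fix, which previously passing tests started failing. The `FixRecord` type, the ticket names, and the sample data below are all made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class FixRecord:
    """One bug fix and the tests it broke (hypothetical structure)."""
    ticket: str
    newly_failing_tests: list = field(default_factory=list)


def regression_rate(records):
    """Fraction of fixes that broke at least one previously passing test."""
    if not records:
        return 0.0
    breaking = sum(1 for r in records if r.newly_failing_tests)
    return breaking / len(records)


# Made-up example data: four fixes, one of which broke an existing test.
records = [
    FixRecord("BUG-1"),
    FixRecord("BUG-2", ["test_invoice_total"]),
    FixRecord("BUG-3"),
    FixRecord("BUG-4"),
]
rate = regression_rate(records)
```

This only counts regressions that the test suite is able to detect, so the number is a lower bound and is only as good as the test coverage.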
Are your developers working under time pressure? Do they have quiet working conditions? Do they have the best debugging tools money can buy? Do you have testers?…
There are too many factors which can make bug solving a task which creates more bugs than it solves. Too many to be enumerated here. If you notice that for every closed ticket, QA opens two more, talk with your team, see what’s wrong in the processes the team uses, and find out why the original bug existed in the first place.
For example, if a developer works under time pressure and pressure from the management and is assigned a ticket saying that when the original price is $0.89, the final one should be $1.20, the temptation would be to add:
    if (price == 89) {
        return 120;
    }
This doesn’t solve the problem when the price is $0.88 or $0.90, but the ticket is closed and the management is happy, for now.
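A test that looks at neighbouring inputs, not just the one in the ticket, tends to expose this kind of patch. The sketch below is only an illustration: the general pricing rule (a 30% markup, rounded down) is invented, and it assumes the final price should never decrease when the original price goes up. Prices are in cents, as in the snippet above.

```python
def final_price_general(price_cents):
    """Hypothetical (still buggy) general rule: 30% markup, rounded down."""
    return price_cents * 130 // 100


def final_price_hack(price_cents):
    """The same rule with the special case from the ticket bolted on."""
    if price_cents == 89:
        return 120
    return final_price_general(price_cents)


def find_monotonicity_violations(fn, lo, hi):
    """Return every price p in [lo, hi) where fn(p + 1) < fn(p)."""
    return [p for p in range(lo, hi) if fn(p + 1) < fn(p)]


violations = find_monotonicity_violations(final_price_hack, 80, 100)
```

Here `violations` contains 89: under the patched function, a product costing $0.90 ends up cheaper than one costing $0.89. The check does not tell you what the correct rule is, but it flags that the special case is inconsistent with its neighbours.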
> Is it a good idea to instruct bug-fixers to always look for the cause of their defect

Well, it is a really, really bad idea to fix defects without knowing the root cause. The problem is that you cannot force your team to always perform a root cause analysis just by instructing them; it is a skill they have to practice.

> and then evaluate the ratio of previous bug-fixes being such cause?
Evaluating such a ratio won’t improve the situation. Instead, I suggest you take every bug as an occasion to work with your teammates on how to make your code more bullet-proof. So discuss with your colleagues: why was this bug introduced in the first place, and what could be done to prevent such a bug next time:
- is the code too convoluted / not self-documenting enough?
- can the code be refactored to make it cleaner?
- was the usual code review forgotten?
- is there a missing test case (in your unit tests or in your checklist for the manual tests)?
- …
How do you tell if the “new” bug is caused by a fix, or if it was there all along? The temptation is to say “I never saw this bug before that last patch, so the patch must have introduced the bug”, but if that were universally true, how did the original bug get into production code?
Increasing the quality of fixes would essentially entail increasing the cost of fixes and delaying their deployment, because it means dedicating more time to testing a fix before releasing the patch (something that, in my experience, management doesn’t want to spend a lot of time or money on). It also requires that the coders (or at least the testers) actually be users of the system, too. In my situation, the team that works on patches has no idea whatsoever of the actual usage of the system they support, and it shows.
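On the question of whether a “new” bug came from a fix or was there all along: if you have a check that reproduces the bug and can build older revisions, a binary search over the history answers it. This is the idea that `git bisect` automates. The sketch below is a simplified illustration, not a real tool. The revision names and the `is_bad` check are made up, and it assumes that once the bug appears it stays in every later revision.

```python
def first_bad_revision(revisions, is_bad):
    """Return the earliest revision where is_bad(rev) is true, or None.

    Assumes revisions are ordered oldest to newest and that every revision
    after the first bad one is also bad.
    """
    if not revisions or not is_bad(revisions[-1]):
        return None  # the newest revision does not show the bug
    lo, hi = 0, len(revisions) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(revisions[mid]):
            hi = mid  # bug already present here, look earlier
        else:
            lo = mid + 1  # bug not present yet, look later
    return revisions[lo]


# Made-up history: the bug is reproducible from r4 onwards.
history = ["r1", "r2", "r3", "r4", "r5", "r6"]
culprit = first_bad_revision(history, lambda rev: int(rev[1:]) >= 4)
```

If the result is the revision that contains the patch, the patch introduced the bug. If the result is the oldest revision you checked, the bug was there all along and simply went unnoticed.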