When debugging, I sometimes find that I make some changes and I am not 100% sure why those changes correct some bug in the program. Is it essential to understand every single detail about why some bugs were occurring and why certain changes eliminated those bugs? Or is it common among developers to sometimes get the program working without really knowing the details about why the fix worked?
I would say that it is essential to understand every single detail about why some bugs were occurring and why certain changes eliminated those bugs, and it is also common among developers to sometimes get the program working without really knowing the details about why the fix worked!
The art of changing things until a bug disappears, without understanding what caused it or why the change fixed it, is often called “voodoo programming,” and it’s not a compliment. There is really no way you can possibly be confident that you have genuinely fixed a bug, as opposed to partially fixing it for the particular case you were investigating, if you don’t understand what caused it.
In the worst case, you haven’t done anything at all except move the bug: I remember from first year computing at uni, when many students were learning C and pointers for the first time, pointer bugs would often stop manifesting when they changed things randomly, because the changes would rearrange data structures in memory enough to make the pointer bug stomp over a different bit of memory. Obviously that hasn’t helped at all.
But having said that, the commercial realities of programming are often such that satisfying the client that a bug is fixed is more important than satisfying yourself. I’d never recommend you declare something fixed if you had no idea what caused it, but if you can see that some code was problematic, and you reworked it, even if you’re “not 100% sure” how that caused the specific bug to manifest, sometimes you just have to move on to the next bug before the client screams too loudly about your slow progress.
If you think a client is mad about it taking too long to fix a bug, imagine how mad they will be when a bug you claimed was fixed comes back, or when a fix for one thing makes something else worse. If your fix is only a workaround or mitigation, customers will usually still welcome it, but you must be honest about what it is, and you should add as much logging as you need in order to fix it for real.
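To make the "honest mitigation plus logging" idea concrete, here is a minimal sketch in Python. Everything in it is hypothetical and only for illustration: the `parse_quantity` function, the `orders` logger name, and the `DEFAULT_QUANTITY` fallback are assumptions, not part of any real code base. The point is that the workaround records enough context each time it fires that the underlying cause can be tracked down later.

```python
import logging

# Hypothetical logger name and fallback value, chosen for illustration only.
logger = logging.getLogger("orders")
DEFAULT_QUANTITY = 1


def parse_quantity(raw):
    """Mitigation, not a fix: fall back to a default on bad input.

    The cause of the bad input is still unknown, so every fallback is
    logged with the offending value and its type.
    """
    try:
        return int(raw)
    except (TypeError, ValueError):
        logger.warning(
            "parse_quantity fallback: raw=%r type=%s",
            raw,
            type(raw).__name__,
        )
        return DEFAULT_QUANTITY
```

The warning message is the important part: it turns a silent workaround into a source of evidence for the real fix.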
If you’re pretty sure you fixed it, but don’t know why the fix works, ask someone. Most engineers I know love to get questions like that because of the mystery behind it.
Changing stuff until the bug is no longer there is generally bad practice, but unfortunately a reality for some people.
I’m of the strong opinion that you should never write code unless you understand what it does and why it does it. Otherwise, how can you be sure that, even though you’ve fixed the bug you set out to fix, you haven’t broken anything else?
Generally, before you fix a problem/bug, you should analyse the underlying cause to determine why the issue is occurring and whether it can be replicated. Then you should read the code and understand why it causes the bug to happen. Once you have that understanding, you can start to look at how to resolve the issue and determine which other areas your change(s) will impact. Unit tests can really help here!
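As a rough illustration of how unit tests fit into that workflow, here is a small Python sketch using the standard `unittest` module. The `average` function and the bug it is said to have had (a `ZeroDivisionError` on an empty list) are invented for the example. The idea is to write a test that replicates the reported failure first, then keep it alongside a test for the existing behaviour, so you can see both that the bug is gone and that the change didn't break anything else.

```python
import unittest


def average(values):
    """Hypothetical function, shown after the fix.

    Assumed history: the original version divided by len(values)
    unconditionally and raised ZeroDivisionError on an empty list.
    """
    if not values:
        return 0.0
    return sum(values) / len(values)


class AverageRegressionTest(unittest.TestCase):
    def test_empty_input_from_bug_report(self):
        # Written before the fix to replicate the reported failure.
        self.assertEqual(average([]), 0.0)

    def test_existing_behaviour_unchanged(self):
        # Guards against the fix impacting the normal case.
        self.assertEqual(average([2, 4, 6]), 4.0)
```

If the first test fails before the change and passes after it, you have at least some evidence that the change addresses the cause you identified rather than something incidental.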
I’ve seen a number of code changes that people have made to fix an issue (which is great), but it unfortunately introduced other issues because the developer was unaware of the full impact of what they changed. Many of these “fixes” just obscure the underlying cause of the original issue as well as introducing complexity and more bugs.
Having said that, I’ve fixed a number of issues in code purely by association: I changed, reworked or refactored something, and that fixed other outstanding bugs. So although I don’t know what caused them originally, I found dodgy code and “fixed” it, which happened to fix those bugs too. I cover changes like this with unit and integration tests to ensure the integrity of the business and technical requirements of the function.
Or is it common among developers to sometimes get the program working
without really knowing the details about why the fix worked?
There are at least three big problems with that:
- It leads to a black magic mindset where you give up on the idea that you can understand the code and instead just start moving parts around hoping that problems will go away. This is the programming equivalent of pushing food around on your plate, hoping to make your dinner look sufficiently eaten that your parents won’t make you eat more of your vegetables.
- You can’t know that the bug is actually fixed or just masked by your change unless you understand a) what the problem was, and b) how your change solves the problem.
- The bug is probably not fixed, and it’s going to bite you again in the near future.
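To show what "masked" versus "fixed" can look like in practice, here is a small Python sketch. The functions and the scenario are made up for illustration. Assume the reported symptom was "the same tag shows up twice", and the actual cause is a mutable default argument that is shared between calls.

```python
def add_tag_masked(tag, tags=[]):
    """A change made without understanding the cause.

    The symptom was duplicate tags, so the result is de-duplicated.
    The shared default list is still there and still grows.
    """
    tags.append(tag)
    return list(dict.fromkeys(tags))


def add_tag_fixed(tag, tags=None):
    """A change made after understanding the cause.

    Each call without an explicit list now gets a fresh one.
    """
    if tags is None:
        tags = []
    tags.append(tag)
    return tags
```

Calling `add_tag_masked("a")` twice returns `["a"]` both times, so the case under investigation looks fixed. A later `add_tag_masked("b")` still returns `["a", "b"]`, because the state leaking between calls was never addressed. With `add_tag_fixed`, the same sequence returns `["a"]`, `["a"]` and `["b"]`.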
I see two scenarios. In the first, you worked on something else and the bug stopped happening. As long as the something else hasn’t broken anything else, you pretty much have to let that go in: you did what was needed/wanted and it had an unforeseen and inexplicable positive side effect.
The other is that you are working on this bug and a random change made things work. That’s unacceptable. If you have no idea what the old code was doing wrong, you probably have no idea what the new code is doing wrong.
I can’t really think of a good reason to check in the second case. If it’s a critical bug, then it’s critical to get it right. If it’s a non-critical bug, then by holding the change back you can at least be sure that you are not introducing a critical bug with your “fix”.
Or is it common among developers to sometimes get the program working
without really knowing the details about why the fix worked?
I for one think it is very common these days, because of Google and Stack Overflow. You have a problem with your code, so you just google it, find a solution, apply it, and move on to the next problem.