I’m writing a simple little program to transmit MIDI over a network. I know that the program will encounter transmission problems and / or other exception situations that I won’t be able to predict.
For the exception handling, I see two approaches. Should I write the program so that it:
- fails with a bang when something goes wrong, or
- ignores the error and continues, at the expense of data integrity?
Which approach would a user reasonably expect?
Is there a better way of handling exceptions?
Additionally, should my decision about handling exceptions be affected by whether or not I am dealing with a network connection (i.e., something where I can reasonably expect to have problems come up)?
5
You should never ignore an error that your program encounters. At the bare minimum, log it to a file or send it to some other notification mechanism. There may be occasional situations where you will want to ignore an error, but document it! Don’t write an empty catch block without a comment explaining why it is empty.
Whether the program should fail or not depends a lot on the context. If you can handle the error gracefully, go for it. If it is an unexpected error, then your program will crash. That’s pretty much the basics of exception handling.
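As a rough illustration of "log it, and comment any handler that ignores an error," here is a minimal Python sketch. The logger name and the functions `send_message` and `close_quietly` are hypothetical, and `sock` is assumed to be any object with `sendall` and `close` methods, such as a standard socket.

```python
import logging

logger = logging.getLogger("midi_sender")  # hypothetical logger name


def send_message(sock, payload: bytes) -> bool:
    """Send one message; log and report a failure instead of swallowing it."""
    try:
        sock.sendall(payload)
        return True
    except OSError:
        # Not an empty handler: record what failed, with the traceback.
        logger.exception("failed to send %d bytes", len(payload))
        return False


def close_quietly(sock) -> None:
    """Close a socket during shutdown."""
    try:
        sock.close()
    except OSError:
        # Deliberately ignored: we are shutting down and there is nothing
        # useful left to do with this socket. Still noted at debug level.
        logger.debug("error while closing socket", exc_info=True)
```

The caller still has to decide what a `False` return means for it; the point is only that the failure is never invisible.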
9
You should never silently ignore errors, because your program is built on a series of actions which implicitly depend on everything that’s gone before them going right. If something goes wrong in step 3, and you try to continue on to step 4, step 4 is going to be starting out based on invalid assumptions, which makes it more likely that it will end up generating an error as well. (And if you ignore that too, then step 5 throws an error, and things start to snowball from there.)
The thing is, as the errors pile up, eventually you’ll run into some error so big that you can’t ignore it, because it will consist of something being given to the user, and that something will be completely wrong. Then you have users complaining at you about your program not working right, and you have to fix it. And if the “give something to the user” part is in step 28, and you have no idea that the original error that’s causing all this mess was in step 3 because you ignored the error in step 3, you’re going to have a heck of a time debugging the problem!
On the other hand, if that error in step 3 makes everything blow up in the user’s face, and generates an error saying SOMETHING WENT BADLY WRONG IN STEP 3! (or its technical equivalent, a stack trace), then the result is the same–the user complaining at you about the program not working right–but this time you know exactly where to start looking when you go to fix it.
EDIT: In response to the comments, if something goes wrong that you anticipated and know how to handle, that’s different. For example, in the event of receiving a malformed message, that’s not a program error; that’s “user provided bad input which failed validation.” The appropriate response there is to tell the user that he’s giving you invalid input, which is what it sounds like you’re doing. No need to crash and generate a stack trace in a case like that.
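To make the "bad input is not a program error" case concrete, here is a small Python sketch that validates a 3-byte MIDI Note On message and turns a validation failure into a message for the user rather than a crash. The names `MalformedMessage`, `parse_note_on` and `describe` are made up for this example, and it only covers Note On, not the rest of the protocol.

```python
from typing import Tuple


class MalformedMessage(ValueError):
    """Raised when incoming bytes fail validation (hypothetical type)."""


def parse_note_on(data: bytes) -> Tuple[int, int, int]:
    """Return (channel, note, velocity) for a 3-byte Note On message."""
    if len(data) != 3:
        raise MalformedMessage(f"expected 3 bytes, got {len(data)}")
    status, note, velocity = data
    if status & 0xF0 != 0x90:
        raise MalformedMessage(f"not a Note On status byte: {status:#04x}")
    if note > 0x7F or velocity > 0x7F:
        raise MalformedMessage("data bytes must be in the range 0..127")
    return status & 0x0F, note, velocity


def describe(data: bytes) -> str:
    """Anticipated failure: report it to the user, no stack trace needed."""
    try:
        channel, note, velocity = parse_note_on(data)
    except MalformedMessage as exc:
        return f"rejected input: {exc}"
    return f"note on: channel={channel} note={note} velocity={velocity}"
```

Anything other than `MalformedMessage` is left uncaught on purpose, so a genuine bug still fails loudly.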
4
There are other options between “blow up” and “ignore.”
If the error is predictable and avoidable, change your design or refactor your code to avoid it.
If the error is predictable but not avoidable, but you know what to do when it happens, then catch the error and handle the situation. But be careful to avoid using exceptions as flow control. And you may want to log a warning at this point, and maybe notify the user if there’s some action they might take to avoid this situation in the future.
If the error is predictable, unavoidable, and when it happens there’s nothing you can do that will guarantee data integrity, then you need to log the error and fall back to a safe state (which, as others have said, may mean crashing).
If the error is not something you’ve anticipated, then you really can’t be sure that you can even fall back to a safe state, so it may be best to just log and crash.
As a general rule, don’t catch any exception you can’t do anything about, unless you just plan to log and rethrow it. And in the rare cases where a try-catch-ignore is unavoidable, at least add a comment in your catch block to explain why.
See Eric Lippert’s excellent exception-handling article for more suggestions on categorizing and handling exceptions.
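The categories above might map onto code roughly like this. It is only a sketch in Python: `load_settings`, the logger name and the idea of a JSON settings file are assumptions made for the example, not anything from the question.

```python
import json
import logging

logger = logging.getLogger("settings")  # hypothetical logger name


def load_settings(path: str, defaults: dict) -> dict:
    """Load settings from a JSON file, handling only what we understand."""
    try:
        with open(path, encoding="utf-8") as fh:
            return json.load(fh)
    except FileNotFoundError:
        # Predictable, and we know what to do: warn and use defaults.
        logger.warning("no settings file at %s; using defaults", path)
        return dict(defaults)
    except json.JSONDecodeError:
        # Predictable, but no safe way to continue with this data:
        # log it and let the caller fall back to a safe state.
        logger.error("settings file %s is corrupt; refusing to guess", path)
        raise
    except Exception:
        # Unanticipated: do nothing clever, just log and rethrow.
        logger.exception("unexpected failure reading %s", path)
        raise
```

Only the first handler actually recovers; the other two exist purely to log before the exception continues upward.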
0
These are my views on the question:
A good starting principle is to fail fast. Specifically, you should never write error handling code for any failure for which you don’t know the exact cause.
After applying this principle you may add recovery code for specific error conditions that you encounter. You may also introduce several “safe states” to return to. Aborting a program is mostly safe, but sometimes you might want to return to another known good state. An example is how a modern OS handles an offending program. It only shuts the program down, not the whole OS.
By failing fast and slowly covering more and more specific error conditions you never compromise data integrity and steadily move towards a more stable program.
Swallowing errors, i.e. trying to plan for errors for which you don’t know the exact cause and therefore have no specific recovery strategy, only leads to an increasing amount of error-skipping and circumventing code in your program. Since one cannot trust that previous data was correctly processed, you will start seeing checks for incorrect or missing data scattered everywhere. Your cyclomatic complexity will spiral out of control and you will end up with a big ball of mud.
Whether or not you are aware of the failure cases is of less importance. But if you are, for instance, dealing with a network connection for which you know a certain number of error states, postpone adding error handling until you also add recovery code. This is in line with the principles outlined above.
You should never silently ignore errors. And especially not at the expense of data integrity.
The program is trying to do something. If it fails, you have to face the fact and do something about it. What that something will be depends on a lot of things.
In the end, the user asked the program to do something, and the program should tell them that it didn’t succeed. There are many ways it can do that. It may stop immediately, it may even roll back already completed steps, or it can continue, complete all the steps it can, and then tell the user which steps succeeded and which failed.
Which way you choose depends on how closely the steps are related and whether the error is likely to recur for all future steps, which might in turn depend on the exact error. If strong data integrity is required, you have to roll back to the last consistent state. If you are just copying a bunch of files, you can skip some and just tell the user at the end that those files couldn’t be copied. You should not silently skip files and tell the user nothing.
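For the file-copying case, a "continue, then report" approach could be sketched like this in Python. The function name `copy_all` and its return shape are just choices made for this illustration.

```python
import shutil
from pathlib import Path
from typing import Iterable, List, Tuple


def copy_all(
    sources: Iterable[Path], dest_dir: Path
) -> Tuple[List[Path], List[Tuple[Path, str]]]:
    """Copy every file we can; collect failures instead of hiding them."""
    copied: List[Path] = []
    failed: List[Tuple[Path, str]] = []
    for src in sources:
        try:
            shutil.copy2(src, dest_dir / src.name)
            copied.append(src)
        except OSError as exc:
            # Independent steps: skip this file, but remember why it failed.
            failed.append((src, str(exc)))
    return copied, failed


def summarize(copied: List[Path], failed: List[Tuple[Path, str]]) -> str:
    """Build the report the user sees at the end."""
    lines = [f"{len(copied)} copied, {len(failed)} failed"]
    lines += [f"  could not copy {src}: {reason}" for src, reason in failed]
    return "\n".join(lines)
```

This only makes sense because the files are independent of each other; for steps that depend on one another, stopping or rolling back is the safer choice.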
Regarding the edit: the only difference it makes is that you should consider retrying some number of times before giving up and telling the user it didn’t work, since a network is likely to have transient errors that won’t recur if you try again.
0
There is one class of cases where ignoring errors is the right thing to do: When there’s nothing that could possibly be done about the situation and when poor and possibly incorrect results are better than no results.
The case of decoding the HDMI stream for display purposes is such a case. If the stream is bad it’s bad, yelling about it won’t magically fix it. You do what you can to display it and let the viewer decide if it’s tolerable or not.
I don’t believe a program should either silently ignore an issue or cause havoc whenever it runs into one.
What I do with internal software I write for my company…
It depends on the error. Let’s say it is a critical function that enters data into MySQL: it needs to let the user know that it failed. The error handler should try to collect as much information as possible and give the user an idea of how to correct the mistake themselves so they can save the data. I also like to provide a way to silently send us the information they were trying to save, so if worst comes to worst we can enter it manually after the bug is fixed.
If it is not a critical function, something that can fail without affecting the final outcome of what they are trying to achieve, I may not show them an error message, but have it send an email that is automatically inserted into our bug tracking software, or sent to an email distribution group that alerts all programmers in the company, so that we are aware of the error even if the user is not. This allows us to fix the back end while on the front end no one knows what is going on.
One of the biggest things I try to avoid is having the program crash after the error – not being able to recover. I always try to give the user the option to continue without closing the application.
I believe if no one knows about the bug – it will never be fixed.
I am also a firm believer of error handling that allows the application to continue to function once a bug is discovered.
If the error is network related, why not have the functions perform a simple network communication test before executing, to avoid the error in the first place? Then just alert the user that a connection is not available and ask them to verify their internet connection and try again.
1
My own strategy is to distinguish between coding errors (bugs) and runtime errors, and, as much as possible, make coding errors difficult to create.
Bugs need to be fixed as soon as possible, so a Design by Contract approach is appropriate. In C++, I like to check all my preconditions (inputs) with assertions at the top of the function in order to detect the bug as soon as possible and make it easy to attach a debugger and fix the bug. If the developer or tester instead chooses to try to continue running the program, any loss of data integrity then becomes their problem.
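The precondition-checking idea looks much the same in other languages. Here is a minimal sketch in Python rather than C++; `scale_velocity` is a hypothetical function and the 0..127 range is the usual MIDI data-byte range.

```python
def scale_velocity(velocity: int, factor: float) -> int:
    """Scale a MIDI velocity, clamping the result to at most 127."""
    # Preconditions: a violation means the caller has a bug, so fail
    # immediately at the point where the bad value first shows up.
    assert 0 <= velocity <= 127, f"velocity out of range: {velocity}"
    assert factor >= 0.0, f"factor must be non-negative: {factor}"
    return min(127, round(velocity * factor))
```

Note that Python strips `assert` statements when run with `-O`, so this style is for catching bugs during development and testing, not for validating outside input.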
Also find ways to prevent the bug in the first place. Being strict with const-correctness and choosing data types appropriate for the data they will hold are two ways to make it difficult to create bugs. Fail-fast is also good outside of safety-critical code that needs a way to recover.
For runtime errors that could possibly happen with bug-free code, such as network or serial communication failures or missing or corrupt files:
- Log the error.
- (Optional) Attempt to silently retry or otherwise recover from the operation.
- If the operation continues to fail or is unrecoverable, report the error visibly to the user. Then as above, the user can decide what to do. Remember the Principle of Least Astonishment, because a loss of data integrity is surprising to the user unless you’ve warned them ahead of time.
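The log / retry / report sequence above could look roughly like this in Python. The helper name `with_retries` and its `attempts` and `delay` parameters are assumptions for the sketch, and it treats any `OSError` as possibly transient, which a real program would want to narrow down.

```python
import logging
import time
from typing import Callable, TypeVar

T = TypeVar("T")
logger = logging.getLogger("retry")  # hypothetical logger name


def with_retries(
    operation: Callable[[], T], attempts: int = 3, delay: float = 0.5
) -> T:
    """Run operation, retrying on OSError, and re-raise if it keeps failing."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except OSError as exc:
            # Step 1: always log, even when we intend to retry silently.
            logger.warning("attempt %d of %d failed: %s", attempt, attempts, exc)
            if attempt == attempts:
                # Step 3: out of retries, so let the caller report it visibly.
                raise
            # Step 2: wait briefly and try again.
            time.sleep(delay)
    raise AssertionError("unreachable")
```

The caller wraps the call in its own `try`/`except OSError` and shows the user a message there, so the decision about what to do next stays with the user.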
Failing is the right option when you have reasons to think that the overall state of the program is unstable and something bad can happen if you let it run from now on. Somewhat “ignoring” it (i.e., as others have pointed out, logging it somewhere or displaying an error message to the user, then going on) is ok when you know that, sure, the current operation cannot be performed, but the program can keep running.
0
My definition of “error”: My software cannot do what it is supposed to do on a high level. My software can encounter a situation that someone calls an “error” that my software can actually fix easily, or that is actually wanted (for example delete the first file in a directory until there is an error because the directory is empty and there is no first file).
Then there comes a conscious decision about the best way to handle the situation where my software cannot do what it should do. There are situations where ignoring the error is the best option; media playback is one example. What do you do with a self-driving car? Depending on the error, leaving the road if the brakes are not working, or driving straight on and praying, effectively ignoring the error, might be the best. That will be a conscious and difficult decision.
Next, if you act, you think about what action is useful to the user. Logging to be able to fix a problem in the future is likely useful. Otherwise, the big question is: can the user fix it, possibly by waiting, possibly by asking for help? If yes, advise them what to do. If not, tell them that it didn’t work. Make sure your error handling is not worse than the error.
Also important: try to prevent any situation where an error could create problems. For example, in the case of data corruption it may be better to restore data from a backup than to continue with corrupted data.
Important: There is not one rule to follow. Handle every situation individually and intelligently.