The general consensus seems to favor the Crash Early approach, the most reputable source being the acclaimed Pragmatic Programmer book.
And while I understand and agree with the advice in many situations, I wonder if once the program is deployed in production, isn’t it also valid to simply log a warning and proceed with execution?
In many cases it might be better to show a partial or empty response than no response at all. Imagine the Stack Exchange module that fetches similar questions: isn’t it better to show some results than no results at all?
Could a method that iterates over a collection, when passed a null value coerce it into an empty collection and effectively skip execution, instead of crashing? Or a method that receives a negative number coerce it into the minimum/maximum applicable value, instead of crashing?
1
… I wonder if once the program is deployed in production, isn’t it also valid to simply log a warning and proceed with execution?
Short answer: No.
Every function you write runs in its own tiny little world that only knows about the argument values passed to it. It can only change or reinterpret the intention of those values if it understands the world “outside” of itself better than the code that’s actually “out there”, passing those dodgy values in. (Hint: that’s unlikely.)
… isn’t it better to show some results than no results at all?
“Are there any indications that this nuclear reactor core has gone runaway and is about to start melting its way through the floor? Hmm; this is taking too long; I’ll just return ‘OK’. ”
OK, slightly facetious example, but in this particular case, yes, it really does matter that you bring back all the right results. It comes down to what’s considered “good enough”.
Could a method that iterates over a collection, when passed a null value coerce it into an empty collection and effectively skip execution, instead of crashing?
And do what?
A box of chocolates is a set of zero or more confectioneries; given a box and a big enough appetite, you can “iterate” through all of them. How can you do that if you’re not even given the box?
Your function has been told to expect a collection but it’s been passed nothing. That’s a really good time to let an Exception get thrown; that way, the calling code becomes responsible for handling the mess that it just created, passing you bad (missing) data.
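As a minimal sketch of that idea in Python (the function name `total_weight` and the choice of `TypeError` are assumptions made for illustration, not something from the question), a function can accept an empty box but refuse a missing one:

```python
from typing import Iterable, Optional


def total_weight(chocolates: Optional[Iterable[float]]) -> float:
    """Sum the weights of the chocolates in a box (hypothetical example)."""
    if chocolates is None:
        # No box at all: the caller made the mistake, so hand it back to them.
        raise TypeError("chocolates must be a collection, got None")
    # An empty box is a perfectly valid box; the loop simply has nothing to do.
    return sum(chocolates, 0.0)
```

An empty collection and a missing collection are different things: the first is valid input with nothing in it, the second is a caller bug that the exception makes visible.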
Or a method that receives a negative number coerce it into the minimum/maximum applicable value, instead of crashing?
Why would a negative number cause a crash? Perhaps it’s an indexer into an Array or List? Unless there’s a business reason to allow negative values (and, admittedly, there could be) then here’s another good reason to “pass the buck” upwards, back to the calling code, courtesy of another Exception.
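To illustrate the indexer case, here is a hedged sketch in Python, where `item_at` is a hypothetical helper. Python happens to reinterpret negative indexes as "count from the end", so without the explicit check a bad index would silently return the wrong element:

```python
from typing import Sequence, TypeVar

T = TypeVar("T")


def item_at(items: Sequence[T], index: int) -> T:
    """Return items[index], rejecting negative indexes (hypothetical helper)."""
    if index < 0:
        # Assumption: no business reason for negative values exists here,
        # so the problem is passed back up to the calling code.
        raise IndexError(f"index must be non-negative, got {index}")
    # An index past the end still raises IndexError through normal indexing.
    return items[index]
```

If there were a business reason to allow negative values, the check would simply be dropped or replaced by whatever rule the business defines.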
3
As usual, there are obvious upsides and downsides to coercing. It will let your program run on in production, when it is weak and alone and has no one to hold its hand; it must manage whatever conditions it is put into all by itself, and warping to the minimum value is the obvious thing to do. If crashing or skipping a task altogether would be very expensive or catastrophic to your business, then there’s a good case to be made for making software robust.
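For comparison, the coercing style might look roughly like the following Python sketch. The name `clamp_page_size` and the limits 1 and 100 are invented for the example; the point is only that the out-of-range value is logged and replaced rather than rejected:

```python
import logging

logger = logging.getLogger(__name__)


def clamp_page_size(requested: int, minimum: int = 1, maximum: int = 100) -> int:
    """Coerce an out-of-range page size into the allowed range (hypothetical)."""
    clamped = max(minimum, min(requested, maximum))
    if clamped != requested:
        # Keep running, but leave a trace so someone can ask "why?" later.
        logger.warning(
            "page size %d outside [%d, %d]; using %d",
            requested, minimum, maximum, clamped,
        )
    return clamped
```

The warning is what keeps this from being a silent patch: the program carries on, but the question of where the bad value came from can still be asked.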
But that’s only one side of the coin. In the long run, programming like this makes your code unreadable, long, ugly and unmaintainable. If a negative value comes in, you should be asking yourself: why? If it’s from direct user input, why did your front-end allow the minus sign in the first place? If it’s from an internal value, was that supposed to happen? If not, why did it happen? Is there a completely different defect in your program that allowed this? Pursuing and correcting such errors will improve the quality of your code base much more than adding yet another patch for yet another special condition that might cause an exception.
Unfortunately, these two approaches are hard to combine. The point of testing software is to try it out in the situations that it will actually be in, so you cannot write finicky, error-sensitive code for development and then deploy different, robust code to production. It is therefore important to be aware of this trade-off, so that you can make the right judgement call about whether robust processing no matter what is more valuable to you than a well-designed code base that reduces the number of systemic problems. This is something that no runtime, no coding policy and not even the best online programmers’ forum can decide for you.
9
In your examples you are not talking about failing early but rather about how to handle expected failure cases.
Handling expected failure cases
You gave the similar questions module as an example. It can be aware that one of its submodules may not be able to deliver, and then just show the questions it did get from the other submodules. A counterexample would be medical software for finding patients with similar symptoms. It would not just silently show the rest; it would print a warning that some patients could not be listed because of an error, while still continuing to work and fulfilling its job, which would be “print all patients, or an error message if you can’t print all the patients”. This is an important difference, but it has nothing to do with failing early. It is about handling expected failures, because the modules are aware that this specific failure may occur. Note that this is purely a matter of business logic, not of technical error handling.
Handling unexpected failure cases (or rather, not doing so…)
The submodule that delivers (part of) the similar questions could fail because, for example, it expects a network connection to be available. If there is none, it could carry on and continue working with an empty question list, but that is not reasonable; it just makes no sense. Since it does not know how to handle this situation, it treats it as an unexpected error and throws an exception, stopping its work immediately and telling its caller that it cannot fulfill its contract and deliver the expected results. So, instead of pretending to be able to fulfill its contract (and in the worst case producing business logic errors at some later point), it fails early.
So, to answer your question: yes, you can have your similar questions module print just some of the questions if it cannot fetch them all and still follow the guideline of failing as early as possible, because the two do not contradict each other.
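A rough Python sketch of how the two levels can coexist is shown here. All names (`fetch_similar_questions`, `by_tag`, `by_title`) are hypothetical, and the use of `ConnectionError` as the one expected failure is an assumption. Each submodule fails early; the module above it handles that expected failure and reports whether the list is complete:

```python
import logging
from typing import Callable, List, Tuple

logger = logging.getLogger(__name__)

# A source is any callable that returns question titles or raises on failure.
Source = Callable[[], List[str]]


def fetch_similar_questions(sources: List[Source]) -> Tuple[List[str], bool]:
    """Collect questions from all sources; report whether the list is complete."""
    questions: List[str] = []
    complete = True
    for source in sources:
        try:
            questions.extend(source())
        except ConnectionError as exc:
            # Expected failure: the module knows a source may be unreachable.
            logger.warning("similar-questions source failed: %s", exc)
            complete = False
        # Any other exception is unexpected and propagates to the caller.
    return questions, complete


def by_tag() -> List[str]:
    return ["Fail fast or log and continue?"]


def by_title() -> List[str]:
    # This submodule cannot fulfill its contract, so it fails early.
    raise ConnectionError("no network connection")
```

Calling `fetch_similar_questions([by_tag, by_title])` yields the one question that could be fetched together with `False` for completeness. A caller like the medical software can use that flag to print its warning, while the similar questions module can simply ignore it.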
Could a method that iterates over a collection, when passed a null value coerce it into an empty collection and effectively skip execution, instead of crashing?
However, if the method contract says “I expect a collection and then do something” and it then receives a null value, it should fail early, because it cannot fulfill its contract. If it does not fail early and continues doing stuff, then bad things may happen, and I cannot see any advantage in that. The exception would be if the whole software blows up because of one low-level exception, but then it is poorly designed anyway, and the function that correctly fails fast is not the root of the problem.