I have often heard that unit tests help programmers build confidence in their software. But are they enough to verify that software requirements are met? I am losing confidence that the software is working just because the unit tests pass.
We have experienced some failures in production deployment due to untested, unverified execution paths. These failures are sometimes quite large, impact business operations, and often require an immediate fix.
The failures are very rarely traced back to a failing unit test. We have large unit test bodies that have reasonable line coverage, but almost all of these focus on individual classes and not on their interactions.
Manual testing seems to be ineffective because the software we work on is typically large, with many execution paths and many integration points with other software. It is very painful to manually test all of the functionality, and it never seems to flush out all the bugs.
Are we doing unit testing wrong if we are still failing to verify the software correctly before deployment? Or do most shops have another layer of automated testing in addition to unit tests?
In general, no. For toy programs (say, solutions to Project Euler problems), sure.
That is before getting into a religious discussion of what a unit test even is (if it uses a file system object, is it still a unit test? Let the inquisition begin!).
For N-tier applications, where N > 1, integration tests are needed, and even they may not be suitable for requirements verification, since most integration tests cover only two tiers at a time rather than end-to-end behavior. Most end-to-end testing (the minimum needed for most requirements) is usually done as, or in conjunction with, system-level testing.
On a large project I worked on, we had CUT (Code and Unit Test) done by the developer. The build then went into Integration and Test (I&T), where the minor requirements that could be verified at that level were verified (e.g., warning alerts will be displayed in 'FUNKY_RED' in the admin console). When it passed that, it was released to System Level Test (SLT). The 'Requirements Testing' was done in SLT, where they had a mirror of the operational site. Finally, it was deployed and went to Acceptance Test.
Some of this would change depending on the scale of the project, but in my experience, anything with more than one module/jar/library cannot be adequately verified with unit tests alone.
Add in testing a GUI, and it becomes really unlikely that you can have 100% confidence without live interaction.
EDIT:
Looking back, I think the takeaway is that for complex systems, simple testing (like unit testing) is not going to be sufficient for requirements validation. Complex systems require complex (sophisticated?) testing to verify.
You have several things to consider:
- Requirements must be testable. If they're not testable, then you can't verify them with tests.
- Unit tests are not proof of correctness, mainly because they cannot provide 100% code coverage in any practical way.
The first bullet is fairly easy to correct, if difficult to implement in practice: make your requirements testable. The best way to do that is to define an acceptance test at the same time that you write the requirement.
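As a sketch of what that can look like in practice (the requirement, function name, and threshold below are all hypothetical, invented for the illustration), an acceptance test written alongside the requirement might be an ordinary pytest case that pins down the behavior the stakeholder asked for:

```python
# Sketch of an acceptance test written at the same time as the requirement.
# Hypothetical requirement: "Orders of $100.00 or more receive a 10% discount."
from decimal import Decimal

import pytest


def apply_discount(total: Decimal) -> Decimal:
    """Toy implementation standing in for the real order-pricing code."""
    if total >= Decimal("100.00"):
        return (total * Decimal("0.90")).quantize(Decimal("0.01"))
    return total


@pytest.mark.parametrize(
    "total, expected",
    [
        (Decimal("99.99"), Decimal("99.99")),   # just under the threshold
        (Decimal("100.00"), Decimal("90.00")),  # boundary case from the requirement
        (Decimal("250.00"), Decimal("225.00")),
    ],
)
def test_discount_requirement(total, expected):
    # If this test and the requirement are written together, the requirement
    # stays testable by construction.
    assert apply_discount(total) == expected
```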
Bullet two just raises the question, “How do I write more reliable software?” That, my friend, is the subject of a lifetime of study.
However, from a testing standpoint, you can improve the situation by adding Integration Tests and System Level Tests. Unit tests, on their own, are often insufficient to completely describe a rigorous test suite. Work with your stakeholders to define and write tests, the scope, cost and benefits of which everyone understands.
To fully test a system, you would have to put it through every possible series of input conditions, so for any non-toy system there is no way any testing system should be expected to catch all errors.
However, this is a case where perfection is the enemy of “good enough”. If a testing system reduces the chance of shipping a defective product, or reduces the number of shipped defects, or reduces the amount of time it takes to reasonably test a system before it ships, then it has value.
Simply having “large unit test bodies that have reasonable line coverage” is no guarantee of bug-free code. However, it’s probably a lot better than no testing system at all.
Note that unit testing is only one type of software test. It wouldn't hurt to create a few automated integration tests either. On top of that, when I am about to release a new version of software, I always manually check a couple of critical functions, just in case.
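As an illustration of what a small automated integration test can look like (the service and repository classes here are invented for the example, not from any particular project), the point is to exercise two real components wired together rather than a single class in isolation:

```python
# Minimal integration-test sketch: two real components working together,
# with only the storage boundary kept in memory.
import pytest


class InMemoryOrderRepository:
    def __init__(self):
        self._orders = {}

    def save(self, order_id, order):
        self._orders[order_id] = order

    def get(self, order_id):
        return self._orders[order_id]


class OrderService:
    def __init__(self, repository):
        self._repository = repository

    def place_order(self, order_id, items):
        if not items:
            raise ValueError("an order needs at least one item")
        self._repository.save(order_id, {"items": list(items), "status": "placed"})

    def status_of(self, order_id):
        return self._repository.get(order_id)["status"]


def test_service_and_repository_work_together():
    # The interaction between the two classes is the thing under test.
    service = OrderService(InMemoryOrderRepository())
    service.place_order("o-1", ["widget"])
    assert service.status_of("o-1") == "placed"


def test_empty_order_is_rejected():
    service = OrderService(InMemoryOrderRepository())
    with pytest.raises(ValueError):
        service.place_order("o-2", [])
```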
The phrase "unit test" began as a description of testing individual functions and other small bits of code or capability. If that is all you are testing, bits of your system in isolation from one another, it is no wonder that you are losing confidence in your tests as a fair measure of overall code quality or of "does it do what it's supposed to do?"
Very narrow testing helps a little, but just a little. It's great that you aren't getting production failures that trace back to things that have been tested. The problem seems to be the white space in between: the untested execution paths, the combinations of elements that arise when all of the parts come together and that are not currently tested. Without being too Pollyannaish, there is your problem. Or a good part of it, at any rate.
This is perhaps heresy, but it's worked well for me: unit tests are no longer about isolated, narrow units. They are a general-purpose automated testing facility that can and should test not only individual functions, classes, and objects, but combinations of them, at varying levels of composition.
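To make that concrete (the little parse-and-summarize pipeline below is a made-up example, not the answerer's code), the same "unit test" framework can just as easily drive a test of several pieces composed together:

```python
# Sketch: using an ordinary unit-test framework to test a composition of
# functions rather than each one in isolation.
import json


def parse_events(raw: str) -> list[dict]:
    """Parse one JSON object per non-empty line."""
    return [json.loads(line) for line in raw.splitlines() if line.strip()]


def summarize(events: list[dict]) -> dict:
    """Count events by their 'kind' field."""
    totals: dict[str, int] = {}
    for event in events:
        totals[event["kind"]] = totals.get(event["kind"], 0) + 1
    return totals


def test_parse_and_summarize_together():
    raw = '{"kind": "click"}\n{"kind": "view"}\n{"kind": "click"}\n'
    # Exercise the composed path, not each function by itself.
    assert summarize(parse_events(raw)) == {"click": 2, "view": 1}
```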
I may be running PHPUnit or py.test or whatever the local "unit test" framework is, but I eagerly add tests up to and including tests of the entire integrated application or service. I run complex examples of the entire app functionality, and do so not just against simple data cases, but also against some gnarly, ugly, complex edge cases.
It isn't as easy to test that way. It requires creating some mock objects, and/or building entire virtual machines and populating them with servers, data sets, and other glue logic just to run the tests. It's more work to set up that test scenario, but with virtualization and cloud instances, tools like Fabric and Vagrant make it doable in ways that it just wasn't five or so years back.
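On the mock-object side of that setup, a small sketch (the payment-gateway interface here is invented for the example) is to replace only the external boundary with a mock from Python's standard unittest.mock, while the surrounding logic runs for real:

```python
# Sketch: mock only the external dependency (a payment gateway), and
# exercise the real checkout logic around it.
from unittest.mock import Mock


class Checkout:
    def __init__(self, gateway):
        self._gateway = gateway

    def pay(self, amount_cents: int) -> str:
        if amount_cents <= 0:
            raise ValueError("amount must be positive")
        receipt = self._gateway.charge(amount_cents)
        return f"paid:{receipt}"


def test_checkout_charges_the_gateway_once():
    gateway = Mock()
    gateway.charge.return_value = "r-42"

    result = Checkout(gateway).pay(1500)

    assert result == "paid:r-42"
    # Verify the interaction with the boundary, not just the return value.
    gateway.charge.assert_called_once_with(1500)
```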
It can even be complicated to determine whether a test succeeded or failed. If you're used to testing small functions, it's hard to see why, but start trying to judge whether you've properly manipulated a complex data structure and it becomes clearer. It can be even worse if you're trying to verify a particular path through a GUI or web app to ensure it "did what it's supposed to do." But with modern tools it is possible, and the results are extremely helpful at exercising the whole system.
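One way to keep that pass/fail judgment manageable for complex results (the grouping function below is just an illustration) is to build the full expected structure and compare it in a single assertion, letting the test framework's diff output do the explaining:

```python
# Sketch: judging success on a nested result by comparing against the whole
# expected structure at once.
def group_by_status(orders: list[dict]) -> dict[str, list[str]]:
    grouped: dict[str, list[str]] = {}
    for order in orders:
        grouped.setdefault(order["status"], []).append(order["id"])
    return grouped


def test_grouping_matches_the_whole_expected_shape():
    orders = [
        {"id": "o-1", "status": "shipped"},
        {"id": "o-2", "status": "pending"},
        {"id": "o-3", "status": "shipped"},
    ]
    expected = {
        "shipped": ["o-1", "o-3"],
        "pending": ["o-2"],
    }
    # One assertion over the full structure; a failure shows exactly which
    # part of the shape went wrong.
    assert group_by_status(orders) == expected
```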
Testing can never be a proof of correctness under all possible conditions. But exercising your app or service as a whole, with complex and edge/corner case examples designed to stress your logic…first it will show you where a lot of your bugs lie. ("The truth will set you free — but first it will make you miserable.")
But as you fix those and your system executes hard cases successfully and routinely, your confidence in its correctness and robustness will grow.

As you find new bugs and fix them, you will add tests for them, and for other cases like them that you hadn't previously considered. On subsequent passes at the testing job, you'll find new things you want to test, or new variants you want to test against. You add them in, and on the next run, you're testing even more of the total system.
If anyone wants to argue that I'm not really unit testing, that I'm doing some form of integration, module, or system testing: well, whatever! I use the same (unit testing) frameworks, test runners, and "did it pass?" reporting that I would for testing a 2-line function. If you want to give fancier names to some of the tests, knock yourself out. What I care about is that the tests are automated, that I can run them easily for every software build, and that they test as much of the total functionality as possible.
Long story short: instead of losing faith in testing, double down on it. Do the testing that you need but aren't currently doing. Test things together, in ever more challenging and complete combinations, against ever more execution environments. You'll be amazed at the bugs that shake out, and at how much you learn about your system interactions as a result. Ultimately, you'll be amazed at how confidently you move from version to version, because your shakedown has been real and systematic.
Indeed, as you have found, unit tests are frequently inadequate for this.
What you need is integration testing that covers:
- dependencies
- interactions
- different viewing devices / platforms / versions
You may also need to revisit your unit tests and review how much of the sad path (as opposed to the happy path) is currently being tested. Developers frequently focus on the happy path and often do not anticipate, or take the time to cover, complex cases and all the sad paths.
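For what a sad-path test can look like next to its happy-path twin (the withdraw function below is made up for the illustration), pytest.raises makes the failure cases as cheap to write as the success case:

```python
# Sketch: covering sad paths alongside the happy path.
import pytest


def withdraw(balance_cents: int, amount_cents: int) -> int:
    if amount_cents <= 0:
        raise ValueError("amount must be positive")
    if amount_cents > balance_cents:
        raise ValueError("insufficient funds")
    return balance_cents - amount_cents


def test_happy_path():
    assert withdraw(10_000, 2_500) == 7_500


@pytest.mark.parametrize("amount", [0, -1, 10_001])
def test_sad_paths_are_rejected(amount):
    # Zero, negative, and overdraft amounts must all be refused.
    with pytest.raises(ValueError):
        withdraw(10_000, amount)
```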
We have large unit test bodies that have reasonable line coverage but almost all of these focus on individual classes and not on their interactions.
In my experience, this is caused by poor class design, and occasionally bad unit test writing.
In an (achievable) ideal world, you unit test class A: it takes the right input and produces the right output. Then you unit test class B: it takes the right input and produces the right output. If the “right” output of class A is fed as the “right” input to class B, then it’ll just work.
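A rough sketch of that idea (the Tokenizer and Counter classes stand in for "class A" and "class B"; they are invented for the example): if the unit tests for each class pin down the same notion of "right", one cheap extra test can feed A's actual output straight into B to check the hand-off.

```python
# Sketch: A's "right" output is B's "right" input, plus one test that wires
# the real objects together.
class Tokenizer:  # "class A"
    def tokenize(self, text: str) -> list[str]:
        return text.lower().split()


class Counter:  # "class B"
    def count(self, tokens: list[str]) -> dict[str, int]:
        counts: dict[str, int] = {}
        for token in tokens:
            counts[token] = counts.get(token, 0) + 1
        return counts


def test_tokenizer_alone():
    assert Tokenizer().tokenize("Red red blue") == ["red", "red", "blue"]


def test_counter_alone():
    assert Counter().count(["red", "red", "blue"]) == {"red": 2, "blue": 1}


def test_a_feeds_b():
    # The interaction test: A's real output becomes B's real input.
    tokens = Tokenizer().tokenize("Red red blue")
    assert Counter().count(tokens) == {"red": 2, "blue": 1}
```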
The only way things go south is if your class defines some nuanced notion of "right" due to implied coupling in data or time or sequence or… This is where good class design comes into play. By reducing your coupling and making your "right" input clearer, there's less chance that people can screw that up.
The other thing that can happen is that your unit tests work with some naive notion of "right" that doesn't test boundaries or genuinely exercise your code, while still racking up code coverage. Bad tests will yield bad results; the worst of them is the passing test that provides no confidence.