Googling “software testing theory” only seems to turn up theories in the soft sense of the word; I have not been able to find anything that would qualify as a theory in the sense of mathematics, information theory, or some other scientific field.
What I’m looking for is something that formalizes what testing is, the notions used, what a test case is, the feasibility of testing something, the practicality of testing something, the extent to which something should be tested, formal definition/explanation of code coverage, etc.
UPDATE: Also, I’m not sure, intuitively, about the connection between formal verification and what I asked, but there’s clearly some sort of connection.
For a book exploring the math behind software testing … the seminal book to get is The Art of Computer Systems Performance Analysis: Techniques for Experimental Design, Measurement, Simulation, and Modeling
While it was first published in 1991, it is still popular today because it is an applied mathematics book that focuses solely on performance analysis, simulation, and measurement.
I can’t point to a good online resource (the English Wikipedia articles on these topics tend to be improvable), but I can summarize a lecture I heard which also covered basic testing theory.
Testing modes
There are different classes of tests, like unit tests or integration tests. A unit test asserts that a coherent piece of code (function, class, module) taken on its own works as expected, whereas an integration test asserts that multiple such pieces correctly work together.
A test case is a known environment in which a piece of code is executed, e.g. by using specific test input, or by mocking other classes. The behavior of the code is then compared to expected behavior, e.g. a specific return value.
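For illustration (the function and names are made up, not from the lecture), a minimal unit test fixes the environment via specific input and compares the result against an expected return value:

```typescript
// Unit under test: a small, coherent piece of code.
function clamp(value: number, min: number, max: number): number {
  return Math.min(Math.max(value, min), max);
}

// Test case: a known environment (specific input) plus expected behavior.
function testClampAboveRange(): void {
  const actual = clamp(15, 0, 10);
  const expected = 10;
  console.assert(actual === expected, `expected ${expected}, got ${actual}`);
}

testClampAboveRange();
```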
A test can only prove the presence of a bug, never the absence of all bugs. Tests put an upper bound on program correctness.
Code Coverage
To define code coverage metrics, the source code can be translated to a control flow graph where each node contains a linear segment of the code. Control flows between these nodes only at the end of each block, and is always conditional (if condition, then goto node A, else goto node B). The graph has one start node and one end node.
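As a sketch of that translation (the node labels in the comments are mine):

```typescript
// A small function and its control flow graph, annotated as comments.
function f(x: number): number {  // node A: linear segment (entry)
  let y = 0;                     //   ...still node A
  if (x > 0) {                   //   end of A: conditional jump to B or C
    y = x;                       // node B
  } else {
    y = -x;                      // node C
  }
  return y;                      // node D: end node
}
// Edges of the CFG: A->B, A->C, B->D, C->D
```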
- With this graph, statement coverage is the ratio of all visited nodes to all nodes. Full statement coverage isn’t sufficient for thorough testing; a sketch after this list shows why.
- Branch coverage is the ratio of all visited edges between nodes in the CFG to all edges. This insufficiently tests loops.
- Path coverage is the ratio of all visited paths to all paths, where a path is any sequence of edges from the start to the end node. The problem is that with loops, there can be an infinite number of paths, so full path coverage can’t be practically tested.
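To see why full statement coverage is weaker than full branch coverage, consider this sketch (my example, not from the lecture):

```typescript
function describe(x: number): string {
  let result = "non-negative";
  if (x < 0) {
    result = "negative";
  }
  return result;
}

// describe(-1) alone executes every statement (full statement coverage),
// but the edge that skips the if-body is never taken; full branch
// coverage additionally requires a call such as describe(1).
```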
It is therefore often useful to check condition coverage.
- In simple condition coverage, each atomic condition is evaluated once as true and once as false – but this doesn’t guarantee full statement coverage.
- In multiple condition coverage, the atomic conditions take on all combinations of `true` and `false`. This implies full branch coverage, but is rather expensive. The program may have additional constraints that exclude certain combinations. This technique is good for obtaining branch coverage and can find dead code, but cannot find bugs stemming from a wrong condition.
- In minimal multiple condition coverage, each atomic and composite condition is once true and once false. It still implies full branch coverage. It is a subset of multiple condition coverage, but requires fewer test cases.
When constructing test input using condition coverage, short-circuiting should be taken into account. For example,
```
function foo(A, B) {
  if (A && B) x()
  else y()
}
```
needs to be tested with `foo(false, whatever)`, `foo(true, false)`, and `foo(true, true)` for full minimal multiple condition coverage.
If you have objects that can be in multiple states, then testing all state transitions, analogously to control flow, seems sensible.
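A sketch of what exercising all state transitions might look like (a hypothetical two-state `Door`):

```typescript
type State = "open" | "closed";

class Door {
  state: State = "closed";
  open(): void { this.state = "open"; }
  close(): void { this.state = "closed"; }
}

// Exercise every transition of the state machine, analogous to
// covering every edge of a control flow graph.
const door = new Door();
console.assert(door.state === "closed");  // initial state
door.open();
console.assert(door.state === "open");    // closed -> open
door.close();
console.assert(door.state === "closed");  // open -> closed
```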
There are some more complex coverage metrics, but they are generally similar to the metrics presented here.
These are white box testing methods, and can be partially automated. Note that a unit test suite should aim for high code coverage by whichever metric is chosen, but 100% isn’t always possible. It is especially difficult to test exception handling, where faults have to be injected into specific locations.
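One way to inject such a fault is through a mock dependency that throws on purpose (a sketch; the `Storage` interface is invented):

```typescript
interface Storage {
  save(data: string): void;
}

function persistWithFallback(storage: Storage, data: string): boolean {
  try {
    storage.save(data);
    return true;
  } catch {
    return false;  // the exception handler under test
  }
}

// Fault injection: a mock whose save() always throws, so the
// catch branch is actually executed by the test.
const failingStorage: Storage = {
  save: () => { throw new Error("disk full"); },
};
console.assert(persistWithFallback(failingStorage, "x") === false);
```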
Functional tests
Then there are functional tests, which assert that the code adheres to the spec by viewing the implementation as a black box. Such tests are useful for unit tests and integration tests alike. Because it is impossible to test with all possible input data (e.g. testing string length with all possible strings), it is useful to group the input (and output) into equivalence classes – if `length("foo")` is correct, `length("bar")` is likely to work as well. For each possible combination of input and output equivalence classes, at least one representative input is chosen and tested; a sketch follows the list below.
One should additionally test
- edge cases: `length("")`, `length("x")`, `length(longer_than_INT_MAX)`,
- values that are permitted by the language, but not by the contract of the function: `length(null)`, and
- possible junk data: `length("null byte in \x00 the middle")` …
With numerics, this means testing `0, ±1, ±x, MAX, MIN, ±∞, NaN`, and with floating point comparisons, testing two neighboring floats. Additionally, random test values can be picked from the equivalence classes. To ease debugging, it is worth recording the seed used…
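Putting equivalence classes and edge cases together, such a test might look like the following sketch (`length` is a trivial wrapper here, purely for illustration):

```typescript
function length(s: string): number {
  return s.length;
}

// One representative from the "ordinary string" equivalence class...
console.assert(length("foo") === 3);
// ...plus the edge cases: empty string and a single character.
console.assert(length("") === 0);
console.assert(length("x") === 1);
// Junk data: an embedded null byte must not truncate the count.
console.assert(length("null byte in \x00 the middle") === 25);
```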
Non-functional tests: Load tests, Stress tests
A piece of software has non-functional requirements, which have to be tested as well. These include testing at the defined boundaries (load tests), and beyond them (stress tests). For a computer game, this could be asserting a minimum number of frames per second in a load test. A website may be stress-tested to observe response times when twice as many visitors as anticipated are battering the servers. Such tests are not only relevant for whole systems but also for single entities – how does a hash table degrade with a million entries?
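A sketch of such a single-entity load test for a hash table (using `Map`; the 100 ms budget is an assumed requirement, not a general rule):

```typescript
// Load test: do lookups stay fast with a million entries?
const table = new Map<number, number>();
for (let i = 0; i < 1_000_000; i++) {
  table.set(i, i * 2);
}

const start = Date.now();
for (let i = 0; i < 100_000; i++) {
  table.get(i);
}
const elapsed = Date.now() - start;

// The budget below is an assumed non-functional requirement.
console.assert(elapsed < 100, `lookups took ${elapsed} ms`);
```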
Other kinds of tests are whole system tests where scenarios are simulated, or acceptance tests to prove that the development contract was fulfilled.
Non-testing methods
Reviews
There are non-testing techniques that can be used for quality assurance. Examples are walkthroughs, formal code reviews, or pair programming. While some parts can be automated (e.g. by using linters), these are generally time-intensive. However, code reviews by experienced programmers have a high rate of bug discovery, and are especially valuable during design, where no automated testing is possible.
If code reviews are so great, why do we still write tests? The big advantage of test suites is that they run (mostly) automatically, which makes them very useful for regression tests.
Formal verification
Formal verification proves certain properties of the code. Manual verification is mostly viable for critical parts, less so for whole programs. Proofs put a lower bound on program correctness. Proofs can be automated to a certain degree, e.g. via a static type checker.
Certain invariants can be explicitly checked by using `assert` statements.
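For example (a hypothetical invariant, using Node’s assert module):

```typescript
import { strict as assert } from "assert";

// Explicitly check an invariant with an assert statement:
// an account balance must never become negative.
function withdraw(balance: number, amount: number): number {
  const result = balance - amount;
  assert(result >= 0, "invariant violated: negative balance");
  return result;
}
```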
All of these techniques have their place, and are complementary. TDD writes the functional tests up front, but the tests can be judged by their coverage metrics once the code is implemented.
Writing testable code means writing small code units that can be tested separately (helper functions with suitable granularity, single responsibility principle). The fewer arguments each function takes, the better. Such code also lends itself to the insertion of mock objects, e.g. via dependency injection.
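A sketch of what that looks like with dependency injection (the `Clock` interface and names are invented for illustration):

```typescript
interface Clock {
  now(): Date;
}

// The dependency is injected, so tests can substitute a mock.
function isWeekend(clock: Clock): boolean {
  const day = clock.now().getDay();
  return day === 0 || day === 6;  // Sunday or Saturday
}

// In production: the real clock.
const systemClock: Clock = { now: () => new Date() };

// In a test: a mock pinned to a known Saturday (2024-01-06, local time).
const mockClock: Clock = { now: () => new Date(2024, 0, 6) };
console.assert(isWeekend(mockClock) === true);
```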
Maybe “specification-based testing” also partly answers your question. Check these testing modules (which I haven’t used yet). They require you to write a mathy expression to specify sets of test values, rather than writing a unit test using selected single data values.
Test::LectroTest
As the author says, this Perl module was inspired by Haskell’s QuickCheck module. There are more links on this page, some of which are dead.
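The underlying idea, sketched by hand in TypeScript rather than with Perl or any particular library (the property and generator are my own examples): specify a whole set of test values and a property that must hold for all of them, then check random samples:

```typescript
// Property: reversing a string twice yields the original string.
function reverse(s: string): string {
  return [...s].reverse().join("");
}

// Generator for the set of test values: random printable ASCII strings.
function randomString(maxLen: number): string {
  const len = Math.floor(Math.random() * maxLen);
  return Array.from({ length: len }, () =>
    String.fromCharCode(32 + Math.floor(Math.random() * 95))
  ).join("");
}

// Check the property on 1000 random samples instead of hand-picked values.
for (let i = 0; i < 1000; i++) {
  const s = randomString(50);
  console.assert(reverse(reverse(s)) === s, `failed for ${JSON.stringify(s)}`);
}
```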
One mathematically based approach is all-pairs testing. The idea is that most bugs are triggered by a single configuration option alone, and most of the remainder by a particular pair of options taken together. Thus most bugs can be caught by testing “all pairs” of options. A mathematical explanation (with generalizations) is here:
The AETG system: an approach to testing based on combinatorial design
(there are many more such references)
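To make the idea concrete (the rows and the checker below are my own sketch, not taken from the paper): for four boolean options there are 2^4 = 16 exhaustive combinations, but five well-chosen test cases already cover every pair of option values:

```typescript
// Candidate pairwise test set for four boolean options: 5 rows
// instead of the 16 exhaustive combinations.
const tests: boolean[][] = [
  [false, false, false, false],
  [false, true,  true,  true ],
  [true,  false, true,  true ],
  [true,  true,  false, true ],
  [true,  true,  true,  false],
];

// Verify that every pair of options takes on all four value
// combinations somewhere in the test set.
function coversAllPairs(rows: boolean[][]): boolean {
  const k = rows[0].length;
  for (let i = 0; i < k; i++) {
    for (let j = i + 1; j < k; j++) {
      const seen = new Set(rows.map(r => `${r[i]},${r[j]}`));
      if (seen.size < 4) return false;  // some (i, j) combination missing
    }
  }
  return true;
}

console.assert(coversAllPairs(tests));  // 5 tests suffice for all pairs
```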
There are some mathematical equations used, but it depends on the type of software testing you are doing. For instance, boundary value testing rests on the critical fault assumption: failures are hardly ever the product of two or more simultaneous faults. The number of test cases is then given by f = 4n + 1, where f is the number of test cases for n input variables: each variable in turn takes on four boundary values while all others stay at their nominal value (giving 4n cases), and the + 1 is the single additional test case in which all variables assume the nominal value.
Another type of testing that uses such equations is robustness testing, which tests robustness by going beyond the legitimate input range: you input variables within the valid range (clean test cases) as well as variables outside it (dirty test cases). The corresponding equation is f = 6n + 1, where 6n indicates that each variable has to assume six different values while the other variables assume the nominal value, and + 1 again represents the addition of the constant one for the all-nominal test case.
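As a worked illustration of the 4n + 1 formula (the variable ranges and the helper are hypothetical, not from the answer above), a sketch that enumerates boundary value test cases and confirms the count:

```typescript
// Boundary value testing sketch: each variable takes its 4 boundary
// values (min, just above min, just below max, max) while all others
// stay nominal, plus one all-nominal case, for 4n + 1 cases in total.
interface Variable { min: number; max: number; nominal: number }

function boundaryValueCases(vars: Variable[]): number[][] {
  const nominal = vars.map(v => v.nominal);
  const cases: number[][] = [nominal];  // the "+ 1" all-nominal case
  vars.forEach((v, i) => {
    for (const value of [v.min, v.min + 1, v.max - 1, v.max]) {
      const testCase = [...nominal];
      testCase[i] = value;              // vary exactly one variable
      cases.push(testCase);
    }
  });
  return cases;                         // length = 4n + 1
}

// Example: n = 3 variables yields 4 * 3 + 1 = 13 test cases.
const vars = [
  { min: 0, max: 100, nominal: 50 },
  { min: 1, max: 12, nominal: 6 },
  { min: 1900, max: 2100, nominal: 2000 },
];
console.log(boundaryValueCases(vars).length);  // 13
```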