One of the few things that most software developers agree on is that
you shouldn’t rely on code to work correctly unless you test it. If
you don’t test it, it may have hidden bugs that are only going to
cause you more work down the road.
I understand how to test my normal code, but how should I test my test code to make sure it can effectively find and report errors when they are present? I personally have been stupid enough to write erroneous test cases that passed when they should not have, thus defeating the purpose of writing tests in the first place. Fortunately, I found and fixed the errors in time, but according to the testing mantra it seems like no test suite would be complete without its own set of tests to make sure it works.
It seems to me that the best way to do this would be to make sure the test fails for buggy code.* If I spend 2 minutes alternately adding bugs to the code and making sure it fails, I should have an acceptable degree of confidence that the tests ‘work’. This brings me to my second question: What are good ways to introduce bugs to make sure that they are caught by the test cases? Should I just randomly comment out statements, make sure the wrong branch of an if-else
gets run by negating its condition, change the execution order of code with side effects, and so on, until I’m satisfied my tests will catch the most common bugs? How do professional developers validate that their tests actually do what they’re supposed to do? Do they just assume the tests work, or do they take the time to test them as well? If so, how do they test the tests?
I’m not suggesting that people should spend so much time testing their tests, and then testing the tests for their tests, that they never actually write the real code, but I’ve done stupid enough things that I feel I could benefit from a bit of ‘meta-testing’, and was curious about the best way to go about it. 😀
* I could check to see whether the test passes when run against ‘bug-free’ code, but using the code as a spec for the test seems quite backwards…
The standard flow for TDD is:
- Write a failing test (Red)
- Make the smallest code change that makes it pass (Green)
- Refactor (keeping it green)
The test for your tests in this case is step 1 – making sure that the test fails before you make any code changes.
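As a concrete illustration of that first step, here is a minimal pytest-style sketch (the slugify() helper and its expected behaviour are invented for the example): the very first run has to go red, and that red run is what shows the test is able to fail.

```python
# A minimal sketch of the cycle; slugify() and its spec are made up.

# Step 1 (Red): write the test first. With no slugify() defined yet,
# running pytest fails with a NameError -- proof that the test can fail.
def test_slugify_lowercases_and_hyphenates():
    assert slugify("Hello World") == "hello-world"


# Step 2 (Green): the smallest change that makes the test pass.
def slugify(text):
    return "-".join(text.lower().split())

# Step 3: refactor freely; the test stays green as long as the behaviour holds.
```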
Another test that I like is whether you can delete some code and re-implement it a different way: your tests should fail after the deletion, but pass again once a different algorithm is in place.
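For instance (again with a made-up helper), behaviour-level tests should go red when the implementation is deleted and back to green when a different algorithm replaces it:

```python
# Behaviour-level tests: they don't care which algorithm is used.
def test_sorts_numbers():
    assert my_sort([3, 1, 2]) == [1, 2, 3]

def test_handles_empty_list():
    assert my_sort([]) == []


# Original implementation: insertion sort.
def my_sort(items):
    result = []
    for item in items:
        i = 0
        while i < len(result) and result[i] < item:
            i += 1
        result.insert(i, item)
    return result

# Delete my_sort() and both tests fail, as they should.
# Swap in a completely different algorithm and they pass again, e.g.:
# def my_sort(items):
#     return sorted(items)
```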
As with all things, there is no magic bullet. Forgetting to write a required test is just as easy for a developer to do as forgetting to write the code. At least if you’re doing both, you have twice as many opportunities to discover your omission.
One approach is Mutation Testing, using a tool like Jester:
Jester makes some change to your code, runs your tests, and if the tests pass Jester displays a message saying what it changed.
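Jester itself targets Java and JUnit, but the idea is language-agnostic. Here is a hand-rolled sketch of the kind of change such a tool makes and of how a test “kills” it (the function, the mutant, and the tests are all invented for illustration):

```python
# Production code (made up for the example).
def is_adult(age):
    return age >= 18


# The kind of mutant a mutation tool generates: ">=" silently flipped to ">".
def is_adult_mutant(age):
    return age > 18


# If the mutant replaced the original, this test would still pass, so the
# tool would report the surviving change -- a hint that the test is too loose.
def test_is_adult_obvious_case():
    assert is_adult(30)


# This test exercises the boundary, so it would fail against the mutant
# and "kill" it -- evidence that the behaviour really is pinned down.
def test_is_adult_boundary():
    assert is_adult(18)
    assert not is_adult(17)
```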
Tests for tests? Don’t go down that road. Then you’ll probably need tests for tests for tests, and then tests for tests for tests for tests… where do you stop?
The usual testing flow goes like this, and as a developer, you’ll spend the majority of your time on points 1-3:
- Code
- Unit tests
- Integration tests
- System/other automated tests
- QA/human testers
If I spend 2 minutes alternately adding bugs to the code (…)
Your code will eventually “grow” its own bugs; don’t waste time introducing them by hand. Not to mention, is a thing you knew about upfront really a bug? Bugs will come; I wouldn’t worry about that.
Should I just randomly comment out statements, make sure the wrong branch of an if-else gets run by negating its condition (…)
This is actually a viable way to verify that you really test what you think you do. I don’t think it is always worth it, though, as it suffers from the same problem as the “tests for tests for tests…” idea: when do you stop altering the code and conclude that what you’re testing works 100%?
It’s also good to remember the all-time classic pragmatic programmer advice – you ain’t gonna need it. Be agile: write tests and code for actual bugs, instead of for hypothetical ones that might or might not appear.
By construction, functional code and test code are tested against each other. One problem remains: common-mode bugs, where a bug in the functional code is hidden by a bug in the test code. TDD is not immune to this effect. This is why testing is usually performed at multiple levels by different people, in order to decrease this probability.
There is a technique called mutation testing that evaluates and measures the suitability and quality of a test.
We can use it to evaluate “the test” itself.
In brief, we evaluate a test (e.g. TestA) by checking whether TestA can tell the difference between the original code and its mutants (versions of the code that are very similar to, but slightly different from, the original).
If TestA cannot tell the original code apart from its mutants, it means that TestA checks the original code too loosely.
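A toy sketch of that idea (all the names and “mutants” below are invented): the test is scored by how many mutants it can tell apart from the original; a surviving mutant points at behaviour the test does not pin down.

```python
# Code under test: members get 10% off.
def discount(price, is_member):
    return price * 0.9 if is_member else price


# "TestA": note that it only ever checks the member case.
def test_a(impl):
    try:
        assert impl(100, True) == 90
        return True
    except AssertionError:
        return False


# Hand-written mutants: small, plausible changes to the original.
MUTANTS = {
    "everyone gets the discount": lambda price, is_member: price * 0.9,
    "nobody gets the discount": lambda price, is_member: price,
}

# The original must pass; any mutant that also passes "survives".
assert test_a(discount)
for name, mutant in MUTANTS.items():
    verdict = "SURVIVED (test too rough)" if test_a(mutant) else "killed"
    print(f"{name}: {verdict}")
# The first mutant survives because test_a never checks non-members --
# exactly the "too rough" situation described above.
```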
You test your unit test once when you write it, in the debugger. Then you leave it alone and forget about it. There is no problem here.
Consider this. What is the purpose of a unit test? It notifies you when any of the numerous changes you will make in your main program accidentally alters the logic of that program. You want that because you know any change can potentially break something. Which is exactly why there is no problem if you do not test your test: you do not touch your test until you purposely change the logic of your program (which would require you to revisit the test and test it once more), so your test is not likely to break accidentally.
Maybe a slightly different question, but…
You might have a more elaborate test setup for integration tests, or, if you’re in a larger project or software engineering org, a library of test fixtures, test tools, etc. There it can make sense to test your test fixtures and test tools by creating tiny test suites that use them. pytest has a neat example of how you might go about this in pytester.
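As a rough sketch of that pattern (the inner throwaway test is made up; pytester itself ships with pytest): the outer test writes a tiny suite, runs it in isolation, and asserts that a deliberate failure really is reported as a failure.

```python
# conftest.py in your own test suite -- enable the built-in pytester fixture:
#     pytest_plugins = ["pytester"]

# test_meta.py -- a "test of the test tooling".
def test_runner_reports_failures(pytester):
    # Write a throwaway test file into a temporary directory...
    pytester.makepyfile(
        """
        def test_deliberately_broken():
            assert 1 + 1 == 3
        """
    )
    # ...run it and check that the failure shows up as a failure.
    result = pytester.runpytest()
    result.assert_outcomes(failed=1)

# The same pattern works for your own fixtures and plugins: write a tiny
# test that uses them, run it with pytester, and assert on the outcome.
```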
Especially for new tests, a test failure doesn’t mean “my code failed” but “my code and my test disagree”. There’s no fairy godmother telling you which one is wrong, so you check them both, fix the one that is faulty, and sometimes fix both.