We have a project that was written over a period of two years, but it has a poorly designed architecture and no unit tests.
The software works well, but we’re at the point where we want to refactor some core modules.
The budget is also limited, so we cannot hire enough developers to write unit tests.
Would automatically generating unit test code to cover (for example) integration scenarios with some tool be a viable strategy? The assumption is that, since the system works fine right now, its output can be captured as XML data and used as the expected results for unit tests.
This approach would let us start refactoring the existing code quickly and get immediate feedback if those changes break some core functionality.
Well, I’ve done that with a hierarchical tree data structure I wrote. The data structure parses an input data set and creates a tree based on that data set and its defined data relationships.
I created trees using various input data sets that I knew would adequately cover the various cases (there are about sixty tests), serialized each tree to XML, and, once I was satisfied with the XML output each tree produced, used those serialized strings as the expected results for the unit tests.
Did it work? Yes, pretty well, in fact, and it took a fraction of the time it would have taken to painstakingly write individual tests by hand.
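In outline, each test amounted to comparing a tree's XML serialization against a stored, hand-reviewed string, roughly like the sketch below (the names and the placeholder serializer are made up, not the real code):

#include <cassert>
#include <iostream>
#include <string>

// Stand-in for the real code: parse an input data set into a tree
// and serialize that tree to XML. Here it is only a placeholder.
std::string serialize_to_xml(const std::string& input_data_set)
{
    return "<tree><node value=\"" + input_data_set + "\"/></tree>";
}

// One characterization test: the expected string was captured from a run
// whose XML output had been reviewed and accepted by hand.
void test_simple_data_set()
{
    const std::string expected = "<tree><node value=\"simple\"/></tree>";
    const std::string actual = serialize_to_xml("simple");
    assert(actual == expected && "tree serialization changed");
}

int main()
{
    test_simple_data_set();
    std::cout << "all characterization tests passed\n";
    return 0;
}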
Are there disadvantages?
- The tests are fairly brittle (they break when any change is made to the code).
- It’s not always clear what exactly is being tested in each test, since there is no mapping of tests to individual requirements.
- If one test breaks, several others also tend to break.
- Diagnosing a failed test requires taking the expected and actual test results, pasting them into a text editor, and running a diff to see what changed (see the sketch after this list for the kind of comparison I mean).
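For what it's worth, here is a minimal sketch of the kind of comparison I mean (the helper is made up, not part of the real test suite): it splits the expected and actual XML into lines and reports the first line that differs, so most failures can be read straight off the test output.

#include <algorithm>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

// Split a serialized result into lines for a line-by-line comparison.
std::vector<std::string> split_lines(const std::string& text)
{
    std::vector<std::string> lines;
    std::istringstream stream(text);
    for (std::string line; std::getline(stream, line); )
        lines.push_back(line);
    return lines;
}

// Report the first line at which the expected and actual XML diverge.
void report_first_difference(const std::string& expected, const std::string& actual)
{
    const std::vector<std::string> exp = split_lines(expected);
    const std::vector<std::string> act = split_lines(actual);
    const std::size_t count = std::max(exp.size(), act.size());
    for (std::size_t i = 0; i < count; ++i)
    {
        const std::string e = i < exp.size() ? exp[i] : std::string("<missing line>");
        const std::string a = i < act.size() ? act[i] : std::string("<missing line>");
        if (e != a)
        {
            std::cout << "first difference at line " << (i + 1) << ":\n"
                      << "  expected: " << e << "\n"
                      << "  actual:   " << a << "\n";
            return;
        }
    }
    std::cout << "expected and actual results are identical\n";
}

int main()
{
    report_first_difference("<tree>\n  <node value=\"1\"/>\n</tree>",
                            "<tree>\n  <node value=\"2\"/>\n</tree>");
    return 0;
}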
How are you going to generate matching input and output? On one hand you have the output produced by the code being tested. On the other hand you have — what? The output of a different implementation of the same algorithm? That may make sense if there is a trivial but slow algorithm that can be used to test a more complex faster algorithm. Or are you going to eyeball the expected output and just test against undesired changes? It is very easy to accept incorrect output in that case.
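To illustrate the trivial-but-slow-oracle case (a minimal sketch with made-up functions): a plain linear scan acts as the oracle for a faster binary search, and the test only checks that the two implementations agree across a range of inputs.

#include <algorithm>
#include <cassert>
#include <vector>

// Trivial but slow oracle: linear scan for the index of the first
// element >= target, or -1 if there is no such element.
int lower_bound_slow(const std::vector<int>& sorted, int target)
{
    for (std::size_t i = 0; i < sorted.size(); ++i)
        if (sorted[i] >= target)
            return static_cast<int>(i);
    return -1;
}

// Faster implementation under test: binary search from the standard library.
int lower_bound_fast(const std::vector<int>& sorted, int target)
{
    const auto it = std::lower_bound(sorted.begin(), sorted.end(), target);
    return it == sorted.end() ? -1 : static_cast<int>(it - sorted.begin());
}

int main()
{
    const std::vector<int> sorted = {1, 3, 3, 7, 9, 12, 40};
    // The slow version is the oracle; the fast version must always agree with it.
    for (int target = -2; target <= 45; ++target)
        assert(lower_bound_fast(sorted, target) == lower_bound_slow(sorted, target));
    return 0;
}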
In my experience generated tests are as valuable as the labor that goes into them.
I have never heard of unit test generation. If it were possible, everyone would be using it. If such a tool does exist, I'm sure it can't guarantee that your code is properly covered.
I think the best approach for this kind of legacy code is to write a test for every new feature and for every refactoring. The first steps will be quite dangerous for the application's stability, but you will soon own a relatively large set of tests.
This approach can be really efficient because, at first, you will be dealing with the worst parts of the code (the ones that need refactoring) and with the core features (because any evolution of the system affects them). That way, the most critical sections of your application will be covered quickly.
You will not achieve complete coverage, but if you never need to go back into an old piece of code, it probably means that code is only loosely coupled to the rest.
What do you use to stimulate the unit test generator?
Consider:
int foo(int x, int y)
{
    return (x + y);
}
Your unit test generator can certainly generate code to test the addition, but it has no way of knowing that what you really wanted was a multiplication. I saw something like that bite a co-worker of mine really badly several years ago. We were using Ada, with operator overloading to impose dimensional analysis on our arithmetic (this protects you against, say, trying to add a velocity to a distance, which is by definition meaningless). He cut-and-pasted the rename for "+" four times and changed the operator on the left, but not on the right. It cost him some hair-tearing to figure out why his multiply routine was returning sums instead of products.
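A rough C++ analogue of that Ada setup (hypothetical types, not the original code) shows why a generated test doesn't help here: the dimensioned types reject a meaningless "velocity plus distance" at compile time, but a cut-and-paste slip in the operator body still yields a "multiply" that adds, and a generated test would happily record that wrong answer as the expected result.

#include <iostream>

// Hypothetical dimensioned types, in the spirit of the Ada operator overloads.
struct Seconds         { double value; };
struct Metres          { double value; };
struct MetresPerSecond { double value; };

// Adding two distances is meaningful, so that overload exists...
Metres operator+(Metres a, Metres b)
{
    return Metres{a.value + b.value};
}

// ...but there is deliberately no operator+(MetresPerSecond, Metres),
// so "velocity + distance" simply refuses to compile.

// The cut-and-paste slip: the signature promises a multiplication,
// but the body still performs the addition it was pasted from.
Metres operator*(MetresPerSecond v, Seconds t)
{
    return Metres{v.value + t.value};   // BUG: should be v.value * t.value
}

int main()
{
    const MetresPerSecond v{3.0};
    const Seconds t{4.0};
    const Metres d = v * t;
    // A generated test would capture 7 as the "expected" distance,
    // even though the intended answer is 12.
    std::cout << "distance = " << d.value << "\n";
    return 0;
}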