I’ve recently had a little disagreement with fellow developers. We’re transforming various ontologies from their original source format (Pica+, RDF, etc.) into our data format and have several converters dedicated to exactly this task. The argument was about whether we should test against the full ontology data file or create a subset specifically for testing.
I argued the case for a subset: a custom-created file covering all the use cases required to confirm the converters are working correctly. They argued that a production file is always up to date and that any change in the production file will break the tests. True, I said, but if the production files need to change, then invariably the requirements have changed as well, so that is not a fault of using test data. They considered this a very weak argument and summarily decided against the use of a custom-created test file.
The question boils down to this: should we use the in-production file for testing our converters, or a subset of the data? What are the pros and cons? I’m very much in favor of using a dedicated test file, but I’m willing to accept that my premise is flawed. If I am correct, how can I argue my case more eloquently and persuasively?
I strongly suggest you have both:
- artificial data in one or more small test files to check each requirement on its own (perhaps for unit testing)
- one or more production files of a certain size to check things you did not think of when designing your artificial data (this gives you an integration test)
In my experience, the chances are high that those two types of test files will catch different kinds of bugs. Furthermore, tests with small data typically run much faster, which makes it much more likely that someone actually runs those tests regularly (I don’t know whether that’s a factor for your product).
When your requirements change, you will have to update both kinds of files, so this is an argument neither for the first nor for the second kind of test file.
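As a minimal sketch of how the two kinds of tests could be separated, assuming a hypothetical `convert()` entry point for one of your converters, pytest as the test runner, and made-up file names:

```python
# Sketch only: `converter.convert`, the file names, and the "slow" marker
# are hypothetical placeholders, not your actual project layout.
from pathlib import Path

import pytest

from converter import convert  # hypothetical converter entry point

TEST_DATA = Path(__file__).parent / "data"


# Unit level: one small artificial file per requirement, fast to run on
# every commit, compared against a hand-checked expected output.
@pytest.mark.parametrize("source, expected", [
    ("minimal_record.pica", "minimal_record.expected.json"),
    ("record_with_umlauts.pica", "record_with_umlauts.expected.json"),
])
def test_single_requirement(source, expected):
    result = convert(TEST_DATA / source)
    assert result == (TEST_DATA / expected).read_text()


# Integration level: a full production dump, slower, run less often.
@pytest.mark.slow
def test_full_production_dump():
    result = convert(TEST_DATA / "production_dump.pica")
    # The dump changes over time, so check invariants rather than
    # an exact expected output.
    assert result
```

The point of the split is that the parametrized unit tests document each requirement explicitly and stay fast, while the production-dump test catches surprises in real data without anyone having to keep an exact expected output in sync with it.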