Simply put: is there a best practice for whether developers should write automated tests to check that data is correct (the contents of config files, databases, etc.)? If so, what is it?
Assume that the data is static for a given release, and specified as part of requirements.
In most cases this will make no sense: a test of whether your data is correct may end up comparing file A1 to a copy of the file itself, A2. Let's say you need a new version of A1 because the format of your data has evolved in version 2.0 of your software; you change it to A1_V2, and now your test tells you that A2 differs from A1_V2, so you copy A1_V2 to A2_V2 (or edit it until both files are equal again). But if you made an error in the first transition from A1 to A1_V2, you have now introduced the same error into A2_V2, and the test does not prevent that. The only form of quality assurance related to the content of A1 that will work (especially if A1 is part of the requirements, as you wrote) is to let a second person proofread A1.
However, if A1 is in a data format which can be checked for consistency (for example, an XML file, maybe with a schema), then it makes sense to have a test that A1 is consistent with that file format. Of course, you probably already have such a test, because if A1 is already part of an automatic test of some production code, then this code reads A1. And if the reading routine is designed to be robust and fail-safe, as any reading code should be, then it already does this kind of consistency checking and will flag an error if A1 is broken. So in most cases this means you don't need any extra tests specifically for data. But beware: such a test does not guarantee that the contents of A1 are correct, it only gives you a formal check.
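For illustration, here is a minimal sketch of such a formal consistency check, assuming the data file is XML with an XSD schema and that the lxml library is available (the file names are placeholders, not from the question):

```python
# Formal consistency check only: verifies that A1 parses and matches the schema.
# It says nothing about whether the *contents* of A1 are correct.
from lxml import etree

def test_a1_is_well_formed_and_schema_valid():
    schema = etree.XMLSchema(etree.parse("A1.xsd"))   # placeholder schema file
    document = etree.parse("A1.xml")                  # placeholder data file
    schema.assertValid(document)                      # raises if A1 violates the schema
```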
There is, however, no rule without exceptions. There can be special cases (which I expect to be rare) where your specific test data can be checked for certain content, maybe in a non-obvious form, to make sure the tests will work as intended. For example, you may have a complex test database and want to make sure that later manual edits of that database don't destroy your set of ten well-designed, non-obvious test cases contained in it. Then it would make sense to write an automatic test to verify that.
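A minimal sketch of what such a guard test could look like, assuming the test database is a SQLite file and that the table and case identifiers are hypothetical:

```python
# Guards the hand-crafted test cases against accidental manual edits.
import sqlite3

EXPECTED_CASE_IDS = {"boundary-low", "boundary-high", "unicode-name"}  # hypothetical IDs

def test_designed_cases_still_present():
    with sqlite3.connect("test_data.db") as conn:            # placeholder database file
        rows = conn.execute(
            "SELECT case_id FROM test_cases WHERE is_designed = 1"
        ).fetchall()
    found = {case_id for (case_id,) in rows}
    missing = EXPECTED_CASE_IDS - found
    assert not missing, f"designed test cases were removed or altered: {missing}"
```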
Also, if your data is itself some kind of “code” (for example, some scripts or functions written in some kind of DSL), then it is obvious that automatic tests make sense here.
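As a hedged illustration of that point, suppose the "data" is a table of small pricing rules in a home-grown expression DSL (the rules, names, and evaluator below are made up); then the rules can be exercised with example-based tests just like any other code:

```python
# Example-based tests for "data that is code": each pricing rule is a tiny
# expression in a hypothetical DSL that happens to be evaluable by Python.
PRICING_RULES = {                       # would normally be loaded from the data file
    "bulk":   "price * 0.9 if qty >= 10 else price",
    "member": "price - 5",
}

def evaluate_rule(rule: str, price: float, qty: int) -> float:
    # Restricted evaluation: no builtins, only the two input variables.
    return eval(rule, {"__builtins__": {}}, {"price": price, "qty": qty})

def test_bulk_rule_gives_ten_percent_discount():
    assert evaluate_rule(PRICING_RULES["bulk"], price=100.0, qty=10) == 90.0

def test_member_rule_subtracts_flat_five():
    assert evaluate_rule(PRICING_RULES["member"], price=100.0, qty=1) == 95.0
```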
I would like to mention that it makes no difference whether we are talking about test data or about data released to production as part of your product (since such data can always be used as test data).
To summarize: in most real-world cases I guess it won't make sense, but the more complex your data is and the more implicit redundancies or formal constraints it has, the more likely such tests are a good idea. So think about your real case and apply some common sense – that will surely help 😉
A tester must control the relationship between the inputs and outcomes of their tests in order to reliably check the correctness of the execution. This relationship is called an oracle.
Usually this is achieved by owning the input data, in which case a data check is useless.
If the oracle is straightforward, the tester doesn't need to control the input data: run the oracle on the input and the outcome to tell whether the test has succeeded or failed. Again, a data check is useless. The drawback is that the tester doesn't control the execution path and therefore the test coverage.
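A minimal sketch of such a straightforward oracle, here for a sort routine: the oracle checks properties of the outcome relative to whatever input was supplied, so the input data itself never needs to be verified (the function under test is hypothetical):

```python
# Oracle for a sort routine: works on any input, so the test does not have to
# own or pre-check the data, at the cost of not controlling the execution path.
from collections import Counter
import random

def sort_under_test(values):        # hypothetical function under test
    return sorted(values)

def oracle_holds(inputs, outcome):
    ordered = all(a <= b for a, b in zip(outcome, outcome[1:]))
    same_elements = Counter(inputs) == Counter(outcome)
    return ordered and same_elements

def test_sort_against_oracle():
    inputs = [random.randint(-1000, 1000) for _ in range(100)]  # uncontrolled input
    assert oracle_holds(inputs, sort_under_test(inputs))
```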
The only case where a data check is required is when the software under test makes assumptions about its inputs and these inputs are not under the tester's control (e.g., a division algorithm implemented by successive subtractions: a zero divisor puts the execution into an infinite loop).
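A minimal sketch of that kind of data check, using the division-by-successive-subtraction example (the routine and test vectors shown are illustrative, not from the post):

```python
# The algorithm assumes divisor != 0; with divisor == 0 it would loop forever.
def divide_by_subtraction(dividend: int, divisor: int) -> int:
    quotient = 0
    while dividend >= divisor:
        dividend -= divisor
        quotient += 1
    return quotient

def check_input_vectors(test_vectors):
    # Data check on inputs the tester does not control: reject any vector that
    # violates the algorithm's assumption before it can hang the test run.
    bad = [(a, b) for (a, b) in test_vectors if b == 0]
    assert not bad, f"input vectors violate the divisor != 0 assumption: {bad}"
```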
I can't see why not. Imagine you have a config file that contains a database connection string; it runs perfectly well in the dev environment, but when you come to run it in the integration environment it still contains the old dev DB server connection... disaster! (Well, probably not, as the dev box would surely pass all tests, but a disaster if the DB upgrade scripts you ran to bring the integration DB up to spec didn't work and you never noticed, because all your tests ran against the dev DB!)
So yes, I'd consider some form of test for config to be perfectly valid. The question then is: what should the data be, and how should you test it?
To test, I'd suggest a set of regular expressions that simply check the config against what you expect to be valid and report any mismatches. That would be simple to implement and could run on any form of config file (i.e., no need to go parsing that XML or JSON).
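A minimal sketch of that approach, assuming a plain key=value config file; the file name, keys, and patterns are made up for illustration:

```python
# Regex-based config check: no parsing of the file format, just line patterns.
import re

EXPECTED = {                                              # hypothetical keys and patterns
    "db_server": re.compile(r"^db_server=int-db\d+\.example\.com$"),
    "log_level": re.compile(r"^log_level=(INFO|WARN|ERROR)$"),
}

def test_config_matches_expected_patterns():
    with open("app.config") as fh:                        # placeholder config file
        lines = fh.read().splitlines()
    for key, pattern in EXPECTED.items():
        matches = [line for line in lines if line.startswith(key + "=")]
        assert matches, f"missing config entry: {key}"
        assert pattern.match(matches[0]), f"unexpected value for {key}: {matches[0]}"
```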
For DB data, a simple select followed by comparing the contents of the returned recordset against known-valid values is effective.
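A sketch of that select-and-compare idea, assuming a SQLite connection and a hypothetical settings table with made-up reference values:

```python
# Select a known recordset and compare it to the values it is supposed to hold.
import sqlite3

EXPECTED_SETTINGS = {("currency", "EUR"), ("vat_rate", "19")}   # hypothetical reference data

def test_settings_table_contents():
    with sqlite3.connect("app.db") as conn:                     # placeholder database
        rows = set(conn.execute("SELECT key, value FROM settings"))
    assert rows == EXPECTED_SETTINGS, f"settings differ from spec: {rows ^ EXPECTED_SETTINGS}"
```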
I wouldn't store these tests in the same place as the rest of the code, though (otherwise it wouldn't be useful to test that the connection string is correct in the integration environment by comparing the DB server instance with "DEV", as is likely to happen if the test and the config are stored in the same place). That means you're better off with an integration test that can be run externally to the main source code. It would also form something you can give to support engineers to run against a customer site's config, to check that the settings are the right ones and that no one has fiddled with them.