Consider a text index such as a suffix tree or a suffix array supporting Count queries (number of occurrences of a pattern) and Locate queries (the positions of all the occurrences of a pattern) over a given text. How would you go about unit testing such a class?
What I have in mind is to generate a large random string, extract a random substring from it, and compare the results of both queries with naive implementations (such as one built on string::find). Another idea is to find the most frequent substrings of length l in the original string (perhaps using a naive method) and use these substrings to test the index.
This may not be the best way, so what would be a good design for the unit tests of a text index?
In case it matters, this is in C++ using Google Test.
The randomly generated big string can be a nice addition for catching the cases that you did not think of. To be effective, it has to run several times, which slows down your unit test suite, so you need to balance the number of runs against the size of the string.
I would still create a regular set of tests with fixed inputs for which you know the answers:
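A rough sketch of such a randomized check, in plain C++ so it can be called from a Google Test case. It assumes a hypothetical index type that is constructible from a std::string and offers const methods count(pattern) and locate(pattern); those names, and the helper names naive_locate, random_text and agrees_with_naive, are made up for illustration. The seed is passed in so that a failing run can be reproduced, and the alphabet is kept small so that patterns repeat.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <random>
#include <string>
#include <vector>

// Reference implementation: every start position of `pattern` in `text`,
// found with repeated std::string::find (overlapping matches included).
std::vector<std::size_t> naive_locate(const std::string& text,
                                      const std::string& pattern) {
    std::vector<std::size_t> positions;
    if (pattern.empty()) return positions;  // assumption: empty pattern matches nothing
    for (std::size_t pos = text.find(pattern); pos != std::string::npos;
         pos = text.find(pattern, pos + 1)) {
        positions.push_back(pos);
    }
    return positions;
}

// Random string over a small alphabet, so that repeated substrings are likely.
std::string random_text(std::mt19937& rng, std::size_t length,
                        const std::string& alphabet = "ab") {
    assert(!alphabet.empty());
    std::uniform_int_distribution<std::size_t> pick(0, alphabet.size() - 1);
    std::string text(length, ' ');
    for (char& c : text) c = alphabet[pick(rng)];
    return text;
}

// Hypothetical interface assumed for Index: Index(const std::string&),
// count(pattern) and locate(pattern), both const.
// Returns true if the index agreed with the naive search in every round.
template <typename Index>
bool agrees_with_naive(unsigned seed, std::size_t text_length, int rounds) {
    if (text_length == 0) return true;  // nothing to sample from
    std::mt19937 rng(seed);
    const std::string text = random_text(rng, text_length);
    const Index index(text);
    for (int i = 0; i < rounds; ++i) {
        // Pick a random non-empty substring of the text as the pattern.
        std::uniform_int_distribution<std::size_t> start_dist(0, text_length - 1);
        const std::size_t start = start_dist(rng);
        std::uniform_int_distribution<std::size_t> len_dist(1, text_length - start);
        const std::string pattern = text.substr(start, len_dist(rng));

        const std::vector<std::size_t> expected = naive_locate(text, pattern);
        std::vector<std::size_t> actual = index.locate(pattern);
        std::sort(actual.begin(), actual.end());  // an index may report positions unsorted
        if (actual != expected || index.count(pattern) != expected.size()) return false;
    }
    return true;
}
```

Inside a Google Test case this would be a single EXPECT_TRUE on agrees_with_naive with your own index type. The text length and the number of rounds are the two knobs for the trade-off between coverage and run time.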
- a test for zero occurrences
- a test for a random number of occurrences
- a test for an occurrence in the first element
- a test for an occurrence in the last element
- a test for case sensitivity
- a test with an empty tree or array
- a test for a tree or array with exactly one element
- tests for any other functional requirements you might have (escape characters, line endings, ignoring white space, concurrent use, exception handling, more occurrences than your counter can hold)
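Some of the cases above can be written down as a small table of fixed inputs with hand-computed answers. This is only a sketch: the index interface (constructor from std::string, const count and locate) and the names FixedCase, fixed_cases and failing_cases are assumptions for illustration, positions are taken to be 0-based, and the case-sensitivity row assumes the index is case sensitive.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// One fixed test case: a text, a pattern, and the known answer.
struct FixedCase {
    std::string name;
    std::string text;
    std::string pattern;
    std::vector<std::size_t> expected;  // sorted 0-based start positions
};

// Hand-computed cases mirroring the list above.
inline std::vector<FixedCase> fixed_cases() {
    return {
        {"zero occurrences", "banana", "xyz", {}},
        {"several overlapping occurrences", "banana", "ana", {1, 3}},
        {"occurrence at the first position", "abc", "a", {0}},
        {"occurrence at the last position", "abc", "c", {2}},
        {"case sensitivity", "Banana", "b", {}},  // assumes a case-sensitive index
        {"empty text", "", "a", {}},
        {"text with exactly one character", "a", "a", {0}},
    };
}

// Hypothetical interface assumed for Index: Index(const std::string&),
// count(pattern) and locate(pattern), both const.
// Returns the names of the cases where the index gave a wrong answer.
template <typename Index>
std::vector<std::string> failing_cases() {
    std::vector<std::string> failures;
    for (const FixedCase& c : fixed_cases()) {
        const Index index(c.text);
        std::vector<std::size_t> positions = index.locate(c.pattern);
        std::sort(positions.begin(), positions.end());  // order of results may vary
        if (positions != c.expected || index.count(c.pattern) != c.expected.size()) {
            failures.push_back(c.name);
        }
    }
    return failures;
}
```

With Google Test, each row could just as well be its own TEST with an EXPECT_EQ on the count and on the sorted positions, which gives a more readable failure message per case.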
Hope this helps.