Generating test data for a search application

I have a general question about testing search applications, and what I’m looking for is pointers to resources on the topic that I can go and research on my own. I’ve tried semi-informed, semi-undirected googling, but that’s yielding a lot of distractions and blind alleys (or maybe my search skills aren’t all that sharp).

A bit of setup first. When I say “a search application”, what I mean is this:

you have some data sources which you can put together in a search index
your application has an API that takes as input a search query (keywords and optionally other stuff), and its output is a relevance-ranked list of results from the search index.
there is a whole bunch of business logic on top of just retrieving the results from the index – the final result set in the output could have a large edit distance from the original result set from search.
assume that in real life, the index is large and takes a while to build

The task is to write tests for the application. The basic structure of a test is “given search request X, I expect response Y consisting of relevance-ordered results”. The problem, therefore, is: what is a good strategy for generating the underlying data for the tests?

Here are some approaches I’m aware of (and have used in practice):

Don’t generate test data. Start with a real index, and apply targeted modifications to it to “introduce” edge cases for your tests as needed. Upside: close to real life. Downsides: large test index; has to be rebuilt every time some change is made to the indexing scheme; most of it is unused by existing test cases.
Generate fake data such that for each request X there is a well-defined, intentionally constructed set of results Y that will be returned. Upsides: full control over the search data; only as much data as the tests need, so it is quicker and easier to change. Downsides: everything still has to be rebuilt if the indexing scheme changes; the data is not necessarily realistic, which may leave aspects of the system untested or under-tested; too much flexibility, and test-specific domain knowledge that is separate from the real-life domain of the application.
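To make the second approach concrete, here is a minimal sketch of a fixture generator in which the expected ranking Y for a query X is known by construction. `Doc`, `make_fixture` and `toy_search` are hypothetical names; `toy_search` is only a stand-in that ranks by raw term count, where a real test would index the generated documents and call the application's own search API instead.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    doc_id: str
    text: str


def make_fixture(term: str, n_relevant: int, n_noise: int) -> tuple[list[Doc], list[str]]:
    """Build a tiny corpus whose expected ranking for `term` is known by construction.

    Relevant document i repeats `term` (n_relevant - i) times, so "rel-0" should rank
    first; noise documents never contain `term`.
    """
    docs: list[Doc] = []
    expected: list[str] = []
    for i in range(n_relevant):
        doc_id = f"rel-{i}"
        docs.append(Doc(doc_id, " ".join([term] * (n_relevant - i) + ["filler"])))
        expected.append(doc_id)
    for j in range(n_noise):
        docs.append(Doc(f"noise-{j}", "unrelated filler text"))
    return docs, expected


def toy_search(docs: list[Doc], query: str) -> list[str]:
    """Hypothetical stand-in for the application's search API: rank by raw term count."""
    hits = [(d.text.split().count(query), d.doc_id) for d in docs]
    hits = [(score, doc_id) for score, doc_id in hits if score > 0]
    hits.sort(key=lambda pair: pair[0], reverse=True)  # stable: ties keep corpus order
    return [doc_id for _, doc_id in hits]
```

A test then reads as “given request X, expect response Y”: `docs, expected = make_fixture("widget", 3, 5)` followed by `assert toy_search(docs, "widget") == expected`. Because the generator encodes the intended ranking, the fixture stays small and explains itself.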

Actually that’s where my current knowledge ends. Something tells me there is either a nice middle ground which allows for testing flexibility without deviating from how the application works in real life, or a completely different testing approach that does away with these concerns. What approaches might you consider?

Short answer: you need both

fake data, with well defined input X and output Y
real-world data, probably with the modifications you suggested

Use the first kind especially when doing TDD (as your tag indicated), and once the basic algorithm is in place, use the second kind of data for integration or acceptance tests. Having the first kind of tests spares you from running the (probably slow) second kind more often than necessary.

Something tells me there is either a nice middle ground which allows for testing flexibility without deviating from how the application works in real life or a completely different testing approach

Sorry, but so far there is no “magic bullet”. Testing complex algorithms is hard work that requires analytic skills. Whole books have been written about how to construct test cases efficiently, and the techniques described, for example, by Glenford Myers in his book on software testing (first published in 1979, AFAIK) are still valid today.

It’s not clear to me what you are trying to do. It could be one of two very different things, and you should not conflate them:

1. Test that nothing in your system is badly broken, i.e. “write test cases”. In this case, it’s best to generate a fake (probably relatively small) corpus of documents, index it, and write precision/recall tests against it.
2. Actually measure the performance of the “search system” in the sense of “how well it serves the user in real life”. For example, this is something you are likely to do if you want to build a competitor to Google (you’d want to evaluate how good your search is compared to Google’s), or, if you already work at Google, you’d want to measure “how good is my new Hummingbird algorithm compared to Panda and Penguin?”.
If this is what you measure, then it is a completely different goal: you shouldn’t measure anything synthetic, but actual user behaviour instead. AFAIK there are two widely accepted (and widely used) performance measures:

2.a Measure “time-to-answer” for the typical user: how long does it take a user of the system to ‘succeed’? This is especially applicable where you have a good measure of success; e.g. if you work for a stock photography site, success may mean “image was downloaded or added to wishlist”. Note that the user is likely to perform more than one query in a single session. You should assume that all those queries are related, that the user may have difficulty expressing their need, and that they may try different searches, all with the same goal.
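A minimal sketch of computing time-to-answer from session logs, assuming a hypothetical log format of `(session_id, timestamp_seconds, kind)` tuples where `kind` is `"query"` or `"success"`. Since all queries in a session are treated as serving the same goal, the clock starts at the session's first query rather than its last one.

```python
from statistics import median


def time_to_answer(events):
    """Seconds from each session's first query to its first success event.

    `events` is an iterable of (session_id, timestamp_seconds, kind) tuples, where kind is
    "query" or "success" (a hypothetical log format). Returns (durations, unsuccessful),
    where `unsuccessful` lists sessions that queried but never succeeded.
    """
    first_query = {}
    first_success = {}
    for session_id, ts, kind in sorted(events, key=lambda e: e[1]):
        if kind == "query":
            first_query.setdefault(session_id, ts)
        elif kind == "success" and session_id in first_query:
            # Only count a success that follows at least one query in the session.
            first_success.setdefault(session_id, ts)
    durations = {s: first_success[s] - first_query[s] for s in first_success}
    unsuccessful = sorted(set(first_query) - set(first_success))
    return durations, unsuccessful


def median_time_to_answer(events):
    """Median over successful sessions, or None if no session succeeded."""
    durations, _ = time_to_answer(events)
    return median(durations.values()) if durations else None
```

Reporting the unsuccessful sessions alongside the median matters: a change that makes some users give up entirely would otherwise look like an improvement.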

2.b Measure Mean Average Precision (see http://en.wikipedia.org/wiki/Information_retrieval#Mean_average_precision for a quick definition, and http://fastml.com/what-you-wanted-to-know-about-mean-average-precision/ for a nice simple explanation of what MAP really is).
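As a sketch of the definition in the Wikipedia article linked above: average precision for one query is the sum of precision@k over every rank k that holds a relevant result, divided by the total number of relevant documents; MAP is the mean of that over all queries. The function names and the `(ranked_results, relevant_ids)` input shape are assumptions for illustration.

```python
def average_precision(ranked, relevant):
    """AP for one query: mean of precision@k at each rank k holding a relevant result.

    Divides by the total number of relevant documents, so relevant documents that were
    never retrieved pull the score down.
    """
    relevant = set(relevant)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for k, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / k  # precision@k at this relevant position
    return precision_sum / len(relevant)


def mean_average_precision(runs):
    """MAP over a list of (ranked_results, relevant_ids) pairs, one pair per query."""
    if not runs:
        return 0.0
    return sum(average_precision(ranked, rel) for ranked, rel in runs) / len(runs)
```

For example, the ranking `["a", "x", "b", "y"]` with relevant set `{"a", "b"}` has hits at ranks 1 and 3, giving AP = (1/1 + 2/3) / 2 = 5/6.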

In the Wikipedia article you’ll find other performance and correctness measures, but these two are all you really need to know 🙂
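For the first kind of test described in this answer (a small fake corpus, indexed and checked with precision/recall tests), the two measures can be sketched as below. The function name is hypothetical, and returning 0.0 for an empty set is a convention chosen here; some evaluations treat precision on an empty result list as undefined instead.

```python
def precision_recall(retrieved, relevant):
    """Set-based precision and recall of one result list against known relevant IDs."""
    retrieved_set = set(retrieved)
    relevant_set = set(relevant)
    true_positives = len(retrieved_set & relevant_set)
    # Convention for this sketch: an empty set yields 0.0 rather than an error.
    precision = true_positives / len(retrieved_set) if retrieved_set else 0.0
    recall = true_positives / len(relevant_set) if relevant_set else 0.0
    return precision, recall
```

Because the fake corpus is constructed by hand, the relevant set for each test query is known exactly, so a test can assert a threshold such as `precision >= 0.8` instead of pinning an exact result order that legitimate ranking tweaks would break.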

What is a good strategy for generating the underlying data for the tests?

I would use a modified version of the second approach:

Generate fake data such that for each request X there is a well-defined, intentionally constructed set of results Y that will be returned.

But instead of querying the database directly, your search engine should be implemented against datasource-specific repository interfaces.

Each repository interface has one implementation that uses the database and one fake implementation that reads the results from a human-readable text file. This way your test data is less dependent on database schema changes.

During testing, the search engine uses the fake repository.
For each test there is a test-specific answer file that can be maintained with a text editor.
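A minimal sketch of that arrangement, assuming a single hypothetical `ProductRepository` interface with one method. `sqlite3` stands in for the real database, and the answer-file format (`keyword: result1, result2`, with `#` comments) is an assumption; any human-readable format you can edit by hand would do. The search engine only ever sees the interface, so it cannot tell which implementation it was given.

```python
import sqlite3
import tempfile
from pathlib import Path
from typing import Protocol


class ProductRepository(Protocol):
    """Hypothetical datasource-specific interface that the search engine depends on."""

    def find_by_keyword(self, keyword: str) -> list[str]: ...


class SqliteProductRepository:
    """Database-backed implementation (sqlite3 stands in for the real database)."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn

    def find_by_keyword(self, keyword: str) -> list[str]:
        rows = self.conn.execute(
            "SELECT name FROM products WHERE name LIKE ? ORDER BY name",
            (f"%{keyword}%",),
        )
        return [name for (name,) in rows]


class FileProductRepository:
    """Fake implementation that reads canned answers from a human-readable text file.

    Assumed (hypothetical) format: one 'keyword: result1, result2' line per keyword;
    blank lines and lines starting with '#' are ignored.
    """

    def __init__(self, path: Path) -> None:
        self.answers: dict[str, list[str]] = {}
        for line in path.read_text(encoding="utf-8").splitlines():
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            keyword, _, results = line.partition(":")
            self.answers[keyword.strip()] = [r.strip() for r in results.split(",") if r.strip()]

    def find_by_keyword(self, keyword: str) -> list[str]:
        return self.answers.get(keyword, [])


class SearchEngine:
    """Search logic written only against the repository interface."""

    def __init__(self, products: ProductRepository) -> None:
        self.products = products

    def search(self, query: str) -> list[str]:
        # Placeholder "business logic": merge per-keyword hits, drop duplicates, cap at 10.
        results: list[str] = []
        for keyword in query.lower().split():
            for name in self.products.find_by_keyword(keyword):
                if name not in results:
                    results.append(name)
        return results[:10]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        answer_file = Path(tmp) / "red_shoes_test.txt"
        answer_file.write_text(
            "# answers for the 'red shoes' test\n"
            "red: red shoes, red hat\n"
            "shoes: red shoes, blue shoes\n",
            encoding="utf-8",
        )
        engine = SearchEngine(FileProductRepository(answer_file))
        print(engine.search("Red Shoes"))  # ['red shoes', 'red hat', 'blue shoes']
```

When the database schema changes, only `SqliteProductRepository` has to follow it. The per-test answer files stay valid as long as the interface itself does not change.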


Filed under: softwareengineering - @ 21:02

Tags: acceptance-testing, functional-testing, search, search-engine, tdd
