What is the best practice for caching paginated search results whose ordering/properties can be changed?
Say, in my application, someone wants to see the last 20 discussion threads (out of 10,000). A request would be sent to the database, via a servlet, to fetch the first 20 records from the discussion threads table as XML/JSON. If they then want to see the next 20, they go to the next page of results, and this fires off another request to get the next lot (limit and offset = 20, etc.).
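For reference, the limit/offset paging described above might look something like the following minimal sketch (Java 9+). The table name `discussion_threads`, the column names, and the `PageQuery` class are hypothetical. The sort column is looked up in a whitelist rather than taken directly from the request, because an `ORDER BY` column cannot be bound as a JDBC parameter.

```java
import java.util.Map;

// Hypothetical helper: table, column and sort-key names are illustrative only.
final class PageQuery {
    // Whitelist of sortable columns: never splice a client-supplied string into ORDER BY.
    private static final Map<String, String> SORT_COLUMNS = Map.of(
            "created", "created_at",
            "author", "author_name",
            "lastPost", "last_post_at");

    /** Zero-based row offset for a one-based page number. */
    static int offset(int pageNumber, int pageSize) {
        if (pageNumber < 1 || pageSize < 1) {
            throw new IllegalArgumentException("pageNumber and pageSize must be >= 1");
        }
        return (pageNumber - 1) * pageSize;
    }

    /** SQL with placeholders; bind pageSize, then offset(...), to LIMIT and OFFSET. */
    static String sql(String sortKey, boolean descending) {
        String column = SORT_COLUMNS.get(sortKey);
        if (column == null) {
            throw new IllegalArgumentException("unsupported sort key: " + sortKey);
        }
        // Tie-break on id so rows with equal sort values keep a stable order across pages.
        return "SELECT id, title, author_name, created_at, last_post_at FROM discussion_threads"
                + " ORDER BY " + column + (descending ? " DESC" : " ASC") + ", id"
                + " LIMIT ? OFFSET ?";
    }
}
```

Page 2 with a page size of 20 binds `LIMIT 20 OFFSET 20`, matching the example in the question.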
In order to reduce server load and client-waiting, I would like to cache the previous pages of results. However, I have two questions:
- The table the results are shown in can be ordered by more than one attribute (e.g., thread-creation-date, thread-author, last-post-date). This means that a statement like ‘first 20 results’ makes no sense without context (i.e., what we are ordering by). How, then, does the front-end communicate to the back-end what it has already loaded? My first thought was to use IDs for each result, but sending these back to the server on subsequent requests (and filtering results based on them) would be just as time-consuming as sending everything back blindly. How can I do this?
- What if an attribute of a previously returned result (e.g., most-recent-post-date) has changed? We then need a way of checking each result to see if it has been modified server-side since it was paged in. How can I do this?
4
It seems what you need is a wrapper for all the parameters that define a page (say, `pageNumber`, `pageSize`, `sortType`, `totalCount`, etc.), and to use this `DataRequest` object as the key for your caching mechanism. From this point you have a number of options to handle the cache:
- Implement some sort of timeout mechanism to refresh the cache (based on how often the data changes).
- Have a listener that checks for database changes and updates the cache based on the above parameters.
- If the changes are done by the same process, you can always mark the cache as outdated with every change and check this flag when a page is requested.
The first two might involve a scheduler mechanism that triggers on some interval or in response to an event. The last one might be the simplest if you have a single data access point.
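A minimal sketch of the `DataRequest`-as-key idea, combining the timeout option and the "mark as outdated" option (Java 16+ for records). The `PageCache` class and the `loader` function, which stands in for whatever DAO call actually runs the query, are assumptions for illustration, and this sketch keys on page number, page size and sort type only.

```java
import java.time.Clock;
import java.time.Duration;
import java.time.Instant;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Function;

// Hypothetical request key: a record gets value-based equals/hashCode for free,
// so two requests for the same page, size and ordering map to the same cache entry.
record DataRequest(int pageNumber, int pageSize, String sortType) {}

// Hypothetical page cache combining the timeout option and the outdated-flag option.
final class PageCache<T> {
    private record Entry<R>(List<R> rows, Instant loadedAt, long version) {}

    private final Map<DataRequest, Entry<T>> entries = new ConcurrentHashMap<>();
    private final AtomicLong version = new AtomicLong();  // bumped on every write
    private final Function<DataRequest, List<T>> loader;  // e.g. the DAO call that runs the query
    private final Duration ttl;
    private final Clock clock;

    PageCache(Function<DataRequest, List<T>> loader, Duration ttl, Clock clock) {
        this.loader = loader;
        this.ttl = ttl;
        this.clock = clock;
    }

    List<T> get(DataRequest request) {
        long current = version.get();  // read before loading so a concurrent write is not missed
        Instant now = clock.instant();
        Entry<T> entry = entries.get(request);
        boolean stale = entry == null
                || entry.version() != current                  // outdated by a write
                || now.isAfter(entry.loadedAt().plus(ttl));    // expired by timeout
        if (stale) {
            entry = new Entry<>(List.copyOf(loader.apply(request)), now, current);
            entries.put(request, entry);
        }
        return entry.rows();
    }

    /** Call from the single write path whenever a thread or post changes. */
    void markOutdated() {
        version.incrementAndGet();
    }
}
```

Because a different `sortType` produces a different key, each ordering is cached independently. The version counter makes invalidation cheap: stale entries are not deleted, they simply fail the version check on their next read.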
Lastly, as @DanPichelman mentioned, this can quickly become an overly complicated algorithm whose cost outweighs the benefits, so be sure the gain in performance justifies the complexity.
0
Just a thought – in your server call, pass in the usual parameters plus an array of MD5 hashes representing the previously viewed pages of data that are currently cached.
The return call would contain all the usual data for the new current page, plus updates for any outdated previously viewed pages. You can use the old hash as the key.
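One way the server side of this could look, as a hedged sketch (Java 17+ for `HexFormat`). It assumes the client sends each cached page's identity (here a hypothetical key such as `"lastPost:2"`) alongside its MD5 hash, since a bare array of hashes does not tell the server which page to recompute. `StalePageCheck` and the `renderPage` function are likewise hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical server-side helper: detects which of the client's cached pages have changed.
final class StalePageCheck {
    /** Lowercase hex MD5 of a page's serialized JSON. */
    static String md5Hex(String pageJson) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            return HexFormat.of().formatHex(md.digest(pageJson.getBytes(StandardCharsets.UTF_8)));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is required on every Java platform", e);
        }
    }

    /**
     * clientHashes: page key -> hash the client holds for that page.
     * renderPage:   produces the current JSON for a page key.
     * Returns old hash -> fresh JSON, only for pages whose content has changed.
     */
    static Map<String, String> outdatedPages(Map<String, String> clientHashes,
                                             Function<String, String> renderPage) {
        Map<String, String> updates = new LinkedHashMap<>();
        clientHashes.forEach((pageKey, oldHash) -> {
            String json = renderPage.apply(pageKey);
            if (!md5Hex(json).equals(oldHash)) {
                updates.put(oldHash, json); // the old hash doubles as the client's cache key
            }
        });
        return updates;
    }
}
```

MD5 is acceptable here because it is only used to detect changes, not for security.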
I’d recommend a lot of performance & timing tests first – your client side code is going to be a lot more complicated than it would be if you simply hit the server for each page of data. Be sure the extra complexity results in a meaningful improvement.
1
I would probably handle it like this:
- Treat different orderings as different sequences altogether. It won’t be worth the extra bookkeeping to track what each client has (or to send it back over and over).
- Whenever the user pages, display immediately from cache while at the same time sending a GET to the server that includes either a hash or a last access time. The server only sends back a full page if something has changed.
- Retrieve from the server more than one UI page at a time. For example, if your UI displays 20 entries, query 60. I need to test this one, but my expectation is that the most efficient return size will usually be larger than the average amount of data shown on one page. This also makes the UI very responsive for some page turns.
- Prefetch results when you are nearing a boundary. This helps preserve those fast load times from cache.
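A rough sketch of the over-fetching and prefetching points, written in Java to match the other examples (in a browser the same logic would live in JavaScript). `PageBuffer` and the `fetch(offset, limit)` callback are hypothetical. With `pageSize = 20` and `pagesPerBlock = 3` it requests 60 rows at a time; the prefetch is synchronous here for brevity, where a real UI would issue it in the background.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;

// Hypothetical client-side buffer: requests whole blocks of rows (several UI pages at once)
// and serves individual UI pages out of those blocks.
final class PageBuffer<T> {
    private final int pageSize;
    private final int blockSize;
    private final BiFunction<Integer, Integer, List<T>> fetch;    // (offset, limit) -> rows
    private final Map<Integer, List<T>> blocks = new HashMap<>(); // block index -> rows

    PageBuffer(int pageSize, int pagesPerBlock, BiFunction<Integer, Integer, List<T>> fetch) {
        this.pageSize = pageSize;
        this.blockSize = pageSize * pagesPerBlock; // e.g. 20 * 3 = 60 rows per request
        this.fetch = fetch;
    }

    /** Returns the rows for a zero-based UI page, fetching its block only if needed. */
    List<T> page(int pageIndex) {
        if (pageIndex < 0) {
            throw new IllegalArgumentException("pageIndex must be >= 0");
        }
        int firstRow = pageIndex * pageSize;
        int blockIndex = firstRow / blockSize;
        List<T> block = loadBlock(blockIndex);
        int from = Math.min(firstRow - blockIndex * blockSize, block.size());
        int to = Math.min(from + pageSize, block.size());

        // Nearing a boundary: on the last UI page of a full block, load the next block early.
        // A short block means the end of the data, so there is nothing more to prefetch.
        boolean lastPageOfBlock = firstRow + pageSize >= (blockIndex + 1) * blockSize;
        if (lastPageOfBlock && block.size() == blockSize) {
            loadBlock(blockIndex + 1);
        }
        return List.copyOf(block.subList(from, to));
    }

    private List<T> loadBlock(int blockIndex) {
        return blocks.computeIfAbsent(blockIndex, i -> fetch.apply(i * blockSize, blockSize));
    }
}
```

Turning through the first three UI pages costs one request, and the prefetch triggered on the third page means the fourth is also served from cache.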