It is pretty common for a web application to display a list of items and for each item in the list to indicate to the current user whether they have already viewed the associated item.
An approach that I have taken in the past is to store HasViewed objects that contain the Id of a viewed item/object and the Id of the User who has viewed that item/object.
When it comes time to display a list of items this requires querying the database for the items, and separately querying the database for the HasViewed objects, and then combining the results of these queries into a set of objects constructed solely for the purpose of displaying them in the view.
Each list element (e.g. an li) then reads a property such as has_viewed from the objects constructed above.
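To make the current approach concrete, here is a minimal sketch of that combining step in Python. The class and function names (Item, HasViewed, ItemRow, build_rows) are made up for illustration and the two lists stand in for the results of the two database queries.

```python
from dataclasses import dataclass


@dataclass
class Item:
    id: int
    title: str


@dataclass
class HasViewed:
    user_id: int
    item_id: int


@dataclass
class ItemRow:
    """View-only object: an item plus the flag the template reads."""
    id: int
    title: str
    has_viewed: bool


def build_rows(items, views, user_id):
    # Collect the ids this user has viewed once, so each lookup below is O(1).
    viewed_ids = {v.item_id for v in views if v.user_id == user_id}
    return [ItemRow(i.id, i.title, i.id in viewed_ids) for i in items]
```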
I think it is time to find a better approach and would like to know what approaches you would recommend.
Although this could be answered with a number of database design ideas, there is a problem with the premise of the question as posed. The question as you’ve written it makes it apparent that you are looking to solve this problem on your own, but why reinvent the wheel, and especially, why reinvent it on the server?
Your first sentence says “It is pretty common for a web application … to indicate to the current user whether they have already viewed the associated item.”
This is true (it is common), but I’ve rarely (maybe never) seen simple click-visiting-status achieved through server-side databases. Almost universally it is accomplished using the native browser history, through the :visited CSS pseudo-class.
How to do it, and how REST helps
- The user visits a page
- The browser records that page (URL) visit in history
- Any time a link’s href matches a visit in history, the :visited pseudo-class will apply to that element.
For example:
:visited {
  color: red;
}
will apply a red color to any visited link on a page (unless a conflicting style rule is more specific).
How REST helps
When you use REST URIs appropriately, you are more fully able to utilize the web browser’s native features, including visited status, and other features such as caching.
For example, if your web application uses consistent URIs, then the browser is easily able to recognize whether you’ve viewed an item no matter where its URI appears throughout your application:
- /user/13
- /user/13/friends
- /post/welcome-page
But, if you have poor, inconsistent URI creation, this is problematic. Here are some example URIs that break caching and visited links (based on common real-world patterns):
- /user?id=13
- /user?id=13&nav=profilepic
- /user?id=13&tab=details
- /user?tab=details&id=13
To the viewer, these links may all appear to point to the same page but the browser will have no knowledge of that.
If your application has this problem, it’s probably because the URI has been overloaded in purpose to track things that it’s really not meant to. But, it is probably still easier to fix this than to track “visited” state for every viewer in your database.
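As a rough sketch of what fixing this could look like, the helper below rewrites links so that equivalent URIs produce one identical href. It assumes that nav and tab only affect presentation and do not select a different resource; that set of parameter names and the function name canonical_href are hypothetical and would need to match your own application.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: these query parameters change presentation only,
# not which resource the link identifies.
PRESENTATION_PARAMS = {"nav", "tab"}


def canonical_href(url: str) -> str:
    """Drop presentation-only parameters and sort the rest."""
    parts = urlsplit(url)
    kept = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in PRESENTATION_PARAMS
    )
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment)
    )
```

With this, /user?id=13&tab=details and /user?tab=details&id=13 both come out as /user?id=13, so the browser sees a single URL for history and caching purposes.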
Why use REST and native browser features?
This pattern has many advantages.
- Don’t reinvent the wheel. Visited status already exists in browsers.
- Using appropriate URIs is a server problem. Solve server problems on the server.
- I can’t bold this enough: Scalability (!!!). Viewing those URIs is a distributed problem. Always prefer to solve distributed problems on the client.
The complexity of this feature could absolutely explode if you attempt to solve it on the server. Think about it. If you have N visitors and M distinct resources, then in the worst case you have N × M visitor/resource pairs to store and query. You have to solve the problem again every time N or M (or both) outgrow your previous solution. Any time you add a new type of resource that you didn’t account for, you will need to do more fixing. Your ongoing resource costs will be higher as well.
If you use browser native features you don’t have to do any of this. The problem is already distributed.
- You are using the features of HTTP for the purposes they were designed for. URIs were designed to identify resources. Browsers are good at using URIs to get them, and servers are good at delivering them.
When does a server solution make sense?
Finally, there are a few cases where a server solution might make sense:
:visited support insufficient:
- You need to change content more significantly than just colors, since the capability of :visited has been reduced for the privacy of the user.
- Your users frequently have more complicated browsing behavior than can be served by the browser history, such as frequently changing devices.
Other features:
- “Recently viewed/edited items” or similar — display lists of only visited items.
- “Not viewed yet” or similar — display lists of only unvisited items, such as marking inbox items as read
- “Who has viewed this” — Tracking popularity or alerting other users to who has viewed a particular item, like LinkedIn’s profile views feature.
If you do not want to store the visited state on the server, and the user does not have an account with your application, but you still need to know whether they have already been on a page or viewed an item, then what comes to my mind is a cookie. You can store a list of the ids of the items the user has visited in a cookie, and check it each time you load an item to see whether it has already been visited.
Keep in mind that if you have many items that a user can visit, the cookie may grow too big (browsers typically cap a single cookie at around 4 KB). Also, the user may delete it at any time without you noticing or being able to do anything about it.
A plus with this solution, if you are mainly after visual effects or other front-end functionality, is that you can do it on the client with JavaScript, with no need to modify your server-side code.
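The same bookkeeping works in client-side JavaScript, but here is a minimal sketch of it in Python using the standard library's http.cookies, in case you would rather read the cookie on the server. The cookie name viewed_items, the dot separator, and the cap of 200 ids are all assumptions chosen for illustration; the cap is there only to keep the cookie from growing without bound.

```python
from http.cookies import SimpleCookie

COOKIE_NAME = "viewed_items"  # hypothetical cookie name
MAX_IDS = 200                 # arbitrary cap to keep the cookie small


def read_viewed(cookie_header: str) -> list[int]:
    """Parse the viewed item ids out of a Cookie header value."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    morsel = cookie.get(COOKIE_NAME)
    if morsel is None or not morsel.value:
        return []
    # Ignore anything that is not a plain decimal id.
    return [int(p) for p in morsel.value.split(".") if p.isdecimal()]


def mark_viewed(cookie_header: str, item_id: int) -> str:
    """Return a Set-Cookie value with item_id recorded as most recent."""
    ids = [i for i in read_viewed(cookie_header) if i != item_id]
    ids.append(item_id)
    ids = ids[-MAX_IDS:]  # drop the oldest ids once the cap is reached
    cookie = SimpleCookie()
    cookie[COOKIE_NAME] = ".".join(str(i) for i in ids)
    cookie[COOKIE_NAME]["path"] = "/"
    return cookie[COOKIE_NAME].OutputString()
```

Since the user can clear or edit the cookie, treat whatever read_viewed returns as a hint for display purposes only.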
“When it comes time to display a list of items this requires querying the database for the items, and separately querying the database for the HasViewed objects, and then combining the results of these queries into a set of objects constructed solely for the purpose of displaying them in the view.”
What type of items are those? Aren’t they already stored in the database?
If they are not already in the database, then of course you can use @NickC’s solution.
If they are already coming from the database, then I guess you have to fire another query to figure out which ones have been read. At that point it is a matter of database design and of caching the repeated queries.
The improvement to make here is in the database structure, because that extra query can’t be avoided. All that can be done is proper indexing and caching.
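As a minimal sketch of what proper indexing could look like, assume a relational database (SQLite here, purely for illustration) with hypothetical items and has_viewed tables. The composite primary key on (user_id, item_id) doubles as the index for the per-user lookup, and a LEFT JOIN is one way to fetch the flag together with the items rather than merging two result sets in application code. The table, column, and function names are assumptions and are not taken from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE items (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE has_viewed (
    user_id INTEGER NOT NULL,
    item_id INTEGER NOT NULL REFERENCES items(id),
    PRIMARY KEY (user_id, item_id)  -- also serves as the lookup index
);
INSERT INTO items (id, title) VALUES (1, 'first'), (2, 'second'), (3, 'third');
INSERT INTO has_viewed (user_id, item_id) VALUES (42, 2);
""")


def items_with_viewed_flag(conn, user_id):
    """Return (id, title, has_viewed) for every item, for one user."""
    rows = conn.execute(
        """
        SELECT i.id, i.title, hv.item_id IS NOT NULL AS has_viewed
        FROM items AS i
        LEFT JOIN has_viewed AS hv
          ON hv.item_id = i.id AND hv.user_id = ?
        ORDER BY i.id
        """,
        (user_id,),
    ).fetchall()
    # SQLite returns the boolean expression as 0/1; convert for the view.
    return [(item_id, title, bool(flag)) for item_id, title, flag in rows]
```

Whether the join actually beats two separate queries depends on your data and database, so it is worth measuring before committing to either.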