I’m building an ASP.NET MVC IMDb-like website which presents various information about movies to users, relying on 3rd-party APIs to (initially) fetch movie info.
The IDs representing each movie on the website are identical to IMDb’s (ID tt1234567 points to the same movie on both websites).
Movie info is fetched (and subsequently stored in SQL Server) as follows:
- The user requests the movie with ID tt1234567.
- MVC checks whether the movie with ID tt1234567 exists in the database.
  - If not, it queries external APIs to fetch the movie information and stores it in the database.
  - If yes, it fetches the information directly from the database.
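A minimal sketch of the lookup flow above, written in Python for brevity rather than C#. Assumptions: an in-memory sqlite3 database stands in for the SQL Server Movie table, the column names are illustrative, and `fetch_from_api` is a hypothetical callable wrapping whichever third-party API is used.

```python
import sqlite3
from typing import Callable, Optional

# Hypothetical fetcher signature: takes an IMDb-style ID, returns a dict or None.
MovieFetcher = Callable[[str], Optional[dict]]


def create_schema(conn: sqlite3.Connection) -> None:
    # Stand-in for the SQL Server Movie table; columns are illustrative.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS Movie ("
        "id TEXT PRIMARY KEY, title TEXT, rating REAL, votes INTEGER)"
    )


def get_movie(conn: sqlite3.Connection, movie_id: str,
              fetch_from_api: MovieFetcher) -> Optional[dict]:
    row = conn.execute(
        "SELECT id, title, rating, votes FROM Movie WHERE id = ?", (movie_id,)
    ).fetchone()
    if row is not None:
        # Found: serve the stored copy without calling the external API.
        return {"id": row[0], "title": row[1], "rating": row[2], "votes": row[3]}

    # Not found: fetch from the external API (hypothetical callable)...
    info = fetch_from_api(movie_id)
    if info is None:
        return None  # unknown ID; nothing to store
    # ...and store it so later requests are served from the database.
    conn.execute(
        "INSERT INTO Movie (id, title, rating, votes) VALUES (?, ?, ?, ?)",
        (movie_id, info["title"], info["rating"], info["votes"]),
    )
    conn.commit()
    return {"id": movie_id, **info}
```

Calling `get_movie(conn, "tt1234567", fetch_from_api)` twice should hit the external API only once; the second call is answered from the table.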
This works fine, and I’m now at the point of implementing caching. Some columns in each row of the Movie table change every few days (e.g. the overall IMDb rating and the number of votes for that movie). As such, I need to implement a function that updates the relevant columns of a row after a certain amount of time (e.g. every 10 days).
What would be the correct approach to do this? Should I implement a scheduled task which runs every 10 days and updates the needed columns in the DB, or should I somehow add caching into the mix?
I think Ewan’s advice is sound. Ten days is a long period, especially once your database grows to tens of thousands of movies or more. Things to consider when implementing the solution:
- If your web app sees less traffic during some period of the day (based on your logs), try to use that window for asynchronous operations like this one. This is particularly useful when the volume of inserts or updates is high and tables stay locked for longer.
- A synchronous check is also a solution, but pay attention to the extra delay for the client (the time to fetch data from IMDb, parse it, and update the database). Based on the discussion here, though, it should not be a problem.
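One way to follow the off-peak advice is to have the background job check the clock before doing heavy work. This is a sketch only; the window boundaries passed in are hypothetical values you would derive from your own logs.

```python
from datetime import datetime, time


def in_off_peak_window(now: datetime, start: time, end: time) -> bool:
    """True if now's time of day is in [start, end); supports windows that wrap past midnight."""
    t = now.time()
    if start <= end:
        # Same-day window, e.g. 02:00 to 05:00 (empty when start == end).
        return start <= t < end
    # Window crosses midnight, e.g. 23:00 to 05:00.
    return t >= start or t < end
```

A job could then skip its run unless `in_off_peak_window(datetime.now(), time(2), time(5))` is true.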
I would favor the async option, since it gives you better control over how often you query IMDb (I expect IMDb imposes some limits to prevent abuse). Theoretically, if someone or a bot is crawling your site and you use the sync option, you might end up triggering lots of fetches from IMDb.
If you go the task route, write the task to run every 10 minutes, select the movies that are out of date, and update them. Otherwise you will end up with a massive task every ten days.
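A sketch of that every-10-minutes task, again in Python with sqlite3 standing in for SQL Server. It assumes a hypothetical `last_updated` column on the Movie table and a hypothetical `fetch_ratings` callable returning `(rating, votes)`; `BATCH_SIZE` is an illustrative cap, not a recommended value. In the ASP.NET app, whatever scheduler you choose would invoke this periodically.

```python
import sqlite3
from datetime import datetime, timedelta
from typing import Callable, Tuple

STALE_AFTER = timedelta(days=10)  # refresh interval from the question
BATCH_SIZE = 50                   # illustrative per-run cap


def refresh_stale_movies(conn: sqlite3.Connection,
                         fetch_ratings: Callable[[str], Tuple[float, int]],
                         now: datetime,
                         batch_size: int = BATCH_SIZE) -> int:
    """Refresh up to batch_size out-of-date movies; return how many were updated."""
    # Timestamps are stored as ISO-8601 text in one fixed format,
    # so string comparison matches chronological order.
    cutoff = (now - STALE_AFTER).isoformat(timespec="seconds")
    rows = conn.execute(
        "SELECT id FROM Movie WHERE last_updated < ? "
        "ORDER BY last_updated LIMIT ?",
        (cutoff, batch_size),
    ).fetchall()
    for (movie_id,) in rows:
        rating, votes = fetch_ratings(movie_id)  # hypothetical external call
        conn.execute(
            "UPDATE Movie SET rating = ?, votes = ?, last_updated = ? WHERE id = ?",
            (rating, votes, now.isoformat(timespec="seconds"), movie_id),
        )
    conn.commit()
    return len(rows)
```

Ordering by `last_updated` refreshes the oldest rows first, and the `LIMIT` spreads the work (and the external API calls) across many small runs instead of one large run every ten days.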
But I would suggest simply running this check when a movie is retrieved from the database instead: if it’s not found OR is out of date, go to IMDb and update it.
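The on-read alternative can be sketched as follows, with a plain dict standing in for the database and `fetch_from_imdb` as a hypothetical callable; the 10-day threshold comes from the question.

```python
from datetime import datetime, timedelta
from typing import Callable, Dict

STALE_AFTER = timedelta(days=10)


def get_movie_fresh(store: Dict[str, dict], movie_id: str,
                    fetch_from_imdb: Callable[[str], dict],
                    now: datetime) -> dict:
    """Return the stored movie, fetching it first if missing or older than STALE_AFTER."""
    movie = store.get(movie_id)
    if movie is None or now - movie["last_updated"] > STALE_AFTER:
        # Not found OR out of date: go to the external source and update.
        info = fetch_from_imdb(movie_id)
        movie = {**info, "last_updated": now}
        store[movie_id] = movie
    return movie
```

Because the refresh happens inside the request, the first reader after the threshold pays the fetch latency, and a crawler touching many stale pages would trigger one external fetch per page; that is the trade-off raised above about the sync option.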
Yes, a scheduled task could be one way of retrieving and updating your data.
You should consider setting up a Windows service; Quartz.NET and Hangfire.io are other alternatives for schedule-based job execution.