I’m starting to learn about caching and have run into the following problem. I can only see one way to solve it, but I’m sure I’m missing something.
Let’s imagine that I have two endpoints, one for fetching a car struct and one for deleting a car struct. They look like so:
localhost:2000/car/{id} (GET)
localhost:2000/car/{id} (DELETE)
I want to imagine a scenario where I am using lazy loading to populate the cache, and the cache is currently empty. The code for the endpoints could look something like this:
getCar(id):
    car = getCarFromCache(id)
    if car == null {
        car = getCarFromDB(id)
        updateCacheWithCar(id, car)
    }
    return car

deleteCar(id):
    deleteCarFromDB(id)
    deleteCarFromCache(id)
Let’s imagine that our database has one car in it, with an ID of 100. Now imagine that two users simultaneously hit the GET and DELETE endpoints, and the scheduler interleaves the threads like so:
- getCar(100) is called, and a cache miss occurs
- the car with id 100 is fetched from the database
- deleteCar(100) is called, and deletes the car from the database
- deleteCarFromCache(100) runs, and removes the entry from the cache
- updateCacheWithCar(100, car) runs, and the cache now contains a car with ID 100, despite the car having been deleted
Subsequent requests to getCar(100) will return the car, even though it has been deleted!
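The interleaving above can be reproduced deterministically with an in-memory sketch: maps stand in for the cache and the database, and channels force the problematic ordering. The names and data here mirror the pseudocode and are purely illustrative, not a real caching API.

```go
package main

import "fmt"

// simulateRace forces the exact interleaving from the walkthrough and
// reports whether the car ends up in the DB and in the cache.
func simulateRace() (inDB, inCache bool) {
	db := map[int]string{100: "Toyota"} // hypothetical seed data
	cache := map[int]string{}

	fetched := make(chan string) // car read from the DB, not yet cached
	deleted := make(chan bool)   // signals that the delete path finished
	done := make(chan bool)

	// getCar(100): cache miss, read the DB, then stall before caching.
	go func() {
		car := db[100] // step 2: fetch from the database
		fetched <- car
		<-deleted        // scheduler runs the delete path first...
		cache[100] = car // step 5: stale write into the cache
		done <- true
	}()

	<-fetched
	delete(db, 100)    // step 3: delete from the database
	delete(cache, 100) // step 4: delete from the cache (already empty)
	deleted <- true
	<-done

	_, inDB = db[100]
	_, inCache = cache[100]
	return inDB, inCache
}

func main() {
	inDB, inCache := simulateRace()
	fmt.Printf("in DB: %v, in cache: %v\n", inDB, inCache)
	// The cache claims the car exists even though the DB row is gone.
}
```

Running this prints `in DB: false, in cache: true`: the cache and the database disagree, exactly as described.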
Is the only way to avoid something like this to wrap BOTH the DB call and the cache update in a mutex? Or is there a better way to go about it? Any information about caching and invalidation is really appreciated, as I’m new to it.
To specifically handle this issue, I would probably:
- Store the ‘deleted’ state in the cache just like the undeleted state.
- When you do a GET request and find there’s no existing cache entry, make sure that at the time of storing you only update the cache if there’s not already an entry. Most cache systems support this type of operation.
You can do the same for your PUT operations, which has the added advantage that you’re warming the cache after a change.
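A minimal in-memory sketch of these two ideas combined: deletion writes a tombstone (the ‘deleted’ state), and the cache write is add-only-if-absent, in the spirit of memcached’s `add` or Redis’s `SET ... NX`. The `Cache` type here is illustrative, not a real client library.

```go
package main

import (
	"fmt"
	"sync"
)

// Cache supports an add-if-absent write plus tombstones for deletions,
// so a slow reader can't resurrect an entry that was just deleted.
// In a real system the tombstone would also carry a TTL.
type Cache struct {
	mu   sync.Mutex
	data map[int]*string // nil value = tombstone ("known deleted")
}

func NewCache() *Cache { return &Cache{data: map[int]*string{}} }

// Add stores the value only if the key has no entry at all (neither a
// live value nor a tombstone). Returns true if the value was written.
func (c *Cache) Add(id int, car string) bool {
	c.mu.Lock()
	defer c.mu.Unlock()
	if _, exists := c.data[id]; exists {
		return false
	}
	c.data[id] = &car
	return true
}

// MarkDeleted overwrites any existing entry with a tombstone.
func (c *Cache) MarkDeleted(id int) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.data[id] = nil
}

func main() {
	c := NewCache()
	c.MarkDeleted(100)         // the delete path ran first
	ok := c.Add(100, "Toyota") // the late updateCacheWithCar is a no-op
	fmt.Println(ok)            // false: the stale write was rejected
}
```

Note that the tombstone matters: if the delete path merely removed the key, the late add-if-absent would find the slot empty and succeed, recreating the stale entry.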
Another option is to store the cached copy in the database as well, so you can use a transaction to handle these kinds of conflicts. Of course, this only makes sense if your reason for caching is that the computation is expensive; if your goal is to reduce database load, it makes less sense.