Recently I was asked by our technical leader to populate a cache.
The request concerns a back-end (BE) system that operates on a database through a client library. The BE is a Java Spring Boot application, and the client library carries out its operations while internally maintaining a Caffeine cache. The system can take a few hundred milliseconds to answer client requests, partly because of the high complexity of the checks it performs and partly because of a slow, single-instance database on which I cannot operate.
The request to populate the cache was justified as a way to avoid a set of requests to the BE system, preventing slowdowns the client library could suffer. The client library is in turn used by a third BE to provide its service.
Here is a synthetic version of the schema: 3rd BE <=> library (cache here) <=> BE 1 <=> DB.
One important detail about BE 1 is that the DB is responsible for storing entities and generating their keys, which are incremental integers. The strategy he suggested is to update the library so that an entity can be added directly to the cache; subsequent requests then hit the cache and BE 1 is never queried. This, in turn, means the entity is not persisted to the database at that point (it should be persisted later).
To me, this request is absurd.
It presents problems on several fronts:
- concurrency issues,
- cache clearing problems,
- lack of persistence,
- high programming costs,
- high maintenance costs.
One strategy to achieve this is to query the database to predict the ID the entity will get when persisted, add the entity to the cache under that ID, and then persist it to the database. When the entity is actually persisted, I would have to check whether the ID has already been taken by another concurrent write and that the two do not conflict. If the ID has been taken by another entity, I would have to clear and re-populate the cache.
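To make the objection concrete, the predicted-ID strategy could be sketched roughly as below. This is a minimal illustrative sketch, not the real code: all names (`PredictedIdCache`, `Entity`, `dbSequence`) are hypothetical, a `ConcurrentHashMap` stands in for the Caffeine cache, and an `AtomicLong` stands in for the database sequence.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of the suggested "predict the ID" strategy.
public class PredictedIdCache {
    record Entity(long id, String payload) {}

    private final Map<Long, Entity> cache = new ConcurrentHashMap<>();
    // Stand-in for "SELECT MAX(id) + 1" against the real database.
    private final AtomicLong dbSequence = new AtomicLong(0);

    long predictNextId() {
        // Race window: two concurrent callers can observe the same "next" id here.
        return dbSequence.get() + 1;
    }

    Entity cacheBeforePersist(String payload) {
        Entity e = new Entity(predictNextId(), payload);
        cache.put(e.id(), e); // visible to readers before any write happens
        return e;
    }

    boolean persist(Entity e) {
        // The real DB generates the key; it may differ from the predicted one.
        long actualId = dbSequence.incrementAndGet();
        if (actualId != e.id()) {
            cache.clear(); // predicted id was wrong: flush and re-populate
            return false;
        }
        return true;
    }
}
```

Two overlapping `cacheBeforePersist` calls predict the same ID, so the second `persist` fails and wipes the cache, which is exactly the concurrency and cache-clearing cost listed above.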
The question is, are there alternatives to this (anti-)pattern?
Your concerns 1, 2, and 3 are solved by UUIDs or by a sharded identifier (where part of the identifier is either globally unique or identifies a library instance). It would take some time to implement, but maintenance would not be a concern.
In general, a write cache is a well-explored topic and should not be conceptually hard (just a tad harder than a read cache).
One point to take care of is the loss of atomic consistency of data across multiple instances of the library. You have already solved that somehow with your read-cache invalidation procedure, though that invalidation may require an overhaul.
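The UUID suggestion can be sketched as follows. Again a hedged, hypothetical example (names are invented, `ConcurrentHashMap` stands in for the real cache): because the key is generated on the client side rather than by the database, there is no ID to predict and no conflict to detect.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: client-generated UUID keys remove the id-prediction race.
public class UuidKeyedCache {
    record Entity(UUID id, String payload) {}

    private final Map<UUID, Entity> cache = new ConcurrentHashMap<>();

    Entity createAndCache(String payload) {
        // The key is generated here, not by the database, so two concurrent
        // writers can never claim the same id.
        Entity e = new Entity(UUID.randomUUID(), payload);
        cache.put(e.id(), e);
        return e;
    }

    Entity get(UUID id) {
        return cache.get(id);
    }
}
```

The entity can be cached before persistence and written to the database at any later time under the same key, with no re-population step. The trade-off is migrating the schema away from incremental integer keys.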
Let me rephrase the description to support the answer: there is a database and three back ends, where the third back end just exposes the database, and the main concern is concurrency when persisting under high throughput.
To avoid concurrency problems, the entities have to be identifiable by a combination of their properties, excluding the randomly generated identifier assigned when the entity is persisted. Keyed by that combination of properties, an entity can be placed in the cache at any point: caching after writing, before writing, or after reading all behave the same, so when to cache becomes purely a question of saving database interactions. For example, cache in the second back end, in a cache shared by reads and writes, before passing the entity to the third back end to persist, and cache in the third back end after reads. To be effective, both caches, the one in the second back end and the one in the third back end, have to support expiring cached entities.
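A minimal sketch of this natural-key idea, with all names (`NaturalKeyCache`, `customer`, `reference`) hypothetical and a `ConcurrentHashMap` standing in for the real cache: the cache key is built from business properties only, so it is the same before and after the database assigns an ID.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: cache keyed by the entity's natural key (a combination
// of business properties) instead of the database-generated id.
public class NaturalKeyCache {
    // Any property combination that uniquely identifies the entity.
    record NaturalKey(String customer, String reference) {}
    // dbId is null until the database has persisted the entity.
    record Entity(NaturalKey key, Long dbId, String payload) {}

    private final Map<NaturalKey, Entity> cache = new ConcurrentHashMap<>();

    // Because the key does not depend on the generated id, the entry can be
    // cached before, during, or after persistence under the same key.
    void put(Entity e) {
        cache.put(e.key(), e);
    }

    Entity get(NaturalKey k) {
        return cache.get(k);
    }
}
```

Caching before persistence and then re-putting the entity once the database has assigned its ID hits the same cache entry, so no invalidation is triggered by the write itself; expiration still has to handle staleness across instances.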