I am a young engineer recently hired at a small company that sells products to the general public. We use Ruby on Rails and MySQL. Our database holds a lot of customer data, but a great deal more “static” persistent data. This data changes so rarely that some of the more senior engineers have begun to talk about moving it to a cache data store. One of the touted benefits of this plan is that it would make deployments easier, since we would no longer have to bootstrap the database; instead, a lot of data would have permanent IDs, and we could just back up the cache and deploy the backup.
My question is about the operational aspects of running the site once the change is made, not the change itself. We have persistent data that is used in database queries in response to user activity, and it is used on a daily basis. Putting it in a cache would remove our ability to run SQL queries against it; we would have to query the cache from RoR code instead. I don’t have enough experience to judge whether this is a good plan, but it smells funny to me.
Has anyone ever seen a situation similar to this, or have experience in this area? I’m looking for reassurance that this is not a crazy idea.
Databases are meant to store data that is rarely changed. That’s the whole point.
Caches are meant to store data that is frequently used, to make access to it faster than the database would provide on its own.
Unless the data to be cached is very frequently used, or your company has a lot of money that it needs to waste, it makes little sense to cache it.
Horrible idea. I have no problem with caching the data and updating the cache when the database changes, but if you remove the data from the database you are likely to lose data integrity, and you make it impossible for things that do not go through the application, such as reporting queries or data imports, to find the information they need. Further, ORMs by and large do not generate efficient SQL, and you might later need to write some complex code in a stored proc in order to performance-tune it. You would be removing the ability to do that effectively.
If someone suggested removing the lookup values from my database, I would be most unhappy as it would make it impossible to do my job and impossible to keep the integrity of my data.
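To make the "cache it, but keep it in the database" option concrete, here is a minimal read-through sketch in plain Ruby. Everything in it is an assumption for illustration: `FakeDatabase` stands in for MySQL, `LookupCache` is a hypothetical wrapper, and the shipping-method rows are made-up lookup values. The database stays the source of truth, and the cache entry is dropped whenever the row is updated.

```ruby
# Hypothetical stand-in for the MySQL lookup table; not a real library.
class FakeDatabase
  attr_reader :reads

  def initialize(rows)
    @rows = rows   # { id => name }
    @reads = 0     # counts how often the "database" is actually queried
  end

  def find(id)
    @reads += 1
    @rows[id]
  end

  def update(id, name)
    @rows[id] = name
  end
end

# Hypothetical read-through cache sitting in front of the database.
class LookupCache
  def initialize(db)
    @db = db
    @store = {}
  end

  # Serve from memory when possible; otherwise read the database
  # and remember the result for next time.
  def fetch(id)
    return @store[id] if @store.key?(id)
    @store[id] = @db.find(id)
  end

  # Write to the database first, then drop the stale cache entry.
  def update(id, name)
    @db.update(id, name)
    @store.delete(id)
  end
end

db = FakeDatabase.new({ 1 => "Standard shipping", 2 => "Express shipping" })
cache = LookupCache.new(db)

cache.fetch(1)                       # miss: reads the database
cache.fetch(1)                       # hit: served from memory
cache.update(1, "Economy shipping")  # database changes, cache entry dropped
puts cache.fetch(1)                  # miss again, picks up the new value
puts db.reads                        # number of real database reads
```

With this shape, reporting queries and imports can still go straight to the database, while the application avoids repeated reads of rarely changing rows.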
I would agree that this does have a strange smell to it. But as the new guy it’s important that you really dig into the deployment process and understand the pain points before you argue against changes proposed by more senior team members, or you risk alienating yourself from your peers.
One question worth asking is why the MySQL deployment is such a burden. I’m not sure why a production database needs regular re-deployments. You should only need to deploy it once and populate it with data. From that day on it should be straightforward to maintain and improve your DB with non-destructive schema modifications and data migrations against a live installation, so your data can live forever.
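As a rough illustration of what "non-destructive data migrations against a live installation" can look like, here is a plain Ruby sketch, not actual Rails migration code. The `MIGRATIONS` table, the `migrate!` helper, and the shipping-method rows are all hypothetical; the idea of recording which versions have already been applied is similar in spirit to how Rails tracks migrations. Each step only adds what is missing, so running it again against existing data changes nothing.

```ruby
# Hypothetical versioned migrations: each one only adds missing data
# and never overwrites or deletes what is already there.
MIGRATIONS = {
  1 => ->(data) { data[:shipping_methods] ||= { 1 => "Standard" } },
  2 => ->(data) { data[:shipping_methods][2] ||= "Express" }
}.freeze

# Apply every migration that has not been recorded yet, in version order.
# `applied` plays the role of a table of already-applied versions.
def migrate!(data, applied)
  MIGRATIONS.keys.sort.each do |version|
    next if applied.include?(version)
    MIGRATIONS[version].call(data)
    applied << version
  end
  data
end

data = { customers: { 42 => "Ada" } }  # stands in for live customer data
applied = []

migrate!(data, applied)
migrate!(data, applied)  # second run is a no-op

p data
p applied
```

The customer rows are never touched, which is the property that lets the same database live on across deployments instead of being bootstrapped each time.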
You’ll also want to come from the other side and understand the caching solution being proposed, what the deployment overhead for that will be, and what practical functionality you will lose by switching (if any). For example, it’s straightforward for clients to integrate analytics tools against a DB with an SQL interface which is decoupled from your application logic. Also, as a developer you will likely find that this decoupling gives you more flexibility to fix problems and improve your app without rolling a new code release.
Another question to ask is how your cache will be populated and how much data will go in it. Caching implies non-persistent storage, which means you’ll still need to populate it on each deployment. It also means you’ll need to have enough RAM in your production environment to store all of this data. And anyway, caches are typically backed by some sort of persistent storage, such as a MySQL DB…another reason why this smells funny.
As with many things, it depends.
Moving data that changes rarely to a different place may make sense if the data is getting in the way. If it’s not getting in the way, why would you do it?
From your description, I see that some people think the data gets in the way, and you think that moving it out makes your life harder. We can’t really weigh these against each other; only you and your fellow engineers can.