I built an application with Spring Boot and Spring Data JPA.
I am now trying to figure out what I need to do myself so that my application can recover from a complete or partial infrastructure failure.
My thoughts are purely academic at the moment, and a scenario like the ones I am about to describe should not occur. I just want to know what measures I should take to make my application more robust as development continues.
My application manages employee data and orchestrates various background management tasks.
What happens if the database dies or becomes unavailable? Maybe a really expensive query is running and blocking all of the database's resources. Maybe the network goes down due to a temporary blackout. Maybe the database crashes.
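One measure I already suspect I need for the expensive-query and network cases is a timeout, so that a request thread does not wait forever on a hung database call. JPA has a standard query timeout hint for this (`javax.persistence.query.timeout`, or `jakarta.persistence.query.timeout` in Jakarta Persistence 3), and the connection pool has its own timeouts. As a library-free sketch of the same idea, with hypothetical names (`Deadlines`, `DeadlineExceededException`) and assuming the database call is wrapped in a `Callable`:

```java
import java.time.Duration;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical unchecked exception signalling that a database call took too long.
class DeadlineExceededException extends RuntimeException {
    DeadlineExceededException(String message, Throwable cause) {
        super(message, cause);
    }
}

final class Deadlines {
    private Deadlines() {}

    // Runs a (hypothetical) database call on a worker thread and stops waiting after `timeout`.
    // cancel(true) only interrupts the worker; a blocked JDBC call may ignore the interrupt,
    // so a driver-side query or socket timeout is still needed to free the connection.
    static <T> T callWithTimeout(ExecutorService executor, Callable<T> dbCall, Duration timeout) {
        Future<T> future = executor.submit(dbCall);
        try {
            return future.get(timeout.toMillis(), TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true);
            throw new DeadlineExceededException("database call exceeded " + timeout, e);
        } catch (ExecutionException e) {
            // Rethrow an unchecked original failure (e.g. a PersistenceException) unchanged.
            Throwable cause = e.getCause();
            if (cause instanceof RuntimeException) throw (RuntimeException) cause;
            throw new IllegalStateException("database call failed", cause);
        } catch (InterruptedException e) {
            future.cancel(true);
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for the database", e);
        }
    }
}
```

The wrapper only guarantees that the calling thread gets control back; it does not make the database call itself stop.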
What does JPA do in these cases? Does it retry automatically? Does it go into an offline mode, caching all changes locally and writing them to the database when it becomes available again?
Does it throw an exception and discard the data?
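If nothing like that offline mode exists, I assume I would have to build it myself. To make the question concrete, this is the kind of write-behind buffer I mean: a minimal sketch with hypothetical names (`WriteBuffer`, `DatabaseUnavailableException`), assuming each write is an independent command that can be replayed later in its original order.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Consumer;

// Hypothetical signal that the database could not be reached.
class DatabaseUnavailableException extends RuntimeException {
    DatabaseUnavailableException(String message) {
        super(message);
    }
}

// Minimal write-behind buffer: commands that fail because the database is down
// are kept in arrival order and replayed by flush() once it is reachable again.
// Not thread-safe and not durable: anything still queued is lost if the JVM dies.
class WriteBuffer<T> {
    private final Deque<T> pending = new ArrayDeque<>();
    private final Consumer<T> writer; // e.g. a call into a Spring Data repository

    WriteBuffer(Consumer<T> writer) {
        this.writer = writer;
    }

    void submit(T command) {
        pending.addLast(command);
        flush();
    }

    // Writes queued commands in order and stops at the first failure,
    // so a later entity is never written before an earlier one.
    void flush() {
        while (!pending.isEmpty()) {
            try {
                writer.accept(pending.peekFirst());
            } catch (DatabaseUnavailableException e) {
                return; // keep the command at the head and try again later
            }
            pending.removeFirst();
        }
    }

    int pendingCount() {
        return pending.size();
    }
}
```

Because it is in memory only, this would not help in the worst case below, where the backend itself crashes; a durable version would need something like an outbox table or a message queue.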
Is there any mechanism to automatically retry failed writes? I do not really care about failed reads, since I usually don't query the database directly but look entities up through the entity manager. My contract is that I need to create entities in a specific order, so if DB reads fail for a short time (for example, when the virtual server is frozen for a couple of seconds during a backup), the entity manager still has the new entities managed and I can access them.
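For the write-retry question, this is the wrapper I would otherwise write myself. I know Spring Retry offers `@Retryable` for this; the following is only a dependency-free sketch with a hypothetical `Retry.withBackoff` helper. It assumes the retried action is a complete transaction (for example a call into a `@Transactional` service method), because a failed transaction is rolled back and cannot simply continue, and that transient failures can be recognized by a predicate. In a Spring application that predicate might check for `TransientDataAccessException`, but that is an assumption on my part.

```java
import java.util.function.Predicate;
import java.util.function.Supplier;

final class Retry {
    private Retry() {}

    // Runs the action up to maxAttempts times, sleeping with exponential backoff between
    // attempts, but only for failures the predicate classifies as transient.
    // The action should be a whole transaction, not a single statement inside one.
    static <T> T withBackoff(Supplier<T> action, int maxAttempts, long initialDelayMillis,
                             Predicate<RuntimeException> isTransient) {
        if (maxAttempts < 1) throw new IllegalArgumentException("maxAttempts must be >= 1");
        long delay = initialDelayMillis;
        for (int attempt = 1; ; attempt++) {
            try {
                return action.get();
            } catch (RuntimeException e) {
                if (attempt >= maxAttempts || !isTransient.test(e)) throw e;
                sleep(delay);
                delay *= 2; // e.g. 100 ms, 200 ms, 400 ms, ...
            }
        }
    }

    private static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted during retry backoff", e);
        }
    }
}
```

Non-transient failures (a constraint violation, a bug) are rethrown immediately, so only outages are retried.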
Since backups usually only run at night, I don't expect any writes from user activity during backups, and all recurring jobs are scheduled well outside the backup window, so a backup and a scheduled job should never collide. Once we have changed our backup solution, I will integrate with it so that backups of my servers are held until the jobs have run; that way we do not end up with inconsistent data if we ever need to roll back a day or two.
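If the new backup solution can call a hook before and after a backup (an assumption; I don't know yet what it will support), the application side of holding the backup could be a fair semaphore. A sketch with a hypothetical `BackupGate` class:

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Hypothetical coordination point between scheduled jobs and pre-/post-backup hooks.
// Each running job holds one permit; the backup needs all permits, so it waits until
// running jobs finish. The semaphore is fair, so once the backup is waiting, newly
// started jobs queue behind it instead of starving it.
class BackupGate {
    private final int maxConcurrentJobs;
    private final Semaphore permits;

    BackupGate(int maxConcurrentJobs) {
        this.maxConcurrentJobs = maxConcurrentJobs;
        this.permits = new Semaphore(maxConcurrentJobs, true);
    }

    void runJob(Runnable job) {
        permits.acquireUninterruptibly();
        try {
            job.run();
        } finally {
            permits.release();
        }
    }

    // Pre-backup hook: true once no job is running (and none can start); false on timeout.
    boolean beginBackup(long timeout, TimeUnit unit) throws InterruptedException {
        return permits.tryAcquire(maxConcurrentJobs, timeout, unit);
    }

    // Post-backup hook: call only after beginBackup returned true. Unlike a lock,
    // a semaphore may be released by a different thread than the one that acquired it.
    void endBackup() {
        permits.release(maxConcurrentJobs);
    }
}
```

This only waits for jobs that are already running; making the backup wait for jobs that have not started yet would need an extra "today's jobs are done" flag.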
I am going to implement measures to deal with external systems failing, such as our LDAP server going down, our user management locking up, or our message bus getting amnesia. All of this has already happened, and I need to protect against it.
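For the external systems, the pattern I am considering is a circuit breaker (Resilience4j, for example, provides one). A simplified, single-threaded sketch of the idea, with a hypothetical `SimpleCircuitBreaker` class and arbitrarily chosen thresholds:

```java
import java.time.Duration;
import java.time.Instant;
import java.util.function.Supplier;

// Simplified circuit breaker: after `failureThreshold` consecutive failures it opens and
// answers from the fallback for `openDuration`, then lets one trial call through
// (half-open). A successful trial closes it again; a failed one reopens it.
// Not thread-safe; an illustration of the pattern, not a replacement for a library.
class SimpleCircuitBreaker {
    private final int failureThreshold;
    private final Duration openDuration;
    private final Supplier<Instant> clock; // Instant::now in production, fake time in tests
    private int consecutiveFailures = 0;
    private Instant openedAt = null; // null while the circuit is closed

    SimpleCircuitBreaker(int failureThreshold, Duration openDuration, Supplier<Instant> clock) {
        this.failureThreshold = failureThreshold;
        this.openDuration = openDuration;
        this.clock = clock;
    }

    <T> T call(Supplier<T> remoteCall, Supplier<T> fallback) {
        if (openedAt != null && clock.get().isBefore(openedAt.plus(openDuration))) {
            return fallback.get(); // open: fail fast instead of waiting on a dead LDAP server
        }
        try {
            T result = remoteCall.get();
            consecutiveFailures = 0;
            openedAt = null; // closed again
            return result;
        } catch (RuntimeException e) {
            consecutiveFailures++;
            if (openedAt != null || consecutiveFailures >= failureThreshold) {
                openedAt = clock.get(); // (re)open
            }
            return fallback.get();
        }
    }

    // True while open or half-open.
    boolean isOpen() {
        return openedAt != null;
    }
}
```

The fallback could return cached LDAP data or mark a task for later, depending on the system.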
I have not tested anything yet, but I would expect one of two things to happen:
Best case:
Database crashes, queries get cached by Hibernate, backend locks up until the database comes back online.
Worst case:
Database crashes, connection pool crashes, backend crashes.