I have a Java web application that supports multi-tenancy to keep customer data
separate.
Connection pools to each customer database are created at runtime. The
details of each customer shard (jdbc url, credentials, etc) are stored in a shard
“control” database. When the application needs to access a customer shard, it tries
to obtain a JDBC connection from that shard’s connection pool. If that connection
pool has not be established yet, the application looks up the necessary details from
the control schema and creates the pool.
This is a fine setup but it has one major issue that I’m not sure how best to
solve. This application is eventually going to be clustered with each app instance managing its own customer shard connection pools. The problem is what to do when the location of a customer shard changes. This doesn’t happen very often but it does happen. The application has an admin interface for managing the shard details and I could put a function in there that says “rebuild this shard’s connection pool” but you would only be interacting with whichever application instance the load balancer sent you to on that request and all of the other instances would not be aware that you wanted the pool rebuilt.
So here are the ways I can think of to deal with this along with the pros and cons as I see them.
1) Since this scenario is not a frequent occurrence, just forcibly restart all of the application cluster instances when this situation arises and call it a day.
PROS: Requires no further work.
CONS: This is more workaround than solution. That said, restarting all app
instances in a cluster might not be an easy as pushing
a button and then going for a coffee break. Additionally, an application
user with permission to manage customer shards might not be someone who
is allowed to initiate a cluster restart.
2) Before releasing a connection from the connection pool, check the control database to see if the pool’s definition has been changed and if so, reconfigure the pool before releasing the connection to whomever requested it.
PROS: Most straight-forward approach. Application instances are guaranteed to get a connection to the right shard using the freshest information.
CONS: Major performance overhead. You need to hit the control schema and read all of the shard definitions and then compare them to what you’ve currently got loaded for every single connection request in the application.
3) Have a background thread on a timer that looks for shard definition changes and rebuilds the pools when necessary.
PROS: You avoid the performance hit involved with doing the check upon every request for a connection.
CONS: Once a shard definition is changed, there is going to be a period of time in which each application instance has the wrong information. This can have lots of consequences including application exceptions until the shard definition change is picked up. The longer the check interval, the worse this problem is going to be so you’d like it to be as short as possible, but the shorter it is, the bigger the performance hit because you need to load the shard definitions from the control database.
4) Wait for an exception to be thrown while trying to obtain a connection from a shard’s connection pool and then take that opportunity to hit the control schema and see if the shard definition has changed.
PROS: None of the performance problems of options 2 and 3
CONS: Assumes that you’re changing the shard definition because the old location is no longer functioning which may not be true. That alone I think is reason enough to kill this option but I’m listing it here because it came to mind.
5) Create a “shard definition change” topic in messaging infrastructure that all application instances would be subscribed to and reload connection pools in response to those.
PROS: None of the performance issues of options 2 and 3. None of the stale data issues of option 3.
CONS: I don’t have any messaging infrastructure available to me today. It’s coming but nothing I can put a date on yet.
I’d appreciate any thoughts/experience/advice you can share with this problem.