I’ve been toying with the idea of feature toggles for various configuration/access purposes, but have been a little unsure of myself when it comes to toggling something like a downtime or maintenance mode, where users should be unable to access anything during that time.
It seems like such a simple thing to do implementation-wise, but I wanted to get some input from those more experienced than I on the subject.
My first thought was to check a flag in a database (this is a clustered environment, so a file-based flag seems less than ideal in our setup), but a database hit on every request feels wrong. So I moved to something like a timed in-memory cache, but that doesn’t feel right to me either: I don’t want the cache to be so short-lived that it has to refresh from the db too frequently, but I also don’t want to turn on maintenance mode and have users still able to get in before the next refresh occurs. Should I maybe have a cache with a separate forced update trigger as mentioned here?
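For concreteness, here is a minimal sketch of the "timed cache plus forced update trigger" idea. The class name, the `loader` callback (standing in for the database query), and the injectable clock are all hypothetical and only there for illustration. How the forced refresh would reach every node in the cluster is not covered.

```java
import java.util.function.BooleanSupplier;
import java.util.function.LongSupplier;

/** Caches a boolean flag for a fixed TTL, with an explicit way to force a reload. */
class CachedMaintenanceFlag {
    private final BooleanSupplier loader;   // hypothetical: e.g. a query against a settings table
    private final LongSupplier clockMillis; // injectable clock; System::currentTimeMillis in production
    private final long ttlMillis;

    private volatile boolean cachedValue;
    private volatile long expiresAtMillis = Long.MIN_VALUE; // guarantees a load on the first read

    CachedMaintenanceFlag(BooleanSupplier loader, long ttlMillis, LongSupplier clockMillis) {
        this.loader = loader;
        this.ttlMillis = ttlMillis;
        this.clockMillis = clockMillis;
    }

    /** Cheap per-request check; calls the loader only when the cached entry has expired. */
    boolean isMaintenanceMode() {
        if (clockMillis.getAsLong() >= expiresAtMillis) {
            // Concurrent readers that both see an expired entry may each reload once.
            refresh();
        }
        return cachedValue;
    }

    /** Forced update trigger: reloads immediately, regardless of the remaining TTL. */
    synchronized void refresh() {
        cachedValue = loader.getAsBoolean();
        expiresAtMillis = clockMillis.getAsLong() + ttlMillis;
    }
}
```

With something like this, ordinary requests only call `isMaintenanceMode()`, while whatever flips the flag would also call `refresh()` on each node so the change takes effect without waiting for the TTL.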
How have you handled toggling an app-wide maintenance or downtime mode?
0
Two-step switch
One possibility is to use a two-step switch.
Imagine you need to perform a maintenance task during which users should not be able to access the database.
Step 1
You start by changing a flag in a database. This flag is not a maintenance mode flag, but indicates that clients should not use the database any longer since it will soon be put into maintenance mode.
The value of this flag can be cached for, say, five minutes to avoid repeatedly querying the database.
Step 2
Five minutes later, you start the actual maintenance task.
You’re now sure that every client’s cached value has expired, so each client has either already queried the flag and knows that the database is about to go into maintenance mode, or will necessarily query it before running any other query.
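The timing rule behind the two steps can be sketched as follows. This is only an illustration: the class and method names are made up, and the five-minute TTL is just the value from the example above.

```java
import java.time.Duration;
import java.time.Instant;

/** Tracks when the pre-maintenance flag was set and when maintenance may safely begin. */
class TwoStepSwitch {
    private final Duration clientCacheTtl; // how long clients may cache the flag
    private Instant announcedAt;           // null until step 1 has run

    TwoStepSwitch(Duration clientCacheTtl) {
        this.clientCacheTtl = clientCacheTtl;
    }

    /** Step 1: record the moment the pre-maintenance flag was set in the database. */
    void announce(Instant now) {
        announcedAt = now;
    }

    /** Earliest moment at which every value cached before step 1 has expired. */
    Instant earliestMaintenanceStart() {
        if (announcedAt == null) {
            throw new IllegalStateException("pre-maintenance flag has not been set yet");
        }
        return announcedAt.plus(clientCacheTtl);
    }

    /** Step 2 guard: true only once a full TTL has elapsed since step 1. */
    boolean maySafelyStart(Instant now) {
        return announcedAt != null && !now.isBefore(earliestMaintenanceStart());
    }
}
```

In practice you would probably add a small margin on top of the TTL for requests that are already in flight and for clock differences between machines.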
Pros
- You don’t need to query the database too often for the value of the pre-maintenance flag.
- There is no way for the client to start querying the database while maintenance is performed.
Cons
- You need to wait for several minutes (five in my example) before performing the maintenance task. This is OK if this is a scheduled task or something which can be planned, but would be a problem for maintenance tasks which should be performed as soon as possible, such as changes to configuration related to a newly discovered security issue or measures used to prevent an ongoing attack.
Two-way connection
If it is necessary to be able to perform maintenance tasks quickly without additional planning (see Cons above), you may need to implement a two-way connection which makes it possible for you, the service provider, to push a message to your clients, notifying them that the service will be switched to maintenance mode.
Although this sort of communication is becoming more and more popular for web applications now that all major browsers support WebSockets, if I understand correctly, you’re dealing with desktop applications. If the Java ecosystem doesn’t have a similar technology for desktop apps (I’m pretty sure it does), or if for some reason you can’t use it, you may be interested in legacy techniques used in web apps, such as long polling. You can read about those techniques on the Comet page, especially how they affect bandwidth and what side effects they have.
Pros
- You can perform the maintenance task nearly immediately while knowing that the clients are informed that you have switched to maintenance mode.
Cons
- Those techniques may be difficult to implement correctly (for example, how would you handle the case where you sent a message to a client, but the client has neither responded nor dropped the connection?)
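One possible way to deal with the silent-client case is to give every client a deadline for acknowledging the notice and to close the connections that miss it. The sketch below assumes a hypothetical `ClientConnection` abstraction over whatever transport you use (WebSocket, long polling, and so on); none of these names come from a real library. It relies on `CompletableFuture.completeOnTimeout`, so it needs Java 9 or later.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;

class MaintenanceNotifier {
    /** Hypothetical transport abstraction; the future completes when the client acknowledges. */
    interface ClientConnection {
        CompletableFuture<Void> sendMaintenanceNotice();
        void close();
    }

    private final Map<String, ClientConnection> clients = new ConcurrentHashMap<>();

    void register(String clientId, ClientConnection connection) {
        clients.put(clientId, connection);
    }

    /**
     * Pushes the notice to all clients, waits up to the timeout for each acknowledgement,
     * and closes the connections that failed or stayed silent. Returns the ids that were closed.
     */
    List<String> announceMaintenance(long timeout, TimeUnit unit) {
        Map<String, CompletableFuture<Boolean>> acks = new LinkedHashMap<>();
        clients.forEach((id, connection) -> acks.put(id,
                connection.sendMaintenanceNotice()
                        .thenApply(ignored -> true)              // acknowledged in time
                        .completeOnTimeout(false, timeout, unit) // silent client
                        .exceptionally(error -> false)));        // send failed

        List<String> closed = new ArrayList<>();
        acks.forEach((id, ack) -> {
            if (!ack.join()) {
                ClientConnection connection = clients.remove(id);
                if (connection != null) {
                    connection.close();
                }
                closed.add(id);
            }
        });
        return closed;
    }
}
```

Once `announceMaintenance` returns, every remaining client has acknowledged the notice, and the others no longer have an open connection, so maintenance can start.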
1
There is a specific tool in the Java platform for doing this: Java Management Extensions, aka JMX (Oracle best practices).
With JMX, you would set up a ‘managed bean’ (aka MBean) which you can then either poll externally (useful for setting up notifications on performance, resources, problems) or use to poke data back into the application (go into maintenance mode now).
Many servlet containers support JMX administration, Tomcat for example. It’s right there, with no external dependencies on a database, caching, or other approaches. This is what it is meant for: the remote administration of an application (be it standalone Java or a web application within some container).
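To give a rough idea of what such a bean could look like, here is a minimal sketch using only the standard `javax.management` API. The class names and the `com.example.app` object name are hypothetical. It wraps the implementation in `StandardMBean`, which is one way to avoid the `XxxMBean` naming convention, not the only one.

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;
import javax.management.StandardMBean;

class MaintenanceJmx {
    /** Management interface; it must be public. Exposes a boolean attribute named "MaintenanceMode". */
    public interface MaintenanceControl {
        boolean isMaintenanceMode();
        void setMaintenanceMode(boolean enabled);
    }

    /** Holds the flag in memory; request-handling code can read it directly. */
    static class MaintenanceState implements MaintenanceControl {
        private volatile boolean maintenanceMode;

        @Override
        public boolean isMaintenanceMode() {
            return maintenanceMode;
        }

        @Override
        public void setMaintenanceMode(boolean enabled) {
            maintenanceMode = enabled;
        }
    }

    /** Registers the bean with the platform MBean server under a hypothetical object name. */
    static ObjectName register(MaintenanceState state) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        ObjectName name = new ObjectName("com.example.app:type=MaintenanceControl");
        server.registerMBean(new StandardMBean(state, MaintenanceControl.class), name);
        return name;
    }
}
```

After registration, a JMX client such as JConsole can flip the `MaintenanceMode` attribute at runtime, and the application sees the change on the next call to `isMaintenanceMode()`. In a clustered setup each JVM has its own bean, so the attribute would have to be set on every node.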
The reason that I’m so high level with this is that it is a fairly large technology area of its own. It is the right way to handle this type of operation, but this is also akin to saying “for a CRUD app use REST” – the problem domain is very large and a more specific answer can’t really drill down too deeply.
1