Not a specific question as such, so I'll understand any downvotes or negative reactions.
I recently joined a team with an existing codebase which segregates the I/O operations into categories, each having its own thread executor (e.g. one for JDBC operations and another for REST calls) into which CompletableFutures are submitted. This kind of makes sense and cleanly separates the I/O responsibilities, I guess.
However, it's the size of the thread pools which caused me to raise an eyebrow. For instance, the JDBC pool is an executor with 50 threads. When I questioned this and pointed out the idiomatic sizing guidelines provided by Goetz for CPU-bound (#processors + 1) and I/O-bound operations, I was told not to worry: they are just "soft threads".
The JDK is pre-21, therefore virtual threads are not in the picture.
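For concreteness, here is a minimal sketch of the kind of wiring I mean (class, method, and pool names are illustrative, not the actual code):

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Illustrative only: one fixed-size executor per I/O category,
// with blocking work submitted as CompletableFutures.
public class IoExecutors {

    private static final ExecutorService JDBC_POOL = Executors.newFixedThreadPool(50);
    private static final ExecutorService REST_POOL = Executors.newFixedThreadPool(50);

    // A blocking JDBC query farmed out to the "JDBC" pool.
    public static CompletableFuture<String> loadOrder(long orderId) {
        return CompletableFuture.supplyAsync(() -> queryOrder(orderId), JDBC_POOL);
    }

    // A blocking REST call farmed out to the "REST" pool.
    public static CompletableFuture<String> fetchPrice(String order) {
        return CompletableFuture.supplyAsync(() -> httpGetPrice(order), REST_POOL);
    }

    // Hypothetical stand-ins for the real DAO / HTTP-client code.
    private static String queryOrder(long orderId)   { return "order-" + orderId; }
    private static String httpGetPrice(String order) { return "42.00"; }
}
```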
Am I being spun a line of nonsense here?
It just doesn't matter. For all the FUD about having loads of threads, the 'cost' of a single thread is infinitesimal.
The cost of a context switch is also small; let's get a sense of perspective here: a software stack that hops from thread to thread 100 times a second is no problem, CPUs are a lot more capable than these fear stories insinuate. Also, context switches are fundamental: a large part of the cost is paid by evicting CPU cache contents and loading in new context, which you have to do regardless of threading model. If it's all a bunch of pooled NIO-optimized async wundercode, it... still context switches just as much, if not more. CPU caches are small, and trying to serve 1000 incoming requests at the same time means you'll be invalidating them a lot no matter how you care to solve the problem!
There are a few places where you run into actual trouble, and those problems are what pools solve. But the trouble only starts when you have thousands of threads; hence, having ~50 threads per 'concern', with 10 concerns, for a total of 500 threads, is not a problem whatsoever.
The problems that thousands of threads cause:
- If 1000 connections come in all at the same time, your system will now attempt to serve all 1000 jobs roughly simultaneously. If the job at hand fundamentally takes 20ms of work to complete no matter how many threads or optimized code paths you care to throw at it, then it's going to take 20,000ms (20 entire seconds) to get through all 1000 jobs, and in this setup, even if 100% optimal, every job only finishes at around that 20-second mark. However, there is some number beyond which parallelizing more jobs does not meaningfully improve performance (in fact, it likely decreases it). Let's say it's 10. Then it'd be better to pick up 10 of the incoming 1000 jobs, deal with them all and deliver (those callers get a near-instant response: ~200ms, since 20,000ms of work spread over 100 batches is 200ms per batch), then another 10 (400ms), then another, and so on.
- Overload. If the requests are endless, and come in faster than they can be processed, eventually some system somewhere needs to start hanging up, because not doing that means your machine will just crash completely (out-of-memory errors, because the number of open jobs keeps growing forever). A thread pool makes that easy: you have a queue of things left to do, that queue has a hard limit above which the inflow system starts hanging up on requestors immediately, and the thread pool just works through it top to bottom (see the first sketch after this list).
- Stack sizes. Each thread gets a stack, and the RAM for those starts adding up. That's RAM not usable by the heap and such. If each thread has 1MB worth of stack (that's relatively little, even!) and you have 1000 threads, 1GB of RAM is gone just for stacks alone. You can make a system that runs 5000 threads simultaneously, but that means using the Thread constructor that lets you pick a stack size and making it very small (see the second sketch after this list), and now your code needs to be written to deal with that; if you ever get one of those stack traces with 140 lines in it, that's probably not going to work on a small stack. It's a complication that you can avoid by limiting the total # of threads. With 500 threads in the system, each with a 1MB stack, that's 'just' half a gigabyte, which is manageable; 5000 threads is not.
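A rough sketch of the bounded-queue idea from the overload point above; the sizes and the rejection policy are placeholders, assume whatever fits your load:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class BoundedWorkQueue {

    // Serve at most 10 jobs concurrently; queue at most 1000 more; beyond
    // that, reject instead of letting open jobs pile up until the process
    // dies with OutOfMemoryError.
    private static final ThreadPoolExecutor POOL = new ThreadPoolExecutor(
            10, 10,
            0L, TimeUnit.MILLISECONDS,
            new ArrayBlockingQueue<>(1_000),
            new ThreadPoolExecutor.AbortPolicy());

    // The inflow side translates a rejection into "hang up on the requestor".
    public static boolean tryAccept(Runnable job) {
        try {
            POOL.execute(job);
            return true;
        } catch (RejectedExecutionException overloaded) {
            return false; // e.g. respond 503 / close the connection
        }
    }
}
```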
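And, for reference, the stack-size constructor mentioned in the last point; the 64 KB figure is made up, and the JVM treats the size as a hint it may round or ignore:

```java
public class SmallStackThreads {
    public static Thread newWorker(Runnable job) {
        // Thread(ThreadGroup, Runnable, String, long stackSize): request a
        // smaller stack per thread; deep call chains may then die with
        // StackOverflowError.
        return new Thread(null, job, "small-stack-worker", 64 * 1024);
    }
}
```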
All of those problems simply do not occur if you have a small constant multiple of threads vs. cores. So, 8 cores, 40 threads? No problem. 8 cores, 4000 threads? That might not be such a good idea unless you really know what you are doing.
Looking at it from the other side, if you have 8 cores, and 10 threads for those cores, and 5 of the threads are 'stuck' waiting for something, that is bad: now your CPUs are idling. The Java library ecosystem, including most of the java.* classes themselves, does not really document how and when it blocks a thread, let alone make it possible for automated source-introspection tools to find out. And in a certain way, it all blocks; any time code attempts to run code in a class that has never been loaded yet, the classloader runs in that thread and hits the disk, which means that CPU falls asleep! In practice this cost is amortized and goes away a few seconds after boot, but, theoretically speaking...
Making that 50 threads reduces the odds significantly.
It's hard to know: how do you test that your CPUs are idling? Are you in the habit of writing tests that set up a full system, fake a flood of simultaneous requests, then check how fast the system plows through it and how it deals with a sustained stream of requests coming in faster than it can handle? Because you should really have that before you decide to 'optimize' by running only as many threads as you have cores, or by applying the Goetz rule.
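If you do want a feel for it, even a crude harness along these lines tells you more than any sizing rule; all numbers and names here are made up:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Crude throughput probe: flood a pool with simulated blocking "requests"
// and measure how long the pool takes to drain them.
public class FloodProbe {

    public static void main(String[] args) throws InterruptedException {
        int requests = 10_000;
        ExecutorService pool = Executors.newFixedThreadPool(50); // the size under test
        CountDownLatch done = new CountDownLatch(requests);

        long start = System.nanoTime();
        for (int i = 0; i < requests; i++) {
            pool.execute(() -> {
                simulateBlockingIo(); // stands in for the real JDBC / REST call
                done.countDown();
            });
        }
        done.await();
        long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        System.out.println(requests + " requests drained in " + elapsedMs + " ms");
        pool.shutdown();
    }

    private static void simulateBlockingIo() {
        try {
            Thread.sleep(20); // pretend each job blocks ~20 ms on I/O
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```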
NB: However, for different reasons, that setup seems highly suspect to me. It is much more difficult to write and reason about code that has to 'farm out' any DB interaction it does to a separate pool; it gets you into a Java version of callback hell. You should have, e.g., one thread per incoming request and a pool of JDBC connection objects, not '50 threads for JDBC queries, 50 threads for reading files, 50 threads for calculating order lists, 50 threads for processing HTTP headers' and so on.
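In other words, roughly this shape instead; a sketch assuming a pooled DataSource (e.g. HikariCP behind it), with placeholder names:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import javax.sql.DataSource;

// Thread-per-request style: the request thread does its own (blocking) JDBC
// work, borrowing a connection from a pooled DataSource. No farming out to a
// separate "JDBC executor", no callback chains.
public class OrderHandler {

    private final DataSource pooledConnections; // e.g. backed by a connection pool

    public OrderHandler(DataSource pooledConnections) {
        this.pooledConnections = pooledConnections;
    }

    // Called on the request thread itself.
    public String loadOrderStatus(long orderId) throws SQLException {
        try (Connection con = pooledConnections.getConnection();
             PreparedStatement ps = con.prepareStatement(
                     "SELECT status FROM orders WHERE id = ?")) {
            ps.setLong(1, orderId);
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next() ? rs.getString("status") : null;
            }
        }
    }
}
```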