I have built a multithreaded server in C++ for Windows, where each connected client gets a dedicated thread that stays running until the client's socket is closed.
When a client connects, they can run commands (fetch motd, get current version of the client) or can login to get access to other commands (fetch user information, download files, etc).
So far this basic approach has worked well, but the number of clients using my service has been growing and I'm starting to worry about how efficient it is. I've read online that a process on Windows can only handle somewhere around 1500-2000 threads, and I think I might soon hit that many simultaneous users, so it's about time I changed my design.
What would be the best way to handle this amount of users?
The ~2000-thread limit per process comes from the default stack reservation per thread on a Windows process, combined with the available address space. As the linked article states, you can raise the limit by shrinking each thread's stack, but one dedicated thread per client is still not the most scalable design.
Instead, use I/O completion ports for asynchronous I/O. Rather than blocking one thread per socket, the code waits on a group of open sockets and is notified when data arrives on any of them. Each notification is passed as a "work item" to a thread pool, a small group of threads that Windows manages for you. Windows even provides BindIoCompletionCallback to automate this. This makes much better use of the resources you have, allowing you to scale further.
The other option is to shard your users into separate groups, each on a different server. This works if users do not need to communicate with each other and shared data (such as the motd or the files you mention above) is stored centrally; you can then add a new server every few thousand users.
Similarly, if users need not be tied to specific servers, you can allocate users to servers dynamically in a cluster, so a failed server is handled transparently. You do not say where your server is hosted, but load balancing is cheap these days and available as part of AWS and Azure.
You don't mention how many users you expect this to scale to; if that number is unlimited or unknown (but growing), the current design isn't going to scale well in the long run.
One approach is to keep a pool of authenticated user IDs with some sort of expiry value, say 15 minutes. When a user makes a request, check whether they are logged in and within the expiry window, and reset the window on each successful request; if they are outside it, ask them to log in again.
You may also run into trouble if you keep this design but expand horizontally behind a load balancer: session state that lives in one server's memory is invisible to the others, and concurrent updates to shared state can introduce race conditions.
Just some naive thoughts, but it seems like the goal would be to serve multiple users from a single thread, keeping a ratio of n users per thread tuned to load and blocking behaviour.