When reading about web servers, frameworks, etc., most of the time I notice that the goal is to have a technology with the following features:
- Able to handle as many connections as possible.
- Fit an I/O model (connections to DBs and other web services).
Those features fit the current web model, but I am interested in knowing which technologies fit a CPU-heavy use case.
For example, Node.js is a technology that really shines when you have to write an application that does a lot of I/O. On the other hand, due to Node.js's evented nature, it is not suitable for CPU-heavy use cases (video encoding, machine learning, graphics).
I have also taken a look at Haskell web frameworks like Snap and Warp, and in the benchmarks they really are fast.
Are these Haskell web frameworks suitable for CPU-heavy problems? Which other languages/technologies are candidates?
What you’re asking for is typical of an n-tier architecture.
You want a web tier that is designed for I/O; that is its primary task. When it needs to perform some heavy processing, you need to farm that out to another tier that is designed for CPU crunching.
A lot of websites do this (at least the really big ones), they have a farm of web servers spreading the IO load, but these are there solely for presentation work – handing back static files and taking in requests. When a request for work comes in, the web server passes the request to an application tier, where app servers crunch the data and pass the results back to the webserver to return to the user.
The benefits are in scalability – you can add a new app server any time you like if you have more CPU crunching to do; similarly, if the number of I/O requests gets too high, you can add a new web server.
What you must not do is think of a web server as an application server. Whilst it can perform that role in a small application, that’s only there as a convenience.
So Node.js can perform the task you want; you just need to send any CPU-heavy work off somewhere else. It’s easy to send requests (via an async method!) to another service or process. As Node runs on a single thread, several other cores on the same box are available, assuming you’ve not filled them with DB work (which is a CPU-heavy task itself).
You will need a technology that allows you to add as much CPU as you need.
As you will most likely outgrow a single server, for Java you want an application server that is built to span several machines. An example is GlassFish 3.1, which can be clustered to do exactly this.
See http://glassfish.java.net/public/clustering31.html for more.
You might want to check out Erlang. It has a very easy concurrency model and will easily support very heavy loads. It is also designed so that you can build very reliable systems with relative ease.
Other advantages of Erlang include being able to upgrade running code without taking a system offline. So if you have a server that is streaming data to users you don’t need to disconnect the users to upgrade the server.
Any asynchronous web server (Node.js being one of them) can be easily adapted for heavy CPU lifting. The key is that the CPU-intensive task should be spawned independently of the main server (a separate thread, a separate process, a thread or process pool, or better yet a completely decoupled service). Asynchronous servers can easily poll numerous running tasks without creating much overhead.
On the other hand, what would be a very bad idea is to execute CPU-heavy tasks within the web server itself. That practice is not unheard of in Apache+PHP setups, but it’s far from good practice: it is neither efficient nor scalable.
Don’t forget the massive improvements (two orders of magnitude) you can get from GPU acceleration for some of the tasks you mentioned. If this is not possible, order-of-magnitude improvements are still possible with parallelism and vectorization.