We are trying to build a high-throughput, low-latency application using uWS. In this app, replies to HTTP requests cannot be sent from within the request callback, because the server must consult external systems to produce the response. We have therefore architected the server to post responses back to the uWS loop via Loop::defer. However, throughput drops drastically when we do that. For example, the simple server below can handle about 190,000 requests per second on a laptop:
#include "App.h"   // uWebSockets
#include <iostream>

int main() {
    uWS::App* server = new uWS::App();
    server->get("/foo", [](auto* res, auto* req) {
        res->writeStatus("200 OK")->end("foo called");
    }).listen(4333, [](auto* socket) {
        std::cout << "Server running on 4333\n";
    }).run();
    return 0;
}
If we change it as follows:
- Rather than responding in the callback, we push the request/response objects onto a queue.
- A worker thread picks these requests up from the queue and creates the response (in our case the same "foo called" as above).
- The worker thread posts the result back to the loop thread by calling Loop::defer.
- Inside the function passed to defer, the response is sent by calling writeStatus followed by end.
In this flow there are no other obvious performance problems such as lock contention. Yet the resulting server can only handle about 20,000 TPS. This has become a bottleneck, because the rest of the server can sustain a much higher throughput, and even scaling out to multiple event loops does not get us the throughput we need.
My question is: what is the recommended architecture for this kind of use case? Is calling defer once per response not the right approach? Could we instead install a high-resolution timer callback, queue up the ready responses, and send them from inside that callback? Or are we using uWS for something that is not aligned with its design goals?
Thanks
P