I started working on a new project which is basically a fancy proxy:
- A user makes an HTTP request to our service
- Our service makes between 30 and 100 HTTP calls, mostly to third parties
- A bare minimum of business logic is run to formulate an answer to the call
Seems easy; the only problem is scale: during peak time we reach tens of millions of requests per second, using around 5000 EC2 instances.
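To make the shape of the workload concrete, each request does roughly this (a simplified sketch, not our real code; the URLs, timeout, and aggregation step are placeholders):

```typescript
// Rough shape of one incoming request (simplified; names and numbers are illustrative).
async function handleRequest(upstreamUrls: string[]): Promise<unknown[]> {
  // Fan out the 30-100 upstream HTTP calls concurrently,
  // tolerating individual upstream failures.
  const settled = await Promise.allSettled(
    upstreamUrls.map((url) =>
      fetch(url, { signal: AbortSignal.timeout(500) }).then((r) => r.json()),
    ),
  );

  // The "bare minimum of business logic" then runs over the successful answers.
  return settled
    .filter((s): s is PromiseFulfilledResult<unknown> => s.status === "fulfilled")
    .map((s) => s.value);
}
```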
I see our CPU is often at 100%, but adding or removing CPU capacity doesn't affect response times, which makes me think the CPU is mostly busy polling while waiting for the HTTP responses rather than doing real work.
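To clarify what I mean by "validate": I was thinking of something along these lines, assuming a Node-style JS runtime (performance.eventLoopUtilization and process.cpuUsage are standard Node APIs; the 10-second window is arbitrary):

```typescript
import { performance } from "node:perf_hooks";

// Sample event loop utilization and process CPU time over a 10-second window.
// If eventLoopUtilization is close to 1.0, the loop really is busy running
// callbacks; if it's low while the OS reports ~100% CPU, the time is going
// somewhere other than my request-handling code.
const eluStart = performance.eventLoopUtilization();
const cpuStart = process.cpuUsage();
const wallStart = process.hrtime.bigint();

setTimeout(() => {
  const elu = performance.eventLoopUtilization(eluStart);
  const cpu = process.cpuUsage(cpuStart);
  const wallMs = Number(process.hrtime.bigint() - wallStart) / 1e6;
  const cpuMs = (cpu.user + cpu.system) / 1000;

  console.log({
    eventLoopUtilization: elu.utilization, // 0..1, time the loop spent active
    cpuMs,                                 // CPU time this process consumed
    wallMs,                                // elapsed wall-clock time
    cpuShareOfWall: cpuMs / wallMs,
  });
}, 10_000);
```

Is that the right way to tell the two apart, or is there a better tool at the OS level?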
Questions:
- How do I validate my hypothesis that the CPU is busy polling and not doing real work?
- How could I mitigate the problem? Maybe by using bun.js, which, thanks to its use of io_uring, might be better suited to such concurrency levels?
- Which scaling metric could I use instead of CPU? Maybe requests / connections per second per node? But not all requests are the same… (see the counters sketched after this list for what I had in mind)
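For the last question, this is the kind of per-node signal I'd consider exporting instead of CPUUtilization (the names and the 15-second interval are invented, this is not existing code):

```typescript
// Sketch of per-node counters to feed the autoscaler instead of raw CPU.
// Caveat: this treats all requests as equally expensive, which they aren't.
let inFlight = 0;
let completedSinceLastReport = 0;

export function onRequestStart(): void {
  inFlight++;
}

export function onRequestEnd(): void {
  inFlight--;
  completedSinceLastReport++;
}

// Push these as custom metrics to whatever drives the autoscaling policy.
setInterval(() => {
  const requestsPerSecond = completedSinceLastReport / 15;
  completedSinceLastReport = 0;
  console.log({ inFlight, requestsPerSecond });
}, 15_000);
```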