The current architecture is as follows:
1. The user sends a request to server A.
2. Server A checks whether the solution already exists in the database. If it does, go to step 5. If not, continue to step 3.
3. Server A sends the request to a highly scalable, stateless server B, which does the computation (anywhere from 1-30 seconds) and responds back to server A.
4. Server A stores the solution for future requests.
5. Server A responds to the user with the solution.
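The flow above can be sketched as follows. This is a minimal illustration only; the names (`solution_db`, `compute_on_server_b`) are placeholders I've made up, with a dict standing in for the database and a stub for server B:

```python
# Hypothetical in-memory stand-ins; in reality these would be a
# database lookup and an RPC/HTTP call to server B.
solution_db = {}

def compute_on_server_b(request_key):
    # Placeholder for server B's expensive computation (1-30 s).
    return f"solution-for-{request_key}"

def handle_request(request_key):
    # Step 2: check whether a solution is already stored.
    cached = solution_db.get(request_key)
    if cached is not None:
        return cached  # Step 5: respond straight from the cache.
    # Step 3: delegate the expensive computation to server B.
    solution = compute_on_server_b(request_key)
    # Step 4: store the solution for future requests.
    solution_db[request_key] = solution
    # Step 5: respond to the user.
    return solution
```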
The problem:
Server A experiences a sudden burst of requests (for which no solution exists in the database yet, otherwise caching would have absorbed it) of roughly 100x normal traffic from time to time, much like a results site when results are declared. This load never lasts more than 1-2 hours and happens only once every few days.
Server A can handle at most around 100 concurrent requests before reaching 100% CPU utilization, and it fails to serve the extra load.
Solutions that don’t work:
- Configuring more capacity: this would leave the capacity idle 90% of the time.
- Sending requests directly to server B: we need to store the solutions, and we don't want to compute the same request twice. We also don't want to expose this server directly to the client.