ActiveMQ Artemis – 8 CPUs, 13 GB RAM, 10,000 IOPS
We conducted load testing of ActiveMQ Artemis and discovered strange behavior once throughput reached 350 RPS: writing and reading became unstable, until the cluster stopped entirely.
Startup parameters: ramp-up of each pair of chains – 1200 s (20 min), buffer before starting the ramp-up of the next pair – 600 s (10 min), then the stable load is held until the test completes (a minimal producer sketch of this profile follows the chain list).
Chains:
Service_1.V2 - 40 rps
Service_2.V2 - 30 rps
Service_3.V3 - 30 rps
Service_4.V2 - 30 rps
Service_5.V5 - 30 rps
Service_6.V2 - 30 rps
Service_7.V2 - 30 rps
Service_8.V3 - 20 rps
Service_9.V3 - 20 rps
Service_10.V5 - 20 rps
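For reference, the ramp/hold profile above can be driven with a short loop on the Artemis JMS client. This is only a sketch: the broker URL, queue name, message payload, and hold duration are assumptions, and each chain in the real test would run its own copy with its target rps.

```java
import org.apache.activemq.artemis.jms.client.ActiveMQConnectionFactory;

import javax.jms.Connection;
import javax.jms.MessageProducer;
import javax.jms.Queue;
import javax.jms.Session;

public class RampedProducer {
    public static void main(String[] args) throws Exception {
        ActiveMQConnectionFactory factory =
                new ActiveMQConnectionFactory("tcp://localhost:61616"); // hypothetical broker URL
        try (Connection connection = factory.createConnection()) {
            connection.start();
            Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
            Queue queue = session.createQueue("Service_1.V2.REQ");      // hypothetical queue name
            MessageProducer producer = session.createProducer(queue);

            final long rampMillis = 1_200_000; // 1200 s ramp-up, as in the test profile
            final long holdMillis = 600_000;   // hold duration is an assumption
            final double targetRps = 40.0;     // Service_1.V2 target from the list above
            long start = System.currentTimeMillis();

            long elapsed;
            while ((elapsed = System.currentTimeMillis() - start) < rampMillis + holdMillis) {
                // Linear ramp: rate grows from 0 to targetRps over rampMillis, then holds.
                double rps = targetRps * Math.min(1.0, (double) elapsed / rampMillis);
                if (rps < 1.0) {      // too early in the ramp to pace precisely
                    Thread.sleep(200);
                    continue;
                }
                producer.send(session.createTextMessage("ping"));
                Thread.sleep((long) (1000.0 / rps)); // crude pacing; fine for a sketch
            }
        }
    }
}
```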
Result: maximum total throughput achieved was ~350 rps.
Service_3.V2 began returning errors, later joined by errors from Service_6.V2, and messages began piling up massively in the RESP and RESP.ER queues of the access services.
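One thing worth checking with a backlog like this is the broker's address-full-policy: once max-size-bytes is reached on an address, a BLOCK policy stalls producers and the slowdown cascades, which matches the "everything falls apart at once" symptom. Below is a minimal sketch, assuming an embedded broker and a recent 2.x API, with placeholder sizes and match patterns rather than the tested configuration:

```java
import org.apache.activemq.artemis.core.config.impl.ConfigurationImpl;
import org.apache.activemq.artemis.core.server.embedded.EmbeddedActiveMQ;
import org.apache.activemq.artemis.core.settings.impl.AddressFullMessagePolicy;
import org.apache.activemq.artemis.core.settings.impl.AddressSettings;

public class PagingConfigSketch {
    public static void main(String[] args) throws Exception {
        // Page to disk instead of blocking producers once the address is full.
        AddressSettings settings = new AddressSettings()
                .setMaxSizeBytes(100L * 1024 * 1024)                         // placeholder limit
                .setAddressFullMessagePolicy(AddressFullMessagePolicy.PAGE);

        ConfigurationImpl config = new ConfigurationImpl();
        config.setPersistenceEnabled(true);
        config.setSecurityEnabled(false);
        config.addAcceptorConfiguration("tcp", "tcp://localhost:61616");
        config.addAddressesSetting("RESP.#", settings); // placeholder match for the RESP/RESP.ER addresses

        EmbeddedActiveMQ broker = new EmbeddedActiveMQ();
        broker.setConfiguration(config);
        broker.start();
        // ... run the load against this broker, then:
        broker.stop();
    }
}
```

On a standalone broker the same settings live in broker.xml under <address-settings>.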
Artemis memory consumption also grew.
Memory use of the Service_6 WS connector peaked at 1.5 GiB (against a configured limit of 750 MiB).
At 230 rps, the 95th-percentile response time was 50 ms; with further increases in rps, response time grew severalfold.
Question:
I don't understand what we're running into. There is a reserve of resources, and the channel is sufficient, yet at some point everything starts to fall apart. This can be seen on the charts from 21:30 onward.
[Charts: requests/responses per second, response time, cluster performance, queues]
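To pinpoint the tipping moment, it can help to correlate these charts with broker-side queue depth over time. Here is a minimal JMX polling sketch, assuming Artemis 2.x default MBean naming; the JMX service URL, broker name, and queue are placeholders:

```java
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class QueueDepthProbe {
    public static void main(String[] args) throws Exception {
        // Hypothetical JMX endpoint; adjust host/port to the actual deployment.
        JMXServiceURL url = new JMXServiceURL("service:jmx:rmi:///jndi/rmi://localhost:1099/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection mbsc = connector.getMBeanServerConnection();
            // Default Artemis 2.x queue MBean name; broker/address/queue values are placeholders.
            ObjectName queueBean = new ObjectName(
                    "org.apache.activemq.artemis:broker=\"artemis\",component=addresses,"
                    + "address=\"RESP\",subcomponent=queues,routing-type=\"anycast\",queue=\"RESP\"");
            for (int i = 0; i < 720; i++) {                      // ~1 hour at 5 s intervals
                Long depth = (Long) mbsc.getAttribute(queueBean, "MessageCount");
                System.out.println(System.currentTimeMillis() + " RESP depth=" + depth);
                Thread.sleep(5_000);
            }
        }
    }
}
```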