I’ve noticed some very strange load-balancing behavior in OpenShift. One microservice calls another through an internal route (not GTM). On that route I also set the balancing algorithm to round robin and disabled sticky sessions with the following annotations (I have confirmed that these changes are applied):
```yaml
haproxy.router.openshift.io/balance: roundrobin
haproxy.router.openshift.io/disable_cookies: 'true'
```
I send the requests with curl, with connection reuse disabled:
```
curl.exe $url -X GET -H "Connection: close" --no-keepalive
```
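The same test can also be scripted so the pause and the per-pod counts are recorded together. This is a minimal Python sketch, assuming (hypothetically) that the called service returns the name of the pod that handled the request in its response body; `fetch_pod_name` and `measure_distribution` are illustrative names, not part of OpenShift or curl.

```python
import time
import urllib.request
from collections import Counter
from typing import Callable


def fetch_pod_name(url: str) -> str:
    """Send one GET and return the response body.

    Assumption: the service replies with the name of the pod that served
    the request (for example its HOSTNAME). urllib opens a new connection
    per call and already sends "Connection: close"; the header is set
    explicitly only to mirror the curl command.
    """
    req = urllib.request.Request(url, method="GET", headers={"Connection": "close"})
    with urllib.request.urlopen(req, timeout=60) as resp:
        return resp.read().decode().strip()


def measure_distribution(url: str, n_requests: int, pause_s: float,
                         fetch: Callable[[str], str] = fetch_pod_name) -> Counter:
    """Send n_requests sequential requests, sleeping pause_s seconds between
    them, and count how many were answered by each pod."""
    counts: Counter = Counter()
    for i in range(n_requests):
        counts[fetch(url)] += 1
        if i < n_requests - 1:  # no need to sleep after the last request
            time.sleep(pause_s)
    return counts
```

For example, `measure_distribution("http://my-service-route.example/compute", n_requests=20, pause_s=30)` (a hypothetical route URL) would reproduce the 30-second run below and return a `Counter` keyed by pod name.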
With 2 pods and 20 requests, the split between the two pods changes strangely depending on the pause between requests:
- 100 ms – 13/7
- 200 ms – 16/4
- 1 s – 19/1
- 30 s – 20/0
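To put numbers on how unusual these splits are: strict round robin with no other traffic should give exactly 10/10. Even under a much looser model, assumed here purely for comparison, where each request independently lands on either pod with probability 1/2, lopsided splits are rare. This sketch computes the two-sided binomial tail for each observed split; the helper name `two_sided_tail` is illustrative.

```python
from fractions import Fraction
from math import comb


def two_sided_tail(n: int, majority: int) -> Fraction:
    """Probability that n fair, independent pod choices split at least as
    unevenly as majority / (n - majority), in either direction."""
    if not n / 2 <= majority <= n:
        raise ValueError("majority must be between n/2 and n")
    if 2 * majority == n:
        return Fraction(1)  # an even split is the least extreme outcome
    one_tail = sum(comb(n, k) for k in range(majority, n + 1))
    # The two tails are disjoint because majority > n/2, so they add up.
    return Fraction(2 * one_tail, 2 ** n)


# Larger count of each observed 20-request split, keyed by pause.
observed = {"100 ms": 13, "200 ms": 16, "1 s": 19, "30 s": 20}
for pause, majority in observed.items():
    p = two_sided_tail(20, majority)
    print(f"{pause}: {majority}/{20 - majority}  p = {float(p):.2e}")
```

Under this coin-flip model a 13/7 split is unremarkable (p is about 0.26), but 20/0 has probability 2/2^20, roughly 1 in 500,000, so seeing it on every 30-second run points to something systematic rather than chance.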
The problem is that my service performs heavy computations, and a pause of about 20 seconds between requests is the real production scenario. In that case all requests go to a single pod and overload it, then they move to the second pod and overload that one, and so on.
I understand that round robin is applied not only to the single endpoint I am testing but to all endpoints of the service. However, no distribution of traffic across endpoints could produce a 20/0 split every time, so the cause must be something else, and I cannot figure out what. Can you explain why this happens? How can I get a distribution at least close to 10/10 between the 2 pods behind a route?