We use WSO2 APIM as an API gateway. One of our clients reported that they sometimes get errors back like this:
{
"code": "101504",
"type": "Status report",
"message": "Runtime Error",
"description": "Send timeout"
}
That error is a socket timeout from WSO2 down to the back-end, so one option is to increase the timeout. However, when I look at the WSO2 logs I see entries like this:
TID: [] [] [2024-07-05 13:34:30,967] WARN {org.apache.synapse.transport.passthru.TargetHandler} - ERROR_CODE = 101504, STATE_DESCRIPTION = Socket Timeout occurred after Server read the response headers but prior to reading the response body from the backend, INTERNAL_STATE = RESPONSE_BODY, DIRECTION = RESPONSE, CAUSE_OF_ERROR = Connection between the EI and the BackEnd timeouts, TARGET_HOST = 192.168.157.39, TARGET_PORT = 443, TARGET_CONTEXT = https://my.service.example.com/foo, HTTP_METHOD = POST, TRIGGER_TYPE = api, TRIGGER_NAME = ComponentsAPI:vv1, REMOTE_ADDRESS = my.service.example.com/192.168.157.39:443, CONNECTION = http-outgoing-1093655, SOCKET_TIMEOUT = 360000, CORRELATION_ID = 1861aa31-c631-438d-acf4-344c4bfefc25
The thing is, I have an Apache HTTPD as a reverse proxy in front of the WSO2 instance. The LogFormat we use for the access log there includes “%D”, which is “The time taken to serve the request, in microseconds.” So I looked at my access log for that endpoint. A couple of requests took longer than expected, but nothing took long enough to trigger the timeout. The longest I found, looking at the whole hour from 13:00 to 14:00 for that URL (the example log entry is timestamped 13:34:30), was 44919018 µs, which is just under 45 seconds.
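For anyone wanting to repeat the access-log check, here is a minimal Python sketch of it. It assumes that %D is the last whitespace-separated field on each line, which depends on your LogFormat. The sample lines are made up for illustration; only the 44919018 value comes from my real log.

```python
# Sketch: list the slowest requests in an Apache access log.
# Assumption: %D (duration in microseconds) is the LAST field on each line.

def duration_seconds(line: str) -> float:
    """Return the %D value of one access-log line, converted to seconds."""
    micros = int(line.rsplit(None, 1)[-1])
    return micros / 1_000_000


def slowest(lines, top=5):
    """Return the `top` longest durations in seconds, longest first."""
    durations = (duration_seconds(l) for l in lines if l.strip())
    return sorted(durations, reverse=True)[:top]


# Hypothetical lines; the layout and all values except 44919018 are invented.
sample = [
    '192.0.2.10 - - [05/Jul/2024:13:12:01 +0000] "POST /foo HTTP/1.1" 200 512 44919018',
    '192.0.2.10 - - [05/Jul/2024:13:20:47 +0000] "POST /foo HTTP/1.1" 200 187 1203344',
]

for seconds in slowest(sample):
    print(f"{seconds:.3f} s")
```

On a real log you would pass the open file (filtered to the URL of interest) to slowest() instead of the sample list.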
The WSO2 timeout should be set to five minutes, although the SOCKET_TIMEOUT = 360000 in the log line above, if that value is in milliseconds, works out to six minutes. (I know this is long, but we have some reporting endpoints that need this.)
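To double-check the units, here is a small Python sketch that pulls the KEY = VALUE pairs out of the WSO2 warning and converts SOCKET_TIMEOUT to minutes. The regular expression and the parse_fields helper are my own illustration, not anything provided by WSO2, the line is a shortened excerpt of the log entry above, and the conversion assumes the value is in milliseconds.

```python
import re

# Matches "KEY = value" pairs, stopping each value at the next ", KEY = " or end of line.
FIELD = re.compile(r"([A-Z_]+) = (.*?)(?=, [A-Z_]+ = |$)")


def parse_fields(line: str) -> dict:
    """Hypothetical helper: turn the KEY = VALUE part of the log line into a dict."""
    return dict(FIELD.findall(line))


# Shortened excerpt of the log entry quoted above.
line = (
    "ERROR_CODE = 101504, INTERNAL_STATE = RESPONSE_BODY, DIRECTION = RESPONSE, "
    "TARGET_PORT = 443, HTTP_METHOD = POST, SOCKET_TIMEOUT = 360000, "
    "CORRELATION_ID = 1861aa31-c631-438d-acf4-344c4bfefc25"
)

fields = parse_fields(line)
# Assumption: SOCKET_TIMEOUT is expressed in milliseconds.
timeout_minutes = int(fields["SOCKET_TIMEOUT"]) / 60_000
print(f"SOCKET_TIMEOUT = {timeout_minutes:g} minutes")
```

Under that assumption the logged timeout is six minutes, not the five I expected, but either way it is far longer than the 45 seconds seen in the access log.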
This is just one example. I have done a similar analysis multiple times (on different days and for different requests on each day) and each time there is no request time anywhere near five minutes.
So, why is WSO2 timing out, when the timeout has not been exceeded? Does anyone have any tips as to how to troubleshoot further?