The problem
I have a collection of `System.Threading.Thread`s executing different web queries. The threads share a network abstraction that uses an `HttpClient`. In that abstraction I asynchronously retrieve the response using `GetAsync(url)`, then the content using `ReadAsStringAsync()`.
The `GetAsync(url)` call throws an `HttpRequestException` with the error message "Error while copying content to a stream" up to 5% of the time.
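For reference, the shared abstraction looks roughly like this (a minimal sketch; the class and method names are illustrative, not my actual code):

```csharp
using System.Net.Http;
using System.Threading.Tasks;

// Sketch of the abstraction shared by every worker thread.
public class NetworkAbstraction
{
    // Single HttpClient instance shared across all threads.
    private static readonly HttpClient _client = new HttpClient();

    public async Task<string> GetStringAsync(string url)
    {
        // This call throws HttpRequestException
        // ("Error while copying content to a stream")
        // in up to 5% of requests when invoked from many threads.
        using HttpResponseMessage response = await _client.GetAsync(url);
        return await response.Content.ReadAsStringAsync();
    }
}
```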
Why am I using `System.Threading.Thread`?
Before using `System.Threading.Thread`, I was using the Parallel library and it was working fine. I also tried TPL Dataflow, but with no better results than the Parallel library. The CPU load was low, as most of the time was spent waiting for responses, and the Parallel library doesn't scale beyond the CPU core count (more or less; there are other factors, but in my case, even with `MaxDegreeOfParallelism` set to its maximum value, it only scaled up to the CPU core count). Also, the network wasn't the bottleneck.
By using `System.Threading.Thread`, I can increase the number of concurrent requests, and the overall time is reduced by 75% according to my measurements, which is quite a great improvement, as I need to make thousands of GET requests.
The current main bottleneck is my network bandwidth.
The current workaround
Considering that switching to classic threads introduced the problem, and that I get an exception on a shared object, I think I introduced a race condition.
`HttpClient` should be thread-safe, but the documentation affirming this specifies that it holds only as long as the configuration and usage don't change, and I can't find information for my scenario, nor similar issues reported in a similar context. So I might be mutating the internal state of the shared `HttpClient` when calling `GetAsync(url)`, leading to a race condition.
I tried to verify this by looking at the `GetAsync(url)` implementation, but couldn't confirm it.
Using a mutex around the request would defeat the benefits of multithreading, and it isn't possible anyway: you can't `await` inside a `lock`, for good reasons, as it would make little to no sense and introduce bugs.
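For completeness, the C# compiler itself rejects this pattern, so the serialized variant (sketched below with an illustrative `_gate` field) doesn't even compile:

```csharp
private static readonly object _gate = new object();

public async Task<string> GetSerializedAsync(string url)
{
    lock (_gate)
    {
        // error CS1996: Cannot await in the body of a lock statement
        HttpResponseMessage response = await _client.GetAsync(url);
        return await response.Content.ReadAsStringAsync();
    }
}
```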
I found a solution by creating a new `HttpClient` instance for each request. The problem with that is that the underlying socket isn't released when the `HttpClient` is disposed until a specific timeout period elapses, leading to socket exhaustion.
To avoid this, I share the appropriate underlying `SocketsHttpHandler` and tell the `HttpClient` constructor not to dispose of the provided `SocketsHttpHandler` instance.
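Sketched, the workaround looks like this: the `HttpClient(HttpMessageHandler, bool)` constructor is called with `disposeHandler: false`, so disposing each short-lived client leaves the shared handler, and its connection pool, alive (class name and the `PooledConnectionLifetime` value are illustrative):

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

public class NetworkAbstraction
{
    // One handler for the whole process: it owns the connection pool,
    // so sockets are reused across all the short-lived HttpClient instances.
    private static readonly SocketsHttpHandler _handler = new SocketsHttpHandler
    {
        // Illustrative value: recycle pooled connections periodically.
        PooledConnectionLifetime = TimeSpan.FromMinutes(2)
    };

    public async Task<string> GetStringAsync(string url)
    {
        // disposeHandler: false -> disposing the client does not tear down
        // the shared handler or its pooled connections.
        using var client = new HttpClient(_handler, disposeHandler: false);
        using HttpResponseMessage response = await client.GetAsync(url);
        return await response.Content.ReadAsStringAsync();
    }
}
```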
Now I don't get `HttpRequestException` anymore, and according to netstat, I don't overuse my client sockets, despite the thousands of requests. Also, I don't measure any significant performance impact from the constant `HttpClient` instantiation.
My questions
Do you share my diagnosis? Or am I missing a known issue in my context?
What is your opinion of the current solution?
Do you know of a better solution that would solve the issue?