I’m experiencing a very weird behavior in an SSH-like program I’m writing in C for Linux.
Code Overview
The code is completely nonblocking and designed as a reactor where all fds are registered to a list, and on each cycle I use select to poll all fds and handle read/write events.
Because the code is nonblocking, my sockets are set nonblocking too (using setsockopt), so when the program tries to connect somewhere, the connect syscall sometimes fails with errno set to EINPROGRESS. I expect that, since the TCP three-way handshake can take a while, and the code handles it properly.
Finally, when the TCP handshake completes, the fd is reported as readable by select; I check the connect result and, if it succeeded, mark the connection as established.
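For reference, the conventional nonblocking connect sequence can be sketched roughly as below. This is an illustrative sketch, not the actual code of the program described here: the helper names (set_nonblocking, start_connect, finish_connect) are hypothetical, it assumes IPv4, and it follows the commonly documented pattern of enabling O_NONBLOCK with fcntl, waiting for the socket to become writable, and then reading SO_ERROR with getsockopt to learn the outcome.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: put fd into nonblocking mode. Returns 0 or -1. */
static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Hypothetical helper: start a nonblocking IPv4 connect.
 * Returns the socket fd, or -1 on failure. *in_progress is set to 1
 * when connect() reported EINPROGRESS, 0 when it completed at once. */
static int start_connect(const char *ip, unsigned short port, int *in_progress)
{
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &addr.sin_addr) != 1)
        return -1;

    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd == -1)
        return -1;
    if (set_nonblocking(fd) == -1) {
        close(fd);
        return -1;
    }

    *in_progress = 0;
    if (connect(fd, (struct sockaddr *)&addr, sizeof addr) == -1) {
        if (errno != EINPROGRESS) {
            close(fd);
            return -1;
        }
        *in_progress = 1; /* handshake still running; call connect() only once */
    }
    return fd;
}

/* Hypothetical helper: wait up to timeout_sec for the connect to finish.
 * Returns 0 on success, otherwise an errno-style code (ETIMEDOUT on timeout). */
static int finish_connect(int fd, int timeout_sec)
{
    fd_set wfds;
    FD_ZERO(&wfds);
    FD_SET(fd, &wfds);
    struct timeval tv = { timeout_sec, 0 };

    /* Completion (success or failure) is signalled through the write set. */
    int n = select(fd + 1, NULL, &wfds, NULL, &tv);
    if (n == 0)
        return ETIMEDOUT;
    if (n == -1)
        return errno;

    int err = 0;
    socklen_t len = sizeof err;
    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) == -1)
        return errno;
    return err; /* 0 means the connection is established */
}
```

In a reactor, the select call inside finish_connect would be replaced by the main loop's own select, with the fd registered for write events until SO_ERROR has been read.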
Now the weird stuff
I tried my program on a different machine, which runs an older version of Ubuntu, and realized the program connected twice to the same IP address. Debugging my way through, I proved that the connect syscall was only called once! (I checked with both gdb and strace.)
I tried to set up a simple server to see what's going on there, and actually saw two incoming connections from two different source ports.
Both of them were accepted by the server, and a brief moment later one was closed by the peer (as if the client had sent a FIN packet).
Whenever the client connected, the connect call returned EINPROGRESS, and some cycles later the connection completed successfully.
The same test on my dev machine did not reproduce the problem and worked normally.
Does connect lead to undefined behavior when used with nonblocking sockets?
The only thing I can think of now is that maybe sockets on Unix are not supposed to be nonblocking when calling connect, but that doesn't seem logical to me, and it also works on my local machine.
Any thoughts on what may cause this behavior to happen?