send, recvfrom, threads, “Broken Pipe” and SO_RCVTIMEO bug
14-06-2021
Question
I have a server written in C++ running on Ubuntu 10.04, currently in production, which exhibits a weird bug.
Context:
Each client connecting to the server has one socket and 2 threads:
- 1 thread for writing to the socket,
- 1 thread for reading from the socket.
The socket is configured via ::setsockopt with SO_RCVTIMEO set to 10 seconds.
Each ::send on the socket has the flag MSG_NOSIGNAL set (each ::recvfrom does too, but it seems this should have no impact).
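A minimal sketch of the configuration described above, assuming a connected stream socket `fd`; the helper names `set_recv_timeout`, `send_no_signal`, `recv_once` and `RecvResult` are hypothetical. With SO_RCVTIMEO in place, a receive that waits past the timeout is documented to return -1 with errno set to EAGAIN or EWOULDBLOCK, and MSG_NOSIGNAL turns the SIGPIPE of a write on a broken connection into a plain EPIPE return from send. MSG_NOSIGNAL is listed among the send(2) flags, not the recv(2) ones, which is consistent with the expectation that passing it to ::recvfrom has no impact.

```cpp
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <cerrno>
#include <cstddef>

// Hypothetical helper: apply a receive timeout via SO_RCVTIMEO.
// Returns true if setsockopt succeeded.
bool set_recv_timeout(int fd, long seconds, long microseconds = 0) {
    timeval tv{};
    tv.tv_sec = seconds;
    tv.tv_usec = microseconds;
    return ::setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)) == 0;
}

// Writer side (hypothetical helper): with MSG_NOSIGNAL, writing to a broken
// connection fails with errno == EPIPE instead of raising SIGPIPE.
ssize_t send_no_signal(int fd, const void* buf, std::size_t len) {
    return ::send(fd, buf, len, MSG_NOSIGNAL);
}

// Reader side: classify the outcome of a single recvfrom call.
enum class RecvResult { Data, Closed, Timeout, Error };

RecvResult recv_once(int fd, char* buf, std::size_t len, ssize_t* out_n) {
    ssize_t n = ::recvfrom(fd, buf, len, 0, nullptr, nullptr);
    if (out_n) *out_n = n;
    if (n > 0) return RecvResult::Data;
    if (n == 0) return RecvResult::Closed;  // orderly shutdown by the peer
    if (errno == EAGAIN || errno == EWOULDBLOCK) return RecvResult::Timeout;  // SO_RCVTIMEO expired
    return RecvResult::Error;  // e.g. ECONNRESET, EINTR
}
```

The 10-second setting from the question would correspond to `set_recv_timeout(fd, 10)` on each client socket.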
Bug:
I have some evidence (but am not 100% sure) that the following scenario may rarely occur:
- ::recvfrom is called and blocks until either data is present or the timeout is reached.
- ::send is called, and the write on the socket fails, returning an EPIPE (Broken Pipe) error.
- Bug: ::recvfrom is still blocked and never returns, somehow ignoring the SO_RCVTIMEO option.
Does the above scenario make sense to you?
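One way to probe this scenario in isolation is a small two-thread experiment. The sketch below (hypothetical names `run_scenario` and `ScenarioResult`) blocks a reader thread in ::recvfrom under SO_RCVTIMEO, then makes ::send fail with EPIPE from another thread and records whether and how quickly the read returns. It provokes EPIPE by shutting down the socket's own write side, which is an assumption: in production the broken pipe presumably comes from the peer, so a passing run does not rule the bug out; it only shows whether the timeout survives a concurrent EPIPE in this simple case.

```cpp
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <cerrno>
#include <chrono>
#include <thread>

// Outcome of one run of the hypothetical experiment below.
struct ScenarioResult {
    int send_errno = 0;       // errno from the failing send (expected: EPIPE)
    ssize_t recv_ret = 0;     // return value of the blocked recvfrom
    int recv_errno = 0;       // errno after recvfrom (EAGAIN/EWOULDBLOCK on timeout)
    long recv_waited_ms = 0;  // how long recvfrom stayed blocked
};

// `fd` is assumed to be a connected stream socket whose peer sends nothing.
ScenarioResult run_scenario(int fd, long timeout_ms) {
    ScenarioResult r;
    timeval tv{};
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;
    ::setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv));

    // Reader thread: blocks in recvfrom until data, EOF, error or timeout.
    std::thread reader([&r, fd] {
        char buf[64];
        auto start = std::chrono::steady_clock::now();
        r.recv_ret = ::recvfrom(fd, buf, sizeof(buf), 0, nullptr, nullptr);
        r.recv_errno = errno;  // errno is thread-local: this is the reader's value
        r.recv_waited_ms = static_cast<long>(
            std::chrono::duration_cast<std::chrono::milliseconds>(
                std::chrono::steady_clock::now() - start).count());
    });

    // Crude synchronisation: give the reader time to block first.
    std::this_thread::sleep_for(std::chrono::milliseconds(50));

    // Writer side: shut down our own write direction so send fails with EPIPE.
    ::shutdown(fd, SHUT_WR);
    if (::send(fd, "x", 1, MSG_NOSIGNAL) < 0) r.send_errno = errno;

    reader.join();
    return r;
}
```

If the hypothesis held in this setup, `recv_waited_ms` would grow without bound instead of staying close to `timeout_ms`.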
Metrics:
The bug happens approximately once a week. During a week, there are approximately:
- 20K sockets used
- 30M ::recvfrom calls
- 60M ::send calls
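Dividing the weekly figures gives a sense of how rare the failure is, assuming (as a simplification) that calls are spread evenly over the sockets:

```latex
\frac{30 \times 10^{6}\ \text{recvfrom calls}}{20 \times 10^{3}\ \text{sockets}} = 1500,
\qquad
\frac{60 \times 10^{6}\ \text{send calls}}{20 \times 10^{3}\ \text{sockets}} = 3000
\quad \text{(calls per socket per week)}

\frac{1\ \text{hang}}{30 \times 10^{6}\ \text{recvfrom calls}} \approx 3.3 \times 10^{-8},
\qquad
\frac{1\ \text{hang}}{20 \times 10^{3}\ \text{sockets}} = 5 \times 10^{-5}
```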
Should I use the timeout feature of ::select instead? (assuming its timeout implementation differs from the SO_RCVTIMEO one)
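For comparison, a select-based variant waits for readability with a timeout and only then reads. A minimal sketch, with the hypothetical helper name `recv_with_select_timeout`; it reports a timeout the same way SO_RCVTIMEO does (-1 with errno EAGAIN) so the calling code would not need to change. Whether this avoids the hang is not established, since it only moves the wait from recvfrom into select. Note that select cannot watch descriptors numbered FD_SETSIZE (usually 1024) or higher, which matters for a server with many concurrent sockets; poll has no such limit.

```cpp
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <cerrno>
#include <cstddef>

// Hypothetical helper: wait up to `timeout_ms` for `fd` to become readable
// using select(), then read once without blocking.
// Returns recvfrom's result, or -1 with errno == EAGAIN on timeout.
ssize_t recv_with_select_timeout(int fd, char* buf, std::size_t len, long timeout_ms) {
    if (fd >= FD_SETSIZE) {  // FD_SET on such a descriptor is undefined behaviour
        errno = EINVAL;
        return -1;
    }

    fd_set readfds;
    FD_ZERO(&readfds);
    FD_SET(fd, &readfds);

    // Linux may modify the timeval, so it is rebuilt on every call.
    timeval tv{};
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    int ready = ::select(fd + 1, &readfds, nullptr, nullptr, &tv);
    if (ready < 0) return -1;  // errno set by select (e.g. EINTR)
    if (ready == 0) {          // timed out: nothing to read
        errno = EAGAIN;
        return -1;
    }

    // Readable means data, EOF or a pending error. MSG_DONTWAIT guards
    // against blocking if the readiness turns out to be spurious.
    return ::recvfrom(fd, buf, len, MSG_DONTWAIT, nullptr, nullptr);
}
```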
Thanks a lot for any ideas on this matter!