Question

We've got an application written in Delphi that uses Delphi On Rails and acts as a server and communicates with clients using HTTP, JSON and websockets. We ran into some issues lately and it's hard to debug them and find the problem's source.

Using Wireshark for traffic analysis, we could see the following behaviour: There's a request from a client (HTTP GET for a file). Usually, we process that request and send a HTTP status code, the file (if not cached) etc. However, we have a reproducable problem where there's only the request from the client, a TCP SYN from the server, but after that, the server sends a RST packet and the TCP communication stops.

The strange thing is, we can reproduce the problem quite well (although the files where the RST packet disrupts the communication differ) and it mysteriously vanishes in one of the following cases:

  • In debug environment (Delphi IDE), disabling madExcept
  • In release environment, not patching the executable with madExceptPatch
  • Give focus to a different window than the main application window.

As we had some problems with Delphi On Rails and had to do minor modifications to it to avoid access violations and debug exceptions, I suspect DOR to be the culprit and some strange memory corruption or uncatched exception to be the bug, but it's still confusing, especially the fact that the problem vanishes if we change focus.

My main question is not how to solve this problem, but how to debug it and where to look for problems. The source of the TCP reset also puzzles me, as we don't run into the usual procedures that process requests in that case and it seems like either DOR or something else (the application, Winsock, OS) resets the connection by mistake.

For completeness, as it might be related, here are issues that I reported at the Delphi On Rails project and a forum thread where I asked the madExcept author about the problem: Issue #6, Issue #7, Issue #8, forum entry.

Was it helpful?

Solution

As a test, we checked out some older DOR sources from version control where no connection issues were known, and it works without showing any of the above problems.

So we decided to solve the problem the other way round: Rolling back the DOR specific source code (about 20 files) to the last stable version and "re-updating" it piece by piece until the error will occur again. If this happens, we can

  1. Go back to the latest working version quickly
  2. Hopefully be quite close to the original DOR sources so we can react on updates on the library.
  3. Analyse the occuring error and report an detailed issue (and perhaps even a solution) to the DOR project.

EDIT: We could now update all but one file back to the old state without getting connection issues. The file that creates problems is dorSynchronizer.pas, more exact it's r179 of this file that caused the issues - threads were changed from Windows API to Delphi TThread there. We'll investigate this further and might add an issue to the DOR project in the next days.

EDIT2: It turned out that DOR uses the deprecated procedures TThread.Suspend and TThread.Resume that can cause undefined behaviour. I reported an issue to the DOR project.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top