Question

Short story: My DDS subscriber cannot see data from my DDS publisher. What am I missing?

Long story:

QNX 6.4.1 VM A -- Broken Publisher. IP ends with .113
QNX 6.4.1 VM B -- Working Publisher. IP ends with .114
Windows 7      -- Subscriber/Main Dev box. IP ends with .203
RTI DDS 5.0    -- Middleware version

I have a QNX VM (hosted on the network, not on my machine) that is publishing some data via RTI DDS. The data never shows up in my Windows 7 subscriber application.

Interestingly enough, I can put the same code on VM B, and the subscriber gets data. Thinking this must be a Windows 7 firewall issue, I swapped VM A's IP address with VM B's, but this did not help.

Using Wireshark, I can see some heartbeat traffic from VM A, but no data. From VM B, I see both the heartbeat traffic and the data. A sanitized Wireshark snippet is below.

Wireshark Output

NDDS_DISCOVERY_PEERS is set to include both multicast and the explicit IP address of the other side of each conversation. The QoS profiles are the same, and RTI Analyzer indicates the Match Analysis was successful (all green).

VM A: NDDS_DISCOVERY_PEERS=udpv4://239.255.0.1,udpv4://127.0.0.1,udpv4://BLAH.203

VM B: NDDS_DISCOVERY_PEERS=udpv4://239.255.0.1,udpv4://127.0.0.1,udpv4://BLAH.203

Windows 7 box: NDDS_DISCOVERY_PEERS=udpv4://239.255.0.1,udpv4://127.0.0.1,udpv4://BLAH.113,udpv4://BLAH.114

With those addresses included in NDDS_DISCOVERY_PEERS, other folks on the network can see DDS traffic from VM A with DDS Spy on their Windows 7 boxes. My Windows 7 box cannot.

The Windows 7 event log does not appear to show the firewall or WFP stopping the data packets.

RTI DDS Spy run from my Windows 7 machine shows that VM A's (0A061071) writers are visible on the network, but no data is being received. It also shows that the readers in my subscriber on my Windows 7 machine are visible, though they show up at an odd IP address.

Bonus question (out of curiosity only, NOT the primary question): why does traffic on my local machine show up in DDS SPY as 192.168.11.1 instead of my machine's IP or even 127.0.0.1?

RTI DDS SPY Output

Main Question: What am I missing?

Update: route print on my Windows 7 box appears to show that I have joined a multicast group with VM A; netsh interface ip show joins seems to concur.

Investigation Update:

  1. I rebooted the VM to no effect.

  2. I rebooted the Windows box to no effect.

  3. I removed the multicast from the NDDS_DISCOVERY_PEERS environment variables on both sides to no effect.

  4. The Windows 7 box has three network interfaces (plus loopback): one LAN connection and two (unrelated) VM adapters. We are working with the LAN connection. The QNX VM has one network interface (plus loopback). Note that the working VM and the broken VM use different Ethernet drivers, as they are slightly different flavors of QNX 6.4.1: the broken one's interface is wm0, and the working one's is en0. I don't believe this is the issue, but it is a difference.

  5. I ran DDS Spy on the "broken" QNX VM while it was playing back, and I got DDS data. I don't have a good way to sniff the network between the VM host and my Windows 7 machine to see whether the data makes it out of the interface, but the transmitted packet count on the QNX VM's Ethernet interface indicates that it is definitely transmitting something, and the Wireshark captures on the Windows 7 machine itself show that at least some traffic is making it through.

  6. Other folks on the LAN here can see the DDS traffic from the "broken" VM, which leads me to believe this is a Windows setup issue rather than a broken VM--I just can't see what it could be. I've re-checked the firewall, to no avail; I would have thought that if it were a firewall issue, the problem would have gone away when I swapped IP addresses between VM A and VM B. In any case, the Windows 7 firewall is currently off, and the problem persists.

  7. Below are several screens of Wireshark output. I skipped a bunch between the third and the fourth, as after the fourth, the traffic tended to look like the bottom of the fourth until the end.

Image 1, Image 2, Image 3 (skipped a bunch here), Image 4 (pretty much continues on like the last 11 lines above)

What else should I try?

Update: To answer Rose's question below: running rtiddsping -publisher on the bad VM and rtiddsping -subscriber on the Windows box works appropriately.

I wonder if this issue is caused by the weird IP address. The address it happens to publish and somehow latch onto belongs to a local VM Ethernet adapter (separate from VM A). See the screenshot below.

Win7 Ipconfig

The address I would like it to attach to is 10.6.6.203. Any way I can specify that?


Solution

More than a year later this happened to me again with a different virtual machine. I had it working yesterday, so I was very suspicious. I scoured all my code changes for the past 24 hours for issues, but didn't find any. Then I decided to see if IT had pushed any patches to my computer.

Guess what? The Windows Firewall had been aggressively updated since the day before: rules missing or changed, etc. The log said packets were being dropped. I opened up the firewall filters a bit, and suddenly everything worked again. I hesitate to close this issue, as I am not 100% sure this was exactly the same as last year, but it feels very similar. I suspect that last year the firewall settings were not logging the packet drops.

Long and short of it: if DDS suddenly stops working, check your firewall settings.

OTHER TIPS

A couple of things to try:

  1. Try running rtiddsping -publisher on the broken VM and rtiddsping -subscriber on Windows. This has two advantages:

    • The data type is small and well-known, so if there is some problem with the data being fragmented due to the different Ethernet drivers, it will not occur with rtiddsping, which may help track down the problem.
    • rtiddsping prints out when the publisher and subscriber discover each other, so you will be able to confirm that discovery is completing correctly on both sides. I am guessing discovery is working, because Analyzer shows both applications, but it is good to confirm.
  2. If you see the same problem with rtiddsping that you see with your application, increase the verbosity with rtiddsping -verbosity 3, and then -verbosity 5. At the highest verbosity level, rtiddsping prints (a lot of) additional output, which may give a hint about what is happening.

To answer your bonus question about Spy: the reason Spy is showing that IP address is that it is one of the addresses being announced as part of discovery. During discovery, a DomainParticipant can announce up to four IP addresses at which it can be reached. Spy chooses one of those to display, but it may not be the address actually being used to communicate with the application. If your machine does not have any interface with the 192.168.11.1 address, that could indicate a larger problem. (It may be normal, though, as long as the correct IP is one of the four that are announced.)
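On the related question of making the participant use the desired LAN address: RTI Connext lets you restrict the built-in UDPv4 transport to specific interfaces via participant properties. The fragment below is a hedged sketch, not verified against 5.0 specifically: the property name allow_interfaces_list comes from RTI's UDPv4 transport documentation (some releases document it as allow_interfaces), and 10.6.6.203 is a placeholder for your LAN adapter's address.

```xml
<!-- Hypothetical participant QoS fragment: restrict the builtin UDPv4
     transport to a single interface. Verify the exact property name
     (allow_interfaces_list vs. allow_interfaces) for your Connext version. -->
<participant_qos>
  <property>
    <value>
      <element>
        <name>dds.transport.UDPv4.builtin.parent.allow_interfaces_list</name>
        <value>10.6.6.203</value>
      </element>
    </value>
  </property>
</participant_qos>
```

With the transport pinned to one interface, the participant should stop announcing the unrelated VM-adapter address during discovery.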

Looking through the packet trace images, there is nothing that is obviously the problem. A few things I notice:

  • There seems to be a normal pattern of heartbeats/ACKNACKs in the final packet trace image. This indicates that there is some bidirectional communication between the two applications.
  • It is difficult to tell from these images whether the DATA being sent from .113 to .203 consists of participant-to-participant messages or real discovery messages - except for two packets: packet #805 and packet #816 (fragments 811-815) look like discovery announcements being sent to .203. This indicates that you have at least four entities (DataWriters or DataReaders) in your application on .113.

So, discovery data is being sent by the application on .113. It is being received and reassembled by Wireshark, but that does not necessarily mean it was received correctly by the application.

Packet #816 has a heartbeat on the end of it. It is possible that packet #818 or #819 is the ACKNACK responding to that heartbeat, but I can't be sure from the image. The next step is to look at those ACKNACKs from .203 to .113 to see if .203 thinks it has received all the discovery data. Here is an example of a HB/ACKNACK pair where a discovery DataReader has received all data:

Submessage: HEARTBEAT
... 
firstSeqNumber: 1
lastSeqNumber: 1

The heartbeat sequence number is 1, which indicates it has only sent an announcement about a single DataReader.

Submessage: ACKNACK
... 
readerSNState: 2/0:
    bitmapBase: 2
    numBits: 0

The readerSNState is 2/0, meaning it has received everything before sequence number 2 and nothing is missing. If there is anything other than 0 in the bitmap, it indicates that the DataReader did not receive some data.
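As a sanity check on that arithmetic, here is a small sketch (plain Python, not an RTI API) of how a readerSNState is interpreted: everything below bitmapBase is acknowledged, and set bits in the bitmap mark missing sequence numbers. Bit ordering is simplified here; the RTPS wire format packs the bitmap into 32-bit words.

```python
def missing_seq_numbers(bitmap_base, num_bits, bitmap):
    """Interpret an RTPS ACKNACK readerSNState (simplified).

    Everything below bitmap_base has been received; bit i of `bitmap`
    (when set) marks sequence number bitmap_base + i as still missing.
    """
    return [bitmap_base + i for i in range(num_bits) if (bitmap >> i) & 1]

# The 2/0 state from the trace: all data before seq 2 received, nothing missing.
print(missing_seq_numbers(2, 0, 0b0))   # -> []

# A reader that missed seq 2 would instead NACK it: base=2, 2 bits, bitmap 0b01.
print(missing_seq_numbers(2, 2, 0b01))  # -> [2]
```

If the ACKNACKs from .203 ever show a non-empty list like the second case, the subscriber side is dropping discovery data.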

If you can confirm that the application is receiving all the discovery data correctly, it will be helpful to use a Wireshark filter that shows only user data, since the images do not distinguish discovery traffic from user traffic.

Wireshark filter for just rtps2 user data: rtps2 && (rtps2.traffic_nature == 3 || rtps2.traffic_nature == 1)

We had a similar issue with this. Here is the environment in a very summarized way:

  • A publisher
  • A working subscriber (laptop)
  • A non-working subscriber (desktop)

Both subscribers ran exactly the same software (the desktop was cloned from the laptop with Clonezilla), but rtiddsspy was blind from the desktop's point of view. The opposite direction worked well, however: the publisher machine's rtiddsspy saw the desktop. The laptop and publisher machines always worked well together, as did the laptop and desktop (they saw each other's subscriptions).

The workaround for this (based on https://community.rti.com/content/forum-topic/discovery-issues) was to increase the MTU on the desktop NIC. Don't ask me why, but it worked.

EDIT: At the beginning, the MTU on the publisher was set to a higher value than on the subscriber, so we changed the subscriber's MTU to match the publisher's.
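The connection to fragmentation can be sketched numerically. This is plain back-of-the-envelope IPv4 arithmetic (RTI also fragments large samples at the RTPS level with DATA_FRAG, which this ignores), assuming a 20-byte IP header, an 8-byte UDP header, and the 8-byte fragment-offset granularity: a publisher with a larger MTU emits datagrams that must be carried as several IP fragments on a smaller-MTU path, and any one lost fragment loses the whole sample.

```python
def ip_fragment_count(udp_payload_len, mtu, ip_header=20, udp_header=8):
    """Rough count of IPv4 fragments for one UDP datagram at a given MTU.

    Each fragment carries its own IP header, and every fragment payload
    except the last must be a multiple of 8 bytes. Illustration only.
    """
    total = udp_payload_len + udp_header       # bytes to carry after the IP header
    per_frag = (mtu - ip_header) // 8 * 8      # usable payload bytes per fragment
    return -(-total // per_frag)               # ceiling division

# A 9000-byte DDS sample on a standard 1500-byte MTU link:
print(ip_fragment_count(9000, 1500))  # -> 7 fragments
print(ip_fragment_count(1000, 1500))  # -> 1 (fits in a single frame)
```

A mismatch like the one described above means one side's "single packet" is the other side's multi-fragment burst, which is exactly where flaky NIC drivers tend to misbehave.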

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow