Question

I am sending packets over a TCP socket between a Linux CentOS 4 machine and a Windows XP machine running Interix with Gentoo. When the packet is received by Interix, about 10% of the characters are consistently scrambled at exactly the same offsets from the beginning of the packet. On the sending Linux side, the packet has the correct contents:

-----BEGIN PUBLIC KEY----- 
MIIBojCCARcGByqGSM4+AgEwggEKAoGBAP//////////yQ/aoiFowjTExmKLgNwc
                                        ^   ^^^^^^^^^^^^^   
0SkCTgiKZ8x0Agu+pjsTmyJRSgh5jjQE3e+VGbPNOkMbMCsKbfJfFDdP4TVtbVHC
^^^^^^^^
ReSFtXZiXn7G9ExC6aY37WsL/1y29Aa37e44a/taiZ+lrp8kEXxLH+ZJKGZR7OZT 
gf//////////AgECAoGAf//////////kh+1RELRhGmJjMUXAbg5olIEnBEUz5joB 
Bd9THYnNkSilBDzHGgJu98qM2eadIY2YFYU2+S+KG6fwmra2qOEi8kLauzEvP2N6 
JiF00xv2tYX/rlt6A1v29xw1/a1Ez9LXT5IIviWP8ySUMyj2cynA//////////8D 
gYQAAoGAKcjWmS+h/a6xY6HfNeVBk+vU4ZQoi4ROBT8NXdiFQUeLwT/WpE/8oAxn 
KCOssVcoF54bF8JlEL0McWjQUzMrqoQedizALRRdH7kTUM/yqZZdxLgRFmiFDUXT 
XxsFFB5hlLpMqy9lqpNMN8+e5m9ISgu8zHMlTBQXsnwds0VkbeU=
-----END PUBLIC KEY-----

On Interix, however, the packet contents are slightly scrambled, although most of the packet is correct:

-----BEGIN PUBLIC KEY-----
MIIBojCCARcGByqGSM4+AgEwggEKAoGBAP//////y////iFowjTExQ/aomKLgNwc
                                        ^   ^^^^^^^^^^^^^ 
KigTCkS0Z8x0Agu+pjsTmyJRSgh5jjQE3e+VGbPNOkMbMCsKbfJfFDdP4TVtbVHC
^^^^^^^^
ReSFtXZiXn7G9ExC6aY37WsL/1y29Aa37e44a/taiZ+lrp8kEXxLH+ZJKGZR7OZT 
gf//////////AgECAoGAf//////////kh+1RELRhGmJjMUXAbg5olIEnBEUz5joB 
Bd9THYnNkSilBDzHGgJu98qM2eadIY2YFYU2+S+KG6fwmra2qOEi8kLauzEvP2N6 
JiF00xv2tYX/rlt6A1v29xw1/a1Ez9LXT5IIviWP8ySUMyj2cynA//////////8D 
gYQAAoGAKcjWmS+h/a6xY6HfNeVBk+vU4ZQoi4ROBT8NXdiFQUeLwT/WpE/8oAxn 
KCOssVcoF54bF8JlEL0McWjQUzMrqoQedizALRRdH7kTUM/yqZZdxLgRFmiFDUXT 
XxsFFB5hlLpMqy9lqpNMN8+e5m9ISgu8zHMlTBQXsnwds0VkbeU=
-----END PUBLIC KEY-----

I've marked the differences with the ^ characters above. A few more characters around the y may also have moved, since the repeated / characters would hide any movement within that run.

This code works fine between several platform pairs:

  • Linux and Linux
  • Linux and BSD
  • Linux and Cygwin

Could this be a bug in the Interix and Gentoo code? I'm running on Windows XP, Interix v3.5. I notice that all the right characters are present, but their order is consistently scrambled: some portions are reversed, others are cut and reinserted in a different place. The packet is read on the receiving side with ::read() on the TCP socket file descriptor. There is a lot of code in play here, so I'm not sure which portions would be most relevant to include, but I will try to add more code if specific requests are made.

const int fd; // Passed in by caller.
char *buf;    // Passed in by caller.

size_t want = count; // This value is 625 for the packet in question.
// As ::read() is called, got is adjusted, until the whole packet is read.
size_t got = 0;

while (got < want) {
  // We call ::select() to ensure bytes are available before calling ::read().
  ssize_t result = ::read(fd, buf, want - got);

  if (result < 0) {
    // Handle error (not getting called, so omitted).
  } else {
    if (result != 0) {
      // We are coming in here in one try and got is set to 625, the amount we want...
      // Not an error, increment the byte counter 'got' & the read pointer,
      // buf.
      got += result;
      buf += result;
    } else { // EOF: read() returned zero.
      eof = true; // The peer closed the connection.
      break;
    }
  }
}

What experiments might I perform to help nail down where the error is coming from?

Solution

Mystery solved! The issue was that off_t was 32 bits wide on the Windows XP machine and 64 bits wide on the CentOS machine. Before the packet is sent, the structure it carries, which includes some off_t members, is converted from host to network byte order (little endian to big endian); when the Windows machine receives the packet, it converts the structure back from network to host order. Because the two machines disagreed on the structure's memory layout, the conversion on the receiving side did not line up with the fields as they were sent, and I got the scrambling seen above.

I resolved the issue by using my own type, soff_t, everywhere; it is 64 bits wide on both machines.
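
As a rough illustration of the same idea (not the code from this answer), the sketch below fixes the width with a typedef and writes the value byte by byte in network order, so the result does not depend on the host's off_t width or endianness. The names wire_off_t, put_be64 and get_be64 are hypothetical, and the sketch assumes a compiler that provides <cstdint>.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical wire type: always 64 bits, whatever the platform's off_t is.
typedef std::int64_t wire_off_t;

// Store value into buf as 8 bytes, most significant byte first
// (network byte order). Independent of host endianness.
inline void put_be64(unsigned char *buf, wire_off_t value) {
  std::uint64_t v = static_cast<std::uint64_t>(value);
  for (int i = 7; i >= 0; --i) {
    buf[i] = static_cast<unsigned char>(v & 0xFF);
    v >>= 8;
  }
}

// Read 8 bytes in network byte order back into a 64-bit value.
// Assumes a two's complement host for negative values.
inline wire_off_t get_be64(const unsigned char *buf) {
  std::uint64_t v = 0;
  for (int i = 0; i < 8; ++i) {
    v = (v << 8) | buf[i];
  }
  return static_cast<wire_off_t>(v);
}
```

Both sides then agree on exactly 8 bytes per offset, so a 32-bit off_t on one machine can no longer shift the fields that follow it.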

However, I then ran into another issue: the compiler did not pack a structure the same way on both machines. On Windows it inserted 4 bytes of padding to make the long long 8-byte aligned, whereas on CentOS it did not:

typedef struct Option
{
  char        _otherStuff[56];
  int         _cpuFreq;   
  int         _bufSize;
  soff_t      _fileSize;    // Original bug fixed by forcing these 8 bytes wide.
  soff_t      _seekTo;      // Original bug fixed by forcing these 8 bytes wide.
  int         _optionBits;
  int         _padding;     // To fix next bug, I added this 4 bytes
  long long   _mtime;
  long long   _mode;
} __attribute__ ((aligned(1), packed)) Option;

I had used __attribute__ ((aligned(1), packed)) to force the packing to be consistent and dense, but on Windows XP this was not, or could not be, honored. I solved this by adding the _padding member, which forces the next 8-byte member to be 8-byte aligned on CentOS and thus agree with Windows XP.
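
One way to catch this kind of mismatch at build time is to assert the expected offsets and size on every platform. The sketch below is an assumption-laden illustration rather than the original code: OptionWire is a hypothetical mirror of the structure above that uses fixed-width types, and it assumes a C++11 compiler for static_assert.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical mirror of Option with fixed-width members and explicit
// padding, so natural alignment already gives the intended layout.
struct OptionWire {
  char         otherStuff[56];
  std::int32_t cpuFreq;
  std::int32_t bufSize;
  std::int64_t fileSize;
  std::int64_t seekTo;
  std::int32_t optionBits;
  std::int32_t padding;    // Keeps mtime on an 8-byte boundary.
  std::int64_t mtime;
  std::int64_t mode;
};

// Compilation fails on any platform whose layout differs from the wire format.
static_assert(offsetof(OptionWire, fileSize) == 64, "fileSize offset");
static_assert(offsetof(OptionWire, optionBits) == 80, "optionBits offset");
static_assert(offsetof(OptionWire, mtime) == 88, "mtime offset");
static_assert(sizeof(OptionWire) == 104, "OptionWire size");
```

Because every member already sits at a multiple of its own size, the layout should come out the same whether a compiler aligns 64-bit integers to 4 or to 8 bytes, and the assertions flag any platform where it does not.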

Other tips

I would say you have a concurrency bug on 'buf', or possibly a duplicate free() or a re-use after free().

License: CC-BY-SA with attribution
Not affiliated with StackOverflow