Question

For a homework assignment I created a simple compression/decompression program that makes use of a naive implementation of run-length encoding. I've gotten my program working; compressing and decompressing any text file with a pretty large number of characters (e.g. the program source) works flawlessly. As an experiment I tried to compress/decompress the binary of the compression program itself. This resulted in a file that was much smaller than the original binary, and is obviously un-runnable. What is causing this data-loss?

My assumption was that it's related to how binary files are represented, but I can't figure much out past that.


Solution

Possible issues:

  • Your program opens the binary file in text mode, which mangles '\r' and '\n' bytes through newline translation
  • Your program mishandles zero bytes, treating them as string terminators ('\0') rather than as data in their own right (see the byte-safe sketch after this list)
  • Your program uses char (which may actually be signed char) for the data bytes and only works correctly with non-negative values; this covers the ASCII characters of English text, but fails on arbitrary byte values, which may be negative
  • Your program has an overflow somewhere that only shows up on big files
  • Your program has some other data-dependent bug

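Since the original program isn't shown, here is a minimal sketch of a byte-safe RLE encoder that avoids the pitfalls above: the files are opened in binary mode, bytes are read with fgetc into an int (so every value from 0 to 255, including zero bytes, is ordinary data), and run lengths are tracked as counts rather than derived from string functions. The function name rle_encode and the (count, byte) output format are assumptions for illustration, not the assignment's required format.

```c
#include <stdio.h>

/* Hypothetical sketch: encode src into dst as (count, byte) pairs.
 * Nothing here uses C string functions, so '\0' and bytes >= 0x80
 * are handled like any other value. */
static int rle_encode(FILE *src, FILE *dst)
{
    int prev = fgetc(src);                    /* int, so EOF stays distinguishable */
    while (prev != EOF) {
        unsigned char count = 1;
        int cur;
        while ((cur = fgetc(src)) == prev && count < 255)
            count++;
        unsigned char pair[2] = { count, (unsigned char)prev };
        if (fwrite(pair, 1, 2, dst) != 2)
            return -1;
        prev = cur;                           /* next run starts here (or EOF) */
    }
    return 0;
}

int main(int argc, char **argv)
{
    if (argc != 3) return 1;
    FILE *src = fopen(argv[1], "rb");         /* "b": binary mode */
    FILE *dst = fopen(argv[2], "wb");
    if (!src || !dst) return 1;
    int rc = rle_encode(src, dst);
    fclose(src);
    fclose(dst);
    return rc == 0 ? 0 : 1;
}
```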
Other Tips

If the platform is Linux (as the question is tagged), there is no difference between binary and text modes, so that should not be the cause; even so, the files should still be opened in binary mode.
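With the C standard library that just means adding the "b" flag to fopen; it is a no-op on Linux but prevents newline translation on platforms that perform it (the file names below are placeholders):

```c
#include <stdio.h>

int main(void)
{
    FILE *in  = fopen("program.bin", "rb");   /* binary read: no CR/LF translation */
    FILE *out = fopen("program.rle", "wb");   /* binary write */
    if (!in || !out) {
        perror("fopen");
        return 1;
    }
    /* ... compress in -> out ... */
    fclose(in);
    fclose(out);
    return 0;
}
```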

I suspect that the problem is your program treats '\0' characters as terminators (or otherwise handles them specially) instead of as valid data.
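To see why that loses data, compare a length derived from string functions with the real byte count (a minimal, self-contained demonstration, not taken from the question's code): strlen stops at the first zero byte, so everything after it is silently dropped.

```c
#include <stdio.h>
#include <string.h>

int main(void)
{
    /* A buffer with an embedded zero byte, as compiled binaries routinely contain. */
    unsigned char data[] = { 'E', 'L', 'F', 0x00, 0x7f, 0x45 };

    size_t real_len = sizeof data;                  /* 6: the true byte count */
    size_t str_len  = strlen((const char *)data);   /* 3: stops at the '\0'   */

    printf("real length: %zu, strlen sees: %zu\n", real_len, str_len);
    return 0;
}
```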
