Question

I'm comparing two large directories on two different external FireWire 800 disks using

diff -r /path/to/directory1 /path/to/directory2 

The directory (including its subdirectories) is 118.2 GB with 30,000 items on one FireWire disk and 119.56 GB with 30,031 items on the other.

I was surprised by the many differences reported in the output, like

Binary files /path/to/directory1/with/subdirectory/file_xyz and /path/to/directory2/with/subdirectory/file_xyz differ

and started comparing some of them individually. When I compare them with

diff /path/to/directory1/with/subdirectory/with/subdirectory/file_xyz /path/to/directory2/with/subdirectory/with/subdirectory/file_xyz

or even

diff  /path/to/directory1/with/subdirectory/ /path/to/directory2/with/subdirectory/ 

diff doesn't report any differences between these files or directories.

What could be a reason why the "large comparison" reports so many differences (or seemingly fails), while the smaller chunk comparison doesn't?


Edits since original post:

  1. The version of diff I'm using is GNU diffutils 2.8.1.
  2. Just a wild guess here, but could it have something to do with the fact that these directories are on external hard drives that could experience some sort of timeout?
  3. I ran another comparison and was again presented with lots of differences between those directories. I then set Energy Saver in System Preferences so that the display does not go to sleep for 1 hour, because I had measured with time diff -r /path/to/directory1 /path/to/directory2 that the diff takes between 45 and 50 minutes to finish. My hard drives and the Mac never go to sleep.
    I then unmounted both drives and remounted them. Ran the diff again and voilà no differences found apart from one file. I manually compared that one reported differing file and found it to be identical.
    This seems to confirm what I found earlier by comparing smaller chunks. It also seems to confirm that some kind of timeout is involved, as previously suspected. Still, if a FireWire drive became unavailable, I would expect diff not to report a difference, but rather to say "file not available" or "Only in /yada/yada/directory: file_xyz".
    Interestingly, the diff before the remount took 28 minutes, after the remount it took almost 51 minutes.
    In light of that: what can I do to prevent something like this from happening?
    Of course one could say to never let the display go to sleep, but that hardly seems to solve the underlying issue. Something else must be going on, I just can't figure out what.

    As an aside, on previous occasions, after those FireWire hard drives had been inactive for some time, I tried to write to a file and got an "error code -50" message (mind you, I didn't see any of that during the diff operation). I could always "resolve" the issue by unmounting and remounting the drives, but I believe there must be an entirely different solution to that:
    Error Code -50 appears during write operation after external hd inactive for a while
    Solutions like those presented here hardly seem to be tackling the underlying problem.
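
Since the same comparison gave different results before and after the remount, one way to check for this kind of instability is to run the recursive diff twice in a row and compare the two reports. The following is only a sketch: diff_twice is a hypothetical helper name, and it assumes the trees are too large to be served entirely from the file system cache on the second pass.

```shell
#!/bin/sh
# diff_twice DIR1 DIR2
# Hypothetical helper: runs "diff -r" twice and reports whether both
# runs produced the same output. Identical trees read from healthy
# disks should always yield STABLE.
diff_twice() {
    dir1=$1
    dir2=$2
    out1=$(mktemp) || return 2
    out2=$(mktemp) || return 2

    # diff exits with 1 when it finds differences; "|| true" keeps
    # that from aborting a script that runs under "set -e".
    diff -r "$dir1" "$dir2" > "$out1" 2>&1 || true
    diff -r "$dir1" "$dir2" > "$out2" 2>&1 || true

    if cmp -s "$out1" "$out2"; then
        echo "STABLE"
        result=0
    else
        echo "UNSTABLE"
        result=1
    fi

    rm -f "$out1" "$out2"
    return "$result"
}
```

Called as diff_twice /path/to/directory1 /path/to/directory2, an UNSTABLE result would suggest that the reported differences come from the read path rather than from the files themselves.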

Solution

For a week I've tested a recommended solution I found at iFixit, where someone had encountered error code -50 on an external HD. It seems to resolve the issue I had. I had come across error code -50 before when copying files, but didn't immediately link it to my failing diffs. I suspected that inactivity on the external hard drives was responsible for the failure, and the article seems to confirm that. To quote the solution from there, should you run into similar problems:

To work around this error you'll want to go into System Preferences->Energy Saver and for both Battery and Power Adapter tick the box next to "Put the hard disk(s) to sleep when possible" on your computer. Yes that means the OS will put all of your drives to sleep when it can but that's the only way I've found to fix the problem.

What seems to happen is that some external drives have firmware which detects inactivity and spins the drives down. If OSX is not configured with the energy saver setting mentioned above then OSX is not expecting the drives to go to sleep. When accessing the drive after it puts itself to sleep obviously something gets messed up and the error -50 is thrown among other problems.

By configuring the OS to put disks to sleep the OS will issue spin-up commands.
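
The disk sleep setting can also be inspected from Terminal with pmset. This is a minimal sketch: show_disksleep is a hypothetical helper name, and as far as I know the Energy Saver checkbox quoted above corresponds to the disksleep timer that pmset reports, though I have not confirmed that for every OS X version.

```shell
#!/bin/sh
# show_disksleep: hypothetical helper name, not part of macOS.
show_disksleep() {
    if command -v pmset >/dev/null 2>&1; then
        # "pmset -g" prints the active power settings; disksleep is the
        # disk spin-down timer in minutes (0 disables disk sleep).
        pmset -g | grep -i disksleep || echo "no disksleep line in pmset output"
    else
        echo "pmset not found (this is probably not macOS)"
    fi
}

# To change the timer from the command line (needs admin rights):
#   sudo pmset -a disksleep 10
```

Running show_disksleep before and after changing the checkbox should show whether the value actually changed.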


The one thing that puzzles or surprises me though is why diff would report actual differences in files, when in fact it couldn't access(?) the file properly.
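
One way to separate real content differences from read problems is to look at exit statuses instead of the printed report: both diff and cmp exit with 0 for identical input, 1 for differences, and a value greater than 1 when something went wrong. The sketch below uses a hypothetical helper, compare_trees, that walks the first tree and classifies each file accordingly. It assumes file names contain no newline characters, and it does not list files that exist only in the second tree.

```shell
#!/bin/sh
# compare_trees DIR1 DIR2
# Hypothetical helper: compares every regular file under DIR1 with the
# file at the same relative path under DIR2, byte by byte.
compare_trees() {
    dir1=$1
    dir2=$2
    (cd "$dir1" && find . -type f) | while IFS= read -r rel; do
        if [ ! -f "$dir2/$rel" ]; then
            echo "MISSING: $rel"
        else
            status=0
            cmp -s "$dir1/$rel" "$dir2/$rel" || status=$?
            case $status in
                0) ;;                         # identical, print nothing
                1) echo "DIFFER: $rel" ;;     # contents really differ
                *) echo "ERROR: $rel" ;;      # cmp could not read a file
            esac
        fi
    done
}
```

A run such as compare_trees /path/to/directory1 /path/to/directory2 that prints ERROR lines would point to an access problem. DIFFER lines that disappear after a remount would suggest the drive returned wrong data without signalling an error.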

Licensed under: CC-BY-SA with attribution
Not affiliated with apple.stackexchange