How to compare files with same names in two different directories using a shell script

StackOverflow https://stackoverflow.com/questions/119788

  •  02-07-2019

Question

Before moving on to use SVN, I used to manage my project by simply keeping a /develop/ directory and editing and testing files there, then moving them to the /main/ directory. When I decided to move to SVN, I needed to be sure that the directories were indeed in sync.

So, what is a good way to write a shell script [ bash ] to recursively compare files with the same name in two different directories?

Note: The directory names used above are examples only. I do not recommend storing your code at the top level :).

Solution

The diff command has a -r option to recursively compare directories:

diff -r /develop /main
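
If all you need is a yes/no answer on whether the two trees are in sync, diff's exit status may be enough: 0 means no differences, 1 means differences were found, and anything higher means diff ran into trouble. A minimal sketch under that assumption; the function name check_sync is made up for illustration:

```shell
#!/usr/bin/env bash
# check_sync DIR1 DIR2 (hypothetical helper)
# Prints a one-line verdict and returns diff's own exit status.
check_sync() {
    local status
    # -r recurses, -q suppresses the per-line changes; output is discarded
    diff -rq "$1" "$2" >/dev/null 2>&1
    status=$?
    case $status in
        0) echo "in sync" ;;
        1) echo "differences found" ;;
        *) echo "diff reported trouble" ;;
    esac
    return "$status"
}

# Usage (paths are the sample ones from the question):
#   check_sync /develop /main
```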

OTHER TIPS

diff -rqu /develop /main

That way it only reports which files differ or are missing (that is what -q does), not the changes themselves :)

If you want to see only new/missing files:

diff -rqu /develop /main | grep "^Only"

If you want to get them bare (as plain paths):

diff -rqu /develop /main | sed -rn "s|^Only in ([^:]+): |\1/|p"

The diff I have available allows recursive differences:

diff -r main develop

But with a shell script:

( cd main ; find . -type f -exec diff {} ../develop/{} ';' )

[I read somewhere that answering your own questions is OK, so here goes :) ]

I tried this, and it worked pretty well

[/]$ cd /develop/
[/develop/]$ find . -type f | while IFS= read -r line; do diff -ruN "/main/$line" "$line"; done | less

You can choose to compare only specific files [e.g., only the .php ones] by changing the above line to:

[/]$ cd /develop/
[/develop/]$ find . -type f -name "*.php" | while IFS= read -r line; do diff -ruN "/main/$line" "$line"; done | less
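
A variation on the same idea, as a sketch rather than a drop-in replacement: letting find run diff itself through -exec avoids the read loop entirely, so file names with spaces or other odd characters need no special handling. The function name diff_tree and its arguments are assumptions for illustration, and the second directory has to be an absolute path because the function changes into the first one.

```shell
#!/usr/bin/env bash
# diff_tree SRC DST_ABS [NAME_PATTERN]   (hypothetical helper)
# For every regular file under SRC matching NAME_PATTERN (default: all files),
# show a unified diff against the file with the same relative path under DST_ABS.
# The body runs in a subshell, so the caller's working directory is untouched.
diff_tree() (
    src=$1; dst=$2; pattern=${3:-*}
    cd "$src" || exit 2
    # sh -c receives DST as $1 and the found path as $2; -N treats a missing
    # counterpart as an empty file, as in the loop above.
    find . -type f -name "$pattern" \
        -exec sh -c 'diff -uN "$1/$2" "$2"' sh "$dst" {} \;
)

# Usage (sample paths from the question):
#   diff_tree /develop /main '*.php' | less
```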

Any other ideas?

Here is an example of a (somewhat messy) script of mine, dircompare.sh, which will:

  • sort files and directories into arrays, depending on which directory they occur in (or both), in two recursive passes
  • sort the files that occur in both directories again into two arrays, depending on whether diff -q reports that they differ
  • show and compare timestamps for those files that diff claims are equal

Hope it can be found useful - Cheers!
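
The script itself is not reproduced in this thread. What follows is not dircompare.sh but a much smaller sketch of the three steps listed above, with assumptions stated plainly: it prints each verdict directly instead of collecting arrays, it handles regular files only, and it does not cope with newlines in file names. The name classify_trees is made up for illustration.

```shell
#!/usr/bin/env bash
# classify_trees DIR_A DIR_B   (hypothetical helper, not the original script)
# Prints one line per file: only-A, only-B, differ, or same (with a note on
# which copy has the newer modification time).
classify_trees() (
    a=$1; b=$2

    # Pass 1: walk A. Each file is missing from B, different, or identical.
    (cd "$a" && find . -type f) | while IFS= read -r rel; do
        rel=${rel#./}
        if [ ! -f "$b/$rel" ]; then
            echo "only-A $rel"
        elif ! diff -q "$a/$rel" "$b/$rel" >/dev/null; then
            echo "differ $rel"
        elif [ "$a/$rel" -nt "$b/$rel" ]; then
            echo "same (A newer) $rel"
        elif [ "$a/$rel" -ot "$b/$rel" ]; then
            echo "same (B newer) $rel"
        else
            echo "same (same mtime) $rel"
        fi
    done

    # Pass 2: walk B to catch the files that A lacks.
    (cd "$b" && find . -type f) | while IFS= read -r rel; do
        rel=${rel#./}
        [ -f "$a/$rel" ] || echo "only-B $rel"
    done
)

# Usage: classify_trees testdir1 testdir2
```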

EDIT2: (Actually, it works fine with remote files. The problem was an unhandled Ctrl-C signal during a diff operation between a local and a remote file, which can take a while. The script is now updated with a trap to handle that; however, I am leaving the previous edit below for reference):

EDIT: ... except it seems to crash my server for a remote ssh directory (which I tried to access via ~/.gvfs). So this is not bash anymore, but I guess an alternative is to use rsync. Here's an example:

$ # get example revision 4527 as testdir1
$ svn co https://openbabel.svn.sf.net/svnroot/openbabel/openbabel/trunk/data@4527 testdir1

$ # get earlier example revision 2729 as testdir2
$ svn co https://openbabel.svn.sf.net/svnroot/openbabel/openbabel/trunk/data@2729 testdir2

$ # use rsync to generate a list 
$ rsync -ivr --times --cvs-exclude --dry-run testdir1/ testdir2/
sending incremental file list
.d..t...... ./
>f.st...... CMakeLists.txt
>f.st...... MACCS.txt
>f..t...... SMARTS_InteLigand.txt
...
>f.st...... atomtyp.txt
>f+++++++++ babel_povray3.inc
>f.st...... bin2hex.pl
>f.st...... bondtyp.h
>f..t...... bondtyp.txt
...

Note that:

  • To get the above, you mustn't forget trailing slashes / at the end of directory names in rsync
  • --dry-run - simulate only, don't update/transfer files
  • -r - recurse into directories
  • -v - verbose (but not related to file changes info)
  • --cvs-exclude - ignore .svn files
  • -i - "--itemize-changes: output a change-summary for all updates"

Here is a brief excerpt of man rsync that explains the information shown by -i (for instance, the >f.st...... strings above):

The  "%i"  escape  has a cryptic output that is 11 letters long.
The general format is like the string YXcstpoguax,  where  Y  is
replaced  by the type of update being done, X is replaced by the
file-type, and the other letters represent attributes  that  may
be output if they are being modified.

The update types that replace the Y are as follows:

o      A  < means that a file is being transferred to the remote
       host (sent).

o      A > means that a file is being transferred to  the  local
       host (received).

o      A  c  means that a local change/creation is occurring for
       the item (such as the creation  of  a  directory  or  the
       changing of a symlink, etc.).

...
The file-types that replace the X are: f for a file, a d  for  a
directory,  an  L for a symlink, a D for a device, and a S for a
special file (e.g. named sockets and fifos).

The other letters in the string above  are  the  actual  letters
that  will be output if the associated attribute for the item is
being updated or a "." for no change.  Three exceptions to  this
are:  (1)  a newly created item replaces each letter with a "+",
(2) an identical item replaces the dots with spaces, and (3)  an
....
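
The first two characters and the attribute letters described above can be decoded mechanically. A small sketch, assuming the layout from the excerpt; the function name explain_item is made up, and the "." update type ("not updated") comes from the same man page section rather than from the part quoted here:

```shell
#!/usr/bin/env bash
# explain_item ITEM   (hypothetical helper)
# Turns an rsync -i string such as ">f.st......" into a short description.
explain_item() (
    item=$1
    y=$(printf '%s' "$item" | cut -c1)      # update type
    x=$(printf '%s' "$item" | cut -c2)      # file type
    # Remaining positions: drop "." (no change) and spaces (identical item)
    attrs=$(printf '%s' "$item" | cut -c3- | tr -d '. ')

    case $y in
        '<') action="sent" ;;
        '>') action="received" ;;
        c)   action="local change/creation" ;;
        .)   action="not updated" ;;
        *)   action="other ($y)" ;;
    esac
    case $x in
        f) kind="file" ;;
        d) kind="directory" ;;
        L) kind="symlink" ;;
        D) kind="device" ;;
        S) kind="special file" ;;
        *) kind="unknown ($x)" ;;
    esac
    case $attrs in
        '')     detail="no attribute changes" ;;
        *[!+]*) detail="attributes changing: $attrs" ;;
        *)      detail="newly created" ;;       # only "+" characters left
    esac
    echo "$kind: $action; $detail"
)

# Usage: explain_item '>f.st......'
```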

A bit cryptic, indeed - but at least it shows basic directory comparison over ssh. Cheers!

The classic (System V Unix) answer would be dircmp dir1 dir2, a shell script that first listed the files found in dir1 but not dir2, or in dir2 but not dir1 (the first page of output, paginated with headings by the pr command), and then compared each common file and reported a verdict (same, different, and directory were the most common results).

This seems to be in the process of vanishing - I have an independent reimplementation of it available if you need it. It's not rocket science (cmp is your friend).
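
To illustrate the "cmp is your friend" point, here is a rough imitation of that behaviour. It is not the System V script or the reimplementation mentioned above, only a sketch assuming regular files and no newlines in file names; the name dircmp_lite is made up, and pagination through pr is left out.

```shell
#!/usr/bin/env bash
# dircmp_lite DIR1 DIR2   (hypothetical name)
# Lists files unique to each side first, then a same/different verdict for
# every file present on both sides.
dircmp_lite() (
    d1=$1; d2=$2
    LC_ALL=C; export LC_ALL              # sort and comm must agree on ordering
    list1=$(mktemp) || exit 2
    list2=$(mktemp) || exit 2
    trap 'rm -f "$list1" "$list2"' EXIT  # clean up when the subshell exits

    (cd "$d1" && find . -type f | sort) > "$list1"
    (cd "$d2" && find . -type f | sort) > "$list2"

    # comm -23: lines only in the first list; comm -13: only in the second
    comm -23 "$list1" "$list2" | while IFS= read -r f; do echo "only in $d1: $f"; done
    comm -13 "$list1" "$list2" | while IFS= read -r f; do echo "only in $d2: $f"; done

    # comm -12: lines in both lists; cmp -s is silent and answers via exit status
    comm -12 "$list1" "$list2" | while IFS= read -r f; do
        if cmp -s "$d1/$f" "$d2/$f"; then
            echo "same: $f"
        else
            echo "different: $f"
        fi
    done
)

# Usage: dircmp_lite dir1 dir2
```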

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow