Question

I have a script that checks multiple directories and compares them to expanded tarballs of the same directories elsewhere.

I am using diff -r -q, and I would like diff to stop as soon as it finds any difference in the recursive run, instead of continuing through the remaining directories.

All help appreciated!

Thank you

@bazzargh I did try it as you suggested, like this:

for file in $(find $dir1 -type f); 
do if [[ $(diff -q $file ${file/#$dir1/$dir2}) ]]; 
then echo differs: $file > /tmp/$runid.tmp 2>&1; break; 
else echo same: $file > /dev/null; fi; done 

But this only works for files that exist in both directories. If a file is missing from one of them, I get no information about that. Also, the directories I am working with contain over 300,000 files, so running find and then a separate diff for every file seems like a lot of overhead.

I would like something like this to work, with an elif statement that checks whether $runid.tmp contains data and breaks if it does. I added 2> after the first if condition so that stderr is sent to the $runid.tmp file.

for file in $(find $dir1 -type f);
do if [[ $(diff -q $file ${file/#$dir1/$dir2}) ]] 2> /tmp/$runid.tmp;
then echo differs: $file > /tmp/$runid.tmp 2>&1; break;
elif [[ -s /tmp/$runid.tmp ]];
then echo differs: $file >> /tmp/$runid.tmp 2>&1; break;
else echo same: $file > /dev/null; fi; done

Would this work?


Solution

You can loop over the files with 'find' and break as soon as two of them differ, e.g. for directories foo and bar:

for file in $(find foo -type f); do if [[ $(diff -q "$file" "${file/#foo/bar}") ]]; then echo "differs: $file"; break; else echo "same: $file"; fi; done

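NB: this will not detect files or directories in 'bar' that do not exist in 'foo'. A file that exists only in 'foo' is also reported as "same", because diff writes that error to stderr and the test only looks at stdout.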
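A minimal sketch of a variant that tests diff's exit status instead of its output, so that a file missing from the second directory also stops the loop. The function name first_diff is made up for illustration, and the sketch assumes bash, a find that supports -print0, and directory arguments without a trailing slash. Like the loop above, it still does not notice files that exist only in the second directory.

```shell
#!/usr/bin/env bash
# first_diff DIR1 DIR2 (hypothetical helper): walk the files under DIR1 and
# stop at the first one that differs from, or has no counterpart in, DIR2.
# Returns 1 if such a file was found, 0 if every file matched.
first_diff() {
  local dir1=$1 dir2=$2 file
  # NUL-delimited names keep paths with spaces or newlines intact.
  while IFS= read -r -d '' file; do
    # diff exits 0 when the files are identical, 1 when they differ,
    # and 2 on trouble (for example when the second file is missing).
    if ! diff -q "$file" "$dir2/${file#"$dir1"/}" >/dev/null 2>&1; then
      echo "differs or missing: $file"
      return 1
    fi
  done < <(find "$dir1" -type f -print0)
  return 0
}
```

It could be called as first_diff foo bar, with the return status deciding whether the rest of the script carries on.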

Edited to add: I just realised I overlooked the really obvious solution:

diff -rq foo bar | head -n1
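For use inside a script, the pipeline can be wrapped so that its result is usable as a condition. This is only a sketch; the function name dirs_differ is hypothetical. One caveat: head exits after the first line, but diff is only stopped when it next tries to write to the closed pipe, so if there are no further differences it will still walk the rest of the tree.

```shell
#!/usr/bin/env bash
# dirs_differ DIR1 DIR2 (hypothetical helper): print the first line that
# diff reports and return 0; return 1 if diff reported nothing at all.
dirs_differ() {
  local first
  # 2>&1 also captures complaints such as unreadable files.
  # "|| true": only the captured output matters, not the pipeline status.
  first=$(diff -rq "$1" "$2" 2>&1 | head -n1) || true
  if [[ -n $first ]]; then
    echo "$first"
    return 0
  fi
  return 1
}
```

A call such as "if dirs_differ foo bar; then exit 1; fi" would then abort the run on the first reported difference.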

OTHER TIPS

It's not 'diff', but with 'awk' you can compare two (or more) files and exit with a non-zero status when they have a different line.

Try something like this (sorry, it's a little rough):

awk '{ h[$0] = ! h[$0] } END { for (k in h) if (h[k]) exit 1 }' file1 file2
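A small sketch of how the toggle trick might be used as a test in a script, assuming the END block exits with status 1 when a line is left over. The function name same_lines is made up. Note that this compares the files as sets of lines: order is ignored, and a line that occurs an even number of times in total is treated as matched, so it is looser than diff.

```shell
#!/usr/bin/env bash
# same_lines FILE1 FILE2 (hypothetical helper): succeed when every distinct
# line occurs an even number of times across both files, fail otherwise.
same_lines() {
  # Each occurrence of a line flips its flag; a flag still set in END
  # means the line was seen an odd number of times.
  awk '{ h[$0] = !h[$0] } END { for (k in h) if (h[k]) exit 1 }' "$1" "$2"
}
```

For example, "same_lines file1 file2 || break" would leave an enclosing loop at the first pair with an unmatched line.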

Sources are here and here.

edit: to break out of the loop when two files have the same line, you may have to do the loop in awk. See here.

You can try the following:

#!/usr/bin/env bash

# Determine directories to compare
d1='./someDir1'
d2='./someDir2'

# Loop over the file lists and diff corresponding files
while IFS= read -r line; do

  # Split the 3-column `comm` output into indiv. variables.
  lineNoTabs=${line//$'\t'}
  numTabs=$(( ${#line} - ${#lineNoTabs} ))

  d1Only='' d2Only='' common=''
  case $numTabs in
    0)
      d1Only=$lineNoTabs
      ;;
    1)
      d2Only=$lineNoTabs
      ;;
    *)
      common=$lineNoTabs
      ;;
  esac

  # If a file exists in both directories, compare them,
  # and exit if they differ, continue otherwise
  if [[ -n $common ]]; then
    diff -q "$d1/$common" "$d2/$common" || {
       echo "EXITING: Diff found: '$common'" 1>&2;
       exit 1; }
  # Deal with files unique to either directory.
  elif [[ -n $d1Only ]]; then
    echo "File '$d1Only' only in '$d1'."
  else # implies: if [[ -n $d2Only ]]; then
    echo "File '$d2Only' only in '$d2'."
  fi

  # Note: The `comm` command below is CASE-SENSITIVE, which means:
  #   - The input directories must be specified case-exact.
  #     To change that, add `I` after the last `|` in _both_ `sed` commands.
  #   - The paths and names of the files diffed must match in case too.
  #     To change that, insert `| tr '[:upper:]' '[:lower:]'` before _both_
  #     `sort` commands.

done < <(comm \
  <(find "$d1" -type f | sed 's|'"$d1/"'||' | sort) \
  <(find "$d2" -type f | sed 's|'"$d2/"'||' | sort))

The approach builds, for each input directory, a list of files (using find) with relative paths (using sed to remove the root path), sorts the lists, and compares them with comm. comm produces 3-column, tab-separated output that indicates which lines (and therefore files) are unique to the first list, which are unique to the second list, and which the two lists have in common.

Thus, the values in the 3rd column can be diffed and action taken if they're not identical. Also, the 1st and 2nd-column values can be used to take action based on unique files.

The somewhat complicated splitting of the 3 column values output by comm into individual variables is necessary, because:

  • read will treat multiple tabs in sequence as a single separator
  • comm outputs a variable number of tabs; e.g., if there's only a 1st-column value, no tab is output at all.
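To make the tab counting concrete, here is a minimal sketch that labels each line of comm output by its column. The function name classify and its labels are invented for illustration; the counting logic is the same as in the script above, and it assumes file names that contain no tabs.

```shell
#!/usr/bin/env bash
# classify (hypothetical helper): read `comm` output on stdin and label each
# line by column, using the number of tabs in the line:
#   0 tabs -> only in the first list, 1 tab -> only in the second, 2 -> common
classify() {
  local line stripped tab=$'\t'
  while IFS= read -r line; do
    stripped=${line//"$tab"/}   # the same line with all tabs removed
    case $(( ${#line} - ${#stripped} )) in
      0) echo "only-first: $stripped" ;;
      1) echo "only-second: $stripped" ;;
      *) echo "common: $stripped" ;;
    esac
  done
}
```

For the sorted lists a, b and b, c, running comm <(printf 'a\nb\n') <(printf 'b\nc\n') | classify prints "only-first: a", "common: b" and "only-second: c", one per line.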

I got a solution to this thanks to @bazzargh.

I use this code in my script and now it works perfectly.

for file in $(find ${intfolder} -type f);
do if [[ $(diff -q $file ${file/#${intfolder}/${EXPANDEDROOT}/${runid}/$(basename ${intfolder})}) ]] 2> ${resultfile}.tmp;
then echo differs: $file > ${resultfile}.tmp 2>&1; break;
elif [[ -s ${resultfile}.tmp ]];
then echo differs: $file >> ${resultfile}.tmp 2>&1; break;
else echo same: $file > /dev/null;
fi; done

thanks!

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow