Question

I'm working on a Mac, with some fairly old files. Different files were created by different programs, so some of them end lines with \r (Mac) and some with \n (Unix). I want to be able to run commands like diff, grep, etc. on these files, but the ones that have \r are treated as one giant line. Does anyone know of a version of diff, grep, etc. that will work correctly with all newlines?

ETA: I'd also like them to be unix utilities so I can use them in scripts, emacs, etc...

Solution

As Jay said, Diff'nPatch seems to be what you are looking for. Alternatively, you can convert all your '\r' line endings to '\n' with a single command like this (GNU sed syntax; the sed that ships with OS X may not understand \r and \n here, in which case tr '\r' '\n' does the same job):

sed -i -e 's/\r/\n/g' filename

or

find . -type f -print0 | xargs -0 -n1 sed -i -e 's/\r/\n/g'

(You may want to filter the list of files in some way in the latter case, or it will be applied to all the files in all subdirectories.)
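If sed gives you trouble, tr is a more portable way to do the same conversion. This is only a sketch: old_mac.txt and converted.txt are placeholder names, and it assumes the file uses bare \r endings (a file with \r\n endings would end up with doubled line breaks).

```shell
#!/bin/sh
# Sketch: convert classic Mac (\r) line endings to Unix (\n) with tr.
# "old_mac.txt" and "converted.txt" are placeholder names.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Build a sample file whose lines end in \r only.
printf 'first\rsecond\rthird\r' > old_mac.txt

# tr cannot edit in place, so write to a new file, then replace the original.
tr '\r' '\n' < old_mac.txt > converted.txt
mv converted.txt old_mac.txt

# grep now sees three separate lines instead of one giant line.
line_count=$(grep -c '' old_mac.txt)
echo "lines: $line_count"
```

Keep a backup of the originals until you have checked the result, since the mv overwrites the input file.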

OTHER TIPS

If you use diff -w it will ignore whitespace in the files, which is probably sufficient for your needs.

EDIT: just realized I misread the post the first time and you're actually looking for a diff that will work with \r line endings. My suggestion would be to convert the files with something like flip that can convert the files to a \n standard format.
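If you would rather not touch the files on disk, one option is to normalize copies and diff those. A rough sketch follows; nldiff is a made-up helper name (not an existing utility), and it assumes the files use either \r or \n endings, not \r\n.

```shell
#!/bin/sh
# nldiff: hypothetical helper that diffs two files after turning \r into \n.
# The originals are left untouched; only temporary copies are converted.
nldiff() {
    tmp_a=$(mktemp) || return 2
    tmp_b=$(mktemp) || return 2
    tr '\r' '\n' < "$1" > "$tmp_a"
    tr '\r' '\n' < "$2" > "$tmp_b"
    diff "$tmp_a" "$tmp_b"
    status=$?
    rm -f "$tmp_a" "$tmp_b"
    return $status
}

# Sample data: the same three lines with Mac and Unix endings.
workdir=$(mktemp -d)
printf 'alpha\rbeta\rgamma\r' > "$workdir/mac.txt"
printf 'alpha\nbeta\ngamma\n' > "$workdir/unix.txt"

if nldiff "$workdir/mac.txt" "$workdir/unix.txt"; then
    echo "same content"
fi
```

The same trick works for grep: tr '\r' '\n' < file | grep pattern.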

EDIT 2: Just found something that looks like what you want - Diff'nPatch:

Diff'nPatch is a port to the Macintosh of the GNU 'diff', 'patch' and 'cmp' utilities. It lets you compare and find differences between two files or folders, collate two files, generate diffs in various formats (normal, context, unidiff, etc.), apply patches, compare files byte by byte. It can handle any type of line endings (mac, unix or windows)

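The diff utility bundled with OS X Lion has an option, --strip-trailing-cr, that may do what you want. Note that it only strips a \r that sits right before a \n (Windows-style endings); it does not split lines on a bare \r. You use it like so:

diff -cpt --strip-trailing-cr a.c b.c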
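To see what the option covers, here is a small sketch that compares a Unix file against a Windows-style copy and a classic-Mac-style copy. It assumes a diff that supports --strip-trailing-cr (GNU diff does); the file names and the compare helper are made up for the example.

```shell
#!/bin/sh
workdir=$(mktemp -d)
printf 'one\r\ntwo\r\n' > "$workdir/dos.txt"   # Windows endings (\r\n)
printf 'one\rtwo\r'     > "$workdir/mac.txt"   # classic Mac endings (\r)
printf 'one\ntwo\n'     > "$workdir/unix.txt"  # Unix endings (\n)

# compare: hypothetical helper that reports whether diff found differences.
compare() {
    if diff "$@" > /dev/null; then echo same; else echo different; fi
}

plain=$(compare "$workdir/dos.txt" "$workdir/unix.txt")
dos=$(compare --strip-trailing-cr "$workdir/dos.txt" "$workdir/unix.txt")
mac=$(compare --strip-trailing-cr "$workdir/mac.txt" "$workdir/unix.txt")

echo "dos vs unix, no option:   $plain"
echo "dos vs unix, with option: $dos"
echo "mac vs unix, with option: $mac"
```

The last comparison still reports a difference, because a file with bare \r endings is a single line as far as diff is concerned.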

The dos2unix command could be helpful in converting your files to a consistent format first. I believe it's available for just about every platform you can think of and can run on lots of files at once. I believe there's a package available for mac.

PHPStorm's diff view's "ignore whitespace" just works. It automatically ignores differences in the carriage return / EOL / newline / what-have-you. You can waste your time fiddling with arcane unix commands or whatever, or you could just get something that actually works and move forward with life.

Using any of the above-mentioned solutions failed on Mountain Lion (including the one marked as the correct answer). All the download links for "Diff'nPatch" failed. (I did find http://webperso.easyconnect.fr/bdesgraupes/tools.html but I really don't like the idea of having to resort to using a diff tool that cannot be invoked from the command line and thus integrated with whatever IDE or VCS tool I might be using, like BBEdit, SourceTree, or SmartSVN, all of which, BTW, failed to ignore newlines with their built-in diff tool.)

Yes, my newlines are \r, but so what? Arrr! If the software is too stupid to realize that \r == \n then I'm just going to use different software that is smart enough.

PHPStorm was the only software that had a diff tool that "just worked" -- which is what I expect Mac software to do. I expect Mac software to just work. I use a Mac so I can do my job instead of learning arcane terminal commands at every turn, which are almost all poorly documented, expecting you to just understand how the commands are supposed to be formatted without any clear examples, so you never know if you're doing it wrong or if the command simply doesn't work just like all other bad software. Take this example from "man diff":

   -I RE  --ignore-matching-lines=RE
          Ignore changes whose lines all match RE.

OK, so having read this, I have no idea what it means. There is no example of its usage. What is "RE"? It doesn't say anywhere.
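For what it's worth, "RE" here stands for a regular expression: a hunk is skipped when every added and removed line in it matches the pattern. A minimal sketch, with invented file names and a pattern chosen only for illustration (lines starting with #):

```shell
#!/bin/sh
workdir=$(mktemp -d)
# Two files that differ only in a comment line.
printf '# generated 2011-01-01\nvalue=1\n' > "$workdir/old.conf"
printf '# generated 2012-06-30\nvalue=1\n' > "$workdir/new.conf"

# compare: hypothetical helper that reports whether diff found differences.
compare() {
    if diff "$@" > /dev/null; then echo same; else echo different; fi
}

# Without -I the comment change is reported.
plain=$(compare "$workdir/old.conf" "$workdir/new.conf")

# With -I '^#' a change is ignored when all of its lines start with '#'.
ignored=$(compare -I '^#' "$workdir/old.conf" "$workdir/new.conf")

echo "without -I: $plain"
echo "with -I:    $ignored"
```

If a hunk mixes matching and non-matching lines, the whole hunk is still reported.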

Then there's this jewel:

  --GTYPE-group-format=GFMT
          Similar, but format GTYPE input groups with GFMT.

   --line-format=LFMT
          Similar, but format all input lines with LFMT.

   --LTYPE-line-format=LFMT
          Similar, but format LTYPE input lines with LFMT.

   LTYPE is `old', `new', or `unchanged'.
          GTYPE is LTYPE or `changed'.

          GFMT may contain:

   %<     lines from FILE1

   %>     lines from FILE2

   %=     lines common to FILE1 and FILE2

   %[-][WIDTH][.[PREC]]{doxX}LETTER
          printf-style spec for LETTER

          LETTERs are as follows for new group, lower case for old group:

   F      first line number

   L      last line number

   N      number of lines = L-F+1

   E      F-1

   M      L+1

          LFMT may contain:

   %L     contents of line

   %l     contents of line, excluding any trailing newline

   %[-][WIDTH][.[PREC]]{doxX}n
          printf-style spec for input line number

          Either GFMT or LFMT may contain:

   %%     %

   %c'C'  the single character C

   %c'\OOO'
          the character with octal code OOO

I could make no sense whatsoever of this passage. What is the "input"? Is it both files or just the "to" file or just the "from" file? What is "similar" referring to? What does "is" mean in the sentence, "GTYPE 'is' LTYPE or `changed'"? Does it mean "may be replaced by"? If so then why isn't "GTYPE" in quotations or brackets etc.? Since no example is given, there is no way to know; the documentation's wording is totally ambiguous. What does "GFMT may contain"... mean? Does "contain" mean that the text replacing the acronym GFMT may contain that? Without a clear example it's completely useless.
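As far as I can tell from GNU diff, LTYPE and GTYPE are placeholders inside the option name (so --LTYPE-line-format becomes --old-line-format, --new-line-format or --unchanged-line-format), and LFMT/GFMT are the format strings you pass as the value. A small sketch, assuming GNU diff and made-up file names:

```shell
#!/bin/sh
workdir=$(mktemp -d)
printf 'keep\nremoved\n' > "$workdir/from.txt"   # FILE1, the "old" side
printf 'keep\nadded\n'   > "$workdir/to.txt"     # FILE2, the "new" side

# %L is the line's contents including its trailing newline.
# "|| true" is there because diff exits with 1 when the files differ.
formatted=$(diff \
    --old-line-format='- %L' \
    --new-line-format='+ %L' \
    --unchanged-line-format='  %L' \
    "$workdir/from.txt" "$workdir/to.txt" || true)

printf '%s\n' "$formatted"
```

Here "old" lines are the ones only in the first file, "new" lines are only in the second, and "unchanged" lines appear in both.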

Why even bother to write a man page if you're going to make it so cryptic and ambiguous it's useless to anyone who doesn't already know how to use the software, basically? At that point, it's not a manual; it's just a quick-reference page for the guys who wrote the software so they can remember how to use it. I guess they assume you'll just read the source-code itself if you want to know what it actually does.

My time is valuable. I'd rather just pay the money to have a piece of software that actually works correctly and has proper documentation.

Because these all failed:

 diff -d --strip-trailing-cr --ignore-all-space --from-file=rest.phtml test.phtml

...failed to ignore \r characters.

 diff -wd --strip-trailing-cr --ignore-all-space --from-file=rest.phtml test.phtml

...failed to ignore \r characters.

 diff -wd --suppress-common-lines --strip-trailing-cr --ignore-all-space --from-file=rest.phtml test.phtml

...failed to ignore \r characters.

 diff -wd test.phtml rest.phtml --suppress-common-lines --strip-trailing-cr --ignore-all-space

...failed to ignore \r characters.

 diff -awd test.phtml rest.phtml --suppress-common-lines --strip-trailing-cr --ignore-all-space

...failed to ignore \r characters.

For that matter, it also failed when the added characters were \n rather than \r.

Where test.phtml ==

foo

bar

and rest.phtml ==

foobar

The "diff" command always gives you something like:


*** 1,2 ****
! foo
! bar
\ No newline at end of file
--- 1 ----
! foobar
\ No newline at end of file

... fail!
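The likely reason is that diff compares line by line: options such as -w ignore whitespace inside a line, but not the line breaks that separate lines. If the goal is only to check whether two files match once every line break is ignored, something like the following sketch might do. strip_breaks is a hypothetical helper, and this is a yes/no check, not a real diff.

```shell
#!/bin/sh
workdir=$(mktemp -d)
printf 'foo\nbar' > "$workdir/test.phtml"
printf 'foobar'   > "$workdir/rest.phtml"

# strip_breaks: hypothetical helper that removes every \r and \n from a file.
strip_breaks() { tr -d '\r\n' < "$1"; }

if [ "$(strip_breaks "$workdir/test.phtml")" = "$(strip_breaks "$workdir/rest.phtml")" ]; then
    verdict="same apart from line breaks"
else
    verdict="different"
fi
echo "$verdict"
```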

I used the following quick fix, which has drawbacks (see below):

1: Do a diff and list only the filenames

diff -r -q dir1/ dir2/

2: Open and save every listed file with the editor that was used; this will change the line endings.

3: Do a regular diff

Drawbacks include:

  • less robust, error prone
  • more work if you have lots of files

This worked for me:

diff -r --ignore-all-space dir1/ dir2/

I am on OS X and have a mix of files from OS X and Windows. Credit: http://www.codealpha.net/514/diff-and-ignoring-spaces-and-end-of-lines-unix-dos-eol/

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow