Comparing large text files

https://stackoverflow.com/questions/23510603

16-07-2023

Question

I have to compare two large text files to determine the differences between file1 and file2. Each file contains domain names, one per line, and is around 2GB in size.

Content sample:

domain1.com
domain2.com

I would prefer to use a Unix tool to get the results. Basically, I want to output all lines from file1 that are not present in file2. What I am ultimately trying to accomplish is to determine the list of expired domains.

Thanks in advance.

Solution

As a first cut I would try the following:

comm -13 <( sort file1 ) <( sort file2 )

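This will give you the lines occurring only in file2. For the lines occurring only in file1, which is what the question asks for, use -23 instead of -13. You may be surprised how fast this actually is, considering how little effort that one-liner takes to type.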

If this is something you'll be doing frequently, it is a good idea to keep the files sorted; then you can just run the comm. If your files contain many duplicates, you may also save some time by adding a | uniq after the sort.
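
A minimal sketch of the keep-the-files-sorted workflow, using tiny sample files in place of the real 2GB lists. The output names (file1.sorted, file2.sorted, only_in_file1.txt) are hypothetical, and LC_ALL=C is an assumption added here so that sort and comm agree on byte-order collation. It keeps the lines unique to file1, the direction the question asks about.

```shell
# Sample data standing in for the real files (hypothetical contents).
printf '%s\n' domain3.com domain1.com domain2.com domain1.com > file1
printf '%s\n' domain2.com domain4.com > file2

# Sort once and drop duplicates; sort -u is equivalent to sort | uniq.
LC_ALL=C sort -u file1 > file1.sorted
LC_ALL=C sort -u file2 > file2.sorted

# -2 suppresses lines unique to file2, -3 suppresses lines common to both,
# leaving only the lines that appear in file1 but not in file2.
LC_ALL=C comm -23 file1.sorted file2.sorted > only_in_file1.txt
cat only_in_file1.txt
```

Once the sorted copies exist, later comparisons only need the comm step.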

Other tips

You can try diff

diff file1 file2
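
Plain diff output mixes both directions and adds change markers, so a sketch of one way to keep only the lines missing from file2 is shown here. It assumes both files are already sorted the same way (diff is order-sensitive), and the output name only_in_file1_diff.txt is hypothetical. On files of this size diff may need considerably more memory than comm.

```shell
# Sample data, already sorted (hypothetical contents).
printf '%s\n' domain1.com domain2.com domain3.com > file1
printf '%s\n' domain2.com domain4.com > file2

# diff marks lines found only in the first file with "< ";
# sed prints just those lines with the marker stripped.
# diff exits with status 1 when the files differ, so "|| true" keeps
# scripts running under "set -e -o pipefail" from aborting here.
{ diff file1 file2 || true; } | sed -n 's/^< //p' > only_in_file1_diff.txt
cat only_in_file1_diff.txt
```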

Using grep, you can specify a file to read the patterns from (-f file1) and negate the output, that is, print non-matching lines, via -v. The following prints the lines of file2 that match no pattern in file1:

grep -v -f file1 file2
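
For lists of literal domain names, here is a sketch of the same grep idea with two standard options added as an assumption about the intent: -F treats each pattern as a fixed string (so the "." in domain1.com is not a regex wildcard), and -x requires the whole line to match. The argument order is swapped so that it prints the lines of file1 that are absent from file2, as in the question. The sample contents and the output name missing_from_file2.txt are hypothetical.

```shell
# Sample data (hypothetical contents); xdomain2.com contains domain2.com
# as a substring and shows why -x matters.
printf '%s\n' domain1.com domain2.com xdomain2.com > file1
printf '%s\n' domain2.com > file2

# -f file2: read patterns from file2; -v: print lines that do NOT match;
# -F: fixed strings, not regexes; -x: whole-line matches only.
grep -F -x -v -f file2 file1 > missing_from_file2.txt
cat missing_from_file2.txt
```

With a pattern file of around 2GB, grep has to load every pattern into memory, so the sorted comm approach may scale better.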

License: CC-BY-SA with attribution

Not affiliated with StackOverflow