Find matches between 2 files

https://stackoverflow.com/questions/22228118

10-06-2023

Question

I'm trying to output the lines that match between 2 files using AWK. To keep it simple, I made both files single-column; they contain phone numbers. I found many people asking the same question and being told to use:

awk 'NR==FNR{a[$1];next}$1 in a{print $1}' file1 file2

The problem is that it prints nothing. The first file is small (~5MB) and the second is considerably larger (~250MB). I have some general knowledge of AWK and believe the script above should work, yet I can't figure out why it doesn't.

Is there any other way to achieve the same result? GREP is a nice tool, but it clogs up the RAM and dies within seconds due to the file size. I ran some spot checks: I took random numbers from the smaller file, grep'd for them in the big one, and found matches, so I'm sure there are some.

Any help is appreciated!

[edit as requested by @Jaypal]

Sample data from both files:

File1:

01234567895
01234577896
01234556894

File2:

01234642784
02613467246
01234567895

Expected output:

01234567895

What I get:

xxx@xxx:~$ awk 'NR==FNR{a[$1];next}$1 in a{print $1}' file1 file2
xxx@xxx:~$

Solution

Update

The problem turned out to be the kind of file you were using. Apparently it came from a DOS system and contained many \r (carriage return) characters. To solve it, "sanitize" the files with:

dos2unix file1 file2
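To see why stray carriage returns break the match, here is a minimal sketch that reproduces the symptom and fixes it without dos2unix. It assumes only file1 has DOS line endings (the question does not say which file did), rebuilds the sample data in a scratch directory, and wraps the awk one-liner from the question in a hypothetical helper named match, with the array renamed to seen for readability.

```shell
# Scratch directory so no real files are touched.
tmp=$(mktemp -d)

# Assumption: file1 has DOS endings (\r\n), file2 has Unix endings (\n).
printf '01234567895\r\n01234577896\r\n01234556894\r\n' > "$tmp/file1"
printf '01234642784\n02613467246\n01234567895\n' > "$tmp/file2"

# Same logic as the one-liner in the question, spread out and commented.
match() {
  awk '
    NR == FNR { seen[$1]; next }   # first file: remember each number
    $1 in seen { print $1 }        # second file: print numbers already seen
  ' "$1" "$2"
}

# Count lines ending in a carriage return; non-zero suggests DOS endings.
cr_lines=$(awk '/\r$/ { n++ } END { print n + 0 }' "$tmp/file1")
echo "file1 lines ending in CR: $cr_lines"

# With the \r still attached, the keys differ, so nothing matches.
before=$(match "$tmp/file1" "$tmp/file2")

# Strip every carriage return, then compare again.
tr -d '\r' < "$tmp/file1" > "$tmp/file1.unix"
after=$(match "$tmp/file1.unix" "$tmp/file2")

echo "before: '$before'"
echo "after:  '$after'"
```

awk's default field splitting does not treat \r as whitespace, so $1 in file1 is the number followed by an invisible \r, which never equals the clean number in file2. tr -d '\r' writes a cleaned copy instead of converting in place, which leaves the original untouched.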

Former answer

Your awk is fine. However, you can also compare files with grep -f:

grep -f file1 file2

This prints the lines of file2 that match any of the patterns listed in file1.

You can add options to make the matching stricter:

grep -wFf file1 file2

-w matches whole words only
-F treats the patterns as fixed strings (no regex)
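A small sketch of what -F changes, using made-up file names (patterns, data) and made-up contents that are not from the question. Without -F, each line of the pattern file is a regular expression, so a dot matches any character.

```shell
tmp=$(mktemp -d)

# Hypothetical data: one pattern containing a dot, two candidate lines.
printf '1.5\n' > "$tmp/patterns"
printf '1.5\n125\n' > "$tmp/data"

# Regex matching: "." matches any character, so "125" is reported too.
regex_hits=$(grep -f "$tmp/patterns" "$tmp/data")

# Fixed-string matching: "." is a literal dot, so only "1.5" is reported.
fixed_hits=$(grep -Ff "$tmp/patterns" "$tmp/data")

echo "regex: $regex_hits"
echo "fixed: $fixed_hits"
```

For plain digit strings like phone numbers the results are the same either way, but -F still avoids any surprises if a pattern line ever contains a regex metacharacter.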

Examples

$ cat a
hello
how are
you
I am fine areare
$ cat b
hel
are

$ grep -f b a
hello
how are
I am fine areare

$ grep -wf b a
how are

Licensed under: CC-BY-SA with attribution
