Question

I have File1 and File2, shown below. I found similar questions, but none that are quite the same.

I want to use each row of File1 as the pattern for a grep and extract the 1st column of the matching File2 lines. In the toy example below, if column 2 of File2 equals a or b, then its column 1 should be written to File_ab.

So far I am using a double loop, and the estimated run time is 4 days. I was hoping for something like:

cat File1 | xargs -P 12 -exec grep "$1\|$2" File2 > File_$1$2.txt

but I failed to get the syntax right. I am trying to run 12 greps in parallel, each with an OR condition.
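A working form of the parallel idea might look like the sketch below (not from the original post). It assumes an xargs that supports `-P` and `-L` (GNU and BSD xargs both do) and swaps grep for awk, so that column 2 is compared exactly and only column 1 is printed. The `sh -c` wrapper is needed because the redirection to File_$1$2 has to happen once per File1 row, not once for the whole xargs command.

```shell
# Recreate the toy inputs from the question.
printf 'a b\nc d\n' > File1
printf '1 a\n2 b\n3 c\n1 d\n4 a\n5 e\n6 d\n' > File2

# One job per File1 row, at most 12 running at a time.
# xargs -L 1 passes the two words of each row as $1 and $2 to the
# small sh script; the trailing "sh" only fills $0.
xargs -P 12 -L 1 sh -c '
  awk -v x="$1" -v y="$2" '\''$2 == x || $2 == y { print $1 }'\'' File2 > "File_$1$2"
' sh < File1
```

Each job still reads all of File2, so 25K rows means 25K full scans of a 10 million row file. The output keeps File2's line order, so File_cd comes out as 3, 1, 6; pipe the awk output through `sort -n` if the sorted order in the desired output matters.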

File1
a b
c d

File2
1 a
2 b
3 c
1 d
4 a
5 e
6 d

The desired output is two files, File_ab and File_cd:

File_ab
1
2
4

File_cd
1
3
6

Note: my File1 has 25K rows, and File2 has 10 million rows.


Solution

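Use Perl and read File2 only once: build a lookup table from File1, then stream File2 and send column 1 of each matching line to the right output file: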

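#!/usr/bin/perl
use strict;
use warnings;
use FileCache;    # cacheout() reopens files on demand, so 25K outputs don't exhaust file handles

# Map each value in File1 to the output file(s) of the row(s) it appears in,
# e.g. a => File_ab, b => File_ab, c => File_cd, d => File_cd.
my %files_for;
open my $in, '<', 'File1' or die "Cannot open File1: $!";
while (my $row = <$in>) {
    my @parts = split ' ', $row;    # split on whitespace; also drops the newline
    next unless @parts;
    my $name = 'File_' . join '', @parts;
    push @{ $files_for{$_} }, $name for @parts;
}
close $in;

# Single pass over File2 (given on the command line): exact hash lookup
# of column 2 instead of a 25K-way regex, then print column 1.
while (my $line = <>) {
    my ($id, $key) = split ' ', $line;
    next unless defined $key && exists $files_for{$key};
    for my $name (@{ $files_for{$key} }) {
        no strict 'refs';           # cacheout returns the file name, used as a filehandle
        my $fh = cacheout $name;
        print $fh "$id\n";
    }
}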

Save the script as myscript, then run:

chmod 755 myscript
./myscript File2
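For comparison, the same single-pass approach can be sketched in awk (an alternative, not part of the original answer). It loads File1 into an array keyed by value, then reads File2 once and looks up column 2 exactly. It assumes the system lets awk keep one output file open per File1 row; with 25K rows you may need to raise the limit (for example with `ulimit -n`) or close() files as you go. In this sketch, a value that appears in several File1 rows is sent only to the file of the last such row.

```shell
# Recreate the toy inputs from the question.
printf 'a b\nc d\n' > File1
printf '1 a\n2 b\n3 c\n1 d\n4 a\n5 e\n6 d\n' > File2

awk '
  # First file (File1): map every value on the row to its output file name.
  NR == FNR {
    name = "File_"
    for (i = 1; i <= NF; i++) name = name $i
    for (i = 1; i <= NF; i++) out[$i] = name
    next
  }
  # Second file (File2): exact lookup on column 2, write column 1.
  ($2 in out) { print $1 > (out[$2]) }
' File1 File2
```

As with the Perl script, output follows File2's line order, so File_cd contains 3, 1, 6.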
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow