Question

I have two text files with thousands of rows each. File A has a single column (ID):

#ID
rs111
rs222
rs333
rs444

File B looks like this:

#CHROM POS ID REF ALT QUAL ......
22 111 rs111 T C . ....
22 222 rs222 A G ....
22 333 rs666 G T ...
22 444 rs777 A A ..

This is the output I want:

#CHROM POS ID REF ALT QUAL ......
22 111 rs111 T C . ....
22 222 rs222 A G ....

That is, I want to extract only those rows from file B whose ID matches an ID listed in file A. How can I achieve this? Thanks.


Solution

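You can use this awk command, which stores the IDs from fileA as array keys and then prints each row of fileB whose third field is one of those keys:

$ awk 'FNR==NR{a[$1];next} ($3 in a)' fileA fileB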
22 111 rs111 T C . ....
22 222 rs222 A G ....
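
Note that the one-liner above prints only the matching data rows. The header line requested in the question is not printed, because its third field is `ID` while file A contains `#ID`. A minimal sketch of one way to keep the header, assuming it is always the first line of file B; the temporary directory, the variable names, and the sample rows trimmed to six columns (with `.` as a placeholder QUAL value) are illustrative only:

```shell
# Sample data modelled on the question (rows trimmed to six columns; illustrative).
tmpdir=$(mktemp -d)
printf '%s\n' '#ID' 'rs111' 'rs222' 'rs333' 'rs444' > "$tmpdir/fileA"
printf '%s\n' '#CHROM POS ID REF ALT QUAL' \
              '22 111 rs111 T C .' \
              '22 222 rs222 A G .' \
              '22 333 rs666 G T .' \
              '22 444 rs777 A A .' > "$tmpdir/fileB"

# FNR==NR holds only while awk reads the first file: store each ID as an array key.
# For the second file, print line 1 (assumed to be the header) and every row
# whose third field is one of the stored IDs.
result=$(awk 'FNR==NR { ids[$1]; next } FNR==1 || ($3 in ids)' "$tmpdir/fileA" "$tmpdir/fileB")
printf '%s\n' "$result"

rm -rf "$tmpdir"
```

The only change from the original command is the extra `FNR==1 ||` condition, so the matching logic for data rows stays the same.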

Other tips

Although the awk solution posted by anubhava is more elegant, you might get away with:

$ grep -f fileA fileB
22 111 rs111 T C . ....
22 222 rs222 A G ....
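
One caveat with plain `grep -f`: each ID is treated as a regular expression and matched anywhere in the line, so `rs111` would also select a row whose ID is `rs1111`. A hedged sketch of a stricter variant using the standard `-F` (fixed strings) and `-w` (whole words) options; the `rs1111` row is a hypothetical addition that is not in the question's data, and the trimmed rows and variable names are illustrative:

```shell
# Sample data modelled on the question, plus one hypothetical row (rs1111)
# added only to show the substring problem.
tmpdir=$(mktemp -d)
printf '%s\n' '#ID' 'rs111' 'rs222' > "$tmpdir/fileA"
printf '%s\n' '#CHROM POS ID REF ALT QUAL' \
              '22 111 rs111 T C .' \
              '22 222 rs222 A G .' \
              '22 555 rs1111 G T .' > "$tmpdir/fileB"

# Plain -f: patterns match as substrings, so the rs1111 row is selected too.
loose=$(grep -f "$tmpdir/fileA" "$tmpdir/fileB")

# -F treats each ID as a literal string; -w requires a whole-word match.
strict=$(grep -wFf "$tmpdir/fileA" "$tmpdir/fileB")

printf 'loose:\n%s\n\nstrict:\n%s\n' "$loose" "$strict"
rm -rf "$tmpdir"
```

Even with `-w`, grep still looks at every column rather than only the ID column, so the awk approach remains the safer choice when an ID could also appear in another field.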

See "Query a dbSNP VCF File" on Biostars: http://www.biostars.org/p/88799/ or http://www.biostars.org/p/12707/

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow