Fast intersection, complement and union of tab-delimited text files?

https://stackoverflow.com/questions/8378524

28-10-2019

Question

Can someone recommend a fast Unix-based utility (ideally written in C) for computing efficient, streaming intersections, complements and unions of tab-delimited text files? For example, it should allow queries such as "give me all the entries in file A whose value in column K does not appear in column K of any entry in file B".

e.g., if file A is:

bob sally sue
bob mary john

and file B is:

john sally sue
foo bar quux

then the complement of file A relative to B on column 2 would return "bob mary john", since that is the only line in file A whose column-2 value does not appear in column 2 of file B.
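To pin down what the query means, here is a minimal in-memory sketch of row-level set operations on one column. It assumes fields are separated by single tab characters (the example above is displayed with spaces) and that columns are numbered from 1, as in the question. The function names are hypothetical, and the `union` shown is only one possible reading of "union" on a key column.

```python
# Minimal sketch: row-level set operations keyed on column k.
# Assumptions: single-tab-separated fields, 1-based column numbers.
# All function names here are hypothetical.

def column_value(line, k):
    """Return the k-th (1-based) tab-separated field, or None if the row is too short."""
    fields = line.rstrip("\n").split("\t")
    return fields[k - 1] if len(fields) >= k else None


def keys_of(lines, k):
    """Set of the distinct column-k values in a file (rows that are too short are skipped)."""
    return {v for v in (column_value(line, k) for line in lines) if v is not None}


def complement(a_lines, b_lines, k):
    """Rows of A whose column-k value appears in no row of B."""
    b_keys = keys_of(b_lines, k)
    return [line for line in a_lines if column_value(line, k) not in b_keys]


def intersection(a_lines, b_lines, k):
    """Rows of A whose column-k value appears in some row of B."""
    b_keys = keys_of(b_lines, k)
    return [line for line in a_lines if column_value(line, k) in b_keys]


def union(a_lines, b_lines, k):
    """One possible reading of "union": all rows of A, then the rows of B
    whose column-k value does not occur in A. Duplicates within A are kept."""
    return list(a_lines) + complement(b_lines, a_lines, k)


# The two files from the question, with real tabs between the columns.
file_a = ["bob\tsally\tsue", "bob\tmary\tjohn"]
file_b = ["john\tsally\tsue", "foo\tbar\tquux"]

print(complement(file_a, file_b, 2))    # ['bob\tmary\tjohn']
print(intersection(file_a, file_b, 2))  # ['bob\tsally\tsue']
```

Every operation reduces to "build the set of keys from one file, then test each row of the other file against it", which is why a hash lookup suffices.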

I'd prefer not to use a database, but would like a command-line utility. Is awk the answer, or is there something simpler? Thanks.

Solution

If it were only for that particular query, I'd probably go with awk: hash the column-2 values of B, then filter A against that hash.
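The hash-then-filter approach can be sketched as two passes: read B once to build a set of its column-k values, then stream A line by line and emit only the lines whose column-k value is absent (or, for an intersection, present). Only B's keys are held in memory, so A can be arbitrarily large. This is a Python illustration of the idea, not an existing tool; the script name, function names and argument order are hypothetical, and tab-separated fields with 1-based columns are assumed.

```python
import sys


def load_keys(b_stream, k, sep="\t"):
    """Pass 1: hash every column-k value of B into a set (awk would use an array)."""
    keys = set()
    for line in b_stream:
        fields = line.rstrip("\n").split(sep)
        if len(fields) >= k:
            keys.add(fields[k - 1])
    return keys


def filter_a(a_stream, keys, k, keep_matches=False, sep="\t"):
    """Pass 2: stream A one line at a time. By default yield lines whose
    column-k value is NOT in `keys` (the complement); with keep_matches=True
    yield the lines whose value IS in `keys` (the intersection)."""
    for line in a_stream:
        fields = line.rstrip("\n").split(sep)
        value = fields[k - 1] if len(fields) >= k else None
        if (value in keys) == keep_matches:
            yield line


def main(argv):
    # Hypothetical command line: complement.py FILE_A FILE_B COLUMN
    if len(argv) != 4:
        print(f"usage: {argv[0]} FILE_A FILE_B COLUMN", file=sys.stderr)
        return
    path_a, path_b, k = argv[1], argv[2], int(argv[3])
    with open(path_b) as b:
        keys = load_keys(b, k)  # only B's keys are kept in memory
    with open(path_a) as a:
        sys.stdout.writelines(filter_a(a, keys, k))


if __name__ == "__main__":
    main(sys.argv)
```

Run as `python complement.py A B 2`, this would print the "bob mary john" line for the example files. In awk the same two passes are usually written with the `NR==FNR` idiom, for example `awk -F'\t' 'NR==FNR { seen[$2]; next } !($2 in seen)' B A`, and dropping the `!` gives the intersection.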

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow