Question

Need help combining/joining two tables.

Table_1 assigns an item (item_ID) to a term (term_ID).

item_ID    term_ID
-------    -------
C051890    C535944
C061133    C535944
C402769    C535944
D000082    C535944
C006632    D017624
C051890    D017624

Table_2 identifies the location (row number) of the term in a numbered list (term_locator).

term_ID    term_locator
-------    ------------
C535944    1340
C535944    1523
C535944    1829
C535944    1864
D017624    1277
D017624    4290

How can I use awk to combine Table_1 and Table_2? The desired output is Table_3.

item_ID    term_ID    term_locator
-------    -------    ------------
C051890    C535944    1340
C061133    C535944    1340
C402769    C535944    1340
D000082    C535944    1340
C051890    C535944    1523
C061133    C535944    1523
C402769    C535944    1523
D000082    C535944    1523
C051890    C535944    1829
C061133    C535944    1829
C402769    C535944    1829
D000082    C535944    1829
C051890    C535944    1864
C061133    C535944    1864
C402769    C535944    1864
D000082    C535944    1864
C006632    D017624    1277
C051890    D017624    1277
C006632    D017624    4290
C051890    D017624    4290

Additional information:

  • An item may be assigned to multiple terms (e.g. C051890 is assigned to C535944 and D017624).

  • The term_locator is a unique number (i.e. the first number in the list is 1 and the last number is greater than 4290).

My platform:

  • Windows 7 64-bit with 8GB of memory; GnuWin32 and gawk-3.1.6.

Can use other GnuWin32 utilities to solve this problem.

Am open to alternatives to awk.

Solution

My solution is not perfect, but it is simple:

join -1 2 -2 1 -o 1.1,1.2,2.2 table1.txt table2.txt

Output

item_ID term_ID term_locator
------- ------- ------------
C051890 C535944 1340
C051890 C535944 1523
C051890 C535944 1829
C051890 C535944 1864
C061133 C535944 1340
C061133 C535944 1523
C061133 C535944 1829
C061133 C535944 1864
C402769 C535944 1340
C402769 C535944 1523
C402769 C535944 1829
C402769 C535944 1864
D000082 C535944 1340
D000082 C535944 1523
D000082 C535944 1829
D000082 C535944 1864
C006632 D017624 1277
C006632 D017624 4290
C051890 D017624 1277
C051890 D017624 4290

Discussion

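  • The line order differs from Table_3: rows follow the order of table1.txt, and each row is paired with every matching term_locator
  • You must have the join command installed (it is part of GNU coreutils)
  • join expects both files to be sorted on the join field (here term_ID); the sample data rows already are
  • The flags -1 2 -2 1 say: join column 2 of file1 with column 1 of file2
  • The flag -o 1.1,1.2,2.2 says: output file1/column1, file1/column2, and file2/column2
  • Based on the command-line order, file1=table1.txt, file2=table2.txt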
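
Since the question asked for awk, here is a rough sketch of the same join done in awk, which also reproduces the row order of Table_3. It rests on a few assumptions: the files are named table1.txt and table2.txt as in the join command above, each starts with two header lines (column names and the dashed rule), columns are whitespace-separated, and the command is run from a POSIX-style shell. The array names n and item are my own. Under cmd.exe the single-quoted program would probably need to go into a file and be run with awk -f instead.

```shell
# Recreate the sample tables from the question (two header lines each).
cat > table1.txt <<'EOF'
item_ID    term_ID
-------    -------
C051890    C535944
C061133    C535944
C402769    C535944
D000082    C535944
C006632    D017624
C051890    D017624
EOF

cat > table2.txt <<'EOF'
term_ID    term_locator
-------    ------------
C535944    1340
C535944    1523
C535944    1829
C535944    1864
D017624    1277
D017624    4290
EOF

awk '
    FNR <= 2 { next }            # skip the column names and the dashed rule in both files
    NR == FNR {                  # true only while reading the first file (table1.txt)
        n[$2]++                  # number of items seen so far for this term_ID
        item[$2, n[$2]] = $1     # remember the items of each term in input order
        next
    }
    {                            # second file (table2.txt): $1 = term_ID, $2 = term_locator
        for (i = 1; i <= n[$1]; i++)
            print item[$1, i], $1, $2
    }
' table1.txt table2.txt > table3.txt
```

The result lands in table3.txt as space-separated rows, one group per term_locator in the order of table2.txt. The header lines are not reprinted, and table1.txt is held in memory, which should be fine for tables of modest size.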
Licensed under: CC-BY-SA with attribution