I realize you are looking for an awk solution, but your question struck me as one that could benefit from the power of Python dictionaries. Below is a Python script that does what you describe: it looks up every element of file among the entries in subfile and prints the corresponding Approved Symbol from subfile, or NA if there is no match.
Please note that this is written for Python 3.x, but it should not be hard to adapt for Python 2.x.
    # Build a dictionary mapping every symbol and synonym to its approved symbol
    approved_symbols = {}
    with open("subfile") as subfile:
        subfile.readline()  # skip header line
        for line in subfile:
            columns = line.split()  # split into columns on whitespace
            if not columns:
                continue  # skip blank lines
            approved = columns[0]
            for col in columns:
                # A column may hold several comma-separated synonyms
                for syn in col.split(','):
                    approved_symbols[syn] = approved

    # Process file
    with open("file") as file:
        for line in file:
            # Print the approved symbol if found, otherwise "NA"
            print('\t'.join(approved_symbols.get(element, "NA")
                            for element in line.split()))
Output:
    MAP2K4 FLNC
    MYPN ACTN2
    NA NA
    NA NA
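
If you want to try the idea without creating the two files first, the same logic can be wrapped in functions and fed in-memory text. This is only a sketch: the names `build_lookup` and `translate` are my own, and the sample rows are made up for illustration; they are not your real data.

```python
import io


def build_lookup(lines):
    """Map every symbol and synonym to the approved symbol in column one."""
    lookup = {}
    for line in lines:
        columns = line.split()
        if not columns:
            continue  # ignore blank lines
        approved = columns[0]
        for col in columns:
            for syn in col.split(','):
                lookup[syn] = approved
    return lookup


def translate(lines, lookup):
    """Yield one tab-joined output row per input line, with NA for misses."""
    for line in lines:
        yield '\t'.join(lookup.get(sym, 'NA') for sym in line.split())


# Made-up sample data standing in for "subfile" and "file"
subfile = io.StringIO(
    "Approved\tSynonyms\n"
    "MAP2K4\tMKK4,SEK1\n"
    "FLNC\tABPL\n"
)
next(subfile)  # skip the header line
lookup = build_lookup(subfile)

rows = list(translate(io.StringIO("MKK4 ABPL\nFOO FLNC\n"), lookup))
for row in rows:
    print(row)
```

With this sample input, the first row prints MAP2K4 and FLNC and the second prints NA and FLNC, separated by a tab. To use it on the real files, pass the open file objects in place of the `io.StringIO` objects.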