You were almost there. You also need to match the extra tokens (which I assume you don't care about). Just make sure that the catch-all match comes last, so it doesn't gobble up something you are interested in. Using names
as the list defined in your post:
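from pyparsing import *

def marker(key):
    # A keyword immediately followed by digits (e.g. "clone11"), joined into one token
    return Combine(CaselessLiteral(key) + Word(nums))

pat = marker("a")
visit = marker("r")
clone = marker("clone")
primer = marker("ba") | marker("primer")
sep = oneOf("- _").suppress()    # separators are matched but dropped from the results
other = Word(alphanums + ":")    # catch-all for the tokens we don't care about
file_ext = Literal(".").suppress() + Word(alphanums)
EOL = LineEnd().suppress()

# MatchFirst tries the alternatives from left to right,
# so the catch-all `other` has to come last.
tokens = [pat("pat"),
          visit("visit"),
          clone("clone"),
          primer("primer"),
          sep, other]
grammar = OneOrMore(MatchFirst(tokens)) + file_ext + EOL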
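To see why the order matters, here is a minimal sketch that parses the same token with the catch-all placed last and then first. The sample string and the variable names specific_first and catch_all_first are made up for illustration; the result name "clone" is only reported when the specific marker gets the first try.

```python
from pyparsing import (CaselessLiteral, Combine, MatchFirst, OneOrMore,
                       Word, alphanums, nums)

def marker(key):
    # Keyword immediately followed by digits, joined into a single token
    return Combine(CaselessLiteral(key) + Word(nums))

clone = marker("clone")("clone")   # named result
other = Word(alphanums)            # catch-all (simplified for this sketch)

# Same alternatives, different order (both names are hypothetical)
specific_first = OneOrMore(MatchFirst([clone, other]))
catch_all_first = OneOrMore(MatchFirst([other, clone]))

sample = "clone11"                 # made-up input, not from the original list

good = specific_first.parseString(sample).asDict()
bad = catch_all_first.parseString(sample).asDict()

print(good)   # {'clone': 'clone11'}
print(bad)    # {} because `other` swallowed the token before `clone` was tried
```

In both cases the text is consumed; the difference is only which alternative claims it, and therefore whether the name is attached.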
By giving the intermediate results a name, e.g. clone("clone"),
we can collect them into a dictionary for easy access:
for result in grammar.scanString(names):
    # scanString yields (tokens, start, end) tuples; the tokens are at index 0
    print(result[0].asDict())
resulting in
{'clone': 'clone11', 'primer': 'ba28', 'pat': 'a0038'}
{'clone': 'clone11', 'primer': 'ba31', 'pat': 'a038'}
{'clone': 'clone11', 'primer': 'ba32', 'pat': 'a0038'}
{'pat': 'a0001', 'primer': 'ba29', 'visit': 'r00'}
{'pat': 'a0001', 'primer': 'ba31', 'visit': 'r00'}
{'pat': 'a0001', 'primer': 'ba43', 'visit': 'r00'}
{'pat': 'a0001', 'primer': 'ba81', 'visit': 'r00'}
{'pat': 'a0002', 'primer': 'primer7', 'visit': 'r07'}
{'pat': 'a0053', 'primer': 'primer5', 'visit': 'r01'}
{'pat': 'a0016', 'primer': 'primer7', 'visit': 'r02'}
{'pat': 'a0054', 'primer': 'primer5', 'visit': 'r04'}
{'pat': 'a0054', 'primer': 'primer5', 'visit': 'r07'}
{'pat': 'a0037', 'primer': 'primer7', 'visit': 'r06'}
{'pat': 'a0037', 'primer': 'primer5', 'visit': 'r07'}
{'pat': 'a0041', 'primer': 'ba87', 'visit': 'r01'}
{'pat': 'a0094', 'primer': 'ba88', 'visit': 'r00'}
{'pat': 'a0094', 'primer': 'ba88', 'visit': 'r02'}
{'pat': 'a0107', 'primer': 'ba86', 'visit': 'r01'}
{'pat': 'a0111', 'primer': 'primer5', 'visit': 'r04'}
{'pat': 'a0179', 'primer': 'ba83', 'visit': 'r02'}
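Note that not every name carries every field: some results have a clone but no visit, and others the reverse. If you want uniform rows, one possible approach (a sketch, not part of the grammar itself) is to look the keys up with dict.get, which fills in None for anything missing. The two dictionaries below are copied from the output above; the names rows, fields and table are just for illustration.

```python
# Two of the dictionaries printed above
rows = [
    {'clone': 'clone11', 'primer': 'ba28', 'pat': 'a0038'},
    {'pat': 'a0001', 'primer': 'ba29', 'visit': 'r00'},
]

# Fixed column order; a key that is absent from a row becomes None
fields = ("pat", "visit", "clone", "primer")
table = [tuple(row.get(f) for f in fields) for row in rows]

for line in table:
    print(line)
# ('a0038', None, 'clone11', 'ba28')
# ('a0001', 'r00', None, 'ba29')
```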