How to omit duplicates in pyparsing?

Question

I didn't make a single change to your parser, but made a few changes to your post-parsing code.

You are not really getting "duplicates". The issue is that you print out the current patient data every time you see a Gleason score, and some of your patient data records include multiple Gleason score entries. If I understand what you are trying to do, here is the pseudo-code I would follow:

accumulator = None
foreach match in (patientDataExpr | gleasonScoreExpr).searchString(source):

    if it's a patientDataExpr:
        if accumulator is not None:
            # we are starting a new patient data record, print out the previous one
            print out accumulated data
        initialize new accumulator with current match and empty list for gleason data

    else if it's a gleasonScoreExpr:
        add this expression into the current accumulator

# done with the for loop, do one last printout of the accumulated data
if accumulator is not None:
    print out accumulated data

This converts to Python pretty easily:

def printOut(patientDataTuple):
    pd,gleasonList = patientDataTuple
    print( "['{0.accDate}','{0.accNum}','{0.patientNum}',{1}]".format(
        pd, ','.join(''.join(gl.rhs) for gl in gleasonList)))

accumPatientData = None
for match in partMatch.searchString(TEXT):
    if match.patientData:
        if accumPatientData is not None:
            # this is a new patient data, print out the accumulated 
            # Gleason scores for the previous one
            printOut(accumPatientData)

        # start accumulating for a new patient data entry
        accumPatientData = (match.patientData, [])

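    elif match.gleason and accumPatientData is not None:
        # attach this Gleason score to the current patient record
        # (a score seen before any patient data is skipped)
        accumPatientData[1].append(match.gleason)
    #~ print(match.dump())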

if accumPatientData is not None:
    printOut(accumPatientData)

I don't think I'm dumping out the Gleason data correctly, but you should be able to tune it from here.
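
If you want to try the accumulation logic on its own, without the parser, here is a minimal sketch. The (kind, value) tuples, the sample data, and the group_records name are stand-ins I made up for illustration; in the real code the matches come from partMatch.searchString(TEXT).

```python
def group_records(matches):
    """Group each patient record with the Gleason entries that follow it.

    `matches` is an iterable of (kind, value) tuples, a hypothetical
    stand-in for the pyparsing matches in the loop above.
    """
    grouped = []
    current = None  # (patient, [gleason, ...]) for the record being built
    for kind, value in matches:
        if kind == "patient":
            if current is not None:
                grouped.append(current)  # previous record is complete
            current = (value, [])
        elif kind == "gleason" and current is not None:
            # assumption: a score seen before any patient is ignored
            current[1].append(value)
    if current is not None:
        grouped.append(current)  # flush the last record
    return grouped


# Made-up sample data: the first patient has two Gleason entries.
sample = [
    ("patient", "P1"),
    ("gleason", "3+4=7"),
    ("gleason", "4+3=7"),
    ("patient", "P2"),
    ("gleason", "3+3=6"),
]
for patient, scores in group_records(sample):
    print(patient, scores)
```

Each patient comes out once, with all of its scores collected in one list, instead of once per score.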

EDIT:

You can attach diceGleason as a parse action to gleason and get this behavior:

def diceGleasonParseAction(tokens):
    def diceGleason(glrhs,gllhs):
        if len(glrhs) == 0:
            pri = gllhs[0]
            sec = gllhs[2]
            #~ tot = pri + sec
            tot = str(int(pri)+int(sec))
            return [pri, sec, tot]
        elif len(glrhs) == 1:
            pri = gllhs[0]
            sec = gllhs[2]
            tot = glrhs
            return [pri, sec, tot]
        else:
            pri = glrhs[0]
            sec = glrhs[2]
            tot = gllhs
            return [pri, sec, tot]

    pri,sec,tot = diceGleason(tokens.gleason.rhs, tokens.gleason.lhs)

    # assign results names for later use
    tokens.gleason['pri'] = pri
    tokens.gleason['sec'] = sec
    tokens.gleason['tot'] = tot

gleason.setParseAction(diceGleasonParseAction)

You had just one bug: you summed pri and sec to get tot, but these are all strings, so you were adding '3' and '4' and getting '34'. Converting to ints to do the addition was all that was needed. Otherwise, I kept diceGleason verbatim, nested inside diceGleasonParseAction, to isolate your logic for inferring pri, sec, and tot from the mechanics of embellishing the parsed tokens with new results names. Since the parse action does not return anything, the tokens are updated in place and then carried along to be used later in your output method.
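
To see the string-versus-int difference in isolation, here is a tiny sketch using the '3' and '4' values from the example. The variable names concatenated and total are mine, just for illustration.

```python
pri, sec = "3", "4"  # parsed tokens are strings unless a parse action converts them

concatenated = pri + sec          # string concatenation
total = str(int(pri) + int(sec))  # numeric addition, converted back to a string

print(concatenated)  # 34
print(total)         # 7
```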