Question

So I tried asking this before, but I guess I wasn't really clear enough with what I was looking for. I'm making an optimal string alignment algorithm, it's really just a dynamic programming problem. So I decided to write it recursively. The program consists of two parts:

  • Finding the "edit distance" between two words, where you minimize alignments based on some associated cost. For example, aligning the same letter costs 0, aligning two vowels costs 0.5, but aligning a letter with a gap costs 1.
  • Visualizing the alignment: i.e. putting the strings on top of each other with the optimal alignment of characters and gaps.

I think my edit distance works. I'm getting the same values as my peers and there seem to be no immediate problems. However, I'm having a hard time figuring out how to recover the sequence of matchings, insertions, and deletions to visualize the alignment. My problem comes from the fact that I have a recursive function that takes the minimum of three recursive calls. So, I end up with a longer sequence than necessary because each recursive call appends a "move" (match, insertion, deletion), that may not be used because it is not the least costly one.

Here is my code:

newseq = []
@memoize
def opt(a, b):
    global newseq # Visual Alignment 'move' sequence
    gap = 1 # Gap cost
    if not a: 
        return len(b)
    if not b: 
        return len(a)

    if a and b:
        p1 = a[0] in v   # First letters vowells?
        p2 = b[0] in v   
        if a[0] == b[0]: # First letters equal each other?
            alpha = 0
        elif p1 ^ p2:    # Vowel and Consonant?
            alpha = 1.2
        elif p1 and p2:  # Vowel and Vowel?
            alpha = 0.5
        else:            # Consonant and Consonant?
            alpha = 0.6

    r1 = opt(a[1:], b[1:]) + alpha
    r2 = opt(a[1:], b) + gap
    r3 = opt(a, b[1:]) + gap
    # Reset newseq
    newseq = newseq[:-3]

    # Takes min of recursive calls, and gives associated 'move'
    res = min((r1, 'Match'),      # Match
              (r2, 'Insertion'),  # Insertion
              (r3, 'Deletion'),   # Deletion
              key = lambda x: x[0])

    newseq.append(res[1])
    return res[0]

So yea, I know that what I have is not working. My global newseq variable currently has a length of one because I try to reset it by removing all the appends that happen during the recursive calls. How do I set up a way to record the sequence of 'moves' that make up the optimal alignment with this recursive algorithm?

Edit: Here is my memoize decorator function:

def memoize(f):
    cache = {}
    def decorated_function(*args):
        if args in cache:
            return cache[args]
        else:
            cache[args] = f(*args)
            return cache[args]
    return decorated_function
Was it helpful?

Solution

1) Pass as an argument through your recursive function a Stack (or other collection).

2) When you call yourself recursively, also push onto the stack what step you are taking (e.g. using an enumeration of step types and ints/chars/strings/whatever indicating what it is doing).

3) When you return from the call in 2), pop the stack and repeat 2).

4) When you have a solution, you can store the stack associated with its result/score.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top