So I tried asking this before, but I guess I wasn't really clear enough with what I was looking for. I'm making an optimal string alignment algorithm, it's really just a dynamic programming problem. So I decided to write it recursively. The program consists of two parts:
- Finding the "edit distance" between two words, where you minimize alignments based on some associated cost. For example, aligning the same letter costs 0, aligning two vowels costs 0.5, but aligning a letter with a gap costs 1.
- Visualizing the alignment: i.e. putting the strings on top of each other with the optimal alignment of characters and gaps.
I think my edit distance works. I'm getting the same values as my peers and there seem to be no immediate problems. However, I'm having a hard time figuring out how to recover the sequence of matchings, insertions, and deletions to visualize the alignment. My problem comes from the fact that I have a recursive function that takes the minimum of three recursive calls. So, I end up with a longer sequence than necessary because each recursive call appends a "move" (match, insertion, deletion), that may not be used because it is not the least costly one.
Here is my code:
newseq = []
@memoize
def opt(a, b):
global newseq # Visual Alignment 'move' sequence
gap = 1 # Gap cost
if not a:
return len(b)
if not b:
return len(a)
if a and b:
p1 = a[0] in v # First letters vowells?
p2 = b[0] in v
if a[0] == b[0]: # First letters equal each other?
alpha = 0
elif p1 ^ p2: # Vowel and Consonant?
alpha = 1.2
elif p1 and p2: # Vowel and Vowel?
alpha = 0.5
else: # Consonant and Consonant?
alpha = 0.6
r1 = opt(a[1:], b[1:]) + alpha
r2 = opt(a[1:], b) + gap
r3 = opt(a, b[1:]) + gap
# Reset newseq
newseq = newseq[:-3]
# Takes min of recursive calls, and gives associated 'move'
res = min((r1, 'Match'), # Match
(r2, 'Insertion'), # Insertion
(r3, 'Deletion'), # Deletion
key = lambda x: x[0])
newseq.append(res[1])
return res[0]
So yea, I know that what I have is not working. My global newseq
variable currently has a length of one because I try to reset it by removing all the appends that happen during the recursive calls. How do I set up a way to record the sequence of 'moves' that make up the optimal alignment with this recursive algorithm?
Edit: Here is my memoize decorator function:
def memoize(f):
cache = {}
def decorated_function(*args):
if args in cache:
return cache[args]
else:
cache[args] = f(*args)
return cache[args]
return decorated_function