質問

A common modification to the Needleman-Wunsch to reduce running time is to only fill in the cells along a diagonal band of the matrix (slides 27/28). Let 2d + 1 represent the width of the band. Then based on the provided text, this ensures that the total number of inserts and deletes does not exceed d.

However, I can't understand the intuition behind the algorithm. If we let L represent a list of the size of each mutation in the alignment in-order (where negative sizes represent deletions), doesn't the band restriction merely say that the sum of values in any prefix slice of L must stay between -d and d? But the total number of mutations could easily exceed d.

E.g. If d = 4, throughout a long alignment, there could be 5 distinct mutations, where L refers to a series of [insertion, deletion, insertion, insertion, deletion] with sizes: [3,-2,2,1,-4]. Visually, such an alignment would represent a wiggly line through the DP matrix that results in plenty of small insertions and deletions, alternating to ensure that the alignment stays within the band.

In other words, the band does not actually restrict the total number of insertions and deletions as claimed? If I'm misunderstanding something, I'd appreciate any clarification. Thanks!

P.S. Was unsure if this post should go under the bioinformatics stackexchange beta.

正しい解決策はありません

ライセンス: CC-BY-SA帰属
所属していません cs.stackexchange
scroll top