Question

I have 2 sequences, AACAGTTACC and TAAGGTCA, and I'm trying to find a global sequence alignment. I managed to create a 2D array and create the matrix, and I even filled it with semi-dynamic approach.

Here is my code to fill the matrix:

void process() {
    for (int i = 1; i <= sequenceA.length; i++) {
        for (int j = 1; j <= sequenceB.length; j++) {
            int scoreDiag = opt[i-1][j-1] + equal(i, j);
            int scoreLeft = opt[i][j-1] - 1;
            int scoreUp = opt[i-1][j] - 1;
            opt[i][j] = Math.max(Math.max(scoreDiag, scoreLeft), scoreUp);
        }
    }
}

private int equal(int i, int j) {
    if (sequenceA[i - 1] == sequenceB[j - 1]) {
        return 1;
    } else {
        return -1;
    }
}

My main problem is that this code generates this output:

 0   -1   -2   -3   -4   -5   -6   -7   -8     
-1   -1    0   -1   -2   -3   -4   -5   -6      
-2   -2    0    1    0   -1   -2   -3   -4     
 -3   -3   -1    0    0   -1   -2   -1   -2      
 -4   -4   -2    0   -1   -1   -2   -2    0      
 -5   -5   -3   -1    1    0   -1   -2   -1       
-6   -4   -4   -2    0    0    1    0   -1      
-7   -5   -5   -3   -1   -1    1    0   -1       
-8   -6   -4   -4   -2   -2    0    0    1      
 -9   -7   -5   -5   -3   -3   -1    1    0      
-10   -8   -6   -6   -4   -4   -2    0    0 

But I want it to look like this (all i care are numbers in the pictures):

enter image description here

I got to apply penaltys: per mismatch 1 and per gap 2, if its match 0.

Was it helpful?

Solution

There are several things that you need to modify:

  1. Note that in the image you give us the alignment goes from the bottom-right corner to the top-left corner. So in that image they are not really aligning AACAGTTACC and TAAGGTCA, but CCATTGACAA and ACTGGAAT.
  2. You say that you want a global alignment, but you actually compute a local alignment. The main difference is in penalties at the beginning of the sequences. In a global alignment you have to compute insertions and deletions at the first row and columns.
  3. Third, you are not applying correctly the penalties you mention. Instead, you always penalize with -1 and reward with +1.
  4. In the example image they are not taking the maximum value at each position, but the minimum (this is because your penalties are positive and the rewards is 0, not the other way around, so you want to minimize the values).

The full solution is:

// Note that these sequences are reversed!
String sequenceA ="CCATTGACAA";
String sequenceB = "ACTGGAAT";

// The penalties to apply
int gap = 2, substitution = 1, match = 0;

int[][] opt = new int[sequenceA.length() + 1][sequenceB.length() + 1];

// First of all, compute insertions and deletions at 1st row/column
for (int i = 1; i <= sequenceA.length(); i++)
    opt[i][0] = opt[i - 1][0] + gap;
for (int j = 1; j <= sequenceB.length(); j++)
    opt[0][j] = opt[0][j - 1] + gap;

for (int i = 1; i <= sequenceA.length(); i++) {
    for (int j = 1; j <= sequenceB.length(); j++) {
        int scoreDiag = opt[i - 1][j - 1] +
                (sequenceA.charAt(i-1) == sequenceB.charAt(j-1) ?
                    match : // same symbol
                    substitution); // different symbol
        int scoreLeft = opt[i][j - 1] + gap; // insertion
        int scoreUp = opt[i - 1][j] + gap; // deletion
        // we take the minimum
        opt[i][j] = Math.min(Math.min(scoreDiag, scoreLeft), scoreUp);
    }
}

for (int i = 0; i <= sequenceA.length(); i++) {
    for (int j = 0; j <= sequenceB.length(); j++)
        System.out.print(opt[i][j] + "\t");
    System.out.println();
}

The result is just as in the example you gave us (but reversed, remember!):

0   2   4   6   8   10  12  14  16  
2   1   2   4   6   8   10  12  14  
4   3   1   3   5   7   9   11  13  
6   4   3   2   4   6   7   9   11  
8   6   5   3   3   5   7   8   9   
10  8   7   5   4   4   6   8   8   
12  10  9   7   5   4   5   7   9   
14  12  11  9   7   6   4   5   7   
16  14  12  11  9   8   6   5   6   
18  16  14  13  11  10  8   6   6   
20  18  16  15  13  12  10  8   7

So the final alignment score is found at opt[sequenceA.length()][sequenceB.length()] (7). If you really need to show the reversed matrix as in the image, do this:

for (int i = sequenceA.length(); i >=0; i--) {
    for (int j = sequenceB.length(); j >= 0 ; j--)
        System.out.print(opt[i][j] + "\t");
    System.out.println();
}

OTHER TIPS

Have a look at http://en.wikipedia.org/wiki/Longest_common_substring, the code is pretty much copy-paste for several languages, and is easily adapted to also tell you the alignment index. I had to do a similar thing and ended up with https://github.com/Pomax/DOM-diff/blob/rewrite/rewrite/rewrite.html#L103

(The SubsetMapping it returns is basically a simple struct that gives the index for both contexts, https://github.com/Pomax/DOM-diff/blob/rewrite/rewrite/rewrite.html#L52)

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top