Needleman-wunsch Algorithm

The Needleman-Wunsch algorithm performs a global alignment on two sequences(called A and B here). It is commonly used by bioinformaticists to align protein or nucleotide sequences. The algorithm was proposed in 1970 by Saul Needleman and Christian Wunsch in their paper A general method applicable to the search for similarities in the amino acid sequence of two proteins. The Needleman-Wunsch algorithm is based on the concept of Dynamic programming, and so is guaranteed to find the alignment with the maximum score. Needleman-Wunsch the first time it had been applied to biological sequence comparison. The scoring system used gives scores for aligned characters, based on a similarity matrix. Here, S(i, j) is the similarity of characters i and j. It uses a linear gap penalty, here called d. For example, if the similarity matrix was
-AGCT
A10-1-3-4
G-17-5-3
C-3-590
T-4-308
then the alignment:
    AGACTAGTTAC    CGA---GACGT 
with a gap penalty of -5, would have the following score...
    S(A,C) + S(G,G) + S(A,A) + 3\times d + S(G,G) + S(T,A) + S(T,C) + S(A,G) + S(C,T)    = -3 + 7 + 10 - 3\times 5 + 7 + -4 + 0 + -1 + 0 = 1 
To find the alignment with the highest score, a two dimensional array(or matrix) is allocated. This matrix is often called the F matrix, and its i,jth entry is often denoted F_{ij} There is one column for each character in sequence A, and one row for each character in sequence B. This means that if we are aligning two sequences, one of size n and one of size m, this algorithm uses O(nm) of space(although there is a modification called Hirschberg's insight, which uses linear space and more computing time). As the algorithm progresses, the F_{ij} will be assigned to be the optimal score for the alignment of the first i characters in A and the first j characters in B. The principle of optimality is then applied as follows...
    Basis:    F_{11} = 0    F_{1j} = 0    F_{i1} = 0    Recursion, based on the principle of optimality:    F_{ij} = \max(F_{i-1,j-1} + S(A_{i-1}, B_{j-1}), F_{i,j-1} + d, F_{i-1,j} + d) 
The pseudo-code for the algorithm to compute the F matrix therefore looks like this(array indexes start at 0)...
    for i=0 to length(A)-1      F(i,0) <- 0    for j=0 to length(B)-1      F(0,j) <- 0    for i=1 to length(A)-1      for j = 1 to length(B)-1      {        Choice1 <- F(i-1,j-1) + S(A(i-1), B(j-1))        Choice2 <- F(i-1, j) - d        Choice3 <- F(i, j-1) - d        F(i,j) <- max(Choice1, Choice2, Choice3)      } 
Once the F matrix is computed, note that the bottom right hand corner of the matrix is the maximum score for any alignments. To compute which alignment actually gives this score, you can start from the bottom left cell, and compare the value with the three possible sources(Choice1, Choice2, and Choice3 above) to see which it came from. If it was Choice1, then A(i) and B(i) are aligned, if it was Choice2 then A(i) is aligned with a gap, and if it was Choice3, then B(i) is aligned with a gap.
    AlignmentA <- ""    AlignmentB <- ""    i <- length(A) - 1    j <- length(B) - 1    while (i > 0 AND j > 0)    {      Score <- F(i,j)      ScoreDiag <- F(i - 1, j - 1)      ScoreUp <- F(i, j - 1)      ScoreLeft <- F(i - 1, j)      if (Score - S(A(i), B(j)) == ScoreDiag)      {        AlignmentA <- A(i) + AlignmentA        AlignmentB <- B(j) + AlignmentB        i <- i - 1        j <- j - 1      }      else if (Score - d == ScoreLeft)      {        AlignmentA <- A(i) + AlignmentA        AlignmentB <- "-" + AlignmentB        i <- i - 1      }      otherwise (Score - d == ScoreUp)      {        AlignmentA <- "-" + AlignmentA        AlignmentB <- B(j) + AlignmentB        j <- j - 1      }    }    while (i >= 0)    {      AlignmentA <- A(i) + AlignmentA      AlignmentB <- "-" + AlignmentB      i <- i - 1    }    while (j >= 0)    {      AlignmentA <- "-" + AlignmentA      AlignmentB <- B(j) + AlignmentB      j <- j - 1 
}

 

<< PreviousWord BrowserNext >>
expressways of china
palo monte
black rock
sarah manners
vytorin
dejan stefanovic
swedish steam battleships
franz josef i of liechtenstein
five ways
thami
matthew taylor (footballer)
r. c. lehmann
caulfield
b of the bang
militainment
arak
old japanese language
line out code
ascom
robot monster
democratic left
chadstone
democratic left (united kingdom)
trade marks act 1994
daniel boucher
narodism
huskvarna
main ridge
dysarthria
moon maid
phosphine as a fumigant
torch theatre, dublin
hockey at the 1996 summer olympics
tous
william watson
masaru emoto
hockey at the 1992 summer olympics
savalia
similarity matrix
hockey at the 1988 summer olympics
paul scherrer
hockey at the 1984 summer olympics
cibse
hockey at the 1980 summer olympics