유전체학 마커를 가로 지르는 변화하는 방출 매트릭스를 사용하여 ViterBI 알고리즘을 HMM에서 구현하기

https://stackoverflow.com//questions/20052521

26-12-2019
|

문제

SNP 유전자형 데이터를 기반으로 조상을 지정하는 숨겨진 마르코프 접근 방식을 구현하는 데 도움을 요청하고 싶습니다.이와 같이 생성 된 전환 행렬이 있음을 감안할 때 :

states <- c("A1","A2","A3","A4","A5","A6","A7","A8") # Define the names of the states
A1 <- c(0.9,0.1,0.1,0.1,0.1,0.1,0.1,0.1) # Set the probabilities of switching states, where the previous state was "A1"
A2 <- c(0.1,0.9,0.1,0.1,0.1,0.1,0.1,0.1) # Set the probabilities of switching states, where the previous state was "A2"
A3 <- c(0.1,0.1,0.9,0.1,0.1,0.1,0.1,0.1) # Set the probabilities of switching states,  where the previous state was "A3"
A4 <- c(0.1,0.1,0.1,0.9,0.1,0.1,0.1,0.1) # Set the probabilities of switching states, where the previous state was "A4"
A5 <- c(0.1,0.1,0.1,0.1,0.9,0.1,0.1,0.1) # Set the probabilities of switching states, where the previous state was "A5"
A6 <- c(0.1,0.1,0.1,0.1,0.1,0.9,0.1,0.1) # Set the probabilities of switching states, where the previous state was "A6"
A7 <- c(0.1,0.1,0.1,0.1,0.1,0.1,0.9,0.1) # Set the probabilities of switching states, where the previous state was "A7"
A8 <- c(0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.9) # Set the probabilities of switching states, where the previous state was "A8"
thetransitionmatrix <- matrix(c(A1,A2,A3,A4,A5,A6,A7,A8), 8, 8, byrow = TRUE) # Create an 8 x 8 matrix
rownames(thetransitionmatrix) <- states
colnames(thetransitionmatrix) <- states
thetransitionmatrix # Print out the transition matrix
    A1  A2  A3  A4  A5  A6  A7  A8
A1 0.9 0.1 0.1 0.1 0.1 0.1 0.1 0.1
A2 0.1 0.9 0.1 0.1 0.1 0.1 0.1 0.1
A3 0.1 0.1 0.9 0.1 0.1 0.1 0.1 0.1
A4 0.1 0.1 0.1 0.9 0.1 0.1 0.1 0.1
A5 0.1 0.1 0.1 0.1 0.9 0.1 0.1 0.1
A6 0.1 0.1 0.1 0.1 0.1 0.9 0.1 0.1
A7 0.1 0.1 0.1 0.1 0.1 0.1 0.9 0.1
A8 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.9

및 방출 매트릭스는 N8x4 매트릭스 목록이며, n은 데이터의 SNP / 행 수와 동일한 N과 같습니다.예를 들어, 3 개의 SNP / 행을 가로 질러 8 개의 샘플 (A1-A8)에 대한 다음 데이터가 주어졌습니다.

A1 A2 A3 A4 A5 A6 A7 A8 T T T T T T T C T C T T T T T C A A A G G A A A
.

매트릭스 1 목록에서

A C G T A1 0 0 0 1/7 A2 0 0 0 1/7 A3 0 0 0 1/7 A4 0 0 0 1/7 A5 0 0 0 1/7 A6 0 0 0 1/7 A7 0 0 0 1/7 A8 0 1 0 0
.

샘플 중 7 개가 행 1에 T를 소지하기 때문에 각 샘플은 1/7의 확률이 있습니다.A8만이 C를 가지고 있기 때문에 A8에 C ~ A8을 할당 할 확률이 100 %입니다.행 3의 경우 출력은

이어야합니다.
A C G T A1 1/6 0 0 0 A2 1/6 0 0 0 A3 1/6 0 0 0 A4 1/2 0 0 0 A5 1/2 0 0 0 A6 1/6 0 0 0 A7 1/6 0 0 0 A8 1/6 0 0 0
.

전술 한 전이 행렬 및 방출 매트릭스 목록을 사용하여, 모든 대립 유전자의 서열에 비터 비 알고리즘을 착신하고자한다.내가 현재 가지고있는 코드는 각 행에 대해 다른 방출 매트릭스를 사용할 수 없습니다

viterbi <- function(sequence, transitionmatrix, emissionmatrix) # This carries out the Viterbi algorithm. # Adapted from "Applied Statistics for Bioinformatics using R" by Wim P. Krijnen, page 209 # ( cran.r-project.org/doc/contrib/Krijnen-IntroBioInfStatistics.pdf ) { # Get the names of the states in the HMM: states <- rownames(theemissionmatrix) # Make the Viterbi matrix v: v <- makeViterbimat(sequence, transitionmatrix, emissionmatrix) # Go through each of the rows of the matrix v (where each row represents # a position in the DNA sequence), and find out which column has the # maximum value for that row (where each column represents one state of # the HMM): mostprobablestatepath <- apply(v, 1, function(x) which.max(x)) # Print out the most probable state path: prevnucleotide <- sequence[1] prevmostprobablestate <- mostprobablestatepath[1] prevmostprobablestatename <- states[prevmostprobablestate] startpos <- 1 for (i in 2:length(sequence)) { nucleotide <- sequence[i] mostprobablestate <- mostprobablestatepath[i] mostprobablestatename <- states[mostprobablestate] if (mostprobablestatename != prevmostprobablestatename) { print(paste("Positions",startpos,"-",(i-1), "Most probable state = ", prevmostprobablestatename)) startpos <- i } prevnucleotide <- nucleotide prevmostprobablestatename <- mostprobablestatename } print(paste("Positions",startpos,"-",i, "Most probable state = ", prevmostprobablestatename)) } # the viterbi() function requires a second function makeViterbimat(): makeViterbimat <- function(sequence, transitionmatrix, emissionmatrix) # This makes the matrix v using the Viterbi algorithm. # Adapted from "Applied Statistics for Bioinformatics using R" by Wim P. Krijnen, page 209 # ( cran.r-project.org/doc/contrib/Krijnen-IntroBioInfStatistics.pdf ) { # Change the sequence to uppercase sequence <- toupper(sequence) # Find out how many states are in the HMM numstates <- dim(transitionmatrix)[1] # Make a matrix with as many rows as positions in the sequence, and as many # columns as states in the HMM v <- matrix(NA, nrow = length(sequence), ncol = dim(transitionmatrix)[1]) # Set the values in the first row of matrix v (representing the first position of the sequence) to 0 v[1, ] <- 0 # Set the value in the first row of matrix v, first column to 1 v[1,1] <- 1 # Fill in the matrix v: for (i in 2:length(sequence)) # For each position in the DNA sequence: { for (l in 1:numstates) # For each of the states of in the HMM: { # Find the probabilility, if we are in state l, of choosing the nucleotide at position in the sequence statelprobnucleotidei <- emissionmatrix[l,sequence[i]] # v[(i-1),] gives the values of v for the (i-1)th row of v, ie. the (i-1)th position in the sequence. # In v[(i-1),] there are values of v at the (i-1)th row of the sequence for each possible state k. # v[(i-1),k] gives the value of v at the (i-1)th row of the sequence for a particular state k. # transitionmatrix[l,] gives the values in the lth row of the transition matrix, xx should not be transitionmatrix[,l]? # probabilities of changing from a previous state k to a current state l. # max(v[(i-1),] * transitionmatrix[l,]) is the maximum probability for the nucleotide observed # at the previous position in the sequence in state k, followed by a transition from previous # state k to current state l at the current nucleotide position. # Set the value in matrix v for row i (nucleotide position i), column l (state l) to be: v[i,l] <- statelprobnucleotidei * max(v[(i-1),] * transitionmatrix[,l]) } } return(v) }
.

해결책

단순히 하나의 사전 계산 된 배출 행렬 목록을 단순히 멈추는 것은 무엇입니까?

makeViterbimat <- function(sequence, transitionmatrix, emissionmatrixList)
  # This makes the matrix v using the Viterbi algorithm.
  # Adapted from "Applied Statistics for Bioinformatics using R" by Wim P. Krijnen, page 209
  # ( cran.r-project.org/doc/contrib/Krijnen-IntroBioInfStatistics.pdf )
  {
 # Change the sequence to uppercase
 sequence <- toupper(sequence)
 # Find out how many states are in the HMM
 numstates <- dim(transitionmatrix)[1]
 # Make a matrix with as many rows as positions in the sequence, and as many
 # columns as states in the HMM
 v <- matrix(NA, nrow = length(sequence), ncol = dim(transitionmatrix)[1])
 # Set the values in the first row of matrix v (representing the first position of the sequence) to 0
 v[1, ] <- 0
 # Set the value in the first row of matrix v, first column to 1
 v[1,1] <- 1
 # Fill in the matrix v:
 for (i in 2:length(sequence)) # For each position in the DNA sequence:
 {
    emissionmatrix = emissionmatrixList[[i]]
    for (l in 1:numstates) # For each of the states of in the HMM:
    {
       # Find the probabilility, if we are in state l, of choosing the nucleotide at position in the sequence
       statelprobnucleotidei <- emissionmatrix[l,sequence[i]]

       # v[(i-1),] gives the values of v for the (i-1)th row of v, ie. the (i-1)th position in the sequence.
       # In v[(i-1),] there are values of v at the (i-1)th row of the sequence for each possible state k.
       # v[(i-1),k] gives the value of v at the (i-1)th row of the sequence for a particular state k.

       # transitionmatrix[l,] gives the values in the lth row of the transition matrix, xx should not be transitionmatrix[,l]?
       # probabilities of changing from a previous state k to a current state l.

       # max(v[(i-1),] * transitionmatrix[l,]) is the maximum probability for the nucleotide observed
       # at the previous position in the sequence in state k, followed by a transition from previous
       # state k to current state l at the current nucleotide position.

       # Set the value in matrix v for row i (nucleotide position i), column l (state l) to be:
       v[i,l] <-  statelprobnucleotidei * max(v[(i-1),] * transitionmatrix[,l])
    }
}
return(v)
  }

또는이 방출 성형 목록을 구성하는 방법에 대한 문제가 있습니까?

라이센스 : CC-BY-SA ~와 함께 속성

제휴하지 않습니다 StackOverflow