Question

I'm trying to implement hidden Markov model training in Python, and the resulting numpy code seems very slow: it takes 30 minutes to train a model. Below is my code, and I agree that it is terribly inefficient. I tried learning about numpy vectorization and advanced indexing, but couldn't figure out how to apply them to my code. I could determine that more than 99% of the execution time is taken by the reestimate() function, especially the parts that print CHK5 and CHK6.

    import numpy

    def reestimate(self):
        newTransition = numpy.zeros(shape=(int(self.num_states),int(self.num_states)))
        newOutput = numpy.zeros(shape=(int(self.num_states),int(self.num_symbols)))
        numerator = numpy.zeros(shape=(int(self.num_obSeq),))
        denominator = numpy.zeros(shape=(int(self.num_obSeq),))
        sumP = 0
        i = 0
        print "CHK1"
        while i < self.num_states:
            j=0
            while j < self.num_states:
                if j < i or j > i + self.delta:
                    newTransition[i][j] = 0
                else:
                    k=0
                    print "CHK2"
                    while k < self.num_obSeq:
                        numerator[k] = denominator[k] = 0
                        self.setObSeq(self.obSeq[k])

                        sumP += self.computeAlpha()
                        self.computeBeta()
                        t=0
                        while t < self.len_obSeq - 1:
                            numerator[k] += self.alpha[t][i] * self.transition[i][j] * self.output[j][self.currentSeq[t + 1]] * self.beta[t + 1][j]
                            denominator[k] += self.alpha[t][i] * self.beta[t][i]
                            t += 1
                        k += 1
                    denom=0
                    k=0
                    print "CHK3"
                    while k < self.num_obSeq:
                        newTransition[i,j] += (1 / sumP) * numerator[k]
                        denom += (1 / sumP) * denominator[k]
                        k += 1
                    newTransition[i,j] /= denom
                    newTransition[i,j] += self.MIN_PROBABILITY
                j += 1
            i += 1
        sumP = 0
        i = 0
        print "CHK4"
        while i < self.num_states:
            j=0
            while j < self.num_symbols:
                k=0
                while k < self.num_obSeq:
                    numerator[k] = denominator[k] = 0
                    self.setObSeq(self.obSeq[k])
                    # print self.obSeq[k]
                    sumP += self.computeAlpha()
                    self.computeBeta()
                    t=0
                    print "CHK5"
                    while t < self.len_obSeq - 1:
                        if self.currentSeq[t] == j:
                            numerator[k] += self.alpha[t,i] * self.beta[t,i]
                        denominator[k] += self.alpha[t,i] * self.beta[t,i]
                        t += 1
                    k += 1
                denom=0
                k=0
                print "CHK6"
                while k < self.num_obSeq:
                    newOutput[i,j] += (1 / sumP) * numerator[k]
                    denom += (1 / sumP) * denominator[k]
                    k += 1
                newOutput[i,j] /= denom
                newOutput[i,j] += self.MIN_PROBABILITY
                j += 1
            i += 1
        self.transition = newTransition
        self.output = newOutput

    def train(self):
        i = 0
        while i < 20:
            self.reestimate()
            print "reestimating....." ,i
            i += 1

Solution

It is straightforward to vectorize your inner loops. Here is an example for the second part of your code (untested of course):

print "CHK4"
for i in xrange(self.num_states):
    for j in xrange(self.num_symbols):
        for k in xrange(self.num_obSeq):
            self.setObSeq(self.obSeq[k])
            # print self.obSeq[k]
            sumP += self.computeAlpha()
            self.computeBeta()
            # t runs over 0 .. len_obSeq-2, matching the original inner loop
            alpha_times_beta = self.alpha[:-1,i] * self.beta[:-1,i]
            numerator[k] = numpy.sum(alpha_times_beta[numpy.asarray(self.currentSeq[:-1]) == j])
            denominator[k] = numpy.sum(alpha_times_beta)
        denom = numpy.sum(denominator)
        # the 1/sumP factors cancel between numerator and denominator,
        # so sumP drops out of this update entirely
        newOutput[i,j] = numpy.sum(numerator) / denom + self.MIN_PROBABILITY
self.transition = newTransition
self.output = newOutput
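
The first loop nest (the transition update around CHK2 and CHK3) vectorizes along the same lines. An equally untested sketch, with the same assumptions as above (alpha and beta have one row per time step, currentSeq holds integer symbol indices):

    print "CHK1"
    for i in xrange(self.num_states):
        for j in xrange(self.num_states):
            if j < i or j > i + self.delta:
                continue  # newTransition is zero-initialized, so nothing to do here
            for k in xrange(self.num_obSeq):
                self.setObSeq(self.obSeq[k])
                sumP += self.computeAlpha()
                self.computeBeta()
                obs = numpy.asarray(self.currentSeq)
                # one vectorized sum over t replaces the innermost while loop
                numerator[k] = self.transition[i,j] * numpy.sum(
                    self.alpha[:-1,i] * self.output[j, obs[1:]] * self.beta[1:,j])
                denominator[k] = numpy.sum(self.alpha[:-1,i] * self.beta[:-1,i])
            # again, the 1/sumP factors cancel between numerator and denominator
            newTransition[i,j] = numpy.sum(numerator) / numpy.sum(denominator) + self.MIN_PROBABILITY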

It might be possible to also vectorize the outer loops, but by far the biggest gain is usually obtained by focusing on the inner loops only. Some comments:

  • Most of your while loops can be turned into for loops. This makes little difference for speed, but for is the preferred style when the number of iterations is known before the loop starts.

  • The convention is to use import numpy as np and to refer to np.function in the rest of the code.

  • Simple loops that just compute a sum (accum = 0; for item in vector: accum += item) should be vectorized as accum = np.sum(vector).

  • Conditional summing in a loop can be converted to a vectorized sum with boolean indexing: accum = 0; for i in range(n): if cond[i]: accum += vector[i] becomes accum = np.sum(vector[cond]). The sketch after this list demonstrates this and the previous point.
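
To make the last two points concrete, here is a minimal, self-contained sketch (the names vector and cond are made up for illustration) showing that the loop and the vectorized forms agree:

    import numpy as np

    rng = np.random.RandomState(0)
    vector = rng.rand(1000)      # values to sum
    cond = rng.rand(1000) > 0.5  # boolean mask

    # Loop version: conditional sum, accumulated element by element.
    accum = 0.0
    for i in xrange(len(vector)):
        if cond[i]:
            accum += vector[i]

    # Vectorized version: boolean indexing selects the matching elements,
    # and np.sum reduces them in a single call.
    accum_vec = np.sum(vector[cond])

    assert abs(accum - accum_vec) < 1e-9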

I am interested to know how much faster your code becomes after these modifications; I guess you can easily gain more than a factor of 10.
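
If you want to measure the effect, a simple wall-clock timing of a single call is enough; hmm below stands for whatever your model instance is called:

    import time

    start = time.time()
    hmm.reestimate()  # one reestimation pass
    print "reestimate() took %.1f seconds" % (time.time() - start)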

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow