Question

I want to estimate the transition probability matrix of a first-order Markov chain from a given set of data sequences (i.e. clickstream data), preferably in Java; otherwise MATLAB is fine.

I have each sequence in a different file (but of course I can merge everything into a single one), and one of the issues is that the sequences do not all have the same length. I know the state space and I'm only interested in the state transitions.

I've read this: Estimate Markov Chain Transition Matrix in MATLAB With Different State Sequence Lengths, but I'm not sure it fits my problem. I was also wondering whether there are Java libraries that handle this issue. If there are, I wasn't able to find them.

Solution

You have to create a matrix M that counts transitions: M(i,j) is the number of times state i is immediately followed by state j.

For the sequence 1,4,4,6,7, you have to set

M(1,4)=M(1,4)+1
M(4,4)=M(4,4)+1
M(4,6)=M(4,6)+1
M(6,7)=M(6,7)+1

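Repeat this for every sequence, accumulating into the same matrix; because only consecutive pairs are counted, the sequences do not need to have the same length. Finally, normalize every row so that it sums to 1.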
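
Since the question asks for Java, here is a minimal sketch of the same count-and-normalize idea. The class name TransitionMatrixEstimator, the method estimate, and the choice to represent each sequence as an int[] of states numbered 1..numStates are assumptions made for illustration and do not come from any library. Leaving a row as all zeros when a state has no observed outgoing transition is also an assumption; you may prefer a different convention.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class: the names are illustrative, not from a library.
final class TransitionMatrixEstimator {

    /**
     * Counts transitions over all sequences, then normalizes each row to sum to 1.
     * States are assumed to be integers in 1..numStates, as in the example above.
     */
    static double[][] estimate(List<int[]> sequences, int numStates) {
        double[][] p = new double[numStates][numStates];
        for (int[] seq : sequences) {
            // Only consecutive pairs inside one sequence are counted,
            // so sequences of different lengths are handled naturally.
            for (int t = 0; t + 1 < seq.length; t++) {
                p[seq[t] - 1][seq[t + 1] - 1] += 1.0;
            }
        }
        for (double[] row : p) {
            double sum = 0.0;
            for (double count : row) {
                sum += count;
            }
            // Assumption: a state with no observed outgoing transition
            // keeps an all-zero row instead of being normalized.
            if (sum > 0.0) {
                for (int j = 0; j < row.length; j++) {
                    row[j] /= sum;
                }
            }
        }
        return p;
    }

    public static void main(String[] args) {
        List<int[]> sequences = Arrays.asList(
                new int[] {1, 4, 4, 6, 7},
                new int[] {4, 6});
        double[][] p = estimate(sequences, 7);
        // Print the estimated probabilities for transitions leaving state 4.
        System.out.println(Arrays.toString(p[3]));
    }
}
```

Each file can be read into one int[] and added to the list; the sequences never need to be merged or padded to a common length.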

Update: using char indices. MATLAB converts a char to its numeric code with double('A'), so mapping characters to matrix indices is a simple index shift.

char2index=@(x)(double(x)-'A'+1)
index2char=@(x)(char(x+'A'-1))
M(char2index('A'),char2index('B'))=M(char2index('A'),char2index('B'))+1

The second function, index2char, converts an index back to the corresponding character.
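
The same index shifting can be sketched in Java, where a char can be used directly in integer arithmetic. The class StateIndex and its method names are hypothetical, and the sketch assumes states are labelled with consecutive uppercase letters starting at 'A'. Note that the indices here are 0-based, unlike the 1-based MATLAB version above.

```java
// Hypothetical helper class: the names are illustrative, not from a library.
final class StateIndex {

    /** Maps 'A' -> 0, 'B' -> 1, ... (0-based, unlike the 1-based MATLAB version). */
    static int charToIndex(char state) {
        return state - 'A';
    }

    /** Inverse of charToIndex: 0 -> 'A', 1 -> 'B', ... */
    static char indexToChar(int index) {
        return (char) ('A' + index);
    }

    /** Adds the transitions of one sequence, written as a string such as "ABBCA", to counts. */
    static void countTransitions(String sequence, int[][] counts) {
        for (int t = 0; t + 1 < sequence.length(); t++) {
            int from = charToIndex(sequence.charAt(t));
            int to = charToIndex(sequence.charAt(t + 1));
            counts[from][to]++;
        }
    }
}
```

If the states are page names rather than single letters, a Map from label to index built once from the known state space would play the same role as charToIndex.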

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow