Count the number of times a string appears in a sequence

https://stackoverflow.com/questions/12443075

02-07-2021

Question

I have a matrix X that consists of sequences from a Markov chain with 5 states: 1, 2, 3, 4, 5. Each row is a separate, independent sequence; for example, row 1 is one sequence and row 2 is another.

    4   4   4   4   4   5   3   0   0   0
    1   4   2   2   2   2   3   4   0   0
x=  4   4   1   2   1   3   1   0   0   0
    2   4   4   2   4   3   3   5   0   0
    4   4   5   4   2   1   2   4   3   5

I'd like to count the number of transitions between states 1..5, i.e. 1to1, 1to2, 1to3, 1to4, 1to5, 2to1, etc. For example, 1to1 happens 0 times, whereas 4to4 happens 7 times. We can ignore the zeros; they are an artefact of importing an Excel file.

This is similar to this question, except that there the sequence has already been concatenated. Please let me know if you need further clarification.

Solution

Here's code that does what you want:

N = max(max(X));                                   %# Number of states
[P, Q] = meshgrid(1:N, 1:N);
Y = [X, zeros(size(X, 1), 1)]';                    %# Pad for concatenation
count_func = @(p, q)numel(strfind(Y(:)', [p, q])); %# Counts p->q transitions
P = reshape(arrayfun(count_func, P, Q), N, N)

Short explanation: all rows of X are concatenated into one long vector, Y(:)'. Each row is first padded with a zero so that the end of one row and the start of the next do not form a spurious transition; this works because 0 is not a valid state. P and Q hold all possible combinations of states, and count_func counts the occurrences of the pair [p, q] in the concatenated vector for a specific p and q. arrayfun invokes count_func for every combination, and the result is stored in matrix P.
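To see why the padding matters, here is a minimal sketch on a hypothetical two-row matrix A (not part of the question's data). The helper count_pairs is introduced only for this illustration; it compares the vector with a shifted copy of itself rather than calling strfind.

```matlab
%# Hypothetical example: row 1 ends in state 3, row 2 starts in state 1.
A = [1 2 3;
     1 3 2];

%# Count p->q pairs in a row vector v by comparing it with a shifted copy.
count_pairs = @(v, p, q) sum(v(1:end-1) == p & v(2:end) == q);

%# Without padding, the end of row 1 runs straight into the start of row 2.
flat = reshape(A', 1, []);                    %# [1 2 3 1 3 2]
unpadded = count_pairs(flat, 3, 1);           %# 1: a spurious 3->1 transition

%# With one zero appended to every row, the row boundary becomes 3 0 1.
padded_rows = [A, zeros(size(A, 1), 1)]';
flat_padded = padded_rows(:)';                %# [1 2 3 0 1 3 2 0]
padded = count_pairs(flat_padded, 3, 1);      %# 0: nothing counted across rows
```

Neither row of A contains a 3 followed by a 1, so the padded count of 0 is the correct one.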

For your example, this code yields matrix P:

P =
     0   2   1   1   0
     2   3   0   3   0
     1   1   1   2   1
     1   3   1   7   1
     0   0   2   2   0

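where P(m, n) is the number of transitions from the n-th state to the m-th state: columns are origin states and rows are destination states, because meshgrid varies p (the origin) along the columns. For example, P(5, 3) = 2 counts the two 3→5 transitions, while P(3, 5) = 1 counts the single 5→3 transition. Use P' if you prefer rows as origins and columns as destinations.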
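If you would rather have rows as origin states and columns as destination states, one possible alternative is to pair each entry with its right-hand neighbour and tally the pairs with accumarray. This is a sketch of a different technique, not the strfind approach above; the names src, dst, valid and T are chosen here for illustration, and it assumes the states are the integers 1..N with 0 used only as filler.

```matlab
%# Example data from the question.
X = [4 4 4 4 4 5 3 0 0 0
     1 4 2 2 2 2 3 4 0 0
     4 4 1 2 1 3 1 0 0 0
     2 4 4 2 4 3 3 5 0 0
     4 4 5 4 2 1 2 4 3 5];

N = max(X(:));                  %# Number of states
src = X(:, 1:end-1);            %# State at each step
dst = X(:, 2:end);              %# State at the following step (same row)
valid = src > 0 & dst > 0;      %# Drop pairs that touch the zero filler

%# T(m, n) = number of transitions from state m to state n.
T = accumarray([src(valid), dst(valid)], 1, [N, N]);
disp(T)
```

Because neighbours are only ever taken within a row, no padding is needed here. For the example data, T(4, 4) is 7, and T is the transpose of the matrix printed above.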

EDIT: If you want to count two-step patterns that return to the starting state (that is, i-th state → j-th state → i-th state), as in your follow-up question, you just need to alter count_func slightly, like so:

count_func = @(p, q)numel(strfind(Y(:)', [p, q, p]));

This should yield the following, where P(m, n) is the number of occurrences of the pattern n → m → n:

P =

   0   1   0   0   0
   1   2   0   1   0
   1   0   0   0   0
   0   0   0   3   0
   0   0   0   1   0
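The same return patterns can also be tallied without strfind, by comparing each entry with the two entries to its right. This is only a sketch under the same assumptions as before (states are the integers 1..N, 0 is filler); the names a, b, c, valid and R are chosen here for illustration.

```matlab
%# Example data from the question.
X = [4 4 4 4 4 5 3 0 0 0
     1 4 2 2 2 2 3 4 0 0
     4 4 1 2 1 3 1 0 0 0
     2 4 4 2 4 3 3 5 0 0
     4 4 5 4 2 1 2 4 3 5];

N = max(X(:));                       %# Number of states
a = X(:, 1:end-2);                   %# First state of each triple
b = X(:, 2:end-1);                   %# Middle state
c = X(:, 3:end);                     %# Last state
valid = a > 0 & b > 0 & a == c;      %# Keep only i -> j -> i triples of real states

%# R(m, n) = number of occurrences of the pattern m -> n -> m.
R = accumarray([a(valid), b(valid)], 1, [N, N]);
disp(R)
```

Here rows index the outer state and columns the middle state, so R is the transpose of the matrix printed above; for instance R(4, 4) is 3, from the run of five 4s in the first row.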

OTHER TIPS

An alternative solution:

%# Define the example data:
x = [
4 4 4 4 4 5 3 0 0 0
1 4 2 2 2 2 3 4 0 0
4 4 1 2 1 3 1 0 0 0
2 4 4 2 4 3 3 5 0 0
4 4 5 4 2 1 2 4 3 5
];

%# Number of different states.
N = max(max(x));

%# Original states.
OrigStateVector = repmat((1:N)', N, 1);

%# Destination states corresponding to OrigStateVector.
DestStateVector = reshape(repmat((1:N)', 1, N)', N^2, 1);

%# Pad rows of x with zeros and reshape it to a horizontal vector.
xVector = reshape([ x, zeros(size(x,1),1) ]', 1, numel(x)+size(x,1));

%# Find the start indices of every (origin, destination) pair in xVector.
Matches = arrayfun(@(a, b) strfind(xVector, [a b]), ...
    OrigStateVector, DestStateVector, 'UniformOutput', false);

%# Count the matches for each pair and store the result in ResultMatrix.
ResultMatrix = reshape(cellfun(@numel, Matches), N, N)';

ResultMatrix =
 0     2     1     1     0
 2     3     0     3     0
 1     1     1     2     1
 1     3     1     7     1
 0     0     2     2     0

Licensed under: CC-BY-SA with attribution
