Question

I was interested in evaluating a catalogue that students would be using, in order to observe probabilistically how it is being used.

The catalogue works by choosing cells in a temporal sequence, so for example:

  • Student A has: $(t_1, Cell_3), (t_2, Cell_4)$
  • Student B has: $(t_1, Cell_5), (t_2, Cell_3), (t_3, Cell_7)$.

Assume that the cells of the table are states of a Hidden Markov Model, so that a transition between states maps, in the real world, to a student going from one cell to another.

Assuming that the catalogue is nothing more than guidance, a certain kind of phenomenon is expected to occur on a given artifact. Consider this artifact to be unique, say, for example, a program.

What happens to this program is a finite list of observations; thus, for a given cell, we have a finite list of observations resulting from following the suggestion mentioned in that cell. In an HMM, this would then be the probability associated with a state generating a given observation on this artifact.

Finally, consider the catalogue to be structured in such a way that, initially, the probability of starting in any given cell is expected to be equal. The catalogue does not suggest any starting point.

  • Question 1: Is the mapping between the catalogue and the HMM appropriate?

  • Question 2: Assuming Question 1 holds true, consider now that we train the HMM using as entries $(t_1, Cell_1), (t_2, Cell_3), \dots, (t_n, Cell_n)$ for the students. Would the trained HMM, when asked to generate the most likely sequence of state transitions, yield as a result the path most commonly used by the people who worked with the catalogue in a given experiment $\epsilon$?


Solution

Ad Question 1: If your assumptions on how the catalogue is used hold -- that is, the choice of the next cell depends only on the current cell (or a constant number of preceding cells), not on the (full) history -- then yes, you can use a Markov chain to model it.

However, you do not seem to need the "Hidden" part; that is only useful if the states produce (probabilistic) output and this output is all you observe. Here, in contrast, you want to observe the students' cell states directly.

Imagine you could not observe which cells students are in, but only how much they like the current cell; this could be implemented with the notorious "Was this useful to you?" buttons. In general, assume every student gives feedback from $\{1, \dots, k\}$ in every cell, but you do not know which cell they are in. Then you can use an HMM to estimate their cell sequence given their feedback sequence.
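
To make the hidden/observed split concrete, here is a tiny simulation sketch in Python. All numbers and names (`feedback_prob`, `observe`) are made up for illustration; the point is only that we see feedback scores, never the cells themselves.

```python
import random

# Hypothetical feedback distributions (made-up numbers): the probability
# that a student presses score 1..3 while in a given (hidden) cell.
feedback_prob = {
    "Cell_3": {1: 0.7, 2: 0.2, 3: 0.1},
    "Cell_4": {1: 0.1, 2: 0.6, 3: 0.3},
}

def observe(hidden_cells):
    """Sample the visible feedback sequence for a hidden cell sequence."""
    return [random.choices(list(feedback_prob[c]),
                           weights=list(feedback_prob[c].values()))[0]
            for c in hidden_cells]

print(observe(["Cell_3", "Cell_4"]))  # e.g. [1, 2] -- all we ever get to see
```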

Ad Question 2: Let me illustrate the use by continuing the above train of thought. For $n = k = 3$, we have the following Markov chain:

[Figure: abstract HMM]

Here, $p_{i,j}$ is the probability of transitioning from state $i$ to state $j$; note that $p_{i,0} = 0$ for all $i$, so those edges have been left out¹. Let $p_{i,\bot}$ be the probability of terminating in state $i$ (we leave out another dummy state for clarity). $q_{i,l}$ is the probability of emitting $l$ when entering (or, equivalently, leaving) state $i$; this models our feedback. Of course, we require $\sum_j p_{i,j} + p_{i,\bot} = \sum_l q_{i,l} = 1$ for all $i$.
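
As a minimal sketch of this parameterization (the probabilities are made up, and the names `p`, `q`, `TERM` are mine, not from the figure), it might look as follows in Python. State $0$ is given ordinary emissions here; see footnote 1 for the dummy-output alternative.

```python
TERM = "⊥"  # termination pseudo-target, i.e. p_{i,⊥}

p = {  # p[i][j]: transition probabilities; no key j = 0 since p_{i,0} = 0
    0: {1: 0.5, 2: 0.3, 3: 0.2, TERM: 0.0},
    1: {1: 0.1, 2: 0.4, 3: 0.2, TERM: 0.3},
    2: {1: 0.2, 2: 0.1, 3: 0.3, TERM: 0.4},
    3: {1: 0.3, 2: 0.3, 3: 0.1, TERM: 0.3},
}

q = {  # q[i][l]: probability of emitting feedback l in state i
    0: {1: 0.5, 2: 0.3, 3: 0.2},
    1: {1: 0.6, 2: 0.3, 3: 0.1},
    2: {1: 0.2, 2: 0.5, 3: 0.3},
    3: {1: 0.1, 2: 0.2, 3: 0.7},
}

# The normalization constraints from the text must hold for every state i.
for i in p:
    assert abs(sum(p[i].values()) - 1.0) < 1e-9
    assert abs(sum(q[i].values()) - 1.0) < 1e-9
```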

Note that the state sequence $0,1,2$ has probability $p_{0,1} \cdot p_{1,2}$ -- that is a central property of Markov chains. The output sequence $1,2$, on the other hand, has probability $q_{0,1}\cdot p_{0,1}\cdot q_{1,2} + q_{0,1}\cdot p_{0,2}\cdot q_{2,2} + q_{0,1}\cdot p_{0,3}\cdot q_{3,2}$: output $1$ is emitted in state $0$, and every successor state $i$ that emits $2$ contributes one term.
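
For toy sizes, both probabilities can be checked by brute-force path enumeration. A sketch, reusing the `p` and `q` tables from above:

```python
from itertools import product

def state_seq_prob(states):
    """Probability of a state sequence; the start in state 0 is implicit."""
    prob, prev = 1.0, 0
    for s in states:
        prob *= p[prev][s]
        prev = s
    return prob

def output_seq_prob(outputs):
    """Probability of an output sequence, summed over all state paths."""
    first, rest = outputs[0], outputs[1:]  # the first symbol is emitted in state 0
    total = 0.0
    for path in product([1, 2, 3], repeat=len(rest)):
        prob, prev = q[0][first], 0
        for s, out in zip(path, rest):
            prob *= p[prev][s] * q[s][out]
            prev = s
        total += prob
    return total

print(state_seq_prob([1, 2]))   # p_{0,1} * p_{1,2}
print(output_seq_prob([1, 2]))  # q_{0,1} * (p_{0,1} q_{1,2} + p_{0,2} q_{2,2} + p_{0,3} q_{3,2})
```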

In a real world scenario, we would have to train our Markov chain. Let us assume we have some sequences of both state and output²:

$\qquad \begin{align} &(0,1), (1,1), (2,1), (3,1) \\ &(0,2), (1,3), (3,1) \\ &(0,1), (3,3), (1,2) \end{align}$

Now we just count how often each transition and output happened and set our probabilities to the relative frequencies³:

[Figure: concrete HMM]
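
A sketch of this counting in Python, using the three training runs from above; the hypothetical `smoothing` parameter implements the add-one correction from footnote 3 (set it to `0` for the raw relative frequencies):

```python
from collections import defaultdict

TERM = "⊥"  # termination marker, as in the earlier sketch

# The three training runs from above, as lists of (state, output) pairs.
runs = [
    [(0, 1), (1, 1), (2, 1), (3, 1)],
    [(0, 2), (1, 3), (3, 1)],
    [(0, 1), (3, 3), (1, 2)],
]

def train(runs, smoothing=0):
    """Estimate p and q as (optionally add-one smoothed) relative frequencies."""
    t_count = defaultdict(lambda: defaultdict(lambda: smoothing))
    e_count = defaultdict(lambda: defaultdict(lambda: smoothing))
    for run in runs:
        for (s, _), (s2, _) in zip(run, run[1:]):
            t_count[s][s2] += 1
        t_count[run[-1][0]][TERM] += 1  # ending a run counts toward p_{i,⊥}
        for s, out in run:
            e_count[s][out] += 1
    p_hat, q_hat = {}, {}
    for i in [0, 1, 2, 3]:
        targets = [1, 2, 3, TERM]  # state 0 is never a target: p_{i,0} = 0
        tot = sum(t_count[i][j] for j in targets)
        p_hat[i] = {j: t_count[i][j] / tot for j in targets}
        tot = sum(e_count[i][l] for l in [1, 2, 3])
        q_hat[i] = {l: e_count[i][l] / tot for l in [1, 2, 3]}
    return p_hat, q_hat

p_hat, q_hat = train(runs, smoothing=1)  # see footnote 3
```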

Now you can do the interesting stuff. Ignoring the output, you can determine the path with the highest probability; you can determine the output sequence with the highest probability; but most importantly -- and here the hidden part comes into play -- you can find the most likely state sequence for a given output sequence with the Viterbi algorithm.
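
A compact Viterbi sketch over the same `p`/`q` tables (the trained `p_hat`/`q_hat` from above plug in the same way); the dynamic program keeps, per state, only the best path that ends there:

```python
def viterbi(outputs):
    """Most likely state sequence for `outputs`; the first symbol is
    emitted in the dummy start state 0, the rest in states 1..3."""
    first, rest = outputs[0], outputs[1:]
    best = {0: (q[0][first], [0])}  # best[s] = (prob, best path ending in s)
    for out in rest:
        nxt = {}
        for s in [1, 2, 3]:
            prob, path = max(
                ((pr * p[prev][s] * q[s][out], pt)
                 for prev, (pr, pt) in best.items()),
                key=lambda t: t[0],
            )
            nxt[s] = (prob, path + [s])
        best = nxt
    prob, path = max(best.values(), key=lambda t: t[0])
    return path, prob

print(viterbi([1, 2, 2]))  # ([0, 1, 2], 0.015) with the made-up numbers above
```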

Bottom line: yes, you can use Markov chains or hidden Markov models (depending on your needs) to model your scenario. Whether the fundamental assumption -- that the probability of the next cell depends only on the current cell -- makes sense is another question entirely.


  1. We need a starting state, but we can make it a dummy by never transitioning back to it. We can ignore its output or introduce a dummy output that is only and always emitted in state $0$.
  2. If we have only output sequences, we can train with the Baum-Welch algorithm, which builds on the forward-backward algorithm.
  3. Due to our low number of observations, we have a number of impossible transitions/emissions. Usually you assume that everything is possible and has just not been observed due to low probability. We can compensate for this by adding $1$ to all counts (Laplace smoothing); this way, all sequences have probability $> 0$.