When using sequence analysis, we are interested in the evolution of one variable (for instance, a sequence of one variable across several waves). You then have several options for analyzing several variables:
- Create one sequence per variable and then analyze the links between the clusters of sequences. In my opinion, this is the best way to go if your variables measure different concepts (for instance, family and employment).
- Create a new variable for each wave that is the interaction of the different variables of that wave, using the `interaction` function. For instance, for wave one, use `L$IntVar1 <- interaction(L$A1, L$B1, L$C1, drop=T)` (use `drop=T` to remove unused combinations of answers). Then analyze the sequence of this newly created variable. In my opinion, this is the preferred way if your variables are different dimensions of the same concept. For instance, marriage, children and union are all related to family life.
- Create one sequence object per variable and then use `seqdistmc` to compute the distance (multi-channel sequence analysis). Depending on how you set the substitution costs, this is equivalent to the previous method (see below).
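As a rough sketch of the second option, the per-wave interaction can be built in a loop. The data frame `L` below is toy data, and the column names (`A1`, `B1`, ...), the state labels and the number of waves are assumptions made only for illustration:

```r
# Toy wide-format data: two hypothetical variables (A = marital status,
# B = children) observed over three waves. All values are made up.
L <- data.frame(
  A1 = c("Married", "Single", "Single"),
  B1 = c("Child", "NoChild", "NoChild"),
  A2 = c("Married", "Married", "Single"),
  B2 = c("Child", "NoChild", "Child"),
  A3 = c("Married", "Married", "Single"),
  B3 = c("Child", "Child", "Child"),
  stringsAsFactors = TRUE
)

# Build one combined state per wave; drop = TRUE removes unused combinations.
for (w in 1:3) {
  L[[paste0("IntVar", w)]] <- interaction(
    L[[paste0("A", w)]], L[[paste0("B", w)]], drop = TRUE
  )
}

# Convert to character so that differing factor levels across waves
# do not get in the way when the columns are combined.
combined <- sapply(L[paste0("IntVar", 1:3)], as.character)
print(combined)
```

The resulting `combined` matrix holds one row per individual and one column per wave, and should be usable as input to TraMineR's `seqdef` to create the sequence object.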
If you use the second strategy, you could set the substitution costs by counting the differences between the original variables. For instance, between states "Married, Child" and "Not married and Child", you could set the substitution cost to "1" because they differ only on the "marriage" variable. Similarly, you would set the substitution cost between states "Married, Child" and "Not married and No Child" to "2" because all of your variables are different. Finally, you set the indel cost to half the maximum substitution cost. This is the strategy used by `seqdistmc`.
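The counting rule above can be sketched in base R. The state labels here are hypothetical and assume the combined states were built with `interaction`, which uses "." as its default separator:

```r
# Combined states as produced by interaction(), with "." as the separator.
# These labels are illustrative only.
states <- c("Married.Child", "Married.NoChild",
            "NotMarried.Child", "NotMarried.NoChild")

# Split each combined state back into its original components.
parts <- strsplit(states, ".", fixed = TRUE)

# Substitution cost = number of original variables on which two states differ.
sm <- outer(seq_along(states), seq_along(states),
            Vectorize(function(i, j) sum(parts[[i]] != parts[[j]])))
dimnames(sm) <- list(states, states)

# Indel cost set to half the maximum substitution cost.
indel <- max(sm) / 2

print(sm)
print(indel)
```

The matrix `sm` and the value `indel` could then be passed to `seqdist` through its `sm` and `indel` arguments with `method = "OM"`, provided the row and column order of `sm` matches the alphabet of your sequence object.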
Hope this helps.