Question

I am doing sequence analysis in TraMineR using the optimal matching algorithm. Unfortunately, my sequences are of unequal length due to right-censored data. The minimum length of my sequences is 5, the maximum length is 11. The variations in length are not meaningful for the dissimilarities between sequences that I am interested in. Therefore, I want to keep the influence of unequal length on the overall dissimilarities between sequences as small as possible.

I read a possible solution to this problem in Stovel and Bolan (2004) (1), who use variable indel costs depending on whether or not the sequences are of equal length. For sequences of equal length they use a fixed indel cost, and for sequences of unequal length they use a reduced cost, which is "roughly one-fourth of the fixed cost".

My questions are: In general, how should missing values be coded in TraMineR: as void elements, or by including a missing state in the alphabet? Is there an option in TraMineR to apply variable indel costs, as introduced by Stovel and Bolan? If yes, how can this be done?


(1) Stovel, Katherine and Marc Bolan. 2004. "Residential Trajectories: Using Optimal Alignment to Reveal the Structure of Residential Mobility." Sociological Methods & Research 32(4):559-598.

Solution

Currently, it is not possible to use variable indel costs (that is, costs that depend on whether or not the sequences are of equal length). I am quite sceptical about this method because, if I understand it correctly, the definition of the distance measure changes according to the sequences involved (since the indel costs change). As a result, the triangle inequality is no longer guaranteed to hold. From a conceptual point of view, I think that we should always use the same comparison criteria, and thus the same distance definition.
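To make the triangle inequality point concrete, here is a minimal sketch in plain Python. It does not use TraMineR and is not Stovel and Bolan's implementation. The function names, the toy sequences, and the cost values are all assumptions chosen for illustration: substitution cost 2, indel cost 1 for equal-length pairs, and a reduced indel cost of 0.25 for unequal-length pairs, loosely following the "roughly one-fourth" rule.

```python
def om_distance(a, b, indel, sub_cost):
    """Plain optimal matching (edit) distance with constant costs.

    Illustrative helper only, not a TraMineR function.
    """
    n, m = len(a), len(b)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i * indel          # delete the first i states of a
    for j in range(1, m + 1):
        d[0][j] = j * indel          # insert the first j states of b
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = 0.0 if a[i - 1] == b[j - 1] else sub_cost
            d[i][j] = min(
                d[i - 1][j] + indel,     # deletion
                d[i][j - 1] + indel,     # insertion
                d[i - 1][j - 1] + s,     # match or substitution
            )
    return d[n][m]


def variable_indel_distance(a, b, indel=1.0, reduced=0.25, sub_cost=2.0):
    """Hypothetical variable-indel rule: the indel cost depends on
    whether the two sequences have the same length (assumed values)."""
    cost = indel if len(a) == len(b) else reduced
    return om_distance(a, b, cost, sub_cost)


x, y, z = "AB", "ABA", "BA"   # x and z have equal length, y is longer

# Variable indel costs: the direct path x -> z is more expensive
# than the detour through y, so the triangle inequality fails.
d_xz = variable_indel_distance(x, z)
d_xy = variable_indel_distance(x, y)
d_yz = variable_indel_distance(y, z)
print(d_xz, d_xy + d_yz)

# Fixed indel cost for every pair: the inequality holds on this example.
f_xz = om_distance(x, z, 1.0, 2.0)
f_xy = om_distance(x, y, 1.0, 2.0)
f_yz = om_distance(y, z, 1.0, 2.0)
print(f_xz, f_xy + f_yz)
```

On this toy example the direct comparison of the two equal-length sequences costs more than going through the longer sequence, because each leg of the detour is priced with the cheaper indel cost. With a single fixed indel cost the same three sequences satisfy the inequality.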

Licensed under: CC-BY-SA with attribution