The number of subsequences returned is not surprising at all. It follows from the definition of 'subsequence', which should not be confused with 'substring': a substring is made of consecutive elements, while a subsequence may skip elements.
A sequence $x = (x_1, x_2, \ldots, x_k)$ is a subsequence of $y$ if its elements $x_i$ all occur in $y$, in the same order as in $x$ but not necessarily next to each other. For instance, A-B-A is a subsequence of C-A-D-B-C-D-A-D.
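The definition can be checked mechanically. Below is a minimal sketch in base R; `is_subsequence` is a hypothetical helper written for this answer, not a TraMineR function. It scans `y` from left to right and greedily matches each element of `x` at the earliest position still available.

```r
# Hypothetical helper (not part of TraMineR): TRUE if x is a subsequence of y.
is_subsequence <- function(x, y) {
  pos <- 0L                       # position in y of the last matched element
  for (el in x) {
    hits <- which(y == el)        # every position of el in y
    hits <- hits[hits > pos]      # keep only those after the previous match
    if (length(hits) == 0L) return(FALSE)
    pos <- hits[1L]               # greedy: take the earliest admissible position
  }
  TRUE
}

y <- c("C", "A", "D", "B", "C", "D", "A", "D")
is_subsequence(c("A", "B", "A"), y)   # TRUE: the matched elements are not contiguous
is_subsequence(c("B", "A", "B"), y)   # FALSE: there is no B after the last A
```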
To illustrate, consider the `mvad` example from the TraMineR package.
library(TraMineR)
data(mvad)
mvad.scodes <- c("EM", "FE", "HE", "JL", "SC", "TR")
mvad.seq <- seqdef(mvad, 17:86, states = mvad.scodes)
print(mvad.seq[1:3,], format="SPS")
## Sequence
##[1] (EM,4)-(TR,2)-(EM,64)
##[2] (FE,36)-(HE,34)
##[3] (TR,24)-(FE,34)-(EM,10)-(JL,2)
seqsubsn(mvad.seq)[1:3]
##[1] 7 4 16
By default, `seqsubsn` computes the number of subsequences of the distinct successive states (DSS), i.e. the sequence obtained by collapsing each run of identical states into a single state. The DSS of the first sequence, for example, is EM-TR-EM. The seven subsequences of EM-TR-EM are:
- the empty sequence
- the two sequences made of a single element: EM and TR
- the two-length subsequences: EM-TR, EM-EM, TR-EM
- the three-length sequence: EM-TR-EM
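The three counts printed above can be reproduced by hand with a short dynamic-programming sketch. `count_subseq` is a hypothetical helper written for this answer; it is not how `seqsubsn` is necessarily implemented, only a way to check its output. The idea: each new element doubles the number of subsequences, minus those already generated by the previous occurrence of the same state.

```r
# Hypothetical helper (not part of TraMineR): number of distinct subsequences
# of the character vector s, the empty subsequence included.
count_subseq <- function(s) {
  n <- length(s)
  total <- numeric(n + 1)   # total[i + 1]: count for the first i elements
  total[1] <- 1             # only the empty subsequence
  last <- list()            # last[[state]]: most recent position of that state
  for (i in seq_len(n)) {
    # Every subsequence found so far can be kept as is or extended with s[i] ...
    total[i + 1] <- 2 * total[i]
    # ... but extensions already produced at the previous occurrence of s[i]
    # would be counted twice, so they are subtracted.
    j <- last[[s[i]]]
    if (!is.null(j)) total[i + 1] <- total[i + 1] - total[j]
    last[[s[i]]] <- i
  }
  total[n + 1]
}

# DSS of the first three mvad sequences shown above
count_subseq(c("EM", "TR", "EM"))        # 7
count_subseq(c("FE", "HE"))              # 4
count_subseq(c("TR", "FE", "EM", "JL"))  # 16
```

When all states in the DSS are different, the count is simply $2^n$ for a DSS of length $n$, which is why the third sequence gives 16.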
Proceeding in the same way, you can verify that your fourth sequence (which is equal to its DSS)
`*-opened-*-discussed-merged-discussed`
has 49 subsequences, among which the ten two-length subsequences: `*-*`, `*-opened`, `*-discussed`, `*-merged`, `opened-*`, `opened-discussed`, `opened-merged`, `discussed-merged`, `discussed-discussed`, `merged-discussed`.
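If you prefer not to enumerate by hand, the listing can be generated by brute force. This is a sketch in base R for short sequences only; `subseq_k` is a hypothetical helper, not a TraMineR function, and it assumes that no state label contains the `-` separator.

```r
# The DSS discussed above, as a character vector.
s <- c("*", "opened", "*", "discussed", "merged", "discussed")

# Hypothetical helper (not part of TraMineR): distinct subsequences of length k.
subseq_k <- function(s, k) {
  if (k == 0) return("")                      # the empty subsequence
  idx <- combn(length(s), k)                  # increasing position sets, one per column
  unique(apply(idx, 2, function(p) paste(s[p], collapse = "-")))
}

subseq_k(s, 2)                                # the distinct two-length subsequences

# Number of distinct subsequences for each length 0, 1, ..., 6, and their total
counts <- sapply(0:length(s), function(k) length(subseq_k(s, k)))
sum(counts)                                   # 49
```

The number of position sets grows as $2^n$, so this approach is only meant for checking small examples like this one.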
Hope this helps