TraMineR: extract events between equal states from SPELL-based sequence data

StackOverflow https://stackoverflow.com/questions/16169914

  •  11-04-2022

Question

Context

This question concerns sequence analysis using the TraMineR package. The package offers automatic transformation of state sequences (statuses in time) into event sequences (changes between statuses in time). A recurrent issue in my analyses is how to record an event when the status before and after the change is the same.

Question-specific example

Suppose we have sequences of employment statuses, e.g. work, unemployment, inactivity, retirement. The analysis is focused on career transitions, distinguishing between stable and transitional careers. All kinds of transitions are relevant, from work to unemployment, inactivity to work, but also (and most importantly) from work to work!

Question

For TraMineR, an event takes place when the status in a sequence changes. For instance, a respondent had 3 years of work and then 1 year of unemployment: Work-Work-Work-Unemployment (assuming annual intervals). This is the STS format, representing statuses in time. However, in SPELL format we have additional information, e.g.:

Status         Time1 Time2

Work           1     2
Work           2     3
Work           3     3
Unemployment   3     4

From the table above we can clearly see that two work-to-work transition events have occurred (otherwise there would be just one line: Work from 1 to 3). The question is whether there is any convenient way to extract an event object from the sequence object based on these data.

Data

My data contains work-related respondent statuses in the SPELL format (status, begin & end time), like this:

to.SO <- structure(list(ID = c(10, 11, 11, 12, 13, 13, 13, 13, 14, 14,     
         14, 14, 14, 14, 14, 14, 14, 15, 15, 15, 15, 15, 15, 15), status = c(1, 
         1, 1, 1, 1, 1, 1, 1, 2, 3, 1, 2, 3, 2, 3, 1, 1, 1, 3, 1, 3, 3,         
         1, 3), time1 = c(1, 1, 104, 1, 1, 60, 109, 121, 1, 42, 47, 54,         
         64, 72, 78, 85, 116, 1, 29, 39, 69, 74, 78, 88), time2 = c(125,        
         104, 125, 125, 60, 109, 121, 125, 42, 47, 54, 64, 72, 78, 85,          
         116, 125, 29, 39, 69, 74, 78, 88, 125)), .Names = c("ID", "status",    
         "time1", "time2"), row.names = 10:33, class = "data.frame") 

What I have tried

As per this post I must convert SPELL to STS first, then define sequences:

sts.data <- seqformat(data=to.SO,from="SPELL",to="STS",
                 id="ID",begin="time1",end="time2",status="status",
                 limit=125,process=FALSE)

sts.seq <- seqdef(sts.data,right="DEL")
alphabed <- c("Work","Study","Unemployed")
alphabet(sts.seq) <- alphabed

The information I require is already lost at this step, but until the bug (see link) is resolved there is no other way. It still shows what I want to achieve:

sts.seqe <- seqecreate(sts.seq) # creating events
sts.seqe

My results

Here, the first four event sequences are identical. If you look at the SPELL data (to.SO), it is apparent that there are multiple work-to-work transitions for the respondents with IDs 11 and 13. In my other article I solved this by assigning different statuses to job-1, job-2 and so forth. It is a less desirable strategy, however, since (1) it explodes the number of statuses, making subsequent dissimilarity analysis difficult, and (2) it is not theoretically important which job in the career it is; the employment status alone should cover it.
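
The same-status transitions mentioned above can be counted directly from the spell table. The following is only a minimal base-R sketch: it uses a subset of to.SO (IDs 10, 11, 13 and the first three spells of ID 14), assumes the rows are sorted by ID and start time, and the names spells, repeat_spell and repeats are made up for illustration.

```r
## Subset of the to.SO spell data from the question
spells <- data.frame(
  ID     = c(10, 11, 11, 13, 13, 13, 13, 14, 14, 14),
  status = c(1, 1, 1, 1, 1, 1, 1, 2, 3, 1),
  time1  = c(1, 1, 104, 1, 60, 109, 121, 1, 42, 47),
  time2  = c(125, 104, 125, 60, 109, 121, 125, 42, 47, 54)
)

n <- nrow(spells)
## TRUE when a row belongs to the same respondent as the previous row
same_id <- c(FALSE, spells$ID[-1] == spells$ID[-n])
## TRUE when a row has the same status as the previous row
same_status <- c(FALSE, spells$status[-1] == spells$status[-n])
## A "repeat spell" is a new spell that keeps the previous status
spells$repeat_spell <- same_id & same_status

## Number of same-status transitions per respondent
repeats <- sapply(split(spells$repeat_spell, spells$ID), sum)
repeats
```

Respondents with a non-zero count are those whose event sequences lose information when they are derived from the STS representation.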

Thanks

I imagine this goes beyond the existing package capabilities, but perhaps I am missing something. Thanks in advance for reading this long post and for any suggestions.


Solution

We could indeed imagine a solution which creates the event sequences from the spell data as you suggest. TraMineR does not offer this for now (but see Matthias' solution).

A workaround, which you have already given in your question, is to distinguish the successive jobs as job1, job2, ...

I understand that this is less desirable, but you can use this strategy just for defining the event sequences, assigning the same event, e.g. "start new job", to each transition from job i to job i+1. To do so, you need to specify a matrix (tmat) of size a x a, where a is the size of your state alphabet, that lists in each cell (i, j) the events occurring on a transition from state i to state j. For example, at the intersection of row job1 and column job2 you would give "start new job", and since switching from job2 to job1 should not be possible, you would just leave the corresponding cell empty. The cells tmat(i, i) on the diagonal define the start event when the state sequence starts in the corresponding state i. Once you have defined the matrix (tmat) giving the events assigned to each possible transition, you create the event sequence object as

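## sts2.seq is assumed to be the state sequence object defined with the distinct job1, job2, ... states
seqe <- seqecreate(sts2.seq, tevent=tmat)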

And you can still use your original sts.seq for state sequence analysis with a single work status.
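
To make the layout of such a matrix concrete, here is a base-R sketch of how tmat could be filled. It is only an illustration: the alphabet (three numbered jobs plus Study and Unemployed), the event names, and the use of NA for an empty cell are all assumptions, not something prescribed by TraMineR. Comparing the result with a matrix generated by seqetm() should show the convention the package expects.

```r
## Hypothetical alphabet of the recoded state sequences
jobs   <- c("job1", "job2", "job3")
states <- c(jobs, "Study", "Unemployed")

## a x a matrix with one row and one column per state; NA marks an empty cell (assumption)
tmat <- matrix(NA_character_, nrow = length(states), ncol = length(states),
               dimnames = list(states, states))

for (from in states) {
  for (to in states) {
    fi <- match(from, jobs)  # position among the jobs, NA if not a job
    ti <- match(to, jobs)
    if (from == to) {
      ## Diagonal: start event when the sequence begins in this state
      tmat[from, to] <- if (is.na(fi)) from else "Work"
    } else if (!is.na(fi) && !is.na(ti)) {
      ## Job to job: only the move to the next job is possible
      if (ti == fi + 1) tmat[from, to] <- "start new job"
    } else if (!is.na(ti)) {
      ## Entering a job from Study or Unemployed (illustrative event name)
      tmat[from, to] <- "Work"
    } else {
      ## Entering Study or Unemployed
      tmat[from, to] <- to
    }
  }
}
tmat

## With TraMineR loaded and the recoded sts2.seq available, the call would then be:
## seqe <- seqecreate(sts2.seq, tevent = tmat)
```

All job-to-next-job transitions share the single event "start new job", so the event alphabet stays small even though the state alphabet was expanded.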

Hope this helps.

Other tips

'seqecreate' accepts different kinds of input. One of them is a state sequence object (as produced by seqdef). But you can also build an event sequence object by providing data in TSE format. For this, you should specify three vectors: id, timestamp, and event.

The spell format can be viewed as data in the TSE format (if you ignore the end of each period). The begin column gives the time at which the event in the status column occurred.

Therefore, we can use the following code:

## Start by giving some labels to the status vector
to.SO$event <- factor(to.SO$status, levels=1:3, labels=c("Work","Study","Unemployed"))
## Now, we can build the event sequences using seqecreate
## You may want to use timestamp=(to.SO$time1-1) instead. Events sequences start at time=0
seqe <- seqecreate(id=to.SO$ID, timestamp=to.SO$time1, event=to.SO$event)
seqe

Now the fourth individual (ID 13) has the correct event sequence.

If you want to analyze the "Work>Work" transition, then you need to recode your data.
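
To see what is being passed to seqecreate, the TSE view of the spell data can be assembled as a plain data frame first. This is a base-R sketch on a subset of to.SO (IDs 10, 11, 13 and the first three spells of ID 14); the data frame names spells and tse are illustrative, and the timestamps use time1 - 1 so that sequences start at 0.

```r
## Subset of the to.SO spell data from the question
spells <- data.frame(
  ID     = c(10, 11, 11, 13, 13, 13, 13, 14, 14, 14),
  status = c(1, 1, 1, 1, 1, 1, 1, 2, 3, 1),
  time1  = c(1, 1, 104, 1, 60, 109, 121, 1, 42, 47),
  time2  = c(125, 104, 125, 60, 109, 121, 125, 42, 47, 54)
)

## One row per spell start: the end time (time2) is simply dropped
tse <- data.frame(
  id        = spells$ID,
  timestamp = spells$time1 - 1,
  event     = factor(spells$status, levels = 1:3,
                     labels = c("Work", "Study", "Unemployed"))
)

## Respondent 13 keeps four separate Work events
tse[tse$id == 13, ]
```

The three columns map one-to-one onto the id, timestamp and event arguments used in the seqecreate call.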

## New vector holding our recoded events
event2 <- as.character(to.SO$event)
## For each row in the TSE data
for(i in 2:nrow(to.SO)){
    if(to.SO[i-1, "ID"]==to.SO[i, "ID"]) {## If we have the same ID (individual)
        if(to.SO[i-1, "event"]=="Work" && to.SO[i, "event"]=="Work"){ ## Both consecutive spells are Work
           event2[i] <- "Work>Work"
        }
    }
}
## More general case
event3 <- as.character(to.SO$event)
## For each row in the TSE data
for(i in 2:nrow(to.SO)){
    if(to.SO[i-1, "ID"]==to.SO[i, "ID"]) {## If we have the same ID (individual)
        event3[i] <- paste(to.SO[i-1, "event"], to.SO[i, "event"], sep=">")
    }
}

By adapting this code, you can specify the transitions you are interested in.

seqe2 <- seqecreate(id=to.SO$ID, timestamp=to.SO$time1-1, event=event2)
seqe2

OR

seqe3 <- seqecreate(id=to.SO$ID, timestamp=to.SO$time1-1, event=event3)
seqe3
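
The two recoding loops can also be written without an explicit loop. The following base-R sketch is meant to be equivalent to them under the assumption that the rows are sorted by ID and start time; it runs on a subset of to.SO (IDs 10, 11, 13 and the first three spells of ID 14), and the helper names prev_event and same_id are illustrative.

```r
## Subset of the to.SO spell data from the question (ID and status only)
spells <- data.frame(
  ID     = c(10, 11, 11, 13, 13, 13, 13, 14, 14, 14),
  status = c(1, 1, 1, 1, 1, 1, 1, 2, 3, 1)
)
event <- as.character(factor(spells$status, levels = 1:3,
                             labels = c("Work", "Study", "Unemployed")))

n <- length(event)
## Event of the previous row (NA for the very first row)
prev_event <- c(NA, event[-n])
## TRUE when the previous row belongs to the same respondent
same_id <- c(FALSE, spells$ID[-1] == spells$ID[-n])

## Only Work followed by Work is recoded
event2 <- ifelse(same_id & prev_event == "Work" & event == "Work",
                 "Work>Work", event)
## General case: every non-initial spell becomes "previous>current"
event3 <- ifelse(same_id, paste(prev_event, event, sep = ">"), event)

data.frame(ID = spells$ID, event, event2, event3)
```

The first spell of each respondent keeps its original label, so it still acts as the start event of the sequence.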
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow