Question

I have a data set with the years students were in classes and which quarter of the year they were in, so a full year such as 2002 appears four times, with quarters 1, 2, 3, 4, like below:

matrix(c(1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2002,2002,2002,2002,2003,2003,2003,2002,2002,2002,2002,2003,2003,2003,2003,1,2,3,4,1,2,3,4,1,2,3,4,1,2,3),ncol=3,dimnames=list(c(NULL),c("ids","year","quarter")))

which gives me this:

      ids year quarter
 [1,]   1 2002       1
 [2,]   1 2002       2
 [3,]   1 2002       3
 [4,]   1 2002       4
 [5,]   1 2003       1
 [6,]   1 2003       2
 [7,]   1 2003       3
 [8,]   2 2002       4
 [9,]   2 2002       1
[10,]   2 2002       2
[11,]   2 2002       3
[12,]   2 2003       4
[13,]   2 2003       1
[14,]   2 2003       2
[15,]   2 2003       3

I want to create a new variable that accumulates the number of quarters for each student. It won't be hard to merge year and quarter if I have to, but how do I tell it to produce a sequence like this:

structure(c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2002, 
2002, 2002, 2002, 2003, 2003, 2003, 2002, 2002, 2002, 2002, 2003, 
2003, 2003, 2003, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 
1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 6, 7, 8), .Dim = c(15L, 4L
), .Dimnames = list(NULL, c("ids", "year", "quarter", "sequence quarters"
)))

giving me this:

      ids year quarter sequence quarters
 [1,]   1 2002       1                 1
 [2,]   1 2002       2                 2
 [3,]   1 2002       3                 3
 [4,]   1 2002       4                 4
 [5,]   1 2003       1                 5
 [6,]   1 2003       2                 6
 [7,]   1 2003       3                 7
 [8,]   2 2002       4                 1
 [9,]   2 2002       1                 2
[10,]   2 2002       2                 3
[11,]   2 2002       3                 4
[12,]   2 2003       4                 5
[13,]   2 2003       1                 6
[14,]   2 2003       2                 7
[15,]   2 2003       3                 8

I have tried rep, sequence and similar commands, but I don't know how to tell it to restart the numbering after each participant. The number of quarters differs by student, and I don't need to know which quarter they start in. This is university data, so I suppose they can start in quarter 2 (I haven't checked the start values for all 6K or so participants); I just need the count to accumulate. I hope this question is appropriate and that I formatted it correctly.


Solution

Use ave grouped by ids with seq_along as the function (assuming the matrix from the question is stored in dat):

 transform(dat, seqs = ave(dat[,'ids'], dat[,'ids'], FUN=seq_along))
   ids year quarter seqs
1    1 2002       1    1
2    1 2002       2    2
3    1 2002       3    3
4    1 2002       4    4
5    1 2003       1    5
6    1 2003       2    6
7    1 2003       3    7
8    2 2002       4    1
9    2 2002       1    2
10   2 2002       2    3
11   2 2002       3    4
12   2 2003       4    5
13   2 2003       1    6
14   2 2003       2    7
15   2 2003       3    8
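
For anyone who wants to run this end to end, here is a minimal self-contained sketch. It assumes the question's matrix is stored under the name dat (the question never assigns it) and appends the counter with cbind so the result stays a matrix; the names seqs and out are illustrative. It passes seq_along as FUN; plain seq would treat a one-row group's single value n as a request for 1:n.

```r
# Rebuild the matrix from the question; the name `dat` is an assumption.
dat <- matrix(c(rep(1, 7), rep(2, 8),
                rep(c(2002, 2003, 2002, 2003), times = c(4, 3, 4, 4)),
                1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3),
              ncol = 3,
              dimnames = list(NULL, c("ids", "year", "quarter")))

# ave() applies FUN within each group of ids and returns the
# results in the original row order, so the count restarts per student.
seqs <- ave(dat[, "ids"], dat[, "ids"], FUN = seq_along)

# Append the per-student counter as a fourth column.
out <- cbind(dat, seqs = seqs)
print(out)
```

The counter follows the row order within each student, so if the rows are not already in the order you want counted, sort them before calling ave.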
Licensed under: CC-BY-SA with attribution