Question

I have a data set that looks like this:

structure(list(A = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = c("1", 
"2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", 
"14", "15", "16", "17", "18", "19", "20", "21", "22", "23", "24", 
"25"), class = "factor"), T = c(0.04, 0.08, 0.12, 0.16, 0.2, 
0.24), X = c(464.4, 464.4, 464.4, 464.4, 464.4, 464.4), Y = c(418.5, 
418.5, 418.5, 418.5, 418.5, 418.5), V = c(0, 0, 0, 0, 0, 0), 
    GD = c(0, 0, 0, 0, 0, 0), ND = c(NA, 0, 0, 0, 0, 0), ND2 = c(NA, 
    0, 0, 0, 0, 0), TID = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = c("t1", 
    "t10", "t11", "t12", "t13", "t14", "t15", "t16", "t17", "t18", 
    "t19", "t2", "t20", "t21", "t22", "t23", "t24", "t25", "t3", 
    "t4", "t5", "t6", "t7", "t8", "t9"), class = "factor")), .Names = c("A", 
"T", "X", "Y", "V", "GD", "ND", "ND2", "TID"), row.names = c(NA, 
6L), class = "data.frame")

I want to select the first 80 observations of all variables for each TID. So far, I can do this with the first TID only using the code:

sub.data1<-NM[1:80, ]

How can I do it for all my other TIDs?

Thanks!


Solution 2

Using the function ddply() from plyr, you can split the data by TID, select the first 80 rows of each piece with head(), and combine the results back into one data frame:

library(plyr)
ddply(NM, .(TID), head, n = 80)
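One detail that affects all the split-based answers here: in the question's data, the TID factor has its levels in alphabetical order (t1, t10, t11, ..., t2, ...), so ddply(), split() and by() return the groups in that order. If numeric order matters, a minimal sketch is to rebuild the factor before splitting. It assumes every level is "t" followed by an integer and uses a small toy factor in place of the real NM$TID.

```r
# Toy factor whose levels are in the same alphabetical order as the question's TID
tid <- factor(c("t2", "t10", "t1"), levels = c("t1", "t10", "t2"))

# Assumes every level is "t" followed by an integer: strip the "t",
# sort the numbers, and rebuild the level set in that order
num_levels <- paste0("t", sort(as.integer(sub("^t", "", levels(tid)))))
tid_num <- factor(tid, levels = num_levels)

levels(tid_num)  # "t1" "t2" "t10"
```

For the question's data, whose levels are t1 to t25, the equivalent would be NM$TID <- factor(NM$TID, levels = paste0("t", 1:25)). Only the level order changes; each row keeps its TID.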

OTHER TIPS

I would do:

lapply(split(NM, NM$TID), head, 80)

It returns a list of data.frames, one per TID, each with at most 80 rows. If you instead want everything in one data.frame:

do.call(rbind, lapply(split(NM, NM$TID), head, 80))
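A different base R technique, not one of the posted answers, avoids splitting altogether: compute each row's running position within its TID with ave() and keep the rows whose position is at most 80. The sketch below uses a small toy data frame (hypothetical values, same layout as the data.table example further down) and n = 2 so the result is easy to check.

```r
# Toy data: two TIDs with 3 rows each
NM <- data.frame(
  T   = c(0.04, 0.08, 0.12, 0.16, 0.20, 0.24),
  TID = factor(c("t1", "t1", "t1", "t2", "t2", "t2"))
)
n <- 2  # use 80 for the real data

# Running row count within each TID, here 1, 2, 3, 1, 2, 3
within_id <- ave(seq_len(nrow(NM)), NM$TID, FUN = seq_along)

# Keep only rows whose position inside their TID is at most n
sub.data <- NM[within_id <= n, ]
sub.data
```

Unlike split() followed by rbind(), this keeps the original row order and row names, and it still works if rows of the same TID are not contiguous. "First" here means first in the order the rows appear in the data.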

Using data.table, I made a shorter example with just TIDs t1 and t2 that returns the first 2 rows of each. Replace 2 with 80 for your data.

library(data.table)
data<-structure(list(A = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = c("1", 
                "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", 
                "14", "15", "16", "17", "18", "19", "20", "21", "22", "23", "24", 
                "25"), class = "factor"), T = c(0.04, 0.08, 0.12, 0.16, 0.2, 
                0.24), X = c(464.4, 464.4, 464.4, 464.4, 464.4, 464.4), Y = c(418.5, 
                        418.5, 418.5, 418.5, 418.5, 418.5), V = c(0, 0, 0, 0, 0, 0), 
                GD = c(0, 0, 0, 0, 0, 0), ND = c(NA, 0, 0, 0, 0, 0), ND2 = c(NA, 
                        0, 0, 0, 0, 0), TID = c("t1","t1","t1","t2","t2","t2")), .Names = c("A", 
                "T", "X", "Y", "V", "GD", "ND", "ND2", "TID"), row.names = c(NA, 
                6L), class = "data.frame")
dt<-data.table(data)
dt[,head(.SD,2),by=TID]

This results in:

   TID A    T     X     Y V GD ND ND2
1:  t1 1 0.04 464.4 418.5 0  0 NA  NA
2:  t1 1 0.08 464.4 418.5 0  0  0   0
3:  t2 1 0.16 464.4 418.5 0  0  0   0
4:  t2 1 0.20 464.4 418.5 0  0  0   0

The result can be converted back to a data frame, if desired, by changing the last line to:

as.data.frame(dt[,head(.SD,2),by=TID])

Here is another solution in base R:

do.call(rbind, by(NM, NM$TID, head, 80))
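A minimal sketch of the by() approach on a small toy data frame (hypothetical values) with n = 2 is shown below. by() calls head() once per TID level and do.call(rbind, ...) stacks the pieces. Because the pieces are named by TID, rbind() typically builds composite row names from those names, so the sketch resets them afterwards.

```r
# Toy data: two TIDs with 3 rows each
NM <- data.frame(
  T   = c(0.04, 0.08, 0.12, 0.16, 0.20, 0.24),
  TID = factor(c("t1", "t1", "t1", "t2", "t2", "t2"))
)

# head(subset, 2) for each TID, then stack the results into one data frame
res <- do.call(rbind, by(NM, NM$TID, head, 2))

# Drop the composite row names and use plain 1..n instead
rownames(res) <- NULL
res
```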
Licensed under: CC-BY-SA with attribution