Question

I have a data frame that looks like this:

structure(list(A = c(70, 70, 70, 70, 70, 70), T = c(0.1, 0.2, 
0.3, 0.4, 0.5, 0.6), X = c(434.01, 434.01, 434.75, 434.75, 434.75, 
434.01), Y = c(454.92, 454.92, 454.92, 454.92, 454.18, 454.92
), V = c(0, 0, 21.128, 0, 14.94, 14.94), thetarad = c(0.151841552716899, 
0.151841552716899, 0.150990672182432, 0.150990672182432, 0.150177486839524, 
0.151841552716899), thetadeg = c(8.69988012340509, 8.69988012340509, 
8.6511282599214, 8.6511282599214, 8.6045361718215, 8.69988012340509
)), .Names = c("A", "T", "X", "Y", "V", "thetarad", "thetadeg"
), row.names = 1423:1428, class = "data.frame")

I want to subset specific time points in R at 30-second intervals. I can do this by manually subsetting each time point that I want:

a1=subset(binA, T==0.1)
a2=subset(binA, T==30)
a3=subset(binA, T==60)
a4=subset(binA, T==90)
a5=subset(binA, T==120)
a6=subset(binA, T==150)
a7=subset(binA, T==180)
a8=subset(binA, T==210)
a9=subset(binA, T==240)
a10=subset(binA, T==270)
a11=subset(binA, T==300)
a12=subset(binA, T==330)
a13=subset(binA, T==360)
a14=subset(binA, T==390)
a15=subset(binA, T==420)
a16=subset(binA, T==450)
a17=subset(binA, T==480)
a18=subset(binA, T==510)
a19=subset(binA, T==540)
a20=subset(binA, T==570)
a21=subset(binA, T==599.5)

I tried subsetting using sapply and the seq function but got confusing results. I also want to count the unique values of A in each subset of the data. I know I can do this using the count function in the plyr package:

a1=count(unique(subset(binA, T==0.1)))

But count works on one data frame and not on multiple ones (correct me if I am wrong). I also want to take the mean of thetadeg for each subset (this should be easy with sapply on a single data frame). So I need help writing a function that works over a specific sequence of time points.

I know this problem is trivial, but help would be appreciated.

Thanks


Solution

Assuming the data are in a data frame called df, try this:

sapply(c(0.1,seq(30,599,30),599.5),
       function(x)
         length(unique(df[ df$T==x, "A"])))
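The same sapply pattern can return more than one statistic per time point. The sketch below is an illustration, not part of the original answer: it assumes the data frame is called df, rebuilds it from the A, T and thetadeg columns of the sample data in the question, and returns both the unique-A count and the mean of thetadeg for each time point. With only the six sample rows, the time points that have no data give a count of 0 and a mean of NaN.

```r
# Columns A, T and thetadeg copied from the sample data in the question
df <- data.frame(
  A = rep(70, 6),
  T = c(0.1, 0.2, 0.3, 0.4, 0.5, 0.6),
  thetadeg = c(8.69988012340509, 8.69988012340509, 8.6511282599214,
               8.6511282599214, 8.6045361718215, 8.69988012340509)
)

# The time points from the question: 0.1, 30, 60, ..., 570, 599.5
times <- c(0.1, seq(30, 599, 30), 599.5)

# One column per time point; rows hold the unique-A count and the mean thetadeg
res <- sapply(times, function(x) {
  rows <- df[df$T == x, ]
  c(nA = length(unique(rows$A)),
    mean.theta = mean(rows$thetadeg))  # NaN when no rows match
})
colnames(res) <- times
res
```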

OTHER TIPS

You should be able to use the following code to get what you want. It doesn't look for 0.1 and 599.5, but it should be easy to adapt.

timeintervals <- seq(0, 600, 30)
for (i in seq_along(timeintervals))
{
  # create the subset for each time interval (a1, a2, ...)
  assign(
    paste0("a", i),
    df[df$T == timeintervals[i], ]
  )

  # get all unique As at the same time point (b1, b2, ...)
  assign(
    paste0("b", i),
    unique(df[df$T == timeintervals[i], "A"])
  )
}
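An alternative to creating many separate variables with assign is to keep the subsets together in one named list, which is easier to loop over afterwards. This is a minimal sketch rather than the answerer's code: it assumes the data frame is called df (rebuilt here from the A, T and thetadeg columns of the question's sample data) and includes 0.1 and 599.5 in the time points. The names subsets and uniqueA are illustrative.

```r
# Columns A, T and thetadeg copied from the sample data in the question
df <- data.frame(
  A = rep(70, 6),
  T = c(0.1, 0.2, 0.3, 0.4, 0.5, 0.6),
  thetadeg = c(8.69988012340509, 8.69988012340509, 8.6511282599214,
               8.6511282599214, 8.6045361718215, 8.69988012340509)
)

# Time points to keep, including 0.1 and 599.5 from the question
times <- c(0.1, seq(30, 599, 30), 599.5)

# One subset per time point, collected in a named list instead of a1, a2, ...
subsets <- lapply(times, function(x) df[df$T == x, ])
names(subsets) <- paste0("a", seq_along(times))

# Unique A values in each subset (the b1, b2, ... of the loop above)
uniqueA <- lapply(subsets, function(d) unique(d$A))
```

Individual subsets are then reached as subsets$a1, subsets[["a21"]], and so on.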

If the purpose is just to get the average, the unique count, etc., you don't need to subset. One more thing: is T a factor, or is it continuous so that you need to make bins? Here I am assuming it is a factor (that is, a set of discrete values).

Here is one approach with plyr:

library(plyr)
ddply(df, ~T, summarise, l = length(unique(A)))
ddply(df, ~T, summarise, m = mean(thetadeg))
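The ddply calls summarise every value of T. To get the same summaries only at the 30-second time points, one option is to filter first with %in% and then aggregate. The sketch below swaps in base R's aggregate and merge instead of plyr; it assumes the data frame is called df (rebuilt from the question's sample data), and the name byT is illustrative. Unlike the sapply approach, time points with no rows simply do not appear in the result.

```r
# Columns A, T and thetadeg copied from the sample data in the question
df <- data.frame(
  A = rep(70, 6),
  T = c(0.1, 0.2, 0.3, 0.4, 0.5, 0.6),
  thetadeg = c(8.69988012340509, 8.69988012340509, 8.6511282599214,
               8.6511282599214, 8.6045361718215, 8.69988012340509)
)
times <- c(0.1, seq(30, 599, 30), 599.5)

# Keep only rows whose T is one of the wanted time points (exact matching)
wanted <- df[df$T %in% times, ]

# Unique-A count and mean thetadeg per time point, using base R only
nA <- aggregate(A ~ T, data = wanted, FUN = function(a) length(unique(a)))
names(nA) <- c("T", "nA")
meanTheta <- aggregate(thetadeg ~ T, data = wanted, FUN = mean)
byT <- merge(nA, meanTheta, by = "T")
byT
```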

The function I think you want is split:

> subsetted.by.T <- split(dfrm, dfrm$T)
> lapply(subsetted.by.T, nrow)

$`0.1`
[1] 1

$`0.2`
[1] 1

$`0.3`
[1] 1

$`0.4`
[1] 1

$`0.5`
[1] 1

$`0.6`
[1] 1

> subsetted.by.T[[1]]
      A   T      X      Y V  thetarad thetadeg
1423 70 0.1 434.01 454.92 0 0.1518416  8.69988

If you want to name these individual items, then the names<- function would be appropriate:

names(subsetted.by.T) <- paste0("a", seq(length(subsetted.by.T) ) )
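Once the data are split into a list, the per-subset statistics from the question (unique count of A and mean of thetadeg) can be computed over every element at once. This sketch assumes the data frame is called df rather than dfrm and rebuilds it from the question's sample data; the name stats is illustrative.

```r
# Columns A, T and thetadeg copied from the sample data in the question
df <- data.frame(
  A = rep(70, 6),
  T = c(0.1, 0.2, 0.3, 0.4, 0.5, 0.6),
  thetadeg = c(8.69988012340509, 8.69988012340509, 8.6511282599214,
               8.6511282599214, 8.6045361718215, 8.69988012340509)
)

subsetted.by.T <- split(df, df$T)
names(subsetted.by.T) <- paste0("a", seq(length(subsetted.by.T)))

# One row per subset: number of unique A values and mean thetadeg
stats <- t(sapply(subsetted.by.T, function(d)
  c(nA = length(unique(d$A)), mean.theta = mean(d$thetadeg))))
stats
```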

If the "T" column were somewhat irregular in its values, then perhaps using cut to create categories at regular breaks would be useful for the purpose of splitting. The question might be clarified if "T" were actually a time value. At the moment it's a "numeric" value, but there are cut methods for datetime classes.
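A minimal sketch of the cut idea, assuming T is in seconds on a 0 to 600 range and the data frame is called df (rebuilt from the question's sample data). Note that this answers a slightly different question: it summarises every row inside each 30-second window, rather than picking out single time points. With right = FALSE the bins are [0,30), [30,60), ..., [570,600), so 599.5 falls in the last bin; drop = TRUE discards bins that contain no rows.

```r
# Columns A, T and thetadeg copied from the sample data in the question
df <- data.frame(
  A = rep(70, 6),
  T = c(0.1, 0.2, 0.3, 0.4, 0.5, 0.6),
  thetadeg = c(8.69988012340509, 8.69988012340509, 8.6511282599214,
               8.6511282599214, 8.6045361718215, 8.69988012340509)
)

# Assign each row to a 30-second bin: [0,30), [30,60), ..., [570,600)
bins <- cut(df$T, breaks = seq(0, 600, 30), right = FALSE)

# Split by bin, dropping empty bins, then summarise each bin
binned <- split(df, bins, drop = TRUE)
binStats <- sapply(binned, function(d)
  c(nA = length(unique(d$A)), mean.theta = mean(d$thetadeg)))
binStats
```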

Licensed under: CC-BY-SA with attribution