This is what my data frame looks like.

I want to create time intervals of 15 or 30 minutes and compute the sum of No_Words over all timestamps in each interval. I need this to plot the average number of words per time interval.

How should I go about it?

Also, I would really like to know if a solution is possible using the sqldf package.

               Time                 No_Words
1   2013-11-17 13:37:00                    6    
2   2013-11-17 13:37:00                   16    
3   2013-11-17 13:37:00                   18    
4   2013-11-17 13:37:00                   12    
5   2013-11-17 14:03:00                    5    
6   2013-11-17 14:03:00                   20    
7   2013-11-17 14:04:00                    4    
8   2013-11-17 17:21:00                   39    
9   2013-11-17 22:48:00                   19    
10  2013-11-17 22:48:00                   12    

Solution

# generate example data: 500 random timestamps within one day
set.seed(1)
dateseq <- seq(as.POSIXct("2013-11-17"), as.POSIXct("2013-11-18"), by = "min")
df <- data.frame(Time = dateseq[sample(seq_along(dateseq), 500)],
                 No_Words = sample(1:100, 500, replace = TRUE))
# assign each timestamp to a 30-minute interval
groups <- cut(df$Time, breaks = "30 min")

The hard way using sqldf:

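library(sqldf)
df$groups <- groups
# average word count per interval; the alias gives the result column a usable name
agg <- sqldf("select groups, avg(No_Words) as avg_words from df group by groups")
# named vector for barplot, labelled by interval start
avg_words <- setNames(agg$avg_words, as.character(agg$groups))
par(mar=c(2,10,0,0)); barplot(avg_words, horiz=TRUE, las=1)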

The easy way using e.g. tapply:

agg <- tapply(df$No_Words, list(groups), mean)
par(mar=c(2,10,0,0)); barplot(agg, horiz=TRUE, las=1)
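
The question asks for sums per interval, and cut keeps intervals that contain no rows as empty factor levels, for which tapply returns NA. A minimal sketch on the question's ten rows, assuming the timestamps are parsed as UTC (an assumption made here for reproducibility) and that empty intervals should count as 0:

```r
# The ten rows from the question; tz = "UTC" is an assumption for reproducibility
Time <- as.POSIXct(c("2013-11-17 13:37:00", "2013-11-17 13:37:00",
                     "2013-11-17 13:37:00", "2013-11-17 13:37:00",
                     "2013-11-17 14:03:00", "2013-11-17 14:03:00",
                     "2013-11-17 14:04:00", "2013-11-17 17:21:00",
                     "2013-11-17 22:48:00", "2013-11-17 22:48:00"),
                   tz = "UTC")
No_Words <- c(6, 16, 18, 12, 5, 20, 4, 39, 19, 12)

# 30-minute intervals; the first one starts at the earliest timestamp
groups <- cut(Time, breaks = "30 min")

# Sum per interval; intervals without any rows come back as NA
totals <- tapply(No_Words, groups, sum)

# Treat empty intervals as zero words
totals[is.na(totals)] <- 0
totals
```

Whether to show empty intervals as 0 or to drop them depends on the plot; dropping them would be totals[totals > 0].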

Other answers

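sqldf

Here is an sqldf solution where the input data frame is DF. POSIXct times reach SQLite as seconds since the epoch, so subtracting the remainder modulo 900 floors each timestamp to the start of its 15-minute interval. The fn$ prefix (from the gsubfn package, which sqldf loads) substitutes the value of min15 into the SQL string: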

library(sqldf)

min15 <- 15 * 60 # in seconds
ans <- fn$sqldf("select
       t.Time - t.Time % $min15 as Time, 
       sum(t.No_Words) as No_Words
    from DF t 
    group by Time")
plot(No_Words ~ Time, ans, type = "o")

giving:

> ans
                 Time No_Words
1 2013-11-17 13:30:00       52
2 2013-11-17 14:00:00       29
3 2013-11-17 17:15:00       39
4 2013-11-17 22:45:00       31

With dense grid

If a dense grid is wanted, we need a grid data frame, G, which we left join with the prior ans. We use the trunc function from the chron package, which sqldf depends on; it is loaded explicitly here in case it is not already attached:

# create grid G
library(chron)
rng <- range(as.POSIXct(trunc(as.chron(DF$Time), 15 / (24 * 60))))
G <- data.frame(Time = seq(rng[1], rng[2], by = min15))

ans2 <- sqldf("select Time, coalesce(No_Words, 0) as No_Words 
         from (select * from G left join ans using(Time))")
plot(No_Words ~ Time, ans2, type = "o")

The first few rows of ans2 are:

> head(ans2)

                 Time No_Words
1 2013-11-17 13:30:00       52
2 2013-11-17 13:45:00        0
3 2013-11-17 14:00:00       29
4 2013-11-17 14:15:00        0
5 2013-11-17 14:30:00        0
6 2013-11-17 14:45:00        0

zoo

We also show a zoo solution:

library(zoo)
library(chron)
FUN <- function(x) as.POSIXct(trunc(as.chron(x), 15 / (24 * 60)))
z <- read.zoo(DF, FUN = FUN, aggregate = sum)
plot(z)

which gives for z:

> z
2013-11-17 13:30:00 2013-11-17 14:00:00 2013-11-17 17:15:00 2013-11-17 22:45:00 
             52                  29                  39                  31 

Note: We used this data and, in particular, Time is of class "POSIXct":

Lines<- " Time            No_Words
1   2013-11-17 13:37:00                    6    
2   2013-11-17 13:37:00                   16    
3   2013-11-17 13:37:00                   18    
4   2013-11-17 13:37:00                   12    
5   2013-11-17 14:03:00                    5    
6   2013-11-17 14:03:00                   20    
7   2013-11-17 14:04:00                    4    
8   2013-11-17 17:21:00                   39    
9   2013-11-17 22:48:00                   19    
10  2013-11-17 22:48:00                   12   
"

raw <- read.table(text = Lines, skip = 1)
DF <- data.frame(Time = as.POSIXct(paste(raw$V2, raw$V3)), No_Words = raw$V4)

This answer does not use sqldf; instead it uses the base R functions aggregate and cut:

## If your "Time" column is not an actual time object, 
##    convert it to one before proceeding.
mydf$Time <- as.POSIXct(mydf$Time)

cut can create time bins. We'll use that to do our aggregation. You could use the formula notation, but I've used the list approach so that it is easy to specify the resulting column names:

## Aggregate data in 30 minute chunks
aggregate(list(No_Words = mydf$No_Words), 
          list(Time = cut(mydf$Time, "30 min")), FUN = mean)
#                  Time No_Words
# 1 2013-11-17 13:37:00 11.57143
# 2 2013-11-17 17:07:00 39.00000
# 3 2013-11-17 22:37:00 15.50000

## Aggregate data into 15 minute chunks
aggregate(list(No_Words = mydf$No_Words), 
          list(Time = cut(mydf$Time, "15 min")), FUN = mean)
#                  Time  No_Words
# 1 2013-11-17 13:37:00 13.000000
# 2 2013-11-17 13:52:00  9.666667
# 3 2013-11-17 17:07:00 39.000000
# 4 2013-11-17 22:37:00 15.500000
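
Note that cut starts the first bin at the earliest timestamp (13:37 above) rather than on a clock boundary. If bins aligned to :00, :15, :30 and :45 are preferred, one option is to floor the underlying seconds before aggregating. A minimal base R sketch, assuming UTC timestamps (an assumption; in other time zones the UTC offset must be a multiple of the bin width for the bins to land on local clock boundaries) and using a hypothetical column name Bin:

```r
# Example data from the question; tz = "UTC" is an assumption made here so
# that the result does not depend on the local time zone.
mydf <- data.frame(
  Time = as.POSIXct(c("2013-11-17 13:37:00", "2013-11-17 13:37:00",
                      "2013-11-17 13:37:00", "2013-11-17 13:37:00",
                      "2013-11-17 14:03:00", "2013-11-17 14:03:00",
                      "2013-11-17 14:04:00", "2013-11-17 17:21:00",
                      "2013-11-17 22:48:00", "2013-11-17 22:48:00"),
                    tz = "UTC"),
  No_Words = c(6, 16, 18, 12, 5, 20, 4, 39, 19, 12)
)

width <- 15 * 60  # bin width in seconds

# Floor each timestamp to the start of its 15-minute clock interval
# ("Bin" is a hypothetical column name used only in this sketch)
secs <- as.numeric(mydf$Time)
mydf$Bin <- as.POSIXct(secs - secs %% width, origin = "1970-01-01", tz = "UTC")

# Sum the word counts per bin (use FUN = mean for averages instead)
binned <- aggregate(No_Words ~ Bin, data = mydf, FUN = sum)
binned
```

On this data the bins start at 13:30, 14:00, 17:15 and 22:45 with sums 52, 29, 39 and 31, the same totals as the sqldf answer reports.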
Licensed under CC-BY-SA with attribution.