Question

I have a problem subsetting times. 1) I would like to filter my data by two time intervals, one near midnight and the other near midday. 2) I need only the first time that occurs in each interval.

The data frame looks like this:

             DATE v
1  2007-07-28 00:41:00 1
2  2007-07-28 02:00:12 5
3  2007-07-28 02:01:19 3
4  2007-07-28 02:44:08 2
5  2007-07-28 04:02:18 3
6  2007-07-28 09:59:16 4
7  2007-07-28 11:21:32 8
8  2007-07-28 11:58:40 5
9  2007-07-28 13:20:52 4
10 2007-07-28 13:21:52 9
11 2007-07-28 14:41:32 3
12 2007-07-28 15:19:00 9
13 2007-07-29 01:01:48 2
14 2007-07-29 01:41:08 5

The result should look like this:

             DATE v
2  2007-07-28 02:00:12 5
9  2007-07-28 13:20:52 4
13 2007-07-29 01:01:48 2

Reproducible code

DATE <- c("2007-07-28 00:41:00", "2007-07-28 02:00:12", "2007-07-28 02:01:19",
          "2007-07-28 02:44:08", "2007-07-28 04:02:18", "2007-07-28 09:59:16",
          "2007-07-28 11:21:32", "2007-07-28 11:58:40", "2007-07-28 13:20:52",
          "2007-07-28 13:21:52", "2007-07-28 14:41:32", "2007-07-28 15:19:00",
          "2007-07-29 01:01:48", "2007-07-29 01:41:08")

v <- c(1, 5, 3, 2, 3, 4, 8, 5, 4, 9, 3, 9, 2, 5)

# data.frame(DATE, v) keeps v numeric (cbind() would make everything character)
hyljes <- data.frame(DATE, v, stringsAsFactors = FALSE)

df <-subset(hyljes, format(as.POSIXct(DATE),"%H") %in% c ("01":"02","13":"14"))

There's a problem with making the intervals: it lets me subset hours "13":"14" but not "01":"02". Is there a reasonable explanation for that? And I haven't found a way to get only the first element from each interval.
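A likely cause, sketched below: the `:` operator coerces its character arguments to numbers, so `"01":"02"` is the integer range `1 2`, not the strings `"01" "02"`. `format(..., "%H")` returns zero-padded strings, and `%in%` compares them against `"1" "2" "13" "14"`, so only the two-digit hours can match. The `hours` vector here is illustrative, standing in for the output of `format(..., "%H")`.

```r
# `:` coerces its (character) arguments to numbers, so these are
# integer ranges, not ranges of strings
x <- c("01":"02", "13":"14")
print(x)                # 1 2 13 14 (integers)

# format(..., "%H") returns zero-padded two-character strings
h <- format(as.POSIXct("2007-07-28 02:00:12"), "%H")
print(h)                # "02"

# %in% compares against as.character(x), i.e. "1" "2" "13" "14",
# so only the two-digit hours 13 and 14 can match
hours <- c("00", "01", "02", "13", "14")
print(hours %in% x)     # FALSE FALSE FALSE TRUE TRUE

# Possible fixes: compare integer hours, or list the padded strings
print(as.integer(hours) %in% c(1:2, 13:14))   # FALSE TRUE TRUE TRUE TRUE
print(hours %in% c("01", "02", "13", "14"))    # FALSE TRUE TRUE TRUE TRUE
```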

Any help is appreciated!


Solution

Try

hyljes[c(1, head(cumsum(rle(as.POSIXlt(hyljes$DATE)$hour < 12)$lengths) + 1, -1)), ]
##                   DATE v
## 1  2007-07-28 00:41:00 1
## 9  2007-07-28 13:20:52 4
## 13 2007-07-29 01:01:48 2
  • as.POSIXlt(hyljes$DATE)$hour < 12 tells you whether each time is before or after noon
  • rle(...)$lengths gives the lengths of the runs of TRUEs and FALSEs
  • cumsum of the above + 1 gives the index of the first record of each following run
  • head(..., -1) trims off the last element, which points past the end of the data
  • c(1, ...) adds back the first index, which by definition is always included
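The steps above, applied one at a time to the sample data so the intermediate values can be checked. This sketch splits at noon with `hour < 12`; on this data it gives the same runs as `hour < 13`, because no record falls in the 12:00 hour.

```r
# Sample data from the question (DATE kept as character)
DATE <- c("2007-07-28 00:41:00", "2007-07-28 02:00:12", "2007-07-28 02:01:19",
          "2007-07-28 02:44:08", "2007-07-28 04:02:18", "2007-07-28 09:59:16",
          "2007-07-28 11:21:32", "2007-07-28 11:58:40", "2007-07-28 13:20:52",
          "2007-07-28 13:21:52", "2007-07-28 14:41:32", "2007-07-28 15:19:00",
          "2007-07-29 01:01:48", "2007-07-29 01:41:08")
v <- c(1, 5, 3, 2, 3, 4, 8, 5, 4, 9, 3, 9, 2, 5)
hyljes <- data.frame(DATE, v, stringsAsFactors = FALSE)

am <- as.POSIXlt(hyljes$DATE)$hour < 12  # TRUE for times before noon
runs <- rle(am)$lengths                  # run lengths: 8 4 2
starts <- cumsum(runs) + 1               # first index after each run: 9 13 15
idx <- c(1, head(starts, -1))            # drop 15 (past the end), add 1: 1 9 13
print(hyljes[idx, ])
```

Note that this groups by consecutive runs rather than by calendar day, which matches the per-day grouping here because every day in the sample has both morning and afternoon records.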

OTHER TIPS

There are lots of little manipulations in here, but the end result gets you where you need to be:

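library(plyr)  # for ddply() and summarise()

# hyljes: the data frame from the question
hyljes$DATE <- as.POSIXct(hyljes$DATE, format = "%Y-%m-%d %H:%M:%S")

hyljes$hour <- as.numeric(strftime(hyljes$DATE, '%H'))
hyljes$date <- strftime(hyljes$DATE, '%Y-%m-%d')
hyljes$am_pm <- ifelse(hyljes$hour < 12, 'am', 'pm')

# earliest time in each date / am-pm group
mins <- ddply(hyljes, .(date, am_pm), summarise, min = min(DATE))$min

# keep the rows whose DATE is one of those minima (columns DATE and v)
hyljes[hyljes[, 1] %in% mins, 1:2]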

                  DATE v
1  2007-07-28 00:41:00 1
9  2007-07-28 13:20:52 4
13 2007-07-29 01:01:48 2
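Both answers split each day at noon, so the morning block starts at row 1 (00:41). The result asked for in the question (rows 2, 9 and 13) instead counts only records in the 01-02 and 13-14 hour windows. A base R sketch that applies those windows first and then keeps the first record per day and window, assuming the windows are 01:00-02:59 and 13:00-14:59 and that the rows are already sorted by time. The "night"/"day" labels are hypothetical names used only to build the grouping key.

```r
# Sample data from the question
DATE <- c("2007-07-28 00:41:00", "2007-07-28 02:00:12", "2007-07-28 02:01:19",
          "2007-07-28 02:44:08", "2007-07-28 04:02:18", "2007-07-28 09:59:16",
          "2007-07-28 11:21:32", "2007-07-28 11:58:40", "2007-07-28 13:20:52",
          "2007-07-28 13:21:52", "2007-07-28 14:41:32", "2007-07-28 15:19:00",
          "2007-07-29 01:01:48", "2007-07-29 01:41:08")
v <- c(1, 5, 3, 2, 3, 4, 8, 5, 4, 9, 3, 9, 2, 5)
hyljes <- data.frame(DATE, v, stringsAsFactors = FALSE)

lt <- as.POSIXlt(hyljes$DATE)
hour <- lt$hour  # integer hour, no zero-padding issues

# Assumed windows: hours 01-02 (after midnight) and 13-14 (after midday)
in_window <- hour %in% c(1:2, 13:14)

# Group key: calendar day plus which window the record falls in
key <- paste(format(lt, "%Y-%m-%d"), ifelse(hour < 12, "night", "day"))

# Rows are assumed to be in time order, so the first non-duplicated key
# among the windowed rows is the earliest record of its group
res <- hyljes[in_window, ][!duplicated(key[in_window]), ]
print(res)
```

On the sample data this keeps rows 2, 9 and 13. If the data might not be sorted, ordering it by `DATE` first keeps the "first record" logic valid.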
Licensed under: CC-BY-SA with attribution