Question

1st off, sorry for not having any reproducible data here, but I can't figure out how to reproduce this problem. But I will do my best to include a step wise list of what I have done as well as any pertinent information. Any thoughts on troubleshooting would be greatly appreciated.

My problem is this:

I have a large time series data set that I read in to R. I eventually transform to zoo, but for now I keep it as a data frame. Using read.csv I read the data into R. Using str to look at the data I get this:

> str(Met)
'data.frame':   568354 obs. of  18 variables:
 $ time_local                          : Factor w/ 568354 levels "2006-08-06 03:15:00",..: 1 2 3     4 5 6 7 8 9 10 ...

Note- Met$time_local is what I am concerned with and I've removed all other columns of the str readout.

If I search for duplicates using

Dup<-Met$time_local[duplicated(Met$time_local)]

I get nothing

str(Dup)
Factor w/ 568354 levels "2006-08-06 03:15:00",..: 

If I transform the Date/Time data to a POSIXlt or POSIXct object using strptime

MetStrp<-strptime(Met$time_local, "%Y-%m-%d %H:%M:%S")
str(MetStrp)
POSIXlt[1:568354], format: "2006-08-06 03:15:00" "2006-08-06 03:20:00" "2006-08-06 03:25:00" ...

and then search fro duplicates

Dup<-MetStrp[duplicated(MetStrp)]
> head(Dup)
[1] "2007-03-11 02:00:00" "2007-03-11 02:05:00" "2007-03-11 02:10:00"
[4] "2007-03-11 02:15:00" "2007-03-11 02:20:00" "2007-03-11 02:25:00"
> str(Dup)
 POSIXlt[1:60], format: "2007-03-11 02:00:00" "2007-03-11 02:05:00" "2007-03-11 02:10:00" ...

I now have 60 duplicates (which throw off things later when I create a zoo object).

And interestingly, if I change the POSIXlt format to POSIXct

ct<-as.POSIXct(MetStrp)
str(ct)
POSIXct[1:568354], format: "2006-08-06 03:15:00" "2006-08-06 03:20:00" "2006-08-06 03:25:00" ...

I get the same duplicates but offset by an hour

Dup<-ct[duplicated(ct)]
> head(Dup)
[1] "2007-03-11 01:00:00 PST" "2007-03-11 01:05:00 PST" "2007-03-11 01:10:00 PST"
[4] "2007-03-11 01:15:00 PST" "2007-03-11 01:20:00 PST" "2007-03-11 01:25:00 PST"
> str(Dup)
 POSIXct[1:60], format: "2007-03-11 01:00:00" "2007-03-11 01:05:00" "2007-03-11 01:10:00" ...

If I choose to look for duplicate locations using

Dup_loc<-which(duplicated(MetStrp) | duplicated(MetStrp,fromLast=TRUE))

I get 120 duplicate locations. Which end up being the combination of the POSIXlt and POSIXct duplicates.

str(Dup_loc)
int [1:120] 62470 62471 62472 62473 62474 62475 62476 62477 62478 62479 ...

With the POSIXct dates always being from hours 1-2, and the POSIClt dates always being from hours 2-3

To see the duplicates:

Test<-MetStrp[Dup_loc]


>Test
[1] "2007-03-11 01:00:00" "2007-03-11 01:05:00" "2007-03-11 01:10:00"
[4] "2007-03-11 01:15:00" "2007-03-11 01:20:00" "2007-03-11 01:25:00"
[7] "2007-03-11 01:30:00" "2007-03-11 01:35:00" "2007-03-11 01:40:00"
[10] "2007-03-11 01:45:00" "2007-03-11 01:50:00" "2007-03-11 01:55:00"
[13] "2007-03-11 02:00:00" "2007-03-11 02:05:00" "2007-03-11 02:10:00"
[16] "2007-03-11 02:15:00" "2007-03-11 02:20:00" "2007-03-11 02:25:00"
[19] "2007-03-11 02:30:00" "2007-03-11 02:35:00" "2007-03-11 02:40:00"
[22] "2007-03-11 02:45:00" "2007-03-11 02:50:00" "2007-03-11 02:55:00"
[25] "2008-03-09 01:00:00" "2008-03-09 01:05:00" "2008-03-09 01:10:00"
[28] "2008-03-09 01:15:00" "2008-03-09 01:20:00" "2008-03-09 01:25:00"
[31] "2008-03-09 01:30:00" "2008-03-09 01:35:00" "2008-03-09 01:40:00"
[34] "2008-03-09 01:45:00" "2008-03-09 01:50:00" "2008-03-09 01:55:00"
[37] "2008-03-09 02:00:00" "2008-03-09 02:05:00" "2008-03-09 02:10:00"
[40] "2008-03-09 02:15:00" "2008-03-09 02:20:00" "2008-03-09 02:25:00"
[43] "2008-03-09 02:30:00" "2008-03-09 02:35:00" "2008-03-09 02:40:00"
[46] "2008-03-09 02:45:00" "2008-03-09 02:50:00" "2008-03-09 02:55:00"
[49] "2009-03-08 01:00:00" "2009-03-08 01:05:00" "2009-03-08 01:10:00"
[52] "2009-03-08 01:15:00" "2009-03-08 01:20:00" "2009-03-08 01:25:00"
[55] "2009-03-08 01:30:00" "2009-03-08 01:35:00" "2009-03-08 01:40:00"
[58] "2009-03-08 01:45:00" "2009-03-08 01:50:00" "2009-03-08 01:55:00"
[61] "2009-03-08 02:00:00" "2009-03-08 02:05:00" "2009-03-08 02:10:00"
[64] "2009-03-08 02:15:00" "2009-03-08 02:20:00" "2009-03-08 02:25:00"
[67] "2009-03-08 02:30:00" "2009-03-08 02:35:00" "2009-03-08 02:40:00"
[70] "2009-03-08 02:45:00" "2009-03-08 02:50:00" "2009-03-08 02:55:00"
[73] "2010-03-14 01:00:00" "2010-03-14 01:05:00" "2010-03-14 01:10:00"
[76] "2010-03-14 01:15:00" "2010-03-14 01:20:00" "2010-03-14 01:25:00"
[79] "2010-03-14 01:30:00" "2010-03-14 01:35:00" "2010-03-14 01:40:00"
[82] "2010-03-14 01:45:00" "2010-03-14 01:50:00" "2010-03-14 01:55:00"
[85] "2010-03-14 02:00:00" "2010-03-14 02:05:00" "2010-03-14 02:10:00"
[88] "2010-03-14 02:15:00" "2010-03-14 02:20:00" "2010-03-14 02:25:00"
[91] "2010-03-14 02:30:00" "2010-03-14 02:35:00" "2010-03-14 02:40:00"
[94] "2010-03-14 02:45:00" "2010-03-14 02:50:00" "2010-03-14 02:55:00"
[97] "2011-03-13 01:00:00" "2011-03-13 01:05:00" "2011-03-13 01:10:00"
[100] "2011-03-13 01:15:00" "2011-03-13 01:20:00" "2011-03-13 01:25:00"
[103] "2011-03-13 01:30:00" "2011-03-13 01:35:00" "2011-03-13 01:40:00"
[106] "2011-03-13 01:45:00" "2011-03-13 01:50:00" "2011-03-13 01:55:00"
[109] "2011-03-13 02:00:00" "2011-03-13 02:05:00" "2011-03-13 02:10:00"
[112] "2011-03-13 02:15:00" "2011-03-13 02:20:00" "2011-03-13 02:25:00"
[115] "2011-03-13 02:30:00" "2011-03-13 02:35:00" "2011-03-13 02:40:00"
[118] "2011-03-13 02:45:00" "2011-03-13 02:50:00" "2011-03-13 02:55:00"

As far as I can see, I don't see any duplicate time stamps above. So I'm not sure what is the matter, but something is amiss.

And as far as I can tell, all I have done is transformed a factor data set into a time based data set. So I have no idea why I am getting a duplicate error in zoo, and finding duplicates using duplicatedwhen there does not appear to be any.

Again, any thoughts on this matter would be greatly appreciated.

Was it helpful?

Solution

I have three words for you: " daylight savings time". I predict on the basis of the evidence offered that in your locale that March 11 2007 was the date when the Daylight Savings Time shift occurred. Notice that they occur in the time frame of 1-2 AM.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top