Question

Suppose I have:

R> str(data)
'data.frame':   4 obs. of  2 variables:
 $ datetime: Factor w/ 4 levels "2011-01-05 09:30:00.001",..: 1 2 3 4
 $ price   : num  18.3 18.3 18.3 18.3

R> data
                 datetime price
1 2011-01-05 09:30:00.001 18.31
2 2011-01-05 09:30:00.321 18.33
3 2011-01-05 09:30:01.511 18.33
4 2011-01-05 09:30:02.192 18.34

When I try to load this into an xts object the timestamps are subtly altered:

R> x <- xts(data[-1], as.POSIXct(strptime(data$datetime, '%Y-%m-%d %H:%M:%OS')))
R> str(x)
An ‘xts’ object from 2011-01-05 09:30:00.000 to 2011-01-05 09:30:02.191 containing:
  Data: num [1:4, 1] 18.3 18.3 18.3 18.3
 - attr(*, "dimnames")=List of 2
  ..$ : NULL
  ..$ : chr "price"
  Indexed by objects of class: [POSIXct,POSIXt] TZ: 
  xts Attributes:  
 NULL

 R> x
                         price
 2011-01-05 09:30:00.000 18.31
 2011-01-05 09:30:00.321 18.33
 2011-01-05 09:30:01.510 18.33
 2011-01-05 09:30:02.191 18.34

The timestamps have changed: the first entry now occurs at 09:30:00.000 instead of the original 09:30:00.001, and the third and fourth rows are each one millisecond early as well.

What's causing this? Am I doing something fundamentally wrong? I've tried various incantations to get the data into an xts object and they all seem to exhibit this behavior.

EDIT: Add sessionInfo()

R> sessionInfo()
R version 2.13.1 (2011-07-08)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8       
 [4] LC_COLLATE=C               LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C              
[10] LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] xts_0.8-2 zoo_1.7-4

loaded via a namespace (and not attached):
[1] grid_2.13.1     lattice_0.19-30 tools_2.13.1   

EDIT 2: If I modify my source data to be microsecond precision as follows:

datetime,price
2011-01-05 09:30:00.001000,18.31
2011-01-05 09:30:00.321000,18.33
2011-01-05 09:30:01.511000,18.33
2011-01-05 09:30:02.192000,18.34

And then load it so I have:

R> test
                    datetime price
1 2011-01-05 09:30:00.001000 18.31
2 2011-01-05 09:30:00.321000 18.33
3 2011-01-05 09:30:01.511000 18.33
4 2011-01-05 09:30:02.192000 18.34

And then, finally, convert it into an xts object and set the index format:

R> x <- xts(test[,-1], as.POSIXct(strptime(test$datetime, '%Y-%m-%d %H:%M:%OS')))
R> indexFormat(x) <- '%Y-%m-%d %H:%M:%OS6'
R> x
                            [,1]
2011-01-05 09:30:00.000999 18.31
2011-01-05 09:30:00.321000 18.33
2011-01-05 09:30:01.510999 18.33
2011-01-05 09:30:02.191999 18.34

The same effect is visible here. I had hoped that the extra precision would help, but it does not.

EDIT 3: Please see @DWin's answer for an end-to-end test case that reproduces this behavior.

EDIT 4: The behavior does not appear to be millisecond oriented. The following shows the same altered result of a microsecond resolution timestamp. If I change my input data to:

R> data
                    datetime price
1 2011-01-05 09:30:00.001001 18.31
2 2011-01-05 09:30:00.321001 18.33
3 2011-01-05 09:30:01.511001 18.33
4 2011-01-05 09:30:02.192005 18.34

And then create an xts object:

R> x <- xts(data[-1], 
            as.POSIXct(strptime(as.character(data$datetime), '%Y-%m-%d %H:%M:%OS')))
R> indexFormat(x) <- '%Y-%m-%d %H:%M:%OS6'
R> x
                           price
2011-01-05 09:30:00.001000 18.31
2011-01-05 09:30:00.321001 18.33
2011-01-05 09:30:01.511001 18.33
2011-01-05 09:30:02.192004 18.34

EDIT 5: It would appear to be a floating point precision issue. Observe:

R> t <- as.POSIXct("2011-01-05 09:30:00.001001")
R> t
[1] "2011-01-05 09:30:00.001 CST"
R> as.numeric(t)
[1] 1294241400.0010008812

This exhibits the error behavior, and is consistent with the example in EDIT 4. However, using an example that didn't show the error:

R> t <- as.POSIXct("2011-01-05 09:30:01.511001")
R> t
[1] "2011-01-05 09:30:01.511001 CST"
R> as.numeric(t)
[1] 1294241401.5110011101

It seems as if xts or some underlying component is rounding down rather than to the nearest?


Solution

It seems the problem is only in printing. Using the OP's original data:

ind <- as.POSIXct(strptime(data$datetime, '%Y-%m-%d %H:%M:%OS'))
as.numeric(ind)*1e6  # as expected
# [1] 1294241400001000 1294241400321000 1294241401511000 1294241402192000
ind  # wrong
# [1] "2011-01-05 09:30:00.000 CST" "2011-01-05 09:30:00.321 CST"
# [3] "2011-01-05 09:30:01.510 CST" "2011-01-05 09:30:02.191 CST"
x <- xts(data[-1], ind)
x  # wrong
#                         price
# 2011-01-05 09:30:00.000 18.31
# 2011-01-05 09:30:00.321 18.33
# 2011-01-05 09:30:01.510 18.33
# 2011-01-05 09:30:02.191 18.34
as.numeric(index(x))*1e6  # but the underlying index values are as expected
# [1] 1294241400001000 1294241400321000 1294241401511000 1294241402192000
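
One possible reading of the output above, sketched under stated assumptions: R's documentation for %OSn says that seconds are truncated on output rather than rounded, and some of these timestamps are stored as doubles that sit slightly below the intended millisecond. The sketch below uses base R only, builds the times directly from epoch seconds in UTC (the original post used a local CST zone), and defines a hypothetical helper, format_ms, that shifts each time by half a millisecond before formatting.

```r
# Assumption: times are built from epoch seconds in UTC, not parsed in CST
# as in the original post.
x <- as.POSIXct(c(1294241400.001, 1294241400.321,
                  1294241401.511, 1294241402.192),
                origin = "1970-01-01", tz = "UTC")

# Inspect the stored doubles; some lie slightly below the intended
# millisecond because that is the nearest representable value.
sprintf("%.10f", as.numeric(x))

# Hypothetical helper (not part of base R or xts): shift by half a
# millisecond so that truncation to three decimals lands on the
# nearest millisecond.
format_ms <- function(t) format(t + 0.0005, "%Y-%m-%d %H:%M:%OS3")

format_ms(x)
# [1] "2011-01-05 15:30:00.001" "2011-01-05 15:30:00.321"
# [3] "2011-01-05 15:30:01.511" "2011-01-05 15:30:02.192"
```

This only changes what is displayed; the stored values are untouched. Whether the same shift is appropriate for an xts index printed through indexFormat is not shown here.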

Other tips

You have your times in a factor:

R> str(data)
'data.frame':   4 obs. of  2 variables:
 $ datetime: Factor w/ 4 levels "2011-01-05 09:30:00.001",..: 1 2 3 4
 [...]

That is not the best place to start. You need to convert to character. Hence instead of

x <- xts(data[-1], as.POSIXct(strptime(data$datetime, '%Y-%m-%d %H:%M:%OS')))

I would suggest

x <- xts(data[-1], 
         order.by=as.POSIXct(strptime(as.character(data$datetime), 
                                      '%Y-%m-%d %H:%M:%OS')))   

In my experience, wrapping a factor in as.character() is critical. Factors are powerful for modeling, but they are a nuisance when you get them accidentally from reading data. Use stringsAsFactors=FALSE to avoid them on data import.
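
As a minimal sketch of that import advice, assuming the data arrive as CSV text shaped like the question's sample and that parsing in UTC is acceptable (both are assumptions, as is the variable name csv_text):

```r
# Assumption: inline CSV text standing in for the real input file.
csv_text <- "datetime,price
2011-01-05 09:30:00.001,18.31
2011-01-05 09:30:00.321,18.33
2011-01-05 09:30:01.511,18.33
2011-01-05 09:30:02.192,18.34"

# Keep the datetime column as character instead of a factor.
dat <- read.csv(text = csv_text, stringsAsFactors = FALSE)
str(dat$datetime)

# Parse with fractional seconds; tz = "UTC" is an assumption made here
# so the result does not depend on the local time zone.
idx <- as.POSIXct(strptime(dat$datetime, "%Y-%m-%d %H:%M:%OS", tz = "UTC"))
class(idx)
```

The resulting idx can then be passed as the order.by argument of xts(), as in the call above.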

Edit: So this appears to point to the strptime/strftime implementations. To make matters more interesting, R takes some of these from the operating system and reimplements some in src/main/datetime.c.

Also, pay attention to how small an increment you can add to a time variable before R no longer sees a difference. On my 64-bit Linux system, this happens at 10^-7:

R> now <- Sys.time()
R> sapply(seq(1, 8), FUN=function(x) identical(now, now+1/10^x)) 
[1] FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE
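
That threshold is consistent with the spacing of double-precision numbers at the magnitude of current epoch seconds. A rough sketch, assuming POSIXct values are stored as doubles counting seconds since 1970 and using a literal close to the question's timestamps (the names t0 and ulp are illustrative):

```r
# Epoch seconds near the question's timestamps (assumed stored as a double).
t0 <- 1294241400.001

# Spacing between adjacent doubles at this magnitude: a double has 52
# fraction bits, so the step is 2^(exponent - 52).
ulp <- 2^(floor(log2(t0)) - 52)
ulp               # 2^-22, roughly 2.4e-07 seconds

# An increment well below the spacing is lost to rounding...
t0 + ulp / 4 == t0   # TRUE

# ...while a full step produces a different double.
t0 + ulp == t0       # FALSE
```

Since 10^-7 is less than half of this spacing, adding it leaves the value unchanged, while 10^-6 does not. It also means a timestamp written to three or six decimals is generally stored as a neighbouring value rather than exactly.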

I post this so that people who want to explore the issue have a reproducible example, which shows that it happens on more than just the OP's system. Applying as.character to the factor does not prevent it.

dat <- read.table(textConnection("     datetime\tprice
 1\t2011-01-05 09:30:00.001\t18.31
 2\t2011-01-05 09:30:00.321\t18.33
 3\t2011-01-05 09:30:01.511\t18.33
 4\t2011-01-05 09:30:02.192\t18.34"), header =TRUE, sep="\t")
 as.character(dat$datetime)
#[1] "2011-01-05 09:30:00.001" "2011-01-05 09:30:00.321" "2011-01-05 09:30:01.511"
#[4] "2011-01-05 09:30:02.192"
  strptime(as.character(dat$datetime),         '%Y-%m-%d %H:%M:%OS')
#[1] "2011-01-05 09:30:00" "2011-01-05 09:30:00" "2011-01-05 09:30:01"
#[4] "2011-01-05 09:30:02"
 as.POSIXct(strptime(as.character(dat$datetime), 
                                       '%Y-%m-%d %H:%M:%OS'))
#[1] "2011-01-05 09:30:00 EST" "2011-01-05 09:30:00 EST" "2011-01-05 09:30:01 EST"
#[4] "2011-01-05 09:30:02 EST"
 x <- xts(dat[-1], 
          order.by=as.POSIXct(strptime(as.character(dat$datetime), 
                                       '%Y-%m-%d %H:%M:%OS')))
 x
#                     price
# 2011-01-05 09:30:00 18.31
# 2011-01-05 09:30:00 18.33
# 2011-01-05 09:30:01 18.33
# 2011-01-05 09:30:02 18.34
 indexFormat(x) <- '%Y-%m-%d %H:%M:%OS6'
 x
#                            price
# 2011-01-05 09:30:00.000999 18.31
# 2011-01-05 09:30:00.321000 18.33
# 2011-01-05 09:30:01.510999 18.33
# 2011-01-05 09:30:02.191999 18.34

sessionInfo()
R version 2.13.1 RC (2011-07-03 r56263)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] grid      splines   stats     graphics  grDevices utils     datasets  methods  
[9] base     

other attached packages:
 [1] xts_0.8-2       zoo_1.7-4       sculpt3d_0.2-2  RGtk2_2.20.12  
 [5] rgl_0.92.798    survey_3.24     hexbin_1.26.0   spam_0.23-0    
 [9] xtable_1.5-6    polspline_1.1.5 Ryacas_0.2-10   XML_3.4-0      
[13] rms_3.3-1       Hmisc_3.8-3     survival_2.36-9 sos_1.3-0      
[17] brew_1.0-6      lattice_0.19-30

loaded via a namespace (and not attached):
[1] cluster_1.14.0 tools_2.13.1  
Licensed under: CC-BY-SA with attribution