Pregunta

I'm trying to fit a linear function for whom I know the slope to this data in R:

> flN
   eCenter      eLow     eHigh     arrivalTime_4_8 timeError_4_8  vCenter     vLow    vHigh
1  56.4997  60.25967  52.73972 2012-05-17 02:01:44      41.01283 2.298964 2.236939 2.367869
2 105.5787 108.02476 103.13254 2012-05-17 01:57:37      27.06803 1.787009 1.771748 1.802860
3 164.0613 166.50385 161.61884 2012-05-17 01:49:47      60.42015 1.530334 1.522997 1.537860
4 226.9886 229.78374 224.19346 2012-05-17 01:49:20      50.83871 1.386016 1.381233 1.390904

My Y is arrivalTime_4_8 and the error in Y is timeError_4_8. My X is vCenter and the error vHigh and vLow for the upper and lower values.

So, have 3 problems about this (I'll put them in order of importance):

1 - How do I fit a linear function with a set slope? I tried to use lm, but I can't find a way to put the slope there:

lm(formula = arrivalTime_4_8 ~ vCenter, data = flN)

2 - I know I must use weights = 1/sqrt(error) to get a weighted fit, but how do I do that with a timeframe axis?

3 - How do I calculate the weights to use if I have the error measurements in 2 axis? A simple squared sum?

> dput(flN)
structure(list(eCenter = c(56.4996953402131, 105.578651367445, 
164.061343347697, 226.988601498086), eLow = c(60.2596687117468, 
108.024759195324, 166.503850666149, 229.78374170578), eHigh = c(52.7397219686794, 
103.132543539565, 161.618836029245, 224.193461290392), arrivalTime_4_8 = structure(c(1337216504.88343, 
1337216257.43636, 1337215787.45439, 1337215760.18075), tzone = "", class = c("POSIXct", 
"POSIXt")), timeError_4_8 = c(41.0128309582924, 27.0680250428967, 
60.4201539087451, 50.8387114506802), vCenter = c(2.29896400265163, 
1.78700866407098, 1.53033400915538, 1.38601563560752), vLow = c(2.23693860876912, 
1.77174836673712, 1.52299693760316, 1.38123278889559), vHigh = c(2.36786898717605, 
1.80286021104626, 1.5378599964982, 1.39090403398638)), .Names = c("eCenter", 
"eLow", "eHigh", "arrivalTime_4_8", "timeError_4_8", "vCenter", 
"vLow", "vHigh"), row.names = c(NA, -4L), class = "data.frame")
¿Fue útil?

Solución

Responding to @jbssm comment, @BenBolker solution seems to give very reasonable results:

vCenter_slope <- 600.7472
flN <- transform(flN,w=1/as.numeric(timeError_4_8)^2)
# slope fixed
m1 <- lm(as.numeric(arrivalTime_4_8) ~ 1 + offset(vCenter*vCenter_slope), data = flN, weights=w)
# slope chosen by lm(...)
m0 <- lm(as.numeric(arrivalTime_4_8) ~ vCenter, data=flN, weights=w)

library(ggplot2)
ggp <- ggplot(flN)
ggp <- ggp + geom_point(aes(x=vCenter,y=arrivalTime_4_8))
ggp <- ggp + geom_errorbar(aes(x=vCenter, ymin=arrivalTime_4_8-timeError_4_8,ymax=arrivalTime_4_8+timeError_4_8),width=0)
ggp <- ggp + geom_errorbarh(aes(x=vCenter, y=arrivalTime_4_8, xmax=vHigh, xmin=vLow),height=0)
ggp <- ggp + geom_abline(slope=600.7472, intercept=coef(m1)[1])
ggp <- ggp + geom_abline(slope=coef(m0)[2], intercept=coef(m0)[1],color="red")
ggp

The red line allows lm(...) to set the slope, the black line uses fixed slope. Note that the black line is further away from the first two vCenter points than you might expect, because you're weighting inversely to error in vCenter. If this is your whole dataset, then using weights is highly questionable.

Finally, you can read up on "errors-in-variables" models here and here. After reading these, you might want to look at the MethComp package, which supports `Deming Regression' (one type of errors-in-variables regression).

Otros consejos

If you have a known slope vCenter_slope you can use an offset:

vCenter_slope <- 600.7472
flN <- transform(flN,w=1/as.numeric(timeError_4_8)^2)
m1 <- lm(as.numeric(arrivalTime_4_8) ~ 1 + offset(vCenter*vCenter_slope),
           data = flN, weights=w)
as.POSIXct(coef(m1),origin="1970-01-01")
## "2012-05-16 20:37:05 EDT"

It turns out you do have to convert to numeric if you want to use weights. The weights should be inverse-variance weights, I think, not inverse-SE weights. I don't know whether timeError_4_8 represents SE or variance here, I'm assuming SE.

As for errors in the X variable -- if you want to deal with this properly you will have to look into measurement-error models, which is a big subject. On the other hand, if all you want from the model is prediction (and not estimation ofthe true slope or correlation), then you can ignore the errors in X.

Licenciado bajo: CC-BY-SA con atribución
No afiliado a StackOverflow
scroll top