Error using `loess.smooth` but not `loess` or `lowess`

https://stackoverflow.com/questions/4645682

09-10-2019

Question

I need to smooth some simulated data, but I occasionally run into problems when the simulated ordinates to be smoothed are mostly the same value. Here is a small reproducible example of the simplest case.

> x <- 0:50
> y <- rep(0,51)
> loess.smooth(x,y)
Error in simpleLoess(y, x, w, span, degree, FALSE, FALSE, normalize = FALSE,  : 
   NA/NaN/Inf in foreign function call (arg 1)

loess(y~x), lowess(x,y) and their MATLAB analogue produce the expected results without error on this example. I am using loess.smooth here because I need the estimates evaluated at a specified number of points. According to the documentation, I believe loess.smooth and loess use the same estimation functions, but the former is a "helper function" that handles the evaluation points. The error seems to come from a C function:

> traceback()
3: .C(R_loess_raw, as.double(pseudovalues), as.double(x), as.double(weights), 
   as.double(weights), as.integer(D), as.integer(N), as.double(span), 
   as.integer(degree), as.integer(nonparametric), as.integer(order.drop.sqr), 
   as.integer(sum.drop.sqr), as.double(span * cell), as.character(surf.stat), 
   temp = double(N), parameter = integer(7), a = integer(max.kd), 
   xi = double(max.kd), vert = double(2 * D), vval = double((D + 
       1) * max.kd), diagonal = double(N), trL = double(1), 
   delta1 = double(1), delta2 = double(1), as.integer(0L))
2: simpleLoess(y, x, w, span, degree, FALSE, FALSE, normalize = FALSE, 
   "none", "interpolate", control$cell, iterations, control$trace.hat)
1: loess.smooth(x, y)

loess also calls simpleLoess, but with what appear to be different arguments. Of course, if enough of the y values are changed to be nonzero, loess.smooth runs without error, but I need the program to run even in the most extreme case.

Hopefully someone can help me with one and/or all of the following:

1. Understand why only loess.smooth, and not the other functions, produces this error, and find a solution to this problem.
2. Find a workaround using loess that still evaluates the estimate at a specified set of points, which may differ from the vector x. For example, I might want to use only x <- seq(0,50,10) in the smoothing but evaluate the estimate at x <- 0:50. As far as I know, using predict with a new data frame won't handle this situation properly, but please let me know if I'm missing something there.
3. Handle the error in a way that doesn't stop the program from moving on to the next simulated data set.

Thanks in advance for any help with this problem.

Solution

For part 1: This took a bit of tracking down, but if you do:

loess.smooth(x, y, family = "gaussian")

the model will fit. This arises from the different defaults of loess.smooth and loess: the former has family = c("symmetric", "gaussian") whilst the latter has it reversed, so loess.smooth fits a robust ("symmetric") model by default while loess fits a Gaussian one. If you trawl through the code for loess and loess.smooth, you'll see that when family = "gaussian", iterations is set to 1; otherwise it takes the value loess.control()$iterations (4 by default). When simpleLoess runs these extra robustness iterations, the following function call returns a vector of NaN:

pseudovalues <- .Fortran(R_lowesp, as.integer(N), as.double(y), 
            as.double(z$fitted.values), as.double(weights), as.double(robust), 
            integer(N), pseudovalues = double(N))$pseudovalues

Which causes the next function call to throw the error you saw:

zz <- .C(R_loess_raw, as.double(pseudovalues), as.double(x), 
            as.double(weights), as.double(weights), as.integer(D), 
            as.integer(N), as.double(span), as.integer(degree), 
            as.integer(nonparametric), as.integer(order.drop.sqr), 
            as.integer(sum.drop.sqr), as.double(span * cell), 
            as.character(surf.stat), temp = double(N), parameter = integer(7), 
            a = integer(max.kd), xi = double(max.kd), vert = double(2 * 
                D), vval = double((D + 1) * max.kd), diagonal = double(N), 
            trL = double(1), delta1 = double(1), delta2 = double(1), 
            as.integer(0L))

This all relates to robust fitting in Loess (the method). If you don't want/need a robust fit, use family = "gaussian" in your loess.smooth call.
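A minimal sketch of the part 1 fix on the question's constant data. The default (robust) call is wrapped in tryCatch so the sketch runs whether or not a given R version still raises the error; only the gaussian call is expected to succeed here.

```r
x <- 0:50
y <- rep(0, 51)

# Default family = "symmetric": robust fit with loess.control()$iterations
# passes. On constant y this is the call that failed in the question, so any
# error is captured as a message instead of stopping the script.
robust_fit <- tryCatch(loess.smooth(x, y),
                       error = function(e) conditionMessage(e))

# family = "gaussian": a single least-squares pass (iterations = 1).
gauss_fit <- loess.smooth(x, y, family = "gaussian")

# loess.smooth() returns a list with the evaluation grid (x) and the fitted
# values (y); the default evaluation = 50 gives 50 equally spaced points.
str(gauss_fit)
```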

Also, note that the defaults for loess.smooth differ from those of loess, e.g. for span and degree (loess.smooth uses span = 2/3 and degree = 1, whereas loess uses span = 0.75 and degree = 2). So check carefully which model you want to fit and adjust the relevant function's defaults accordingly.
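A sketch of aligning the two functions: loess.smooth's defaults are passed explicitly to loess, both use family = "gaussian", and predict is evaluated on loess.smooth's own grid. The sine data are made up for illustration, and the close agreement rests on the assumption that both paths evaluate the same interpolated surface; treat the comparison as a check rather than a guarantee.

```r
x <- 0:50
y <- sin(x / 8)  # illustrative, noise-free data

# loess.smooth() defaults: span = 2/3, degree = 1; evaluated at
# `evaluation = 50` equally spaced points over range(x).
sm <- loess.smooth(x, y, family = "gaussian")

# loess() defaults are span = 0.75, degree = 2, so set them to match.
mod <- loess(y ~ x, span = 2/3, degree = 1, family = "gaussian")
est <- predict(mod, newdata = data.frame(x = sm$x))

# Largest discrepancy between the two fits on the shared grid.
max(abs(est - sm$y))
```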

For part 2:

DF <- data.frame(x = 0:50, y = rep(0,51))
mod <- loess(y ~ x, data = DF)
pred <- predict(mod, newdata = data.frame(x = c(-1, 10, 15, 55)))
mod2 <- loess(y ~ x, data = DF, control = loess.control(surface = "direct"))
pred2 <- predict(mod2, newdata = data.frame(x = c(-1, 10, 15, 55)))

Which gives:

> pred
 1  2  3  4 
NA  0  0 NA 
> pred2
1 2 3 4 
0 0 0 0

The default surface = "interpolate" won't extrapolate beyond the range of the data, if that is what you meant; surface = "direct" will. Beyond that, I don't see any problem with using predict here.
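A sketch of the situation described in part 2 of the question: fit on one set of x values and evaluate on a different one. The simulated data (a noisy sine curve observed at every second x) are made up for illustration.

```r
set.seed(1)

# Illustrative simulated data, observed only at every second x value.
fit_df <- data.frame(x = seq(0, 50, by = 2))
fit_df$y <- sin(fit_df$x / 8) + rnorm(nrow(fit_df), sd = 0.1)

mod <- loess(y ~ x, data = fit_df)

# Evaluate on a different, finer grid. All points lie within range(fit_df$x),
# so the default surface = "interpolate" returns no NA values.
eval_x <- 0:50
est <- predict(mod, newdata = data.frame(x = eval_x))
```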

For part 3: Look at ?try and ?tryCatch, which you can wrap around the loess fitting function (loess.smooth, say); this allows computations to continue if loess.smooth throws an error.

You will need to handle the output of try or tryCatch by including something like the following (if you are doing this in a loop):

mod <- try(loess.smooth(x, y))
if(inherits(mod, "try-error"))
    next
## if we get here, the model worked; do something with `mod`

I would probably combine try or tryCatch with fitting via loess and using predict for such a problem.
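A minimal sketch of that combination, assuming the simulated data sets arrive as a list. The helper name smooth_at and the data sets are hypothetical; the "broken" entry has mismatched lengths purely to trigger an error and show the loop moving on.

```r
# Hypothetical helper: fit a Gaussian loess and evaluate it at eval_x.
# Returns NULL instead of stopping if fitting or prediction fails.
smooth_at <- function(x, y, eval_x, ...) {
  tryCatch({
    mod <- loess(y ~ x, data = data.frame(x = x, y = y), ...)
    predict(mod, newdata = data.frame(x = eval_x))
  }, error = function(e) {
    message("loess failed: ", conditionMessage(e))
    NULL
  })
}

set.seed(42)
datasets <- list(
  constant = list(x = 0:50, y = rep(0, 51)),  # the case from the question
  noisy    = list(x = 0:50, y = rnorm(51)),
  broken   = list(x = 0:50, y = 1:4)          # deliberately malformed
)

results <- list()
for (nm in names(datasets)) {
  d <- datasets[[nm]]
  est <- smooth_at(d$x, d$y, eval_x = seq(0, 50, by = 10))
  if (is.null(est)) next  # move on to the next simulated data set
  results[[nm]] <- est
}
```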

Other tips

This is the first time I've encountered these functions, so I can't help you much, but could this have something to do with the y-values having zero variance? You are trying to estimate a smooth line from data that is already as smooth as it gets; this, by contrast, does work:

x <- 0:50
y <- c(rep(0,25),rep(1,26))
loess.smooth(x,y)
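If zero variance is indeed the trigger, one hypothetical guard is to skip smoothing when y is exactly constant, since the smooth of a constant is that constant. The wrapper name safe_loess_smooth is made up, and it only covers the exactly constant case, not data that are merely mostly one value, so family = "gaussian" or tryCatch remain the more general fixes.

```r
# Hypothetical wrapper around loess.smooth(): returns the constant directly
# when all non-missing y values are identical, otherwise defers to loess.smooth().
safe_loess_smooth <- function(x, y, evaluation = 50, ...) {
  ok <- !(is.na(x) | is.na(y))
  x <- x[ok]
  y <- y[ok]
  if (diff(range(y)) == 0) {
    grid <- seq(min(x), max(x), length.out = evaluation)
    return(list(x = grid, y = rep(y[1], evaluation)))
  }
  loess.smooth(x, y, evaluation = evaluation, ...)
}

const_fit <- safe_loess_smooth(0:50, rep(0, 51))                 # short-circuits
step_fit  <- safe_loess_smooth(0:50, c(rep(0, 25), rep(1, 26)))  # calls loess.smooth()
```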

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow