Question

Background

Using R to predict the next values in a series.

Problem

The following code generates and plots a model for a curve with some uniform noise:

slope = 0.55
offset = -0.5
amplitude = 0.22
frequency = 3
noise = 0.75
x <- seq( 0, 200 )
y <- offset + (slope * x / 100) + (amplitude * sin( frequency * x / 100 ))
yn <- y + (noise * runif( length( x ) ))

gam.object <- gam( yn ~ s( x ) + 0 )
plot( gam.object, col = rgb( 1.0, 0.392, 0.0 ) )
points( x, yn, col = rgb( 0.121, 0.247, 0.506 ) )

The model reveals the trend, as expected. The trouble is predicting subsequent values:

p <- predict( gam.object, data.frame( x=201:210 ) )

The predictions do not look correct when plotted:

df <- data.frame( fit=c( fitted( gam.object ), p ) )
plot( seq( 1:211 ), df[,], col="blue" )
points( yn, col="orange" )

The predicted values (from 201 onwards) appear to be too low.

Questions

  1. Are the predicted values, as shown, actually the most accurate predictions?
  2. If not, how can the accuracy be improved?
  3. What is a better way to concatenate the two data sets (fitted.values( gam.object ) and p)?

Solution

  1. The simulated noise is biased: runif draws numbers from [0, 1], not [-1, 1], so every error added to the "true" y is non-negative. On average the noisy series sits noise / 2 = 0.375 above the true curve.
  2. The problem disappears when the model is allowed an intercept term.
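
Point 1 can be checked in base R. The following is a minimal sketch, not part of the original code: the seed and the variable names biased and centered are arbitrary choices for illustration. It compares the noise term as written in the question with one centered on zero by passing explicit bounds to runif.

```r
set.seed( 42 )   # arbitrary seed, used only to make the sketch reproducible
n <- 201
noise <- 0.75

# As in the question: runif( n ) draws from [0, 1], so the noise is never negative.
biased <- noise * runif( n )

# Illustrative alternative: draw from [-1, 1] so the noise is centered on zero.
centered <- noise * runif( n, min = -1, max = 1 )

range( biased )
mean( biased )
range( centered )
mean( centered )
```

With the one-sided version the sample mean should land near noise / 2, while the centered version should land near zero.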

For example, refitting with an intercept and adding the result to the existing plot:

gam.object2 <- gam( yn ~ s( x ))
p2 <- predict( gam.object2, data.frame( x=201:210 ))
points( 1:211, c( fitted( gam.object2 ), p2), col="green")

The reason for the systematic underestimation in the model without an intercept could be that gam applies a sum-to-zero constraint to the estimated smooth functions, which would leave the model with no term for the overall level of the data. I think point 2 answers your first and second questions.
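
One way to probe this explanation is to fit both models to the same data and compare the mean of the fitted values with the mean of the response. This is only a sketch under assumptions: it assumes gam() and s() come from the mgcv package (the question does not say which package is loaded), and the seed and the names no.intercept and with.intercept are made up for illustration.

```r
library( mgcv )  # assumption: gam() and s() come from mgcv

set.seed( 1 )    # arbitrary seed so the comparison is reproducible
x  <- seq( 0, 200 )
yn <- -0.5 + (0.55 * x / 100) + (0.22 * sin( 3 * x / 100 )) +
      (0.75 * runif( length( x ) ))

no.intercept   <- gam( yn ~ s( x ) + 0 )   # model from the question
with.intercept <- gam( yn ~ s( x ) )       # model from point 2

# Compare the overall level of each fit with the level of the data.
mean( yn )
mean( fitted( no.intercept ) )
mean( fitted( with.intercept ) )
coef( with.intercept )[ "(Intercept)" ]
```

If the smooth is indeed centered, the intercept carries the overall level of the data, and a model without one has nowhere to put it.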

Your third question needs clarification because a gam-object is not a data.frame. The two data types do not mix.

A more complete example:

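library( mgcv )  # assumption: gam() and s() are taken from mgcv

slope = 0.55
amplitude = 0.22
frequency = 3
noise = 0.75
x <- 1:200
y <- (slope * x / 100) + (amplitude * sin( frequency * x / 100 ))
ynoise <- y + (noise * runif( length( x ) ))

# Fit with an intercept (the default), then predict over the observed
# range plus ten new points.
gam.object <- gam( ynoise ~ s( x ) )
p <- predict( gam.object, data.frame( x = 1:210 ) )

# Predictions in green, noisy data in blue, fitted values in orange.
plot( p, col = rgb( 0, 0.75, 0.2 ) )
points( x, ynoise, col = rgb( 0.121, 0.247, 0.506 ) )
points( fitted( gam.object ), col = rgb( 1.0, 0.392, 0.0 ) )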
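
On the third question, one possible reading is that the goal is a single table holding both the in-sample fit and the new predictions. A sketch under that assumption follows. It again assumes mgcv, and the names newdata, result and extrapolated are hypothetical. Calling predict() once over the whole range means nothing has to be concatenated afterwards.

```r
library( mgcv )  # assumption: gam() and s() come from mgcv

set.seed( 1 )    # arbitrary seed so the sketch is reproducible
x <- 1:200
ynoise <- (0.55 * x / 100) + (0.22 * sin( 3 * x / 100 )) +
          (0.75 * runif( length( x ) ))

gam.object <- gam( ynoise ~ s( x ) )

# One predict() call covers the observed range and the ten new points.
newdata <- data.frame( x = 1:210 )
result <- data.frame(
  x = newdata$x,
  fit = as.numeric( predict( gam.object, newdata ) ),
  extrapolated = newdata$x > max( x )   # TRUE for x values beyond the data
)

tail( result, 12 )
```

Keeping x alongside fit also makes plotting less error-prone, since the horizontal positions no longer depend on the row index.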
Licensed under: CC-BY-SA with attribution