Question

I am trying to use GAM smoothing in ggplot2. According to this conversation and this code, ggplot2 loads the mgcv package (used for generalized additive models) only if n >= 1000; otherwise the user has to load the package manually. As far as I understand, this example code from the conversation should do the smoothing using geom_smooth(method="gam", formula = y ~ s(x, bs = "cs")):

library(ggplot2)
dat.large <- data.frame(x=rnorm(10000), y=rnorm(10000))
ggplot(dat.large, aes(x=x, y=y)) + geom_smooth() 

But I get an error:

geom_smooth: method="auto" and size of largest group is >=1000, so using gam with formula: y ~ s(x, bs = "cs"). Use 'method = x' to change the smoothing method.
Error in s(x, bs = "cs") : object 'x' not found

The same error occurs if I try the following:

ggplot(dat.large, aes(x=x, y=y)) + geom_point() + geom_smooth(method="gam", formula = y ~ s(x, bs = "cs"))

A linear model, however, works:

ggplot(dat.large, aes(x=x, y=y)) + geom_smooth(method = "lm", formula = y ~ x)

What am I doing wrong here?

My R and package versions should be up-to-date:

R version 3.0.3 (2014-03-06)
Platform: x86_64-apple-darwin10.8.0 (64-bit)

other attached packages: mgcv_1.7-29  ggplot2_0.9.3.1 

Solution

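The problem was that I had the summary function assigned to s in my .Rprofile. That global s masked mgcv's s() function, so the s(x, bs = "cs") term in the gam formula called summary instead. summary tries to evaluate x, which is why the error reports that object 'x' was not found. I guess one should avoid assigning too many shorthands. After removing that assignment, everything works as it should.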
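To check whether a shorthand like this is masking a package function, it can help to look at where the name is defined. The following is a minimal sketch in base R only; is_defined_in is a hypothetical helper written for this illustration, not part of any package, and the shorthand is recreated here just for the demonstration.

```r
# Hypothetical helper (not from any package): is `name` defined directly
# in the given environment, ignoring anything further down the search path?
is_defined_in <- function(name, env = globalenv()) {
  exists(name, envir = env, inherits = FALSE)
}

# Recreate the .Rprofile shorthand in the global environment.
assign("s", base::summary, envir = globalenv())

is_defined_in("s")   # TRUE: a global `s` exists and masks package functions
find("s")            # lists where `s` is defined; ".GlobalEnv" comes first

# The namespace this `s` comes from: "base", i.e. it is really summary.
environmentName(environment(get("s", envir = globalenv())))

# Remove the shorthand again.
rm(list = "s", envir = globalenv())
is_defined_in("s")   # FALSE
```

If find("s") lists ".GlobalEnv" ahead of "package:mgcv", the global object is the one R will use when it meets s() in a formula.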

One way to keep .Rprofile shorthands from confusing packages is to assign them to a hidden environment and attach that environment in .Rprofile. For example (the code is borrowed from here):

.env <- new.env()
.env$s <- base::summary
attach(.env)

Then s works as summary until mgcv is loaded:

dat.large <- data.frame(x=rnorm(10000), y=rnorm(10000))
s(dat.large)
       x                   y            
 Min.   :-3.823756   Min.   :-4.531882  
 1st Qu.:-0.683730   1st Qu.:-0.687335  
 Median :-0.006945   Median :-0.009993  
 Mean   :-0.010285   Mean   :-0.000491  
 3rd Qu.: 0.665435   3rd Qu.: 0.672098  
 Max.   : 3.694357   Max.   : 3.647825  

After the package is loaded, s changes meaning, but the package's functionality is not affected:

ggplot(dat.large, aes(x=x, y=y)) + geom_smooth() # works
s(dat.large)
$term
[1] "dat.large"

$bs.dim
[1] -1

$fixed
[1] FALSE

$dim
[1] 1

$p.order
[1] NA

$by
[1] "NA"

$label
[1] "s(dat.large)"

$xt
NULL

$id
NULL

$sp
NULL

attr(,"class")
[1] "tp.smooth.spec"

EDIT: The workaround above did not seem to work in my actual code, which is much more complicated. If you want to keep the summary shorthand, the easiest workaround is simply to place rm(s) before loading mgcv.
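The rm(s) workaround could look roughly like this. It is a sketch that assumes the shorthand lives in the global environment (as it does when assigned directly in .Rprofile) and that both mgcv and ggplot2 are installed; the existence check is an addition so the script also runs when no shorthand is defined.

```r
# Remove the shorthand only if it is defined in the global environment.
if (exists("s", envir = globalenv(), inherits = FALSE)) {
  rm(list = "s", envir = globalenv())
}

library(mgcv)      # s() now resolves to mgcv's smooth-term constructor
library(ggplot2)

dat.large <- data.frame(x = rnorm(10000), y = rnorm(10000))

# Fit directly with mgcv to confirm the formula is interpreted correctly.
fit <- gam(y ~ s(x, bs = "cs"), data = dat.large)

# The same formula passed through ggplot2.
p <- ggplot(dat.large, aes(x = x, y = y)) +
  geom_smooth(method = "gam", formula = y ~ s(x, bs = "cs"))
print(p)
```

After this, s no longer works as a summary shorthand for the rest of the session, so call summary() by its full name.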

OTHER TIPS

My problem was caused by a corrupt version of mgcv. Reinstalling this package solved the issue:

install.packages("mgcv")

Versions:

  • Linux Mint 18 / 18.1
  • R 3.4.0

I had the same problem on two different Linux machines.

Licensed under: CC-BY-SA with attribution