The ggplot2 package comes with wrappers for a number of summarizing functions in the Hmisc package (which must be installed), including:
- mean_cl_normal, which calculates confidence limits based on the t-distribution;
- mean_cl_boot, which calculates confidence limits with a nonparametric bootstrap and so does not assume the data are normally distributed;
- mean_sdl, which uses the mean plus or minus a multiple of the standard deviation (default mult = 2).
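The last of these, mean_sdl, is the method used in the answer above, but mean ± 2 × sd is not a 95% confidence limit for the mean. It describes the spread of the individual observations rather than the uncertainty in their mean. Confidence limits based on the t-distribution are given by:
CL = x̄ ± t × s / √n
where x̄ is the sample mean, s is the sample standard deviation, n is the number of observations, and t is the 0.975 quantile of the t-distribution with n - 1 degrees of freedom (for a 95% interval). Compare the confidence bands: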
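library(ggplot2)  # the mean_* summaries also require the Hmisc package to be installed

ggplot(df1, aes(x=Variable, y=Observation)) +
  # group=1 lets geom="line" connect the means across a discrete x axis
  stat_summary(aes(group=1), fun.data="mean_sdl", geom="line", colour="blue") +
  # extra arguments for the summary function go in fun.args
  stat_summary(fun.data="mean_sdl", fun.args=list(mult=2), geom="errorbar",
               width=0.1, linetype=2, colour="blue") +
  geom_point(color="red") +
  labs(title=expression(paste(bar(x)," \u00B1 ","2 * sd")))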
ggplot(df1, aes(x=Variable, y=Observation)) +
  geom_point(color="red") +
  stat_summary(aes(group=1), fun.data="mean_cl_normal", geom="line", colour="blue") +
  stat_summary(fun.data="mean_cl_normal", fun.args=list(conf.int=0.95), geom="errorbar",
               width=0.1, linetype=2, colour="blue") +
  stat_summary(fun.data="mean_cl_normal", geom="point", size=3,
               shape=1, colour="blue") +
  labs(title=expression(paste(bar(x)," \u00B1 ","t * sd / sqrt(n)")))
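The summary functions behind these plots can also be called directly on a numeric vector, which is a quick way to see what each one returns and to check the t-based formula by hand. This is a minimal sketch, assuming ggplot2 and Hmisc are installed; the vector x is made-up illustrative data, not the question's df1:

```r
library(ggplot2)  # mean_cl_normal(), mean_cl_boot() and mean_sdl() call into Hmisc

# Hypothetical sample, for illustration only
x <- c(4.1, 5.3, 4.8, 6.0, 5.5, 4.9, 5.2, 5.8)
n <- length(x)

# t-based 95% limits computed by hand: mean +/- t * s / sqrt(n)
half   <- qt(0.975, df = n - 1) * sd(x) / sqrt(n)
manual <- c(y = mean(x), ymin = mean(x) - half, ymax = mean(x) + half)

normal <- mean_cl_normal(x)   # data frame with columns y, ymin, ymax
sdl    <- mean_sdl(x)         # mean +/- 2 * sd (not a confidence interval)
set.seed(1)                   # bootstrap limits are random, so fix the seed
boot   <- mean_cl_boot(x)     # nonparametric bootstrap 95% limits

print(manual)
print(rbind(normal = normal, sdl = sdl, boot = boot))
```

The hand-computed limits should match mean_cl_normal(). With n = 8, the mean_sdl() interval is wider, because it uses 2 × s rather than t × s / √n.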
Finally, rotating this last plot using coord_flip() generates something very close to a forest plot, which is a standard method for summarizing data like yours.
ggplot(df1, aes(x=Variable, y=Observation)) +
  geom_point(color="red") +
  stat_summary(fun.data="mean_cl_normal", fun.args=list(conf.int=0.95), geom="errorbar",
               width=0.2, colour="blue") +
  stat_summary(fun.data="mean_cl_normal", geom="point", size=3,
               shape=1, colour="blue") +
  # dashed reference line at the grand mean of all observations
  geom_hline(yintercept=mean(df1$Observation), linetype=2) +
  labs(title="Forest Plot") +
  coord_flip()
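If you also want the numbers behind the forest plot (for a table, say), one alternative is to compute the per-group summaries first and plot them with geom_pointrange(). This is a sketch of a different approach, not the stat_summary() method used above. The df1 below is a hypothetical stand-in with the same column names as the question's data, generated at random:

```r
library(ggplot2)  # mean_cl_normal() also needs Hmisc installed

# Hypothetical stand-in for the question's df1 (random data, illustration only)
set.seed(42)
df1 <- data.frame(
  Variable    = factor(rep(c("A", "B", "C"), each = 10)),
  Observation = c(rnorm(10, mean = 5), rnorm(10, mean = 6), rnorm(10, mean = 5.5))
)

# One row per level of Variable: mean (y) and t-based 95% limits (ymin, ymax)
groups <- split(df1$Observation, df1$Variable)   # list ordered by factor levels
ci <- do.call(rbind, lapply(groups, mean_cl_normal))
ci$Variable <- factor(names(groups), levels = levels(df1$Variable))

print(ci)

p <- ggplot(ci, aes(x = Variable, y = y, ymin = ymin, ymax = ymax)) +
  geom_pointrange(shape = 1, colour = "blue") +
  geom_hline(yintercept = mean(df1$Observation), linetype = 2) +  # grand mean
  labs(title = "Forest Plot", y = "Observation") +
  coord_flip()
print(p)
```

Precomputing the table means the plotted intervals and any reported numbers come from the same object, at the cost of no longer showing the raw observations.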