Conditional stat_summary for ggplot in R

Question

See ?stat_summary.

fun.data : Complete summary function. Should take data frame as input and return data frame as output

Your function max.n.filt uses an if() statement to evaluate the condition x > filter. But if() expects a single logical value: when length(x) > 1, older versions of R use only the first element of the condition (with a warning), and R 4.2.0 and later stop with an error. When the function is given a data frame, the result is a list cobbled together from the original input x and whatever label the if() statement returns.

> max.n.filt(data.frame(x=c(10,15,400)))
$y.x
[1]  10  15 400

$label
[1] ""
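The underlying behaviour can be seen in isolation with a short base-R sketch. The vector x and the cutoff of 300 are made-up values for illustration only; the point is that ifelse() tests every element, whereas if() needs the condition collapsed to a single value first, for example with any():

```r
x <- c(10, 15, 400)   # made-up values
threshold <- 300      # made-up cutoff

# ifelse() is vectorised: it returns one result per element of x
flags <- ifelse(x > threshold, "above", "below")

# if() needs a single TRUE/FALSE, so reduce the condition with any() (or all())
has_large <- if (any(x > threshold)) "at least one above" else "none above"
```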

Try a function that uses ifelse() instead:

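max.n.filt2 <- function(x){
    filter <- 300                 # whatever threshold
    # x is a one-column data frame, so x > filter is a one-column logical matrix
    # values above the threshold are moved to max(x) + 1; the rest keep their value
    y <- ifelse( x > filter, max(x) + 1, x[,1] )
    # only values above the threshold get a label; the rest get NA
    label <- ifelse( x > filter, round(max(x),2), NA )
    return(data.frame(y=y[,1], label=label[,1]))
}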

> max.n.filt2(data.frame(x=c(10,15,400)))
    y label
1  10    NA
2  15    NA
3 401   400
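If your version of ggplot2 hands fun.data a plain numeric vector of the y values in each group rather than a data frame (as far as I know, current releases do), the same idea can be written for a vector. This is only a sketch: the name max_label and the default threshold are placeholders of mine, not anything from your code.

```r
# Hypothetical helper: label a group's maximum only when it exceeds `threshold`.
# Assumes `y` is a numeric vector holding the y values of one group.
max_label <- function(y, threshold = 300) {
  top <- max(y)
  data.frame(
    y     = top,   # position of the label
    label = if (top > threshold) as.character(round(top, 2)) else NA_character_,
    stringsAsFactors = FALSE
  )
}

res_high <- max_label(c(10, 15, 400))   # label is "400"
res_low  <- max_label(c(10, 15, 20))    # label is NA
```

It would then be used along the lines of stat_summary(fun.data = max_label, geom = "text"). Groups whose label is NA should get no text, though ggplot2 may warn about the rows it drops.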

Alternatively, you might just find it easier to use geom_text(). I can't reproduce your example, but here's a simulated dataset:

set.seed(101)
sim_data <- expand.grid(m1=1:1440, variable=factor(c(0,0.25,0.5,0.75,1)))
sim_data$sample_size <- sapply(1:1440, function(.) sample(1:25, 1, replace=TRUE))
sim_data$value = t(sapply(1:1440, function(.) quantile(rgamma(sim_data$sample_size, 0.9, 0.5),c(0,0.25,0.5,0.75,1))))[1:(1440*5)]

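Just pass a subset of the data to geom_text() to select the points you wish to label. Older versions of ggplot2 accepted subset = .(variable == 1 & value > 25) for this (with plyr loaded); current versions no longer support that argument and take the filtered rows through the data argument instead:

library(ggplot2)

ggplot(sim_data, aes(x = m1/60, y = value, color = variable)) +
    geom_point(size = 4) +
    geom_text(aes(label = round(value)),
              data = subset(sim_data, variable == 1 & value > 25),
              angle = 90, size = 4, colour = "red", hjust = -0.5)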

If you have a column of sample sizes, those can be incorporated into the label with paste0() (paste() with the default separator would put extra spaces around each piece):

ggplot(sim_data, aes(x = m1/60, y = value, color = variable)) +
    geom_point(size = 4) +
    geom_text(aes(label = paste0(round(value), ", N=", sample_size)),
              data = subset(sim_data, variable == 1 & value > 25),
              angle = 90, size = 4, colour = "red", hjust = -0.25)

(Or create a separate column in your data with whatever labels you want.) If you're asking how to retrieve the sample sizes, you could modify your call to ddply() like this:

...
c2 <- ddply(C, .(h1), function (x) { cbind(summarise(x, y = quantile(x$gaps, cuts)), n=nrow(x)) } )
c2$cuts <- cuts
c2 <- dcast(c2, h1 + n ~ cuts, value.var = "y")
c2.h1.melt <- melt(c2, id.vars = c("h1","n"))
...
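If you'd rather see the shape of the result without plyr and reshape2, here is a rough base-R sketch that produces one row per group and quantile, with the group size alongside. The names C, h1, gaps and cuts are taken from your snippet, but the data are made up, and split()/lapply() is a stand-in of mine for the ddply() call rather than a drop-in replacement.

```r
set.seed(1)
# Made-up stand-in for your data: three groups of different sizes
C <- data.frame(h1 = rep(1:3, times = c(4, 6, 10)), gaps = rexp(20))
cuts <- c(0, 0.25, 0.5, 0.75, 1)

# For each group: the requested quantiles plus the number of rows in the group
per_group <- lapply(split(C, C$h1), function(g) {
  data.frame(h1 = g$h1[1], n = nrow(g), cuts = cuts,
             value = as.numeric(quantile(g$gaps, cuts)))
})

# Stack the per-group results into one long data frame (h1, n, cuts, value)
c2_long <- do.call(rbind, per_group)
```

The n column can then be pasted into a label in the same way as sample_size above.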