Question

The following is the head of my data:

dput(head(trucksv[,c(1,5)]))
structure(list(Measur. = c(1L, 2L, 3L, 4L, 5L, 1L), Speed.Mean.Trucks = c(NA, 
NA, 9.5, 4.5, NA, NA)), .Names = c("Measur.", "Speed.Mean.Trucks"
), row.names = c(1L, 2L, 3L, 4L, 5L, 17L), class = "data.frame")

I want to find the cumulative distribution of speeds by 'Measur.', for which I wrote the following function:

f <- function(x) {
  hi <- hist(x)
  speedmph=round(hi$breaks*0.68,1)
  prob=c(0, round(cumsum(hi$counts)/sum(hi$counts),digits=2))
  cbind(speedmph, prob)
}

But when I try to apply it to my data, R gives me the following error:

tspdistu <- ddply(trucksv, 'Measur.', summarise, trucksspeedmph = f(Speed.Mean.Trucks)) 
Error in hist.default(x) : invalid number of 'breaks'
Called from: top level 
Browse[1]> 

I am not sure how to find the correct number of bins. Please help. Thanks in advance.


Solution

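The NAs are throwing it off; the error has nothing to do with the number of bins. hist() discards missing values before binning, so a 'Measur.' group whose speeds are all NA leaves it with nothing to bin, and it cannot work out a number of breaks. Here is a slightly modified f() that disables plotting in hist() (it's unlikely you want plots) and handles a column subset that is entirely NA: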

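f <- function(x) {
  # Drop missing speeds before binning
  y <- x[!is.na(x)]
  if (length(y) > 0) {
    # plot=FALSE: compute the bins without drawing a histogram
    hi <- hist(y, plot=FALSE)
    speedmph <- round(hi$breaks*0.68,1)
    # Cumulative share of observations at each break, starting from 0
    prob <- c(0, round(cumsum(hi$counts) / sum(hi$counts), digits=2))
    cbind(speedmph, prob)
  } else { # still need to return properly sized values
    cbind(rep(NA, length(x)), rep(NA, length(x)))
  }
}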
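
As a rough check of the idea above, here is a minimal base-R sketch that reproduces the failure on an all-NA group and then computes the cumulative distribution per group with split() and lapply() instead of ddply(). The sample data and the names `trucks`, `cum_dist`, `hist_failed` and `by_group` are made up for illustration and are not from the original question; returning NULL for an empty group is one possible design choice, not the only one.

```r
# Hypothetical sample data: group 1 has two speeds, group 2 is entirely NA
trucks <- data.frame(
  Measur. = c(1L, 1L, 1L, 2L, 2L),
  Speed.Mean.Trucks = c(9.5, 4.5, NA, NA, NA)
)

# Reproduce the problem: hist() on an all-NA numeric vector raises an error
all_na <- trucks$Speed.Mean.Trucks[trucks$Measur. == 2L]
hist_failed <- inherits(try(hist(all_na, plot = FALSE), silent = TRUE), "try-error")

# Hypothetical helper: cumulative distribution of the non-missing speeds
cum_dist <- function(x) {
  y <- x[!is.na(x)]
  if (length(y) == 0) return(NULL)  # assumption: skip groups with no data
  hi <- hist(y, plot = FALSE)
  data.frame(
    speedmph = round(hi$breaks * 0.68, 1),
    prob = c(0, round(cumsum(hi$counts) / sum(hi$counts), digits = 2))
  )
}

# One result per 'Measur.' value; list names are the group labels
by_group <- lapply(split(trucks$Speed.Mean.Trucks, trucks$Measur.), cum_dist)
print(by_group)
```

Because each group can yield a different number of breaks, a list with one element per group is easier to work with here than forcing the results into a single summarised column.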
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow