In R, how do I calculate expected values in a chi-square test when survey lengths differ?

StackOverflow https://stackoverflow.com/questions/12208346

  •  29-06-2021

Question

I am doing a behavioral study in which I want to see if a species shows a response significantly different from expected among three periods. There are 47 independent observations of the species, each with three periods, for a total observation period of 8.6 minutes. The first period is 3 minutes, the second period is 0.6 minutes, and the third period is 5 minutes. During each period, animals can either respond positively or negatively. During the first period, there were two positive responses (out of 47 observations; 45 negative), during the second period, 13 of 47 responses were positive, and during the third period, 14 of 47 responses were positive.

Thus I'm attempting to run a chi-square test in which I adjust the null-hypothesis probabilities to account for the different period lengths, but I don't think I'm doing it correctly.

data<-c(2,13,14)
null.probs<-c(3/8.6, 0.6/8.6, 5/8.6)
chi<-chisq.test(data, p=null.probs)

I am fairly certain that my null hypothesis of those expected values is not correct in this case, but I'm not sure how to properly adjust it.

Solution

If you fit a glm with family="poisson", you get a deviance statistic that is approximately chi-square distributed. Use the counts as the outcome and add an offset term, log(time), which adjusts for the different lengths of observation.

> counts<-c(2, 13, 14)
> times<-c(3, 0.6, 5)
> glm(counts ~ letters[1:3] +offset( log(times)), family="poisson")

Call:  glm(formula = counts ~ letters[1:3] + offset(log(times)), family = "poisson")

Coefficients:
  (Intercept)  letters[1:3]b  letters[1:3]c  
      -0.4055         3.4812         1.4351  

Degrees of Freedom: 2 Total (i.e. Null);  0 Residual
Null Deviance:      36.68 
Residual Deviance: 1.776e-15    AIC: 17.52 

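You have fitted a saturated model: three counts and three parameters leave 0 residual degrees of freedom, so the fit reproduces the data exactly and the residual deviance is zero (1.776e-15 is rounding error). The quantity to use for inference is the null deviance, 36.68 on 2 degrees of freedom, which measures how far the counts depart from a model with one common Poisson rate per minute (intercept and offset only). (There will be a predictable problem if any of the counts are zero.)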
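To turn that deviance into a test, one option is to compare it against a chi-square distribution with 2 degrees of freedom. The following is a minimal sketch, not a definitive analysis: the names `period`, `fit_null`, `fit_full`, `expected`, `lr_stat` and `p_value` are made up for illustration, and the labels "a", "b", "c" simply stand in for the three periods. It also prints the expected counts implied by the null, which are the total count split in proportion to period length.

```r
# Counts of positive responses and period lengths (minutes), from the question
counts <- c(2, 13, 14)
times  <- c(3, 0.6, 5)
period <- factor(c("a", "b", "c"))  # hypothetical labels for the three periods

# Null model: one common rate per minute; the offset carries the period lengths
fit_null <- glm(counts ~ 1 + offset(log(times)), family = poisson)
# Saturated model: a separate rate for each period
fit_full <- glm(counts ~ period + offset(log(times)), family = poisson)

# Expected counts under the null: total count split in proportion to time
expected <- sum(counts) * times / sum(times)
# The null model's fitted values should agree with that hand calculation
stopifnot(isTRUE(all.equal(unname(fitted(fit_null)), expected, tolerance = 1e-6)))

# Likelihood-ratio statistic: drop in deviance from the null to the saturated fit
lr_stat <- fit_full$null.deviance - fit_full$deviance
df      <- fit_full$df.null - fit_full$df.residual
p_value <- pchisq(lr_stat, df = df, lower.tail = FALSE)

print(round(expected, 2))
print(c(statistic = lr_stat, df = df, p = p_value))
```

The expected counts come out at roughly 10.12, 2.02 and 16.86, and the statistic should match the null deviance of 36.68 shown in the output above. Since one expected count is small, the chi-square approximation may be rough here, so treat the p-value as indicative rather than exact.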

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow