Question

I'm really new to R, so please bear with me. I'm using a chi-squared test to compare nucleotide frequencies at a given position, and I counted up the number of A,C,G,T in two different data sets:

x1 <- c(272003,310418,201601,237168)
x2 <- c(239614,316515,182070,198025)

I can think of two ways to ask for a two-sample chi-squared test:

> chisq.test(x1,x2)

    Pearson's Chi-squared test

data:  x1 and x2
X-squared = 12, df = 9, p-value = 0.2133

Warning message:
In chisq.test(x1, x2) : Chi-squared approximation may be incorrect

or

> chisq.test(cbind(x1,x2))

    Pearson's Chi-squared test

data:  cbind(x1, x2)
X-squared = 2942.065, df = 3, p-value < 2.2e-16

I suspect that the second version is correct, because I can also do this:

> chisq.test(x1,x1)

    Pearson's Chi-squared test

data:  x1 and x1
X-squared = 12, df = 9, p-value = 0.2133

Warning message:
In chisq.test(x1, x1) : Chi-squared approximation may be incorrect

with an identical and obviously incorrect result.

What is actually being calculated in this case?

Thanks!

Solution

chisq.test(x1,x1)$expected shows the following:

        x1
x1       201601 237168 272003 310418
  201601   0.25   0.25   0.25   0.25
  237168   0.25   0.25   0.25   0.25
  272003   0.25   0.25   0.25   0.25
  310418   0.25   0.25   0.25   0.25

Observed counts (chisq.test(x1,x1)$observed):

        x1
x1       201601 237168 272003 310418
  201601      1      0      0      0
  237168      0      1      0      0
  272003      0      0      1      0
  310418      0      0      0      1

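When you pass two vectors, chisq.test(x, y) converts both to factors and cross-tabulates them, so it assumes each position holds one paired categorical observation. Your four counts are therefore read as four observations with four distinct labels, which gives the table above with a single 1 in each row and column. The expected values are "correct" for that table (though silly in this case), and the statistic is always 12 with df = 9 for four distinct values, whatever the numbers are. That is why chisq.test(x1,x2) and chisq.test(x1,x1) print the same result. As a side note, chisq.test(cbind(x1,x1)) does what you expect it to do (X-squared = 0, df = 3, p-value = 1).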
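To see where the 12 comes from, here is a minimal sketch that builds the same cross-tabulation with table() and computes the Pearson statistic by hand. The variable names tab, expected, stat and res are only for illustration.

```r
x1 <- c(272003, 310418, 201601, 237168)
x2 <- c(239614, 316515, 182070, 198025)

# Cross-tabulate the two vectors as if they were paired categorical
# observations. This is the table the two-vector call ends up testing.
tab <- table(x1, x2)
dim(tab)   # 4 rows by 4 columns
sum(tab)   # only 4 "observations" in total

# Pearson statistic by hand: sum((observed - expected)^2 / expected)
expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)
stat <- sum((tab - expected)^2 / expected)
stat       # 4 cells contribute 2.25 each, 12 cells contribute 0.25 each

# The same numbers from chisq.test() itself (warning suppressed)
res <- suppressWarnings(chisq.test(x1, x2))
res$statistic
res$parameter
```

Every expected cell is 0.25, far below the usual rule of thumb of 5, which is what the "Chi-squared approximation may be incorrect" warning is pointing at.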

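Your second call is the right one. cbind(x1,x2) is a 4 x 2 matrix of counts, and chisq.test uses a matrix directly as the contingency table: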

> chisq.test(cbind(x1,x2))$observed
         x1     x2
[1,] 272003 239614
[2,] 310418 316515
[3,] 201601 182070
[4,] 237168 198025
> chisq.test(cbind(x1,x2))$expected
           x1       x2
[1,] 266912.4 244704.6
[2,] 327073.2 299859.8
[3,] 200162.6 183508.4
[4,] 227041.8 208151.2
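
As a cross-check, the expected counts and the test statistic for the matrix form can be reproduced by hand. This is a sketch rather than anything chisq.test requires; the row labels assume the counts are in the order A, C, G, T as listed in the question, and the names counts, expected, stat, df and p are only for illustration.

```r
x1 <- c(272003, 310418, 201601, 237168)
x2 <- c(239614, 316515, 182070, 198025)

# 4 x 2 contingency table: rows are nucleotides, columns are data sets.
# The row order A, C, G, T is an assumption taken from the question text.
counts <- cbind(x1, x2)
rownames(counts) <- c("A", "C", "G", "T")

# Expected count under independence: row total * column total / grand total
expected <- outer(rowSums(counts), colSums(counts)) / sum(counts)

# Pearson statistic, degrees of freedom, and upper-tail p-value
stat <- sum((counts - expected)^2 / expected)
df   <- (nrow(counts) - 1) * (ncol(counts) - 1)
p    <- pchisq(stat, df, lower.tail = FALSE)

res <- chisq.test(counts)
round(expected, 1)   # should match res$expected
c(stat = stat, df = df, p = p)
```

With counts this large, even small differences in proportions give a very small p-value, so it is worth looking at the proportions themselves, for example with prop.table(counts, 2), and not only at the test result.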
Licensed under: CC-BY-SA with attribution