After reading a recent blog post about an application of the Poisson distribution, I tried reproducing its findings using Python's 'scipy.stats' module, as well as Excel/LibreOffice 'POISSON' and 'CHITEST' functions.
For the expected values shown in the article, I simply used:
import scipy.stats
for i in range(8):
print(scipy.stats.poisson.pmf(i, 2)*31)
This reproduces the table shown in the blog post - and I also recreated it from within LibreOffice, using a first column A that has values 0 to 7 in cells A1, A2, ..., A8, and the simple formula '=POISSON(A1, 2, 0)*31' repeated in the first 8 lines of column B.
So far so good - now for the chi-squared p-test value:
Under LibreOffice, I just wrote down the observed values in cells C1-C8, and used '=CHITEST(C1:C8, B1:B8)' to reproduce the article's reported p-value of 0.18. Under scipy.stats however, I can't seem to reproduce this value:
import numpy as np
import scipy.stats
obs = [4, 10, 7, 5, 4, 0, 0, 1]
exp = [scipy.stats.poisson.pmf(i, 2)*31 for i in range(8)]
# we only estimated one variable (the rate of 2 killings per year via 62/31)
# so dof will be N-1-estimates
estimates = 1
print(scipy.stats.chisquare(np.array(obs), np.array(exp), ddof=len(obs)-1-estimates))
# (10.112318133864241, 0.0014728159441179519)
# the p-test value reported is 0.00147, not 0.18...
#
# Maybe I need to aggregate categories with observations less than 5
# (as suggested in many textbooks of statistics for chi-squared tests)?
observedAggregateLessThan5 = [14, 7, 5, 5]
expectedAggregateLessThan5 = [exp[0]+exp[1], exp[2], exp[3], sum(exp[4:])]
print(scipy.stats.chisquare(np.array(observedAggregateLessThan5), np.array(expectedAggregateLessThan5), ddof=len(observedAggregateLessThan5)-1-estimates))
# (0.53561749342466913, 0.46425467595930309)
# Again the p-test value computed is not 0.18, it is 0.46...
What am I doing wrong?