t-test or ANOVA?

https://datascience.stackexchange.com/questions/10178

python
statistics
pandas

16-10-2019

Question

I have a pandas data frame of the form:

r1    r2    r3    r4    r5
0     1     12    0     4
1     1     2     9     2
32    5     0     0     0
12    14    3     1     23
0     2     43    5     2
9     3     5     1     1
0     0     0     0     1
1     0     0     0     0

I want to check whether any of the columns r1, r2, r3, r4, r5 differs significantly from any of the others. Should I use a t-test or an ANOVA? And how would I set up the computation?

Solution

This is a typical statistics problem. When you have multiple groups ("classes") that you assume are normally distributed, you first run a one-way ANOVA. Then, IFF (if and only if) the ANOVA is significant, run post-hoc pairwise t-tests with an appropriate multiple-comparison correction (e.g. Bonferroni).
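A minimal sketch of that two-step procedure in Python, assuming scipy is available (the question only mentions pandas) and a significance level of 0.05. The function name `anova_then_posthoc` is hypothetical. It uses `scipy.stats.f_oneway` for the one-way ANOVA and `scipy.stats.ttest_ind` (Student's t-test, equal variances assumed) for the pairwise comparisons, and applies the Bonferroni correction by multiplying each p-value by the number of pairs (10 for five columns) and capping it at 1. It treats each column as an independent sample and keeps the normality assumption stated above.

```python
from itertools import combinations

import pandas as pd
from scipy import stats


def anova_then_posthoc(df, alpha=0.05):
    """Hypothetical helper: one-way ANOVA across all columns of df.

    If the ANOVA p-value is below alpha, run pairwise Student t-tests and
    return Bonferroni-adjusted p-values; otherwise return an empty dict.
    """
    groups = [df[col] for col in df.columns]
    f_stat, p_anova = stats.f_oneway(*groups)

    posthoc = {}
    if p_anova < alpha:  # post-hoc tests only after a significant ANOVA
        pairs = list(combinations(df.columns, 2))
        for a, b in pairs:
            t_stat, p = stats.ttest_ind(df[a], df[b])
            # Bonferroni: multiply by the number of comparisons, cap at 1
            posthoc[(a, b)] = (t_stat, min(p * len(pairs), 1.0))
    return f_stat, p_anova, posthoc


# The data frame from the question, one column per group.
df = pd.DataFrame({
    "r1": [0, 1, 32, 12, 0, 9, 0, 1],
    "r2": [1, 1, 5, 14, 2, 3, 0, 0],
    "r3": [12, 2, 0, 3, 43, 5, 0, 0],
    "r4": [0, 9, 0, 1, 5, 1, 0, 0],
    "r5": [4, 2, 0, 23, 2, 1, 1, 0],
})

f_stat, p_anova, posthoc = anova_then_posthoc(df)
print(f"ANOVA: F = {f_stat:.3f}, p = {p_anova:.3f}")
for (a, b), (t_stat, p_adj) in posthoc.items():
    print(f"{a} vs {b}: t = {t_stat:.3f}, Bonferroni p = {p_adj:.3f}")
```

With only the eight rows shown, F comes out at about 0.6, well below 1, so the ANOVA is not significant and the post-hoc dictionary stays empty. Multiplying each p-value by the number of comparisons is equivalent to comparing the raw p-values against alpha / 10 = 0.005.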

Licensed under: CC-BY-SA with attribution
