Question

I have a pandas data frame of the form:

r1    r2    r3    r4    r5

0    1    12    0    4
1    1    2    9    2
32   5    0    0    0
12   14   3    1    23
0    2    43    5    2
9    3    5    1    1
0    0    0    0    1
1    0    0    0    0
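A minimal sketch of entering the table above as a pandas DataFrame so it can be used in the computation. The column names r1 to r5 come from the question, and the values are copied row by row from the table.

```python
import pandas as pd

# Each list holds one column of the table, top to bottom.
df = pd.DataFrame(
    {
        "r1": [0, 1, 32, 12, 0, 9, 0, 1],
        "r2": [1, 1, 5, 14, 2, 3, 0, 0],
        "r3": [12, 2, 0, 3, 43, 5, 0, 0],
        "r4": [0, 9, 0, 1, 5, 1, 0, 0],
        "r5": [4, 2, 0, 23, 2, 1, 1, 0],
    }
)
print(df)
```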

And I want to check whether any column (r1, r2, r3, r4, r5) differs significantly from any of the others. Should I do a t-test or an ANOVA? And how would I set it up for the computation?

Solution

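This is a typical statistics problem. Running a t-test on every pair of columns directly (10 pairs for five columns) inflates the chance of a false positive, so when you have multiple 'classes' that you assume are normally distributed, you first run a one-way ANOVA. Then, only if the ANOVA is significant, run post-hoc pairwise t-tests with an appropriate correction for multiple comparisons (e.g. Bonferroni).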
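A minimal sketch of that procedure using SciPy, assuming the five columns are independent groups (the rows are not matched across columns) and using a significance level of 0.05. The function name `anova_then_posthoc` is hypothetical. It runs `scipy.stats.f_oneway` across all columns and, only if that is significant, runs `scipy.stats.ttest_ind` on each pair, multiplying each p-value by the number of pairs (Bonferroni) and capping it at 1.

```python
import itertools

import pandas as pd
from scipy import stats


def anova_then_posthoc(df, alpha=0.05):
    """Hypothetical helper: one-way ANOVA, then Bonferroni-corrected pairwise t-tests.

    Returns (F statistic, ANOVA p-value, dict of adjusted p-values keyed by column pair).
    The dict is empty when the ANOVA is not significant at `alpha`.
    """
    groups = [df[col].to_numpy() for col in df.columns]
    f_stat, p_anova = stats.f_oneway(*groups)

    posthoc = {}
    if p_anova < alpha:
        pairs = list(itertools.combinations(df.columns, 2))
        for a, b in pairs:
            # Independent two-sample t-test (SciPy's default assumes equal variances).
            _, p = stats.ttest_ind(df[a], df[b])
            # Bonferroni correction: scale by the number of comparisons, cap at 1.
            posthoc[(a, b)] = min(p * len(pairs), 1.0)
    return f_stat, p_anova, posthoc


# Data copied from the table in the question.
df = pd.DataFrame(
    {
        "r1": [0, 1, 32, 12, 0, 9, 0, 1],
        "r2": [1, 1, 5, 14, 2, 3, 0, 0],
        "r3": [12, 2, 0, 3, 43, 5, 0, 0],
        "r4": [0, 9, 0, 1, 5, 1, 0, 0],
        "r5": [4, 2, 0, 23, 2, 1, 1, 0],
    }
)

f_stat, p_anova, posthoc = anova_then_posthoc(df)
print(f"ANOVA: F = {f_stat:.3f}, p = {p_anova:.4f}")
if not posthoc:
    print("ANOVA not significant; no post-hoc tests run.")
for (a, b), p_adj in posthoc.items():
    print(f"{a} vs {b}: Bonferroni-adjusted p = {p_adj:.4f}")
```

The columns here are small counts with many zeros, so the normality assumption mentioned above is worth checking before relying on the F-test.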

Licensed under: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange