Question

What are the data conditions that we should watch out for, where p-values may not be the best way of deciding statistical significance? Are there specific problem types that fall into this category?

Solution

You are asking about data dredging: testing a very large number of hypotheses against a single data set, or testing hypotheses that were suggested by the same data they are then tested on.

In particular, read up on the multiple hypothesis hazard and on testing hypotheses suggested by the data.

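The solution is to correct for the number of tests you ran. You can control the familywise error rate, for example with Scheffé's method or the (very old-school) Bonferroni correction, or control the false discovery rate, for example with the Benjamini-Hochberg procedure.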
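To make the difference concrete, here is a minimal sketch of the Bonferroni correction (familywise error rate) next to the Benjamini-Hochberg step-up procedure (false discovery rate). The function names and the list of p-values are made up for illustration, and the Benjamini-Hochberg guarantee assumes independent (or positively dependent) tests.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject H0_i when p_i <= alpha / m (controls the familywise error rate)."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure (controls the false discovery rate)."""
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest 1-based rank k with p_(k) <= (k / m) * alpha.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            k_max = rank
    # Reject the k_max hypotheses with the smallest p-values.
    reject = [False] * m
    for idx in order[:k_max]:
        reject[idx] = True
    return reject


# Hypothetical p-values from ten tests (illustrative only, not real data).
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
bonf = bonferroni(p_values)
bh = benjamini_hochberg(p_values)
print("uncorrected:", sum(p < 0.05 for p in p_values))
print("Bonferroni: ", sum(bonf))
print("BH:         ", sum(bh))
```

On these numbers five tests pass the naive 0.05 cutoff, Bonferroni keeps one and Benjamini-Hochberg keeps two, which is the usual pattern: FDR control is less conservative than FWER control.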

In a somewhat less rigorous way, it may help to filter your discoveries by the confidence interval for the odds ratio (OR) of each result. If the 99% confidence interval for the odds ratio is 10 to 12, then an OR of 1 or less (no effect, or an effect in the opposite direction) is extremely implausible, especially if the sample size is also large. If you find something like this, it is probably a strong effect even if it came out of a test of millions of hypotheses.
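As a rough sketch of that filter, the interval can be computed from a 2x2 table with the standard Wald (Woolf) approximation on the log odds ratio. The function name and the counts below are hypothetical, chosen only so the interval lands near the 10 to 12 range mentioned above; the approximation assumes all four cells are reasonably large.

```python
import math
from statistics import NormalDist


def odds_ratio_ci(a, b, c, d, level=0.99):
    """Wald (Woolf) confidence interval for the odds ratio of the table
    [[a, b], [c, d]], e.g. exposed cases/non-cases over unexposed cases/non-cases."""
    or_hat = (a * d) / (b * c)
    # Standard error of log(OR) under the large-sample approximation.
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    # Two-sided critical value, e.g. about 2.576 for a 99% interval.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    lower = math.exp(math.log(or_hat) - z * se_log)
    upper = math.exp(math.log(or_hat) + z * se_log)
    return or_hat, lower, upper


# Hypothetical counts (illustrative only).
or_hat, lower, upper = odds_ratio_ci(11_000, 9_000, 2_000, 18_000)
print(f"OR = {or_hat:.1f}, 99% CI = ({lower:.2f}, {upper:.2f})")

# The filter: keep a discovery only if the whole interval is far from 1.
keep = lower > 1.0
```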

OTHER TIPS

You shouldn't consider the p-value out of context.

One rather basic point (as illustrated by xkcd) is that you need to consider how many tests you're actually doing. Obviously, you shouldn't be shocked to see p < 0.05 for one out of 20 tests, even if the null hypothesis is true every time.
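A quick way to convince yourself is to compute the chance of at least one false positive and check it by simulation. This sketch assumes 20 independent tests and relies on the fact that, when the null hypothesis is true and the test statistic is continuous, the p-value is uniformly distributed on [0, 1].

```python
import random

alpha, n_tests = 0.05, 20

# Probability that at least one of 20 independent true-null tests gives p < alpha.
analytic = 1 - (1 - alpha) ** n_tests

# Simulation: draw uniform p-values for each batch of 20 tests.
rng = random.Random(0)
n_experiments = 20_000
hits = sum(
    any(rng.random() < alpha for _ in range(n_tests))
    for _ in range(n_experiments)
)
simulated = hits / n_experiments

print(f"analytic:  {analytic:.3f}")
print(f"simulated: {simulated:.3f}")
```

The analytic value is about 0.64, so seeing one "significant" result in 20 tests is the expected outcome rather than a surprise.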

A more subtle example of this occurs in high-energy physics, and is known as the look-elsewhere effect. The larger the parameter space you search for a signal that might represent a new particle, the more likely you are to see an apparent signal that's really just due to random fluctuations.
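A toy illustration, not how a real physics analysis is done: scan bins that contain nothing but standard normal noise and look at the most extreme fluctuation as the search window grows. The trial-factor formula used for the global p-value assumes the searched regions are independent, which is an assumption made here for simplicity; the region count and local p-value are hypothetical.

```python
import random

# Trial-factor approximation for N independent search regions.
p_local = 0.00135   # roughly a one-sided 3-sigma local p-value
n_regions = 1000    # hypothetical number of places searched
p_global = 1 - (1 - p_local) ** n_regions

# Pure-noise scan: every bin is a standard normal fluctuation, no signal anywhere.
rng = random.Random(1)
z_scores = [rng.gauss(0.0, 1.0) for _ in range(n_regions)]

# Largest apparent "signal" when searching the first 10, 100, or 1000 bins.
max_z = {n: max(z_scores[:n]) for n in (10, 100, 1000)}

print(f"global p-value for one local 3-sigma bump: {p_global:.2f}")
for n, z in max_z.items():
    print(f"bins searched: {n:5d}  largest z-score: {z:.2f}")
```

Because each wider window contains the narrower one, the largest fluctuation can only grow as you search more bins, which is the look-elsewhere effect in miniature.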

Also be aware of the sample size you are using. With very large samples, such as the census data economists use, p-values shrink toward zero even for effects that are too small to matter in practice, so statistical significance alone says little about practical importance. The paper "Too Big to Fail: Large Samples and the p-Value Problem" covers some of these issues.
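A minimal sketch of the effect, assuming a one-sample z-test with known standard deviation and, for simplicity, an observed mean difference exactly equal to a tiny true effect of 0.01 standard deviations (both the function name and the effect size are made up for illustration):

```python
import math


def two_sided_p(effect_in_sd, n):
    """Two-sided z-test p-value when the observed mean difference is
    effect_in_sd standard deviations and the sample size is n."""
    z = effect_in_sd * math.sqrt(n)
    # erfc(z / sqrt(2)) equals 2 * (1 - Phi(z)) and stays accurate for large z.
    return math.erfc(z / math.sqrt(2))


effect = 0.01  # a practically negligible effect, in standard deviations
results = {n: two_sided_p(effect, n) for n in (100, 10_000, 1_000_000)}

for n, p in results.items():
    print(f"n = {n:>9,d}  p = {p:.3g}")
```

The effect never changes, only the sample size does, yet the p-value goes from clearly non-significant to astronomically small. Reporting the effect size and its confidence interval alongside the p-value avoids this trap.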

Licensed under: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange