Chapter 29 A Closer Look at Tests of Significance
29.1 Chapter Notes
This chapter discusses some caveats and problems with significance testing.
- A statistically significant result can still be caused by chance variation; significance makes chance an unlikely explanation, not an impossible one.
- Deciding which hypothesis to test after looking at the data makes p-values hard to interpret.
- One form of this data snooping is to look at whether your sample average is too big or too small, and then decide to use a one-tailed test in the direction observed (a simulation sketch follows these notes).
- As long as the investigators say what they have done, this can be corrected. If you think a two-tailed test should have been performed instead, just double the p-value.
- Statistical significance and practical significance are two distinct ideas. With a large enough sample, even a difference too small to ever matter practically can lead to a very small p-value (see the second sketch below).
- A box model is always needed to make sense out of a test of significance.
- Take care to distinguish between samples drawn by probability methods and samples of convenience.
- Rejecting \(H_0\) is not, by itself, strong evidence for your substantive theory \(T\). If too many sixes come up in a die-rolling test of telekinesis, we may simply have found evidence that the die is biased: multiple competing theories can predict the same statistical model.
- Tests of significance only ever answer one question:
How easy is it to explain the difference between the data and what is expected on the null hypothesis, on the basis of chance variation alone?
- Sometimes this is not the right question to ask, and techniques of estimation are needed instead of hypothesis testing (see the last sketch below).
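
The following is a minimal simulation sketch, not from the text, of the data-snooping point above: when the null hypothesis is true, choosing the tail of a one-tailed test after looking at the data doubles the chance of a false alarm, and doubling the reported p-value undoes the damage. The standard-normal z-statistic setup is an assumption made for illustration.

```python
# Under a true null hypothesis, the z-statistic is standard normal.
# A "data snooper" picks the one-tailed test in the direction the data
# happen to point, which doubles the false-alarm rate at the 5% level;
# doubling the reported p-value restores the honest rate.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
z = rng.standard_normal(100_000)          # z-statistics when H0 is true

# Honest two-tailed p-value, chosen before seeing the data.
p_two_tailed = 2 * norm.sf(np.abs(z))

# Snooped one-tailed p-value: the tail is picked after seeing the sign,
# so it is always the tail the data point toward.
p_snooped = norm.sf(np.abs(z))

print("rejection rate at the 5% level:")
print("  two-tailed (honest): ", np.mean(p_two_tailed < 0.05))   # ~0.05
print("  one-tailed (snooped):", np.mean(p_snooped < 0.05))      # ~0.10
print("  snooped, p doubled:  ", np.mean(2 * p_snooped < 0.05))  # ~0.05
```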
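
A second sketch, with invented numbers, illustrates statistical versus practical significance: a coin that lands heads 50.1% of the time is, for any practical purpose, fair, yet with enough tosses a test against P(heads) = 0.5 yields an arbitrarily small p-value.

```python
# The difference (0.1 percentage point) never changes, but the p-value
# collapses as the sample size grows.
import numpy as np
from scipy.stats import norm

p_true, p_null = 0.501, 0.5

for n in (1_000, 100_000, 10_000_000):
    se = np.sqrt(p_null * (1 - p_null) / n)   # SE of p-hat under the null
    z = (p_true - p_null) / se                # expected z-statistic
    print(f"n = {n:>10,}: z = {z:5.2f}, two-tailed p = {2 * norm.sf(z):.2e}")
# n =      1,000: z =  0.06, two-tailed p ~ 0.95
# n = 10,000,000: z =  6.32, two-tailed p ~ 3e-10
```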
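
Finally, a sketch of estimation on the same hypothetical coin data: a 95% confidence interval answers "how big is the difference?", which is often the more useful question than "could the difference be due to chance?".

```python
# Hypothetical large sample: 5,010,000 heads in 10,000,000 tosses.
import numpy as np

n, heads = 10_000_000, 5_010_000
p_hat = heads / n
se = np.sqrt(p_hat * (1 - p_hat) / n)     # SE of the sample proportion

lo, hi = p_hat - 1.96 * se, p_hat + 1.96 * se
print(f"95% CI for P(heads): [{lo:.4f}, {hi:.4f}]")
# -> roughly [0.5007, 0.5013]: clearly not exactly 0.5, but also clearly
#    too close to 0.5 to matter in practice.
```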