Chapter 29 A Closer Look at Tests of Significance

29.1 Chapter Notes

This chapter discusses some caveats and problems with significance testing.

  • A statistically significant result can still be the product of chance variation: a small p-value makes chance an unlikely explanation, not an impossible one.
  • Deciding which hypothesis to test after looking at the data makes p-values hard to interpret.
  • One form of this data snooping is to look at whether your sample average came out too big or too small, and then to use a one-tailed test in the direction observed.
  • As long as the investigators report what they did, this can be corrected: if you think a two-tailed test should have been performed instead, just double the p-value (the first sketch at the end of these notes illustrates this).
  • Statistical significance and practical significance are two distinct ideas. With a large enough sample, even a difference too small to matter in practice can produce a very small p-value (the second sketch below works through an example).
  • A box model is always needed to make sense out of a test of significance.
  • Take care to distinguish between samples drawn by probability methods and samples of convenience: significance tests presuppose a chance model, which a convenience sample does not have.
  • Rejection of \(H_0\) should not, by itself, be confused with strong evidence for your substantive theory \(T\). If too many sixes come up in a die-rolling test of telekinesis, we may simply have found evidence that the die is biased: many different theories can predict the same statistical result.
  • Tests of significance only ever answer one question:

How easy is it to explain the difference between the data and what is expected on the null hypothesis, on the basis of chance variation alone?

  • Sometimes this is not the right question to ask, and techniques of estimation (such as confidence intervals) are needed instead of hypothesis testing; the final sketch below gives an example.
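
The sketches below illustrate some of these points numerically. First, a minimal simulation of the data-snooping caveat, using NumPy and SciPy with made-up trial counts: choosing the tail of a one-tailed test after looking at the data roughly doubles the false-positive rate under the null hypothesis, and doubling the reported p-value undoes the damage.

```python
# Illustrative simulation (all numbers are made up for the example).
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
n_trials = 100_000

# Under the null hypothesis, the z-statistic is standard normal.
z = rng.standard_normal(n_trials)

# Honest, pre-specified two-tailed test at the 5% level.
two_tailed = 2 * norm.sf(np.abs(z))

# Snooped test: pick the tail that matches the observed sign,
# which always yields the more "favorable" one-tailed p-value.
snooped = norm.sf(np.abs(z))

print("two-tailed rejection rate:", np.mean(two_tailed < 0.05))   # ~0.05
print("snooped one-tailed rate:  ", np.mean(snooped < 0.05))      # ~0.10
print("snooped, p-value doubled: ", np.mean(2 * snooped < 0.05))  # ~0.05
```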
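Second, a minimal sketch of the gap between statistical and practical significance, with made-up numbers (null average 50, SD 10, observed average 50.1): at a sample size of 1,000,000 a difference of 0.1 is trivial in practice but gives an enormous z-statistic.

```python
# Illustrative z-test with an invented box model: average 50, SD 10.
from math import sqrt
from scipy.stats import norm

null_avg, sd = 50.0, 10.0
obs_avg, n = 50.1, 1_000_000   # a difference of 0.1: negligible in practice

se = sd / sqrt(n)              # standard error of the sample average
z = (obs_avg - null_avg) / se  # z = 0.1 / 0.01 = 10
p = 2 * norm.sf(abs(z))        # two-tailed p-value

print(f"z = {z:.1f}, p = {p:.2g}")  # z = 10.0, p ~ 1.5e-23
```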
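Finally, a sketch of estimation in place of testing, reusing the same made-up numbers: a 95% confidence interval reports how big the difference actually is, which is often the question that matters. Here the interval excludes 50, so the result is statistically significant, but it also shows the difference is tiny.

```python
# Illustrative 95% confidence interval for the invented data above.
from math import sqrt

obs_avg, sd, n = 50.1, 10.0, 1_000_000
se = sd / sqrt(n)
lo, hi = obs_avg - 1.96 * se, obs_avg + 1.96 * se

print(f"95% CI for the box average: ({lo:.3f}, {hi:.3f})")  # ~(50.080, 50.120)
```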