Chapter 26 Tests of Significance
26.1 Chapter Notes
The chapter introduces an example to illustrate the ideas behind significance testing:
Suppose two investigators are arguing about a large box of numbered tickets. Dr. Nullsheimer says the average is 50. Dr. Altshuler says the average is different from 50. Eventually, they get tired of arguing, and decide to look at some data. There are many, many tickets in the box, so they agree to take a sample: they’ll draw 500 tickets at random. (The box is so large that it makes no difference whether the draws are made with or without replacement.) The average of the draws turns out to be 48, and the SD is 15.3.
The investigators discuss whether the observed average of 48 is likely to be due to chance. They work out the standard error for the average:
\[ \frac{\sqrt{500} \times 15.3}{500}\approx 0.7 \]
The sample average is about three standard errors below 50, which the investigators conclude is hard to explain by chance.
The number of standard errors between the observed value and the expected value is a test statistic, usually called z:
\[ z = \frac{\text{observed} - \text{expected}}{\text{SE}} \]
Here the expected value is computed on the assumption that the null hypothesis is true.
The SD of the box is estimated from the data to compute the standard error.
Using the normal approximation, the probability of getting a sample average of 48 or lower, if the null hypothesis is true, is about 0.001. This is the p-value.
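The arithmetic for the box example can be checked with a short Python sketch. It uses only the standard library; the helper name `normal_lower_tail` is an illustrative choice, not something from the chapter, and the p-value relies on the normal approximation described above.

```python
import math

def normal_lower_tail(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * math.erfc(-z / math.sqrt(2))

n = 500             # number of draws
sample_avg = 48.0   # observed average of the draws
sample_sd = 15.3    # SD of the draws, used to estimate the SD of the box
null_avg = 50.0     # box average claimed by the null hypothesis

se_sum = math.sqrt(n) * sample_sd   # SE for the sum of the draws
se_avg = se_sum / n                 # SE for the average of the draws
z = (sample_avg - null_avg) / se_avg
p_value = normal_lower_tail(z)      # chance of an average this low or lower

print(f"SE for the average: {se_avg:.2f}")
print(f"z: {z:.2f}")
print(f"one-sided p-value: {p_value:.4f}")
```

Without rounding, z comes out at about -2.9 rather than -3, so the computed p-value is a little under 0.002. That is the same order of magnitude as the rounded figure of 0.001 and leads to the same conclusion.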
The z-test is used for reasonably large samples, when the normal approximation can be used on the probability histogram for the average of the draws. (The average has already been converted to standard units, by z.) With small samples, other techniques must be used, as discussed in section 6 below.
The test outlined above is called the “one-sample z-test.”
A second example of the one-sample z-test is introduced:
Charles Tart ran an experiment at the University of California, Davis, to demonstrate ESP. Tart used a machine called the “Aquarius.” The Aquarius has an electronic random number generator and 4 “targets.” Using its random number generator, the machine picks one of the 4 targets at random. It does not indicate which. Then, the subject guesses which target was chosen, by pushing a button. Finally, the machine lights up the target it picked, ringing a bell if the subject guessed right. The machine keeps track of the number of trials and the number of correct guesses.
The box model looks like this: under the null hypothesis, the number of correct guesses is like the sum of 7,500 draws, made at random with replacement, from a box with one ticket marked 1 and three tickets marked 0.
There were 7,500 guesses in total from 15 subjects who were thought to be clairvoyant. They guessed correctly 2,006 times, when 1,875 would be expected by chance alone. This gives the numerator of the test statistic: 2,006 - 1,875 = 131. The denominator is the standard error for the number of correct guesses. This time we know the SD of the box (assuming the null hypothesis is true), so we don’t have to estimate it from the sample: it is \(\sqrt{0.25 \times 0.75}\approx 0.43\), and so the standard error is about \(\sqrt{7{,}500} \times 0.43 \approx 37\).
The value of the one-sample z-statistic is about 3.5 and the p-value is about 2 in 10,000.
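The ESP calculation can be reproduced in the same way. The sketch below is a standard-library check of the numbers quoted above, with variable names chosen for illustration. It assumes the null hypothesis that each guess has a 1 in 4 chance of being correct, and it uses the normal approximation for the upper tail.

```python
import math

n_guesses = 7500
n_correct = 2006
p_chance = 0.25   # chance of a correct guess under the null hypothesis

expected = n_guesses * p_chance                  # expected number correct
sd_box = math.sqrt(p_chance * (1 - p_chance))    # SD of a box of 0s and 1s
se_sum = math.sqrt(n_guesses) * sd_box           # SE for the number correct

z = (n_correct - expected) / se_sum
p_value = 0.5 * math.erfc(z / math.sqrt(2))      # area to the right of z

print(f"expected: {expected:.0f}, SE: {se_sum:.1f}")
print(f"z: {z:.2f}, one-sided p-value: {p_value:.5f}")
```

The standard error scales with the square root of the number of draws, which is why 7,500 guesses give an SE of roughly 37 rather than something in the thousands.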
Next, the t-test is introduced.
Here’s an example:
In Los Angeles, many studies have been conducted to determine the concentration of CO (carbon monoxide) near freeways with various conditions of traffic flow. The basic technique involves capturing air samples in special bags, and then determining the CO concentrations in the bag samples by using a machine called a spectrophotometer. These machines can measure concentrations up to about 100 ppm (parts per million by volume) with errors on the order of 10 ppm. Spectrophotometers are quite delicate and have to be calibrated every day. This involves measuring CO concentration in a manufactured gas sample, called span gas, where the concentration is precisely controlled at 70 ppm. If the machine reads close to 70 ppm on the span gas, it’s ready for use; if not, it has to be adjusted. A complicating factor is that the size of the measurement errors varies from day to day. On any particular day, however, we assume that the errors are independent and follow the normal curve; the SD is unknown and changes from day to day.
One day we get the following readings:
78, 83, 68, 72, 88.
Four of the five readings are higher than 70, some by a lot. Can this be explained by chance variation, or does it show bias?
The chapter emphasises that each time we want to use a test of significance, we should try to translate the problem into a box model. In this case the appropriate box model is the Gauss model introduced in chapter 24. The idea here is that each measurement is the true value plus a bias plus a chance error. The chance error is assumed to have been drawn from a box where the tickets average out to 0 and the SD is unknown.
The null hypothesis is that the bias is 0.
Here’s the test statistic:
\[ \frac{\text{observed} - \text{expected}}{\text{SE}} \]
The average of the five measurements is 77.8, and the expected value under the null hypothesis is 70. The SD of the five measurements is 7.22. Let’s provisionally use this as the SD of the box. The SE for the average is
\[ \frac{\sqrt{5} \times 7.22}{5} \approx 3.23 \]
This gives a test statistic of \(\frac{77.8 - 70}{3.23} \approx 2.4\).
Is our use of the SD of the sample as the SD of the population reasonable? Our sample is only five measurements. There is extra uncertainty that has to be taken into account. The SD of the error box should not be estimated by the SD of the measurements. Instead \(SD^+\) is used:
\[ SD^+ = \sqrt{\frac{\text{number of measurements}}{\text{number of measurements} - \text{one}}} \times SD \]
Using \(SD^+\), the new value of the test statistic is 2.2, a little lower. So the results are a little less surprising, assuming the null hypothesis.
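The two versions of the test statistic can be compared side by side. This is a minimal sketch using only the standard library and the five readings quoted above. The variable names are illustrative, and the null value of 70 ppm is the span-gas concentration.

```python
import math

readings = [78, 83, 68, 72, 88]
null_value = 70.0   # span-gas concentration, the expected value if bias is 0

n = len(readings)
avg = sum(readings) / n

# SD of the measurements: divide the sum of squared deviations by n
sd = math.sqrt(sum((x - avg) ** 2 for x in readings) / n)

# SD+ inflates the SD by sqrt(n / (n - 1)) to allow for the small sample
sd_plus = math.sqrt(n / (n - 1)) * sd

# SE for the average under each estimate of the SD of the error box
se_with_sd = sd / math.sqrt(n)
se_with_sd_plus = sd_plus / math.sqrt(n)

stat_with_sd = (avg - null_value) / se_with_sd
stat_with_sd_plus = (avg - null_value) / se_with_sd_plus

print(f"average: {avg:.1f}, SD: {sd:.2f}, SD+: {sd_plus:.2f}")
print(f"statistic using SD:  {stat_with_sd:.2f}")
print(f"statistic using SD+: {stat_with_sd_plus:.2f}")
```

Dividing by `math.sqrt(n)` is the same as the \(\sqrt{5} \times \text{SD} / 5\) form used in the text, since \(\sqrt{n}/n = 1/\sqrt{n}\).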
To find the p-value, we no longer use the normal approximation. With a small number of observations, we use Student’s curve instead. Student’s curve has one parameter, the degrees of freedom, which here is the number of measurements minus one. If
- \(\bar{X}\) is the sample mean
- \(\mu\) is the expected mean
- \(\sigma\) is the population standard deviation
- \(SD^+\) is the sample standard deviation
- \(n\) is the sample size
then the random variable
\[ \frac{\bar{X}-\mu}{\sigma / \sqrt{n}} \]
has a standard normal distribution. But we don’t know the population standard deviation in this case. The random variable
\[ \frac{\bar{X}-\mu}{SD^+ / \sqrt{n}} \]
has a Student’s t-distribution with n-1 degrees of freedom.
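To see how much the choice of curve matters for the span-gas readings, the sketch below computes the one-sided p-value for a statistic of about 2.16 (the unrounded value behind the 2.2 quoted earlier) with 4 degrees of freedom, and compares it with the normal curve. It is a standard-library approximation rather than a library routine: the helpers `t_density` and `t_upper_tail` are illustrative names, and the tail area is obtained by integrating the density numerically with Simpson's rule.

```python
import math

def t_density(x, df):
    """Height of Student's curve with df degrees of freedom at x."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """Area to the right of t (t >= 0), using Simpson's rule on [0, t].

    steps must be even. The curve is symmetric about 0, so the area to
    the right of t is 0.5 minus the area between 0 and t.
    """
    h = t / steps
    total = t_density(0.0, df) + t_density(t, df)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2
        total += weight * t_density(i * h, df)
    return 0.5 - total * h / 3

t_stat = 2.16   # (77.8 - 70) / (SD+ / sqrt(5)) for the five readings
df = 4          # number of measurements minus one

p_student = t_upper_tail(t_stat, df)
p_normal = 0.5 * math.erfc(t_stat / math.sqrt(2))

print(f"p-value from Student's curve (4 df): {p_student:.3f}")
print(f"p-value from the normal curve:       {p_normal:.3f}")
```

Student's curve has heavier tails than the normal curve, so the same statistic gives a p-value near 5% rather than under 2%. The evidence against the null hypothesis is correspondingly weaker than the normal curve would suggest.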
Student’s curve should be used under the following circumstances:
- The data are like draws from a box.
- The SD of the box is unknown.
- The number of observations is small, so the SD of the box cannot be estimated very accurately.
- The histogram for the contents of the box does not look too different from the normal curve.
Further Reading
Endnote 2 contains the following:
Additional reading, in order of difficulty:
- J. L. Hodges, Jr. and E. Lehmann, Basic Concepts of Probability and Statistics, 2nd ed. (SIAM, 2004).
- L. Breiman, Statistics with a View towards Applications (Houghton Mifflin, 1973).
- J. Rice, Mathematical Statistics and Data Analysis, 3rd ed. (Duxbury Press, 2005).
- P. Bickel and K. Doksum, Mathematical Statistics, 2nd ed. (Prentice Hall, 2001).
- E. Lehmann, Theory of Point Estimation, 2nd ed. with G. Casella (Springer, 1998).
- E. Lehmann, Testing Statistical Hypotheses, 3rd ed. with J. Romano (Springer, 2005).