Chapter 5 The Normal Approximation for Data

5.1 Chapter Notes

The normal curve with mean 0, standard deviation 1:

\[ y = \frac{100\%}{\sqrt{2\pi}}e^{-x^2/2} \]

Roughly 68% of the area under the curve lies between -1 and 1, i.e. within 1 standard deviation of the mean. About 95% of the area is between -2 and 2, and about 99.7% is between -3 and 3. The chapter also covers finding the area under particular sections of the curve, which I’ve done quite a bit of in actuarial exams.
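As a quick check on those three figures, here is a minimal Python sketch. It uses the standard identity that the area between -z and z under the standard normal curve equals erf(z/√2). The function name `area_within` is my own, not something from the chapter.

```python
import math

def area_within(z):
    """Fraction of the area under the standard normal curve between -z and z."""
    return math.erf(z / math.sqrt(2))

for z in (1, 2, 3):
    print(f"within {z} SD: {area_within(z):.2%}")
# within 1 SD: 68.27%
# within 2 SD: 95.45%
# within 3 SD: 99.73%
```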

The chapter has a section on the normal approximation. You have some data with a mean and standard deviation, you assume the data follows a normal curve with that same mean and standard deviation, and then you answer questions like “what percentage of men have heights between 63 and 72 inches?” by converting the endpoints to standard units and finding the area under the curve between them.
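Here is a rough sketch of that calculation in Python. The mean of 69 inches and SD of 3 inches are assumed values I have picked for illustration, and `percent_between` is a hypothetical helper name. `NormalDist` comes from Python's standard `statistics` module.

```python
from statistics import NormalDist

# Assumed summary statistics for illustration only.
mean_height = 69.0  # inches
sd_height = 3.0     # inches

def percent_between(low, high, mean, sd):
    """Percentage of a normal curve with the given mean and SD lying between low and high."""
    dist = NormalDist(mu=mean, sigma=sd)
    return 100 * (dist.cdf(high) - dist.cdf(low))

share = percent_between(63, 72, mean_height, sd_height)
print(f"{share:.1f}%")  # 81.9%
```

With these assumed figures, 63 and 72 inches are -2 and 1 in standard units, so the answer is the area under the standard curve between -2 and 1.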

If the histogram of your data follows the normal curve, then the mean and standard deviation may well be good summary statistics. If it does not, they are poor summaries: there is more information in the distribution than those two figures capture.

In this latter case (e.g. income data) we can use percentiles to summarise the histogram instead.
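To illustrate, here is a small sketch with a made-up, long-tailed list of incomes (in thousands). The numbers are invented for the example. `statistics.quantiles` with `n=4` returns the 25th, 50th and 75th percentiles.

```python
import statistics

# Made-up incomes in thousands, with one very large value in the right tail.
incomes = [18, 22, 25, 27, 30, 33, 38, 45, 60, 250]

mean = statistics.mean(incomes)
sd = statistics.pstdev(incomes)  # SD computed by dividing by n
quartiles = statistics.quantiles(incomes, n=4)  # 25th, 50th, 75th percentiles

print(f"mean = {mean:.1f}, SD = {sd:.1f}")
print("25th, 50th, 75th percentiles:", quartiles)
```

In this made-up list the single large income drags the mean above the 75th percentile, so the mean and SD say little about a typical entry, while the percentiles still describe where most of the data sits.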

Here are some rules for changing scales:

  • Adding the same number to every entry on a list adds that constant to the average; the SD does not change.
  • Multiplying every entry on a list by the same positive number multiplies the average and the SD by that constant.
  • These changes of scale do not change the standard units, since converting to standard units involves subtracting the mean and dividing by the standard deviation, and both of those adjust in step with the data.
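The three rules above can be checked numerically. This is a minimal sketch on a small made-up list. `standard_units` is my own helper name, and I am assuming the SD that divides by n, which is what `statistics.pstdev` computes.

```python
import math
import statistics

def standard_units(values):
    """Convert each value to standard units: subtract the average, divide by the SD."""
    avg = statistics.mean(values)
    sd = statistics.pstdev(values)
    return [(v - avg) / sd for v in values]

data = [1.0, 3.0, 4.0, 5.0, 7.0]   # made-up list: average 4, SD 2
shifted = [v + 10 for v in data]   # add the same number to every entry
scaled = [v * 3 for v in data]     # multiply every entry by a positive number

print(statistics.mean(shifted), statistics.pstdev(shifted))  # average up by 10, SD unchanged
print(statistics.mean(scaled), statistics.pstdev(scaled))    # average and SD both tripled

# Standard units are the same for all three lists.
same_units = all(
    math.isclose(a, b, abs_tol=1e-12) and math.isclose(a, c, abs_tol=1e-12)
    for a, b, c in zip(standard_units(data), standard_units(shifted), standard_units(scaled))
)
print(same_units)
```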