Standard deviation is the most-quoted statistic in U.S. math classes outside of "the average." But while everyone uses the word, fewer people can explain what a "standard deviation of 5" actually means — or why we go through the trouble of squaring everything when computing it. Here's the intuitive explanation.

What it measures

Standard deviation measures spread. Specifically, it measures how far a typical data point is from the mean.

Two datasets can have the same mean but very different spreads:

  • Dataset A: 49, 50, 51 → mean 50, SD = 1
  • Dataset B: 0, 50, 100 → mean 50, SD = 50

Both have a mean of 50, but the second is enormously more spread out. The SD captures that: Dataset B's typical deviation from the mean is 50 times larger than Dataset A's. (Both SDs here use the sample formula, covered under "Sample vs population SD" below.)
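To check the comparison yourself, here is a minimal sketch using Python's built-in statistics module. The variable names are just illustrative, and stdev is the sample version (it divides by n − 1).

```python
import statistics

dataset_a = [49, 50, 51]
dataset_b = [0, 50, 100]

results = {}
for name, data in (("A", dataset_a), ("B", dataset_b)):
    mean = statistics.mean(data)
    sd = statistics.stdev(data)  # sample SD: divides by n - 1
    results[name] = (mean, sd)
    print(f"Dataset {name}: mean = {mean}, SD = {sd}")
```

Same mean, very different SDs: the mean alone tells you nothing about spread.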

The plain-English meaning

"This data has a mean of 70 and a standard deviation of 5" translates roughly to: most values are within 5 units of 70. Specifically, in a normal distribution:

  • About 68% of values are within 1 SD of the mean (65–75)
  • About 95% are within 2 SDs (60–80)
  • About 99.7% are within 3 SDs (55–85)

This is the "68–95–99.7 rule," your single most useful tool for interpreting any normally distributed dataset.
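The three percentages are not magic numbers; they fall out of the normal curve. A small sketch, assuming the mean-70, SD-5 example above and using statistics.NormalDist from the standard library (the helper name share_within is made up for illustration):

```python
from statistics import NormalDist

dist = NormalDist(mu=70, sigma=5)  # the mean-70, SD-5 example

def share_within(k: float) -> float:
    """Fraction of a normal distribution lying within k SDs of its mean."""
    low = dist.mean - k * dist.stdev
    high = dist.mean + k * dist.stdev
    return dist.cdf(high) - dist.cdf(low)

for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(k):.1%}")
```

The results round to 68%, 95%, and 99.7%, and they do not depend on the particular mean or SD you plug in.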

How it's computed (and why)

Steps:

  1. Find the mean.
  2. For each data point, find its deviation from the mean (data − mean). Some deviations are positive, some negative.
  3. Square each deviation. (Why? Because the negatives would cancel the positives if you just averaged the raw deviations.)
  4. Average the squared deviations. This is the variance.
  5. Take the square root. This is the standard deviation — back in the original units.

For data {2, 4, 4, 4, 5, 5, 7, 9}:

  • Mean = 5
  • Deviations: −3, −1, −1, −1, 0, 0, 2, 4
  • Squared: 9, 1, 1, 1, 0, 0, 4, 16
  • Mean of squares = 32/8 = 4 (this is the population variance)
  • SD = √4 = 2

So a typical value in this dataset is about 2 units away from the mean of 5.
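The five steps translate almost line for line into code. This is a from-scratch sketch of the population formula (divide by n), not a replacement for a library routine; the function name population_sd is an illustrative choice.

```python
import math

def population_sd(data):
    mean = sum(data) / len(data)              # step 1: find the mean
    deviations = [x - mean for x in data]     # step 2: deviation of each point
    squared = [d ** 2 for d in deviations]    # step 3: square each deviation
    variance = sum(squared) / len(data)       # step 4: average them (variance)
    return math.sqrt(variance)                # step 5: square root

values = [2, 4, 4, 4, 5, 5, 7, 9]
print(population_sd(values))  # 2.0
```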

Why square the deviations?

If you just averaged the raw deviations, the positives and negatives would cancel — every dataset would have an average deviation of zero, which is useless.

You could try absolute value instead: |x − mean|. This gives a stat called "mean absolute deviation" (MAD). It's a valid measure of spread, but it has weaker math properties. Specifically, the absolute value function has a sharp corner at zero, so it isn't differentiable there, which makes calculus on it awkward.

Squaring eliminates negatives, makes the math smooth (differentiable), and amplifies the influence of outliers. That is often what you want, since points far from the mean say the most about spread, but it also means a single extreme value can inflate the SD noticeably.
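Here is a rough sketch comparing the three options on the example data: averaging raw deviations, averaging absolute deviations, and the squared route. The function name and the second dataset (the 9 swapped for a made-up outlier of 29) are illustrative assumptions, not part of the original example.

```python
import math

def spread_measures(data):
    """Return (mean raw deviation, mean absolute deviation, population SD)."""
    mean = sum(data) / len(data)
    deviations = [x - mean for x in data]
    mean_raw = sum(deviations) / len(data)                 # always cancels to 0
    mad = sum(abs(d) for d in deviations) / len(data)      # absolute-value route
    sd = math.sqrt(sum(d ** 2 for d in deviations) / len(data))  # squared route
    return mean_raw, mad, sd

values = [2, 4, 4, 4, 5, 5, 7, 9]
with_outlier = [2, 4, 4, 4, 5, 5, 7, 29]  # hypothetical: 9 replaced by an extreme value

print(spread_measures(values))        # (0.0, 1.5, 2.0)
print(spread_measures(with_outlier))
```

Adding the outlier raises both MAD and SD, but SD grows by a larger factor. That is the "amplifies outliers" effect in action.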

Sample vs population SD

Two slightly different formulas exist:

  • Population SD: divide by n (the number of data points). Use when your data is the whole population.
  • Sample SD: divide by n − 1. Use when your data is a sample drawn from a larger population.

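The n − 1 version (called Bessel's correction) makes the variance estimate unbiased. Dividing by n systematically underestimates the population variance, because deviations are measured from the sample mean rather than the true mean. (The square root of that variance is still a very slightly biased estimate of the population SD, but it is the standard one to report.) Most real-world stats and AP Stats use the sample formula.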

The two differ by a factor of √(n / (n − 1)). For large samples (n > 30) that is under 2%, so they are nearly identical. For small samples, sample SD is meaningfully larger: about 12% larger at n = 5.
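In Python's statistics module the two formulas are separate functions: pstdev divides by n and stdev divides by n − 1. A quick sketch on the example data from earlier, plus the ratio between the two at a few sample sizes chosen for illustration:

```python
import math
import statistics

values = [2, 4, 4, 4, 5, 5, 7, 9]

pop_sd = statistics.pstdev(values)    # population SD: divide by n
sample_sd = statistics.stdev(values)  # sample SD: divide by n - 1
print(f"population SD = {pop_sd:.3f}, sample SD = {sample_sd:.3f}")

def sd_ratio(n: int) -> float:
    """How much larger the sample SD is than the population SD for n points."""
    return math.sqrt(n / (n - 1))

for n in (5, 30, 1000):
    print(f"n = {n}: sample SD is {sd_ratio(n) - 1:.1%} larger")
```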

What "high" and "low" SD mean

SD is in the same units as the data. An SD of 5 inches means something different from an SD of 5 dollars or 5 IQ points. Always specify units.

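To compare across datasets, use the coefficient of variation: SD / mean. This gives a unitless measure. A coefficient of 0.05 means SD is 5% of the mean (very tight). A coefficient of 0.5 means SD is half the mean (highly variable). It only makes sense for data with a true zero and a positive mean, such as heights or prices; it is not meaningful for something like temperatures in °F.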
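A short sketch of why "unitless" matters: the same heights measured in inches and in centimeters have different SDs but the same coefficient of variation. The height values are made up for illustration, and the function name is an assumption.

```python
import statistics

def coefficient_of_variation(data):
    """Sample SD divided by the mean; only sensible when the mean is positive."""
    mean = statistics.mean(data)
    if mean <= 0:
        raise ValueError("coefficient of variation needs a positive mean")
    return statistics.stdev(data) / mean

heights_in = [66, 68, 69, 70, 72]             # hypothetical heights in inches
heights_cm = [h * 2.54 for h in heights_in]   # the same heights in centimeters

cv_in = coefficient_of_variation(heights_in)
cv_cm = coefficient_of_variation(heights_cm)
print(f"SD in inches = {statistics.stdev(heights_in):.2f}, CV = {cv_in:.4f}")
print(f"SD in cm     = {statistics.stdev(heights_cm):.2f}, CV = {cv_cm:.4f}")
```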

SD in real datasets

  • Adult U.S. heights: mean ~5'9" for men, SD ~3 inches. So about 95% of men are between 5'3" and 6'3" (mean ± 2 SD).
  • SAT scores: mean ~1050, SD ~200. A 1450 is 2 SDs above the mean, roughly the top 2.5% nationally.
  • Daily stock returns: S&P 500 typical SD ~1% per day. A 3% drop is a 3-SD event: rare, but more common than the normal rule would predict, because stock returns are heavy-tailed.
  • IQ: defined to have mean 100, SD 15. So 130 is 2 SDs above (roughly the top 2.5%); 70 is 2 SDs below.
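To turn "how many SDs above the mean" into a percentile, you can lean on the normal curve. The sketch below assumes the rounded means and SDs quoted above and that the data really is normal; the helper names are illustrative.

```python
from statistics import NormalDist

def sds_from_mean(value, mean, sd):
    """How many standard deviations a value sits above (+) or below (-) the mean."""
    return (value - mean) / sd

def share_above(value, mean, sd):
    """Fraction of a normal distribution that lies above the given value."""
    return 1 - NormalDist(mean, sd).cdf(value)

print(sds_from_mean(1450, 1050, 200))          # 2.0
print(f"{share_above(1450, 1050, 200):.2%}")   # SAT example
print(f"{share_above(130, 100, 15):.2%}")      # IQ example
```

The exact normal tail beyond 2 SDs is a bit under 2.5%; the "top 2.5%" figure comes from splitting the leftover 5% of the 95% rule evenly between the two tails.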

Common confusions

  • SD vs variance: variance is SD squared. Variance has squared units (very weird to interpret); SD has natural units. SD is what you report; variance is what you sometimes calculate as an intermediate step.
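  • SD vs standard error: different concepts. SD measures spread of individual data; standard error measures how much the sample mean would vary across many hypothetical repeated samples. Standard error = SD / √n.
  • SD and the normal distribution: SD itself doesn't assume anything about the shape of the data, but the 68–95–99.7 rule only holds for (approximately) normal data. Skewed or heavy-tailed distributions need different rules. For any distribution, Chebyshev's inequality guarantees only that at least 75% of values fall within 2 SDs.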
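The SD vs standard error distinction is easier to see with numbers. A minimal sketch, reusing the example data from earlier and treating it as a sample (so the SD uses n − 1); the function name is an illustrative choice.

```python
import math
import statistics

def standard_error(data):
    """Standard error of the mean: sample SD divided by the square root of n."""
    return statistics.stdev(data) / math.sqrt(len(data))

values = [2, 4, 4, 4, 5, 5, 7, 9]

sd = statistics.stdev(values)
se = standard_error(values)
print(f"SD = {sd:.3f} (spread of individual values)")
print(f"SE = {se:.3f} (uncertainty in the mean)")
```

Collecting more data leaves the SD roughly where it is but shrinks the standard error, because of the √n in the denominator.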

Compute your own

Our standard deviation calculator computes mean, both SDs (sample and population), variance, and range from a list of numbers. Useful for AP Statistics homework, lab data, or just exploring what SD means on data you collect yourself.