
Standard Deviation

The standard deviation is a statistical measure that quantifies the amount of variation or dispersion in a data set. In other words, it tells us how spread out the numbers are around the mean (average) of the data set.

Let's assume we measure the height of a small group of people. The standard deviation tells us how much our data deviates from the mean.

Calculation of the standard deviation

To calculate the standard deviation, we first need to calculate the mean. We get the mean by simply adding up the heights of all the people and dividing by the number of people. Let's assume we get a mean of 155 cm. Now, we want to know how much each person deviates from the mean.

[Figure: Example Standard Deviation]

The first person deviates 18 cm from the mean, the second 8 cm, the third 15 cm, the fourth 8 cm, the fifth 9 cm, and the last person 6 cm. Simply put: people who are very tall or very short deviate more from the mean.
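The two steps so far, calculating the mean and then each person's deviation from it, can be sketched in Python. The text does not list the individual heights, so the values in `heights` are an assumption: they are hypothetical numbers chosen so that the mean is 155 cm and the deviations have the sizes given above.

```python
# Hypothetical heights in cm (not from the original text), chosen so that
# the mean is 155 cm and the deviations are 18, 8, 15, 8, 9 and 6 cm.
heights = [173, 163, 140, 147, 146, 161]

# Mean: add up all heights and divide by the number of people.
mean = sum(heights) / len(heights)

# Deviation of each person from the mean (negative = shorter than the mean).
deviations = [h - mean for h in heights]

print(mean)        # 155.0
print(deviations)  # [18.0, 8.0, -15.0, -8.0, -9.0, 6.0]
```

Note that the signed deviations add up to zero, which is why we cannot simply average them to measure the spread.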

Now, we are not interested in the deviation of each individual person from the mean, but we want to know how much the people, on average, deviate from the mean, and that is exactly what the standard deviation tells us.

In our example, the average deviation from the mean is 11.5 cm. To calculate the standard deviation, we can use this formula:

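$$\sigma = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n}}$$

Here, $x_i$ is the i-th value, $\bar{x}$ is the mean, and $n$ is the number of values.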

So, the standard deviation is the square root of the sum of the squared deviations divided by the number of values.

For our example, this means we calculate the height of the first person minus the mean and square it, the height of the second person minus the mean and square it, and so on until we reach the last person.

$$\sigma = \sqrt{\frac{18^2 + 8^2 + 15^2 + 8^2 + 9^2 + 6^2}{6}} = \sqrt{\frac{794}{6}} \approx 11.5 \text{ cm}$$

Then, we add up these squared deviations, divide the sum by the number of people, i.e., 6, and take the square root. The result is a standard deviation of 11.5 cm.
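Here is a minimal sketch of this calculation in Python, step by step. The individual heights are again an assumption (hypothetical values that match the mean of 155 cm and the deviations from the example); only the mean, the deviations, and the result come from the text.

```python
import math

# Hypothetical heights in cm, consistent with the example in the text.
heights = [173, 163, 140, 147, 146, 161]
n = len(heights)
mean = sum(heights) / n

# Step 1: height minus mean, squared, for every person.
squared_deviations = [(h - mean) ** 2 for h in heights]

# Step 2: add up the squared deviations.
sum_of_squares = sum(squared_deviations)

# Step 3: divide by the number of people and take the square root.
std_dev = math.sqrt(sum_of_squares / n)

print(sum_of_squares)     # 794.0
print(round(std_dev, 1))  # 11.5
```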

Each person, therefore, has a certain deviation from the mean, but on average, the people deviate 11.5 cm from the mean, which is the standard deviation.

Now, you might notice one thing: I keep talking about the “average deviation” from the mean. But for the average deviation, we would simply add all the deviations and divide by the number of participants, just like calculating a mean, right?

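That is absolutely correct, but there are different types of averages. In the case of the standard deviation, we use the quadratic mean (also called the root mean square) instead of the arithmetic mean. For our six deviations, the arithmetic mean would be about 10.7 cm, while the quadratic mean gives 11.5 cm.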
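To see the difference between the two types of average, we can compute both for the deviations from our example. This is only an illustrative sketch; the variable names are our own.

```python
import math

# Absolute deviations from the mean, in cm, as given in the example.
deviations = [18, 8, 15, 8, 9, 6]
n = len(deviations)

# Arithmetic mean: add the deviations and divide by n.
arithmetic_mean = sum(deviations) / n

# Quadratic mean: square, average, then take the square root.
quadratic_mean = math.sqrt(sum(d ** 2 for d in deviations) / n)

print(round(arithmetic_mean, 2))  # 10.67
print(round(quadratic_mean, 2))   # 11.5
```

Because the deviations are squared before averaging, large deviations weigh more heavily in the quadratic mean, so it is never smaller than the arithmetic mean of the absolute deviations.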

[Figure: Average Deviation from the Mean]

Different Formulas

So far, so good. Now there is one more thing to consider! There are two slightly different formulas for the standard deviation.

$$\sigma = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n}} \qquad s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}}$$

The difference is that in one case, we divide by n, and in the other, we divide by n-1. But why are there two different formulas?

Normally, we want to know the standard deviation of the entire population. For example, we want to know the standard deviation of the height of all German professional football players.

If we had the heights of all German professional football players, we would use the formula that divides by n.

Population standard deviation:

$$\sigma = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n}}$$

However, it is usually not possible to survey the entire population, which is why we take a sample. We then use this sample to estimate the standard deviation of the population. In this case, we use the formula with n-1.

Estimated standard deviation:

$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}}$$

Simply put: if our survey does not cover the entire population, we always use the formula with n-1! The same applies to a clinical study, where we use the sample to draw conclusions about the population.
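Python's standard `statistics` module offers both variants: `pstdev` divides by n and `stdev` divides by n-1. The sketch below applies both to hypothetical heights (an assumption, chosen to match the mean and deviations of our example) to show how much the two results differ for a small group.

```python
import statistics

# Hypothetical heights in cm, consistent with the example in the text.
heights = [173, 163, 140, 147, 146, 161]

# Population standard deviation: divides by n.
population_sd = statistics.pstdev(heights)

# Sample (estimated) standard deviation: divides by n - 1.
sample_sd = statistics.stdev(heights)

print(round(population_sd, 2))  # 11.5
print(round(sample_sd, 2))      # 12.6
```

With only six values, the difference is clearly visible. The larger the sample, the closer the two results get.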

Standard Deviation and Variance

But what is the difference between the standard deviation and variance? As we know, the standard deviation is the quadratic mean of the distance from the mean. The variance is the squared standard deviation.

$$\sigma = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n}}$$

$$\sigma^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n}$$

So, we almost have the same formula! The only difference is that for the standard deviation, we take the square root. For the variance, we do not.

Since the square root is taken, the standard deviation is always in the same unit as the original data. In our case, cm! Therefore, it is advisable to use the standard deviation for describing data, as it simplifies interpretation.

[Figure: Standard Deviation vs Variance]

The variance is more difficult to interpret because the unit is the square of the original unit. In our case, cm².
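The relationship between the two measures can be checked with a short Python sketch. The heights are again hypothetical values (an assumption) that match the mean and deviations of our example; `pvariance` and `pstdev` are the population versions from the standard `statistics` module.

```python
import statistics

# Hypothetical heights in cm, consistent with the example in the text.
heights = [173, 163, 140, 147, 146, 161]

variance = statistics.pvariance(heights)  # unit: cm squared
std_dev = statistics.pstdev(heights)      # unit: cm

print(round(variance, 2))      # 132.33
print(round(std_dev ** 2, 2))  # 132.33
```

Squaring the standard deviation gives back the variance, so both describe the same spread, just in different units.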

