Spearman's rank correlation coefficient
The Spearman rank correlation examines the relationship between two variables and is the non-parametric counterpart of Pearson's correlation. As a result, the data do not need to be normally distributed.
There is an important difference between the two correlation coefficients! Spearman correlation uses the ranks of the data rather than the original data, hence the name rank correlation.
Example of Spearman correlation
We measured the reaction time of 8 computer gamers and asked them about their age.
If we use a Pearson correlation, we simply take the two variables reaction time and age and calculate the Pearson correlation coefficient. However, we now want to calculate the Spearman rank correlation, so we first assign a rank to each person for reaction time and age.
The reaction time is already sorted by size. 12 is the smallest value, so it gets rank 1, 15 is the second smallest, so it gets rank 2 and so on. Now we do the same with the age.
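The ranking step can be sketched in Python. This is a minimal illustration, not DATAtab's implementation. The function name `rank_values` is our own, and the input values are hypothetical rather than the example's data table. Tied values receive the average of the ranks they occupy, which is the usual convention for Spearman's correlation (the example in the text has no ties).

```python
def rank_values(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    # Indices of the values, ordered from smallest to largest value
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j over the run of positions that hold the same value
        j = i
        while j < len(order) and values[order[j]] == values[order[i]]:
            j += 1
        # Positions i..j-1 stand for ranks i+1..j; their mean is (i + 1 + j) / 2
        avg_rank = (i + 1 + j) / 2
        for k in range(i, j):
            ranks[order[k]] = avg_rank
        i = j
    return ranks


# Hypothetical values, not the article's data
print(rank_values([12, 15, 17, 20]))  # [1.0, 2.0, 3.0, 4.0]
print(rank_values([30, 20, 20, 10]))  # [4.0, 2.5, 2.5, 1.0] (tie shares ranks 2 and 3)
```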
Let's take a look at this in a scatter plot. On the left we see the original age and reaction time data, and on the right the ranks.
We studied 8 people, and since there are no tied values, we have 8 distinct ranks (1 to 8) to assign. With this transformation, the data are spread more evenly, because neighboring ranks are always exactly one unit apart.
To calculate the Spearman correlation, we simply calculate the Pearson correlation of the ranks. So the Spearman correlation is the same as the Pearson correlation, except that the ranks are used instead of the original values.
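The idea that Spearman's correlation is simply Pearson's correlation applied to the ranks translates directly into code. In this sketch the helper names `rank_values`, `pearson` and `spearman` are hypothetical, and the data are made up to show a monotonic but non-linear relationship: y always grows with x, but not along a straight line.

```python
import math


def rank_values(values):
    """1-based ranks; tied values get the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j < len(order) and values[order[j]] == values[order[i]]:
            j += 1
        for k in range(i, j):
            ranks[order[k]] = (i + 1 + j) / 2
        i = j
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's r_s: the Pearson correlation computed on the ranks."""
    return pearson(rank_values(x), rank_values(y))


# Hypothetical data: monotonic, but not linear
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
print(spearman(x, y))  # 1.0
print(pearson(x, y))   # slightly below 1
```

Because the ranks of x and y are identical, `spearman(x, y)` returns 1.0, while `pearson(x, y)` on the raw values is about 0.98.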
Let's have a quick look at this in DATAtab. You can load the data we used here.
The data set contains the reaction time and age as well as the ranks we just created from them.
Now we can either calculate the Spearman rank correlation from the reaction time and age, or we can calculate the Pearson correlation from the ranks. In both cases we get a correlation of 0.9.
Spearman rank correlation and Kendall's tau
Kendall's tau is very similar to the Spearman correlation. However, Kendall's tau should be preferred to Spearman's correlation when the sample is small and contains many tied ranks.
Spearman Correlation Equation
If there are no tied ranks, the Spearman correlation can also be calculated with this equation:
$$ r_{s} = 1 - \frac{6 \sum_{i=1}^{n} d_{i}^{2}}{n(n^{2} - 1)} $$
Here, n is the number of cases and d_{i} is the difference between the ranks of case i on the two variables. In our example, the sum of the d_{i}^{2} is 8, and n, the number of people, is also 8. Substituting these values gives:
$$ r_{s} = 1 - \frac{6 \cdot 8}{8 \cdot (8^{2} - 1)} = 1 - \frac{48}{504} \approx 0.90 $$
So we again get a correlation coefficient of 0.9.
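The shortcut formula r_s = 1 - 6Σd_i²/(n(n² - 1)) can be checked with a few lines of Python. This is a minimal sketch with hypothetical function names. It assumes there are no tied ranks, because only then does the shortcut agree with the Pearson correlation of the ranks.

```python
def spearman_from_sum_d2(sum_d2, n):
    """Shortcut formula r_s = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1)); no ties assumed."""
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


def spearman_from_rank_differences(rank_x, rank_y):
    """Compute sum(d_i^2) from two rank lists, then apply the shortcut formula."""
    n = len(rank_x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return spearman_from_sum_d2(sum_d2, n)


# Values from the example: sum of d_i^2 = 8, n = 8
print(round(spearman_from_sum_d2(8, 8), 3))  # 0.905, i.e. about 0.9

# Hypothetical ranks: perfectly reversed order gives -1
print(spearman_from_rank_differences([1, 2, 3], [3, 2, 1]))  # -1.0
```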
Spearman correlation coefficient
Like Pearson's correlation coefficient r, Spearman's correlation coefficient r_{s} ranges from -1 to 1.
Using the coefficient, we can now determine two things:
- The strength of the correlation and
- the direction of the correlation.
The strength of the correlation can be read from the following table, which gives common rule-of-thumb thresholds for the absolute value of r_{s}.
| Absolute value of r_{s} | Strength of correlation |
|---|---|
| 0.0 to < 0.1 | no correlation |
| 0.1 to < 0.3 | low correlation |
| 0.3 to < 0.5 | medium correlation |
| 0.5 to < 0.7 | high correlation |
| 0.7 to 1 | very high correlation |
If the coefficient is at least -1 and less than 0, there is a negative correlation, that is, a negative relationship between the variables. If the coefficient is greater than 0 and at most 1, there is a positive correlation, that is, a positive relationship between the two variables. If the result is 0, there is no correlation.
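The two readings of r_s, direction from the sign and strength from the magnitude, can be combined in a small helper. This sketch uses the thresholds from the table above. The function name is hypothetical, and treating each lower bound as inclusive and each upper bound as exclusive is our assumption. Such cut-offs are rules of thumb, and other authors use different ones.

```python
def describe_correlation(r_s):
    """Return (direction, strength) for r_s using the table's thresholds."""
    if not -1 <= r_s <= 1:
        raise ValueError("r_s must lie between -1 and 1")
    size = abs(r_s)
    # Assumption: lower bounds inclusive, upper bounds exclusive
    if size < 0.1:
        strength = "no correlation"
    elif size < 0.3:
        strength = "low correlation"
    elif size < 0.5:
        strength = "medium correlation"
    elif size < 0.7:
        strength = "high correlation"
    else:
        strength = "very high correlation"
    # The sign of r_s gives the direction of the relationship
    if r_s > 0:
        direction = "positive"
    elif r_s < 0:
        direction = "negative"
    else:
        direction = "none"
    return direction, strength


print(describe_correlation(0.9))    # ('positive', 'very high correlation')
print(describe_correlation(-0.25))  # ('negative', 'low correlation')
```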
Testing the significance of correlation coefficients
Often our aim is to test a hypothesis about the population from a sample.
We have calculated the correlation coefficient for the sample data. We can now test whether the correlation coefficient is significantly different from 0.
The null hypothesis and the alternative hypothesis are as follows:
- Null hypothesis: the correlation coefficient in the population is r_{s} = 0 (there is no correlation).
- Alternative hypothesis: the correlation coefficient in the population is r_{s} ≠ 0 (there is a correlation).
Whether the correlation coefficient differs significantly from zero, based on the sample collected, can be tested using a t-test with the test statistic
$$ t = \frac{r\sqrt{n-2}}{\sqrt{1-r^{2}}} $$
Here, r is the correlation coefficient and n is the sample size. A p-value can then be calculated from the test statistic t using the t-distribution with n - 2 degrees of freedom. If the p-value is less than the chosen significance level (usually 5%), the null hypothesis is rejected; otherwise it is not rejected.
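The t-test can be sketched without any statistics library. The function names are hypothetical. The statistic is t = r√(n - 2)/√(1 - r²) with n - 2 degrees of freedom, and the two-sided p-value uses the closed-form series for the t-distribution with an integer number of degrees of freedom. We cannot confirm that DATAtab computes its p-value in exactly this way.

```python
import math


def spearman_t_statistic(r, n):
    """t = r * sqrt(n - 2) / sqrt(1 - r^2); requires n > 2."""
    if abs(r) >= 1:
        return math.copysign(math.inf, r)  # perfect correlation
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


def t_two_sided_p(t, df):
    """Two-sided p-value P(|T| >= |t|) for a t-distribution with integer df."""
    if math.isinf(t):
        return 0.0
    theta = math.atan(abs(t) / math.sqrt(df))
    c2 = math.cos(theta) ** 2
    series, term = 1.0, 1.0
    if df % 2 == 1:
        # Odd df: (2/pi) * (theta + sin*cos * (1 + 2/3 c^2 + ...))
        for k in range(1, (df - 3) // 2 + 1):
            term *= c2 * (2 * k) / (2 * k + 1)
            series += term
        inner = math.sin(theta) * math.cos(theta) * series if df > 1 else 0.0
        prob_inside = 2 / math.pi * (theta + inner)
    else:
        # Even df: sin * (1 + 1/2 c^2 + 1*3/(2*4) c^4 + ...)
        for k in range(1, (df - 2) // 2 + 1):
            term *= c2 * (2 * k - 1) / (2 * k)
            series += term
        prob_inside = math.sin(theta) * series
    return max(0.0, 1.0 - prob_inside)


r_s, n = 1 - 48 / 504, 8  # values from the example
t = spearman_t_statistic(r_s, n)
p = t_two_sided_p(t, n - 2)
print(round(t, 2), round(p, 3))
```

With r_s ≈ 0.905 and n = 8 this gives t ≈ 5.2 and a p-value of roughly 0.002, in line with the DATAtab result.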
If we use DATAtab to calculate the example, we get a p-value of 0.002.
Therefore, the p-value is less than 0.05 and we can reject the null hypothesis that the correlation coefficient is zero in the population.