Wilcoxon signed-rank test

The Wilcoxon test (Wilcoxon signed-rank test) tests whether the central tendencies of two dependent groups differ significantly from each other.

The Wilcoxon test is a non-parametric test and is therefore subject to considerably fewer assumptions than its parametric counterpart, the t-test for dependent samples. It is therefore used when the assumptions of the t-test for dependent samples, in particular normally distributed differences, are not met.

Medical example:

You should check whether your memory performance is better in the morning or in the evening.

Technical example:

A V-belt producer has very high downtimes on its 5 production lines. You should now find out whether a system setting has an influence on the downtimes.

Assumptions of the Wilcoxon test

Since the Wilcoxon test is a nonparametric test, the data need not be normally distributed. However, to calculate a Wilcoxon test, the samples must be dependent. Dependent samples are present, for example, when data is obtained from repeated measurements or when so-called natural pairs are involved.

Repeated measurements: A characteristic of a person, e.g. weight, was measured at two points in time.
Natural pairs: The values do not necessarily have to come from the same person but from people who belong together, for example lawyer/client, wife/husband and psychologist/patient. Of course, they do not have to be people either.
Independence: The Wilcoxon signed-rank test assumes that the pairs are independent of each other, i.e., each pair of observations is drawn randomly and independently of the other pairs.

Furthermore, the distribution of the differences between the two dependent samples should be approximately symmetrical. If the two samples are independent rather than paired, the Mann-Whitney U test is used instead of the Wilcoxon signed-rank test.

Hypotheses in the Wilcoxon test

The hypotheses of the Wilcoxon test are very similar to those of the dependent t-test. However, the Wilcoxon test checks whether there is a difference in central tendency, whereas the t-test checks whether there is a difference in means. This gives the following hypotheses for the Wilcoxon test:

Null hypothesis: There is no difference (in terms of central tendency) between the two groups in the population.
Alternative hypothesis: There is a difference (with respect to the central tendency) between the two groups in the population.

Wilcoxon test and test power

Of course, the question may now arise: why not always use the Wilcoxon test instead of the t-test for dependent samples? Then there would be no need to test for normal distribution! The answer is that parametric tests such as the t-test usually have greater statistical power.

With a parametric test, a smaller difference or a smaller sample is usually enough to reject the null hypothesis. Both are, of course, very convenient. Therefore, use parametric tests whenever their assumptions are met!

Calculate Wilcoxon test

To calculate the Wilcoxon test for two dependent samples, first compute the difference between each pair of dependent values. Then rank the absolute values of these differences, from the smallest (rank 1) to the largest. It is important to keep track of the original sign of each difference. (An example with tied ranks follows below.)

In the last step, the ranks are summed separately: T⁺ is the sum of the ranks of the positive differences and T⁻ is the sum of the ranks of the negative differences. The test statistic W is the smaller of these two rank sums:
$$W = \min(T^+, T^-)$$

In this example, the test statistic is W = 8.
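The steps above can be sketched in Python as follows. This is a minimal illustration, not DATAtab's implementation: the function names `average_ranks` and `signed_rank_sums` are hypothetical, and pairs whose difference is zero are dropped before ranking, a common convention that the text does not cover. Since the data of the small example above are not reproduced here, the sketch is applied to the reaction-time data from the example at the end of this article.

```python
def average_ranks(values):
    """Rank values from smallest (rank 1) to largest; ties get the mean of their ranks."""
    sorted_vals = sorted(values)
    ranks = []
    for v in values:
        first = sorted_vals.index(v) + 1          # first 1-based position of v
        last = first + sorted_vals.count(v) - 1   # last 1-based position of v
        ranks.append((first + last) / 2)
    return ranks


def signed_rank_sums(x, y):
    """Return (T_plus, T_minus, W) for paired samples x and y (hypothetical helper)."""
    # Step 1: differences of the paired values (zero differences dropped, an assumption)
    diffs = [a - b for a, b in zip(x, y) if a != b]
    # Step 2: rank the absolute differences
    ranks = average_ranks([abs(d) for d in diffs])
    # Step 3: sum the ranks separately according to the sign of the difference
    t_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    t_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    # Step 4: the test statistic W is the smaller rank sum
    return t_plus, t_minus, min(t_plus, t_minus)


# Reaction times from the DATAtab example at the end of the article (morning vs. evening)
morning = [34, 36, 41, 39, 44, 37, 39, 39, 45]
evening = [45, 33, 35, 43, 42, 42, 43, 43, 42]
print(signed_rank_sums(morning, evening))  # (14.0, 31.0, 14.0)
```

As a quick check, T⁺ + T⁻ must always equal n(n + 1)/2; here 14 + 31 = 45 = 9 · 10 / 2.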

If there is no difference between the two groups, the expected value of W is
$$\mu_W = \frac{n(n+1)}{4}$$
where n is the number of pairs.

In this example, the expected value is 10.5. The calculated test statistic must now be tested for significance.

If the sample is sufficiently large, i.e. there are more than 25 pairs, the test statistic W is approximately normally distributed. In that case, W can be converted into a z-value using its expected value μ_W and its standard deviation σ_W:
$$z = \frac{W - \mu_W}{\sigma_W}, \qquad \sigma_W = \sqrt{\frac{n(n+1)(2n+1)}{24}}$$
If there are fewer than 25 pairs, the critical value of W is read from a table of critical values instead. Since the sample in this example is smaller than 25, the table would actually be used here.

The z-value calculated in this way can then be checked for significance by comparing it with the critical value of the standard normal distribution (for example ±1.96 for a two-sided test at a significance level of 5%).
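The normal approximation can be sketched in Python as below, assuming no tied ranks. The function name is hypothetical; the two-sided p-value is computed from the standard normal distribution with `math.erfc`. For illustration it is applied to the small example above: an expected value of 10.5 corresponds to n = 6 pairs, because 6 · 7 / 4 = 10.5. With so few pairs, the table of critical values should be used in practice, so the numbers only demonstrate the arithmetic.

```python
import math


def wilcoxon_normal_approx(w, n):
    """Standardize W for n pairs (no tie correction); return (mu_W, sigma_W, z, p)."""
    mu_w = n * (n + 1) / 4                                # expected value of W under H0
    sigma_w = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)   # standard deviation of W under H0
    z = (w - mu_w) / sigma_w
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))        # equals 2 * (1 - Phi(|z|))
    return mu_w, sigma_w, z, p_two_sided


mu_w, sigma_w, z, p = wilcoxon_normal_approx(w=8, n=6)
print(f"mu_W = {mu_w}, sigma_W = {sigma_w:.3f}, z = {z:.3f}, p = {p:.3f}")
# mu_W = 10.5, sigma_W = 4.770, z = -0.524, p = 0.600

# Two-sided test at alpha = 0.05: reject H0 only if |z| exceeds 1.96
print("significant" if abs(z) > 1.96 else "not significant")  # not significant
```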

Calculate Wilcoxon signed-rank test with tied ranks

If several people have the same absolute difference, they share a rank; these are called tied ranks. Ties change the calculation of the rank sums and of the standard deviation of W. We will now go through both using an example.

In the example, we can see that there are...

...three people with an absolute difference of 2, who share ranks 2, 3 and 4.
...two people with an absolute difference of 4, who share ranks 6 and 7.


To account for these tied ranks, each group of tied values receives the mean of the ranks it occupies. In the first case this gives a "new" rank of (2 + 3 + 4) / 3 = 3, and in the second case a "new" rank of (6 + 7) / 2 = 6.5. Now the rank sums of the positive and negative ranks can be calculated.
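Averaging tied ranks can be sketched in Python as follows (a minimal illustration; the function name is hypothetical). It sorts the values once and gives every run of equal values the mean of the ranks that run occupies. The input is the absolute differences of the reaction-time data from the example at the end of this article, where the value 3 occurs twice and the value 4 three times.

```python
def average_ranks(values):
    """Rank values from smallest (rank 1) upward; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])  # indices in ascending value order
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        # Find the end of the run of equal values beginning at position `start`
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        # Positions start..end correspond to ranks start+1..end+1; use their mean
        mean_rank = (start + 1 + end + 1) / 2
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


abs_diffs = [11, 3, 6, 4, 2, 5, 4, 4, 3]
print(average_ranks(abs_diffs))
# [9.0, 2.5, 8.0, 5.0, 1.0, 7.0, 5.0, 5.0, 2.5]
```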

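Since the ties are clearly visible in the table above, we can now calculate the correction term that is needed later for the standard deviation of W when ties are present:
$$\sum_{i} \frac{t_i^3 - t_i}{48}$$
where t_i is the number of values that share the i-th tied rank. In this example, there is one group of three and one group of two tied values, so the term is ((3³ − 3) + (2³ − 2)) / 48 = 30 / 48 = 0.625.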

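Now all values are available to calculate the z-value with tied ranks taken into account:
$$z = \frac{W - \mu_W}{\sigma_W}, \qquad \sigma_W = \sqrt{\frac{n(n+1)(2n+1)}{24} - \sum_{i} \frac{t_i^3 - t_i}{48}}$$
where t_i is the number of values sharing the i-th tied rank.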
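The tie correction and the resulting z-value can be sketched in Python as below. This is a minimal illustration under stated assumptions: the function names are hypothetical, and the list passed in is assumed to contain only non-zero absolute differences. It is applied to the reaction-time data from the example at the end of this article, which has one group of two and one group of three tied absolute differences (the same pattern as in the text) and a smaller rank sum of W = 14.

```python
import math
from collections import Counter


def tie_correction_term(abs_diffs):
    """Sum of (t^3 - t) / 48 over all groups of t equal absolute differences."""
    group_sizes = Counter(abs_diffs).values()  # untied values give t = 1 and contribute 0
    return sum(t ** 3 - t for t in group_sizes) / 48


def wilcoxon_z_with_ties(w, abs_diffs):
    """z-value for the statistic W using the tie-corrected standard deviation."""
    n = len(abs_diffs)  # assumed: number of pairs with a non-zero difference
    mu_w = n * (n + 1) / 4
    var_w = n * (n + 1) * (2 * n + 1) / 24 - tie_correction_term(abs_diffs)
    return (w - mu_w) / math.sqrt(var_w)


# Reaction-time example: absolute differences (morning minus evening) and W = 14
abs_diffs = [11, 3, 6, 4, 2, 5, 4, 4, 3]
print(tie_correction_term(abs_diffs))                 # 0.625
print(round(wilcoxon_z_with_ties(14, abs_diffs), 3))  # -1.011
```

Values that occur only once form groups with t = 1, and 1³ − 1 = 0, so they do not change the correction.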

Again, note that W can only be assumed to be approximately normally distributed when there are enough pairs (roughly 20 to 25 or more).

Effect size in the Wilcoxon signed-rank test

The effect size indicates how large the observed effect is relative to random variation. There are several measures of effect size for the Wilcoxon test. A common one is r, defined as:
$$r = \frac{z}{\sqrt{n}}$$
where z is the standardized test statistic from the Wilcoxon test and n is the total number of observations (i.e., the sum of the sizes of both groups).

The value of r can range from -1 to 1, with values near 0 indicating that there is no effect and values near -1 or 1 indicating a strong effect. The sign of r indicates the direction of the effect.

The following table can be used to interpret the effect size (effect size r according to Cohen (1988)).

|r| < 0.1	no effect / very small effect
|r| = 0.1	small effect
|r| = 0.3	medium effect
|r| = 0.5	large effect
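Computing r and attaching a verbal label can be sketched in Python as follows. The function names are hypothetical, and treating the benchmarks 0.1, 0.3 and 0.5 as lower bounds of each category is an assumption, since the table only lists the benchmark values. n is the total number of observations, as defined above; textbooks differ on this convention, so check which one your source uses. The example input z ≈ −1.011 is the tie-corrected z-value for the reaction-time data below (9 pairs, so n = 18).

```python
import math


def effect_size_r(z, n):
    """Effect size r = z / sqrt(n); n is the total number of observations."""
    return z / math.sqrt(n)


def interpret_r(r):
    """Verbal label for |r|, treating Cohen's 0.1 / 0.3 / 0.5 as lower bounds (assumption)."""
    size = abs(r)
    if size < 0.1:
        return "no effect / very small effect"
    if size < 0.3:
        return "small effect"
    if size < 0.5:
        return "medium effect"
    return "large effect"


# Reaction-time example: z of about -1.011 (tie-corrected), 9 pairs -> n = 18 observations
r = effect_size_r(-1.011, 18)
print(round(r, 3), interpret_r(r))  # -0.238 small effect
```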

Example: Wilcoxon test

A Wilcoxon test can easily be calculated with DATAtab. Simply copy the table below or your own data into the Statistical Calculator and click on Hypothesis tests. Then click on the two variables and select Non-Parametric Test.

Reaction time morning	Reaction time evening
34	45
36	33
41	35
39	43
44	42
37	42
39	43
39	43
45	42
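The whole procedure can be sketched in Python for this data set, using only the standard library. This is not DATAtab's implementation, and its output is not a reproduction of DATAtab's result; the helper name `wilcoxon_signed_rank` is hypothetical, zero differences are dropped (an assumption), and the p-value uses the normal approximation with tie correction.

```python
import math
from collections import Counter


def wilcoxon_signed_rank(x, y):
    """Signed-rank test via the tie-corrected normal approximation (hypothetical helper).

    Returns T_plus, T_minus, W, z and the two-sided p-value.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]  # zero differences dropped (assumption)
    abs_diffs = [abs(d) for d in diffs]
    n = len(diffs)

    # Mean rank of each absolute difference: positions index+1 .. index+count, averaged
    sorted_abs = sorted(abs_diffs)
    ranks = [(2 * sorted_abs.index(v) + sorted_abs.count(v) + 1) / 2 for v in abs_diffs]

    t_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    t_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    w = min(t_plus, t_minus)

    # Expected value and tie-corrected standard deviation of W under H0
    mu_w = n * (n + 1) / 4
    ties = sum(t ** 3 - t for t in Counter(abs_diffs).values()) / 48
    sigma_w = math.sqrt(n * (n + 1) * (2 * n + 1) / 24 - ties)

    z = (w - mu_w) / sigma_w
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value from the standard normal
    return t_plus, t_minus, w, z, p


morning = [34, 36, 41, 39, 44, 37, 39, 39, 45]
evening = [45, 33, 35, 43, 42, 42, 43, 43, 42]
t_plus, t_minus, w, z, p = wilcoxon_signed_rank(morning, evening)
print(t_plus, t_minus, w)           # 14.0 31.0 14.0
print(f"z = {z:.3f}, p = {p:.3f}")  # z = -1.011, p = 0.312
```

With only 9 pairs, the sample is below the rule of thumb for the normal approximation, so the table of critical values would be preferable here; the z-value is shown for illustration. Under the approximation, |z| ≈ 1.01 < 1.96, so the difference between morning and evening reaction times would not be significant at the 5% level. SciPy's `scipy.stats.wilcoxon(x, y)` can serve as a cross-check, although its p-value may differ because, depending on its options and version, it can use an exact distribution or different corrections.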

DATAtab then gives you the following result.

If you have more than two dependent variables, you can also easily calculate a Friedman test online. To do this, simply select more than two metric variables.
