Wilcoxon signed-rank test
Author: Dr. Mathias Jesussek
The Wilcoxon test (Wilcoxon signed-rank test) determines whether two dependent groups differ significantly from each other. To do this, the Wilcoxon test uses the ranks of the paired differences instead of the mean values.

The Wilcoxon test is a non-parametric test and therefore has fewer assumptions than its parametric counterpart, the paired samples t-test. Thus, when the assumptions for the dependent samples t-test are not met, the Wilcoxon test is used instead.
Medical example
Comparing Pain Levels Before and After Treatment
A study measures patients' pain (1–10 scale) before and after medication. Since pain scores may not be normally distributed, the Wilcoxon Signed-Rank Test compares pre- and post-treatment levels.
Technical example
Battery Life Before and After a Software Update
An engineer tests battery life on the same set of devices before and after a software update. Since battery performance may not be normally distributed, the Wilcoxon Signed-Rank Test determines if the update significantly affects battery life.
Assumptions of the Wilcoxon test
The Wilcoxon Signed-Rank Test is a great non-parametric alternative to the paired t-test, especially when the normality assumption is violated. However, the following assumptions must still be met to perform a Wilcoxon test:
Repeat measurement
The test is used for paired or dependent samples, meaning the same subjects or units are measured before and after an intervention or under two different conditions. For example, a characteristic of a person, such as weight, is measured at two points in time.
Metric or Ordinal Data
The data should be at least ordinal (e.g., pain levels) or metric (e.g., reaction times or weights). The test cannot be used for nominal (categorical) data.
Symmetric Distribution of Differences
The differences between paired observations should be symmetrically distributed around the median. Unlike the paired t-test, the Wilcoxon test does not require normality but works best when the distribution is roughly symmetric.
No Significant Outliers in the Differences
Extreme outliers can affect the ranking process, reducing the reliability of the test.
Random Sampling
The sample should be randomly selected from the population to ensure unbiased results.
If the data are not available in pairs, the Mann-Whitney U test is used instead of the Wilcoxon test.
Hypotheses in the Wilcoxon test
The hypotheses of the Wilcoxon test are very similar to the hypotheses of the dependent t-test. However, the Wilcoxon test checks whether there is a difference in the central tendency, whereas the t-test checks whether there is a difference in the mean. Thus, the hypotheses of the Wilcoxon test are:
Null hypothesis
There is no difference (in terms of central tendency) between the two groups in the population.
Alternative hypothesis
There is a difference (with respect to the central tendency) between the two groups in the population.
Wilcoxon test and test power
One might wonder: why not always use the Wilcoxon test instead of the t-test for dependent samples? That way, there’s no need to check for normality!
The reason is that parametric tests, like the t-test, are generally more powerful when their assumptions are met. They typically require a smaller difference or a smaller sample size to detect a significant effect, making them more efficient. Therefore, whenever the assumptions hold, opt for parametric tests!
Calculate Wilcoxon test
To perform the Wilcoxon test for two dependent samples, first, calculate the differences between the paired values. Then, take the absolute values of these differences and rank them accordingly. It is crucial to retain the original signs of the differences throughout the process. (An example with tied ranks follows.)

In the final step, the rank sums are computed separately for the positive and negative differences.

The test statistic W is then calculated using the sum of positive ranks. Note that there are different ways to calculate W; sometimes, the maximum or minimum value of T+ and T− is used for W.
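The ranking steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `signed_rank_sums` and the `before`/`after` values are made up (they are not the article's data), and the sketch assumes there are no tied and no zero differences. The values were chosen so that W, taken as the sum of positive ranks, comes out as 13.

```python
def signed_rank_sums(before, after):
    """Return (T_plus, T_minus) for paired samples.

    Assumes no ties among the absolute differences and no zero differences.
    """
    diffs = [a - b for b, a in zip(before, after)]
    # Rank the absolute differences: the smallest gets rank 1.
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0] * len(diffs)
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    # Sum the ranks separately by the sign of the original difference.
    t_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    t_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return t_plus, t_minus


# Hypothetical measurements for six subjects.
before = [10, 12, 9, 14, 8, 15]
after = [11, 14, 6, 18, 3, 21]

t_plus, t_minus = signed_rank_sums(before, after)
print(t_plus, t_minus)  # 13 8
```

As a quick check, the two rank sums always add up to n(n + 1)/2, here 6 · 7 / 2 = 21.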

In this example, the test statistic W is 13. If there is no difference between the two dependent samples, the expected value of W can be calculated using the following formula, where n is the number of pairs:

$$\mu_W = \frac{n(n+1)}{4}$$

Next, the test statistic W is compared to the expected value by calculating the standardized test statistic z. For this we need the standard deviation of W:

$$\sigma_W = \sqrt{\frac{n(n+1)(2n+1)}{24}}$$

Now we have everything needed to calculate the z value:

$$z = \frac{W - \mu_W}{\sigma_W}$$

We can now determine whether there is a significant difference between the two groups by calculating the corresponding p-value for the z statistic.
Note: This normal approximation is typically considered valid only when the sample size is greater than about 25, so that the distribution of W is approximately normal.
Since we are testing a two-sided hypothesis, we multiply the one-sided p-value of 0.3 by 2, resulting in a final p-value of 0.6.
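The normal approximation can be reproduced with the Python standard library. The sketch below is not the article's own calculation: the function name `wilcoxon_z` is hypothetical, and n = 6 is an assumption, since the text does not state the sample size. With W = 13, that assumed n happens to give p-values of about 0.3 and 0.6.

```python
from math import sqrt
from statistics import NormalDist


def wilcoxon_z(w, n):
    """Standardize W using its expected value and standard deviation under H0."""
    mu = n * (n + 1) / 4
    sigma = sqrt(n * (n + 1) * (2 * n + 1) / 24)
    return (w - mu) / sigma


n = 6  # ASSUMPTION: the sample size is not given in the text
z = wilcoxon_z(13, n)

# One-sided tail probability of the standard normal, then doubled.
p_one_sided = 1 - NormalDist().cdf(abs(z))
p_two_sided = 2 * p_one_sided

print(round(z, 2), round(p_one_sided, 2), round(p_two_sided, 2))  # 0.52 0.3 0.6
```

With only six pairs the approximation is well below the sample-size rule of thumb, so the numbers serve only to illustrate the arithmetic.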
Continuity Correction
Many statistical software programs, such as DATAtab, apply a so-called continuity correction in the normal approximation for the p-value. As a result, the p-value may vary slightly.
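One common form of the continuity correction moves W half a unit toward its expected value before standardizing. The sketch below shows that form only; whether DATAtab uses exactly this variant is an assumption. The function name `two_sided_p` and the inputs W = 13, n = 6 are illustrative.

```python
from math import sqrt
from statistics import NormalDist


def two_sided_p(w, n, continuity=True):
    """Two-sided p-value from the normal approximation of W."""
    mu = n * (n + 1) / 4
    sigma = sqrt(n * (n + 1) * (2 * n + 1) / 24)
    shift = 0.5 if continuity else 0.0
    # Shrink |W - mu| by 0.5, but never below zero.
    z = max(abs(w - mu) - shift, 0.0) / sigma
    return 2 * (1 - NormalDist().cdf(z))


# Illustrative inputs (n = 6 is an assumption, not taken from the text).
p_plain = two_sided_p(13, 6, continuity=False)
p_corrected = two_sided_p(13, 6, continuity=True)
print(round(p_plain, 3), round(p_corrected, 3))
```

The corrected p-value is slightly larger, so the correction makes the test a little more conservative.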

And here is the entire calculation workflow presented in a single figure:

Calculate Wilcoxon signed-rank test with tied ranks
If several people have the same absolute difference, tied ranks are present. In this case, the calculation of the rank sums and of the standard deviation of W changes. We will now go through both using an example.
In the example it can be seen that there are...
- ...three people with an absolute difference of 2; these people share the ranks 2, 3 and 4.
- ...two people with an absolute difference of 4; these people share the ranks 6 and 7.

To account for these tied ranks, the mean of the shared ranks is calculated in each case. In the first case, this results in a "new" rank of 3 and in the second case in a "new" rank of 6.5. Now we can calculate the rank sums of the positive and negative ranks.

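Because tied ranks are present, a correction term is needed for the standard deviation of W. For each group of tied values, let $t_i$ be the number of observations that share the rank:

$$\sum_i \frac{t_i^3 - t_i}{48}$$

With the two tie groups listed above ($t = 3$ and $t = 2$), this term is $(3^3 - 3 + 2^3 - 2)/48 = 30/48 = 0.625$. It is subtracted under the square root:

$$\sigma_W = \sqrt{\frac{n(n+1)(2n+1)}{24} - \sum_i \frac{t_i^3 - t_i}{48}}$$

Now all values are available to calculate the z-value taking tied ranks into account.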

Again, note that roughly 20 to 25 cases or more are needed before the W values can be assumed to be approximately normally distributed.
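The tie handling can be sketched as follows. The differences are hypothetical and only mimic the pattern described above (three absolute differences of 2 and two of 4). The helper names `average_ranks` and `tie_corrected_sigma` are made up for this illustration.

```python
from collections import Counter
from math import sqrt


def average_ranks(values):
    """Rank values in ascending order; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the block while the next value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + 1 + end + 1) / 2
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


def tie_corrected_sigma(abs_diffs):
    """Standard deviation of W with the tie correction term subtracted."""
    n = len(abs_diffs)
    tie_term = sum(t**3 - t for t in Counter(abs_diffs).values()) / 48
    return sqrt(n * (n + 1) * (2 * n + 1) / 24 - tie_term)


# Hypothetical signed differences for seven subjects.
diffs = [1, -2, 2, 2, -3, 4, -4]
abs_diffs = [abs(d) for d in diffs]

ranks = average_ranks(abs_diffs)
t_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
t_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)

print(ranks)            # [1.0, 3.0, 3.0, 3.0, 5.0, 6.5, 6.5]
print(t_plus, t_minus)  # 13.5 14.5
print(tie_corrected_sigma(abs_diffs) ** 2)  # approximately 35 - 0.625 = 34.375
```

Values that are not tied have t = 1 and contribute nothing to the correction term, because 1³ − 1 = 0.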
Effect size in the Wilcoxon signed-rank test
The effect size indicates how large the observed effect is compared to the random noise. There are several measures to calculate the effect size in the Wilcoxon test. A common method is to use r, defined as:

$$r = \frac{z}{\sqrt{n}}$$

Where z is the standardized test statistic value from the Wilcoxon test and n is the total number of observations (i.e., the sum of the sizes of both groups).
The value of r can range from -1 to 1, with values near 0 indicating that there is no effect and values near -1 or 1 indicating a strong effect. The sign of r indicates the direction of the effect.
The following table can be used to interpret the effect size (effect size r according to Cohen (1988)).
Effect size | Interpretation |
---|---|
\|r\| < 0.1 | no effect / very small effect |
\|r\| = 0.1 | small effect |
\|r\| = 0.3 | medium effect |
\|r\| = 0.5 | large effect |
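As a small illustration, the effect size and its verbal label could be computed like this. The function names are hypothetical, the inputs z = -2.0 and n = 20 are made up, and treating the benchmark values 0.1, 0.3 and 0.5 as lower bounds of each category is an assumption about how to read the table.

```python
from math import sqrt


def effect_size_r(z, n):
    """Effect size r = z / sqrt(n), with n the total number of observations."""
    return z / sqrt(n)


def interpret_r(r):
    """Map |r| to a label, reading the benchmarks as lower bounds (assumption)."""
    size = abs(r)
    if size < 0.1:
        return "no effect / very small effect"
    if size < 0.3:
        return "small effect"
    if size < 0.5:
        return "medium effect"
    return "large effect"


# Made-up inputs for illustration.
r = effect_size_r(-2.0, 20)
print(round(r, 3), interpret_r(r))  # -0.447 medium effect
```

The negative sign only reflects the direction of the difference; the label depends on the absolute value.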
Example Wilcoxon-Test
A Wilcoxon test can easily be calculated with DATAtab. Simply copy the table below or your own data into the Statistical Calculator and click on Hypothesis tests. Then click on the two variables and select Non-Parametric Test.
Reaction time morning | Reaction time evening |
---|---|
34 | 45 |
36 | 33 |
41 | 35 |
39 | 43 |
44 | 42 |
37 | 42 |
39 | 43 |
39 | 43 |
45 | 42 |
DATAtab then gives you the following result.

If you have more than two dependent variables, you can also easily calculate a Friedman test online. To do this, simply click on more than two metric variables.
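For the reaction-time data above, the hand calculation can also be sketched in Python. This is an illustrative sketch, not DATAtab's implementation: it uses the tie-corrected normal approximation without a continuity correction, takes W as the sum of positive ranks, and drops zero differences (there are none here). Software output may therefore differ slightly, and with only nine pairs the normal approximation is rough.

```python
from collections import Counter
from math import sqrt
from statistics import NormalDist

morning = [34, 36, 41, 39, 44, 37, 39, 39, 45]
evening = [45, 33, 35, 43, 42, 42, 43, 43, 42]

# Signed differences, skipping pairs with no change.
diffs = [m - e for m, e in zip(morning, evening) if m != e]
abs_sorted = sorted(abs(d) for d in diffs)


def avg_rank(value):
    """Mean of the ranks occupied by `value` in the sorted absolute differences."""
    first = abs_sorted.index(value) + 1
    last = first + abs_sorted.count(value) - 1
    return (first + last) / 2


t_plus = sum(avg_rank(abs(d)) for d in diffs if d > 0)
t_minus = sum(avg_rank(abs(d)) for d in diffs if d < 0)

n = len(diffs)
mu = n * (n + 1) / 4
tie_term = sum(t**3 - t for t in Counter(abs_sorted).values()) / 48
sigma = sqrt(n * (n + 1) * (2 * n + 1) / 24 - tie_term)

z = (t_plus - mu) / sigma
p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))

print(t_plus, t_minus)                     # 14.0 31.0
print(round(z, 2), round(p_two_sided, 2))  # -1.01 0.31
```

If SciPy is installed, `scipy.stats.wilcoxon(morning, evening)` runs the test directly; the statistic and p-value it reports depend on its method and correction settings.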