Weighted Cohen's kappa

Weighted Cohen's kappa is a measure of the agreement between two ordinally scaled samples and is used whenever you want to know if two people's measurements agree. The two people who measure something are called raters.

In the case of a "normal" Cohen's kappa, the variable to be measured by the two raters is a nominal variable. With a nominal variable, the categories can be distinguished, but there is no ranking between them.

Cohen's kappa takes into account whether the two raters measured the same thing or not, but it does not take into account the degree of disagreement. What if you don't have a nominal variable but an ordinal variable?

If you have an ordinal variable, that is, a variable whose categories can be ordered, then of course you want to take that order into account.

Let's say your categories are dissatisfied, neutral, and satisfied. There is a smaller difference between dissatisfied and neutral than between dissatisfied and satisfied. If you want to take the size of the difference into account, you have to use the weighted Cohen's kappa.

So if you have a nominal variable, you use Cohen's Kappa. If you have an ordinal variable, you use the weighted Cohen's kappa.

Reliability and validity

It is important to note that the weighted Cohen's Kappa can only tell you how reliably both raters are measuring the same thing. It cannot tell you whether what the two raters are measuring is the right thing!

So if both raters are pretty much always measuring the same thing, you would have a very high weighted Cohen's Kappa. However, the weighted Cohen's Kappa does not tell you whether this measurement corresponds to reality, i.e. whether the raters are measuring the right thing! In the first case we are talking about reliability. In the second case we speak of validity.

Calculating weighted Cohen's Kappa

How is the weighted Cohen's kappa calculated? Let's say two doctors have rated how satisfied they are with the therapeutic success of their patients. The doctors can answer with dissatisfied, neutral and satisfied.

Now you want to know how much agreement there is between the two doctors. Since we have an ordinal variable with the rank order dissatisfied, neutral and satisfied, we determine the agreement with the weighted Cohen's kappa.

The first step is to create a table with the frequencies of each response. We put one rater on each axis: the rows hold the ratings of rater 1 and the columns hold the ratings of rater 2. Each rater stated whether they were dissatisfied, neutral or satisfied with a patient's therapeutic success.

Let's say a total of 75 patients have been evaluated. Now let's count how often each combination occurs. Let's say 17 times both raters are dissatisfied, 8 times rater 1 is dissatisfied and rater 2 is neutral, 4 times rater 1 is dissatisfied and rater 2 is satisfied and so on and so forth. For the ratings on the diagonal, both raters agree.
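The counting step can be sketched in a few lines of Python. This is only a minimal illustration: the function name `contingency_table` and the five example ratings are hypothetical and are not the 75 patients from the example above.

```python
from collections import Counter

CATEGORIES = ["dissatisfied", "neutral", "satisfied"]


def contingency_table(ratings_1, ratings_2, categories=CATEGORIES):
    """Count how often each (rater 1, rater 2) combination occurs.

    Rows follow rater 1, columns follow rater 2.
    """
    counts = Counter(zip(ratings_1, ratings_2))
    return [[counts[(r, c)] for c in categories] for r in categories]


# Hypothetical ratings for five patients (illustrative only).
rater_1 = ["dissatisfied", "dissatisfied", "neutral", "satisfied", "neutral"]
rater_2 = ["dissatisfied", "neutral", "neutral", "satisfied", "satisfied"]

table = contingency_table(rater_1, rater_2)
for label, row in zip(CATEGORIES, table):
    print(f"{label:>12}: {row}")

# Cells on the diagonal are the cases where both raters agree.
agreements = sum(table[i][i] for i in range(len(CATEGORIES)))
print("agreements:", agreements)
```

Keeping the category list in rank order matters here, because the position of a category in the list is what the weighting later uses as its distance.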

The weighted Cohen's kappa can be calculated using the following formula:

$$\kappa_w = 1 - \frac{\sum_{i,j} w_{ij} \, f_{o,ij}}{\sum_{i,j} w_{ij} \, f_{e,ij}}$$

Here w are the weighting factors, fo are the observed frequencies, and fe are the expected frequencies. Instead of the frequencies, we could also use the calculated probabilities, i.e. the observed probabilities po and the expected probabilities pe.

If we calculated Cohen's kappa using probabilities rather than frequencies, we would simply divide each frequency by the number of patients, i.e. 75, and have the observed probabilities.

But we still need the weights and the expected frequencies. Let's start with the expected frequencies.

Calculate expected frequency

To calculate the expected frequency, we first calculate the sums of the rows and columns. So we simply add up all the rows and all the columns.

For example, in the first row we get a sum of 29 with 17 + 8 + 4. We now divide this by the total number of cases, 75, which gives a row probability of 0.39.

[Figure: expected probabilities for the weighted Cohen's kappa example]

We can now calculate the expected probability for each cell by multiplying the row probability by the column probability. So for the first cell we get 0.35 times 0.39 which is 0.13, for the second cell we get 0.44 times 0.39 which is 0.17.

Now, if we multiply each probability by 75, we get the expected frequencies.
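The expected frequencies can be reproduced with a short Python sketch. Note the assumption: the text only states the first row total (29). The remaining row and column totals used below are inferred from the probabilities and expected values quoted in this article, so treat them as a reconstruction rather than the original table.

```python
# Marginal totals for the 75 patients.
# Assumption: only the first row total (29) is stated in the text; the other
# totals are inferred from the quoted probabilities and expected frequencies.
row_totals = [29, 28, 18]  # rater 1: dissatisfied, neutral, satisfied
col_totals = [26, 33, 16]  # rater 2: dissatisfied, neutral, satisfied
n = sum(row_totals)

# Marginal probabilities of each category for each rater.
row_probs = [r / n for r in row_totals]
col_probs = [c / n for c in col_totals]

# Expected probability of a cell = row probability * column probability.
expected_probs = [[rp * cp for cp in col_probs] for rp in row_probs]

# Expected frequency of a cell = expected probability * number of cases.
expected_freqs = [[n * p for p in row] for row in expected_probs]

for row in expected_freqs:
    print([round(f, 2) for f in row])
```

With these totals, the first cell comes out at about 10.05 and the last at 3.84, which matches the values used in the calculation of the weighted kappa further down.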

[Figure: expected frequencies for the weighted Cohen's kappa example]

Calculate weighting matrix

If we did not use any weighting at all, our matrix would consist only of zeros on the diagonal and ones everywhere else. If both raters gave the same answer, there would be a zero in the cell, otherwise there would be a one. It does not matter how far apart the raters are in their answers: if they answered something different, the disagreement is weighted by 1.

The linear weighting matrix can be calculated using the following formula:

$$w_{ij} = \frac{|i - j|}{k - 1}$$

Let i be the index for the rows and j for the columns. k is the number of categories, in our case 3. This gives a weight of 0 on the diagonal, 0.5 for neighboring categories, and 1 for dissatisfied versus satisfied.

So now scores that are close together are weighted less than scores that are far apart.
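As a minimal sketch, both the unweighted and the linear weighting matrix can be generated in Python. The function names are hypothetical, and indices start at 0 here, which does not change the distances |i - j|.

```python
def unweighted_matrix(k):
    """0 on the diagonal (agreement), 1 everywhere else (any disagreement)."""
    return [[0 if i == j else 1 for j in range(k)] for i in range(k)]


def linear_weights(k):
    """Disagreement weights that grow linearly with the category distance."""
    return [[abs(i - j) / (k - 1) for j in range(k)] for i in range(k)]


k = 3  # dissatisfied, neutral, satisfied

print("unweighted:")
for row in unweighted_matrix(k):
    print(row)

print("linear:")
for row in linear_weights(k):
    print(row)
```

For k = 3, neighboring categories get a weight of 0.5 and the two extreme categories get a weight of 1, while the diagonal stays at 0.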

Linear and quadratic weighting

What about quadratic weighting? If we use quadratic weighting instead of linear weighting, the distances are simply squared. In this way, scores that are far apart are weighted even more heavily in relation to scores that are close together than in the linear case. The weighting matrix is then obtained with the following formula:

$$w_{ij} = \left( \frac{|i - j|}{k - 1} \right)^2$$

[Figure: quadratic weighting matrix for the weighted Cohen's kappa example]

So we can now decide whether to use no weighting, linear weighting or quadratic weighting. We will continue with the linear weighting.
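To compare the two schemes side by side, here is a small Python sketch. The function name `weight_matrix` and its `power` parameter are hypothetical conveniences: a power of 1 gives the linear weights and a power of 2 gives the quadratic weights.

```python
def weight_matrix(k, power=1):
    """Disagreement weights (|i - j| / (k - 1)) ** power.

    power=1 gives linear weights, power=2 gives quadratic weights.
    """
    return [[(abs(i - j) / (k - 1)) ** power for j in range(k)] for i in range(k)]


linear = weight_matrix(3, power=1)
quadratic = weight_matrix(3, power=2)

print("linear:   ", linear[0])     # weights in the first row
print("quadratic:", quadratic[0])

# Ratio between the largest weight and the weight of a neighboring category:
# squaring makes distant disagreements count relatively more.
print("linear ratio:   ", linear[0][2] / linear[0][1])
print("quadratic ratio:", quadratic[0][2] / quadratic[0][1])
```

With three categories, a two-step disagreement counts twice as much as a one-step disagreement under linear weighting and four times as much under quadratic weighting.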

Calculate weighted kappa

We can now calculate the weighted kappa. We have the weighting matrix, the observed frequencies and the expected frequencies. Let's start with the weighted sum of the observed frequencies. We simply multiply each cell of the weighting matrix by the corresponding cell of the observed frequencies and add them up. So 0 times 17 + 0.5 times 8 and so on, up to 0 times 9.

We now do the same with the weighting matrix and the expected frequencies: 0 times 10.05 plus 0.5 times 12.76 and so on, up to 0 times 3.84. If we now divide the first sum by the second and subtract the result from 1, we get a weighted kappa of 0.396.
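The whole calculation can be put together in one Python sketch. Two things are assumptions here. First, the function `weighted_kappa` is a hypothetical helper, not DATAtab's implementation. Second, the text only states the cells 17, 8, 4 and 9; the remaining cells below are fill-in values chosen to match the row and column totals implied by the expected frequencies quoted above.

```python
def weighted_kappa(observed, weighting="linear"):
    """Weighted Cohen's kappa from a square table of observed frequencies.

    Uses disagreement weights: 0 on the diagonal, growing with distance.
    """
    k = len(observed)
    n = sum(sum(row) for row in observed)
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    power = {"linear": 1, "quadratic": 2}[weighting]
    weights = [[(abs(i - j) / (k - 1)) ** power for j in range(k)]
               for i in range(k)]

    # Expected frequency of a cell = row total * column total / n.
    expected = [[row_totals[i] * col_totals[j] / n for j in range(k)]
                for i in range(k)]

    weighted_observed = sum(weights[i][j] * observed[i][j]
                            for i in range(k) for j in range(k))
    weighted_expected = sum(weights[i][j] * expected[i][j]
                            for i in range(k) for j in range(k))
    return 1 - weighted_observed / weighted_expected


# Rows: rater 1, columns: rater 2 (dissatisfied, neutral, satisfied).
observed = [
    [17, 8, 4],  # stated in the text
    [6, 19, 3],  # assumed fill-in values, consistent with the totals
    [3, 6, 9],   # only the 9 is stated in the text; 3 and 6 are assumed
]

kappa_linear = weighted_kappa(observed, "linear")
kappa_quadratic = weighted_kappa(observed, "quadratic")
print("linear weighted kappa:   ", round(kappa_linear, 3))
print("quadratic weighted kappa:", round(kappa_quadratic, 3))
```

With linear weights this table reproduces the 0.396 from the text. Writing the formula with disagreement weights means that perfect agreement makes the numerator zero, so kappa becomes 1.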

Calculating Cohen's weighted kappa with DATAtab


To calculate weighted Cohen's Kappa online, simply go to the Statistics Calculator, copy your own data into this table, and click on the Reliability tab.

DATAtab automatically tries to assign the appropriate scale level to the data. In this case DATAtab assumes that the data are nominal. If we clicked on Rater 1 and Rater 2 now, DATAtab would calculate the normal, unweighted Cohen's kappa. However, in our case these are ordinal variables, so we simply change the scale level to ordinal.

If we now click on both raters, the weighted Cohen's kappa is calculated. We can now choose whether we want linear or quadratic weighting. Here we see the cross table, which shows us how often each combination occurs. Then we get the results for the Cohen's kappa. With this data we get a weighted Cohen's kappa of 0.5.

If you're not sure how to interpret the results, you can click on Summary in Words:

"An inter-rater reliability analysis was performed between the dependent samples Rater1 and Rater2. This was done by calculating weighted Cohen's Kappa, which is a measure of the agreement between two related categorical samples. Weighted Cohen's Kappa showed that there was moderate agreement between the Rater1 and Rater2 samples with κ = 0.5."
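To see where a label such as "moderate" can come from, here is a small Python sketch that maps a kappa value to a verbal description. It assumes the commonly cited Landis and Koch bands. The text does not say which thresholds DATAtab uses, so the function `describe_agreement` is only an illustration.

```python
def describe_agreement(kappa):
    """Map a kappa value to a verbal label.

    Assumption: thresholds follow the commonly cited Landis and Koch bands;
    other sources use different cut-offs.
    """
    if kappa < 0:
        return "poor"
    bands = [
        (0.20, "slight"),
        (0.40, "fair"),
        (0.60, "moderate"),
        (0.80, "substantial"),
        (1.00, "almost perfect"),
    ]
    for upper, label in bands:
        if kappa <= upper:
            return label
    raise ValueError("kappa cannot be greater than 1")


print(describe_agreement(0.5))
print(describe_agreement(0.396))
```

Under these bands, a kappa of 0.5 falls into the same "moderate" range that the summary above reports.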
