Cohen's Kappa is a measure of the agreement between two dependent categorical samples. You use it whenever you want to know whether the measurements of two raters agree.
In the case of Cohen's Kappa, the variable measured by the two raters is a nominal variable.
Therefore, if you have a nominal variable and want to know how well two raters agree, you use Cohen's Kappa. If you had an ordinal variable and two raters, you would use Kendall's tau or the weighted Cohen's Kappa, and if you had a metric variable, you would use Pearson's correlation. If you have more than two dependent nominal samples, for example more than two raters, Fleiss' Kappa is used.
Cohen's Kappa Example
Let's say you have developed a measuring instrument, for example a questionnaire, with which doctors can determine whether a person is depressed or not. Now you give this measuring instrument to a doctor and have her evaluate 50 people with it.
For example, your method shows that the first person is depressed, the second person is depressed, and the third person is not depressed. The big question now is: Does a second doctor come to the same conclusion?
With a second doctor, the result could look like this: for the first person, both doctors come to the same result, but for the second person, their results differ. So you want to know how well the two doctors agree, and this is where Cohen's Kappa comes in.
When the assessments of the two physicians agree very well, one speaks of high inter-rater reliability. It is precisely this inter-rater reliability that Cohen's Kappa measures.
Cohen's Kappa is thus a measure of how reliably two raters measure the same thing.
Use cases of Cohen's Kappa
So far, we have considered the case where two people measure the same thing. However, Cohen's Kappa can also be used when the same rater takes the measurement at two different times.
In that case, the Cohen's Kappa score indicates how well the two measurements made by the same rater agree.
Measurement of the agreement
Cohen's Kappa is a measure of the agreement between two dependent categorical samples.
Cohen's Kappa Reliability and validity
It is important to note that the Cohen's Kappa coefficient only tells you how reliably both raters measure the same thing. It does not tell you whether what the two raters measure is the right thing!
In the first case we speak of reliability (whether both measure the same thing), and in the second case we speak of validity (whether both measure the right thing). Cohen's Kappa can only be used to measure reliability.
Calculate Cohen's Kappa
Now the question arises: how is Cohen's Kappa calculated? This is not difficult! First, we create a table with the frequencies of the respective answers.
We take our two raters, each of whom has rated whether a person is depressed or not, and count how often they gave the same rating and how often they did not.
So we create a table that crosses Rater 1's ratings ("not depressed", "depressed") with Rater 2's ratings ("not depressed", "depressed"). Now we simply keep a tally sheet and count how often each combination occurs.
Let's assume our final result is as follows: both raters rated 17 people as "not depressed", and for 19 people, both chose the rating "depressed".
If both raters gave the same rating, the person lands on the diagonal of the table; if they gave different ratings, the person lands off the diagonal. Now we want to know how often both raters agree and how often they don't.
Rater 1 and Rater 2 agree that 17 patients are not depressed and 19 are depressed. So both raters agree in 36 cases. In total, 50 people were assessed.
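The tally can be sketched in Python. The individual ratings below are hypothetical and only constructed to reproduce the example's cell counts. The two off-diagonal counts (8 and 6) are not stated in the text, but they follow from the row and column sums given further down (Rater 1: 25 and 25, Rater 2: 23 and 27).

```python
from collections import Counter

NOT, DEP = "not depressed", "depressed"

# Hypothetical paired ratings (Rater 1, Rater 2) for the 50 people,
# constructed so that the cell counts match the example
pairs = ([(NOT, NOT)] * 17 + [(NOT, DEP)] * 8
         + [(DEP, NOT)] * 6 + [(DEP, DEP)] * 19)
rater1 = [a for a, _ in pairs]
rater2 = [b for _, b in pairs]

# Tally sheet: how often does each (Rater 1, Rater 2) combination occur?
tally = Counter(zip(rater1, rater2))

categories = [NOT, DEP]
header = "Rater 1 / Rater 2"
print(f"{header:<18}" + "".join(f"{c:>15}" for c in categories))
for r1 in categories:
    print(f"{r1:<18}" + "".join(f"{tally[(r1, r2)]:>15}" for r2 in categories))

# Agreements are the diagonal cells: the same label from both raters
agreements = sum(tally[(c, c)] for c in categories)
print("agreements:", agreements, "of", len(pairs))  # agreements: 36 of 50
```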
With these numbers, we can now calculate the probability that both raters give the same rating for a person. We calculate this by dividing 36 by 50 and arrive at the following result: in 72% of the cases, both raters give the same assessment, and in 28% of the cases their assessments differ.
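The observed agreement po is simply the diagonal sum divided by the total. A minimal Python sketch, using the example crosstab with rows for Rater 1 and columns for Rater 2 (the off-diagonal counts 8 and 6 are derived from the row and column sums given below, not stated directly):

```python
# Example crosstab: rows = Rater 1, columns = Rater 2,
# category order ["not depressed", "depressed"]
table = [[17, 8],
         [6, 19]]

n = sum(sum(row) for row in table)                        # 50 people in total
agreements = sum(table[i][i] for i in range(len(table)))  # diagonal: 17 + 19
p_o = agreements / n

print(f"po = {agreements}/{n} = {p_o:.2f}")     # po = 36/50 = 0.72
print(f"disagreement = {1 - p_o:.2f}")          # disagreement = 0.28
```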
This gives us po, the first part we need to calculate Cohen's Kappa. Cohen's Kappa is given by this formula:

$$\kappa = \frac{p_o - p_e}{1 - p_e}$$
We have just calculated po. But what is pe?
If both doctors were to answer purely by chance, flipping a coin, so to speak, to decide whether a person is depressed or not, they would still reach the same conclusion in some cases, purely by chance.
And that is exactly what pe indicates: The hypothetical probability of a random match. But how do you calculate pe?
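To make the idea of a chance match concrete, here is a small simulation sketch. It assumes that each doctor rates every person independently at random and says "not depressed" with the same proportion they used in the example: 50% for Rater 1 and 46% for Rater 2, as derived in the next steps. The share of simulated agreements should come out close to 0.50, the value of pe calculated below. The function name is hypothetical.

```python
import random


def simulate_chance_agreement(p1_not, p2_not, n_people, seed=0):
    """Share of people on whom two independent random raters agree."""
    rng = random.Random(seed)
    matches = 0
    for _ in range(n_people):
        r1_not = rng.random() < p1_not   # does Rater 1 say "not depressed"?
        r2_not = rng.random() < p2_not   # does Rater 2 say "not depressed"?
        matches += (r1_not == r2_not)    # same label means a chance agreement
    return matches / n_people


# Proportions of "not depressed" ratings in the example
share = simulate_chance_agreement(0.50, 0.46, n_people=200_000)
print(f"simulated chance agreement: {share:.3f}")  # close to 0.50
```

A large number of simulated people is used so that the random share settles near its expected value.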
To calculate pe, we first need the sums of the rows and columns. With these, we can calculate pe.
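A short Python sketch of the row and column sums for the example crosstab. The orientation (rows = Rater 1, columns = Rater 2) and the off-diagonal counts 8 and 6 are assumptions consistent with the totals listed below.

```python
# Example crosstab: rows = Rater 1, columns = Rater 2
labels = ["not depressed", "depressed"]
table = [[17, 8],
         [6, 19]]

row_sums = [sum(row) for row in table]        # how often Rater 1 used each label
col_sums = [sum(col) for col in zip(*table)]  # how often Rater 2 used each label
n = sum(row_sums)                             # 50 people

for label, r, c in zip(labels, row_sums, col_sums):
    print(f"{label:<14} Rater 1: {r}/{n} = {r / n:.2f}   Rater 2: {c}/{n} = {c / n:.2f}")
```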
In the first step, we calculate the probability that both raters would randomly arrive at the rating "not depressed."
- Rater 1 rated 25 out of 50 people as "not depressed", i.e. 50%.
- Rater 2 rated 23 out of 50 people as "not depressed", i.e. 46%.
The overall probability that both raters would say "not depressed" by chance is: 0.5 * 0.46 = 0.23
In the second step, we calculate the probability that the raters would both say "depressed" by chance.
- Rater 1 says "depressed" in 25 out of 50 persons, i.e. 50%.
- Rater 2 says "depressed" in 27 out of 50 people, i.e. 54%.
The total probability that both raters say "depressed" by chance is: 0.5 * 0.54 = 0.27. With this, we can now calculate pe.
Adding both values gives the probability that the two raters agree purely by chance: pe = 0.23 + 0.27 = 0.50. So if the doctors had no guidance and simply assigned ratings at random, each in the proportions they actually used, they would agree in 50% of cases.
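The steps above generalize to any number of categories: for each category, multiply the two raters' proportions and add up the products. A sketch in Python for the example table, using the same assumed orientation and off-diagonal counts as before:

```python
# Example crosstab: rows = Rater 1, columns = Rater 2
labels = ["not depressed", "depressed"]
table = [[17, 8],
         [6, 19]]

n = sum(map(sum, table))
row_sums = [sum(row) for row in table]
col_sums = [sum(col) for col in zip(*table)]

# Chance that both raters pick a category = (Rater 1 share) * (Rater 2 share)
chance = [(r / n) * (c / n) for r, c in zip(row_sums, col_sums)]
p_e = sum(chance)

for label, p in zip(labels, chance):
    print(f"both '{label}' by chance: {p:.2f}")
print(f"pe = {p_e:.2f}")  # pe = 0.50
```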
Now we can calculate the Cohen's Kappa coefficient. We simply substitute po = 0.72 and pe = 0.50 and get κ = (0.72 - 0.50) / (1 - 0.50) = 0.44 in our example.
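Putting po, pe and the formula together gives a compact function. This is a minimal Python sketch for a square crosstab; the function name `cohens_kappa` is hypothetical.

```python
def cohens_kappa(table):
    """Cohen's kappa for a square crosstab (rows = Rater 1, columns = Rater 2)."""
    n = sum(map(sum, table))
    k = len(table)
    # Observed agreement po: share of people on the diagonal
    p_o = sum(table[i][i] for i in range(k)) / n
    # Chance agreement pe: sum over categories of (row share * column share)
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    p_e = sum(r * c for r, c in zip(row_sums, col_sums)) / n ** 2
    if p_e == 1:
        raise ValueError("kappa is undefined when the chance agreement pe equals 1")
    return (p_o - p_e) / (1 - p_e)


kappa = cohens_kappa([[17, 8],
                      [6, 19]])
print(f"kappa = {kappa:.2f}")  # kappa = 0.44
```

The guard covers the degenerate case where both raters always use one and the same category, so that pe = 1 and the formula would divide by zero.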
By the way, in po the o stands for "observed", and in pe the e stands for "expected". So po is what we actually observed, and pe is what we would expect if the ratings were purely random.
Cohen's Kappa interpretation
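Now, of course, we would like to interpret the calculated Cohen's Kappa coefficient. For this purpose, the table of Landis & Koch (1977) can be used as a guide:
- below 0: poor agreement
- 0.00 to 0.20: slight agreement
- 0.21 to 0.40: fair agreement
- 0.41 to 0.60: moderate agreement
- 0.61 to 0.80: substantial agreement
- 0.81 to 1.00: almost perfect agreement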
Therefore, the Cohen's Kappa coefficient of 0.44 calculated above represents moderate reliability or agreement.
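The Landis & Koch bands can be written as a small lookup, shown here as a Python sketch. The published bands are given to two decimals. This function treats each band as running up to and including its upper limit, so a value such as 0.205 counts as fair. That boundary handling, and the function name, are choices made for this sketch.

```python
def landis_koch(kappa):
    """Verbal label for a kappa value following Landis & Koch (1977)."""
    if kappa < 0:
        return "poor"
    bands = [(0.20, "slight"), (0.40, "fair"), (0.60, "moderate"),
             (0.80, "substantial"), (1.00, "almost perfect")]
    for upper, label in bands:
        if kappa <= upper:
            return label
    raise ValueError("kappa cannot exceed 1")


print(landis_koch(0.44))  # moderate
```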
Weighted Cohen's Kappa
Cohen's Kappa measures the agreement between two raters, but it only considers whether both raters give the same rating or not. If the variable is ordinal, i.e. a variable with a ranking such as school grades, it is desirable to also take the size of a disagreement into account: a difference between "very good" and "satisfactory" is larger than one between "very good" and "good".
To take this into account, the weighted kappa can be calculated. Here, the size of each disagreement enters the calculation through weights: the penalty for a disagreement can grow linearly or quadratically with the distance between the two categories.
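A sketch of weighted kappa in Python, assuming the common agreement weights 1 - |i - j| / (k - 1) (linear) and 1 - (i - j)² / (k - 1)² (quadratic) for k ordered categories. The function name and the `weighting` parameter are hypothetical. With only two categories, both weightings reduce to ordinary Cohen's Kappa, which makes the example table a convenient check.

```python
def weighted_kappa(table, weighting="linear"):
    """Weighted Cohen's kappa for a k x k crosstab of ordered categories.

    Rows = Rater 1, columns = Rater 2, both in the same rank order.
    Agreement weights are 1 on the diagonal and shrink with the distance |i - j|.
    """
    if weighting not in ("linear", "quadratic"):
        raise ValueError("weighting must be 'linear' or 'quadratic'")
    k = len(table)
    if k < 2:
        raise ValueError("need at least two categories")
    n = sum(map(sum, table))
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]

    def weight(i, j):
        d = abs(i - j) / (k - 1)  # distance between categories, scaled to [0, 1]
        return 1 - d if weighting == "linear" else 1 - d ** 2

    cells = [(i, j) for i in range(k) for j in range(k)]
    p_o = sum(weight(i, j) * table[i][j] for i, j in cells) / n
    p_e = sum(weight(i, j) * row_sums[i] * col_sums[j] for i, j in cells) / n ** 2
    return (p_o - p_e) / (1 - p_e)


# Two categories: identical to ordinary Cohen's Kappa (0.44 in the example)
print(f"{weighted_kappa([[17, 8], [6, 19]]):.2f}")  # 0.44

# Hypothetical 3-category table (e.g. grades "very good", "good", "satisfactory")
grades = [[10, 3, 0],
          [2, 12, 4],
          [1, 3, 15]]
print(weighted_kappa(grades, "linear"), weighted_kappa(grades, "quadratic"))
```

Quadratic weights forgive near misses more than linear weights do, so the choice between them changes how strongly adjacent-category disagreements count against the raters.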
Calculate Cohen's Kappa with DATAtab
And now I'll show you how you can easily calculate the Cohen's Kappa for your data online with DATAtab.
Just go to the Cohen's Kappa calculator and copy your own data into the table. Now click on the tab "Reliability".
Now you just have to click on the variables you want to analyze, and Cohen's Kappa will be displayed automatically. First you see the crosstab, and then you can read off the calculated Cohen's Kappa coefficient. If you don't know how to interpret the result, just click on "Interpretations in words".
An inter-rater reliability analysis was performed between the dependent samples Rater1 and Rater2. For this, Cohen's Kappa was calculated, which is a measure of the agreement between two related categorical samples. Cohen's Kappa showed that there was moderate agreement between the samples Rater1 and Rater2 with κ = 0.44.