You use Fleiss Kappa whenever you want to know if the measurements of more than two people agree. The people who measure something are called raters.
In the case of Fleiss Kappa, the variable measured by the three or more raters is a nominal variable. So if you have a nominal variable and more than two raters, you use Fleiss Kappa.
If you had an ordinal variable and more than two raters, you would use Kendall's W, and if you had a metric variable, you would use the intra-class correlation. If you had only two raters and a nominal variable, you would use Cohen's Kappa.
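The selection rules above can be written down as a small lookup. This is only an illustrative sketch: the function name `choose_statistic` and the scale labels are made up for this example, and combinations that the text does not mention simply return `None`.

```python
def choose_statistic(scale, n_raters):
    """Suggest an agreement statistic following the rules of thumb in the text.

    scale: "nominal", "ordinal" or "metric" (labels assumed for this sketch)
    n_raters: number of raters (or repeated measurements)
    """
    if scale == "nominal" and n_raters == 2:
        return "Cohen's Kappa"
    if scale == "nominal" and n_raters > 2:
        return "Fleiss Kappa"
    if scale == "ordinal" and n_raters > 2:
        return "Kendall's W"
    if scale == "metric" and n_raters > 2:
        return "Intra-class correlation"
    # Combination not covered by the text
    return None


print(choose_statistic("nominal", 3))  # Fleiss Kappa
print(choose_statistic("nominal", 2))  # Cohen's Kappa
```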
But that's enough theory for now, let's take a look at an example.
Fleiss Kappa Example
Let's say you have developed a measuring instrument, for example a questionnaire, that doctors can use to determine whether a person is depressed or not.
Now you give the measuring instrument to doctors and let them assess 50 people with it. The big question is: how well do the doctors' measurements agree?
If the ratings of the raters agree very well, the inter-rater reliability is high.
And it is exactly this inter-rater reliability that Fleiss Kappa measures.
The Fleiss Kappa is a measure of how reliably three or more raters measure the same thing.
Fleiss Kappa with repeated measurement
So far we have considered the case where three or more people measure the same thing. However, Fleiss Kappa can also be used when the same rater makes the measurement at more than two different times.
In this case, Fleiss Kappa indicates how well the measurements of the same person match.
In our example, the variable of interest has two categories, depressed and not depressed; of course, the variable of interest may have more than two categories.
Measure of the agreement:
Fleiss Kappa is a measure of the agreement between more than two dependent categorical samples.
Fleiss Kappa reliability and validity
It is important to note that Fleiss Kappa can only tell you how reliably the raters are measuring the same thing. It cannot tell you whether what the raters are measuring is the right thing!
So if all the raters gave the same ratings, you would have a very high Fleiss Kappa. But Fleiss Kappa does not tell you whether these ratings correspond to reality, i.e. whether the correct value is measured!
In the first case we speak of reliability, in the second of validity.
Calculate Fleiss Kappa
With this equation we can calculate the Fleiss Kappa:

$$\kappa = \frac{p_o - p_e}{1 - p_e}$$
In this equation, po is the observed agreement of the raters and pe is the expected agreement of the raters. The expected agreement is the agreement you would get if the raters judged completely at random, as if they flipped a (weighted) coin for each patient to decide whether they are depressed or not.
So how do we calculate po and pe? Let's start with pe. Let's say we have 7 patients and three raters. Each patient has been assessed by each rater.
In the first step, we simply count how many times a patient was judged to be depressed and how many times they were judged not to be depressed.
For the first patient, 0 raters said that this person is not depressed and 3 raters said that this person is depressed. For the second person, one rater said that the person is not depressed and two said that the person is depressed.
Now we do the same for all the other patients and then calculate the total for each category. In total we have 8 "not depressed" ratings and 13 "depressed" ratings, so there were 21 ratings overall.
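If your ratings are stored as one row per patient, the counting step could look roughly like the sketch below. Note that the raw ratings are made up for illustration: only patients 1, 2 and 7 follow the description in the text, while patients 3 to 6 are assumptions chosen so that the totals come out as 8 and 13. The helper name `count_table` is also hypothetical.

```python
from collections import Counter

# Hypothetical raw ratings: one row per patient, one entry per rater.
# Rows marked "assumed" are NOT from the text; they are chosen so that
# the totals match the 8 / 13 split described above.
ratings = [
    ["depressed", "depressed", "depressed"],              # patient 1
    ["not depressed", "depressed", "depressed"],          # patient 2
    ["not depressed", "not depressed", "depressed"],      # assumed
    ["not depressed", "not depressed", "not depressed"],  # assumed
    ["depressed", "depressed", "depressed"],              # assumed
    ["not depressed", "depressed", "depressed"],          # assumed
    ["not depressed", "depressed", "depressed"],          # patient 7
]
categories = ["not depressed", "depressed"]


def count_table(ratings, categories):
    """Return one row per patient with the number of raters per category."""
    table = []
    for row in ratings:
        counts = Counter(row)  # missing categories count as 0
        table.append([counts[cat] for cat in categories])
    return table


table = count_table(ratings, categories)
totals = [sum(column) for column in zip(*table)]

print(table[0], table[1])   # [0, 3] [1, 2]
print(totals, sum(totals))  # [8, 13] 21
```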
This allows us to calculate how likely a rating is to be "not depressed" or "depressed". To do this, we divide the number of "not depressed" ratings and the number of "depressed" ratings by the total of 21 ratings.
So we divide 8 by 21 and find that 38% of all ratings were "not depressed", and we divide 13 by 21 and find that 62% of all ratings were "depressed".
To calculate pe, we now square the two values and add them up: $0.38^2 + 0.62^2 \approx 0.53$.
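The same step can be sketched in a few lines of Python. The count table is an assumption: its first, second and last rows follow the text, and the rows in between are chosen so that the column totals are 8 and 13. The function name `expected_agreement` is made up for this example.

```python
# Count table: one row per patient, columns = [not depressed, depressed].
# Rows 3 to 6 are assumed (not given in the text).
table = [
    [0, 3],
    [1, 2],
    [2, 1],  # assumed
    [3, 0],  # assumed
    [0, 3],  # assumed
    [1, 2],  # assumed
    [1, 2],
]


def expected_agreement(table):
    """Sum of squared category shares across all ratings (pe)."""
    total = sum(sum(row) for row in table)
    shares = [sum(column) / total for column in zip(*table)]
    return sum(share ** 2 for share in shares)


p_e = expected_agreement(table)
print(round(p_e, 2))  # 0.53
```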
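Now we need to calculate po. We can calculate po with this formula; don't worry, it looks more complicated than it is:

$$p_o = \frac{1}{N\,n\,(n-1)} \left( \sum_{i=1}^{N} \sum_{j=1}^{k} n_{ij}^2 - N\,n \right)$$

Here $n_{ij}$ is the number of raters who put patient $i$ into category $j$, and $k$ is the number of categories.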
Let's start with the first part. Capital N is the number of patients, so 7, and small n is the number of raters, so 3. This gives us 1 / (7 · 3 · 2) = 1/42 ≈ 0.024 for the first part.
In the second part of the formula, we simply square each value in the table and add them up. So $0^2 + 3^2$ up to $1^2 + 2^2$. This gives us 47.
And the third part is 7 times 3, which is 21. If we insert everything, we get 0.024 times (47 - 21), which is approximately 0.62 (0.619 if the first part is not rounded).
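Here is a small sketch of the three parts in Python. Working with the unrounded factor 1/42 instead of 0.024 gives roughly 0.619. As before, the middle rows of the count table are assumptions chosen so that the sum of squares is 47, and the function name `observed_agreement` is hypothetical.

```python
# Count table: one row per patient, columns = [not depressed, depressed].
# Rows 3 to 6 are assumed (not given in the text).
table = [
    [0, 3],
    [1, 2],
    [2, 1],  # assumed
    [3, 0],  # assumed
    [0, 3],  # assumed
    [1, 2],  # assumed
    [1, 2],
]


def observed_agreement(table):
    """Observed agreement po for a complete count table."""
    N = len(table)     # number of patients
    n = sum(table[0])  # number of raters per patient
    sum_of_squares = sum(count ** 2 for row in table for count in row)
    return (sum_of_squares - N * n) / (N * n * (n - 1))


sum_of_squares = sum(count ** 2 for row in table for count in row)
p_o = observed_agreement(table)

print(sum_of_squares)  # 47
print(round(p_o, 3))   # 0.619
```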
So now we have po and pe. Putting them into the equation for kappa, we get (0.62 - 0.53) / (1 - 0.53), which is a Kappa of about 0.19.
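Putting all the steps together, the whole calculation fits into one short function. This is a minimal sketch rather than a reference implementation: it assumes every patient was rated by the same number of raters, the name `fleiss_kappa` is chosen for this example, and rows 3 to 6 of the count table are assumed so that the totals match the worked example.

```python
# Count table: one row per patient, columns = [not depressed, depressed].
# Rows 3 to 6 are assumed (not given in the text).
table = [
    [0, 3],
    [1, 2],
    [2, 1],  # assumed
    [3, 0],  # assumed
    [0, 3],  # assumed
    [1, 2],  # assumed
    [1, 2],
]


def fleiss_kappa(table):
    """Fleiss Kappa for a count table with equally many raters per row."""
    N = len(table)     # number of patients
    n = sum(table[0])  # number of raters per patient
    if any(sum(row) != n for row in table):
        raise ValueError("every patient needs the same number of ratings")

    # Expected agreement: squared share of each category, summed up
    shares = [sum(column) / (N * n) for column in zip(*table)]
    p_e = sum(share ** 2 for share in shares)

    # Observed agreement
    sum_of_squares = sum(count ** 2 for row in table for count in row)
    p_o = (sum_of_squares - N * n) / (N * n * (n - 1))

    return (p_o - p_e) / (1 - p_e)


kappa = fleiss_kappa(table)
print(round(kappa, 2))  # 0.19
```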
Fleiss Kappa interpretation
Now, of course, the Fleiss Kappa coefficient must be interpreted. For this we can use the table from Landis and Koch (1977).
For a Fleiss Kappa value of 0.19, we get only a slight agreement.
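If you want to look up the label automatically, the bands commonly quoted from Landis and Koch (1977) can be sketched as a small function. The function name `interpret_kappa` is made up for this example, and the bands are benchmarks for orientation, not strict rules.

```python
def interpret_kappa(kappa):
    """Map a kappa value to the verbal labels of Landis and Koch (1977)."""
    if kappa > 1:
        raise ValueError("kappa cannot be greater than 1")
    if kappa < 0:
        return "poor"
    # (upper bound of the band, label)
    bands = [
        (0.20, "slight"),
        (0.40, "fair"),
        (0.60, "moderate"),
        (0.80, "substantial"),
        (1.00, "almost perfect"),
    ]
    for upper, label in bands:
        if kappa <= upper:
            return label


print(interpret_kappa(0.19))  # slight
```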
Calculate Fleiss Kappa with DATAtab
With DATAtab you can easily calculate the Fleiss Kappa online. Simply go to datatab.net and copy your own data into the table of the Fleiss Kappa calculator. Now click on the Reliability tab. Under Reliability you can calculate different reliability statistics; depending on how many variables you click on and which scale level they have, you will get a suitable suggestion.
The Fleiss Kappa is calculated for nominal variables. If your data is recognised as metric, please change the scale level under Data View to nominal.
If you now click on Rater 1 and Rater 2, Cohen's Kappa will be calculated; if you then also click on Rater 3, the Fleiss Kappa will be calculated.
Below you can see the calculated Fleiss Kappa.
If you don't know how to interpret the result, just click on Interpretations in Words.
An inter-rater reliability analysis was performed between the dependent samples of Rater 1, Rater 2 and Rater 3. For this purpose, the Fleiss Kappa was calculated, which is a measure of the agreement between more than two dependent categorical samples.
The Fleiss Kappa showed that there was a slight agreement between the samples of Rater 1, Rater 2 and Rater 3 with κ = 0.16.