Fleiss Kappa

You use Fleiss Kappa whenever you want to know whether the measurements of more than two people agree. The people who measure something are called raters.

In the case of Fleiss Kappa, the variable measured by the three or more raters is a nominal variable. So if you have a nominal variable and more than two raters, you use Fleiss Kappa.

If you had an ordinal variable and more than two raters, you would use Kendall's W; if you had a metric variable, you would use the intraclass correlation. If you had only two raters and a nominal variable, you would use Cohen's Kappa.
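As a small illustration, this decision rule can be written down in a few lines of Python. This is just a sketch covering exactly the cases named above; the function name and its labels are made up for this example:

```python
def agreement_measure(n_raters: int, scale: str) -> str:
    """Suggest an agreement measure from the number of raters and the
    scale level ("nominal", "ordinal" or "metric") of the rated variable."""
    if scale == "nominal":
        return "Cohen's Kappa" if n_raters == 2 else "Fleiss Kappa"
    if scale == "ordinal":
        return "Kendall's W"            # for more than two raters
    if scale == "metric":
        return "Intraclass correlation"
    raise ValueError(f"unknown scale level: {scale}")

print(agreement_measure(3, "nominal"))  # Fleiss Kappa
```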

[Figure: Fleiss Kappa for nominal data]

But that's enough theory for now; let's take a look at an example.

Fleiss Kappa Example

Let's say you have developed a measuring instrument, for example a questionnaire, that doctors can use to determine whether a person is depressed or not.

Now you give the measuring instrument to doctors and let them assess 50 people with it. The big question is: how well do the doctors' measurements agree?


If the ratings of the raters agree very well, the inter-rater reliability is high.

And it is precisely this inter-rater reliability that is measured by Fleiss Kappa.

Definition:

The Fleiss Kappa is a measure of how reliably three or more raters measure the same thing.

Fleiss Kappa with repeated measurement

So far we have considered the case where three or more people measure the same thing. However, Fleiss Kappa can also be used when the same rater carries out the measurement at more than two different points in time.

In this case, Fleiss Kappa indicates how well the measurements of the same person match.

[Figure: Fleiss Kappa for dependent samples]

In our example, the variable of interest has two categories, depressed and not depressed; of course, the variable of interest may also have more than two categories.

Measure of agreement:

Fleiss Kappa is a measure of the agreement between more than two dependent categorical samples.

Fleiss Kappa reliability and validity

It is important to note that Fleiss Kappa can only tell you how reliably the raters are measuring the same thing. It cannot tell you whether what the raters are measuring is the right thing!

[Figure: Reliability and validity of Fleiss Kappa]

So if all the raters produce the same measurement, you get a very high Fleiss Kappa. But Fleiss Kappa does not tell you whether this measured value corresponds to reality, i.e. whether the correct value is being measured!

In the first case we speak of reliability, in the second of validity.

Calculate Fleiss Kappa

With this equation we can calculate the Fleiss Kappa:

κ = (po − pe) / (1 − pe)

In this equation, po is the observed agreement of the raters and pe is the expected agreement of the raters. The expected agreement is the agreement that would occur if the raters judged completely at random, i.e. simply flipped a coin for each patient to decide whether they are depressed or not.
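Expressed as code, the equation itself is a one-liner; a minimal Python sketch (the function name is made up here):

```python
def fleiss_kappa(p_o: float, p_e: float) -> float:
    """Chance-corrected agreement from the observed agreement p_o
    and the expected (chance) agreement p_e."""
    return (p_o - p_e) / (1 - p_e)
```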

So how do we calculate po and pe? Let's start with pe. Let's say we have 7 patients and 3 raters. Each patient has been assessed by each rater.

In the first step, we simply count how many times a patient was judged to be depressed and how many times they were judged not to be depressed.

[Figure: Fleiss Kappa pe calculation]

For the first patient, 0 raters said that this person is not depressed and 3 raters said that this person is depressed. For the second person, one rater said that the person is not depressed and two said that the person is depressed.

Now we do the same for all the other patients and calculate the totals. In total we have 8 ratings of "not depressed" and 13 ratings of "depressed", i.e. 21 ratings overall (7 patients × 3 raters).

This allows us to calculate how likely it is that a rating is "not depressed" or "depressed". To do this, we divide the number of "not depressed" and "depressed" ratings by the total number of 21.

So we divide 8 by 21 and get 0.38, i.e. 38% of all ratings were "not depressed", and we divide 13 by 21 and get 0.62, i.e. 62% of all ratings were "depressed".

To calculate pe, we now square and sum these two values: 0.38² + 0.62² = 0.53.
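This step is easy to verify in Python; a quick sketch using the totals from the example:

```python
# Column totals from the example: 8 "not depressed" and 13 "depressed" ratings
category_totals = [8, 13]
total_ratings = sum(category_totals)   # 21 ratings = 7 patients x 3 raters

# Proportion of all ratings that fall into each category
proportions = [t / total_ratings for t in category_totals]  # approx. [0.38, 0.62]

# Expected chance agreement: sum of the squared proportions
p_e = sum(p ** 2 for p in proportions)
print(round(p_e, 2))  # 0.53
```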

Now we need to calculate po. We can calculate po with the following formula. Don't worry, it looks more complicated than it is.

po = 1 / (N · n · (n − 1)) · (Σ nij² − N · n)

where nij is the number of raters who assigned patient i to category j.

Let's start with the first part. Capital N is the number of patients, so 7, and small n is the number of raters, so 3. The first part is therefore 1 / (7 · 3 · 2) = 1/42 ≈ 0.024.

In the second part of the formula, we simply square each value in the table and add them up: 0² plus 3², and so on, down to 1² plus 2² for the last patient. This gives us 47.

And the third part is 7 times 3, which is 21. If we insert everything, we get 0.024 · (47 − 21), which is equal to 0.624.
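And the same check for po, plugging in the aggregates from above:

```python
N = 7                # number of patients
n = 3                # number of raters
sum_of_squares = 47  # sum of the squared cell counts from the table

p_o = (sum_of_squares - N * n) / (N * n * (n - 1))
print(round(p_o, 3))  # 0.619 (the text's 0.624 comes from rounding 1/42 to 0.024)
```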

So now we have po and pe. Putting them into the equation for kappa (using the unrounded intermediate values), we get a Kappa of 0.19.

κ = (po − pe) / (1 − pe) = (0.619 − 0.528) / (1 − 0.528) ≈ 0.19
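Putting all the steps together gives a short, self-contained script. Note that only patients 1 and 2 are spelled out in the text; the counts for patients 3 to 6 below are one illustrative assignment that is consistent with the column totals (8 and 13) and the sum of squares of 47:

```python
# One row per patient, columns = (not depressed, depressed) rating counts.
# Patients 1, 2 and 7 follow the text; patients 3-6 are illustrative but
# consistent with the column totals (8, 13) and the sum of squares 47.
counts = [
    [0, 3],
    [1, 2],
    [3, 0],
    [2, 1],
    [0, 3],
    [1, 2],
    [1, 2],
]

N = len(counts)     # 7 patients
n = sum(counts[0])  # 3 raters per patient

# Expected chance agreement pe from the column totals
totals = [sum(col) for col in zip(*counts)]            # [8, 13]
p_e = sum((t / (N * n)) ** 2 for t in totals)

# Observed agreement po from the squared cell counts
sum_sq = sum(c ** 2 for row in counts for c in row)    # 47
p_o = (sum_sq - N * n) / (N * n * (n - 1))

kappa = (p_o - p_e) / (1 - p_e)
print(round(kappa, 2))  # 0.19

# Optional cross-check with statsmodels, if it is installed:
# from statsmodels.stats.inter_rater import fleiss_kappa
# print(fleiss_kappa(counts, method="fleiss"))
```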

Fleiss Kappa interpretation

Now, of course, the Fleiss Kappa coefficient must be interpreted. For this we can use the table from Landis and Koch (1977).

  • < 0.00: poor agreement
  • 0.00–0.20: slight agreement
  • 0.21–0.40: fair agreement
  • 0.41–0.60: moderate agreement
  • 0.61–0.80: substantial agreement
  • 0.81–1.00: almost perfect agreement

For a Fleiss Kappa value of 0.19, we get just a slight agreement.
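If you want this lookup in code, here is a small sketch with the thresholds from Landis and Koch (1977):

```python
def interpret_kappa(kappa: float) -> str:
    """Return the Landis & Koch (1977) label for a kappa value."""
    if kappa < 0.00:
        return "poor agreement"
    if kappa <= 0.20:
        return "slight agreement"
    if kappa <= 0.40:
        return "fair agreement"
    if kappa <= 0.60:
        return "moderate agreement"
    if kappa <= 0.80:
        return "substantial agreement"
    return "almost perfect agreement"

print(interpret_kappa(0.19))  # slight agreement
```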

Calculate Fleiss Kappa with DATAtab

With DATAtab you can easily calculate Fleiss Kappa online. Simply go to datatab.net and copy your own data into the table of the Fleiss Kappa calculator. Then click on the Reliability tab. Under Reliability you can calculate various reliability statistics; depending on how many variables you select and which scale level they have, you will get a suitable suggestion.

Fleiss Kappa is calculated for nominal variables. If your data is recognised as metric, please change the scale level to nominal under Data View.

If you now click on Rater 1 and Rater 2, Cohen's Kappa will be calculated; if you then also click on Rater 3, Fleiss Kappa will be calculated.

Below you can see the calculated Fleiss Kappa.

[Figure: Calculate Fleiss Kappa online with DATAtab]

If you don't know how to interpret the result, just click on Interpretations in Words.

An inter-rater reliability analysis was performed between the dependent samples of Rater 1, Rater 2 and Rater 3. For this purpose, the Fleiss Kappa was calculated, which is a measure of the agreement between more than two dependent categorical samples.

The Fleiss Kappa showed that there was a slight agreement between the samples of Rater 1, Rater 2 and Rater 3 with κ = 0.16.

