Kaplan Meier Curve
This tutorial is about the Kaplan Meier curve. We will discuss what the Kaplan Meier curve is and how you can create it.
The Kaplan Meier curve is used to graphically represent the survival rate or survival function. Here, time is plotted on the x-axis and survival rate on the y-axis.
First, the question arises what the survival rate is. Let's look at it with an example. Let's say you are a dental technician and you want to study the "survival time" of a filling in a tooth.
So your start time is that moment when a person gets a filling at the dentist's office and the end time, the event, is the moment when that filling breaks out. The time between these two events is the focus of your study.
In the Kaplan Meier curve you can now read how likely it is that a filling will last longer than a certain point in time.
In this context, you might be interested, for example, in the probability that your filling will last longer than 5 years. For this, you read off the survival rate at 5 years in the graph. At 5 years, the Kaplan Meier curve gives you a value of 0.7. It is therefore 70% likely that a filling will last longer than 5 years.
This brings us to the question of how you can create a Kaplan Meier curve based on your data.
Calculating the Kaplan-Meier curve
To create a Kaplan-Meier curve, you first need the data of your test subjects.
Let's assume that the filling lasted 3 years for the first test person, 4 years for the second test person, 4 years for the third test person, and so on.
If you wanted to check whether several factors have an influence on the curve, you could click on them here and the Log Rank Test or Cox Regression would be calculated.
Let's first assume that none of the cases are "censored". The data is already arranged so that the shortest survival time is at the top and the longest is at the bottom.
Now we create a second table, which we can use to draw the Kaplan Meier curve.
For this, we look at which times occur in this table and also add the time zero. So we have 0, then 3, 4, 6, 7, 8, 11 and 13. In total we have 10 subjects.
Now we look at how many fillings break out at which time. We enter this in the column m. So at time 0, zero fillings have broken out. At 3 years one filling has broken out, at 4 years two fillings have broken out, at 6 years one filling has broken out. We now do this for all other time points as well.
Next, we look at the number of cases that have survived beyond the respective time plus those cases where the event occurs at that exact time. We enter this in the column n.
So n is the number of fillings that were still intact just before that time, including those that break out at exactly that time.
At zero years we still have all 10 persons. At 3 years we also get 10 for n: 9 people still have their filling intact, and one person's filling broke out exactly at 3 years.
The easiest way to get n is to calculate the previous n value minus the previous m value. So for the 4-year row we get 10 - 1 is equal to 9. Then for the 6-year row 9 - 2 is equal to 7, for the 7-year row 7 - 1 is equal to 6... and so on and so forth.
From the columns n and m we can now calculate the survival rates. For this we simply divide the number of fillings that are still intact after the respective time, i.e. n - m, by the total number, i.e. 10.
Thus at time 0 we get 10 by 10 is equal to 1, after 3 years 9 by 10 is equal to 0.9, after 4 years 7 by 10 is equal to 0.7. We now do the same for all others.
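The table construction described above can be sketched in a few lines of Python. This is only an illustration: the list of durations is hypothetical apart from the first values (3, 4, 4, 6) and the set of time points named in the text, and the function name survival_table is made up for this example. The survival rate is taken as the share of fillings still intact after each time, (n - m) / 10, which reproduces the values 1, 0.9 and 0.7 quoted above.

```python
from collections import Counter

# Hypothetical durations in years for 10 fillings, no censoring.
# Only the first values (3, 4, 4, 6) and the set of time points
# come from the text; the remaining values are assumed.
durations = [3, 4, 4, 6, 7, 8, 8, 11, 11, 13]

def survival_table(durations):
    """Return rows (time, n, m, survival) for uncensored data."""
    total = len(durations)
    events = Counter(durations)        # m: fillings that fail at each time
    times = sorted(set(events) | {0})  # all observed times plus time zero
    rows = []
    n = total                          # everyone is still present at time 0
    for t in times:
        m = events.get(t, 0)
        survival = (n - m) / total     # share still intact after time t
        rows.append((t, n, m, survival))
        n -= m                         # next n = previous n - previous m
    return rows

for t, n, m, s in survival_table(durations):
    print(f"t={t:>2}  n={n:>2}  m={m}  survival={s:.1f}")
```

With real data you would replace the durations list by your own measurements; the loop only assumes that every case ends with the event.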
Draw Kaplan Meier curve
Now we can draw the Kaplan Meier curve. At time 0 we have a value of 1, after 3 years we have a value of 0.9 or 90 percent. After 4 years we get 0.7, after 6 years 0.6, and so on and so forth.
We can now plot these values. At zero we have one, at three we have 0.9, at 4 we have 0.7, at 6 we have 0.6.
We can now read off from the Kaplan Meier curve what percentage of the fillings have not yet broken out after a certain time.
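Reading a value off the curve can also be sketched in code. The Kaplan Meier curve is a step function: the survival rate stays constant between two event times and only drops at an event. The snippet below is a minimal sketch that uses just the four points quoted in the text (0, 3, 4 and 6 years), so it is only meaningful for times up to 6 years; the function name survival_at is hypothetical.

```python
from bisect import bisect_right

# Time points and survival rates quoted in the text (first four rows only).
times = [0, 3, 4, 6]
rates = [1.0, 0.9, 0.7, 0.6]

def survival_at(t, times, rates):
    """Read the step curve: the rate holds until the next event time."""
    if t < times[0]:
        raise ValueError("t lies before the start of the curve")
    index = bisect_right(times, t) - 1  # last time point that is <= t
    return rates[index]

print(survival_at(5, times, rates))    # 0.7
print(survival_at(3.5, times, rates))  # 0.9
```

The lookup at 5 years returns 0.7 because the last drop before 5 years happened at 4 years.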
Next, we look at what to do when censored data is present.
For this purpose, censored data has been added as an example in these three places. If you don't know exactly what censored data is, feel free to check out the survival analysis tutorial.
We now need to incorporate this data into our table for the Kaplan Meier curve.
This is how we do it: We create our m exactly the same as before, looking at how many cases failed at each time.
Now we add column q, in which we enter how many cases were censored at the respective time.
Note that the time when the respective censored case occurred does not get its own row, but is assigned to the previous time.
Let's take a look at this case. The censoring has taken place at time 9. In this table, however, there is no row for nine years, and none is added. Instead, the respective person is counted in column q at time 8.
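The assignment of censored cases to rows can be sketched as follows. This is an illustration under assumptions: the function name censored_per_row is made up, times are assumed to be non-negative, and a case censored exactly at an event time is counted in that row. The event times are those from the text, and the single censoring at 9 years is the case just described.

```python
from bisect import bisect_right
from collections import Counter

def censored_per_row(event_times, censored_times):
    """Count censored cases per table row (column q).

    Each censored time is assigned to the latest row time that is
    less than or equal to it; row times are 0 plus the event times.
    Assumes all times are non-negative.
    """
    rows = sorted(set(event_times) | {0})
    q = Counter()
    for c in censored_times:
        q[rows[bisect_right(rows, c) - 1]] += 1
    return {t: q[t] for t in rows}

# Event times from the text and one censoring at 9 years.
print(censored_per_row([3, 4, 6, 7, 8, 11, 13], [9]))
```

The censoring at 9 years ends up in the row for 8 years, and all other rows get a q of zero.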
Now we can again calculate the values for the survival time curve. If censored data are available, this is a little more complex.
For this we write down the values in the first step. We get these values by calculating (n - m) / n. So in this row, for example, we get with (12 - 2) by 12 the value 10/12.
The calculation of the actual value is done iteratively. To do this, we multiply the result from the previous row by the value we just calculated.
So, in the first row we get 1, now we calculate 12/13 times 1, which is equal to 0.923. In the next row we calculate 10/12 times 0.923 and get a value of 0.769. We take this value again for the next row.
We now do this for all rows. Afterwards we can draw the Kaplan Meier curve in the same way as before with this data.
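The iterative calculation can be sketched in Python as follows. The data are hypothetical: 13 cases, three of them censored, chosen so that the first two rows give 12/13 and 10/12 as in the text. The function name kaplan_meier and the 1/0 status coding (1 for event, 0 for censored) are assumptions of this sketch. Here n is counted directly as the number of cases whose time is at least t, which means a case censored exactly at an event time still counts as at risk at that time.

```python
def kaplan_meier(times, observed):
    """Survival table for data that may contain censored cases.

    times:    duration for each case
    observed: 1 if the event occurred, 0 if the case is censored
    Returns rows (time, n, m, (n - m) / n, survival).
    """
    event_times = sorted({t for t, e in zip(times, observed) if e == 1})
    rows = [(0, len(times), 0, 1.0, 1.0)]  # row for time zero
    survival = 1.0
    for t in event_times:
        n = sum(1 for d in times if d >= t)  # cases still at risk at t
        m = sum(1 for d, e in zip(times, observed) if d == t and e == 1)
        factor = (n - m) / n
        survival *= factor                   # previous result times the new factor
        rows.append((t, n, m, factor, survival))
    return rows

# Hypothetical data: 13 fillings, censored at 5, 9 and 12 years.
times    = [3, 4, 4, 5, 6, 7, 8, 8, 9, 11, 11, 12, 13]
observed = [1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 0, 1]

for t, n, m, factor, s in kaplan_meier(times, observed):
    print(f"t={t:>2}  n={n:>2}  m={m}  factor={factor:.3f}  survival={s:.3f}")
```

For the rows at 3 and 4 years this prints survival values of 0.923 and 0.769, matching the worked example; the later rows depend on the assumed data.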
Now it could be that there are two different materials for the filling and you want to check if the material has an influence on the survival time. If you want to know how to do this, feel free to watch my next video on the Log Rank Test.
Create Kaplan Meier curve with DATAtab
To create the Kaplan Meier curve with DATAtab, simply go to the statistics calculator on datatab.de and copy your own data into the table.
Now click on "Plus" and select Survival Analysis. Here you can create the Kaplan Meier curve online. If you select the variable "Time", DATAtab will create the Kaplan Meier curve and you will get the survival table. If you do not click on a status, DATAtab assumes that the data is not censored. If your data is censored, also click on the variable that contains the information which case is censored and which is not. 1 stands for event occurred and 0 stands for censored. Now you will get the appropriate results.