Survival Analysis

This tutorial is about survival analysis (time-to-event analysis). We start with what survival analysis is, then turn to the important concept of censoring, and finally discuss the Kaplan-Meier curve, the log-rank test and Cox regression (each is covered in more detail in a separate tutorial).

What is a survival time analysis?

Survival time analysis is a group of statistical methods in which the variable under study is the time until an event occurs. What does "time to occurrence of an event" mean?

Survival time analysis considers a variable that has a start time and, when a particular event occurs, an end time. The time between the start time and the event is the focus of survival analysis. For example, time may be measured in days, weeks or months.

Use cases for survival time analysis

An example would be to look at the time between a drug withdrawal and the person's relapse. The start time would then be the end of the withdrawal and the event considered would be the relapse. For example, you might be interested in whether different types of treatment have an effect on the time to relapse.

As the name "survival time analysis" implies, there is also a classic example: the time to death after a disease has been diagnosed. Here, the start time is the diagnosis and the end time is death. It is often of great interest to know whether a particular drug has an effect on survival time.

Of course, the event does not have to be a negative one - you could look at the time it took to return to work after a burnout, for example.

Moreover, the object under investigation need not be a human being. In engineering, for example, a common question is how long a component will last in a test without failing. In this case, different parameters could be varied to see if they have an effect on the object's survival time.

Survival time (Time-to-Event)

The time considered may have nothing to do with the actual "survival time", but it is still called survival time and survival time analysis.

Survival Time Analysis Example

How exactly is a survival analysis performed? Let us look at an example. Let's say you are a dental technician and you want to analyse the "survival time" of a filling in a tooth.

So your start time is the moment a person goes to the dentist for a filling. The end time, or event, is the moment when the filling breaks out. You are now interested in the time between these two events.

First, of course, you need subjects so that you have data to evaluate. For each test person you can now note down the time that passes until the filling breaks out.

You will probably ask yourself the question: What happens if the filling of a test person does not break out at all? Or what happens if a person moves, changes dentists, and it is simply not known when the filling will break out?

All these cases are summarised under the term "censoring". Let's have a look at what it means.

Censored data

First of all, it is important to remember that a study cannot go on indefinitely, but is limited in time. For reasons of resources (time, money, etc.) and simply because you want to publish the results at some point, each study has a clear start and end date.

If a filling is inserted within this time period, breaks out again before the study ends, and this is documented, the case is complete: the event has been observed.

However, it is also possible that a filling is inserted and then the end of the study is reached before the event occurs. Or it can happen that a subject decides not to continue with the study. In both cases, you do not know when or if the event under consideration has occurred.

Furthermore, another event that is not considered in the study can occur. For example, the patient could die or lose the whole tooth. In both cases, the event under consideration, the filling breaking out, can no longer occur.

It can also happen that the patient does not notice that the filling has broken out, so it is only discovered at the next routine check-up. In that case you only know that the event occurred sometime between two visits.
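Whatever the reason, each subject's follow-up is usually reduced to two values: the observed time and a status indicator (1 = event observed, 0 = censored). The Python sketch below shows one way to derive them for the filling example. The function name `encode_follow_up` and its parameters are hypothetical, and the sketch makes a simplifying assumption: a break-out that is only found at a check-up is recorded at the date it was found, which ignores the "sometime between two visits" uncertainty.

```python
from datetime import date
from typing import Optional, Tuple

def encode_follow_up(inserted: date, broke_out: Optional[date],
                     last_seen: Optional[date], study_end: date) -> Tuple[int, int]:
    """Return (time_in_days, status) for one filling.

    status = 1: the filling broke out during the study (event observed)
    status = 0: censored (study ended, subject dropped out, or a competing
                event such as tooth loss ended follow-up first)
    """
    if broke_out is not None and broke_out <= study_end:
        return (broke_out - inserted).days, 1
    # No observed event: follow-up ends at last contact or at study end, whichever is earlier
    end = study_end if last_seen is None else min(last_seen, study_end)
    return (end - inserted).days, 0

study_end = date(2024, 12, 31)
print(encode_follow_up(date(2022, 1, 1), date(2023, 1, 1), None, study_end))  # event
print(encode_follow_up(date(2023, 1, 1), None, date(2023, 3, 1), study_end))  # dropout
```

A filling that is still intact at the end of the study, or whose patient drops out, is not discarded: it contributes its follow-up time with status 0.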

All in all, there are many cases where data is not fully available. This data is called "censored data". You will learn how to deal with this data in the Kaplan-Meier curve tutorial. Now let's look at the most common methods of survival analysis.

Methods of survival time analysis

The three most common methods of survival time analysis are (1) Kaplan-Meier survival curves, (2) the log-rank test, and (3) Cox regression.

We will now briefly cover all three of these areas, and then I will show you how to easily calculate these methods online using DATAtab. For each of the three methods there is a detailed separate tutorial with calculation examples.

Kaplan Meier survival time curves

The Kaplan Meier curve is used to graphically represent the survival rate or survival function. Here, time is plotted on the x-axis and survival rate is plotted on the y-axis.

What is the survival rate? At this point, we go back to the tooth filling example. Suppose we have collected data on how long it takes for a filling to break out. From the Kaplan-Meier curve, you can now read how likely it is that a filling will last longer than a certain time.
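To show where the values on the y-axis come from, here is a minimal Python sketch of the Kaplan-Meier (product-limit) estimator. It assumes the data are given as two lists, `times` and `events` (1 = filling broke out, 0 = censored); the names `kaplan_meier` and `survival_at` are hypothetical helpers for illustration, and the example data are made up. Subjects censored at a given time are counted as still at risk at that time, which is the usual convention.

```python
from collections import Counter
from typing import List, Tuple

def kaplan_meier(times: List[float], events: List[int]) -> List[Tuple[float, float]]:
    """Return (t, S(t)) for each distinct time at which at least one event occurs."""
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    leaving = Counter(times)  # events + censorings that leave the risk set at time t
    n_at_risk = len(times)
    survival = 1.0
    steps = []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d > 0:
            # Product-limit step: multiply by the conditional probability of surviving t
            survival *= 1 - d / n_at_risk
            steps.append((t, survival))
        # Subjects censored at t are counted as at risk at t and removed afterwards
        n_at_risk -= leaving[t]
    return steps

def survival_at(steps: List[Tuple[float, float]], t: float) -> float:
    """Read S(t) off the step function; S = 1 before the first event."""
    s = 1.0
    for time, value in steps:
        if time > t:
            break
        s = value
    return s

# Fictitious follow-up times in years (1 = filling broke out, 0 = censored)
times = [1, 2, 2, 3, 4, 5, 6, 7, 8, 9]
events = [1, 0, 1, 1, 0, 1, 0, 1, 0, 0]
steps = kaplan_meier(times, events)
print(steps)
print(survival_at(steps, 5))
```

The curve only drops at event times; censored fillings reduce the number at risk but do not lower the curve by themselves.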

In this context, you might be interested, for example, in the probability that your filling will last longer than 5 years. To do this, simply move to 5 years on the x-axis of the graph and see what the survival rate (y-axis) is. At 5 years, the Kaplan Meier curve gives you a value of 0.7.

So there is a 70% probability that a filling lasts longer than 5 years. Of course, the data are purely fictitious. If you are interested in how the Kaplan-Meier curve is created from existing data, please see the full Kaplan-Meier tutorial. You might now want to know whether this curve differs between filling materials, i.e. whether one filling material is better than another. The log-rank test answers this question.

Median survival, the time by which half of the subjects have experienced the event, is a useful summary measure and is often reported in research. It can be read from the Kaplan-Meier curve as the time at which the curve first drops to 0.50 or below on the y-axis. If the curve never falls that far, as with these data, the median survival cannot be determined.
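Reading the median off the curve can be written as a small function. This sketch assumes the curve is given as a list of (time, survival) steps, such as the output of a Kaplan-Meier estimator; `median_survival` is a hypothetical helper name. It uses the common convention that the median is the first time at which the estimated survival is 0.5 or lower.

```python
from typing import List, Optional, Tuple

def median_survival(steps: List[Tuple[float, float]]) -> Optional[float]:
    """First time at which the Kaplan-Meier estimate is 0.5 or lower.

    Returns None if the curve never gets that low (median not reached).
    """
    for t, s in steps:
        if s <= 0.5:
            return t
    return None

# (time, survival) steps from fictitious Kaplan-Meier curves
print(median_survival([(1, 0.9), (2, 0.8), (3, 0.69), (5, 0.55), (7, 0.37)]))
print(median_survival([(1, 0.9), (3, 0.7)]))
```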


Full tutorial Kaplan Meier curve

Log Rank Test

The log-rank test compares the distribution of the time until an event occurs between two or more independent samples. For example, you might be interested in whether the survival time of fillings differs between two materials. In this example, half of the subjects receive material A and the other half material B.

The Log Rank Test now gives you an answer to the following question: Is there a significant difference between the two curves? Or in other words: Does the filling material have an influence on the "survival time" of the filling?

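The null hypothesis is therefore: there is no difference between the groups in the distribution of the time until the event occurs, i.e. neither group tends to experience the event earlier than the other.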
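To see what the test actually computes, here is a minimal Python sketch for two groups. At every time at which an event occurs, it compares the number of events observed in group A with the number expected if the null hypothesis were true, and accumulates a variance term (with the usual correction for tied event times). The squared sum of differences divided by the variance is compared with a chi-square distribution with 1 degree of freedom. The function name and input layout (separate time and event lists per group) are assumptions for illustration; the p-value uses the identity P(X > x) = erfc(sqrt(x/2)) for a chi-square variable X with 1 df, so only the standard library is needed.

```python
import math
from typing import List, Tuple

def log_rank_test(times_a: List[float], events_a: List[int],
                  times_b: List[float], events_b: List[int]) -> Tuple[float, float]:
    """Two-group log-rank test; returns (chi-square statistic, p-value) with 1 df."""
    # Pool both samples, remembering the group (0 = A, 1 = B)
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e == 1})

    obs_minus_exp = 0.0  # sum over event times of (observed - expected) in group A
    variance = 0.0
    for t in event_times:
        at_risk = [g for ti, _, g in data if ti >= t]
        n = len(at_risk)
        n_a = at_risk.count(0)
        d = sum(1 for ti, e, _ in data if ti == t and e == 1)
        d_a = sum(1 for ti, e, g in data if ti == t and e == 1 and g == 0)
        obs_minus_exp += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)

    if variance == 0:
        raise ValueError("variance is zero; the test statistic is undefined")
    chi2 = obs_minus_exp ** 2 / variance
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return chi2, p_value

# Fictitious data: all fillings of material A break out before those of material B
chi2, p = log_rank_test([1, 2, 3], [1, 1, 1], [4, 5, 6], [1, 1, 1])
print(chi2, p)
```

With more than two groups, the same idea extends to a chi-square test with k - 1 degrees of freedom, which needs the full covariance matrix of the observed-minus-expected counts; this sketch does not cover that case.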

Full tutorial Log Rank Test

Cox Regression

What if you want to check whether other parameters also influence the curve? Let's say you want to know not only whether the material influences the survival time, but also whether the age of the subjects does. Cox regression is the appropriate method for this question.
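Cox regression models the hazard, the instantaneous risk of the event, as h(t | x) = h0(t) · exp(β1·x1 + β2·x2 + ...), where h0(t) is an unspecified baseline hazard and exp(β) is the hazard ratio for a one-unit increase in x. The coefficients are estimated by maximising the partial likelihood, which only uses the order of the event times. The sketch below does this with Newton-Raphson for a single covariate, for example the material coded as 0/1. The function name is hypothetical, tied event times are handled with the Breslow approximation, and it assumes the groups are not perfectly separated (otherwise β grows without bound). For several covariates such as material and age, a statistics package (e.g. the survival package in R or lifelines in Python) is the practical choice.

```python
import math
from typing import List, Tuple

def cox_one_covariate(times: List[float], events: List[int], x: List[float],
                      max_iter: int = 50, tol: float = 1e-10) -> Tuple[float, float]:
    """Newton-Raphson fit of h(t|x) = h0(t) * exp(beta * x) via the Cox partial
    likelihood (Breslow ties). Returns (beta, approximate standard error)."""
    beta = 0.0
    information = 0.0
    for _ in range(max_iter):
        score = 0.0        # first derivative of the log partial likelihood
        information = 0.0  # minus the second derivative
        for ti, ei, xi in zip(times, events, x):
            if ei != 1:
                continue  # censored subjects only appear in risk sets
            risk = [xj for tj, xj in zip(times, x) if tj >= ti]
            w = [math.exp(beta * xj) for xj in risk]
            s0 = sum(w)
            s1 = sum(wj * xj for wj, xj in zip(w, risk))
            s2 = sum(wj * xj * xj for wj, xj in zip(w, risk))
            score += xi - s1 / s0
            information += s2 / s0 - (s1 / s0) ** 2
        step = score / information
        beta += step
        if abs(step) < tol:
            break
    return beta, 1 / math.sqrt(information)

# Fictitious data: material coded 0 = A, 1 = B; all fillings broke out
times = [1, 2, 3, 4, 5, 6]
events = [1, 1, 1, 1, 1, 1]
material = [1, 0, 1, 0, 1, 0]
beta, se = cox_one_covariate(times, events, material)
print(beta, math.exp(beta), se)
```

Here exp(beta) estimates how much the hazard changes when moving from x = 0 to x = 1; a value above 1 means the event tends to occur earlier in that group.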

Full tutorial Cox Regression

Calculate survival time analysis with DATAtab


With DATAtab you can easily calculate a survival time analysis online. Just go to the (1) Survival Analysis Calculator, (2) copy your own data into the table, and (3) click on "Plus" and then on Survival Analysis.

The example data contain a column with the "Time" and a column "Status" that tells us whether the event occurred or the case is censored: 1 stands for "occurred" and 0 for "censored". In addition, there is the variable "Material" with the two materials A and B, and the variable "Age". Depending on which variables you select, the appropriate methods are calculated.

If you select only the variable "Time", the Kaplan-Meier Survival Curve will be displayed and you will get the corresponding survival time table. If no variable is specified with the status, the calculation assumes that no case is censored. If this is not the case, you can simply click on the variable "Status", which contains the information about whether the event has occurred or not.
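The behaviour described above can be illustrated with a small parser: when a status column is present it is used, otherwise every case is treated as an observed event. This is a hypothetical Python sketch, assuming the table is available as CSV text with the column names used in this example ("Time", "Status", "Material", "Age"); it is not how DATAtab works internally.

```python
import csv
import io
from typing import List, Tuple

def read_survival_table(text: str, time_col: str = "Time",
                        status_col: str = "Status") -> List[Tuple[float, int]]:
    """Parse CSV text into (time, status) pairs; status 1 = event, 0 = censored.

    If the status column is missing, every case is treated as an observed event.
    """
    pairs = []
    for row in csv.DictReader(io.StringIO(text)):
        time = float(row[time_col])
        status = int(row[status_col]) if status_col in row else 1
        pairs.append((time, status))
    return pairs

with_status = "Time,Status,Material,Age\n12,1,A,34\n30,0,B,51\n"
without_status = "Time,Material\n5,A\n8,B\n"
print(read_survival_table(with_status))
print(read_survival_table(without_status))
```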

If you now select another factor, e.g. "Material", the log-rank test is calculated. You then get the null and alternative hypotheses as well as the p-value for the log-rank test.

The null hypothesis is: There is no difference between groups A and B in the distribution of the time until the event occurs.

If you go further down in the results section, you will find the p-value. If you don't know exactly how this is interpreted, you can simply click on "Summary in words":

A log-rank test was calculated to see if there was a difference between groups A and B in terms of the distribution of time until the event occurs.

For the present data, the log-rank test showed that there is a difference between the groups in terms of the distribution of the time until the event occurs, p < 0.001. The null hypothesis is thus rejected.

If, on the other hand, both "Material" and "Age" are selected, a Cox regression is calculated. You can then read whether each factor has a significant influence. You can find more information about this in the Cox Regression tutorial.
