# Survival Analysis

This tutorial is about survival time analysis. We start with the question of what a survival time analysis is, then turn to the important point of what the censoring of data means, and afterwards briefly discuss the Kaplan-Meier curve, the log-rank test and Cox regression (more about these in the separate tutorials).

## What is a survival time analysis?

Survival time analysis is a group of statistical methods in which the variable under study is the time until an event occurs.

What does "time to occurrence of an event" mean?

In survival time analysis, a variable is considered that has a start time and, when a certain event occurs, an end time. The time between the start time and the event is the focus of the survival time analysis. For example, time can be measured in days, weeks or months.
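As a minimal sketch of this idea, the time between a start date and an event date can be computed as follows. The function name `duration_in_days` and the dates are made up for illustration.

```python
from datetime import date

def duration_in_days(start: date, event: date) -> int:
    """Number of days between the start time and the event."""
    if event < start:
        raise ValueError("the event cannot occur before the start time")
    return (event - start).days

# Hypothetical dates, for illustration only.
start = date(2020, 1, 15)
event = date(2020, 3, 15)

days = duration_in_days(start, event)
print(days)                 # 60
print(round(days / 7, 1))   # 8.6 (the same time expressed in weeks)
```

Which unit is used does not matter for the analysis, as long as the same unit is used for all cases.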

## Use cases of the survival time analysis

An example would be to look at the time between a drug withdrawal and the relapse of the person in question. The start time would then be the end of withdrawal and the event considered would then be the relapse. For example, you might be interested in whether different types of treatment have an impact on the time to relapse.

As the name "survival time analysis" implies, there is also a classic example: the time until death after a disease. Here, the start time is the diagnosis of the disease and the end time is death. Often the question of interest is whether a certain drug has an influence on the survival time.

Of course, the event does not have to be a negative one. You could also look at the time until a person returns to work after a burnout, for example.

Furthermore, the object under investigation does not have to be a person either. In engineering, for example, a common question is how long a component will last in a test without failing. Here, one could then tweak various parameters and see if they have an effect on the survival time of the object.

## Survival time

The time considered does not have to have anything to do with actual survival. Nevertheless, one speaks of the "survival time" and of survival time analysis. The next question is how exactly a survival time analysis is performed. We will now take a look at an example.

## Survival Time Analysis Example

Let's assume you are a dental technician and want to analyze the "survival time" of a filling in a tooth.

So your start time is that moment when a person gets a filling at the dentist. The end time, i.e. the event, is the moment when this filling breaks out. You are now interested in the time between these two events.

First, of course, you need subjects so that you have data to evaluate. For each test person you can now note down the time that passes until the filling breaks out.

You will probably ask yourself the question: What happens if the filling of a test person does not break out at all? Or what happens if a person moves, changes dentists, and it is simply not known when the filling will break out?

All these cases are summarized under the term "censoring". Now let's take a look at what exactly is meant by this.

## Censored data

First of all, it is important to remember that a study cannot last indefinitely, but extends over a limited period of time. For resource reasons (time, financial, etc.) and simply because you want to publish the results at some point, each study has a clear start and end date.

If a filling is inserted within this time period, breaks out again within this time period, and this is documented, the case is complete: the event has occurred and its time is known.

However, it is also possible that a filling is inserted and then the end of the study is reached before the event occurs.

Or it can happen that a subject decides not to continue with the study. In both cases, you do not know when or if the event under consideration has occurred.

It can also happen that another event occurs, which is not considered in the study. For example, the patient could die or even lose the entire tooth. In both cases, the event considered, that the filling breaks out, can no longer occur.

Of course, it can also happen that the person does not realize that the filling has broken out and this is only discovered during the next routine examination.

All in all, there are many cases in which the data are incomplete because the exact time of the event is not known. Such data are called "censored data". You will learn how to deal with these data in the tutorial on the Kaplan-Meier curve. Now let's take a look at the most popular methods of survival time analysis.
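The cases described above can be sketched in code. In this minimal example, each subject is reduced to an observed time and a status, where 1 means the event occurred and 0 means the case is censored. The function `observe`, the study end date and all subject dates are assumptions made for illustration.

```python
from datetime import date
from typing import Optional, Tuple

STUDY_END = date(2023, 12, 31)  # hypothetical end of the study

def observe(start: date, event: Optional[date], last_seen: date,
            study_end: date = STUDY_END) -> Tuple[int, int]:
    """Return (time in days, status); status 1 = event occurred, 0 = censored."""
    # Nothing can be observed after the last contact or after the study ends.
    cutoff = min(last_seen, study_end)
    if event is not None and event <= cutoff:
        return (event - start).days, 1
    # Event not observed: only the time under observation is known.
    return (cutoff - start).days, 0

# Filling breaks out during the study: event observed.
print(observe(date(2020, 1, 1), date(2021, 1, 1), date(2023, 12, 31)))  # (366, 1)
# Study ends before the filling breaks out: censored.
print(observe(date(2022, 1, 1), None, date(2023, 12, 31)))              # (729, 0)
# Subject leaves the study after half a year: censored.
print(observe(date(2020, 1, 1), None, date(2020, 7, 1)))                # (182, 0)
```

For a censored case, the recorded time only says that the filling lasted at least that long.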

## Methods of survival time analysis

The three most common methods of survival time analysis are (1) Kaplan-Meier survival time curves, (2) the log-rank test, and (3) Cox regression.

We will now briefly cover all three of these methods, and then I will show you how to easily calculate them online using DATAtab. For each of the three methods there is a detailed separate tutorial with calculation examples.

### Kaplan-Meier survival time curves

The Kaplan-Meier curve is used to graphically represent the survival rate or survival function. Here, time is plotted on the x-axis and the survival rate is plotted on the y-axis.

What is the survival rate? At this point, we go back to the tooth filling example. Suppose we have collected data on how long it takes for a filling to break out. In the Kaplan-Meier curve, you can now read how likely it is that a filling will last longer than a certain time.

In this context, you might be interested, for example, in the probability that your filling will last longer than 5 years. To do this, simply move to 5 years on the x-axis of the graph and see what the survival rate (y-axis) is. In this example, the Kaplan-Meier curve gives you a value of 0.7 at 5 years.

So it is 70% likely that a filling will last longer than 5 years. Of course, the data are purely fictitious. If you are interested in how the Kaplan-Meier curve is created from existing data, please see the separate tutorial on the Kaplan-Meier curve.

Now you might be interested in whether this curve differs for different filling materials, e.g. whether one filling material is better than another. The log-rank test will help you answer this question.
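How such a curve is estimated from data with censored cases can be sketched in a few lines of Python. This is a minimal illustration of the standard Kaplan-Meier product-limit estimate, not the implementation used by any particular tool. The function names and the data are invented for this example.

```python
from collections import Counter
from typing import List, Sequence, Tuple

def kaplan_meier(times: Sequence[float],
                 events: Sequence[int]) -> List[Tuple[float, float]]:
    """Return (time, survival rate) pairs at each time where an event occurred.

    events[i] is 1 if the event occurred at times[i], 0 if the case is censored.
    """
    event_counts = Counter(t for t, e in zip(times, events) if e == 1)
    exit_counts = Counter(times)  # events and censored cases leaving the risk set
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(exit_counts):
        d = event_counts.get(t, 0)
        if d > 0:
            # Multiply by the share of cases at risk that "survive" time t.
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= exit_counts[t]
    return curve

def survival_at(curve: List[Tuple[float, float]], t: float) -> float:
    """Read the step curve: estimated probability of lasting longer than t."""
    rate = 1.0
    for time, value in curve:
        if time > t:
            break
        rate = value
    return rate

# Invented data: years until the filling broke out (1) or was last seen intact (0).
times = [2, 3, 3, 5, 6, 8]
events = [1, 0, 1, 1, 0, 1]

curve = kaplan_meier(times, events)
for t, s in curve:
    print(f"{t} years: {s:.3f}")
# 2 years: 0.833
# 3 years: 0.667
# 5 years: 0.444
# 8 years: 0.000
print(f"{survival_at(curve, 4):.3f}")  # 0.667
```

Censored cases never lower the curve themselves. They only reduce the number of cases still at risk at later times.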

### Log Rank Test

The log-rank test compares the distribution of the time until an event occurs in two or more independent samples. For example, you might be interested in whether there is a difference in the survival time of two different materials. In this example, you use material A for half of the subjects and material B for the other half.

The log-rank test now gives you an answer to the following question: Is there a significant difference between the two curves?

Or in other words: Does the filling material have an influence on the "survival time" of the filling?
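For two groups, the calculation behind this test can be sketched as follows. At every time where an event occurs, the observed number of events in group A is compared with the number expected if both groups had the same survival curve. This is a minimal sketch of the standard two-group log-rank statistic. The function name and the data are invented, and real software may differ in details.

```python
import math
from typing import Sequence, Tuple

def logrank_test(times_a: Sequence[float], events_a: Sequence[int],
                 times_b: Sequence[float], events_b: Sequence[int]) -> Tuple[float, float]:
    """Return (chi-square statistic, p-value) of the two-group log-rank test."""
    a = list(zip(times_a, events_a))
    b = list(zip(times_b, events_b))
    event_times = sorted({t for t, e in a + b if e == 1})

    observed_a = expected_a = variance = 0.0
    for t in event_times:
        n_a = sum(1 for x, _ in a if x >= t)  # at risk in group A just before t
        n_b = sum(1 for x, _ in b if x >= t)  # at risk in group B just before t
        d_a = sum(1 for x, e in a if x == t and e == 1)
        d_b = sum(1 for x, e in b if x == t and e == 1)
        n, d = n_a + n_b, d_a + d_b
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)

    if variance == 0:
        return 0.0, 1.0  # no information to distinguish the groups
    chi2 = (observed_a - expected_a) ** 2 / variance
    # Upper tail of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Invented data: years until the filling broke out; all events observed (status 1).
chi2, p = logrank_test([1, 2, 3], [1, 1, 1], [4, 5, 6], [1, 1, 1])
print(round(chi2, 2))  # 5.05
print(p < 0.05)        # True
```

A small p-value speaks against the null hypothesis that both groups share the same distribution of the time until the event.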

### Cox Regression

What if you now want to check whether there are other parameters that influence the curve? Let's say you want to know not only whether the material has an influence on the survival time, but also whether the age of the subjects influences it. To answer this question, Cox regression is the appropriate method.
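For orientation, the Cox model is usually written in the following general form for the two variables of the filling example. The symbols are not taken from this tutorial and are introduced here only for illustration: $h_0(t)$ is a baseline hazard, $x_{\text{material}}$ codes the material (e.g. 0 for A and 1 for B), and $x_{\text{age}}$ is the age of the subject.

```latex
h(t \mid x_{\text{material}}, x_{\text{age}})
  = h_0(t)\,\exp\!\left(\beta_1 x_{\text{material}} + \beta_2 x_{\text{age}}\right)
```

The quantity $\exp(\beta_1)$ is then read as the hazard ratio between the two materials at the same age. A coefficient of 0 (hazard ratio 1) corresponds to no influence of that variable.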

## Calculate survival time analysis with DATAtab

With DATAtab you can easily calculate a survival time analysis online. Just go to the (1) Survival Analysis Calculator, (2) copy your own data into the table, and (3) click on "Plus" and then on Survival Analysis.

In the example data, we first have a column with the "Time" and then a column "Status" that tells us whether the event occurred or the case is censored. Here 1 stands for "occurred" and 0 for "censored". Then we have the variable "Material" with the two materials A and B, and we have the "Age". Depending on what you select here, the appropriate methods will be calculated.

If you select only the variable "Time", the Kaplan-Meier survival curve will be displayed and you will get the corresponding survival time table. If no status variable is specified, the calculation assumes that no case is censored. If some of your cases are censored, simply also select the variable "Status", which contains the information about whether the event has occurred or not.
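The data layout described above, and the rule that all cases count as uncensored when no status variable is given, can be sketched like this. It is only an illustration of the idea and not DATAtab's implementation. The rows are invented and the helper `time_and_status` is hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Invented rows in the layout described above (Status: 1 = occurred, 0 = censored).
rows: List[Dict[str, object]] = [
    {"Time": 5.0, "Status": 1, "Material": "A", "Age": 34},
    {"Time": 7.5, "Status": 0, "Material": "A", "Age": 51},
    {"Time": 3.0, "Status": 1, "Material": "B", "Age": 46},
    {"Time": 9.0, "Status": 0, "Material": "B", "Age": 29},
]

def time_and_status(rows: List[Dict[str, object]],
                    time_col: str = "Time",
                    status_col: Optional[str] = None) -> Tuple[List[float], List[int]]:
    """Extract times and statuses; without a status column, no case is censored."""
    times = [float(r[time_col]) for r in rows]
    if status_col is None:
        statuses = [1] * len(rows)  # assumption: every event was observed
    else:
        statuses = [int(r[status_col]) for r in rows]
    return times, statuses

print(time_and_status(rows))
# ([5.0, 7.5, 3.0, 9.0], [1, 1, 1, 1])
print(time_and_status(rows, status_col="Status"))
# ([5.0, 7.5, 3.0, 9.0], [1, 0, 1, 0])
```

Leaving out the status when cases are in fact censored treats every recorded time as an event time, which makes the survival rates look lower than they are.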

If you now select another factor, e.g. the "Material", the log-rank test will be calculated. Then you get the null and the alternative hypothesis as well as the p-value for the log-rank test.

The null hypothesis is: There is no difference between groups A and B in the distribution of the time until the event occurs.

If you go further down in the results section, you will find the p-value. If you don't know exactly how this is interpreted, you can simply click on "Summary in words":

A log-rank test was calculated to see if there was a difference between groups A and B in terms of the distribution of time until the event occurs.

For the present data, the log-rank test showed that there is a difference between the groups in terms of the distribution of the time until the event occurs, p < 0.001. The null hypothesis is thus rejected.

If, on the other hand, both the "Material" and the "Age" are selected, the Cox regression is calculated. Then you can read whether the factors have a significant influence or not. You can find more information about this in the Cox Regression tutorial.
