Cox Regression (Cox Proportional Hazards Survival Regression)

What is Cox Proportional Hazards Survival Regression, or Cox Regression for short? Cox regression is used in survival time analysis to determine the influence of different variables on survival time.

The variables can be any mixture of continuous, binary, or categorical data. The Cox proportional hazards model is then used to determine the effect on survival time.

Cox Regression allows us to determine the effects of multiple independent variables on a time-to-event outcome, either to test hypotheses about the independent variables or to build a predictive model.

Survival time analysis

What is survival analysis? In survival time analysis, the survival times of test subjects are recorded and a survival curve is generated. Usually, the subjects have a particular disease.

The survival curve then shows what proportion of the subjects remain alive over time. The time under consideration does not have to be a literal survival time; any time until an event can be analyzed. Nevertheless, one still speaks of "survival time" and "survival time analysis".

Survival time analysis therefore considers a variable that has a start time and an end time, where the end time is the moment at which a certain event occurs.

The time between the start time and the event is considered in the survival time analysis. This can be measured in days, weeks or months, for example.

Censoring

There is now the problem that a study cannot last indefinitely. This results from limited time and financial resources and from the fact that one would like to publish the results at some point. Therefore, each study has a start date and an end date. If the event has not been observed for a case, for example because the study ended first, there is no clear event date and the case is referred to as "censored".
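As a small illustration of how a survival time and a censoring indicator could be derived from raw dates, here is a minimal Python sketch. The dates, the study end date, and the function name `to_time_and_status` are all assumptions made up for this example; they are not part of the tutorial's data.

```python
from datetime import date

STUDY_END = date(2023, 12, 31)  # hypothetical end date of the study

# Hypothetical subjects: (start date, event date or None if no event was observed)
subjects = [
    (date(2023, 1, 10), date(2023, 6, 9)),
    (date(2023, 2, 1), None),
    (date(2023, 3, 15), date(2023, 11, 30)),
]


def to_time_and_status(start, event, study_end=STUDY_END):
    """Return (time in days, status), where status 1 = event observed, 0 = censored."""
    if event is not None and event <= study_end:
        return (event - start).days, 1
    # No event seen before the study ended: only the follow-up time is known.
    return (study_end - start).days, 0


records = [to_time_and_status(start, event) for start, event in subjects]
for time_in_days, status in records:
    print(time_in_days, "days,", "event" if status == 1 else "censored")
```

For a censored case the recorded time is only a lower bound on the true time to the event, which is why the status has to be kept alongside the time.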

Several methods have been developed to deal with this issue. You are welcome to have a look at the tutorial on the Kaplan Meier curve.
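To give a rough idea of how such a method uses censored cases, the following sketch implements the Kaplan Meier (product-limit) estimator in plain Python. The data and the function name `kaplan_meier` are hypothetical and only serve as an illustration.

```python
def kaplan_meier(times, events):
    """Return a list of (event time, estimated survival probability).

    times  : observed times (time to event or time to censoring)
    events : 1 if the event was observed at that time, 0 if censored
    """
    survival = 1.0
    curve = []
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    for t in event_times:
        at_risk = sum(1 for u in times if u >= t)  # still under observation at t
        n_events = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        survival *= 1 - n_events / at_risk
        curve.append((t, survival))
    return curve


# Hypothetical data: five subjects, two of them censored
example_times = [2, 3, 3, 5, 8]
example_events = [1, 1, 0, 1, 0]
for t, s in kaplan_meier(example_times, example_events):
    print(f"t = {t}: estimated survival = {s:.2f}")
```

Censored subjects never lower the curve themselves, but they count as "at risk" up to the time they were last observed, so their information is not thrown away.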

Cox Regression Example

Let's go back to the Cox regression. For example, if you want to analyze the survival time after the detection of a disease, you are often not interested in the survival time itself, but in what influences the survival time.

So we want to know if the survival time depends on one or more factors, called "predictors" or "independent variables".

For simple situations with a single factor that has only two values, the Log Rank Test is used, for example if you want to test whether there is a difference in survival time when two different drugs are given.
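The comparison of two groups can be sketched in plain Python as follows. This is a minimal illustration of the standard two-group log-rank statistic, not the implementation of any particular tool; the data and the function name `logrank_test` are assumptions for this example.

```python
import math


def logrank_test(times, events, groups):
    """Two-group log-rank test; returns (chi-square statistic, p-value) with 1 df.

    groups must contain 0 or 1 for each subject.
    """
    observed = expected = variance = 0.0
    for t in sorted({t for t, e in zip(times, events) if e == 1}):
        n = sum(1 for u in times if u >= t)  # at risk in both groups
        n1 = sum(1 for u, g in zip(times, groups) if u >= t and g == 1)
        d = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        d1 = sum(1 for u, e, g in zip(times, events, groups)
                 if u == t and e == 1 and g == 1)
        observed += d1
        expected += d * n1 / n  # expected events in group 1 if both groups are alike
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    chi2 = (observed - expected) ** 2 / variance
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return chi2, p_value


# Hypothetical data: time in months, status (1 = event, 0 = censored), drug (0 or 1)
times = [6, 7, 10, 15, 19, 25, 9, 13, 18, 23, 28, 32]
events = [1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1]
drugs = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
chi2, p_value = logrank_test(times, events, drugs)
print(f"chi-square = {chi2:.3f}, p = {p_value:.3f}")
```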

If you want to include the age of the subjects, a special type of regression is needed. This is the Proportional Hazards Survival Regression. This regression is then used to evaluate the effect of each predictor on the shape of the survival curve.

In our example, we have as predictors, on the one hand, the drug used and, on the other hand, the age of the subjects. We want to know what effect these variables have on the survival time curve. To do this, we use Cox regression.

We will now look at the steps of Cox regression using an example. Let's assume we have the following data and want to analyse it.

Each row describes a patient with the corresponding disease. The time indicates when the event, here death, occurred or, for censored cases, how long the patient was observed. Of course, we also have information about which drug was used and the age of the subjects.

Calculate Cox Regression


The first step is to calculate the Cox regression. We will do this online using DATAtab and then go through how to interpret the results. Please load the data above.

To calculate the Cox Proportional Hazards Survival Regression with your own data, simply go to the Cox Regression Calculator and copy and paste your data into the table as you would in Excel.

Now we click on "Survival Analysis." Depending on which variables you want to select, different methods of survival analysis will be calculated. If you select only the "Time" and the "Status", the Kaplan Meier curve will be displayed.

If you now click on the drug, you will get the log rank test. If you also select the age, the Cox regression will be calculated.
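For readers who want to see what such a calculation involves, here is a minimal plain-Python sketch that fits a Cox model with the two predictors drug and age by maximizing the partial likelihood with Newton-Raphson. It is an illustration under stated assumptions, not DATAtab's implementation: the data are hypothetical (not the tutorial's table), there are no tied event times, and all function names are made up for this example.

```python
import math

# Hypothetical data, NOT the tutorial's table: (time, status, drug, age)
# status: 1 = event observed, 0 = censored; drug: 0 or 1
DATA = [
    (5, 1, 0, 62), (8, 1, 0, 70), (12, 1, 1, 55), (14, 0, 0, 48),
    (17, 1, 0, 66), (21, 1, 1, 71), (25, 0, 1, 59), (28, 1, 0, 52),
    (33, 1, 1, 64), (40, 0, 1, 50),
]
TIMES = [row[0] for row in DATA]
EVENTS = [row[1] for row in DATA]
MEAN_AGE = sum(row[3] for row in DATA) / len(DATA)
# Predictors: drug and age (centered, which only helps numerical stability)
X = [[float(row[2]), row[3] - MEAN_AGE] for row in DATA]


def partial_loglik(beta):
    """Cox partial log-likelihood with gradient and Hessian (assumes no tied times)."""
    p = len(beta)
    loglik, grad = 0.0, [0.0] * p
    hess = [[0.0] * p for _ in range(p)]
    for i, (t_i, d_i) in enumerate(zip(TIMES, EVENTS)):
        if d_i == 0:
            continue  # censored cases enter only through the risk sets
        risk = [j for j, t_j in enumerate(TIMES) if t_j >= t_i]
        w = {j: math.exp(sum(b * x for b, x in zip(beta, X[j]))) for j in risk}
        s0 = sum(w.values())
        mean = [sum(w[j] * X[j][a] for j in risk) / s0 for a in range(p)]
        loglik += math.log(w[i] / s0)
        for a in range(p):
            grad[a] += X[i][a] - mean[a]
            for b in range(p):
                second = sum(w[j] * X[j][a] * X[j][b] for j in risk) / s0
                hess[a][b] -= second - mean[a] * mean[b]
    return loglik, grad, hess


def inverse_information(hess):
    """Invert the 2x2 observed information matrix (the negative Hessian)."""
    (a, b), (c, d) = [[-v for v in row] for row in hess]
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def fit_cox(max_iter=50, tol=1e-10):
    """Newton-Raphson with step halving; returns (coefficients, standard errors)."""
    beta = [0.0, 0.0]
    loglik, grad, hess = partial_loglik(beta)
    for _ in range(max_iter):
        cov = inverse_information(hess)
        step = [sum(cov[a][b] * grad[b] for b in range(2)) for a in range(2)]
        for _ in range(30):  # halve the step if it failed to improve the fit
            candidate = [b + s for b, s in zip(beta, step)]
            new = partial_loglik(candidate)
            if new[0] >= loglik:
                break
            step = [s / 2 for s in step]
        beta, (loglik, grad, hess) = candidate, new
        if max(abs(s) for s in step) < tol:
            break
    cov = inverse_information(hess)
    return beta, [math.sqrt(cov[a][a]) for a in range(2)]


coefficients, std_errors = fit_cox()
for name, coef, se in zip(("drug", "age"), coefficients, std_errors):
    print(f"{name}: coefficient = {coef:.3f}, standard error = {se:.3f}")
```

In practice a statistics package would be used instead of hand-written code; the sketch only shows that the coefficients are the values that maximize the partial likelihood, and that the standard errors come from its curvature at that maximum.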

Interpreting Cox Regression

The first column contains the names of the variables. The first row shows the variable drug and the second row shows the age of the persons.

The most important values in this table are the estimated regression coefficient and the p-value. The p-value tells you whether the regression coefficient is significantly different from zero.

So the null hypothesis is that the coefficient is zero in the population. Assuming, as usual, that the significance level is set at 5%, the null hypothesis is rejected for p-values less than 5% or 0.05. This means that the coefficient is significantly different from zero.

In the case of drug, the p-value is less than 0.05 and therefore there is a significant difference from zero.

In the case of age, we obtain a p-value of 0.221, which is greater than 0.05. Therefore, in this case, the null hypothesis is not rejected. Based on these data, we cannot conclude that age has a significant effect on the survival curve.
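The decision rule described above can be sketched in a few lines of Python. The sketch assumes that the p-value is a two-sided Wald test (coefficient divided by its standard error, compared with the standard normal distribution), which is a common choice but not confirmed here for DATAtab. The coefficients and standard errors are hypothetical and do not come from the tutorial's result table. The sketch also reports exp(coefficient), which is the hazard ratio for a one-unit increase in the predictor.

```python
import math


def wald_summary(coef, se, alpha=0.05):
    """Summarize one Cox coefficient with a two-sided Wald test."""
    z = coef / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided standard normal tail
    return {
        "z": z,
        "p": p_value,
        "hazard_ratio": math.exp(coef),
        "significant": p_value < alpha,  # True means the null hypothesis is rejected
    }


# Hypothetical estimates: predictor -> (coefficient, standard error)
estimates = {"drug": (-1.2, 0.5), "age": (0.02, 0.018)}
for name, (coef, se) in estimates.items():
    result = wald_summary(coef, se)
    verdict = "reject H0" if result["significant"] else "do not reject H0"
    print(f"{name}: HR = {result['hazard_ratio']:.3f}, p = {result['p']:.3f}, {verdict}")
```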

Assumptions of a Cox Regression

Proportional Hazards Assumption: The proportional hazards assumption is the central assumption of Cox regression. It states that the hazard ratio (the ratio of the hazard rates between two groups) remains constant over time. In other words, the effect of the predictor variables on the hazard function is assumed to be constant over time.

Independence Assumption: Cox regression assumes that the survival times of individuals are independent of each other, given the values of the predictor variables. This means that the survival time of one individual should not influence the survival time of another individual.

Linearity Assumption: Cox regression assumes that the relationship between the predictor variables and the log of the hazard rate is linear. This assumption implies that a one-unit increase in a continuous predictor changes the log hazard by the same amount over the predictor's entire range.

No Multicollinearity: Cox regression assumes that there is no perfect multicollinearity among the predictor variables. Multicollinearity occurs when two or more predictor variables are highly correlated, making it difficult to separate their individual effects on the outcome.

No Outliers: Cox regression assumes that there are no extreme outliers that significantly affect the results. Outliers are observations that deviate substantially from the overall pattern of the data and can distort the estimated coefficients.

No Effect Modification: A Cox regression without interaction terms assumes that there is no effect modification or interaction between the predictor variables. Effect modification occurs when the effect of one predictor variable on the outcome depends on the level of another predictor variable. If such an interaction is expected, it has to be included in the model explicitly.
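To make the proportional hazards assumption from the list above more concrete, the following sketch writes the Cox model as h(t | x) = h0(t) * exp(b1*x1 + b2*x2) and shows that the hazard ratio between two subjects does not depend on time. The Weibull-shaped baseline hazard, the coefficients, and the two example subjects are assumptions chosen only for illustration.

```python
import math


def baseline_hazard(t, shape=1.5, scale=10.0):
    """Hypothetical baseline hazard h0(t); any positive function of t would do."""
    return (shape / scale) * (t / scale) ** (shape - 1)


def hazard(t, x, beta):
    """Cox model: baseline hazard multiplied by exp of the linear predictor."""
    return baseline_hazard(t) * math.exp(sum(b * v for b, v in zip(beta, x)))


beta = [-0.7, 0.03]    # hypothetical coefficients for (drug, age)
subject_a = [1, 60]    # receives drug 1, age 60
subject_b = [0, 60]    # receives drug 0, age 60

ratios = [hazard(t, subject_a, beta) / hazard(t, subject_b, beta) for t in (1, 5, 20)]
for t, ratio in zip((1, 5, 20), ratios):
    print(f"t = {t}: hazard ratio = {ratio:.4f}")
```

The baseline hazard changes with time, but it cancels in the ratio, so the hazard ratio equals exp(-0.7) at every time point. If real data showed a ratio that drifts over time, the assumption would be violated.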
