Causality means that there is a clear cause-and-effect relationship between two variables: causation exists when action A causes outcome B. A common mistake in interpreting statistics is to infer causality whenever a correlation is present. A correlation, however, only shows that two variables are related.
Causality and correlation
Correlation analysis shows whether there is a relationship between two variables. A correlation alone, however, does not reveal the direction of that relationship, i.e., whether X influences Y or Y influences X. To determine the direction, it must first be checked whether causality exists.
Why is correlation not causality?
If there is a correlation between variable X and variable Y, this does not mean that the two variables are causally related. It could be, for example, that the correlation is due entirely to a third variable Z, so that X has no influence on Y and Y has no influence on X.
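The third-variable case can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: the data are synthetic, Z is drawn at random, and X and Y are each built from Z plus independent noise, so neither influences the other. The helper names `pearson` and `partial_corr` are made up for this example. X and Y are still clearly correlated (about 0.5 in theory for this setup), while the partial correlation that controls for Z is close to zero.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def partial_corr(xs, ys, zs):
    """Correlation of X and Y after removing the linear effect of Z."""
    r_xy = pearson(xs, ys)
    r_xz = pearson(xs, zs)
    r_yz = pearson(ys, zs)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


rng = random.Random(0)  # fixed seed so the run is reproducible
n = 5000

# Synthetic data (assumption): Z drives both X and Y; X and Y never touch each other.
z = [rng.gauss(0, 1) for _ in range(n)]
x = [zi + rng.gauss(0, 1) for zi in z]
y = [zi + rng.gauss(0, 1) for zi in z]

r_xy = pearson(x, y)
r_xy_given_z = partial_corr(x, y, z)

print(f"corr(X, Y)     = {r_xy:.2f}")
print(f"corr(X, Y | Z) = {r_xy_given_z:.2f}")
```

In real data the third variable is often unknown or unmeasured, so a check like this is only possible when Z has actually been collected.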
Causality and regression
If there is a causal relationship between two variables, a regression analysis can be used to predict one variable from the other. Care must be taken that the direction is correct: a regression predicts the dependent variable from the independent variable, not the other way around.
By defining one variable as the predictor and the other as the criterion, the causal direction is already fixed in the regression model. This direction should therefore be justified on the basis of theory.
Causality, or the direction of effect, must therefore be derived theoretically before it can be assumed in a regression model. One cannot "search" for causality with a regression; a regression can only be interpreted causally if a causal relationship is already assumed.
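Why a regression cannot be used to "search" for the direction can be seen by fitting it both ways. The following sketch assumes synthetic data in which A causes B by construction; `fit_line` is a hypothetical helper written for this example, not a library function. Both fits succeed, and the share of explained variance is the same in either direction, so nothing in the output identifies which variable is the cause.

```python
import random


def fit_line(predictor, criterion):
    """Least-squares fit of criterion = intercept + slope * predictor.

    Returns (intercept, slope, r_squared).
    """
    n = len(predictor)
    mean_p = sum(predictor) / n
    mean_c = sum(criterion) / n
    s_pc = sum((p - mean_p) * (c - mean_c) for p, c in zip(predictor, criterion))
    s_pp = sum((p - mean_p) ** 2 for p in predictor)
    s_cc = sum((c - mean_c) ** 2 for c in criterion)
    slope = s_pc / s_pp
    intercept = mean_c - slope * mean_p
    r_squared = s_pc ** 2 / (s_pp * s_cc)
    return intercept, slope, r_squared


rng = random.Random(1)  # fixed seed so the run is reproducible

# Synthetic data (assumption): A causes B, with a true slope of 2.
a = [rng.gauss(0, 1) for _ in range(1000)]
b = [2.0 * ai + rng.gauss(0, 1) for ai in a]

_, slope_ab, r2_ab = fit_line(a, b)  # theoretically justified direction
_, slope_ba, r2_ba = fit_line(b, a)  # reversed direction, fits just as well

print(f"B predicted from A: slope = {slope_ab:.2f}, R^2 = {r2_ab:.3f}")
print(f"A predicted from B: slope = {slope_ba:.2f}, R^2 = {r2_ba:.3f}")
```

The two slopes differ, but the fit quality does not, which is why the choice of predictor and criterion has to come from theory rather than from the regression output.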
Causal Models for Regression
Does linear regression imply causation? No: neither correlation nor regression can establish causation on its own. A causal model combines regression or correlation analysis with a strong theory linking the two or more variables.
Assumptions for causality
There are two prerequisites for causality. First, there must be a significant relationship, i.e., a significant correlation. Second, the direction of the relationship must be established, which can happen in two ways: either there is a temporal sequence, i.e., variable A was measured before variable B, or there is a well-founded, plausible theory that specifies the direction of the causal relationship.
If neither applies, i.e., there is no temporal order and causality cannot be substantiated by a well-founded theory, one can speak only of a correlation, never of causality. It then cannot be said that variable A influences variable B or vice versa.
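The two prerequisites amount to a simple decision rule, which the following sketch writes out. The function name, its parameters, and the returned labels are hypothetical and chosen only for illustration; the inputs are judgments the analyst has to supply, not something the code can determine.

```python
def interpretation(significant_correlation, temporal_order, theory_supports_direction):
    """Return which statement the two prerequisites allow.

    significant_correlation: prerequisite 1, a significant correlation was found
    temporal_order: prerequisite 2a, one variable was measured before the other
    theory_supports_direction: prerequisite 2b, a well-founded theory gives the direction
    """
    if not significant_correlation:
        return "no relationship shown"
    if temporal_order or theory_supports_direction:
        return "causal direction may be assumed"
    return "correlation only"


print(interpretation(True, True, False))
print(interpretation(True, False, False))
print(interpretation(False, True, True))
```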
Example of causality
Suppose you want to examine whether the age at which a child speaks its first sentence influences later school success. First, you need to check whether the two variables are correlated; this is done with a correlation analysis. If there is a significant correlation, the second condition must still be tested.
The second condition can be confirmed either by theory or by a time sequence. In this case, there is a clear time sequence: the first sentence is spoken before school success is measured. If there is a correlation, the variable "age at which the first sentence is spoken" can therefore influence the variable "later school success", but not the other way around.
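The first step of this example, checking for a significant correlation, might look like the sketch below. Everything in it is an assumption made for illustration: the data are simulated, not real measurements, and the negative link between age at first sentence and school score is an arbitrary simulation setting, not an empirical finding. Significance is assessed with a permutation test, which is used here in place of the usual t-test only because it needs nothing beyond the standard library.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def permutation_p_value(xs, ys, n_perm=1000, seed=0):
    """Two-sided permutation p-value for the correlation of xs and ys."""
    rng = random.Random(seed)
    observed = abs(pearson(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any link between xs and ys
        if abs(pearson(xs, shuffled)) >= observed:
            hits += 1
    # Add-one correction so the p-value is never exactly zero
    return (hits + 1) / (n_perm + 1)


rng = random.Random(42)  # fixed seed so the run is reproducible
n = 200

# Simulated data (assumption): age in months at first sentence, and a later school score.
age_months = [rng.gauss(24, 4) for _ in range(n)]
school_score = [100 - 1.5 * age + rng.gauss(0, 8) for age in age_months]

r = pearson(age_months, school_score)
p = permutation_p_value(age_months, school_score)

print(f"r = {r:.2f}, permutation p-value = {p:.4f}")
```

A small p-value only satisfies the first prerequisite. The direction of effect still rests on the time sequence described above, not on anything the test computes.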