Point Biserial Correlation

Point biserial correlation is a special case of Pearson correlation and examines the relationship between a dichotomous variable and a metric variable.

What is a dichotomous variable and what is a metric variable? A dichotomous variable is a variable with only two values (categories), for example gender with male and female, or smoking status with smoker and non-smoker. A metric variable is a numerical variable, for example a person's weight or a person's salary.

So if we have a dichotomous variable and a metric variable and want to know whether they are correlated, we can use a point biserial correlation. Of course, we need to check the assumptions beforehand, but more about that later.

Calculate point biserial correlation

As we said at the beginning, the point biserial correlation is a special case of the Pearson correlation. But how can we calculate the Pearson correlation when a variable is nominal? Let's look at this with an example.

Let's say we want to study the correlation between the number of hours spent learning for an exam and the exam result (pass/fail).

We collected data from 20 students, 12 of whom passed the test and 8 of whom failed. We recorded the number of hours each student studied for the exam.

To calculate the point biserial correlation, we first need to convert the exam result into numbers. We assign a value of 1 to the students who passed and 0 to the students who failed.

Now we can either calculate the Pearson correlation of time and test score, or we can use the equation for the point biserial correlation.
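For reference, one common way to write the point biserial equation uses the means of the metric variable in the two groups. In the formula below, M_1 and M_0 are the mean learning hours of the students who passed (coded 1) and failed (coded 0), n_1 and n_0 are the group sizes (12 and 8 in our example), n = n_1 + n_0 = 20, and s_n is the standard deviation of all learning hours computed with divisor n. This notation is our own choice; some textbooks write the same equation with the sample standard deviation s_{n-1} and the factor \sqrt{n_1 n_0 / (n(n-1))} instead, which gives exactly the same value.

```latex
r_{pb} = \frac{M_1 - M_0}{s_n} \sqrt{\frac{n_1 n_0}{n^2}},
\qquad
s_n = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left(x_i - \bar{x}\right)^2}
```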

Point biserial correlation and Pearson correlation

Whether we calculate the Pearson correlation or use the equation for the point biserial correlation, we get the same result both times!
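The equality of both routes can be checked with a short sketch. It codes the exam result as 1 = passed and 0 = failed, computes an ordinary Pearson correlation on that 0/1 variable, and compares it with the group-mean form of the point biserial equation. The data below are hypothetical and made up for illustration; they are NOT the 20 students from the example, and the function names `pearson` and `point_biserial` are our own.

```python
import math
import statistics

# Hypothetical data for illustration only, NOT the 20 students from the text:
# hours studied and exam result coded as 1 = passed, 0 = failed.
hours = [2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
passed = [0, 0, 1, 0, 1, 1, 0, 1, 1, 1]


def pearson(x, y):
    """Ordinary Pearson correlation coefficient."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def point_biserial(metric, binary):
    """Point biserial correlation from the means of the groups coded 1 and 0."""
    group1 = [x for x, g in zip(metric, binary) if g == 1]
    group0 = [x for x, g in zip(metric, binary) if g == 0]
    n1, n0, n = len(group1), len(group0), len(metric)
    s_n = statistics.pstdev(metric)  # standard deviation with divisor n
    mean_diff = statistics.fmean(group1) - statistics.fmean(group0)
    return mean_diff / s_n * math.sqrt(n1 * n0 / n**2)


r_pearson = pearson(hours, passed)    # Pearson on the 0/1-coded exam result
r_pb = point_biserial(hours, passed)  # point biserial equation
print(f"Pearson: {r_pearson:.4f}, point biserial: {r_pb:.4f}")
```

Both printed values agree up to floating-point rounding, because the Pearson formula applied to a 0/1 variable simplifies algebraically to the point biserial equation.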

Let's take a quick look at this in DATAtab. The dataset contains the learning hours, the exam result coded as "passed" and "failed", and the exam result coded as 0 and 1. We define the variable coded as 0 and 1 as metric.

If we now go to the "Correlation" tab and calculate the Pearson correlation for these two metric variables, we get a correlation coefficient of 0.31. If we calculate the point biserial correlation for learning hours and the exam result coded as "passed" and "failed", we also get a correlation of 0.31.

Point biserial correlation coefficient

Just like the Pearson correlation coefficient r, the point biserial correlation coefficient r_pb also varies between -1 and 1.

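If the coefficient is less than 0 (between -1 and 0), there is a negative correlation, that is, a negative relationship between the two variables.

If the coefficient is greater than 0 (between 0 and 1), there is a positive correlation, that is, a positive relationship between the two variables. If the coefficient is 0, there is no correlation. Because the dichotomous variable is coded as 0 and 1, the sign tells us which group has the higher mean: with passed = 1 and failed = 0, a positive coefficient means that the students who passed studied more hours on average.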
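Since the 0/1 coding of the dichotomous variable is arbitrary, the sign of the point biserial coefficient depends on which group is coded 1. The sketch below, using made-up illustrative data and our own helper `point_biserial`, shows that reversing the coding (failed = 1 instead of passed = 1) flips the sign while the magnitude stays the same.

```python
import math
import statistics


def point_biserial(metric, binary):
    """Point biserial correlation from the means of the groups coded 1 and 0."""
    group1 = [x for x, g in zip(metric, binary) if g == 1]
    group0 = [x for x, g in zip(metric, binary) if g == 0]
    n1, n0, n = len(group1), len(group0), len(metric)
    mean_diff = statistics.fmean(group1) - statistics.fmean(group0)
    return mean_diff / statistics.pstdev(metric) * math.sqrt(n1 * n0 / n**2)


# Hypothetical data: hours studied and exam result (1 = passed, 0 = failed)
hours = [2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
passed = [0, 0, 1, 0, 1, 1, 0, 1, 1, 1]
failed = [1 - g for g in passed]  # same students, reversed coding (1 = failed)

r_passed_as_1 = point_biserial(hours, passed)
r_failed_as_1 = point_biserial(hours, failed)
print(f"passed = 1: {r_passed_as_1:.4f}, failed = 1: {r_failed_as_1:.4f}")
```

So when interpreting the sign, always check which category was coded as 1.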

Hypotheses

Often, however, we want to use a sample to test a hypothesis about the population. In correlation analysis, we can test whether the correlation coefficient differs significantly from 0.

The hypotheses for the point biserial correlation are therefore:

Null hypothesis: The correlation coefficient in the population is 0 (there is no correlation).
Alternative hypothesis: The correlation coefficient in the population is not 0 (there is a correlation).
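The test of these hypotheses is usually based on the t statistic below, which under the null hypothesis follows a t distribution with n - 2 degrees of freedom. As a rough worked check with the rounded values from our example (r_pb ≈ 0.31, n = 20), this gives t ≈ 1.38 with 18 degrees of freedom. Because r_pb is rounded here, this is only an approximation of the exact value the software computes.

```latex
\begin{aligned}
t &= \frac{r_{pb}\sqrt{n-2}}{\sqrt{1 - r_{pb}^2}}, \qquad df = n - 2 \\
t &\approx \frac{0.31\sqrt{18}}{\sqrt{1 - 0.31^2}} \approx \frac{1.315}{0.951} \approx 1.38
\end{aligned}
```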

Point biserial correlation and the t-test for independent samples

When we calculate a point biserial correlation, we get the same p-value as when we calculate an independent samples t-test (Student's t-test, which assumes equal variances in both groups) for the same data.

So, whether we test a correlation hypothesis with the point biserial correlation, or a difference hypothesis with the t-test, we get the same p-value.
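Why the p-values match can be sketched without any statistics library: the t statistic computed from the correlation coefficient equals the t statistic of Student's independent samples t-test with pooled variance, and both use n - 2 degrees of freedom, so they lead to the same p-value. The sketch compares only the t statistics and degrees of freedom; turning t into a p-value would need a t distribution function (for example from SciPy). The data are hypothetical, and the helper names are our own.

```python
import math
import statistics

# Hypothetical data: hours studied and exam result (1 = passed, 0 = failed)
hours = [2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
passed = [0, 0, 1, 0, 1, 1, 0, 1, 1, 1]


def pearson(x, y):
    """Ordinary Pearson correlation (equals r_pb for a 0/1 variable)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_from_correlation(r, n):
    """t statistic for H0: correlation = 0 (df = n - 2)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r**2)


def pooled_t(group1, group0):
    """Student's independent samples t statistic (equal variances assumed)."""
    n1, n0 = len(group1), len(group0)
    v1, v0 = statistics.variance(group1), statistics.variance(group0)  # divisor n - 1
    pooled_var = ((n1 - 1) * v1 + (n0 - 1) * v0) / (n1 + n0 - 2)
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n0))
    return (statistics.fmean(group1) - statistics.fmean(group0)) / se


group1 = [x for x, g in zip(hours, passed) if g == 1]  # passed
group0 = [x for x, g in zip(hours, passed) if g == 0]  # failed

t_corr = t_from_correlation(pearson(hours, passed), len(hours))
t_ttest = pooled_t(group1, group0)
df_corr = len(hours) - 2
df_ttest = len(group1) + len(group0) - 2
print(f"t (correlation): {t_corr:.4f}, t (t-test): {t_ttest:.4f}")
```

Note that the equivalence holds for the pooled-variance t-test; Welch's t-test, which does not assume equal variances, generally gives a slightly different p-value.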

If we calculate a t-test in DATAtab under the tab "Hypothesis Tests" with the null hypothesis "There is no difference between the groups passed and failed with respect to the variable learning hours", we get a p-value of 0.179.

In the same way, if we calculate a point biserial correlation under the tab "Correlation" with the null hypothesis "There is no correlation between learning hours and exam result", we also get a p-value of 0.179!

In our example, the p-value of 0.179 is greater than the commonly used significance level of 0.05, so the null hypothesis is not rejected.

Assumptions for a point biserial correlation

Regarding the assumptions of the point biserial correlation, we need to distinguish whether we only want to calculate the correlation coefficient or whether we also want to test a hypothesis. To calculate the correlation coefficient, we only need a metric variable and a dichotomous variable.

However, if we want to test whether the correlation coefficient is significantly different from zero, the metric variable must also be normally distributed. If this is not the case, the test statistic t and the p-value cannot be interpreted reliably!