 # t-test for independent samples

## What is a t-test for independent samples (Unpaired t-test)?

The t-test for independent samples is a statistical test that determines whether there is a difference between two unrelated groups.

The t-test for independent samples is used to make a statement about the population based on two independent samples. To make this statement, the mean values of the two samples are compared. If the difference in means is large enough, it is assumed that the two groups differ.

## Why do you need the unpaired t-Test?

Say you want to test if there is a difference between two groups in the population, for example, if there is a difference in salary between men and women. Of course it is not possible to ask all men and women for their salary, so we take a sample. We create a survey and send it randomly to people. In order to be able to make a statement about the population based on this sample, we need the independent t-test.

## How does the unpaired t-test work?

The unpaired t-test puts the mean difference in relation to the standard error of the mean. The standard error of the mean indicates how much the mean value scatters, that is, how far the sample mean is likely to be from the true population mean. If the mean value fluctuates strongly, a large difference between the mean values of the two groups is quite likely to arise by chance alone. Therefore, the larger the mean difference between the two groups and the smaller the standard error of the mean, the less likely it is that the observed mean difference in the two samples is due to chance.
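To make the idea of a "scattering" mean concrete, here is a small simulation sketch. The population (normal, mean 50, standard deviation 10), the sample sizes, and the function name `spread_of_sample_means` are assumptions chosen for illustration and do not come from this article. The sketch draws many samples and measures how much their means vary. In theory the spread should be close to the standard deviation divided by the square root of the sample size.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible


def spread_of_sample_means(n, repeats=2000, mu=50.0, sigma=10.0):
    """Empirical standard deviation of the means of `repeats` samples of size n."""
    means = [
        statistics.fmean(random.gauss(mu, sigma) for _ in range(n))
        for _ in range(repeats)
    ]
    return statistics.stdev(means)


# Hypothetical population: normal with mean 50 and standard deviation 10.
spread_small = spread_of_sample_means(n=10)   # theory: 10 / sqrt(10), about 3.16
spread_large = spread_of_sample_means(n=100)  # theory: 10 / sqrt(100) = 1.0

print(f"n = 10:  sample means scatter with sd {spread_small:.2f}")
print(f"n = 100: sample means scatter with sd {spread_large:.2f}")
```

The larger sample yields a much smaller spread of the mean, which is why the same mean difference is more convincing when it comes from larger samples.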

## What are independent samples?

Independent samples exist if no case or person from one group can be assigned to a case or person from the other group. This is the case, for example, when comparing the group of women and the group of men, or the group of psychology students with those of math students.

## Paired vs unpaired t-test

The main difference between the paired and the unpaired t-test is the sample.

• If you have one and the same sample that you survey at two points in time, you use a paired t-test.
• If you want to compare two different groups, whether they come from one sample or two samples, you use an unpaired t-test.
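As a rough illustration of this distinction, the sketch below uses SciPy's `ttest_rel` for paired data and `ttest_ind` for unpaired data. The weights are invented purely for illustration and are not taken from this article.

```python
from scipy import stats

# Hypothetical weights in kg, invented for illustration only.
before = [82.0, 75.5, 91.2, 68.4, 77.9]   # five people, first measurement
after = [80.1, 74.8, 89.0, 68.0, 76.2]    # the same five people, second measurement

group_a = [82.0, 75.5, 91.2, 68.4, 77.9]  # five people
group_b = [79.3, 70.1, 88.5, 73.2]        # four different people

# Paired: values belong together position by position, so lengths must match.
paired = stats.ttest_rel(before, after)

# Unpaired: no value in one group is linked to a value in the other group,
# so the groups may have different sizes.
unpaired = stats.ttest_ind(group_a, group_b)

print(f"paired:   t = {paired.statistic:.3f}, p = {paired.pvalue:.3f}")
print(f"unpaired: t = {unpaired.statistic:.3f}, p = {unpaired.pvalue:.3f}")
```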

## Examples for the Unpaired t-test

There are many applications for the independent t-test. It is an important test in fields such as biostatistics and marketing.

### Medical example:

For a pharmaceutical company, you want to see if a drug XY helps you lose weight or not. This is done by giving 20 people the medicine and 20 people a placebo.

### Social science example:

You want to find out if there is a difference between the health of people with and without university degrees.

### Technical example:

For a screw factory you want to find out if two production lines produce screws of the same weight. To test this, you weigh 50 screws from one machine and 50 screws from the other machine and compare them.

## Research question and hypotheses

If you want to know whether two independent groups are different, you have to calculate an unpaired t-test. Before the t-test can be calculated, however, you first have to formulate a research question and define the hypotheses.

### Research Question for the unpaired t-Test

With the research question you limit your object of investigation. In a t-test for independent samples the general question is: Is there a statistically significant difference between the mean values of two groups?

For the examples above, the following research questions arise:

• Does drug XY help with weight loss?
• Is there a difference in the health of people with and without university degrees?
• Do both production lines produce screws of the same weight?

### Hypotheses for the unpaired t-Test

The next step is to derive the hypotheses to be tested from the question. Hypotheses are assumptions about reality whose validity is possible but not yet proven. Two hypotheses are always formulated that assert exactly the opposite. These two hypotheses are the null hypothesis and the alternative hypothesis.

| Null hypothesis H0 | Alternative hypothesis H1 |
|---|---|
| There is no mean difference between the two groups in the population. | There is a mean difference between the two groups in the population. |
| The two population means are equal. | The two population means are not equal. |
| The two groups are from the same population. | The two groups are not from the same population. |
| H0: μ1 = μ2 | H1: μ1 ≠ μ2 |
| Example: There is no difference between the salary of men and women. | Example: There is a difference between the salary of men and women. |

## Assumptions unpaired t-Test

To calculate an independent t-test you need one independent variable (e.g. gender) that has two characteristics or groups (e.g. male and female) and one metric dependent variable (e.g. income). These two groups should be compared in the analysis. The question is, is there a difference between the two groups with regard to the dependent variable (e.g. income). The assumptions are now the following:

#### 1. There are two independent groups or samples

As the name of this t-test suggests, the samples must be independent. This means that a value in one sample must not influence a value in the other sample.

• Fulfilled: Measuring the weight of people who have been on a diet and people who have not been on a diet.
• Not fulfilled: Measuring the weight of a person before and after a certain diet.

#### 2. The variables are interval scaled

For the t-test for independent samples, the mean value of each sample must be calculated. This is only meaningful if the variable is metric scaled.

• Fulfilled: The weight of a person (in kg)
• Not fulfilled: The educational level of a person

#### 3. The variables are normally distributed

The t-test for independent samples gives the most accurate results when the data from each group are normally distributed. However, there are exceptions in special cases.

• Fulfilled: The weight, age or height of a person
• Not fulfilled: The number after throwing a die

#### 4. The variance within the groups should be similar

Since the variance is needed to calculate the t value, the variance within each group should be similar.

• Fulfilled: Weight, age or height of a person
• Not fulfilled: Stock prices in "normal" times and in a recession

## Assumptions not met?

If the assumptions for the independent t-test are not met, the calculated p-value may be incorrect. However, if the two samples are of equal size, the t-test is quite robust to a slight skewness of the data. The t-test is not robust if the variances differ significantly.

If the variables are not normally distributed, the Mann-Whitney U test can be used. The Mann-Whitney U Test is the non-parametric counterpart of the independent t-test.
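A minimal sketch of this fallback, assuming SciPy is available. The Shapiro-Wilk test (`scipy.stats.shapiro`) is used here as one common normality check; this article does not prescribe a particular check, so treat that choice and the helper name `looks_normal` as assumptions. The data are the exam scores from the DATAtab example later in this article.

```python
from scipy import stats

# Exam scores from the DATAtab example later in this article.
summer = [52, 61, 40, 46, 50, 56, 44, 47, 70, 40, 65, 38, 68]
winter = [53, 71, 38, 34, 68, 68, 46, 41, 38, 23, 28]

ALPHA = 0.05  # significance level used throughout this article


def looks_normal(sample):
    """Shapiro-Wilk check: the null hypothesis is that the sample is normal."""
    _, p = stats.shapiro(sample)
    return p > ALPHA


if looks_normal(summer) and looks_normal(winter):
    result = stats.ttest_ind(summer, winter)
    test_used = "unpaired t-test"
else:
    # Non-parametric counterpart of the unpaired t-test.
    result = stats.mannwhitneyu(summer, winter, alternative="two-sided")
    test_used = "Mann-Whitney U test"

print(f"{test_used}: p = {result.pvalue:.3f}")
```

With small samples a normality test has little power, so a plot of the data is a useful complement to this kind of automatic switch.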

## Calculate t-test for independent samples

Depending on whether the variances of the two groups are assumed to be equal or unequal, a different equation for the test statistic t is obtained. Whether the variances are equal is checked with the Levene test. The null hypothesis of the Levene test is that the two variances are equal. If the p-value of the Levene test is less than 5%, it is assumed that there is a difference in the variances of the two groups.
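The decision rule above can be sketched with SciPy's `scipy.stats.levene`. Note the assumption: SciPy centers on the median by default (the Brown-Forsythe variant), while `center="mean"` gives the original Levene test. Which variant DATAtab uses is not stated in this article, so the p-value here need not match DATAtab's output. The data are the exam scores from the example later in this article.

```python
from scipy import stats

# Exam scores from the DATAtab example later in this article.
summer = [52, 61, 40, 46, 50, 56, 44, 47, 70, 40, 65, 38, 68]
winter = [53, 71, 38, 34, 68, 68, 46, 41, 38, 23, 28]

ALPHA = 0.05

# Null hypothesis of the Levene test: both groups have the same variance.
levene_stat, levene_p = stats.levene(summer, winter, center="mean")

# p >= 5%: keep the equal-variance assumption; p < 5%: assume unequal variances.
equal_var = bool(levene_p >= ALPHA)

result = stats.ttest_ind(summer, winter, equal_var=equal_var)
variant = "equal variance" if equal_var else "unequal variance (Welch)"
print(f"Levene p = {levene_p:.3f} -> {variant}")
print(f"t = {result.statistic:.3f}, p = {result.pvalue:.3f}")
```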

### Equations for equal variance (homogeneous)

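If the Levene test yields a p-value greater than 5%, it is assumed that both groups have equal variance and the test statistic is:

$$t = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}, \qquad s_p = \sqrt{\frac{(n_1 - 1)\, s_1^2 + (n_2 - 1)\, s_2^2}{n_1 + n_2 - 2}}$$

Here $\bar{x}_1$ and $\bar{x}_2$ are the sample means, $s_1$ and $s_2$ the sample standard deviations, and $n_1$ and $n_2$ the number of cases in the two samples. The p-value can then be determined from the table with the t distribution. The number of degrees of freedom is given by

$$df = n_1 + n_2 - 2$$

where $n_1$ and $n_2$ are again the number of cases in the two samples.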
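As a worked illustration, the equal-variance test statistic can be computed by hand with the Python standard library. This is a sketch of the standard pooled-variance formula. The function name `pooled_t` is made up for this example, and the data are the exam scores from the DATAtab example later in this article.

```python
import math
import statistics

# Exam scores from the DATAtab example later in this article.
summer = [52, 61, 40, 46, 50, 56, 44, 47, 70, 40, 65, 38, 68]
winter = [53, 71, 38, 34, 68, 68, 46, 41, 38, 23, 28]


def pooled_t(x, y):
    """Test statistic and degrees of freedom assuming equal variances."""
    n1, n2 = len(x), len(y)
    var1, var2 = statistics.variance(x), statistics.variance(y)  # sample variances
    df = n1 + n2 - 2
    # Pooled standard deviation: variances weighted by their degrees of freedom.
    s_p = math.sqrt(((n1 - 1) * var1 + (n2 - 1) * var2) / df)
    standard_error = s_p * math.sqrt(1 / n1 + 1 / n2)
    t = (statistics.fmean(x) - statistics.fmean(y)) / standard_error
    return t, df


t_value, df = pooled_t(summer, winter)
print(round(t_value, 3), df)  # 1.035 22, as in the results table of the example
```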

### Formula for unequal Variance (heterogeneous)

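The test statistic t for a t-test for independent samples with unequal variance is calculated by

$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$$

where $\bar{x}_1$ and $\bar{x}_2$ are the sample means, $s_1^2$ and $s_2^2$ the sample variances, and $n_1$ and $n_2$ the number of cases in the two samples. The p-value then follows from the table with the t-distribution, where the degrees of freedom are obtained via the following equation:

$$df = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{1}{n_1 - 1}\left(\frac{s_1^2}{n_1}\right)^2 + \frac{1}{n_2 - 1}\left(\frac{s_2^2}{n_2}\right)^2}$$

## Confidence interval for the true mean difference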

The calculated mean difference in the independent t-test has been calculated using the sample. Now it is of course of interest in which range the true mean difference lies. To determine within which limits the true difference is likely to lie, the confidence interval is calculated.

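The 95% confidence interval for the true mean difference can be calculated by the following formula:

$$(\bar{x}_1 - \bar{x}_2) \pm t^{*} \cdot s_{\bar{x}_1 - \bar{x}_2}$$

where $\bar{x}_1 - \bar{x}_2$ is the mean difference of the two samples, $s_{\bar{x}_1 - \bar{x}_2}$ is the standard error of the difference (the denominator of the test statistic t), and $t^{*}$ is the 97.5% quantile of the t distribution with df degrees of freedom.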
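The interval can be sketched in Python for the unequal-variance case, using the standard Welch standard error and Welch-Satterthwaite degrees of freedom and taking the quantile from `scipy.stats.t.ppf`. The variable names are chosen for this illustration, and the data are the exam scores from the DATAtab example later in this article.

```python
import math
import statistics

from scipy import stats

# Exam scores from the DATAtab example later in this article.
summer = [52, 61, 40, 46, 50, 56, 44, 47, 70, 40, 65, 38, 68]
winter = [53, 71, 38, 34, 68, 68, 46, 41, 38, 23, 28]

n1, n2 = len(summer), len(winter)
v1 = statistics.variance(summer) / n1  # s1^2 / n1
v2 = statistics.variance(winter) / n2  # s2^2 / n2

mean_diff = statistics.fmean(summer) - statistics.fmean(winter)
se_diff = math.sqrt(v1 + v2)  # standard error of the difference

# Welch-Satterthwaite degrees of freedom (generally not an integer).
df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))

t_star = stats.t.ppf(0.975, df)  # 97.5% quantile of the t distribution
lower = mean_diff - t_star * se_diff
upper = mean_diff + t_star * se_diff

# The example's results table reports -6.55 and 18.34 for unequal variance.
print(f"df = {df:.3f}, 95% CI = [{lower:.2f}, {upper:.2f}]")
```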

## One-sided and two-sided unpaired t-test

As explained in the article on hypothesis, there are one-sided and two-sided hypotheses (also called directional and non-directional hypotheses). To accommodate this, there is also a one-sided and two-sided t-test for independent samples. By default, the two-sided unpaired t-test is calculated, which is also output in DATAtab.

To obtain the one-sided t-test for independent samples, the p-value must be divided by two. Now it depends on whether the data tend "in the direction" of the hypothesis or not. If the hypothesis says that the mean of one group is larger or smaller than the mean of the other group, this must also be seen in the result. If this is not the case, 1 minus the halved p-value must be calculated.
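The halving rule can be checked numerically. This sketch assumes SciPy 1.6 or later, where `scipy.stats.ttest_ind` accepts an `alternative` argument. The data are the exam scores from the DATAtab example later in this article, where the summer mean is the larger one.

```python
from scipy import stats

# Exam scores from the DATAtab example later in this article.
summer = [52, 61, 40, 46, 50, 56, 44, 47, 70, 40, 65, 38, 68]
winter = [53, 71, 38, 34, 68, 68, 46, 41, 38, 23, 28]

two_sided = stats.ttest_ind(summer, winter, equal_var=True)

# H1: the summer mean is larger than the winter mean. The data point in this
# direction, so the one-sided p-value is half the two-sided one.
greater = stats.ttest_ind(summer, winter, equal_var=True, alternative="greater")

# H1: the summer mean is smaller than the winter mean. The data point the other
# way, so the one-sided p-value is 1 minus the halved two-sided p-value.
less = stats.ttest_ind(summer, winter, equal_var=True, alternative="less")

print(f"two-sided:           p = {two_sided.pvalue:.3f}")
print(f"one-sided (greater): p = {greater.pvalue:.3f}")
print(f"one-sided (less):    p = {less.pvalue:.3f}")
```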

## Effect size unpaired t-test

The effect size in an unpaired t-test is usually calculated using Hedges' g, a variant of Cohen's d with a correction for small samples. In the unpaired t-test calculator on DATAtab you can easily get the effect size.

### What do you need the effect size for?

The calculated p-value depends very much on the sample size. For example, if there is a difference in the population, the larger the sample size, the more clearly the p-value will "show" this difference. If the sample size is very large, even very small differences, which may no longer be relevant, can be "detected" in the population. To standardize this, the effect size is used in addition to the p-value.
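A minimal sketch of the two common effect size measures, using their textbook definitions: Cohen's d divides the mean difference by the pooled standard deviation, and Hedges' g multiplies d by the usual approximate small-sample correction factor. The function names are made up for this illustration, and DATAtab's exact computation is not stated in this article. The data are the exam scores from the DATAtab example later in this article.

```python
import math
import statistics

# Exam scores from the DATAtab example later in this article.
summer = [52, 61, 40, 46, 50, 56, 44, 47, 70, 40, 65, 38, 68]
winter = [53, 71, 38, 34, 68, 68, 46, 41, 38, 23, 28]


def cohens_d(x, y):
    """Mean difference in units of the pooled standard deviation."""
    n1, n2 = len(x), len(y)
    pooled_var = (
        (n1 - 1) * statistics.variance(x) + (n2 - 1) * statistics.variance(y)
    ) / (n1 + n2 - 2)
    return (statistics.fmean(x) - statistics.fmean(y)) / math.sqrt(pooled_var)


def hedges_g(x, y):
    """Cohen's d with the approximate small-sample correction factor."""
    correction = 1 - 3 / (4 * (len(x) + len(y)) - 9)
    return cohens_d(x, y) * correction


d = cohens_d(summer, winter)
g = hedges_g(summer, winter)
print(f"Cohen's d = {d:.2f}, Hedges' g = {g:.2f}")  # about 0.42 and 0.41
```

Unlike the p-value, these numbers do not shrink or grow simply because the sample size changes.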

## Calculate t-test for independent samples with DATAtab

A lecturer would like to know whether the statistics exam results in the summer semester differ from those in the winter semester. To this end, she creates an overview with the points achieved per exam.

##### Research question:

Is there a significant difference between the examination results in the summer and winter semester?

##### Null hypothesis H0:

There is no difference between the two samples. There is no difference between the statistics exam results in the summer semester and in the winter semester.

##### Alternative hypothesis H1:

There is a difference between the two samples. There is a difference between the statistics exam results in the summer semester and in the winter semester.

| Summer semester | Winter semester |
|---|---|
| 52 | 53 |
| 61 | 71 |
| 40 | 38 |
| 46 | 34 |
| 50 | 68 |
| 56 | 68 |
| 44 | 46 |
| 47 | 41 |
| 70 | 38 |
| 40 | 23 |
| 65 | 28 |
| 38 | |
| 68 | |

After copying the above sample data into the Hypothesis Test Calculator on DATAtab, you can calculate the t-test for independent samples. The results for the t-test example look like this:

##### Group statistics

| | n | Mean | Standard deviation | Standard error of the mean |
|---|---|---|---|---|
| Summer semester | 13 | 52.077 | 11.026 | 3.058 |
| Winter semester | 11 | 46.182 | 16.708 | 5.038 |

##### Unpaired t-test

| | | t | df | p |
|---|---|---|---|---|
| Summer semester & Winter semester | Equal variance | 1.035 | 22 | 0.312 |
| | Unequal variance | 1 | 16.824 | 0.331 |

##### 95% confidence interval

| | | Mean value difference | Standard error of difference | Lower | Upper |
|---|---|---|---|---|---|
| Summer semester & Winter semester | Equal variance | 5.895 | 5.893 | -6.328 | 18.118 |
| | Unequal variance | 5.895 | 5.893 | -6.55 | 18.34 |
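For readers who want to check these numbers outside DATAtab, the group statistics and both test variants can be reproduced with a short script. This is a sketch that assumes SciPy is installed. `scipy.stats.ttest_ind` with `equal_var=False` runs the unequal-variance (Welch) version.

```python
import math
import statistics

from scipy import stats

summer = [52, 61, 40, 46, 50, 56, 44, 47, 70, 40, 65, 38, 68]
winter = [53, 71, 38, 34, 68, 68, 46, 41, 38, 23, 28]

# Group statistics: n, mean, standard deviation, standard error of the mean.
for name, scores in (("Summer semester", summer), ("Winter semester", winter)):
    n = len(scores)
    sd = statistics.stdev(scores)
    sem = sd / math.sqrt(n)
    print(f"{name}: n={n}, mean={statistics.fmean(scores):.3f}, "
          f"sd={sd:.3f}, sem={sem:.3f}")

equal = stats.ttest_ind(summer, winter, equal_var=True)
unequal = stats.ttest_ind(summer, winter, equal_var=False)

# The table above reports t = 1.035, p = 0.312 and t = 1, p = 0.331.
print(f"Equal variance:   t = {equal.statistic:.3f}, p = {equal.pvalue:.3f}")
print(f"Unequal variance: t = {unequal.statistic:.3f}, p = {unequal.pvalue:.3f}")
```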

## How to interpret a t-test for independent samples?

To decide whether the difference between the two groups is significant or not, one of the following two values is used:

• p-value (2-tailed)
• lower and upper limits of the confidence interval of the difference

In this t-test example, the p-value (2-tailed) is 0.312 or 31%. This means that, if the null hypothesis is true, the probability of drawing a sample in which the two groups differ at least as much as the groups in the example is 31%. This is higher than the significance level, which was set at 5%. For this reason, the null hypothesis is not rejected: no significant difference between the two samples is found, and the data are consistent with both samples coming from the same population.

The second way to determine whether or not there is a significant difference is to use the confidence interval of the difference. If the interval between the lower and upper limits includes zero, there is no significant difference. If this is not the case, there is a significant difference. In this t-test example, the lower value is -6.328 and the upper value is 18.118. Since zero is between the two values, there is no significant difference.

It is common practice to first display the two samples in a chart before calculating a t-test for independent samples. A boxplot is suitable for this purpose, as it visualizes the central tendency and the variability of the two independent samples very well. To calculate an independent t-test online, you can also use the independent t-test calculator.
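As a rough sketch of what such a boxplot displays, the five-number summary of each sample can be computed with the Python standard library. The helper name `five_number_summary` is made up for this illustration. Quartile conventions differ between tools, so the quartiles below (computed with `method="inclusive"`) may not match a particular chart exactly.

```python
import statistics

summer = [52, 61, 40, 46, 50, 56, 44, 47, 70, 40, 65, 38, 68]
winter = [53, 71, 38, 34, 68, 68, 46, 41, 38, 23, 28]


def five_number_summary(scores):
    """Minimum, lower quartile, median, upper quartile, maximum."""
    q1, median, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    return min(scores), q1, median, q3, max(scores)


summary_summer = five_number_summary(summer)
summary_winter = five_number_summary(winter)

print("Summer semester:", summary_summer)
print("Winter semester:", summary_winter)
```

The median is the line inside the box, the quartiles are the box edges, and the distance between the quartiles shows how variable each group is.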

Cite DATAtab: DATAtab Team (2023). DATAtab: Online Statistics Calculator. DATAtab e.U. Graz, Austria. URL https://datatab.net