Hypothesis test

Hypothesis tests are statistical test procedures, such as the t-test or an analysis of variance, with which you can test hypotheses based on collected data.

When do I need a hypothesis test?

A hypothesis test is used whenever you want to test a hypothesis about the population with the help of a sample. In other words, whenever you want to draw a conclusion about the population from a sample, a hypothesis test is used. Note that a test never proves a hypothesis; it only shows whether the data are compatible with it.

A possible example: the company "My-Muesli" would like to know whether the muesli bars it produces really weigh 250 g. For this purpose, a random sample is taken and a hypothesis test is then used to draw conclusions about all the muesli bars produced.
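The muesli bar question can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the ten weights are invented, the test used is a one-sample t-test, and the critical value 2.262 is the two-sided 5% value of the t distribution with 9 degrees of freedom.

```python
import math
import statistics

# Hypothetical weights (in grams) of 10 randomly sampled muesli bars.
weights = [251.2, 249.5, 250.8, 248.9, 250.1, 249.7, 251.5, 250.3, 249.2, 250.6]

TARGET_WEIGHT = 250.0  # weight claimed under the null hypothesis
T_CRITICAL = 2.262     # two-sided 5% critical value of the t distribution, df = 9

n = len(weights)
sample_mean = statistics.mean(weights)
sample_sd = statistics.stdev(weights)      # sample standard deviation (divides by n - 1)
standard_error = sample_sd / math.sqrt(n)

# t statistic of the one-sample t-test
t_statistic = (sample_mean - TARGET_WEIGHT) / standard_error

# H0 is rejected only if the t statistic lies beyond the critical value.
reject_h0 = abs(t_statistic) > T_CRITICAL
print(f"mean = {sample_mean:.2f} g, t = {t_statistic:.3f}, reject H0: {reject_h0}")
```

With these made-up numbers the t statistic stays well below the critical value, so the null hypothesis "the bars weigh 250 g" is not rejected.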

In statistics, hypothesis tests aim to test hypotheses about the population on the basis of sample characteristics.

Hypothesis testing and the null hypothesis

As we know from the previous tutorial on hypotheses, there is always a null and an alternative hypothesis. In "classical" inferential statistics, it is always the null hypothesis that is tested. The null hypothesis states that there is no difference or no relationship, and the test checks whether the data are compatible with this assumption.

If you want to be 100% accurate, the null hypothesis H0 can only ever be rejected or not rejected using a hypothesis test. The non-rejection of H0 is not a sufficient reason to conclude that H0 is true. Therefore, the wording "H0 was not rejected" is preferable to "H0 was retained."

Briefly anticipating the p-value: at the usual significance level of 5%, the null hypothesis is rejected if the p-value is less than 0.05; otherwise it is not rejected.

Why is there a probability of error in a hypothesis test?

Whether an assumption or hypothesis about the population is rejected or not rejected by a hypothesis test can only ever be determined with a certain probability of error. But why does the probability of error exist?

Here is the short answer: each time you take a sample, you of course get a different one, which means that the results are different every time. In the worst case, a sample is taken that happens to deviate very strongly from the population and the wrong statement is made. Therefore there is always a probability of error for every statement or hypothesis.
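A small simulation shows how much samples from the same population can differ. The sketch below is an illustration only: the population of 100,000 bars with a mean of 250 g and a standard deviation of 2 g is made up, as is the sample size of 30.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical population: 100,000 muesli bars with a true mean weight of 250 g.
population = [random.gauss(250, 2) for _ in range(100_000)]

# Draw five independent random samples of 30 bars each and record their means.
sample_means = [
    statistics.mean(random.sample(population, 30))
    for _ in range(5)
]

for i, m in enumerate(sample_means, start=1):
    print(f"sample {i}: mean = {m:.2f} g")
```

Every sample yields a slightly different mean even though the population never changes. This sampling variation is the source of the probability of error.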

Level of significance

A hypothesis test can never reject the null hypothesis with absolute certainty. There is always a certain probability of error that the null hypothesis is rejected even though it is actually true. This probability of error is called the significance level or α.

The significance level is used to decide whether the null hypothesis should be rejected or not. If the p-value is smaller than the significance level, the null hypothesis is to be rejected; otherwise, it is not to be rejected.

Usually, a significance level of 5% or 1% is set. A significance level of 5% means that, if the null hypothesis is actually true, there is a 5% probability of rejecting it anyway.
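What the 5% means can be made tangible with a simulation in which the null hypothesis is true in every single experiment. The following Python sketch works under stated assumptions: it uses a one-sample z-test with a known standard deviation (a simpler stand-in for the t-test), and the population values and the helper name `z_test_p_value` are invented for this illustration.

```python
import math
import random
from statistics import NormalDist, mean

random.seed(1)  # fixed seed so the sketch is reproducible

ALPHA = 0.05
MU_0, SIGMA, N = 250.0, 2.0, 30  # hypothetical population values under H0
N_EXPERIMENTS = 10_000

def z_test_p_value(sample, mu_0, sigma):
    """Two-sided p-value of a one-sample z-test (population sigma assumed known)."""
    z = (mean(sample) - mu_0) / (sigma / math.sqrt(len(sample)))
    return 2 * (1 - NormalDist().cdf(abs(z)))

# H0 is true in every experiment: all samples really come from N(MU_0, SIGMA).
rejections = 0
for _ in range(N_EXPERIMENTS):
    sample = [random.gauss(MU_0, SIGMA) for _ in range(N)]
    if z_test_p_value(sample, MU_0, SIGMA) < ALPHA:
        rejections += 1

type_1_error_rate = rejections / N_EXPERIMENTS
print(f"share of wrongly rejected null hypotheses: {type_1_error_rate:.3f}")
```

The share of wrongly rejected null hypotheses should land close to the chosen significance level of 0.05.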

Illustrated by the two-sample t-test, this means the following: the observed means of the two samples lie a certain distance apart. The greater the observed distance between the mean values, the less plausible it is that both samples come from the same population. The question now is, at what point is the result "unlikely enough" to reject the null hypothesis? With a significance level of 5%, it is "unlikely enough" if a distance at least this large would occur in fewer than 5% of cases when both samples come from the same population.

The p-value indicates the probability of observing the given mean difference, or an even greater one, if both samples are drawn from the same population. Accordingly, if the p-value is less than the significance level, the null hypothesis is rejected; if the p-value is greater than the significance level, the null hypothesis is not rejected.

If, for example, a p-value of 0.04 results, this means: if both groups came from the same population, the probability of observing this mean distance or an even greater one would be 4%. It does not mean that there is a 4% probability that the groups come from the same population. The p-value is less than the significance level of 5%, and thus the null hypothesis is rejected.
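This definition of the p-value can be reproduced directly with a permutation test. Note that this is a different technique from the t-test described above, chosen here because it needs no distribution tables. The measurements in both groups are invented for the sketch.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical measurements for two independent groups.
group_a = [23.1, 25.4, 24.8, 26.0, 22.9, 25.7, 24.3, 26.2]
group_b = [21.0, 22.5, 23.3, 20.8, 22.1, 23.9, 21.7, 22.4]

observed_diff = abs(statistics.mean(group_a) - statistics.mean(group_b))

# Under H0 both groups come from the same population, so the group labels
# are arbitrary. Shuffle the labels many times and count how often the mean
# difference is at least as large as the observed one.
pooled = group_a + group_b
n_a = len(group_a)
n_permutations = 10_000
at_least_as_extreme = 0
for _ in range(n_permutations):
    random.shuffle(pooled)
    diff = abs(statistics.mean(pooled[:n_a]) - statistics.mean(pooled[n_a:]))
    if diff >= observed_diff:
        at_least_as_extreme += 1

# Estimated p-value: share of shuffles with a difference this large or larger.
p_value = at_least_as_extreme / n_permutations
print(f"observed difference = {observed_diff:.2f}, estimated p-value = {p_value:.4f}")
```

The estimated p-value is the share of random relabelings that produce a mean difference at least as large as the observed one, which is exactly the "observed difference or an even greater one" from the text.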

It is important to note that the significance level is always set before the test and may not be changed afterwards in order to obtain the "desired" statement after all. To ensure a certain degree of comparability, the significance level is usually 5% or 1%.

• p ≤ 0.01 highly significant (h.s.)
• p ≤ 0.05 significant (s.)
• p > 0.05 not significant (n.s.)
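Applied to a p-value, the thresholds in this list can be written as a small helper. The function name `significance_label` is made up for this sketch.

```python
def significance_label(p_value: float) -> str:
    """Map a p-value to the conventional labels listed above."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("a p-value must lie between 0 and 1")
    if p_value <= 0.01:
        return "highly significant (h.s.)"
    if p_value <= 0.05:
        return "significant (s.)"
    return "not significant (n.s.)"

for p in (0.004, 0.04, 0.2):
    print(p, "->", significance_label(p))
```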

Example: significance level and p-value

H0: Men and women in Austria do not differ in their average monthly net income.

To test this hypothesis, a significance level of 5% is set and a survey is conducted asking 600 women and 600 men about their monthly net income. A t-test for independent samples gives a p-value of 0.04.

The p-value of 0.04 is less than the significance level of 0.05, so we reject the null hypothesis. Based on the data collected, we have sufficient evidence that there is a statistically significant difference in average monthly net income between men and women in Austria.

Types of errors

Because a hypothesis can only be rejected with a certain probability of error, different types of errors can occur. Due to the sample selection, it can happen that the null hypothesis is rejected by chance, although in reality there is no difference, i.e. the null hypothesis is true. Conversely, the result of the hypothesis test can also be that the null hypothesis is not rejected, although in reality there is a difference and thus the alternative hypothesis is actually true.

Accordingly, there are two types of errors in hypothesis testing:

• Type 1 error: The null hypothesis is rejected although it is actually true.
• Type 2 error: The null hypothesis is not rejected although the alternative hypothesis is actually true.

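Overall, the following four cases arise:

• H0 is true and is not rejected: correct decision.
• H0 is true but is rejected: Type 1 error.
• H0 is false but is not rejected: Type 2 error.
• H0 is false and is rejected: correct decision.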
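The possible combinations of truth and test decision can also be written as a small Python function and combined with a simulation of the type 2 error. This is a sketch under stated assumptions: the function name `classify_outcome` and all population values are invented, the null hypothesis is deliberately false (true mean 251 instead of 250), and a z-test with known standard deviation stands in for the t-test.

```python
import math
import random
from statistics import NormalDist, mean

def classify_outcome(h0_is_true: bool, h0_rejected: bool) -> str:
    """Name the outcome of a test given the (normally unknown) truth."""
    if h0_is_true and h0_rejected:
        return "type 1 error"
    if not h0_is_true and not h0_rejected:
        return "type 2 error"
    return "correct decision"

random.seed(7)  # fixed seed so the sketch is reproducible
ALPHA = 0.05
MU_0, TRUE_MU, SIGMA, N = 250.0, 251.0, 2.0, 30  # hypothetical; H0 is false here
N_EXPERIMENTS = 5_000

outcomes = {"correct decision": 0, "type 1 error": 0, "type 2 error": 0}
for _ in range(N_EXPERIMENTS):
    sample = [random.gauss(TRUE_MU, SIGMA) for _ in range(N)]
    z = (mean(sample) - MU_0) / (SIGMA / math.sqrt(N))
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    outcomes[classify_outcome(h0_is_true=False, h0_rejected=p_value < ALPHA)] += 1

type_2_error_rate = outcomes["type 2 error"] / N_EXPERIMENTS
print(outcomes, f"type 2 error rate = {type_2_error_rate:.3f}")
```

Because the null hypothesis is false in every run, a type 1 error cannot occur here; every wrong decision is a type 2 error.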

Significance vs effect size

We now know that we usually reject the null hypothesis when the p-value is less than 0.05. We then assume that there is an effect, e.g., a difference between two groups.

However, it is important to keep in mind that just because an effect is statistically significant does not mean that the effect is relevant.

If a very large sample is taken and the sample has a very small spread, even a very small difference between two groups may be significant, but it may not be relevant to you.
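The influence of the sample size can be made concrete with a short calculation. The sketch below rests on assumptions not found in the text above: it uses a normal approximation for the difference between two equally sized groups with a common standard deviation, the numbers are invented, and the effect size is expressed as Cohen's d (mean difference divided by the standard deviation).

```python
import math
from statistics import NormalDist

def two_group_p_value(mean_diff: float, sd: float, n_per_group: int) -> float:
    """Two-sided p-value for a difference in means between two equally sized
    groups, using a normal approximation with a common standard deviation."""
    standard_error = sd * math.sqrt(2 / n_per_group)
    z = mean_diff / standard_error
    return 2 * (1 - NormalDist().cdf(abs(z)))

MEAN_DIFF = 0.5  # hypothetical difference between the group means
SD = 10.0        # hypothetical standard deviation within each group
cohens_d = MEAN_DIFF / SD  # standardized effect size, identical for every n

# Same effect, three different sample sizes.
for n in (50, 500, 50_000):
    p = two_group_p_value(MEAN_DIFF, SD, n)
    print(f"n per group = {n:>6}: d = {cohens_d:.2f}, p = {p:.4f}")
```

The effect size is the same in all three rows, but only the largest sample pushes the p-value below 0.05. Whether such a small difference matters is a question the p-value cannot answer.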

Example

A company sells frozen pizza and wants to test whether higher quality packaging leads to increased sales.

The data collected show that the p-value is less than 0.05, so the increase is statistically significant.

So the company can assume that the higher quality packaging leads to a statistically significant increase in sales. If the packaging had no influence, the probability of observing this increase or an even greater one would be less than 5%.

But now the question is whether the increase is also economically relevant. It may be that the income from the increased sales figures does not compensate for the higher costs of the packaging.

Therefore, one should always consider both whether an effect is significant and whether the effect is relevant at all.

How do I find the right hypothesis test?

In order to test hypotheses, various test procedures are available. They are classified, on the one hand, by the level of measurement of the variables and, on the other hand, by how many samples there are and how the samples are related to each other.

DATAtab helps you to find the right test: you just need to select the data you want to evaluate. Depending on the scale level of your data, DATAtab will suggest the appropriate test.

Depending on which variables are selected, one of the following is calculated:

• t-test one sample
• t-test independent samples
• t-test dependent samples
• Chi-square test
• Binomial test
• ANOVA with/without repeated measures
• Two-way ANOVA with/without repeated measures
• Wilcoxon test
• Mann-Whitney U test
• Friedman test
• Kruskal-Wallis test
• ...

The following table lists the relevant test procedures. If you know the scale level of the variables in your hypothesis, you can see in the table which test could fit!

Level of measurement

| Test | nominal | ordinal | metric |
| --- | --- | --- | --- |
| Binomial test | 1 x nominal | | |
| t-test for one sample | | | 1 x metric |
| Chi-square test | 1 x or 2 x nominal | | |
| t-test for independent samples | 1 x nominal with two categories | | 1 x metric |
| Mann-Whitney U test | 1 x nominal with two categories | 1 x ordinal | |
| One-way analysis of variance | 1 x nominal with more than two categories | | 1 x metric |
| Kruskal-Wallis test | 1 x nominal with more than two categories | 1 x ordinal | |
| Pearson correlation | | | 2 x metric |
| Spearman correlation | | 2 x ordinal | |
| Point biserial correlation | 1 x nominal with two categories | | 1 x metric |
| t-test for paired samples | | | 2 x metric |
| Wilcoxon test | | 2 x ordinal | |
| Analysis of variance for repeated measurements | | | more than 2 x metric |
| Friedman test | | more than 2 x ordinal | |

If a correlation hypothesis is to be tested, a correlation analysis is performed, using either the Pearson correlation or the Spearman correlation.
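Most rows of the table can be encoded as a simple lookup. The sketch below is one possible reading of the table, not DATAtab's actual selection logic: the names `TEST_TABLE` and `suggest_test` and the design labels ("one sample", "independent", "dependent", "correlation") are invented for this illustration, and the chi-square test and the point biserial correlation are left out for brevity.

```python
from typing import Optional

# (design, number of groups or measurements, level of measurement) -> test.
# The number 3 stands for "more than two".
TEST_TABLE = {
    ("one sample", 1, "nominal"): "Binomial test",
    ("one sample", 1, "metric"): "t-test for one sample",
    ("independent", 2, "metric"): "t-test for independent samples",
    ("independent", 2, "ordinal"): "Mann-Whitney U test",
    ("independent", 3, "metric"): "One-way analysis of variance",
    ("independent", 3, "ordinal"): "Kruskal-Wallis test",
    ("dependent", 2, "metric"): "t-test for paired samples",
    ("dependent", 2, "ordinal"): "Wilcoxon test",
    ("dependent", 3, "metric"): "Analysis of variance for repeated measurements",
    ("dependent", 3, "ordinal"): "Friedman test",
    ("correlation", 2, "metric"): "Pearson correlation",
    ("correlation", 2, "ordinal"): "Spearman correlation",
}

def suggest_test(design: str, n_groups: int, level: str) -> Optional[str]:
    """Return the test from the table, or None if no row matches."""
    key = (design, min(n_groups, 3), level)  # cap at 3 = "more than two"
    return TEST_TABLE.get(key)

print(suggest_test("independent", 2, "metric"))
print(suggest_test("dependent", 4, "ordinal"))
```

A dictionary keeps the mapping readable and makes missing combinations explicit: any combination without a row returns `None` instead of a guessed test.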

Examples of hypothesis testing

Independent sample t-test

Is there a difference in the average number of burglaries (dependent variable) in houses with and without alarm systems (independent variable with 2 groups)?

Paired t-test

Does the consumption of cigarettes have a negative effect on blood pressure? (Measurement before and after)

ANOVA

Do people living in small, medium or large cities (independent variable with three groups) differ in their health awareness (dependent variable)?

Cite DATAtab: DATAtab Team (2024). DATAtab: Online Statistics Calculator. DATAtab e.U. Graz, Austria. URL https://datatab.net