# Analysis of Variance (ANOVA)

## What is an analysis of variance?

An analysis of variance (ANOVA) tests whether statistically significant differences exist between more than two samples. For this purpose, the means and variances of the respective groups are compared with each other. In contrast to the t-test, which tests whether there is a difference between two samples, the ANOVA tests whether there is a difference between more than two groups.

There are different types of analysis of variance, the most common are the one-way and two-way analysis of variance, each of which can be calculated either with or without repeated measurements.

In this tutorial you will learn the basics of ANOVA, for each of the four types of analysis of variance you will find a separate detailed tutorial:

Tip: You can easily calculate all four variants of the ANOVA online on DATAtab. Just visit the ANOVA calculator.

## Why not calculate multiple t-tests?

ANOVA is used when there are more than two groups. Of course, it would also be a possibility to calculate a t-test for each combination of the groups. The problem here, however, is that every hypothesis test has some degree of error. This probability of error is usually set at 5%, so that, from a purely statistical point of view, every 20th test gives a wrong result

If, for example, 20 groups are compared in which there is actually no difference, one of the tests will show a significant difference purely due to the sampling.

## Difference between one-way and two-way ANOVA

The one-way analysis of variance only checks whether an independent variable has an influence on a metric dependent variable. This is the case, for example, if it is to be examined whether the place of residence (independent variable) has an influence on the salary (dependent variable). However, if two factors, i.e. two independent variables, are considered, a two-factor analysis of variance must be used.

One-factor ANOVA Two-factors ANOVA
Does a person's place of residence (independent variable) influence his or her salary? Does a person's place of residence (1st independent variable) and gender (2nd independent variable) affect his or her salary?

Two-factor analysis of variance tests whether there is a difference between more than two independent samples that are split between two variables or factors.

## Analysis of variance with and without repeated measures

Depending on whether the sample is independent or dependent, either analysis of variance with or without repeated measures is used. If the same person was interviewed at several points in time, the sample is a dependent sample and analysis of variance with repeated measurements is used.

## One-factor ANOVA

The one-way analysis of variance is an extension of the t-test for independent groups. With the t-test only a maximum of two groups can be compared; this is now extended to more than two groups. For two groups (k = 2), the analysis of variance is therefore equivalent to the t-test. The independent variable is accordingly a nominally scaled variable with at least two characteristic values. The dependent variable is on a metric scale. In the case of the analysis of variance, the independent variable is referred to as the factor.

### Definition

Is there a difference in the population between the different groups of the independent variable with respect to the dependent variable?

The aim of ANOVA is to explain as much variance as possible in the dependent variable by dividing it into the groups. Let us consider the following example.

### One way ANOVA example

With the help of the dependent variable, e.g. highest educational qualification with the three characteristics group 1, group 2 and group 3 should be explained as much variance of the dependent variable salary as possible. In the graphic below, under A) a lot of variance can be explained with the three groups and under B) only very little variance.

Accordingly, in case A) the groups have a very high influence on the salary and in case B) they do not.

In the case of A), the values in the respective groups deviate only slightly from the group mean, the variance within the groups is therefore very small. In the case of B), however, the variance within the groups is large. The variance between the groups is the other way round; it is large in the case of A) and small in the case of B). In the case of B) the group means are close together, in the case of A) they are not.

 Case A) Case B) Variance within the groups Variance between group means Small Large Large Small

### Analysis of variance hypotheses

The null hypothesis and the alternative hypothesis result from a one-way analysis of variance as follows:

• Null hypothesis H0: The mean value of all groups is the same.
• Alternative hypothesis H1: There are differences in the mean values of the groups.

The results of the Anova can only make a statement about whether there are differences between at least two groups. However, it cannot be determined which groups are exactly different. A post-hoc test is needed to determine which groups differ. There are various methods to choose from, with Duncan, Dunnet C and Scheffe being among the most common methods.

### Example

In a screw factory, a screw is produced by three different production lines. You now want to find out whether all production lines produce screws with the same weight. To do this, take 50 screws from each production line and measure the weight. Now you use the ANOVA procedure to determine whether the average weight of the screws from the three production lines differs significantly from one another.

An example of the one-way analysis of variance would be to investigate whether the daily coffee consumption of students from different fields of study differs significantly.

Dependent variable Independent variable
Level of measurement An metric-scaled variable A nominally scaled variable with
more than two levels
Example Weekly coffee consumption Subject (math, psychology, economics)

## Assumptions for one-way analysis of variance

• Scale level: The scale level of the dependent variable should be metric that the independent variable is nominally scaled
• Homogeneity: The variances in each group should be roughly the same. This can be checked with the Levene test.
• Normal distribution: The data within the groups should be normally distributed. This means that the majority of the values ​​are in the average range, while very few values ​​are significantly below or significantly above. If this condition is not met, the Kruskal-Wallis test can be used.

If there are no independent samples but dependent ones, then a one-factor analysis of variance with repeated measures is used.

## Welch's ANOVA

If the condition of variance homogeneity is not fulfilled, Welch's ANOVA can be calculated instead of the "normal" ANOVA. If the Levene test results in a significant deviation from the variances in the groups, DATAtab automatically calculates the Welch's ANOVA in addition.

### Effect size Eta squared (η²)

The best known measures of effect size for analysis of variance are the Eta squared and the partial Eta squared. For a single factor ANOVA, the Eta squared and the partial Eta squared are identical.

The Eta squared estimate the variance that a variable explains. however, it should be noted that the variance explained is always overestimated. Eta squared is calculated by dividing the sum of squares between by the sum of squares total.

## Two factor analysis of variance

As the name suggests, two-factor analysis of variance examines the influence of two factors on a dependent variable. This extends the one-way analysis of variance by a further factor, i.e. by a further nominally scaled independent variable. The question is again whether the mean of the groups differs significantly.

Dependent variable Independent variable
Level of measurement One metric-scaled variable Two nominally scaled variables
Example Weekly coffee consumption Subject (math, psychology, economics)
and semester (winter, summer)

### Example

In a screw factory, a screw is produced by three different production systems, factor 1 in two shifts, factor 2. You now want to find out whether the production facilities or the shifts have an influence on the weight of the bolts. To do this, take 50 screws from each production line and each shift and measure the weight. Now you use two-factor ANOVA to determine whether the average weight of the screws from the three production lines and the two shifts is significantly different from one another.

## Example with DATAtab

##### One-way analysis of variance:

You want to check whether there is a difference in coffee consumption between students in different subjects. To do this, ask 10 students from each field of study.

Coffee consumption Subject
21 Math
23 Math
18 Economics
22 Economics
... ...

After the table above has been copied into the hypothesis test calculator, simply click on Hypothesis test and select the three variables. The result looks like this:

##### One-way analysis of variance:
 Math Economics n Mean SD 10 16.6 7.291 10 19.8 4.131 10 17.8 6.443 30 18.067 5.938
 Between the groups Within the groups Total Sum of squares df Mean of squares F p 52.267 2 26.133 0.702 0.505 1005.6 27 37.244 1057.867 29

Where N is the number of cases for each category, df is the degrees of freedom, F is the F-statistic from the calculated analysis of variance and p is the p-value.

Cite DATAtab: DATAtab Team (2023). DATAtab: Online Statistics Calculator. DATAtab e.U. Graz, Austria. URL https://datatab.net