# Normality test

One of the most common assumptions for statistical tests is that the data used are normally distributed. For example, if you want to run a t-test or an ANOVA, you should first check whether the variables in question are approximately normally distributed.

The assumption of normal distribution is also important for linear regression analysis, but in this case it is important that the error made by the model is normally distributed, not the data itself.

## Nonparametric tests

If the data are not normally distributed, the procedures above should not be applied as they stand, and nonparametric tests are the usual alternative. Nonparametric tests do not assume that the data are normally distributed.

## How is the normal distribution tested?

Whether data are normally distributed can be checked either analytically (with statistical tests) or graphically. The most common analytical tests for normal distribution are:

• Kolmogorov-Smirnov Test
• Shapiro-Wilk Test
• Anderson-Darling Test

For graphical verification, either a histogram or, better, a Q-Q plot is used. Q-Q stands for quantile-quantile: the plot compares the quantiles of the observed distribution with those of the theoretically expected distribution.

## Statistical tests for normal distribution

To test your data analytically for normal distribution, there are several test procedures, the best known being the Kolmogorov-Smirnov test, the Shapiro-Wilk test, and the Anderson-Darling test.

In all of these tests, the null hypothesis is that your data are normally distributed. To decide whether or not to reject the null hypothesis, each test gives you a p-value, which is compared with the significance level, usually 0.05.

If the p-value is less than 0.05, this is interpreted as a significant deviation from the normal distribution, and the null hypothesis of normality is rejected. If the p-value is greater than 0.05, you cannot, strictly speaking, conclude that the data are normally distributed; you simply cannot reject the null hypothesis.

In practice, a normal distribution is assumed for p-values greater than 0.05, although this is not entirely correct. For this reason, the graphical methods should always be considered as well.

Note: The Kolmogorov-Smirnov test and the Anderson-Darling test can also be used to test distributions other than the normal distribution.
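As an illustration of the decision rule described above, the following is a minimal sketch that assumes Python with NumPy and SciPy rather than DATAtab itself. The two samples, the seed, and the names `samples` and `results` are made up for the example. Note that SciPy's Anderson-Darling function, as used here, reports critical values rather than a p-value, so the statistic is compared with the critical value at the 5% level.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Illustrative data only: one sample drawn from a normal distribution,
# one from a clearly right-skewed (exponential) distribution.
samples = {
    "normal": rng.normal(loc=50, scale=10, size=200),
    "skewed": rng.exponential(scale=10, size=200),
}

alpha = 0.05
results = {}
for name, x in samples.items():
    shapiro = stats.shapiro(x)

    # Kolmogorov-Smirnov against a normal distribution whose mean and
    # standard deviation are estimated from the same sample. Estimating
    # the parameters makes this p-value approximate (it tends to be too large).
    ks = stats.kstest(x, "norm", args=(x.mean(), x.std(ddof=1)))

    # For dist="norm" the significance levels are 15, 10, 5, 2.5 and 1 percent,
    # so index 2 is the critical value at the 5% level.
    ad = stats.anderson(x, dist="norm")

    results[name] = {
        "shapiro_p": float(shapiro.pvalue),
        "ks_p": float(ks.pvalue),
        "anderson_rejects": bool(ad.statistic > ad.critical_values[2]),
    }

for name, r in results.items():
    verdict = "reject" if r["shapiro_p"] < alpha else "do not reject"
    print(f"{name}: Shapiro-Wilk p={r['shapiro_p']:.4f}, "
          f"K-S p={r['ks_p']:.4f}, "
          f"Anderson-Darling rejects at 5%: {r['anderson_rejects']} "
          f"-> {verdict} normality (Shapiro-Wilk)")
```

For the skewed sample, all three tests should signal a clear deviation from the normal distribution. For the normal sample, a p-value above 0.05 only means that the null hypothesis could not be rejected.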

### Disadvantage of the analytical tests for normal distribution

Unfortunately, the analytical method has a major drawback, which is why more and more attention is being paid to graphical methods.

The problem is that the calculated p-value is affected by the size of the sample. With a very small sample, your p-value may be much larger than 0.05, whereas with a very large sample from the same population, your p-value may be smaller than 0.05.

Suppose the distribution in the population deviates only slightly from the normal distribution. With a very small sample, you will usually get a large p-value and therefore assume that the data are normally distributed. As the sample gets larger, the p-value tends to get smaller and smaller, even though the samples come from the same population with the same distribution. With a very large sample, the p-value can fall below 0.05, so that the null hypothesis of normal distribution is rejected.

To avoid this problem, graphical methods are increasingly being used.
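The sample-size effect can be reproduced with a small simulation. This is only a sketch, again assuming NumPy and SciPy. The gamma population with shape 20, the seed, and the sample sizes are arbitrary choices for illustration; the population is only mildly right-skewed.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical population: gamma distribution with shape 20, which is
# close to a normal distribution but slightly right-skewed.
p_values = {}
for n in (20, 200, 5000):
    sample = rng.gamma(shape=20.0, scale=1.0, size=n)
    p_values[n] = float(stats.shapiro(sample).pvalue)
    print(f"n = {n:5d}: Shapiro-Wilk p-value = {p_values[n]:.4g}")
```

All three samples come from the same distribution, yet the p-value for the largest sample should fall far below 0.05, while the smallest sample will often give no reason to reject normality.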

## Graphical test for normal distribution

If the normal distribution is tested graphically, one looks either at the histogram or, even better, at the Q-Q plot.

If you want to check the normal distribution using a histogram, plot the normal distribution on the histogram of your data and check that the distribution curve of the data approximately matches the normal distribution curve.
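The comparison behind such a histogram can also be sketched numerically. The example below uses only the Python standard library; the function name `histogram_vs_normal`, the bin count, and the simulated sample are assumptions made for illustration. For each bin, it compares the observed count with the count expected under a normal distribution that has the sample's mean and standard deviation.

```python
import random
from statistics import NormalDist, mean, stdev


def histogram_vs_normal(data, bins=8):
    """Return (start, end, observed, expected) for each histogram bin."""
    fitted = NormalDist(mean(data), stdev(data))
    lo, hi = min(data), max(data)
    width = (hi - lo) / bins
    edges = [lo + i * width for i in range(bins)] + [hi]
    rows = []
    for b in range(bins):
        start, end = edges[b], edges[b + 1]
        if b == bins - 1:
            # The last bin includes its upper edge so the maximum is counted.
            observed = sum(start <= x <= end for x in data)
        else:
            observed = sum(start <= x < end for x in data)
        # Expected count under the fitted normal distribution.
        expected = len(data) * (fitted.cdf(end) - fitted.cdf(start))
        rows.append((start, end, observed, expected))
    return rows


random.seed(1)
sample = [random.gauss(100, 15) for _ in range(400)]  # illustrative data
rows = histogram_vs_normal(sample)
for start, end, observed, expected in rows:
    bar = "#" * round(observed / 4)
    print(f"{start:7.1f} to {end:7.1f} | observed {observed:3d} "
          f"| expected {expected:6.1f} | {bar}")
```

If the observed and expected counts are close in every bin, the histogram roughly follows the normal curve. Keep in mind that the impression depends on the number of bins chosen.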

A better way to do this is to use a quantile-quantile plot, or Q-Q plot for short. This compares the theoretical quantiles that the data should have if they were perfectly normal with the quantiles of the measured values.

If the data were perfectly normally distributed, all points would lie on the line. The further the points deviate from the line, the less closely the data follow a normal distribution.

In addition, DATAtab plots the 95% confidence interval. If all or almost all of the data fall within this interval, this is a very strong indication that the data are normally distributed. They are not normally distributed if, for example, they form an arc and are far from the line in some areas.
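To make the construction of a Q-Q plot concrete, here is a minimal sketch using only the Python standard library. The function names `qq_points` and `pearson_r`, the plotting positions (i - 0.5) / n, and the simulated samples are assumptions for illustration. This is not how DATAtab computes its plot, and the confidence band is not reproduced. The correlation between theoretical and observed quantiles serves as a rough numerical summary of how closely the points follow a straight line.

```python
import random
from statistics import NormalDist, mean


def qq_points(data):
    """Pair each sorted observation with its theoretical normal quantile."""
    ordered = sorted(data)
    n = len(ordered)
    std_normal = NormalDist()  # standard normal: mean 0, sd 1
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return theoretical, ordered


def pearson_r(xs, ys):
    """Pearson correlation; values near 1 mean the points lie near a line."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


random.seed(7)
normal_sample = [random.gauss(50, 10) for _ in range(500)]     # illustrative
skewed_sample = [random.expovariate(0.1) for _ in range(500)]  # illustrative

r_normal = pearson_r(*qq_points(normal_sample))
r_skewed = pearson_r(*qq_points(skewed_sample))
print(f"normal sample: r = {r_normal:.4f}")
print(f"skewed sample: r = {r_skewed:.4f}")
```

Plotting the two lists returned by `qq_points` against each other gives the Q-Q plot itself. For the skewed sample, the points bend away from the line, which shows up as a lower correlation.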

## Test Normal distribution in DATAtab

When you test your data for normal distribution with DATAtab, you get the following evaluation: first the analytical test procedures, clearly arranged in a table, then the graphical test procedures.

If you want to test your data for normal distribution, simply copy your data into the table on DATAtab, click on descriptive statistics and then select the variable you want to test for normal distribution. Then, just click on Test Normal Distribution and you will get the results.

Furthermore, if you are calculating a hypothesis test with DATAtab, you can test the assumptions for each hypothesis test. If one of the assumptions is the normal distribution, you will get the test for normal distribution in the same way.

Cite DATAtab: DATAtab Team (2024). DATAtab: Online Statistics Calculator. DATAtab e.U. Graz, Austria. URL https://datatab.net