# Normality test

One of the most common assumptions of statistical test procedures is that the data are normally distributed. For example, before a t-test or an ANOVA is calculated, you should first check whether the variables are normally distributed.

If the data are not normally distributed, the above procedures should not be used; non-parametric tests, which do not require normally distributed data, are used instead.

In a regression analysis, the normality assumption is also important, but here it is the error made by the model (the residuals) that must be normally distributed, not the data itself.
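The difference between testing the data and testing the model error can be sketched in Python. This is a minimal illustration, assuming NumPy and SciPy are available; the simulated data, the coefficients and the variable names are hypothetical and not taken from DATAtab.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical data: a skewed predictor plus normally distributed noise.
x = rng.exponential(scale=2.0, size=200)
y = 1.5 * x + 4.0 + rng.normal(loc=0.0, scale=1.0, size=200)

# Fit a simple linear regression and compute the residuals (the model error).
slope, intercept = np.polyfit(x, y, deg=1)
residuals = y - (slope * x + intercept)

# Shapiro-Wilk on the raw outcome and on the residuals.
stat_y, p_y = stats.shapiro(y)
stat_res, p_res = stats.shapiro(residuals)
print(f"y:         W = {stat_y:.3f}, p = {p_y:.4f}")
print(f"residuals: W = {stat_res:.3f}, p = {p_res:.4f}")
```

In this setup the outcome `y` is clearly skewed because the predictor is skewed, yet the regression assumption only concerns `residuals`.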

## How do I test normal distribution?

Normal distribution can be tested either analytically or graphically. The most common analytical tests to check data for normal distribution are the:

- Kolmogorov-Smirnov Test
- Shapiro-Wilk Test
- Anderson-Darling Test

For the graphical test, either a histogram or a Q-Q plot is used. Q-Q stands for quantile-quantile; the plot compares the observed distribution with the expected theoretical distribution.

## Analytically test data for normal distribution

There are several procedures for testing your data analytically for normal distribution. The best known are the Kolmogorov-Smirnov test, the Shapiro-Wilk test, and the Anderson-Darling test.

All of these tests check the null hypothesis that the frequency distribution of your data is normally distributed. Each test gives you a p-value, which you use to decide whether or not to reject the null hypothesis. The big question is whether this p-value is smaller or larger than 0.05.

If the p-value is smaller than 0.05, this is interpreted as a significant deviation from the normal distribution and you can assume that your data are not normally distributed. If the p-value is greater than 0.05, then strictly speaking you cannot conclude that the frequency distribution corresponds to the normal distribution; you have merely failed to reject the null hypothesis.

In practice, even though it is not strictly correct, a p-value greater than 0.05 is usually treated as meaning the data are normally distributed. Nevertheless, you should always look at the graphical methods as well.

For your information, you can use the Kolmogorov-Smirnov test and the Anderson-Darling test to test distributions other than the normal distribution.
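As a rough sketch, the three tests can be run in Python with SciPy. The sample below is simulated and hypothetical. Two caveats apply: `scipy.stats.anderson` is used here through its statistic and critical values rather than a p-value, and the Kolmogorov-Smirnov p-value is only approximate when the mean and standard deviation are estimated from the same sample.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
data = rng.normal(loc=50.0, scale=10.0, size=100)  # hypothetical sample
alpha = 0.05

# Shapiro-Wilk test.
sw_stat, sw_p = stats.shapiro(data)

# Kolmogorov-Smirnov test against a normal distribution whose parameters
# are estimated from the sample (this makes the p-value approximate).
ks_stat, ks_p = stats.kstest(data, "norm", args=(data.mean(), data.std(ddof=1)))

# Anderson-Darling test: compare the statistic with the 5% critical value.
ad = stats.anderson(data, dist="norm")
crit_5 = ad.critical_values[ad.significance_level == 5.0][0]

print(f"Shapiro-Wilk:       p = {sw_p:.4f}, reject H0: {sw_p < alpha}")
print(f"Kolmogorov-Smirnov: p = {ks_p:.4f}, reject H0: {ks_p < alpha}")
print(f"Anderson-Darling:   A2 = {ad.statistic:.3f}, "
      f"5% critical value = {crit_5:.3f}, reject H0: {ad.statistic > crit_5}")
```

Passing a different distribution name to `kstest` or `anderson` is how these two tests are applied to distributions other than the normal.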

### Disadvantage of the analytical tests for normal distribution

Unfortunately, the analytical procedures have a big disadvantage, which is why graphical methods are used more and more.

The problem is that the calculated p-value is affected by the size of the sample. If you have a very small sample, your p-value may be much larger than 0.05, but if you have a very large sample from the same population, your p-value may be smaller than 0.05.

Let's say the distribution in your population deviates very slightly from the normal distribution. Then you will get a very large p-value with a very small sample and thus assume that the data is normally distributed. However, if you take a larger sample, then the p-value becomes smaller and smaller, even though the samples come from the same population with the same distribution. With a very large sample, you can even get a p-value that is smaller than 0.05 and thus reject the null hypothesis that it is a normal distribution.
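This effect can be reproduced with a small simulation. The sketch below assumes a hypothetical population (a gamma distribution with shape 20, which is only mildly skewed) and reports the median Shapiro-Wilk p-value over repeated samples of different sizes; the helper name `median_shapiro_p` is made up for this illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

def median_shapiro_p(n, repeats=100):
    """Median Shapiro-Wilk p-value over `repeats` samples of size n."""
    p_values = []
    for _ in range(repeats):
        # Hypothetical population: gamma with shape 20, only mildly skewed.
        sample = rng.gamma(shape=20.0, scale=1.0, size=n)
        p_values.append(stats.shapiro(sample).pvalue)
    return float(np.median(p_values))

# Same population every time, only the sample size changes.
for n in (20, 200, 2000):
    print(f"n = {n:5d}: median p = {median_shapiro_p(n):.4f}")
```

The median is used instead of a single run so that the trend does not depend on one lucky or unlucky sample.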

To get around this problem, graphical methods are being used more and more.

## Graphical test for normal distribution

If the normal distribution is tested graphically, you look at either the histogram or, even better, the Q-Q plot.

If you go the histogram route, you plot the normal distribution curve on top of the histogram of your data and see whether the shape of the histogram roughly matches the curve.
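The comparison behind the histogram check can be sketched numerically. The example assumes a simulated, hypothetical sample; it scales the histogram to a density and evaluates a normal curve with the sample's own mean and standard deviation at the bin centers.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
data = rng.normal(loc=0.0, scale=1.0, size=10_000)  # hypothetical sample

# Histogram scaled to a density so it is comparable to a density curve.
density, edges = np.histogram(data, bins=30, density=True)
centers = (edges[:-1] + edges[1:]) / 2

# Normal curve with the sample's own mean and standard deviation.
normal_curve = stats.norm.pdf(centers, loc=data.mean(), scale=data.std(ddof=1))

# Largest gap between a histogram bar and the curve at its bin center.
max_gap = float(np.max(np.abs(density - normal_curve)))
print(f"largest gap between histogram and normal curve: {max_gap:.3f}")
```

To actually see the picture, the same arrays can be drawn with matplotlib, for example with `plt.hist(data, bins=30, density=True)` followed by `plt.plot(centers, normal_curve)`.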

However, it is better to use the quantile-quantile plot, or Q-Q plot for short. It compares the quantiles of the measured values with the theoretical quantiles the data would have if they were perfectly normally distributed.

If the data were perfectly normally distributed, all points would lie on the line. The more the points deviate from the line, the less the data follow a normal distribution.

In addition, DATAtab plots the 95% confidence interval. If all or almost all of your data lies within this interval, it is a very strong indication that your data is normally distributed. Your data would not be normally distributed if, for example, they form an arc and are far from the line in some areas.
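The points of a Q-Q plot can also be computed in Python. This is a sketch under stated assumptions: both samples are simulated and hypothetical, the helper `qq_points` is made up for this example, and SciPy's `probplot` does not draw the 95% confidence interval that DATAtab shows.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
normal_sample = rng.normal(loc=100.0, scale=15.0, size=500)  # hypothetical
skewed_sample = rng.exponential(scale=15.0, size=500)        # hypothetical

def qq_points(sample):
    """Return theoretical quantiles, ordered values and the fitted Q-Q line."""
    (theoretical, ordered), (slope, intercept, r) = stats.probplot(sample, dist="norm")
    return theoretical, ordered, slope, intercept, r

for name, sample in [("normal", normal_sample), ("skewed", skewed_sample)]:
    theoretical, ordered, slope, intercept, r = qq_points(sample)
    # Largest distance of a point from the line, in sample standard deviations.
    max_dev = np.max(np.abs(ordered - (slope * theoretical + intercept))) / sample.std(ddof=1)
    print(f"{name}: correlation with line r = {r:.4f}, largest deviation = {max_dev:.2f} sd")

r_normal = qq_points(normal_sample)[4]
r_skewed = qq_points(skewed_sample)[4]
```

A correlation close to 1 means the points hug the line; the skewed sample bends away from it in an arc. Calling `stats.probplot(sample, dist="norm", plot=plt)` with matplotlib's `pyplot` imported as `plt` draws the plot itself.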

## Test normal distribution in DATAtab

If you test your data for normal distribution with DATAtab, you get the following evaluation: first the analytical test procedures, clearly arranged in a table, and then the graphical test procedures.

If you want to test your data for normal distribution, simply copy your data into the table on DATAtab, click on descriptive statistics and then select the variable you want to test for normal distribution. Then, just click on Test Normal Distribution and you will get the results.

Furthermore, if you calculate a hypothesis test with DATAtab, you can test the preconditions for that hypothesis test. If one of the preconditions is normal distribution, you get the test for normal distribution in the same way.
