# Learn Statistics: Analysis of Variance (ANOVA)

One thing that regularly stumps scientists is the handling of data. We seem to be very good at generating obscene amounts of it, but representing it meaningfully can be a little off-putting if you don’t happen to be a bioinformatician. In previous tutorials, we looked at hypothesis testing using variations of the t-Test, and we continue the series by comparing more than 2 sample sets with ANOVA.

*Of course, the calculations involved can be done on a simple calculator, but your task will be made much easier with the use of spreadsheet software (Excel) or more specialized tools which are available in most high schools and universities (Minitab, Prism, SigmaPlot, etc.).*

*Each will have its own tutorials on carrying out these tests and so this article will not be heavily technical but rather focus on the correct application of statistical testing.*

## Table of Statistical Analysis

| Test | When To Use | An Example |
| --- | --- | --- |
| 1 sample t-test | Tests if the mean of a single population is equal to the hypothesized value | A lecturer claims that the mean time taken to complete a quiz is 1 hour. From a sample data set, can we reject this claim? |
| 2 sample t-test | Tests if the difference between means of two independent populations is equal to a hypothesized value | Does the mean quiz score of female students differ significantly from the mean quiz score of male students? |
| paired t-test | Tests if the difference between means of dependent or paired observations is equal to a hypothesized value | The mean response time of adults before and after they have consumed alcohol. Is the difference significant enough to conclude that alcohol affects response time? |
| ANOVA | Tests for statistical difference among means for more than two populations | Studying the effectiveness of three types of pain reliever: aspirin vs. Tylenol vs. ibuprofen |

## When to Use Analysis of Variance (ANOVA)?

The different forms of t-Tests are powerful tools to determine statistical significance, as we have discussed in previous tutorials, but a little itty bitty problem quickly arises as we dive into hypothesis testing: **What if we have more than 2 sample groups!?**

For example, if 5 independent populations are involved, being restricted to t-Tests means that 10 separate calculations would have to be performed, comparing each mean with the others.

Not only would this take forever, but every extra test also carries its own chance of a false positive. Across many tests, the overall risk of a Type 1 error (incorrectly rejecting a true null hypothesis) climbs well above the chosen significance level.
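To put rough numbers on this, the sketch below counts the pairwise comparisons for 5 groups and estimates the chance of at least one false positive across them. It assumes a significance level of 0.05 per test and, as a simplification, that the tests are independent. Pairwise t-tests that share groups are not strictly independent, so treat the figure as an approximation.

```python
from math import comb

n_groups = 5
alpha = 0.05  # assumed per-test significance level

# Number of distinct pairs of groups, i.e. separate t-tests needed
n_comparisons = comb(n_groups, 2)

# Chance of at least one false positive across all tests,
# assuming (as a simplification) that the tests are independent
familywise_error = 1 - (1 - alpha) ** n_comparisons

print(n_comparisons)               # 10
print(round(familywise_error, 3))  # 0.401
```

Under these assumptions, ten t-tests at the 5% level carry roughly a 40% chance of at least one spurious "significant" result.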

ANOVA provides the solution: it tests whether the means of several groups are equal in a single test, and therefore performs like a supercharged t-Test!

ANOVA is about looking at the ‘signal’ relative to the ‘noise’ in the data. We want to see whether the between-group variance (signal) is large relative to the within-group variance (noise).
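The signal-to-noise idea can be sketched by hand. The scores below are made up purely for illustration (they are not data from a real study), and the function name `f_statistic` is our own. The calculation itself is the standard one-way ANOVA F ratio: the between-group mean square divided by the within-group mean square.

```python
from statistics import mean

def f_statistic(groups):
    """One-way ANOVA F ratio: between-group mean square / within-group mean square."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)

    # Signal: how far each group mean sits from the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    df_between = len(groups) - 1

    # Noise: how far individual values sit from their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_within = len(all_values) - len(groups)

    return (ss_between / df_between) / (ss_within / df_within)

# Hypothetical scores for three groups (illustrative only)
group_a = [1, 2, 3]
group_b = [2, 3, 4]
group_c = [5, 6, 7]

f_value = f_statistic([group_a, group_b, group_c])
print(round(f_value, 2))  # 13.0
```

An F value near 1 suggests the group means differ by about as much as random noise alone would produce; larger values point towards a real difference between groups.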

## An Example of ANOVA in Use

A scientist wants to determine the effectiveness of three types of pain relievers (e.g. Aspirin vs. Tylenol vs. Ibuprofen), and collects data from three groups of patients, each rating their change in pain level before and after receiving treatment.

- The null hypothesis (H₀): There is no difference between the means (μ₁ = μ₂ = μ₃)
- The alternative hypothesis (H₁): There is a significant difference in *at least* one of the means

In this case, there are three independent sample sets with ‘effectiveness’ as the dependent variable. In a typical application of ANOVA, the null hypothesis is that all groups are simply **random samples of the same population**.

For example, when studying the effect of different treatments on similar samples of patients, the null hypothesis would be that all treatments have the same effect (perhaps none). Rejecting the null hypothesis would imply that different treatments result in different effects.

As with any hypothesis test, a significance level (α) has to be chosen before the data are analyzed. A level of 0.05 indicates a 5% risk of concluding that a difference exists when there is no actual difference. The test then produces a p-value, which is compared against this level:

- P-value < 0.05: The differences between some of the means are statistically significant (the null hypothesis is rejected).
- P-value > 0.05: The differences between the means are not statistically significant (there is insufficient evidence to reject the null hypothesis).
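As a small illustration of this decision rule, the sketch below turns an F statistic into a p-value for the three-group case. It relies on a shortcut that holds only when there are exactly three groups (2 between-group degrees of freedom), where the upper tail of the F distribution has a simple closed form. The F value and degrees of freedom are hypothetical, and the function name is our own. In practice a statistics package does this step for you, for example SciPy's `scipy.stats.f_oneway`, which returns both the F statistic and the p-value.

```python
def p_value_three_groups(f_stat, df_within):
    """Upper-tail probability of an F(2, df_within) distribution.

    Valid only for exactly three groups (2 between-group degrees of freedom).
    """
    return (1 + 2 * f_stat / df_within) ** (-df_within / 2)

alpha = 0.05     # chosen significance level
f_stat = 13.0    # hypothetical F statistic
df_within = 6    # hypothetical: 9 observations - 3 groups

p_value = p_value_three_groups(f_stat, df_within)
reject_null = p_value < alpha

print(round(p_value, 4))  # 0.0066
print(reject_null)        # True
```

Here the p-value falls below 0.05, so the null hypothesis of equal means would be rejected for these illustrative numbers.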

And that concludes our quick analysis and application guide to ANOVA! Join us next time for more statistics fun 🙂 In the meantime, wouldn’t it be a great idea to check out how to properly plot a scatter graph?