One thing that regularly stumps scientists is the handling of data. We seem to be very good at generating obscene amounts of it, but representing it meaningfully can be a little off-putting, unless you happen to be a bioinformatician. Let’s dip our toes in with a simple one-sample t-test to see how we can easily incorporate statistical analysis into our work.

*Of course, the calculations involved can be done on a simple calculator, but your task will be made much easier with the use of spreadsheet software (Excel) or more specialized tools which are available in most high schools and universities (Minitab, Prism, SigmaPlot, etc.).*

*Each will have its own tutorials on carrying out these tests and so this article will not be heavily technical but rather focus on the correct application of statistical testing.*

## Table of Statistical Analysis

Test | When To Use | An Example |
---|---|---|
1 sample t-test | Tests if the mean of a single population is equal to a hypothesized value | A lecturer claims that the mean time taken to complete a quiz is 1 hour. From a sample data set, can we reject this claim? |
2 sample t-test | Tests if the difference between the means of two independent populations is equal to a hypothesized value | Does the mean quiz score of female students differ significantly from the mean quiz score of male students? |
paired t-test | Tests if the difference between the means of dependent or paired observations is equal to a hypothesized value | Comparing the mean response time of adults before and after they have consumed alcohol: is the difference significant enough to conclude that alcohol affects response time? |
ANOVA | Tests for a statistical difference among the means of more than two populations | Studying the effectiveness of three types of pain reliever: aspirin vs. Tylenol vs. ibuprofen |

## When to Use a t-Test?

A t-test is a form of hypothesis testing that uses a set of sample data to test a hypothesis for the entire population. It is used when the population standard deviation (σ) is unknown and the sample size is small (n < 30). In real-world samples, we don’t usually have a basis for knowing σ.

__Since the t-distribution becomes equivalent to the normal distribution (bell curve) when the sample size is large, the correct practice is:__

- If σ is known, use the normal distribution.
- If σ is unknown:
  - If n ≥ 30, use the normal distribution.
  - If n < 30, use a t-distribution.
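
The rule of thumb above can be sketched as a small Python helper. This is only an illustration: the function name `choose_distribution` and the `large_n` parameter are hypothetical, and treating a sample of exactly 30 as "large" is an assumption.

```python
def choose_distribution(sigma_known: bool, n: int, large_n: int = 30) -> str:
    """Return the distribution suggested by the rule of thumb above."""
    if n < 2:
        raise ValueError("need at least two observations")
    if sigma_known:
        # Population standard deviation is known: use the normal distribution.
        return "normal"
    # Sigma unknown: with a large sample the t-distribution is close to normal.
    # Assumption: a sample of exactly `large_n` counts as large.
    return "normal" if n >= large_n else "t"


print(choose_distribution(sigma_known=False, n=7))  # t
```

For the quiz example later in this article (σ unknown, n = 7), the helper points to the t-distribution.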

A one-sample t-test can determine whether μ (mu, the population mean) is equal to a hypothesized mean. The test uses s (sample standard deviation) to estimate σ (sigma, the population standard deviation). If the difference between x̄ (x bar, sample mean) and the hypothesized mean is large relative to s, then the means are unlikely to be equal.
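
Written as a formula, the comparison described above is usually expressed as the t-statistic. The symbol μ₀ for the hypothesized mean is notation introduced here for illustration; n is the sample size, and the statistic is compared against a t-distribution with n − 1 degrees of freedom.

$$
t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}
$$

Note that the difference is scaled by s/√n (the standard error of the mean), so the same difference counts as stronger evidence when the sample is larger.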

### Confidence Intervals

The confidence level is usually chosen before hypothesis testing. With a 95% confidence interval for μ, you can be 95% confident that the returned range of values contains μ.

Generally, confidence intervals of 95% are used unless otherwise stated. Sometimes α (alpha, the significance level) is used to describe this instead (for a 95% CI, α = 0.05).
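
As a sketch of how such an interval is typically built for a one-sample t-test, the limits are the sample mean plus or minus a margin. Here the critical value t with subscripts α/2 and n − 1 is standard notation (not taken from this article) for the value from a t-table with n − 1 degrees of freedom.

$$
\bar{x} \pm t_{\alpha/2,\, n-1} \cdot \frac{s}{\sqrt{n}}
$$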

### Assumptions

It is important to note that using the t-test for hypothesis testing requires the adoption of certain assumptions about the data being analyzed. If these assumptions are not met, then the conclusions obtained from the test cannot be validated. The assumptions for a one-sample t-test are:

- The sample must be random.
- Sample data must be continuous.
- Sample data should be normally distributed (although this assumption is less critical when the sample size is 30 or more).

## Example: One-Sample t-Test

For example, you want to determine whether the mean time for completing an online quiz is statistically different from the lecturer’s claim of 1 hour. μ, in this case, represents the mean time taken by the entire cohort of students to finish the quiz. However, our sample consists of only 7 students.

We want to know whether μ is equal to 1 hour or not. The possibilities can therefore be expressed as two hypotheses:

- The null hypothesis (H₀): μ is equal to 1 hour.
- The alternative hypothesis (H₁): μ is not equal to 1 hour.

Running the test in software will yield several key parameters such as the sample mean, sample standard deviation, confidence interval, t-statistic and p-value. The output for a sample data set (n = 7) is shown below:

### Test of μ = 1 vs μ ≠ 1 for n = 7

Variable | n | Mean | St Dev | 95% CI (lower, upper limit) | T-statistic | p-value |
---|---|---|---|---|---|---|
Hours | 7 | 1.271 | 0.355 | 0.943, 1.599 | 2.025 | 0.089 |
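
To see roughly where the table's numbers come from, the t-statistic and confidence interval can be recomputed from the summary values alone. This is a minimal sketch, not the method any particular package uses: the variable names are illustrative, and the critical value 2.447 (two-sided 95%, 6 degrees of freedom) is hard-coded from a standard t-table rather than calculated.

```python
import math

# Summary statistics copied from the table above.
n = 7
sample_mean = 1.271
sample_sd = 0.355
hypothesized_mean = 1.0  # the lecturer's claim, in hours

# Assumption: two-sided 95% critical value of the t-distribution with
# n - 1 = 6 degrees of freedom, looked up in a standard t-table.
T_CRITICAL_DF6 = 2.447

# Standard error of the mean: s / sqrt(n).
standard_error = sample_sd / math.sqrt(n)

# t-statistic: distance from the hypothesized mean in standard errors.
t_statistic = (sample_mean - hypothesized_mean) / standard_error

# 95% confidence interval for the population mean.
margin = T_CRITICAL_DF6 * standard_error
ci_lower = sample_mean - margin
ci_upper = sample_mean + margin

print(f"t = {t_statistic:.2f}")
print(f"95% CI = ({ci_lower:.3f}, {ci_upper:.3f})")
```

The t-statistic comes out at about 2.02 rather than exactly 2.025 because the mean and standard deviation in the table are rounded; software works from the raw observations.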

The key parameter here is the p-value (probability value), which answers the question ‘If the null hypothesis were true, what is the probability of obtaining a sample mean at least as far from the hypothesized mean as the one observed, taking into account sample size and standard deviation?’

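If the p-value is larger than α = 0.05, the null hypothesis cannot be rejected, and that is the case here (0.089 > 0.05). This agrees with the 95% confidence interval (0.943 to 1.599), which contains 1. Therefore, we do not have enough evidence to suggest that the lecturer’s claim of the online quiz taking 1 hour is false.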
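
For readers curious how a two-sided p-value relates to the t-statistic, here is a rough sketch using only the Python standard library. It integrates the t-distribution's density numerically with Simpson's rule; statistical packages use more refined routines, so treat this as an illustration. The function names and the `steps` parameter are hypothetical.

```python
import math


def t_pdf(x: float, df: int) -> float:
    """Probability density of Student's t-distribution with df degrees of freedom."""
    coeff = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coeff * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p_value(t_stat: float, df: int, steps: int = 2000) -> float:
    """Two-sided p-value: the area in both tails beyond |t_stat|."""
    upper = abs(t_stat)
    if upper == 0:
        return 1.0
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    # The density is symmetric, so double the area over [0, |t|].
    central_area = 2 * (h / 3) * total
    return 1 - central_area


ALPHA = 0.05  # significance level for a 95% confidence level
p_value = two_sided_p_value(2.025, df=6)  # t-statistic and n - 1 from the table
decision = "reject H0" if p_value < ALPHA else "fail to reject H0"
print(f"p = {p_value:.3f}: {decision}")
```

With the table's t-statistic of 2.025 and 6 degrees of freedom, this gives a p-value of about 0.089, so the null hypothesis is not rejected.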

#### About the Author

Sean is a consultant for clients in the pharmaceutical industry and is an associate lecturer at La Trobe University, where unfortunate undergrads are subject to his ramblings on chemistry and pharmacology.