Hypothesis Testing

Learn the foundations of statistics, the core methodology behind science’s greatest achievements.

What is the purpose of hypothesis testing?

To test whether the evidence supports a prediction

What type of design is used to compare two different groups under different conditions?

Mixed or factorial design

What is the term for when we mistakenly reject a null hypothesis?

Type I errors

What type of test would be used to determine if the mean depression score for a treatment group is significantly lower than the mean depression score for the control group?

One-tailed t-test

What is a hypothesis?

A hypothesis is a proposed explanation or prediction about a phenomenon or event that can be tested through further investigation and experimentation. It is an educated guess or an assumption based on prior knowledge, observations, and logical reasoning. In scientific research, a hypothesis is a tentative statement that can be either confirmed or refuted based on empirical evidence.

A well-formulated hypothesis includes a clear and testable statement, a prediction about the expected outcome of the experiment, and a proposed explanation of why the predicted outcome would occur.

The process of testing a hypothesis involves collecting data, analyzing it, and drawing conclusions about whether or not the hypothesis is supported by the evidence.

For example, you could formulate a hypothesis that the average kangaroo is taller than the average 12-year-old child, and then collect data. These data might either support or refute this hypothesis, though it is important to note that this is not the same as *proving* or *disproving* a statement.

Formulating hypotheses

The formulation of hypotheses, and then testing of them, in a replicable and repeatable manner, is the foundation of the scientific method. For an experiment to be considered valid, other people need to be able to do the same experiment and get similar results.

You might start with the prediction that eating a full and balanced breakfast is good for learning. This is all well and good, but how are you going to test that? Well, you’ll need to formulate a hypothesis!

Hypothesis: ‘Grade 8 students who eat a balanced breakfast of wholegrains and milk will perform better on a mathematics test than students who do not eat breakfast’.

Do you notice how the hypothesis is much more specific than the prediction? That’s because it has to be. So, a hypothesis is a very specifically worded and formulated prediction, that you will be able to reliably and consistently test.

The Null and Alternative Hypothesis

Two key terms you need to know when it comes to hypothesis testing are the null hypothesis and the alternative hypothesis. The null hypothesis states that there is no difference between the groups we are comparing, for example the placebo and treatment groups in a trial of a new medicine. The alternative hypothesis states that there is a difference.

In our kangaroo experiment, the null hypothesis is that there is no difference in height. When testing hypotheses, we can only ever reject the null hypothesis or fail to reject the null hypothesis. We can never accept the null or the alternative hypothesis.

This is because a test only measures how strong the evidence against the null hypothesis is: either the evidence is strong enough for us to reject the null, or it is not. And because there is always error in statistics, we cannot prove either hypothesis for certain.
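The reject or fail-to-reject decision can be sketched with a simple permutation test. This is only an illustration: the heights below are invented, and the function name `permutation_p_value` and the 0.05 threshold are assumptions rather than anything prescribed by the text. The idea is that if the null hypothesis is true, the group labels are interchangeable, so we shuffle them many times and count how often a difference at least as large as the observed one appears.

```python
import random
from statistics import fmean

def permutation_p_value(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in means."""
    rng = random.Random(seed)
    observed = abs(fmean(group_a) - fmean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        # Under the null hypothesis the labels carry no information,
        # so any reshuffling of them is equally plausible.
        rng.shuffle(pooled)
        diff = abs(fmean(pooled[:n_a]) - fmean(pooled[n_a:]))
        if diff >= observed:
            at_least_as_extreme += 1
    return at_least_as_extreme / n_permutations

# Hypothetical heights in cm, invented for illustration only.
kangaroo_heights = [158, 162, 149, 171, 165, 155, 160, 168]
child_heights = [148, 152, 145, 150, 155, 147, 151, 149]

alpha = 0.05  # significance level, chosen before looking at the data
p_value = permutation_p_value(kangaroo_heights, child_heights)
decision = "reject the null" if p_value < alpha else "fail to reject the null"
print(f"p = {p_value:.4f}: {decision}")
```

Note that the code only ever produces one of two outcomes, "reject" or "fail to reject". There is no branch that accepts either hypothesis.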

Statistical tests can be used to test the effects of interventions (like new medicines), existing differences between populations (like whether people in Germany are taller than people in the UK), and correlations (like the strength of the relationship between fossil fuel consumption and global average temperatures).

Study design for hypothesis testing

The two main methods for testing hypotheses on samples are between-subjects and within-subjects study designs. A between-subjects design tests the difference between Jack and Jill. A within-subjects design tests the difference within Jack before and after he fell down the hill. A third method is the mixed, or factorial, design, which blends between-subjects and within-subjects designs.

With a between-subjects study design, you have two different groups and you compare their outcomes, like people who take a new treatment drug and people who don’t.

A between-subjects design is useful when you don’t want to introduce things like learning effects, which can arise in a within-subjects design.

With a within-subject design, you test changes within the same person, or observation. For example, if you compare somebody’s 100m sprint time before and after taking a new energy drink, you are doing a within-subject test.

Put simply, it means testing the same sample twice under different conditions.

Within-subject design is useful because it requires fewer participants. It also improves the chance that you find a true effect of your independent variable.
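A rough sketch of why a within-subject design can make a true effect easier to find: when the same people are measured twice, the large differences between individuals cancel out of the comparison. The sprint times below are invented, and both function names are hypothetical. The code computes a paired t statistic (within-subject) and a Welch t statistic (treating the same numbers as if they came from two separate groups).

```python
from math import sqrt
from statistics import mean, stdev

def paired_t_statistic(before, after):
    """t statistic for a within-subject comparison of the same people."""
    differences = [a - b for a, b in zip(after, before)]
    return mean(differences) / (stdev(differences) / sqrt(len(differences)))

def welch_t_statistic(group_a, group_b):
    """t statistic for a between-subjects comparison of two separate groups."""
    standard_error = sqrt(
        stdev(group_a) ** 2 / len(group_a) + stdev(group_b) ** 2 / len(group_b)
    )
    return (mean(group_a) - mean(group_b)) / standard_error

# Hypothetical 100m sprint times in seconds, invented for illustration.
before_drink = [13.1, 14.5, 12.8, 15.2, 13.9, 14.1]
after_drink = [12.9, 14.2, 12.7, 14.8, 13.6, 13.9]

t_within = paired_t_statistic(before_drink, after_drink)
t_between = welch_t_statistic(after_drink, before_drink)
print(f"within-subject t = {t_within:.2f}, between-subjects t = {t_between:.2f}")
```

With these numbers every runner improves by a small, consistent amount, so the paired statistic is far from zero, while the between-subjects statistic is swamped by how much the runners differ from one another.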

Mixed or factorial design

Mixed, or factorial, design means testing different groups under different conditions.

For example, you might compare 100m sprint time on first attempt and second attempt for two different groups – people who tried a new energy drink between attempts 1 and 2, and people who didn’t take the energy drink.

That way you can test if it was the energy drink that is responsible for any effects, or if people are just slower/faster on their second attempt.
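The logic of this comparison can be sketched as a "difference of differences". The numbers and the helper name `mean_change` are invented for illustration: we compute the average change between attempts within each group, then compare those changes between groups.

```python
from statistics import mean

def mean_change(first_attempt, second_attempt):
    """Average within-person change from the first to the second attempt."""
    return mean([s - f for f, s in zip(first_attempt, second_attempt)])

# Hypothetical 100m sprint times in seconds, invented for illustration.
drink_first = [13.2, 14.0, 12.9, 15.1]
drink_second = [12.8, 13.7, 12.5, 14.6]
control_first = [13.4, 14.2, 13.0, 14.9]
control_second = [13.3, 14.1, 12.9, 14.8]

change_with_drink = mean_change(drink_first, drink_second)
change_without_drink = mean_change(control_first, control_second)

# The part of the change that the second attempt alone cannot explain.
drink_effect = change_with_drink - change_without_drink
print(f"drink group change:   {change_with_drink:.2f} s")
print(f"control group change: {change_without_drink:.2f} s")
print(f"difference of differences: {drink_effect:.2f} s")
```

This only separates the two sources of change descriptively. Deciding whether the remaining difference is statistically significant would still need a test on the change scores.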

Type I errors in hypothesis testing

There are two main types of error that you can make when testing your hypothesis – type I errors and type II errors.

First, you have type I errors. These are where we mistakenly reject a null hypothesis. In other words, we claim that something is true when it is false.

These are also known as false positives. For example, you tell everyone about your great new study, which found that a cool new supplement increases test scores by 10%. But your sample size was small, and it turns out you were wrong. So you’ve gone and claimed an effect that isn’t there.
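A small simulation can show how often false positives happen when there is truly no effect. Everything here is an assumption made for illustration: scores are drawn from a normal distribution with mean 100 and standard deviation 15, the test is a one-sample z-test with the standard deviation treated as known, and the function names are hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

def z_test_p_value(sample, null_mean, sigma):
    """Two-sided p-value for a one-sample z-test with known sigma."""
    z = (fmean(sample) - null_mean) / (sigma / sqrt(len(sample)))
    return 2 * (1 - NormalDist().cdf(abs(z)))

def false_positive_rate(n_experiments=2000, sample_size=20, alpha=0.05, seed=1):
    """Fraction of experiments that reject a null hypothesis that is true."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_experiments):
        # The null is true by construction: the population mean really is 100.
        sample = [rng.gauss(100, 15) for _ in range(sample_size)]
        if z_test_p_value(sample, null_mean=100, sigma=15) < alpha:
            rejections += 1
    return rejections / n_experiments

rate = false_positive_rate()
print(f"Type I error rate: {rate:.3f}")
```

The rate should land close to the chosen significance level, which is the point: setting the level at 0.05 means accepting roughly a 1 in 20 chance of a type I error whenever the null hypothesis is true.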

Type II errors in hypothesis testing

A type II error occurs when we mistakenly fail to reject a null hypothesis that is false. In other words, it’s when we conclude there is no effect when there actually is one. This is also known as a false negative.

For example, let’s say a new drug is being tested to see if it helps lower cholesterol levels. The null hypothesis is that the drug has no effect on cholesterol levels. The alternative hypothesis is that the drug does have an effect on cholesterol levels. If we fail to reject the null hypothesis when the alternative hypothesis is true, we have made a type II error: the drug does in fact influence cholesterol levels, but our study missed it.
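The cholesterol example can be simulated to see how often a real effect is missed. The numbers are assumptions chosen for illustration: the drug truly changes cholesterol by an average of -5 units, individual changes have a standard deviation of 15, and the test is a one-sample z-test with that standard deviation treated as known. The function names are hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

def rejects_null(sample, null_mean, sigma, alpha=0.05):
    """True if a two-sided one-sample z-test rejects the null hypothesis."""
    z = (fmean(sample) - null_mean) / (sigma / sqrt(len(sample)))
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_value < alpha

def type_ii_error_rate(sample_size, true_change=-5.0, sigma=15.0,
                       n_experiments=2000, seed=2):
    """Fraction of experiments that miss an effect that really exists."""
    rng = random.Random(seed)
    misses = 0
    for _ in range(n_experiments):
        # The alternative is true by construction: the mean change is not 0.
        changes = [rng.gauss(true_change, sigma) for _ in range(sample_size)]
        if not rejects_null(changes, null_mean=0.0, sigma=sigma):
            misses += 1
    return misses / n_experiments

miss_rate_small = type_ii_error_rate(sample_size=20)
miss_rate_large = type_ii_error_rate(sample_size=100)
print(f"Type II error rate with 20 patients:  {miss_rate_small:.2f}")
print(f"Type II error rate with 100 patients: {miss_rate_large:.2f}")
```

Under these assumptions the small study misses the effect most of the time, while the larger study misses it far less often.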

Confidence intervals

Confidence intervals are a way to express the level of uncertainty around a statistic, such as a sample mean. They provide a range of values that is likely to contain the true population parameter with a certain level of confidence.

For example, if you take a sample of students and calculate their average test score, you can use a confidence interval to express the range of scores that you expect the average test score for the entire student population to fall within.

A common level of confidence used is 95%. This means that if you were to repeat the process of taking a sample and calculating a confidence interval many times, about 95% of the intervals would contain the true population parameter.
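As a minimal sketch, a confidence interval for a mean can be computed as the sample mean plus or minus a margin of error. The test scores are invented, the function name is hypothetical, and the code uses a normal approximation for the critical value. For a sample this small, a t-distribution critical value would give a slightly wider interval.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def mean_confidence_interval(sample, confidence=0.95):
    """Approximate confidence interval for the population mean."""
    # Critical value from the standard normal distribution (about 1.96 for 95%).
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * stdev(sample) / sqrt(len(sample))
    centre = mean(sample)
    return centre - margin, centre + margin

# Hypothetical test scores, invented for illustration.
test_scores = [72, 85, 90, 66, 78, 81, 94, 70, 88, 76, 83, 79]

low_95, high_95 = mean_confidence_interval(test_scores, confidence=0.95)
low_99, high_99 = mean_confidence_interval(test_scores, confidence=0.99)
print(f"95% CI: ({low_95:.1f}, {high_95:.1f})")
print(f"99% CI: ({low_99:.1f}, {high_99:.1f})")
```

Asking for more confidence widens the interval: to be right more often across repeated samples, the range has to cover more values.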

The P-value

A P-value is a way to help you decide whether the results of a study are strong enough to support a certain conclusion. The P-value is a number between 0 and 1 that tells you how likely you would be to see results at least as extreme as yours if the null hypothesis were true, that is, if chance alone were at work. The smaller the P-value, the stronger the evidence against the null hypothesis.

For example, a P-value of 0.05 means that, if the null hypothesis were true, there would be a 5% chance of seeing results at least as extreme as yours. So, if the P-value is less than 0.05, results like yours would be unusual under chance alone, and you have reasonable evidence that something else is at work, like the treatment you’re testing. But if the P-value is greater than 0.05, you can’t rule out chance as an explanation, and you can’t say that the treatment caused the effect.
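One way to make the definition concrete is a case where the P-value can be counted exactly: the probability, calculated on the assumption that the null hypothesis is true, of a result at least as extreme as the one observed. The coin-flip scenario and the function name are assumptions introduced for illustration. The null hypothesis is that the coin is fair.

```python
from math import comb

def binomial_two_sided_p_value(heads, flips):
    """Exact two-sided p-value for the null hypothesis that a coin is fair."""
    # "At least as extreme" means at least as far from flips / 2 as the
    # observed number of heads, in either direction.
    distance = abs(heads - flips / 2)
    extreme_outcomes = sum(
        comb(flips, k)
        for k in range(flips + 1)
        if abs(k - flips / 2) >= distance
    )
    # Under a fair coin, each of the 2 ** flips sequences is equally likely.
    return extreme_outcomes / 2 ** flips

p_60 = binomial_two_sided_p_value(heads=60, flips=100)
p_70 = binomial_two_sided_p_value(heads=70, flips=100)
print(f"60 heads in 100 flips: p = {p_60:.4f}")
print(f"70 heads in 100 flips: p = {p_70:.6f}")
```

The further the observed count is from what a fair coin would typically produce, the smaller the P-value becomes.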

The P-value & Hypothesis testing

We can never prove the alternative hypothesis to be right. But we can reject the null hypothesis, which means we have enough evidence, with a p-value below our chosen level of statistical significance to back it up, to conclude that there is a difference.

Our level of significance is something we set ourselves before we conduct our study; common values include .01 and .05. For example, if we set .05 as our significance level and our study gives a p-value of less than .05, that tells us there would be a less than 5% chance of seeing data at least this extreme if the null hypothesis were true. We treat that as strong enough evidence of a difference, so we reject the null hypothesis.

For example, imagine a study comparing the heights of men and women. Our null hypothesis would be that there is no difference in average height. It is the P value that we use to determine whether an observed difference is statistically significant. In other words, if we have a P value of less than .05, we say that a difference this large would turn up less than 5% of the time if the null hypothesis were true.

Statistical power

Statistical power is a measure of how likely a statistical test is to detect an effect, if one exists. It is something you calculate before you run your analysis – to ensure that you have enough data to draw the conclusions that you would like to be able to draw from your test.

There are four things you need to know when it comes to calculating the power of your test – the type of statistical test you plan to use, the significance level you are using, the sample size you are planning to use, and the effect size you want to be able to detect.

With this information you can calculate the power of your study design to achieve the aims you would like it to.

If your power is not high enough, you need to either increase your sample size, by recruiting more participants or finding more data or observations, or lower your expectations for the study: accept that it will reliably detect only larger effects, or use a less strict significance level.
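The four ingredients above can be combined in a rough power calculation. This sketch makes several assumptions that are not in the text: the test is a two-sided comparison of two equally sized groups, the effect size is a standardized mean difference (often called Cohen's d), and a normal approximation is used in place of the t distribution, so the results are slightly optimistic for small samples. The function names are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()

def approximate_power(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a two-sided comparison of two group means."""
    z_critical = STANDARD_NORMAL.inv_cdf(1 - alpha / 2)
    # How far the true effect shifts the test statistic away from zero.
    shift = effect_size * sqrt(n_per_group / 2)
    return (STANDARD_NORMAL.cdf(shift - z_critical)
            + STANDARD_NORMAL.cdf(-shift - z_critical))

def required_n_per_group(effect_size, power=0.80, alpha=0.05):
    """Approximate participants needed per group to reach the target power."""
    z_critical = STANDARD_NORMAL.inv_cdf(1 - alpha / 2)
    z_power = STANDARD_NORMAL.inv_cdf(power)
    return ceil(2 * ((z_critical + z_power) / effect_size) ** 2)

n_needed = required_n_per_group(effect_size=0.5)
print(f"Participants per group for 80% power: {n_needed}")
print(f"Power with 20 per group: {approximate_power(0.5, 20):.2f}")
print(f"Power with {n_needed} per group: {approximate_power(0.5, n_needed):.2f}")
```

The same functions show the trade-offs described above: a smaller effect size or a stricter significance level both push the required sample size up.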

One-tailed versus two-tailed tests

A one-tailed t-test and a two-tailed t-test are both statistical methods used to determine whether a sample mean is significantly different from a known or hypothesized population mean, or whether the means of two groups differ. The main difference between the two tests is the direction of the difference that is being tested.

A one-tailed t-test is used when the research hypothesis predicts the direction of the difference (i.e., whether the sample mean is greater or less than the population mean). It is also known as a directional test. For example, a one-tailed t-test could be used to test whether a new drug is more effective than a placebo.

A two-tailed t-test, on the other hand, is used when the research hypothesis does not predict the direction of the difference. It is also known as a non-directional test. For example, a two-tailed t-test could be used to test whether a new drug is different from a placebo, without specifying whether it is more or less effective.
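The practical consequence of choosing one tail or two shows up in the p-value computed from the same test statistic. The sketch below is an illustration under assumptions: it uses the standard normal distribution rather than the t distribution for simplicity, the statistic of 1.8 is made up, and the function name is hypothetical.

```python
from statistics import NormalDist

def tail_p_values(z):
    """Return (one-tailed upper, two-tailed) p-values for a test statistic."""
    # One-tailed: only values larger than z count as "at least as extreme".
    upper_tail = 1 - NormalDist().cdf(z)
    # Two-tailed: values far from zero in either direction count.
    both_tails = 2 * (1 - NormalDist().cdf(abs(z)))
    return upper_tail, both_tails

one_tailed, two_tailed = tail_p_values(1.8)
print(f"one-tailed p = {one_tailed:.4f}")
print(f"two-tailed p = {two_tailed:.4f}")
```

For a positive statistic the two-tailed p-value is twice the one-tailed value, so the same data can fall below a .05 threshold in a one-tailed test and above it in a two-tailed test. That is why the direction has to be chosen before the data are seen.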

One-tailed t-tests, an example

A one-tailed t-test is used when the research hypothesis predicts the direction of the difference, for example that the mean of one group is lower than the mean of another.

For example, let’s say a pharmaceutical company wants to test the effectiveness of a new drug for treating depression. They randomly assign patients to either the treatment group (receiving the new drug) or the control group (receiving a placebo).

After 8 weeks, the researchers measure the severity of depression using a standardized scale. They can use a one-tailed t-test to determine if the mean depression score for the treatment group is significantly lower than the mean depression score for the control group. In this case, the research hypothesis would be that the new drug is effective in reducing depression.
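A sketch of how this comparison might be computed, with several caveats: the depression scores are invented, the function names are hypothetical, and the p-value uses a normal approximation to keep the example within the Python standard library. A real t-test would use the t distribution, which would give a somewhat larger p-value for groups this small.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def welch_t_statistic(group_a, group_b):
    """t statistic for the difference in means of two independent groups."""
    standard_error = sqrt(
        stdev(group_a) ** 2 / len(group_a) + stdev(group_b) ** 2 / len(group_b)
    )
    return (mean(group_a) - mean(group_b)) / standard_error

def lower_tail_p_value(t_statistic):
    """One-tailed p-value for 'group_a is lower' (normal approximation)."""
    return NormalDist().cdf(t_statistic)

# Hypothetical depression scores after 8 weeks (lower is better).
treatment_scores = [14, 11, 16, 9, 13, 12, 10, 15, 11, 12]
control_scores = [17, 15, 19, 14, 18, 16, 13, 20, 15, 17]

t_stat = welch_t_statistic(treatment_scores, control_scores)
p_value = lower_tail_p_value(t_stat)
alpha = 0.05  # significance level, chosen before the study
decision = "reject the null" if p_value < alpha else "fail to reject the null"
print(f"t = {t_stat:.2f}, one-tailed p = {p_value:.6f}: {decision}")
```

Because the hypothesis is directional, only the lower tail counts. A treatment group that scored much higher than the control group would produce a large p-value here, not a significant result.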
