Learn advanced concepts in statistics such as confounding variables, z-scores, and effect size
Independent variables versus dependent variables
Whenever you’re conducting an experiment to see how one thing affects another, you will have at least one Independent Variable – IV – and at least one Dependent Variable – DV. So what does that mean?
Your independent variable is the variable you think is the cause of your effect. For example, imagine that your hypothesis was that ‘playing violent video games leads to more aggression in teenage males’. In this example, your independent variable would be ‘playing violent video games’ – the teenage males are your participants, and how much they play is what you vary.
Your dependent variable is the variable that you think is dependent on the independent variable. In the above example, your dependent variable would be the ‘amount of violent aggression’. Put simply, you think that changes in the independent variable cause the dependent variable to change.
Generally, when we are conducting experiments, we are manipulating one variable, which is the independent variable and observing or measuring the effects on the dependent variable. In this case, we are making people play more violent video games and then testing their aggression! So, if in doubt, ask yourself, which one are you manipulating? That’s your independent variable.
Mediating variables
A mediating variable is a third variable that sits between your independent and dependent variables: it is affected by the independent variable and, in turn, influences the dependent variable.
As an example, if you’re studying the effects of sleep on educational achievement, alertness could be a mediating variable. So there is a causal pathway that runs from amount of sleep to alertness to academic achievement.
The result here is that academic achievement could be improved by influencing the independent variable – getting more sleep – or by influencing the mediating variable, alertness, through the use of stimulants – like coffee.
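If you like to see ideas in code, here is a minimal simulation sketch of that causal pathway in Python. The variable names and effect sizes are made up purely for illustration – they aren’t taken from any real study.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000

# Independent variable: hours of sleep per night.
sleep = rng.normal(7, 1, n)

# Mediating variable: alertness is affected by sleep (plus noise).
alertness = 0.8 * sleep + rng.normal(0, 1, n)

# Dependent variable: achievement depends on alertness, not directly on sleep.
achievement = 0.6 * alertness + rng.normal(0, 1, n)

# Sleep and achievement end up correlated only through the mediator.
print(np.corrcoef(sleep, achievement)[0, 1])      # positive correlation via alertness
print(np.corrcoef(alertness, achievement)[0, 1])  # the direct, stronger link
```

In this toy model, nudging either sleep (the independent variable) or alertness (the mediator) shifts achievement – exactly the point made above.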
Moderating variables
A moderating variable is a third variable that changes the strength – or even the direction – of the relationship between your independent and dependent variables. As an example, if you’re studying the relationship between stress and anxiety, one moderating variable might be time spent with friends and family – a support system. For people who do have a support system, the relationship between stress and anxiety might be weaker than it is for people without one. There are many different moderating variables that can come into play, and they can be difficult to control for in your research. However, you can take them into account when interpreting your results if you are aware of them.
Another example shows just how much influence a moderator can have: the relationship between education level and marital prospects. Men with higher levels of education might be more likely to marry than men with lower levels of education, whereas women with higher levels of education might actually be less likely to marry than women with lower levels of education. In this case the moderator – gender – doesn’t just weaken the relationship; it reverses it.
As you can see, moderator variables affect not only the strength of relationships but also the direction.
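Statistically, a moderator shows up as an interaction: the slope linking the independent and dependent variables differs across levels of the moderator. Here is a small sketch with made-up numbers, where the stress–anxiety link is weaker for people with a support system.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2_000

stress = rng.normal(0, 1, n)
support = rng.integers(0, 2, n)   # 0 = no support system, 1 = support system

# The moderator changes the slope: stress hits harder without support.
slope = np.where(support == 1, 0.3, 0.9)
anxiety = slope * stress + rng.normal(0, 1, n)

for group, label in [(0, "no support"), (1, "support")]:
    mask = support == group
    r = np.corrcoef(stress[mask], anxiety[mask])[0, 1]
    print(f"{label}: stress-anxiety correlation = {r:.2f}")
```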
Confounding variables, an example
To understand confounding variables, let’s pose the question – can an ice cream commit murder?
What if I told you that as the sales of ice cream rise, so does the murder rate? That means there is a positive correlation between sales of ice cream and the murder rate and your statistical tests show that this relationship is significant. So does that mean that ice cream is causing violence?
Or, is there a confounding variable here at play? Well, most likely. And that confounding variable is temperature. As the temperature rises, more people are out and interacting instead of at home hiding from the cold. As a result, there is an increase in the murder rate. Perhaps people get a little hot and bothered, too. The sales of ice cream also go up when the temperature rises. So, temperature is a confounding variable that could cause you to wrongly blame ice cream for acts of violence.
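To see how a confounder manufactures a correlation, here is a toy simulation in Python. The numbers are invented for illustration only: temperature drives both ice cream sales and the murder rate, while the two have no direct link to each other.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 365  # one year of daily data

# Confounding variable: daily temperature.
temperature = rng.normal(15, 10, n)

# Both variables depend on temperature, but not on each other.
ice_cream_sales = 2.0 * temperature + rng.normal(0, 5, n)
murder_rate = 0.1 * temperature + rng.normal(0, 1, n)

# A naive correlation makes ice cream look guilty...
print(np.corrcoef(ice_cream_sales, murder_rate)[0, 1])

# ...but it largely vanishes once temperature is held fixed
# (here: correlating the residuals after regressing each variable on temperature).
resid_ice = ice_cream_sales - np.polyval(np.polyfit(temperature, ice_cream_sales, 1), temperature)
resid_murder = murder_rate - np.polyval(np.polyfit(temperature, murder_rate, 1), temperature)
print(np.corrcoef(resid_ice, resid_murder)[0, 1])
```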
Mediating variables versus confounding variables
A mediating variable is a variable that acts as a link between an independent variable and a dependent variable. The relationship is causal, and all variables are in fact related.
A confounding variable, on the other hand, is a third variable that influences two variables which have no causal relationship of their own. The confounding variable makes it seem as if they are related, but they are not.
As in the ice cream example, a confounding variable influences, or is correlated with, both variables – even though neither one causes the other.
This is in comparison to the mediating variable, which sits within a causal chain between the independent and dependent variable.
Z-scores
The z-score tells you how far away a value is from either a known population mean or your sample’s mean. Specifically, the z-score tells you how many standard deviations away it is.
So what does that all mean?
Well, with the z-score, you can find the percentile that corresponds to your value. It’s really useful for figuring out how awesome you did on your test because you used Kinnu. For example, let’s say you score 180 on your test, which is way above the class average of 150. Assume that your distribution has a standard deviation of 10. So what is your z-score? Well, in mathematical notation the formula looks like this:
z = (x − μ) / σ
In plainer English: z = (x – mean) / standard deviation, where x is the data point, the mean (μ) is the mean of the dataset, and the standard deviation (σ) is the standard deviation of the dataset. For your test, that gives z = (180 – 150) / 10 = 3.
So, we know your score is 3 standard deviations above the mean… Now what? Well, that’s when we use a z-table. The z-table – a table of stored values that you can find online or in most scientific calculators – will show you that a value 3 standard deviations above the mean sits at roughly the 99.87th percentile, which means you scored better than 99.87% of test takers!
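If you would rather let Python do the lookup, the standard normal cumulative distribution function plays the role of the z-table. This is just a sketch, assuming you have SciPy installed.

```python
from scipy.stats import norm

score, mean, sd = 180, 150, 10

# z-score: how many standard deviations the score sits above the mean.
z = (score - mean) / sd
print(z)  # 3.0

# The cumulative distribution function is the z-table in function form.
percentile = norm.cdf(z) * 100
print(round(percentile, 2))  # 99.87 -> better than 99.87% of test takers
```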
Z-tables
If you know how to calculate a z-score, you’ll be able to compare your result to a z-table to find the corresponding percentile value. As an example, if your test score was 145, but the average for your class was only 100, with a standard deviation of 15, then your z-score is (145 – 100) / 15 = 3, and you can compare that to the z-table to find out just how smart you are.
Let’s look at that on the z-table…
First, take your z-score to its first decimal place – in our case, 3.0 – and locate it on the y-axis, the vertical axis. Then locate the second decimal – in our case, 0 – on the x-axis, the horizontal axis. The value at the intersection of these, 0.9987, is the proportion of scores below yours. In our case, your test score was higher than 99.87% of people!
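You can even generate the relevant slice of a z-table yourself from the standard normal CDF. This sketch (again assuming SciPy) prints the rows around z = 3.0, with one column for each second decimal.

```python
from scipy.stats import norm

# Rows hold z to one decimal; columns add the second decimal (.00 to .09).
columns = [i / 100 for i in range(10)]
print("z    " + "  ".join(f"{c:>6.2f}" for c in columns))
for row in (2.9, 3.0, 3.1):
    values = "  ".join(f"{norm.cdf(row + c):.4f}" for c in columns)
    print(f"{row:.1f}  {values}")
```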
Effect size
While statistical significance – the p-value – is influenced by the number of observations in your sample, the effect size is not. Effect size doesn’t care about how many observations you have. Effect size is based purely on the actual data – the measurements – not the number of measurements you have.
Common measures of effect size include Cohen’s d, which is used whenever you’re comparing two means – for example, ‘does strength training for 5 weeks improve a basketballer’s vertical jump?’, where you compare the mean jump height before and after the 5-week strength training program.
Another measure of effect size you might already be familiar with is Pearson’s correlation coefficient. Again, it doesn’t matter how many observations you have, just how strongly correlated the data is. Pearson’s correlation coefficient measures the strength of the relationship between variables – like the relationship between time spent studying in Kinnu and test scores.
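Here is a short sketch of both measures in Python, using made-up data and assuming NumPy and SciPy are available. The Cohen’s d shown uses the simple pooled-standard-deviation formula for two groups of equal size.

```python
import numpy as np
from scipy.stats import pearsonr

# Made-up vertical jump heights (cm) before and after 5 weeks of strength training.
before = np.array([55, 60, 58, 62, 57, 59, 61, 56])
after = np.array([60, 66, 63, 68, 61, 64, 67, 62])

# Cohen's d: difference between the means, scaled by the pooled standard deviation.
pooled_sd = np.sqrt((before.var(ddof=1) + after.var(ddof=1)) / 2)
d = (after.mean() - before.mean()) / pooled_sd
print(f"Cohen's d = {d:.2f}")

# Pearson's r: strength of a relationship, e.g. hours spent studying vs test score.
hours = np.array([1, 2, 3, 4, 5, 6, 7, 8])
scores = np.array([52, 55, 61, 64, 70, 72, 75, 80])
r, p = pearsonr(hours, scores)
print(f"Pearson's r = {r:.2f}")
```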
Interpreting effect size
The two main measures of effect size are Cohen’s d – for comparing the magnitude of the difference between two means – and Pearson’s r, for finding the strength of a correlation. Neither tells you the significance of a relationship, but they do tell you its size, or magnitude. This tells you how important an effect is in the context of your data.
As a rough guide, an effect is usually considered small at around d = 0.2 or r = 0.1, medium at around d = 0.5 or r = 0.3, and large at around d = 0.8 or r = 0.5. Just keep in mind that Pearson’s r can range between -1 and 1, so take the absolute value and ignore any minus sign when judging its size.
Cohen’s d on the other hand can be anywhere between 0 and infinity! But it only takes a value of 0.8 or more for it to be considered a large effect size.
How effect size relates to sample size and power
While effect size is not dependent on your sample size, you can use it to calculate what sample size you might need to reliably detect an effect of the desired magnitude in your study. This is because your ability to detect a statistically significant difference – via your p-value – is related to your sample size.
For example, if you want to be able to find a large effect in your experiment – if one exists – then you will need enough observations in your sample that the statistical test has enough power to detect an effect of that size. The effect is either in your data or it’s not; but you need enough observations to uncover it, if it is there. The number of observations doesn’t create the effect size – it just helps you uncover it.
More observations give your test more statistical power to detect a given effect size, and the smaller the effect you want to detect, the more power – and therefore the more observations – you need.
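A power analysis turns this into a concrete sample size. Here is a sketch using the statsmodels library (which you would need to install) for a two-sample t-test at the usual 5% significance level and 80% power.

```python
import math
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# Observations needed per group to detect a large effect (d = 0.8)
# with 80% power at alpha = 0.05.
n_large = analysis.solve_power(effect_size=0.8, alpha=0.05, power=0.8)
print(math.ceil(n_large))  # about 26 per group

# A small effect (d = 0.2) needs far more observations for the same power.
n_small = analysis.solve_power(effect_size=0.2, alpha=0.05, power=0.8)
print(math.ceil(n_small))  # about 394 per group
```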