Build a strong data foundation by developing an understanding of variable types
Introduction to discrete variables
A discrete variable is something that is counted, not measured. What could that possibly mean, you might ask? Well, when it comes to things you measure, like weight, you can have a puppy that weighs 6.5 kilograms, but you can’t have 6.5 golden retrievers.
Weight is something you measure, as are height and distance, while things like people, the numbers on a die, and puppies are things that you count. Other discrete variables include car models, or even a score on a 10-point scale.
We use discrete variables for counting the frequencies of different categories in the population, or separating our data into groups for comparison and analysis.
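As a minimal sketch, assuming some hypothetical survey data, Python's `collections.Counter` can tally how often each value of a discrete variable occurs. This works whether the values are counts (pets owned) or categories (car models).

```python
from collections import Counter

# Hypothetical survey answers: pets each respondent owns (a count) and
# the model of car they drive (a category). Both are discrete.
pets_owned = [0, 2, 1, 1, 3, 0, 1, 2, 1]
car_models = ["hatchback", "sedan", "sedan", "SUV", "hatchback", "sedan"]

# Frequency of each value, i.e. how many respondents fall into each group.
pet_frequencies = Counter(pets_owned)
model_frequencies = Counter(car_models)

print(pet_frequencies.most_common())   # values sorted from most to least common
print(model_frequencies["sedan"])      # how many respondents drive a sedan
```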
Continuous variables
Continuous data is measured, not counted. Think of a ruler: it’s a tool for measuring objects. You could use one to check whether the footlong sandwich at your local restaurant really is as long as they say it is. You might find that it’s 0.96547 feet. Or it could be 1.0002 feet.
With a really precise measuring tool, you can measure things to a high degree of precision and get a number like 21.2542 centimeters. With a less precise tool, you might record the same length as 21 centimeters.
The point is that continuous data can take any value, to whatever level of precision your measurement allows. Compare this with count data, such as the number of new customers your business attracted this month: you just can’t have 21.2542 people.
Other examples of continuous data include things such as height, weight, the battery percentage on your phone, and how far you have to travel to work.
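To make the contrast concrete, here is a minimal Python sketch. The measurement `length_cm` reuses the 21.2542 cm figure above, and rounding it shows the same continuous value reported at coarser precision. The count `new_customers` is a hypothetical value and is always a whole number.

```python
# A length measured with a precise instrument (continuous data).
length_cm = 21.2542

# The same measurement reported at decreasing levels of precision.
for decimals in (4, 2, 0):
    print(f"{decimals} decimal places: {round(length_cm, decimals)}")

# A count (discrete data) must be a whole number.
new_customers = 21
print(float(new_customers).is_integer())  # a valid count
print(length_cm.is_integer())             # 21.2542 people is not a valid count
```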
Ordinal categorical variables versus discrete numeric variables
It is easy to confuse ordinal categorical variables with discrete numeric variables.
Common examples of ordinal categorical variables include things like star ratings. There’s no universally understood difference between one star and two stars. Is five stars necessarily five times as good as one star? Would you only pay twice as much for a restaurant with five stars as one with only two-and-a-half stars?
On the other hand, when it comes to discrete numeric variables, often called count data, we know that two cats is twice as many as one cat. We also know that the difference between three cats and four cats is the same as the difference between eight cats and nine cats.
Not all ordinal categorical variables are numeric, though. Consider education level, where the categories have a clear order: ‘primary school’, ‘high school’, ‘college’. The data has a hierarchy, but the differences between stages aren’t uniform or measurable, just like in the star-rating example.
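The contrast can be sketched in Python. Counts support arithmetic directly. The ordinal variable is represented here (an illustrative choice, not the only one) by listing its categories in order and comparing positions. The helper `is_higher` is hypothetical.

```python
# Counts: equal intervals, so differences and ratios are meaningful.
cats_a, cats_b = 2, 1
print(cats_a / cats_b)          # two cats really is twice as many as one cat
print((4 - 3) == (9 - 8))       # the gap from 3 to 4 equals the gap from 8 to 9

# Ordinal categories: store the order explicitly and compare positions only.
EDUCATION_LEVELS = ["primary school", "high school", "college"]

def is_higher(level_a: str, level_b: str) -> bool:
    """Return True if level_a ranks above level_b (hypothetical helper)."""
    return EDUCATION_LEVELS.index(level_a) > EDUCATION_LEVELS.index(level_b)

print(is_higher("college", "high school"))
# The positions (0, 1, 2) encode order only. Subtracting them would wrongly
# imply that every step between stages is the same size.
```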
Nominal data
Nominal data is categorical data that cannot be ordered in a meaningful way, like your preferred mode of transport, be it bus, car, or bicycle. As a counterexample, the numbers on the sides of a die can be ordered: 3 is greater than 2, and 2 is greater than 1.
Sure, you could say that a bus is greater than a car, and a car is greater than a bicycle, because they differ greatly in size. But that isn’t how the data works: as answers to ‘preferred mode of transport’, they’re just categories. If you weighed them or measured their volume, that would be another story, but then you’d be dealing with continuous data.
For a simpler example, take the colors of cars. Blue is not greater than red, which is not greater than green. The colors are different from one another, but none of them ranks above the others, so you can’t order them in any objectively meaningful way.
In fact, ‘nominal’ comes from the Latin word for ‘name’: nominal values are names that label things and make them easier to identify. Think of the police chasing a red Ferrari: ‘red’ and ‘Ferrari’ are both examples of nominal data. Good luck catching it, though!
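A short Python sketch, using made-up car colors, shows the operations that do make sense for nominal data. You can check whether two values are the same category, count the categories, and find the most common one with `statistics.mode`.

```python
from collections import Counter
import statistics

# Hypothetical colors of cars seen in a car park (nominal data).
car_colors = ["red", "blue", "green", "red", "blue", "red"]

# Meaningful operations: equality checks and counting categories.
print(car_colors[0] == car_colors[3])   # are these two cars the same color?
print(Counter(car_colors))              # how many cars of each color
print(statistics.mode(car_colors))      # the most common color

# sorted() puts the colors in alphabetical order, but that order comes from
# the spelling of the names, not from any property of the colors themselves.
print(sorted(set(car_colors)))
```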
Categorical Variables
Categorical variables represent groups: for example, classifications like car brand, and hierarchical rankings like educational level. You might also hear categorical variables referred to as ‘qualitative variables’, as opposed to ‘quantitative variables’, which include both continuous and discrete numeric variables.
Types of categorical variables include ordinal, nominal, and binary data. Categorical variables let us organize our data into groups that we can compare with one another. For example, you could group sales by product and plot each product’s share of revenue in a pie chart.
Categorical variables like gender or age group are commonly used in statistical analysis, where they let researchers compare groups and test for differences between them, for example in studies of medical interventions.
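The product-grouping example can be sketched in plain Python. The products and revenue figures are hypothetical. The sketch totals revenue per product and computes each product's share of the total, which is what the slices of a pie chart would show. Plotting is left out to keep the example dependency-free.

```python
from collections import defaultdict

# Hypothetical sales records: (product, revenue) pairs.
sales = [
    ("mugs", 120.0),
    ("t-shirts", 300.0),
    ("mugs", 80.0),
    ("posters", 50.0),
    ("t-shirts", 150.0),
]

# Group by the categorical variable (product) and total the revenue.
revenue_by_product = defaultdict(float)
for product, revenue in sales:
    revenue_by_product[product] += revenue

# Each product's share of total revenue, i.e. the size of its pie slice.
total = sum(revenue_by_product.values())
shares = {product: amount / total for product, amount in revenue_by_product.items()}

print(dict(revenue_by_product))
print(shares)
```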
Ordinal variables
As discussed above, ordinal variables are categorical variables that can also be ordered meaningfully, often from high to low. Put simply, they have relative value, but no numeric value of their own.
One example everyone will be familiar with is ‘education level’. Sure, the names for the different stages of school differ all around the world, but there is one common theme – each stage is ‘higher’ than the previous stage. For example, primary school, middle school, and high school.
Another example: suppose I asked you how much you like chocolate milk and gave you the options ‘extremely dislike’, ‘dislike’, ‘neutral’, ‘like’, and ‘extremely like’. That’s ordinal data too, because the options run in a clear order from least to most liking.
So I can say that you like chocolate milk more, or less, than my neighbor Jenny does. But I can’t say ‘green car more than blue car’ without sounding like a Neanderthal, because it makes no sense. That’s the difference between ordinal and nominal variables: ordinal values tell you how units compare with one another, whereas nominal values only name a quality of each unit.
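Here is a minimal Python sketch of working with an ordinal scale like the chocolate-milk question. It assumes responses are stored as text and maps each one to its position on the scale, using those positions for ordering only. The answers are hypothetical. Summarizing them with a median rather than a mean is a common convention for ordinal data, not something the text above prescribes.

```python
import statistics

# The response options in their natural order (an ordinal scale).
SCALE = ["extremely dislike", "dislike", "neutral", "like", "extremely like"]

def rank(response: str) -> int:
    """Position of a response on the scale, used for ordering only."""
    return SCALE.index(response)

# Comparing two people's answers relies only on the order of the scale.
you, jenny = "like", "neutral"
print(rank(you) > rank(jenny))   # you like chocolate milk more than Jenny does

# Hypothetical answers from a small group. median_low always returns one of
# the actual ranks, so the result maps back to a real category on the scale.
answers = ["like", "neutral", "extremely like", "dislike", "like", "neutral"]
median_rank = statistics.median_low(rank(a) for a in answers)
print(SCALE[median_rank])
```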
Independent and Dependent Variables
When a statistician conducts a research experiment, they are typically interested in whether an independent variable influences a dependent variable. An independent variable is one that is not affected by the other variables in the study, and a dependent variable is one whose value changes in response to changes in the independent variable.
For example, in a study on the effect of studying on test scores, the independent variable would be the amount of time spent studying, and the dependent variable would be the test scores. The researchers intentionally manipulate the independent variable – say by making people study more or less – and see how it impacts the dependent variable – in this case their test scores.
Independent vs Dependent
An independent variable is the cause. It is what influences the dependent variable, and it is independent of any other variables in your study.
The next time you are cooking something, consider this: the temperature you use is the independent variable, and cooking time is the dependent variable – the result of your manipulation of the independent variable.
Once the oven is preheated, your cooking time cannot, in and of itself, affect the oven’s temperature. However, the temperature you choose will affect how long you need to cook your food.
Identifying Independent Variables
You can identify an independent variable by checking it against these two criteria.
One – it comes before the other variable in time. For example, students drink a new brain-enhancing energy drink before a test.
Two – the researcher manipulates the variable in some way, or uses it to group participants. For example, we might either adjust the dose of the new energy drink, which means we manipulated it, or give it to some students but not others, in which case we used it as a grouping method.
The goal of the research is to find out how this variable influences another one: in our case, whether the energy drink influences test scores.
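The second criterion, using the independent variable to form groups, can be sketched in Python. The text doesn't say how the students are split. Random assignment with the standard library's `random.Random.shuffle` is an assumption here, though it is a common choice because it makes the groups comparable on average. The student IDs and the seed are hypothetical.

```python
import random

# Hypothetical participant IDs for the energy-drink study.
students = [f"student_{i}" for i in range(1, 11)]

# Use the independent variable as a grouping method: randomly give half
# the students the drink and leave the other half without it.
rng = random.Random(42)          # fixed seed so the assignment is reproducible
shuffled = students[:]           # copy so the original list is left untouched
rng.shuffle(shuffled)
half = len(shuffled) // 2
drink_group = sorted(shuffled[:half])
control_group = sorted(shuffled[half:])

print("Drink:", drink_group)
print("No drink:", control_group)
```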
Identifying Dependent Variables
There are three simple criteria to check whether you are dealing with a dependent variable.
One – is this variable an outcome of your study? For example, do you expect to find higher employee engagement as a result of your new company culture change initiative? Then employee engagement is an outcome.
Two – is this variable dependent on other variables in your study? In our case, employee engagement is dependent on the company culture change initiative.
Three – is this variable measured after a change or manipulation is made to another variable? In the case of employee engagement, you want to test whether it improves. So you measure it before you introduce the culture change initiative, and again afterwards.
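The before-and-after measurement in the third criterion can be sketched in Python. The engagement scores below are entirely hypothetical and serve only to show the arithmetic. Each employee's change is their after score minus their before score, and averaging those changes summarizes how the dependent variable moved.

```python
import statistics

# Hypothetical engagement scores (0-10) for the same five employees,
# measured before and after the culture change initiative.
before = [6, 5, 7, 4, 6]
after = [7, 6, 7, 6, 5]

# The dependent variable is measured twice, and the per-employee change
# is the quantity of interest.
changes = [a - b for b, a in zip(before, after)]

print(changes)
print(statistics.mean(changes))   # average change across employees
```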