Probability sampling is sometimes also called ‘random selection’ or ‘random sampling’. ‘Probability sampling’ is just a fancy way of saying that everyone in the population has a known, non-zero chance of being selected. In the simplest version, that chance is the same for everyone.
What does that mean for you?
Well, random sampling is preferable because results from a random sample tend to generalize better. That means you can be more confident that the inferences you draw from your sample will also hold true in your population.
Think of your sample as a map of your population. You want to get where you are going, and random sampling helps you build a map you can be more confident in.
Non-probability sampling is also known as ‘non-random sampling’. It means that some people have a better chance of being selected than others, and some may have no chance at all. The selection process isn’t completely random.
Results from samples taken using non-random sampling don’t generalize as well to the population as results from samples taken using random sampling. So you might get led astray by your sample data. You could be directionally right but just out by a little distance – it’s like taking the right street to get to the local pool and running out of gas before you get there. Or you could be directionally wrong, in which case you end up in a different town altogether.
So you’re feeling lazy. A convenience sample is a nonrandom sample that is mainly valuable because of how easy it is to take.
If you’ve been in a mall, you might have seen people there taking surveys. They are using convenience sampling. They set up a booth, or stand around waiting for passers-by, and they approach whoever happens to walk past to collect data and opinions.
But just because it looks like they’re approaching people randomly, don’t confuse convenience sampling with random sampling. Not everybody has the same probability of being in the mall on that exact day, so not everybody has the same chance of being selected. Some people were at home that day, right? That means they weren’t in the mall and had no chance of being part of the study.
So, when you’re sampling based on who is easiest to access, you are using convenience sampling.
Complications with convenience sampling
There are some complications with convenience sampling. For example, say you set up a booth at the local swimming pool to gauge how many residents use the swimming facilities. What could go wrong?
Obviously, it’s great that you thought to find people who have experience with the city’s swimming facilities.
But… doesn’t it all seem just a little bit too… convenient?
Consider that people who enjoy the facilities will keep going to the swimming pool, while people who don’t enjoy them are more likely to stop going. So the people you meet at the pool are mostly the ones who already use and like it, and your survey will overstate how popular the facilities are.
Convenience sampling like this has low external validity. That is a fancy way of saying that you can’t infer much about everyone in your town, which is your population of interest, if you only sampled people at the pool, because your sample isn’t representative of the entire population.
Voluntary response sampling
Voluntary response sampling means constructing a sample based on who volunteers to be a part of your study. It is another form of non-random sampling.
Sometimes not everyone understands why voluntary response sampling is non-random. After all, you don’t choose the people intentionally, and they kind of just volunteer at random, right? Well, voluntary response sampling is not random because the characteristics and motivations that inspire people to volunteer for your sample introduce bias.
Imagine you asked people to volunteer their opinions on the old tree in the field being cut down to make way for new housing developments. Both environmental activists and passionate citizens would be very motivated to volunteer their opinions. Property developers too would perhaps participate. But other citizens who didn’t hold very strong opinions would be far less likely to volunteer. So your sample would over-represent people with strong views, and it would not reflect the opinion of the town as a whole.
Random sampling: Simple random sample
It’s simple, it’s random, and it’s a sample. It’s random sampling. What more could you want?
Basically, in simple random sampling, every member in your population – be it everyone in your company, or all hermit crabs on the beach – has exactly the same chance of being selected for your sample. One of the ways you can do this practically is by assigning everyone a number and using a random number generator to choose your participants.
As an example, say there are 1000 people in your company. You assign everyone a number and use a random number generator to generate 100 different numbers. Anyone with one of those numbers is now part of your sample.
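The company example above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: the employee numbers 1 to 1000 and the fixed seed are assumptions made only so the sketch is concrete and reproducible.

```python
import random

# Hypothetical population: employee numbers 1..1000, as in the example above.
population = list(range(1, 1001))

# A fixed seed is used only so the sketch gives the same result each run.
rng = random.Random(42)

# random.sample draws without replacement, so nobody is picked twice
# and every employee has the same chance of ending up in the sample.
sample = rng.sample(population, k=100)

print(len(sample), len(set(sample)))  # 100 100
```

Drawing without replacement matters here: a generator that can repeat numbers would leave you with fewer than 100 distinct people.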
Simple random sampling is super fair, super random, and super simple. It ensures that everyone has exactly the same chance of being a part of your cool research study.
Random sampling: Systematic sampling
You might already be familiar with systematic sampling. In fact, we are often introduced to it early, in school.
How does systematic sampling work? Did you ever have to choose teams for school sports, and number everyone 1 or 2 down the line? Well, that’s how systematic sampling works. For example, you number every second person ‘2’ and you say ‘twos are part of this sample’.
It doesn’t have to be every second person either. It could be every 15th, or any interval you choose.
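Here is a rough sketch of taking every 15th person from a list. The function name `systematic_sample`, the list of 300 people, and the seed are all hypothetical. The random starting point is a common convention for systematic sampling rather than something the text above requires.

```python
import random

def systematic_sample(items, step, rng):
    """Take every `step`-th item, starting at a random position in the first interval."""
    start = rng.randrange(step)  # random start between 0 and step - 1
    return items[start::step]

# Hypothetical list of 300 people, in whatever order they happen to be listed.
people = [f"person_{i}" for i in range(1, 301)]

rng = random.Random(7)  # fixed seed only for reproducibility
sample = systematic_sample(people, step=15, rng=rng)

print(len(sample))  # 20, because 300 / 15 = 20 whatever the start
```

Only the starting point is random. Once it is chosen, the rest of the sample is fully determined by the interval.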
Problems with systematic sampling and patterns in your data
Systematic sampling is typically a cheaper method than simple random sampling, but you need to make sure there are no patterns lurking in your population or data.
As an example, imagine you had two classes in school and you told everyone to find a partner from the other class and line up in pairs, with the student from class A first and the student from class B second. Then you number people down the line 1, 2, 1, 2, so everybody in class A is number 1, and everybody in class B is number 2. If the twos are your sample, the pattern in your data means nobody in class A gets to be part of it. That causes ‘sampling bias’, which means your results won’t generalize to the population. And when conducting research, we typically want our results to generalize.
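The two-class problem can be reproduced in a few lines. This sketch assumes a hypothetical line-up of 20 students in which the classes strictly alternate, and compares an interval that matches the pattern with one that does not.

```python
# Hypothetical line-up: partners stand together, so classes alternate A, B, A, B, ...
lineup = ["A", "B"] * 10  # 20 students

# An interval of 2 lines up exactly with the pattern in the data.
every_second = lineup[1::2]

# An interval of 3 does not line up with the pattern.
every_third = lineup[2::3]

print(set(every_second))         # {'B'}
print(sorted(set(every_third)))  # ['A', 'B']
```

The sampling interval itself is not the problem. The bias appears when the interval coincides with a repeating pattern in how the population is ordered.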
Random sampling: Stratified sampling
Stratified sampling involves dividing your population into groups called ‘strata’, and then sampling those strata using another random sampling method, like simple random sampling. It actually makes a lot of sense, even though it might seem a little more complicated than other sampling methods at first.
Stratified sampling is used when you want to ensure that every group, or stratum, is properly represented in your sample, rather than leaving that to chance as you would with something like simple random sampling.
An example of stratified sampling
Stratified sampling involves first dividing your population into strata, and then using random sampling within each stratum.
As an example, let’s say that there are 8000 engineers in a company and 2000 office staff. You want to select a sample of 1000 people to find out what the company as a whole is like. To make sure your sample reflects the company’s mix of roles, you separate your company into two groups, engineers and office staff.
Office staff make up 2000 of the 10,000 people, or 20%, so they should get 20% of the 1000 places. Within each group, you use a random sampling method such as simple random sampling to select exactly 200 office staff and 800 engineers. That way you get a sample that is proportionate to your broader population.
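The engineers and office staff example might look like this in code. It is a minimal sketch of proportional allocation. The staff identifiers, the dictionary layout, and the seed are made up for illustration.

```python
import random

rng = random.Random(0)  # fixed seed only for reproducibility

# Hypothetical staff lists matching the example: 8000 engineers, 2000 office staff.
strata = {
    "engineer": [f"eng_{i}" for i in range(8000)],
    "office": [f"off_{i}" for i in range(2000)],
}
total = sum(len(members) for members in strata.values())
sample_size = 1000

sample = {}
for name, members in strata.items():
    # Proportional allocation: a stratum's share of the sample
    # equals its share of the population.
    k = round(sample_size * len(members) / total)
    # Simple random sampling inside the stratum.
    sample[name] = rng.sample(members, k)

print({name: len(chosen) for name, chosen in sample.items()})
# {'engineer': 800, 'office': 200}
```

The numbers divide evenly here. With less tidy group sizes, rounding can leave the total slightly above or below the target, so the allocation would need a small adjustment.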
There could be two groups, otherwise called strata, like in the engineers and office staff example, or there could be 20 groups. Nobody is limiting you!
Random sampling: Cluster sampling
Cluster sampling is when you divide your population into clusters, and then select only some of those clusters at random. It is a probability sampling method, otherwise known as a random sampling method.
In cluster sampling, if a group is selected, then all of the members of that group will be included in the study. Members of the groups not selected at random will not be included in the study.
An example of cluster sampling
As an example of cluster sampling, imagine you work for a huge company and you’re the big boss with the corner office. One day, you decide that you want to survey your company’s offices, of which there are 100 all across the country. Importantly, all offices are pretty similar. They have approximately the same number of people, in the same mix of roles.
You couldn’t possibly travel to every office to collect all the data you need. So you task your statisticians with coming up with a solution. They come back to you and say ‘we can use random sampling to select 30 offices, which we would label as 30 clusters, instead of sampling every single office. It’s much less work’. It’s important to note that in cluster sampling all your clusters should be similar, so if each office housed a different department, like engineering or sales, it wouldn’t work.
From here you could include everyone from the 30 offices in your sample. Or you could sample again within those 30 offices, for example by grouping people on another characteristic and randomly selecting from those groups. That is called multistage sampling because you sample in multiple stages.
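The office example can be sketched as follows. The 50 employees per office, the identifiers, and the seed are assumptions for illustration. The two-stage variant shown, which randomly picks 10 people inside each chosen office, is only one simple way to sample in multiple stages.

```python
import random

rng = random.Random(1)  # fixed seed only for reproducibility

# Hypothetical company: 100 offices with 50 employees each.
offices = {
    f"office_{n}": [f"office_{n}_emp_{i}" for i in range(50)]
    for n in range(100)
}

# Stage 1: randomly choose 30 whole offices (the clusters).
chosen_offices = rng.sample(sorted(offices), k=30)

# One-stage cluster sample: everyone in the chosen offices takes part.
cluster_sample = [emp for office in chosen_offices for emp in offices[office]]

# Two-stage variant: randomly pick 10 people within each chosen office.
two_stage_sample = [
    emp
    for office in chosen_offices
    for emp in rng.sample(offices[office], k=10)
]

print(len(cluster_sample), len(two_stage_sample))  # 1500 300
```

Nobody from the 70 offices that were not chosen appears in either sample, which is the defining trait of cluster sampling.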
How do stratified sampling and cluster sampling differ
Cluster sampling first divides the population into groups, then randomly selects a number of groups, and then includes all the members of those randomly chosen groups in the study. So in cluster sampling, your group is not guaranteed to be part of the study, but if your group is randomly selected then you will definitely be a part of the study.
Stratified sampling on the other hand first divides a population into groups, and then randomly selects some members from all of the created groups. In stratified sampling, your group is guaranteed to be part of the study, but not all members in your group will be.
When to use stratified sampling and cluster sampling
Cluster sampling includes whole groups and leaves other groups out entirely. You should use it if you expect your clusters to be homogeneous, which means they are similar to one another, so that any one cluster looks roughly like the population as a whole. Different office locations of a data science company are one example.
Stratified sampling, on the other hand, takes some members from every group. You should use it when you expect that your groups are heterogeneous, which means they differ from one another on the thing you are studying. For example, you might separate the maths majors from the humanities majors at a university.