Sampling Methods

How to find a sample from your population.

Convenience sampling
Simple random sampling
Any number
Cluster sampling

Probability sampling

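Probability sampling is sometimes also called ‘random selection’ or ‘random sampling’. ‘Probability sampling’ is just a fancy way of saying that chance decides who gets selected: every member of the population has a known, non-zero chance of ending up in your sample. In the simplest version, simple random sampling, that chance is the same for everyone.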

What does that mean for you?

Well, random sampling is preferable because it will generalize better. That means you can be more confident that the inferences you draw from your sample will also hold true in your population.

Think of your sample as a map of your population. You want to get where you are going, and random sampling helps you build a map you can be more confident in.

Non-probability sampling

Non-probability sampling is also known as ‘non-random sampling’. It means the selection process isn’t left to chance: some people have a better chance of being selected than others, some have no chance at all, and you usually can’t say what those chances are.

Results from samples taken using non-random sampling don’t generalize as well to the population as results from samples taken using random sampling. So you might get led astray by your sample data.

You could be directionally right but just out by a little distance – it’s like if you took the right street to get to the local pool and you just ran out of gas before you got there. Or you could be directionally wrong, in which case you ended up in a different town altogether.

Convenience sample

So you’re feeling lazy. A convenience sample is a nonrandom sample that is mainly valuable because of how easy it is to take.

If you’ve been in a mall, you might have seen people there conducting surveys. That is convenience sampling. They set up a booth, or stand around waiting for passers-by, and approach whoever happens to walk past to collect data and opinions.

But just because the choice of who to approach looks haphazard, don’t confuse convenience sampling with random sampling.

Not everybody has the same probability of being in the mall on that exact day, so not everybody has the same chance of being selected. Some people were at home that day, right? They weren’t in the mall, so they had no chance of being part of the study.

So, when you’re sampling based on who is easiest to access, you are using convenience sampling.

Complications with convenience sampling

There are some complications with convenience sampling. Say you set up a booth at the local swimming pool to gauge how many residents use the swimming facilities and what they think of them. What could go wrong?

Obviously, it’s great that you thought to find people who have experience with the city’s swimming facilities.

But… doesn’t it all seem just a little bit too… convenient?

For one thing, everyone you meet at the pool is, by definition, using it, so your sample will overstate how many residents use the facilities. And people who enjoy the facilities keep going to the pool, while people who don’t enjoy them are more likely to stop going. Either way, you’ll have bias in your sample.

Convenience sampling like this has low external validity, which is a fancy way of saying that your findings don’t transfer well beyond the people you sampled. Your population of interest is everyone in your town, but if you only sampled people at the pool, your sample isn’t representative of that population, so you can’t infer much about the town as a whole.
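To make the bias concrete, here is a small simulation sketch. The town, its size, and the 20% share of pool users are all made up for illustration; they are not figures from a real survey.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical town: 10,000 residents, each with a 20% chance of being a pool user.
town = [{"uses_pool": rng.random() < 0.2} for _ in range(10_000)]

def share_using_pool(people):
    """Fraction of the given people who use the pool."""
    return sum(person["uses_pool"] for person in people) / len(people)

# Convenience sample: you can only meet people who are at the pool.
at_pool = [person for person in town if person["uses_pool"]]
convenience_sample = rng.sample(at_pool, 100)

# Simple random sample drawn from the whole town, for comparison.
random_sample = rng.sample(town, 100)

print("Whole town:        ", share_using_pool(town))
print("Convenience sample:", share_using_pool(convenience_sample))
print("Random sample:     ", share_using_pool(random_sample))
```

Everyone in the convenience sample uses the pool, so it cannot tell you anything about the residents who never show up there, while the random sample should land close to the town-wide share.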

Voluntary response sampling

Voluntary response sampling means constructing a sample based on who volunteers to be a part of your study. It is another form of non-random sampling.

It isn’t always obvious why voluntary response sampling is non-random. After all, you don’t choose the people intentionally, and they kind of just volunteer at random, right? Well, voluntary response sampling is not random because the characteristics and motivations that inspire people to volunteer for your sample introduce bias.

Imagine you asked people to volunteer their opinions on whether the old tree in the field should be cut down to make way for new housing developments. Environmental activists and passionate citizens would be very motivated to volunteer their opinions. Property developers would perhaps participate too. But other citizens who didn’t hold very strong opinions would be far less likely to volunteer.

Random sampling: Simple random sample

It’s simple, it’s random, and it’s a sample. It’s random sampling. What more could you want?

Basically, in simple random sampling, every member in your population – be it everyone in your company, or all hermit crabs on the beach – has exactly the same chance of being selected for your sample. One of the ways you can do this practically is by assigning everyone a number and using a random number generator to choose your participants.

Adrian Scottow, CC BY-SA 2.0, https://creativecommons.org/licenses/by-sa/2.0/, via Flickr

As an example, say there are 1000 people in your company. You assign everyone a number from 1 to 1000 and use a random number generator to draw 100 different numbers. Anyone with one of those numbers is now part of your sample.
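The company example can be sketched in a few lines of Python. This is a minimal illustration using the standard library's `random.sample`, which draws without replacement so nobody is picked twice. The function name and the seed are just assumptions for the demo.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct members; every member is equally likely to be chosen."""
    rng = random.Random(seed)  # seed is optional, used here for repeatability
    return rng.sample(population, n)

# Number everyone in the company from 1 to 1000.
employee_numbers = list(range(1, 1001))

# Draw 100 different numbers; those employees form the sample.
sample = simple_random_sample(employee_numbers, 100, seed=42)
print(len(sample), "employees selected")
```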

Simple random sampling is super fair, super random, and super simple. It ensures that everyone has exactly the same chance of being a part of your cool research study.

Random sampling: Systematic sampling

You might already be familiar with systematic sampling. In fact, many of us are introduced to it early, in school.

How does systematic sampling work? Did you ever have to choose teams for school sports, and number everyone 1 or 2? Well, that’s how systematic sampling works. You line everyone up, count off ‘1, 2, 1, 2’, and say ‘twos are part of this sample’. In other words, you pick every second person. To keep it random, let chance decide whether you start with the first or the second person in line.

It doesn’t have to be every second person either. It could be every 15th person, or any interval you choose.
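Here is a rough sketch of taking every k-th person from a list. The random starting point is an assumption added for the demo (it is the usual way to keep the method a probability sample), and the list of 30 people is made up.

```python
import random

def systematic_sample(population, k, seed=None):
    """Pick a random start among the first k members, then take every k-th one."""
    rng = random.Random(seed)
    start = rng.randrange(k)  # random index from 0 to k - 1
    return population[start::k]

# Hypothetical line-up of 30 people.
people = [f"person_{i}" for i in range(1, 31)]

# Every 15th person: with 30 people this always yields 2 people.
sample = systematic_sample(people, 15, seed=1)
print(sample)
```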

Problems with systematic sampling and patterns in your data

Systematic sampling is typically a cheaper method than simple random sampling, but you need to make sure there are no patterns lurking in your population or data.

As an example, imagine you had two classes in school and told everyone to pair up with someone from the other class. Then you numbered everybody in class A ‘1’ and everybody in class B ‘2’, and picked the twos. You would inadvertently bias your sample, because there is a pattern behind the twos: they are all from class B.

In this case there was a pattern in your data, and nobody in class A would get to be part of your sample. That causes ‘sampling bias’, which means your results won’t generalize to the population. And when conducting research, we typically want our results to generalize.
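The two-class problem is easy to reproduce. In this sketch the pairs are assumed to stand in a line so that the classes alternate A, B, A, B; the line-up itself is hypothetical.

```python
def every_kth(population, k, start):
    """Take every k-th member, beginning at index `start`."""
    return population[start::k]

# Pairs from class A and class B standing in line, so the classes alternate.
line = ["A", "B"] * 10

# Picking every second person lines up exactly with the pattern in the data.
twos = every_kth(line, 2, start=1)
classes_in_twos = set(twos)

# An interval that does not match the pattern picks from both classes.
every_third = every_kth(line, 3, start=1)
classes_in_thirds = set(every_third)

print(classes_in_twos)    # only class B is sampled
print(classes_in_thirds)  # both classes appear
```

When the sampling interval matches a repeating pattern in the list, one group can be left out completely. Shuffling the list first, or choosing an interval that doesn’t line up with the pattern, avoids the problem.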

Random sampling: Stratified sampling

Stratified sampling involves dividing your population into groups called ‘strata’, and then sampling those strata using another random sampling method, like simple random sampling. It actually makes a lot of sense, even though it might seem a little more complicated than other sampling methods at first.

Stratified sampling is used when you want to ensure that every group, or stratum, is properly represented in your sample, rather than leaving each group’s share to chance as simple random sampling does.

An example of stratified sampling

Stratified sampling involves first dividing your population into strata, and then using random sampling within each stratum.

As an example, let’s say that there are 8000 engineers in a company and 2000 office staff. You want to select a sample of 1000 people to find out what the company as a whole is like. To make sure your sample reflects the makeup of the company, you separate your company into two groups, engineers and office staff.

Then within each group, you use a random sampling method such as simple random sampling to select exactly 800 engineers and 200 office staff. That is 10% of each group, so you get a sample that is proportionate to your broader population.
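The engineers and office staff example can be sketched like this. The function and the ID labels are hypothetical. Each stratum’s share of the sample is its share of the population, rounded to a whole number, so with awkward group sizes the total could be off by one or two.

```python
import random

def proportional_stratified_sample(strata, total_n, seed=None):
    """Simple random sample within each stratum, sized by the stratum's population share."""
    rng = random.Random(seed)
    population_size = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        # This stratum's slice of the sample, proportional to its size.
        n = round(total_n * len(members) / population_size)
        sample[name] = rng.sample(members, n)
    return sample

# Hypothetical company: 8000 engineers and 2000 office staff.
strata = {
    "engineers": [f"eng_{i}" for i in range(8000)],
    "office_staff": [f"office_{i}" for i in range(2000)],
}

sample = proportional_stratified_sample(strata, 1000, seed=0)
sizes = {name: len(chosen) for name, chosen in sample.items()}
print(sizes)  # 800 engineers and 200 office staff
```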

There could be two groups, otherwise called strata, like in the engineers and office staff example, or there could be 20 groups. Nobody is limiting you!

Random sampling: Cluster sampling

Cluster sampling is when you divide your population into clusters, and then select only some of those clusters at random. It is a probability sampling method, otherwise known as a random sampling method.

In cluster sampling, if a group is selected, then all of the members of that group will be included in the study. Members of groups that were not selected will not be included in the study.

An example of cluster sampling

As an example of cluster sampling, imagine you work for a huge company and you’re the big boss with the corner office. One day, you decide that you want to survey your company’s staff, who are spread across 100 offices all over the country. Importantly, all offices are pretty similar. They have approximately the same number of people, in the same mix of roles.

You couldn’t possibly travel to every office to collect all the data you need. So you task your statisticians with coming up with a solution. They come back to you and say ‘instead of sampling every single office, we can treat each office as a cluster and randomly select 30 of them. It’s much less work.’ It’s important to note that in cluster sampling all your clusters should be similar to one another. If each office housed a different department, like engineering or sales, then it wouldn’t work.

From here you could include everyone from the 30 offices in your sample. Or, you could sample again within each selected office, for example by taking a simple random sample of its staff. That is called multistage sampling because you sample in multiple stages.
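A minimal sketch of both options follows. The office names, the 50 employees per office, and the 10 people drawn per office in the second stage are all assumptions for illustration.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly choose whole clusters and include every member of each one."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    members = [person for name in chosen for person in clusters[name]]
    return chosen, members

def two_stage_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Randomly choose clusters, then take a simple random sample inside each."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: rng.sample(clusters[name], n_per_cluster) for name in chosen}

# Hypothetical company: 100 similar offices with 50 employees each.
offices = {
    f"office_{i:03d}": [f"office_{i:03d}_emp_{j}" for j in range(50)]
    for i in range(100)
}

# One stage: 30 random offices, everyone in them is surveyed.
chosen_offices, surveyed = cluster_sample(offices, 30, seed=7)
print(len(chosen_offices), "offices,", len(surveyed), "people")

# Two stages: 30 random offices, then 10 random people from each.
staged = two_stage_sample(offices, 30, 10, seed=7)
print(sum(len(people) for people in staged.values()), "people")
```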

How do stratified sampling and cluster sampling differ?

Cluster sampling first divides the population into groups, then randomly selects a number of groups, and then includes all the members of those randomly chosen groups in the study. So in cluster sampling, your group is not guaranteed to be part of the study, but if your group is randomly selected then you will definitely be a part of the study.

Stratified sampling on the other hand first divides a population into groups, and then randomly selects some members from all of the created groups. In stratified sampling, your group is guaranteed to be part of the study, but not all members in your group will be.

When to use stratified sampling and cluster sampling

Use cluster sampling when you expect your clusters to be homogeneous, which here means the clusters are similar to one another, each one looking like a small version of the whole population. Think of the different office locations of a data science company.

Attribution: Datawheel, CC0, via Wikimedia Commons

Use stratified sampling, on the other hand, when you expect your groups to be heterogeneous, which means the groups differ from one another. For example, you might separate the maths majors from the humanities majors at a university.
