How samples and populations work for advanced statistics.
Parameters versus statistics
Statistics and parameters are related but distinct concepts in statistics.
A parameter is a fixed value that describes a population. It is a characteristic of a population that is usually unknown, but can be estimated using sample statistics. For example, the population mean (μ) of the height of adult humans is a parameter. It is the true average height of all adult humans, but it is not known exactly.
Statistics, on the other hand, are values that describe a sample. They are used to make inferences about the population from which the sample was drawn. For example, if we take a sample of 100 adult humans and measure their heights, we can calculate the sample mean (x̄), which is an estimate of the population mean (μ). The sample mean is a statistic, not a parameter.
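To make the distinction concrete, here is a minimal Python sketch. It assumes a simulated population of heights (the numbers are made up for illustration, not real measurements) so that the parameter μ can be computed directly and compared with the statistic x̄ from a sample of 100.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical population: 100,000 simulated adult heights in cm (not real data).
population = [random.gauss(170, 10) for _ in range(100_000)]

# Parameter: calculated from every member of the population.
mu = statistics.fmean(population)

# Statistic: calculated from a random sample of 100 people.
sample = random.sample(population, 100)
x_bar = statistics.fmean(sample)

print(f"population mean (mu): {mu:.2f}")
print(f"sample mean (x-bar):  {x_bar:.2f}")
```

In real studies we usually cannot compute μ at all, which is exactly why we rely on x̄.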
Symbology for parameters versus statistics
When we talk about populations and samples, we use different symbology for each, even when we are talking about the same concept. Take variance as an example.
For the population parameter, we use sigma squared (σ²), but for the sample statistic, we use s². Both denote variance, which is a measure of how far your data are spread out from the mean. So, make sure you use the correct symbol depending on whether you’re talking about a population or a sample!
Here’s a table to make it easier!

Concept               Population parameter    Sample statistic
Mean                  μ                       x̄
Variance              σ²                      s²
Standard deviation    σ                       s
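The two variance symbols usually also correspond to slightly different calculations: the population variance divides by N, while the usual sample variance divides by n - 1. A small sketch using Python's standard statistics module, with made-up heights:

```python
import statistics

heights = [160.0, 165.0, 170.0, 175.0, 180.0]  # made-up heights in cm

# Treating the five values as the whole population: divide by N.
sigma_squared = statistics.pvariance(heights)

# Treating them as a sample from a larger population: divide by n - 1.
s_squared = statistics.variance(heights)

print(sigma_squared)  # 50.0
print(s_squared)      # 62.5
```

The same data give two different numbers, so the symbol you write tells the reader which calculation you did.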
Point Estimates
A point estimate is a single value, calculated from sample data, that we use as our best guess for a population parameter. For example, suppose you calculated the mean height of a random sample of 1000 people taken from your town, which is your population of interest.
Then, you could use that sample mean as a point estimate of the mean height of the whole town. So from a relatively small group of people, you can estimate the average height of everyone in your town! The same sample would not support an estimate for your entire country, though, because it was only drawn from your town.
Interval Estimates
An interval estimate, like the confidence interval, gives you a little more room to move than a point estimate. Instead of saying ‘we estimate the average height of adult males in London is 175cm’, which is a point estimate, you can say ‘we are 95% confident that the average height of adult males in London is between 165 and 180cm’, which is a 95% confidence interval, an interval estimate. The 95% describes the method: if we repeated the sampling many times, about 95% of the intervals built this way would contain the true average.
Interval estimates are used to account for uncertainty – a sample can’t tell you everything about a population with 100% accuracy! Sometimes your 95% confidence interval can be so broad that you might not be able to make a decision based on the data, especially if getting it wrong is risky!
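As a rough illustration, a 95% confidence interval for a mean can be sketched as the point estimate plus or minus a margin of error. The sketch below assumes simulated heights and uses the large-sample normal approximation (a critical value of about 1.96). For small samples a t-distribution would normally be used instead.

```python
import math
import random
import statistics

random.seed(1)

# Hypothetical sample: 100 simulated heights in cm (not real measurements).
sample = [random.gauss(175, 7) for _ in range(100)]

n = len(sample)
x_bar = statistics.fmean(sample)   # point estimate
s = statistics.stdev(sample)       # sample standard deviation
standard_error = s / math.sqrt(n)

# Critical value for 95% confidence from the standard normal (about 1.96).
z = statistics.NormalDist().inv_cdf(0.975)

lower = x_bar - z * standard_error
upper = x_bar + z * standard_error

print(f"point estimate: {x_bar:.1f} cm")
print(f"95% interval:   {lower:.1f} cm to {upper:.1f} cm")
```

A larger sample shrinks the standard error, which is the main way to narrow an interval that is too broad to act on.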
Sampling error
Whenever you estimate something from a sample, you will have some sampling error. That’s because your sample contains only part of the population, so its statistics will rarely match the population’s parameters exactly.
Sampling error is the difference between your sample statistic, for example the sample mean, and the population parameter it estimates. In the case of the mean, that population parameter is often called the ‘true average’.
Even the best study design is subject to sampling error, which is why when we report estimates of population parameters, we often do so in terms of confidence intervals.
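Sampling error can be seen directly in a simulation. The sketch below assumes a made-up population where the true average is known, draws many random samples, and measures how far each sample mean lands from it. The helper name sampling_errors is hypothetical, chosen just for this illustration.

```python
import random
import statistics

random.seed(2)

# Hypothetical population of 50,000 simulated heights in cm.
population = [random.gauss(170, 10) for _ in range(50_000)]
mu = statistics.fmean(population)  # the 'true average'

def sampling_errors(sample_size, repeats=200):
    """Return (sample mean - mu) for many independent random samples."""
    return [
        statistics.fmean(random.sample(population, sample_size)) - mu
        for _ in range(repeats)
    ]

for size in (25, 100, 400):
    errors = sampling_errors(size)
    typical = statistics.fmean([abs(e) for e in errors])
    print(f"n={size:>3}: typical sampling error about {typical:.2f} cm")
```

The error tends to shrink as the sample grows, but it does not disappear unless the sample is the whole population.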
Using populations and samples in practice
The population is the larger group we are interested in studying, for example, all people in the USA. A sample is the portion of that group we were able to gather data from, like a randomly selected group of 100 Americans. An observation is a value we collect as part of a study. For example, that one participant’s name is James would be an observation.
In statistics, we use our sample to make inferences about the population at large. This is done using summary statistics like the mean and median, together with interval estimates like confidence intervals.
We can measure things about our sample, like their height, and infer from that what the average height of all adults in America might be. From our sample data, we can create a 95% confidence interval and say that we are 95% confident that the average height of the American adult population is between 160cm and 183cm. What we cannot do is conclude that the average height is exactly 175cm, because there is always room for error in statistics.
We could apply the same approach to something else, such as support for a new government policy or proposal. So, statistics can be very useful across a broad range of fields. This is why everyone can benefit from learning a little bit of statistics.
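For the policy example, the statistic of interest would be a sample proportion rather than a sample mean. The sketch below uses made-up survey counts and the normal-approximation interval, which is one common choice and is assumed here purely for illustration.

```python
import math
import statistics

# Hypothetical survey: 1 = supports the policy, 0 = does not (made-up counts).
responses = [1] * 540 + [0] * 460

n = len(responses)
p_hat = sum(responses) / n  # sample proportion, the point estimate

# Normal-approximation interval for a proportion (an assumed, common choice).
z = statistics.NormalDist().inv_cdf(0.975)
margin = z * math.sqrt(p_hat * (1 - p_hat) / n)

print(f"estimated support: {p_hat:.1%}")
print(f"95% interval: {p_hat - margin:.1%} to {p_hat + margin:.1%}")
```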