*How to use and present data and numbers to prove your point.*

## Why Use Statistics?

According to Arthur James Balfour, “there are three kinds of falsehoods: lies, damned lies and statistics.” But what did he mean by this? Well, he, along with others, argued that it’s possible to prove almost anything through the misleading use of statistics. Indeed, a number of influential books have been written on the subject.

However, the fundamental use of statistics in an argument is to prove that something is true. You can also use statistics as a yardstick to measure the relative importance of problems. Imagine that you’re the director of a medical research company looking to invest in a new treatment. You’d far sooner tackle Covid, a disease affecting millions, than an extremely rare elbow injury that affects only a hundred thousand people worldwide.

## Competing statistics

Often, however, 2 people engaged in an argument will each have their own competing statistics. One person will make a claim and back it up with a figure, and the other will produce a similar figure for the opposing side. How does the audience decide between the 2?

Well, audiences evaluate 2 things. First, they judge the credibility of the statistic. Does it fall in line with what they think the rough number would be? Does it come from a reliable source? Statistics often hit home more effectively when they come from a source the audience trusts. While liberal audiences tend to favor the news channel CNN, conservative ones prefer Fox News. Similarly, while establishment supporters like to use government statistics, anti-establishment figures are more critical of them.

Secondly, people can be more or less persuaded by a statistic based on its presentation. Has it been contextualized? Is it easy to visualize? Is it attention grabbing? Do they feel it’s relevant to them? Often, this is where battles over statistics are won and lost.

## What Can Humans Compute?

It can be difficult for people to visualize numbers. It is easy for us to think of 10 people: you would just imagine a recent birthday party or gathering. It is even possible to visualize hundreds. For those who often go to concerts and sports games, it might even be possible to visualize thousands. Beyond that, however, our intuition breaks down, because nothing in everyday experience prepares us to picture larger numbers.

E. H. Gombrich discusses this problem in *A Little History of the World*, where he says it’s almost impossible for us to imagine the passing of millions of years because we have no personal experience to compare it to. Similarly, it is impossible for us to visualize a million people, or 18 billion coffee grounds, because we’ve never had any relatable experience.

As a result, how a statistic is presented, whether spoken or written, is almost as important as the statistic itself.

## Understanding the Small

Very small things are difficult to imagine, too. For example, while we can picture a yard, it is much harder to imagine a nanometer. Even knowing that there are a million nanometers in a millimeter doesn’t help much, because we’ve never seen anything that size.

Take an argument about the safety of air travel. Stating that 763 people died in air travel accidents in 2009[6] might make crashes seem frequent; presented in isolation, the figure could lead an audience to believe that flying is very dangerous. However, if you also point out that roughly the same number of people die by bedsheet strangulation each year[7], the number suddenly seems much smaller. After all, nobody refuses to go to bed.

Looking at the total number of air journeys contextualizes the statistic further. In 2009, there were 29.5 million air journeys in the world[8], so 763 deaths works out to roughly 1 death for every 39,000 journeys, or about 0.0026%. That tiny fraction is a far cry from the 763 that looked like such a big number earlier.
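The arithmetic behind this comparison is simple enough to do by hand, but writing it out makes the choice of denominator explicit. Here is a minimal sketch that treats the 763 figure as deaths (as the comparison with bedsheet deaths implies) and the 29.5 million figure as the number of journeys. The variable names are illustrative.

```python
# Figures quoted in the text (2009). The 763 figure is treated as deaths.
deaths = 763
journeys = 29_500_000

rate = deaths / journeys      # deaths per journey
percent = rate * 100          # the same rate expressed as a percentage
one_in = journeys / deaths    # "1 death for every N journeys"

print(f"{percent:.4f}% of journeys")               # 0.0026% of journeys
print(f"about 1 death per {one_in:,.0f} journeys")  # about 1 death per 38,663 journeys
```

Both outputs describe the same rate. The “1 in N” form tends to be easier for an audience to hold in their heads than a percentage with several leading zeros.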

## Understanding the Large

The most common way of making large statistics comprehensible is to contextualize them. For example, we can’t picture 1.4 million people identifying as gay in the UK[9]. However, if we are told that 1 in 48 people in the UK self-identifies as gay, the figure is far easier to picture. Most people know at least 50 people, so they can visualize what the statistic means.
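Turning a raw count into a “1 in N” figure just means dividing the population by the count. This sketch assumes a UK population of about 67 million. That figure is not stated in the text; it is a round number that reproduces the article’s 1 in 48. The helper name `one_in_n` is hypothetical.

```python
def one_in_n(count: int, population: int) -> int:
    """Express count/population as '1 in N', rounding N to a whole number."""
    return round(population / count)


# ASSUMPTION: approximate whole-UK population; not a figure from the text.
uk_population = 67_000_000
# Figure quoted in the text.
gay_identifying = 1_400_000

n = one_in_n(gay_identifying, uk_population)
print(f"1 in {n}")  # 1 in 48
```

Using a different base, such as adults only, would give a different N. This is a reminder that the choice of denominator is itself part of how a statistic is framed.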

Similarly, it would be difficult for us to grasp the significance of the fact that 375,000 of the cars sold each year are electric[10]. In fact, it might seem like a huge number. However, being told that only 1 in 5 new cars sold is electric gives us a much better picture of the environmental impact of new cars.

Typically, dividing the large number by another, relevant number is key to understanding its relative significance and to conveying it to an audience.
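Working backwards is a useful sanity check on a ratio. The electric-car figures imply a particular total market size, and anyone who knows roughly how many new cars are sold each year can judge whether “1 in 5” is plausible. This is a minimal sketch using only the 2 figures quoted in the text. The variable names are illustrative.

```python
ev_sales = 375_000  # electric cars sold per year (figure quoted in the text)
one_in = 5          # "1 in 5" new cars is electric (ratio quoted in the text)

# If 375,000 is one fifth of the market, the whole market is 5 times larger.
implied_total = ev_sales * one_in
share_percent = 100 / one_in

print(f"Implied new-car sales per year: {implied_total:,}")  # 1,875,000
print(f"Electric share: {share_percent:.0f}%")               # 20%
```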

## True Statistics Are Not Always Believable

Another strange thing about statistics is that being true doesn’t make them believable. A Stanford study showed that, even once told something is true, people are still prone to disbelieving it if it doesn’t fit their expectations[11]. Some well-established facts simply fail to convince: 41% of Americans think that humans coexisted with dinosaurs, even though the 2 missed each other by more than 60 million years[12].

As a result, it often helps to explain the methodology behind a statistic. Rather than simply stating a figure and offhandedly attributing it to its source, explain how it was calculated and where it came from. That way, people are more likely to believe it.

## Multiple Attribution and the Brexit NHS Claim

One common statistical fallacy is multiple attribution. This occurs when a single figure is presented as though it could pay for several different things at once.

In the 2016 Brexit referendum, the Vote Leave campaign said that Britain’s exit from the European Union would save the country £350 million a week[13]. That’s equivalent to 10,500 new nurses, 13,000 policemen or approximately 2 hospitals. The problem is that, because the same sum was contextualized in several ways, people imagined it going to the nurses and the policemen and the new hospitals. This satisfied several interest groups at once: those who wanted more policing and those who wanted more medical funding. As a result, both groups largely voted for the Leave campaign. However, the same money cannot be spent twice.

By providing multiple uses for a resource – whether it be money or barrels of oil – you can make it seem larger than it actually is.
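The overcounting is easy to expose with a little arithmetic. Each “equivalent” is presented as costing the whole sum, so promising all of them at once quietly spends the money several times over. This sketch uses the £350 million figure and the 3 equivalents quoted in the text. The even three-way split at the end is only one illustrative assumption about how the money might actually be divided.

```python
budget = 350_000_000  # £350 million, the figure quoted in the text

# Each "equivalent" in the text is presented as costing the whole budget.
equivalents = {
    "nurses": 10_500,
    "police officers": 13_000,
    "hospitals": 2,
}

# Implied cost of one unit if the entire budget went to that item alone.
for item, quantity in equivalents.items():
    print(f"{item}: about £{budget / quantity:,.0f} each")

# Keeping every promise at once means spending the budget once per item.
claimed_spend = budget * len(equivalents)
print(f"Spend implied by all promises together: £{claimed_spend:,}")
print(f"Multiple of the money actually available: {claimed_spend // budget}")

# ASSUMPTION: an even split across the 3 uses, purely for illustration.
for item, quantity in equivalents.items():
    print(f"{item} with an even split: {quantity / len(equivalents):,.1f}")
```

Keeping all 3 promises would take £1.05 billion, 3 times the stated sum. An even split buys only 3,500 nurses, about 4,333 police officers and two-thirds of a hospital.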

## Correlation and Causation

Another common statistical fallacy is assuming that correlation implies causation. Sometimes, 2 statistics follow similar patterns but have no real-world relationship to each other. We call these ‘spurious correlations.’

For example, from 2000 to 2009 there was a strong correlation between consumption of mozzarella cheese and the number of civil engineering doctorates awarded[14]. However, this does not mean that eating mozzarella leads to more civil engineering doctorates. The overlap between the 2 statistics is simply a coincidence. Similarly, there is a correlation coefficient of about 0.95 between the number of people who drowned after falling out of a fishing boat and the marriage rate in Kentucky[15]. Yet there is little reason to think that people in Kentucky get married because others fall out of fishing boats.

As a result, when you are presented with a statistic, it is important to ask whether a correlation could simply be coincidence, or whether a genuine causal link lies behind it.

## With or Without Mechanization

One way of getting around the spurious correlation fallacy is to show causal links. When you outline the series of steps by which one thing leads to another, we call that process ‘mechanizing.’ For example, you might ‘mechanize’ the claim that rising sports car ownership led to more crashes by explaining that sports cars drive faster and are harder to control. By explaining why 2 statistics are correlated, you can show that the correlation isn’t a spurious one.

Typically, if the link between 2 statistics is obvious, explaining the causality is not necessary. However, if the connection is less intuitive, such as the claim that the rise in domestic cats is slowing the spread of avian disease across continents, you might need to fill in the missing steps. The 2 statistics look like a coincidence until you explain that domestic cats are eating the birds, which stops them from migrating.

In short, a good rule of thumb is that if the connection between your statistics involves more than 1 causal step, it is worth mechanizing it.