Calculate advanced conditional probabilities using Bayes’ Theorem
The difference between Bayes’ Theorem and Conditional Probability
Conditional probability tells you the probability of one event occurring given that another event has occurred, and works well for simpler problems. Bayes’ Theorem is a structured formula for more complicated problems, with the ability to update as new information comes in.
But because Bayes’ Theorem contains a conditional probability in the numerator, it is important to understand how to calculate conditional probability in order to calculate Bayes’ Theorem.
The equation for Conditional Probability is as follows:

P(A|B) = P(A and B) / P(B)
However, the formula for Bayes’ Theorem includes two conditional probabilities – one in the numerator, and one to be calculated – and looks like this:

P(A|B) = [P(B|A) * P(A)] / P(B)
Bayes’ Theorem can be used for problems such as knowing the probability that you have a disease given you got a positive test, when you need to take other information into account like the base rate of the disease in the population, and the test’s accuracy – which we use to update our prior probability.
About Bayes’ Theorem
Bayes’ Theorem is an extension of Conditional Probability, and at its core allows us to update the predicted probabilities of an event by incorporating new information, allowing for dynamic calculation of probabilities. It is calculated via the following formula, where P(A|B) is the probability of event A given that B has already happened:

P(A|B) = [P(B|A) * P(A)] / P(B)
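The formula translates directly into code. Here is a minimal sketch of it as a Python function; the function name and the example numbers are illustrative, not part of any library:

```python
def bayes_theorem(p_b_given_a, p_a, p_b):
    """Return the posterior P(A|B) = P(B|A) * P(A) / P(B)."""
    return (p_b_given_a * p_a) / p_b

# Illustrative numbers: likelihood 0.99, prior 0.01, evidence 0.0198
posterior = bayes_theorem(0.99, 0.01, 0.0198)
print(round(posterior, 2))  # 0.5
```

The same three inputs appear in every Bayes' Theorem problem: a likelihood, a prior, and the overall probability of the evidence.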
Today, we have found uses for it in spam detection, risk detection, and more.
Algorithms such as Naive Bayes allow us to classify the emotional sentiment of text social media posts, or even recommend films we might like to watch next. Bayesian Neural Networks allow us to forecast stock markets, or perform facial recognition tasks.
Put simply, Bayes’ Theorem takes a result A and relates it to the conditional probability of that result given other related events. When false positives are involved, Bayes’ Theorem gives a more accurate assessment of risk. For example, in medical testing, a positive result alone does not tell you your chances of having a disease; you must also adjust for the base rate of the disease in the population and for the test’s accuracy.
Bayes' Theorem Priors, Posterior, and Likelihoods
The core idea behind Bayesian probability is to start with an initial belief, and become less wrong about our belief by updating it with new information as new information becomes available. Our initial belief is called the prior probability – P(A). For example, the base rate of disease prevalence in a population.
We get the posterior probability – P(A|B), for example the probability you have a disease given that you tested positive for it – by updating our initial belief with new information: the likelihood, P(B|A), and P(B), which in this example is the probability of getting a positive test whether you have the disease or not. P(B) is equal to:

P(B) = P(B|A) * P(A) + P(B|not A) * P(not A)
The likelihood, P(B|A), tells you the probability of getting result B given that we already know A is true. In our example, this is the probability of getting a positive test result if you do in fact have the disease – which is the test’s accuracy.
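Expanding P(B) this way is known as the law of total probability. A small sketch, using the illustrative numbers of a 99%-accurate test and a 1% base rate (the false-positive rate of 1% is an assumption that follows from treating "99% accurate" as applying to both the diseased and healthy groups):

```python
def total_probability(p_b_given_a, p_a, p_b_given_not_a):
    """Return P(B) = P(B|A)*P(A) + P(B|not A)*P(not A)."""
    return p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

# e.g. a 99%-accurate test and a 1% base rate
p_positive = total_probability(0.99, 0.01, 0.01)
print(round(p_positive, 4))  # 0.0198
```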
Bayes' Theorem example
When false positives are involved, Bayes’ Theorem gives a far more accurate assessment of risk than the test’s accuracy alone suggests. Let’s work through an example.
If a medical test that is 99% accurate returns a positive result for a rare and deadly disease that affects 1% of the pet population, does that mean there is a 99% chance your pet has the disease? Actually, it doesn’t.
Let’s take a sample of 10,000 pets to show why.
First, we know that P(disease) = .01
We also know that the test is 99% accurate. The 1 percent error rate means that, out of the 100 diseased pets in our sample of 10,000, 99 will test positive and one will not, so P(tested positive | has disease) = 0.99.
But the error rate also means that 99 of the 9,900 non-diseased pets will test positive, too.
Therefore P(positive test) = (99 + 99) / 10,000 = 0.0198. The formula for our example is as follows:
P(has disease | tested positive) = [P(positive test | has disease) * P(has disease)] / P(positive test)
Thus, P(has disease | tested positive) = (0.99 * 0.01) / 0.0198 = 0.50 = 50%.
This means that conditional on a positive test, there is not a 99% chance, but a 50% chance your pet has the disease.
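The whole pet-disease calculation can be sketched in a few lines of Python. The variable names are illustrative; the numbers are the ones from the example above:

```python
p_disease = 0.01             # prior: base rate of the disease
p_pos_given_disease = 0.99   # likelihood: the test's accuracy
p_pos_given_healthy = 0.01   # false-positive rate (1% error rate)

# Evidence: P(positive test) via the law of total probability
p_positive = (p_pos_given_disease * p_disease
              + p_pos_given_healthy * (1 - p_disease))

# Posterior: P(has disease | tested positive)
posterior = p_pos_given_disease * p_disease / p_positive
print(round(posterior, 2))  # 0.5
```

Changing `p_disease` shows how strongly the base rate drives the result: the rarer the disease, the less a positive test means.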
Joint Probability
Joint probability is the probability of the intersection of 2 or more events. It tells you the chance that all of the events will occur simultaneously. However, it is important to note that the simple multiplication rule used here only applies when the events are independent of one another.
The best example of this is rolling two dice. How likely is it that you will roll two sixes? Neither event can influence the other; one die landing with 6 facing up does not influence the other die.
To calculate joint probability you will need to multiply the probability of event A, by the probability of event B. Of course, to do that you first need to know the probability of both A and B.
For rolling two sixes in a game of dice, the probability of rolling a six on a single die is equal to:

P(six) = 1/6
Then, to calculate the probability of rolling two sixes, we simply multiply our probabilities together, like so:

P(two sixes) = 1/6 * 1/6 = 1/36 ≈ 0.0278
To get the percentage chance, multiply your answer by 100.
So the probability of rolling two sixes is only about 2.78%!
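The dice calculation can be checked exactly with Python's `fractions` module, which keeps the probabilities as exact fractions rather than rounded decimals:

```python
from fractions import Fraction

p_six = Fraction(1, 6)
p_two_sixes = p_six * p_six   # joint probability of two independent events
print(p_two_sixes)                         # 1/36
print(round(float(p_two_sixes) * 100, 2))  # 2.78
```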
Sampling with replacement
Sampling with replacement is an important concept to understand when it comes to probability. This is because the concept can be used to help improve the quality of our point estimates like sample means via bootstrapping.
Let’s deal with an M&M’s factory – we want to know how many of each color the factory produces. Instead of counting them all, we take one out of a big bucket, record the color of it, and then place it back. We do this thousands of times, until we have the frequencies for each color.
The reason we place it back is so that the next trial is not influenced by the previous trial. This is because by completely removing a green M&M and eating it, we forever change the probability of drawing a green M&M from the bucket – there is one less now so it is less likely.
In fact, with sampling with replacement, you could take out the same green M&M hundreds of times.
Sampling with replacement is useful when you need to get an idea of the frequencies within a population, but couldn’t possibly count every single M&M at the factory!
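The M&M procedure is easy to simulate. In this sketch the bucket's true colour mix is a made-up assumption (the sampler doesn't "know" it), and drawing with `random.choice` leaves the bucket unchanged, which is exactly what putting each M&M back achieves:

```python
import random
from collections import Counter

# Hypothetical true colour mix in the bucket (unknown to the sampler)
bucket = ['red'] * 50 + ['green'] * 30 + ['blue'] * 20

random.seed(42)
# Draw one M&M, record its colour, put it back -- thousands of times
draws = [random.choice(bucket) for _ in range(10_000)]

freqs = Counter(draws)
for colour, count in freqs.most_common():
    print(colour, count / len(draws))
```

With enough draws, the recorded frequencies settle close to the bucket's true proportions, even though no single M&M was ever removed for good.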
Bootstrapping
Bootstrapping, through sampling with replacement, helps us measure uncertainty surrounding our point estimate, for example the sample mean, which is an estimate of the true population average.
When bootstrapping, our sample gets treated as if it were the population. We repeatedly take a sub-sample from it and calculate the statistic of interest, such as the mean, on each sub-sample. The result is a probability distribution of the sample means across all of those sub-samples.
The key thing here is that each sub-sample is the exact same size as your original sample. If you’re wondering how that is possible without drawing the exact same sample each time, it is because each individual observation is sampled with replacement. This means any single observation, say observation 5, can appear in a sub-sample any number of times: zero times, or ten times. One bootstrap sample might, for instance, contain observation 5 twice.
Through bootstrapping, we can save a lot of time and money by not needing to gather entirely new samples, yet we can still get a reliable estimate of population parameters.
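Here is a minimal bootstrap sketch using a made-up sample of ten observations. `random.choices` draws with replacement, and each resample is the same size as the original, just as described above:

```python
import random
import statistics

random.seed(0)
sample = [12, 15, 9, 14, 11, 13, 10, 16, 12, 14]  # hypothetical original sample

boot_means = []
for _ in range(5_000):
    # Resample with replacement, same size as the original sample
    resample = random.choices(sample, k=len(sample))
    boot_means.append(statistics.mean(resample))

# The spread of the bootstrap means measures the uncertainty
# surrounding the sample mean as an estimate of the population mean
print(round(statistics.mean(boot_means), 2))
print(round(statistics.stdev(boot_means), 2))
```

The standard deviation of `boot_means` is the bootstrap estimate of the standard error of the mean, obtained without ever collecting a second real sample.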
Unconditional Probability
Unconditional probability is otherwise known as ‘marginal probability’, and is the probability of an event that is not conditioned on the occurrence of any other event.
Base rates, like the percentage of days in a year that it rains, are one form of unconditional probability. This is because they are not conditioned on any other event, like the presence of clouds.
Another simple example of unconditional probability is the coin toss. Assuming no outside forces are acting on the coin, no matter how many times we toss it, we know what the probability of getting heads will be. This is unlike drawing from a deck of cards without replacing the card you just drew, which changes the future probabilities irrevocably.
How do you calculate unconditional probability? Easy!
For a slightly more complicated example compared to our coin flip: there are 4 aces in a deck of 52 cards, so to pull an ace out of a fresh deck of cards, your calculation will look as follows:

P(ace) = 4/52 = 1/13 ≈ 0.0769, or about 7.69%
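As with the dice example, exact fractions make the arithmetic easy to verify:

```python
from fractions import Fraction

p_ace = Fraction(4, 52)   # 4 aces out of 52 cards, reduces automatically
print(p_ace)                         # 1/13
print(round(float(p_ace) * 100, 2))  # 7.69
```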