Bayes’ theorem is a rule in probability and statistical theory that calculates an event’s probability based on related conditions or events. Scientists and medical professionals may use these calculations to determine the likelihood of various outcomes.
Here’s an explanation of how to use Bayes’ Rule in simple situations, along with an introduction to the relationship between Bayesian and frequentist probability.
This Venn diagram shows the probability of two partially intersecting events, A and B, in a universe U.
A Quick Reminder about Probability
Let’s review some basic principles about the probability, P(A), of an event, A.
- We can think of events A or B as elements in a universe, U, of events, so A, B ∈ U. The event A might be singular (the clock strikes 3 o’clock), or a set of events (the clock strikes 3, 4 or 5 o’clock).
- For an event A, the probability of that event is P(A), and 0 ≤ P(A) ≤ 1.
- The probability of the union of all possible events within a universe is exactly one; P(U) = 1.
- The probability of the empty set, ‘Ø’, which is an impossible event, is exactly zero; P(Ø)=0.
- Two events, A and B, are mutually exclusive if and only if the intersection of A and B is the empty set; then P(A ∩ B) = 0. Note that the “intersection” relationship is commutative: the sequence of A ∩ B versus B ∩ A does not matter.
- If A and B are not mutually exclusive, then P(A ∩ B) = P(B ∩ A) > 0.
- We write any event other than A as ~A; and P(A) + P(~A) = 1.
That’s enough background; now let’s derive Bayes’ Theorem.
Conditional Probability Derives Bayes’ Theorem
The conditional probability of event A, given that event B occurs, is written as P(A|B). How can we calculate this value? Let’s simplify the MathWorld derivation a bit; you can find the full article in the Resources section of this article.
The formula for the combined probability of two events is P(A ∩ B) = P(A) × P(B|A). The probability that both events will occur equals the probability of A itself, multiplied by the probability that B will occur given that A has occurred.
Since A ∩ B = B ∩ A, we also have P(A ∩ B) = P(B) × P(A|B).
Then we set these two expressions equal, P(A) × P(B|A) = P(B) × P(A|B), and solve for P(A|B), to complete Bayes’ Theorem: P(A|B) = P(B|A) × P(A) / P(B).
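As a quick sanity check, here is a minimal Python sketch; the probability values are invented for illustration, but it confirms numerically that dividing the joint probability by P(B) gives the same answer as the rearranged Bayes form:

```python
# Hypothetical illustrative values, not from any real data.
p_a = 0.3          # P(A)
p_b = 0.25         # P(B)
p_b_given_a = 0.5  # P(B|A)

# Combined probability of two events: P(A ∩ B) = P(A) * P(B|A)
p_a_and_b = p_a * p_b_given_a

# Bayes' Theorem: P(A|B) = P(B|A) * P(A) / P(B)
p_a_given_b = p_b_given_a * p_a / p_b

# Both routes to P(A|B) agree: P(A ∩ B) / P(B) equals the Bayes form.
assert abs(p_a_and_b / p_b - p_a_given_b) < 1e-12
print(p_a_given_b)  # prints 0.6
```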
Forms and Notes about Bayes’ Rules
There are several forms of Bayes’ Theorem, which some call Bayes’ Rules.
Another version of Bayes’ Rule is P(A|B) = [P(B|A) / P(B)] × P(A).
This form emphasizes that the term P(B|A)/P(B) adjusts the “prior” probability, P(A), to give the conditional, or “posterior”, probability P(A|B).
Note that the conditional probability relationship, ‘|’, is not commutative; in general, P(A|B) ≠ P(B|A).
Also, let’s remember to avoid dividing by zero. These simple Bayes’ Theorem formulas require that event B is possible, so P(B) > 0.
As a reminder that the order matters, events for Bayesian probabilities are often written as the prior “Hypothesis” (H) and the “Evidence” (E): so P(H|E) = P(E|H) × P(H) / P(E).
Another Form of Bayes’ Theorem
Sometimes we do not know the overall probability of the Evidence. Rather than P(E), we have P(E|H) and P(E|~H) instead.
If we know P(E|H) and P(E|~H), we use the second form of Bayes’ Theorem: P(H|E) = P(E|H) × P(H) / [P(E|H) × P(H) + P(E|~H) × P(~H)].
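This second form is easy to capture as a small Python helper; the function name is our own choice, not a standard one, but the body follows the formula term by term:

```python
def bayes_posterior(p_h, p_e_given_h, p_e_given_not_h):
    """Second form of Bayes' Theorem:
    P(H|E) = P(E|H)*P(H) / (P(E|H)*P(H) + P(E|~H)*P(~H))."""
    # Expand P(E) over the hypothesis and its complement.
    p_e = p_e_given_h * p_h + p_e_given_not_h * (1 - p_h)
    if p_e == 0:
        raise ZeroDivisionError("P(E) must be positive")
    return p_e_given_h * p_h / p_e
```

The denominator is just P(E) reconstructed from the two conditional probabilities, which is why the zero check still applies.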
Again, we must not divide by zero.
A Common Use for Bayes’ Theorem
Doctors often use the logic of Bayes’ Theorem in diagnosing a patient: “What is the probability that this particular patient has influenza, based on the current flu epidemic, and on the symptoms she presents?”
Let’s say that, in a normal year, on any day during flu season, 10% of the population actually has influenza. During that time, Alice visits her doctor with several symptoms that are compatible with influenza. Let’s say that, in general, 70% of patients reporting these symptoms actually do have influenza. However, some other illnesses share the same symptoms. What should her doctor do?
The Bayesian approach assigns:
- P(H) is the probability of the hypothesis that Alice has influenza, if she were randomly selected from the population. During flu season, P(H)=10%. Therefore P(~H)=90%.
- P(E|H) is the probability of the evidence, that Alice has flu-like symptoms, given the hypothesis that she has influenza. Here, P(E|H)=70%.
- P(E|~H) is the probability of the evidence of these specific flu-like symptoms, given that Alice does not have influenza. Let’s say that her doctor knows that P(E|~H)=40%, since other diseases also present these symptoms.
- P(H|E) is the probability Alice has influenza, given the evidence. The doctor must calculate this.
Although the spreadsheet shows % signs, let’s just show decimal values in the text below.
We use the second form of Bayes’ Theorem and substitute the percentages from the list above:
- P(H|E) = (0.7 × 0.1) / (0.7 × 0.1 + 0.4 × 0.9) = 0.07 / 0.43 ≈ 0.16279, by substitution.
Therefore, Alice has a 16% chance of actually having influenza, based on her symptoms and the fact that those symptoms developed during flu season.
Next the doctor swabs Alice’s throat. Laboratory analysis of the swab indicates a specific strain of influenza, with P(E|H)=80% but P(E|~H)=1%. The doctor will use P(H) = 16.279% as the new “prior” hypothesis, since that value already accounts for Alice’s symptoms as well as the current flu season. The new calculation is:
- P(H|E) = (0.8 × 0.16279) / (0.8 × 0.16279 + 0.01 × 0.83721) = 0.13023 / 0.13860 ≈ 0.9396, by substitution.
Now the doctor assigns a 94% probability that Alice does have influenza, and prescribes treatment accordingly.
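Alice’s two updates can be reproduced with a short Python sketch. The percentages are the article’s; the helper function is our own shorthand for the second form of Bayes’ Theorem:

```python
def bayes_posterior(p_h, p_e_given_h, p_e_given_not_h):
    # P(H|E) = P(E|H)*P(H) / (P(E|H)*P(H) + P(E|~H)*P(~H))
    p_e = p_e_given_h * p_h + p_e_given_not_h * (1 - p_h)
    return p_e_given_h * p_h / p_e

# First visit: symptoms during flu season.
after_symptoms = bayes_posterior(0.10, 0.70, 0.40)
print(round(after_symptoms, 5))  # prints 0.16279

# Throat swab: the posterior above becomes the new prior.
after_swab = bayes_posterior(after_symptoms, 0.80, 0.01)
print(round(after_swab, 5))      # prints 0.9396
```

Note how the second call simply feeds the first posterior back in as the prior, exactly as the doctor does.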
What Affects Bayesian Probability?
Alice’s diagnosis highlights three issues with Bayesian Probability, as shown in the spreadsheet above:
- Even a large P(E|H) might make only a small adjustment to P(H|E), especially if P(H) is fairly small.
- Small values of P(E|~H) might adjust P(H|E) significantly.
- When performing follow-up tests on the same subject, we should use the most recent “previous result”, the posterior P(H|E), as the new “prior” probability P(H).
Even if Alice’s symptoms warranted P(E|H)=99%, her P(H|E) would still be below 22%. If, instead, her symptoms were more “unique” to influenza, with P(E|~H)=5%, then her P(H|E) would jump to over 60% even if P(E|H) were still 70%.
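Those sensitivity claims are easy to verify numerically; in this Python sketch the helper function is again our own shorthand for the second form:

```python
def bayes_posterior(p_h, p_e_given_h, p_e_given_not_h):
    # P(H|E) = P(E|H)*P(H) / (P(E|H)*P(H) + P(E|~H)*P(~H))
    p_e = p_e_given_h * p_h + p_e_given_not_h * (1 - p_h)
    return p_e_given_h * p_h / p_e

# Raising P(E|H) to 99% barely helps while P(E|~H) stays at 40%.
print(bayes_posterior(0.10, 0.99, 0.40))  # ~0.2157, still below 22%

# Lowering P(E|~H) to 5% helps far more, even with P(E|H) = 70%.
print(bayes_posterior(0.10, 0.70, 0.05))  # ~0.6087, over 60%
```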
Bayesian Probability, “One in a Thousand”, and Screening Tests
Public service announcements remind us to take screening tests for gender-specific cancer every year, once we reach a certain age. Bayesian Probability guides this process, too. The second example uses completely made-up percentages so we can easily translate those numbers into “one in a thousand” phrases.
Let’s use a fictitious set of percentages:
- P(H) = 0.001%, the probability that a random person (of the right age and gender) has this cancer. In other words, only 1 random person has this cancer, of every 100,000 tested.
- P(E|H) = 99.9%, the probability that the test will report positive for a person with this cancer. So 999 of 1,000 people who already have this cancer will have a true positive result. We need to make 100,000,000 random tests to find these 999 people.
- P(E|~H) = 10%, the probability that the test will report positive for a cancer-free person. So if we test 100,000,000 random people, 1,000 will truly have cancer and 99,999,000 will be cancer free. Then 10% of the cancer-free people will have a false positive result: that’s 9,999,900 of 100,000,000 random test subjects.
- P(H|E) is the probability to be calculated, that the person truly has cancer, based on the screening test.
In this example, a positive result from the screening test increases a person’s probability of cancer from P(H) = 0.001% to P(H|E) = 0.01%. So 99.99% of the positive results will be false positives.
We send the 10 million people with positive results from the screening test to the more specific diagnostic test. Here:
- P(H) = 0.010%, the probability that a person, who received a positive result from the first screening test, truly has cancer. This “new prior” is ten times higher than the “prior hypothesis” before the screening test.
- P(E|H) = 80%, the probability that the diagnostic test will report positive for a person with this cancer. So 800 of 1,000 people with cancer will have a true positive result. We expect to test 999 cancer victims, so we should find about 799.
- P(E|~H) = 0.001%, the probability that the test will report positive for a cancer-free person. Finally we can weed out the false positives. So we expect to test 9,999,900 “false positive” people, and find only 100 (mathematically, 99.999) false positives from the diagnostic test.
- P(H|E) is the probability to be calculated, that the person truly has cancer, based on the diagnostic test.
From the second spreadsheet, we find that P(H|E) = 89% after the diagnostic test.
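The whole screening chain can be checked with the same kind of Python sketch; the percentages are the article’s fictitious values, and the helper function is our own:

```python
def bayes_posterior(p_h, p_e_given_h, p_e_given_not_h):
    # P(H|E) = P(E|H)*P(H) / (P(E|H)*P(H) + P(E|~H)*P(~H))
    p_e = p_e_given_h * p_h + p_e_given_not_h * (1 - p_h)
    return p_e_given_h * p_h / p_e

# Screening test: P(H) = 0.001%, P(E|H) = 99.9%, P(E|~H) = 10%.
after_screening = bayes_posterior(0.00001, 0.999, 0.10)
print(round(after_screening, 6))  # prints 0.0001, i.e. 0.01%

# Diagnostic test on the positives: 0.01% becomes the new prior.
after_diagnostic = bayes_posterior(after_screening, 0.80, 0.00001)
print(round(after_diagnostic, 3))  # prints 0.889, i.e. about 89%
```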
Using the “one in a thousand” phrase reminds us that there are significant numbers of people who will be mis-diagnosed, even after the second test. Doctors should send the roughly 900 people with positive results from the diagnostic test to yet another diagnostic test.
Medical researchers should try to improve both of these fictitious tests; the rest of the second spreadsheet shows how significant some improvements can be.
Frequentist versus Bayesian Probability
We state “frequentist probability” values for straightforward probabilities P(H), P(E), or even for conditional probabilities such as P(E|H). Generally these share the common characteristic that people can observe the frequency of the events (H, E, or E given H) within a universe of test cases.
However, Bayes’ Theorem allows us to calculate the posterior P(H|E) from these observed frequencies.
Bayesian Probability also offers the use of the posterior P(H|E) when new evidence becomes available. Frequentist probability theory does not handle this in a natural way.
Finally, a Bayesian viewpoint states that the probability value expresses “confidence in a belief.” The frequentist viewpoint states that probability is the frequency of one type of event out of many trials: how many people are ill, or how often an ace is on top of a well-shuffled deck of cards.
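The frequentist reading can be illustrated by simulation. This Python sketch estimates how often an ace sits on top of a well-shuffled deck; the expected frequency is 4/52, or about 7.7%:

```python
import random

# Build a 52-card deck; we only care whether a card is an ace.
deck = ["ace"] * 4 + ["other"] * 48

trials = 100_000
aces_on_top = 0
for _ in range(trials):
    random.shuffle(deck)
    if deck[0] == "ace":
        aces_on_top += 1

# The observed frequency approaches 4/52 ≈ 0.0769 as trials grow.
print(aces_on_top / trials)
```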
Both frequentist and Bayesian probability provide value in testing theories in fields other than medicine. Both interpret information given by a Venn diagram with intersecting probabilities of two events. Note that, in this diagram, P(A ∩ B) is much larger than in the first image; what would that do to P(A|B)?
Comparing frequentist versus Bayesian probability in more detail requires another article.
Final Notes for Bayes’ Theorem
The use of Bayes’ Rule requires having some “prior” P(H) > 0; preferably, with good justification. The formulas show that an “impossible” event, with P(H) = 0, can never become possible, despite very convincing evidence. Bayesian Probability does not handle this situation in a natural way. Determining and justifying P(H) can be a significant challenge.
Using Bayes’ Rule along with the “one in a thousand” phrasing, can help avoid the “base rate fallacy.” The screening test in the second spreadsheet has a very high P(E|H). Many people instinctively think the high P(E|H) overwhelms the extremely low P(H), and mistakenly believe that a positive screening result definitely indicates illness. The base rate fallacy deserves a separate article.
Bayes’ Theorem has become an important tool for statisticians and scientists, along with the frequentist approach.
Decoding Science. One article at a time.