What's the probability you have cancer if you are a smoker?

Actually, nobody can answer that question exactly, but it is a special case of
the crucial general problem of how we update our beliefs when we
discover new evidence. To do that properly you need Bayes Theorem, which we explain in purely lay terms here and mathematically here (we strongly recommend you read the lay introduction before proceeding).
You have some prior belief about something (let's call it A) being true or not. For example, A
might be your belief about a randomly selected person having cancer, or
it might be your belief about Spurs winning the FA Cup next year.
Now you find out some new piece of information (let's call it B) which is relevant to your belief. In the cancer example, B might be the information that the selected person is a smoker; in the Spurs example B might be the information that Spurs' star player will be unavailable for the next year due to injury.
In the cancer case you feel intuitively that you need to revise your
prior upwards (i.e. the probability should increase) and in the Spurs
case you need to revise your prior downwards (i.e. the probability should decrease). But by how much in each case?
Let's write the prior as P(A) meaning "the probability A is true".
What we want is a way to calculate "the probability that A is true given that B is true", which we write as P(A|B). This is also called the posterior belief.
Bayes Theorem is a formula for calculating the posterior from
the prior. It involves finding the probabilities of two other statements:
- One of these is your prior belief about B, i.e. P(B). In the cancer case P(B) is simply the proportion of the relevant population who are smokers.
- The other is what is called the likelihood, namely the probability of observing the evidence given that the original statement is true. In other words this is P(B|A); in the cancer case, this is the probability that a person is a smoker if we know they have cancer. We can find out P(B|A) simply by finding the
proportion of known cancer patients who are smokers.
Now Bayes Theorem is simply the following formula:
P(A|B) = (P(B|A) x P(A)) / P(B)
So we get our posterior P(A|B) by multiplying the prior P(A) by the
likelihood P(B|A) and dividing by P(B).
So, in the cancer case, suppose
the relevant population is the set of people coming into a chest
clinic. Suppose that Ricky comes into the clinic for the first time.
Our prior that Ricky
has cancer, P(A), will typically be based on data from the
clinic. If 10% of people who registered with the clinic have been
diagnosed with cancer then our prior P(A)=0.1.
We should also know the proportion of registered patients who are smokers. Suppose it is 50%. Then P(B)=0.5.
Finally, suppose we know that 80% of patients diagnosed with cancer are smokers. Then P(B|A)=0.8.
So, using Bayes Theorem we can compute the posterior P(A|B) as
P(A|B) = (0.8 x 0.1) / 0.5 = 0.16
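The same calculation can be sketched in a few lines of Python. This is only an illustration: the function name `posterior` and the variable names are our own and are not part of the AgenaRisk model; the three input numbers are the ones from the chest clinic example.

```python
def posterior(prior, likelihood, evidence):
    """Bayes Theorem: P(A|B) = (P(B|A) x P(A)) / P(B).

    prior      -- P(A), belief in A before seeing B
    likelihood -- P(B|A), probability of B when A is true
    evidence   -- P(B), overall probability of B
    """
    if not 0 < evidence <= 1:
        raise ValueError("P(B) must be greater than 0 and at most 1")
    return likelihood * prior / evidence


# Numbers from the chest clinic example.
p_cancer = 0.1               # P(A): 10% of registered patients have cancer
p_smoker = 0.5               # P(B): 50% of registered patients smoke
p_smoker_given_cancer = 0.8  # P(B|A): 80% of cancer patients smoke

p_cancer_given_smoker = posterior(p_cancer, p_smoker_given_cancer, p_smoker)
print(round(p_cancer_given_smoker, 2))  # prints 0.16
```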
Thus, if we discover that Ricky is a smoker our belief in Ricky having cancer
increases from 0.1 to 0.16. This is not a dramatic
increase. If you had to put a bet on it, you still wouldn’t bet
that Ricky has cancer without any other evidence. In practice,
the results of diagnostic tests will provide further evidence, which
you can use with Bayes Theorem to revise your belief again.
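To show what "revising your belief again" might look like, here is a rough Python sketch in which the posterior of 0.16 becomes the new prior. The two test figures are purely hypothetical, invented for illustration and not taken from any clinic or real diagnostic test. The sketch also assumes that, once we know whether Ricky has cancer, the test result does not depend on whether he smokes. The function name `update` is our own.

```python
def update(prior, p_evidence_given_a, p_evidence_given_not_a):
    """Apply Bayes Theorem once, computing P(evidence) from its two cases."""
    # P(B) = P(B|A) x P(A) + P(B|not A) x P(not A)
    p_evidence = (p_evidence_given_a * prior
                  + p_evidence_given_not_a * (1 - prior))
    return p_evidence_given_a * prior / p_evidence


# Belief that Ricky has cancer after learning he is a smoker.
belief = 0.16

# HYPOTHETICAL test characteristics, made up for this illustration only.
p_positive_given_cancer = 0.9     # assumed chance of a positive test with cancer
p_positive_given_no_cancer = 0.2  # assumed chance of a positive test without cancer

belief_after_test = update(belief,
                           p_positive_given_cancer,
                           p_positive_given_no_cancer)
print(round(belief_after_test, 2))  # prints 0.46
```

With these made-up figures a positive test result would raise the belief from 0.16 to about 0.46; different test figures would of course give a different answer.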
You can download this model (right click and save as) and then open it in the AgenaRisk tool to see this example running.
The great thing about Bayes Theorem is that it can also be used in those
very common cases where you don't have much statistical data, but you
do have subjective judgements (possibly of experts).