What's the probability you have cancer if you are a smoker?

Actually, nobody can answer that question exactly, but it is a special case of the crucial general problem of how we should update our beliefs when we discover new evidence. To do that properly you need Bayes Theorem, which we explain separately in purely lay terms and mathematically (we strongly recommend reading the lay introduction before proceeding).

You have some prior belief about something (let's call it A) being true or not. For example, A might be your belief about a randomly selected person having cancer, or it might be your belief about Spurs winning the FA Cup next year. Now you find out some new piece of information (let's call it B) which is relevant to your belief. In the cancer example, B might be the information that the selected person is a smoker; in the Spurs example, B might be the information that Spurs' star player will be unavailable for the whole of next year due to injury.

In the cancer case you feel intuitively that you need to revise your prior upwards (i.e. the probability should increase), and in the Spurs case you need to revise your prior downwards (i.e. the probability should decrease). But by how much in each case?

Let's write the prior as P(A) meaning "the probability A is true".
What we want is a way to calculate "the probability that A is true given that B is true", which we write as P(A|B). This is also called the posterior belief.

Bayes Theorem is a formula for calculating the posterior from the prior. It involves finding the probabilities of two other statements:

  1. One of these is your prior belief about B, i.e. P(B). In the cancer case P(B) is simply the proportion of the relevant population who are smokers. 
  2. The other is what is called the likelihood, namely the probability of observing the evidence given that the original statement is true. In other words this is P(B|A); in the cancer case, this is the probability that a person is a smoker if we know they have cancer.  We can find out P(B|A) simply by finding the proportion of known cancer patients who are smokers. 

Now Bayes Theorem is simply the following formula:

P(A|B) = P(B|A) x P(A) / P(B)

So we get our posterior P(A|B) by multiplying the prior P(A) by the likelihood P(B|A) and dividing by P(B).
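
The formula translates directly into a few lines of code. The following is a minimal Python sketch; the function name `bayes_posterior` and its argument names are our own illustrative choices, not part of any library. It also rejects inputs that are not probabilities, or that would give a "posterior" greater than 1.

```python
def bayes_posterior(prior: float, likelihood: float, evidence: float) -> float:
    """Return P(A|B) = P(B|A) * P(A) / P(B).

    prior      -- P(A), the prior belief that A is true
    likelihood -- P(B|A), the probability of the evidence B when A is true
    evidence   -- P(B), the overall probability of the evidence B
    """
    for name, p in (("prior", prior), ("likelihood", likelihood), ("evidence", evidence)):
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"{name} must be a probability in [0, 1], got {p}")
    if evidence == 0.0:
        raise ValueError("P(B) must be greater than 0 to condition on B")
    posterior = likelihood * prior / evidence
    # A result above 1 means the three inputs cannot all hold at once.
    if posterior > 1.0:
        raise ValueError("inconsistent inputs: P(B|A) * P(A) exceeds P(B)")
    return posterior


print(bayes_posterior(prior=0.2, likelihood=0.6, evidence=0.3))  # approximately 0.4
```

The check on the result is a deliberate design choice: if P(B|A) x P(A) is larger than P(B), the three numbers contradict each other, and it is better to be told so than to get an impossible probability.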

So, in the cancer case, suppose the relevant population is the set of people coming into a chest clinic, and suppose that Ricky comes into the clinic for the first time. Our prior that Ricky has cancer, P(A), will typically be based on data from the clinic: if 10% of people who registered with the clinic have been diagnosed with cancer, then our prior P(A)=0.1.

We should also know the proportion of registered patients who are smokers. Suppose it is 50%. Then P(B)=0.5.

Finally, suppose we know that 80% of patients diagnosed with cancer are smokers. Then P(B|A)=0.8.

So, using Bayes Theorem we can compute the posterior P(A|B) as

P(A|B) = (0.8 x 0.1) / 0.5 = 0.16
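
One way to check this arithmetic is to think in terms of counts rather than percentages. The sketch below assumes a hypothetical clinic of 1,000 registered patients whose numbers match the percentages above (100 with cancer, 500 smokers, 80 smokers with cancer); these counts are illustrative, not real data. Bayes Theorem and the direct proportion of smokers who have cancer should give the same answer.

```python
from fractions import Fraction

# Hypothetical counts chosen to match the percentages in the text.
patients = 1000
with_cancer = 100            # 10% diagnosed with cancer
smokers = 500                # 50% are smokers
smokers_with_cancer = 80     # 80% of the 100 cancer patients smoke

# Exact fractions, so no floating-point rounding creeps in.
p_a = Fraction(with_cancer, patients)                      # P(A)   = 1/10
p_b = Fraction(smokers, patients)                          # P(B)   = 1/2
p_b_given_a = Fraction(smokers_with_cancer, with_cancer)   # P(B|A) = 4/5

# Bayes Theorem: P(A|B) = P(B|A) x P(A) / P(B)
posterior = p_b_given_a * p_a / p_b

# Direct count: of the 500 smokers, what fraction have cancer?
direct = Fraction(smokers_with_cancer, smokers)

print(posterior, direct, float(posterior))  # 4/25 4/25 0.16
```

Both routes give 4/25 = 0.16: of the 500 smokers, 80 have cancer.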

Thus, if we discover that Ricky is a smoker, our belief in Ricky having cancer increases from 0.1 to 0.16. This is not a dramatic increase. If you had to put a bet on it, you still wouldn't bet that Ricky has cancer without any other evidence. In practice, the results of diagnostic tests will provide further evidence, and you can use Bayes Theorem to revise your belief again.
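
Revising again works the same way: the current posterior becomes the new prior. The sketch below uses an entirely hypothetical diagnostic test that is positive for 90% of patients who have cancer and for 20% of those who do not; these rates are invented for illustration. It also assumes the test result depends only on whether Ricky has cancer (not on whether he smokes), which lets the probability of a positive result be obtained by adding up the two ways it can arise. The function name `update` is illustrative.

```python
def update(prior: float, p_pos_if_true: float, p_pos_if_false: float) -> float:
    """Bayes update of P(A) after a positive test result.

    P(positive) = P(pos|A) x P(A) + P(pos|not A) x P(not A), which assumes
    the test result depends only on whether A is true.
    """
    p_positive = p_pos_if_true * prior + p_pos_if_false * (1.0 - prior)
    return p_pos_if_true * prior / p_positive


belief = 0.16                   # belief after learning Ricky smokes (from the text)
# Hypothetical test characteristics (not from the source):
SENSITIVITY = 0.9               # P(positive | cancer)
FALSE_POSITIVE_RATE = 0.2       # P(positive | no cancer)

belief = update(belief, SENSITIVITY, FALSE_POSITIVE_RATE)
print(round(belief, 3))  # 0.462
```

With these invented numbers a positive result would raise the belief from 0.16 to about 0.46; a further test, under the same assumptions, could be handled by calling `update` again with the new belief as the prior.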

You can download this model and open it in the AgenaRisk tool to see this example running.

The great thing about Bayes Theorem is that it can also be used in those very common cases where you don't have much statistical data, but you do have subjective judgements (possibly from experts).
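
The arithmetic is identical when the inputs are judgements rather than frequencies. As a sketch, take the Spurs example with purely hypothetical subjective numbers: a prior of 0.10 that Spurs win the FA Cup, a 0.05 chance that the star player is out for the year, and a 0.02 chance that the star player is out given that Spurs do win. The function name `check_and_update` is illustrative. Because subjectively chosen numbers can easily contradict each other, it first checks that the three judgements are mutually consistent.

```python
def check_and_update(prior: float, likelihood: float, evidence: float) -> float:
    """Return P(A|B) from subjective P(A), P(B|A), P(B), after a consistency check."""
    if evidence <= 0.0:
        raise ValueError("P(B) must be greater than 0")
    joint = likelihood * prior  # P(A and B)
    # P(A and B) cannot exceed P(B) ...
    if joint > evidence:
        raise ValueError("judgements imply P(A|B) > 1")
    # ... and P(B and not A) = P(B) - P(A and B) cannot exceed P(not A).
    if evidence - joint > 1.0 - prior:
        raise ValueError("judgements imply P(B | not A) > 1")
    return joint / evidence


# Hypothetical subjective judgements for the Spurs example (not from the source)
p_win = 0.10                  # P(A): Spurs win the FA Cup
p_star_out = 0.05             # P(B): star player unavailable all year
p_star_out_given_win = 0.02   # P(B|A)

print(round(check_and_update(p_win, p_star_out_given_win, p_star_out), 3))  # 0.04
```

Under these invented judgements the probability of Spurs winning falls from 0.10 to 0.04, the kind of downward revision the Spurs example anticipates.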
