Conditional probability

In the introduction to Bayesian probability we explained that the notion of degree of belief in an uncertain event A was conditional on a body of knowledge K. Thus, the basic expressions about uncertainty in the Bayesian approach are statements about conditional probabilities. This is why we used the notation P(A|K), which should only be simplified to P(A) if K is constant. Any statement about P(A) is always conditioned on a context K.

In general we write P(A|B) to represent a belief in A under the assumption that B is known. Even this is, strictly speaking, shorthand for the expression P(A|B,K) where K represents all other relevant information. Only when all such other information is irrelevant can we really write P(A|B).

The traditional approach to defining conditional probabilities is via joint probabilities. Specifically, we have the well-known 'formula':

P(A|B) = P(A,B) / P(B)    (for P(B) > 0)

This should really be thought of as an axiom of probability. Just as we saw that the three probability axioms were 'true' for frequentist probabilities, so this axiom can be similarly justified in terms of frequencies:

Example: Let A denote the event 'student is female' and let B denote the event 'student is Chinese'. In a class of 100 students, suppose 40 are Chinese, and suppose that 10 of the Chinese students are female. Then clearly, if P stands for the frequency interpretation of probability, we have:

P(A,B) = 10/100 (10 out of 100 students are both Chinese and female)

P(B) = 40/100 (40 out of the 100 students are Chinese)

P(A|B) = 10/40 (10 out of the 40 Chinese students are female)

It follows that the formula for conditional probability 'holds', since P(A,B) / P(B) = (10/100) / (40/100) = 10/40 = P(A|B).
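
The arithmetic in this example can be checked with a short Python sketch. It assumes the formula P(A|B) = P(A,B) / P(B) and uses only the counts given in the example; the variable names are illustrative choices, not part of the original text.

```python
from fractions import Fraction

# Counts taken from the class example.
total_students = 100
chinese = 40              # event B
chinese_and_female = 10   # events A and B together

p_joint = Fraction(chinese_and_female, total_students)  # P(A,B)
p_b = Fraction(chinese, total_students)                 # P(B)

# Conditional probability obtained from the formula P(A|B) = P(A,B) / P(B).
p_a_given_b_formula = p_joint / p_b

# Conditional probability counted directly among the Chinese students.
p_a_given_b_direct = Fraction(chinese_and_female, chinese)

print(p_a_given_b_formula, p_a_given_b_direct)  # prints "1/4 1/4"
```

Exact fractions are used rather than floating-point numbers so that the two routes to P(A|B) can be compared for equality without rounding concerns.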

In those cases where P(A|B) = P(A) we say that A and B are independent.
If P(A|B,C) = P(A|C) we say that A and B are conditionally independent given C.
For a full discussion of these important notions, see the section on transmitting evidence in BBNs.
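
As a rough illustration of the independence condition P(A|B) = P(A), the sketch below tests it under the frequency interpretation. The counts of 40 Chinese students and 10 Chinese female students come from the class example; the overall numbers of female students (25 and 50) are hypothetical values chosen only for illustration, and the function names are likewise assumptions.

```python
from fractions import Fraction

def conditional(joint_count, condition_count):
    """Frequency estimate of P(A|B) from the counts of (A and B) and of B."""
    if condition_count == 0:
        raise ValueError("cannot condition on an event that never occurs")
    return Fraction(joint_count, condition_count)

def is_independent(count_a, count_b, count_ab, total):
    """True when P(A|B) == P(A) under the frequency interpretation."""
    return conditional(count_ab, count_b) == Fraction(count_a, total)

# Hypothetical: 25 of the 100 students are female, so P(A) = 1/4 = P(A|B).
print(is_independent(count_a=25, count_b=40, count_ab=10, total=100))  # True

# Hypothetical: 50 of the 100 students are female, so P(A) = 1/2 != P(A|B).
print(is_independent(count_a=50, count_b=40, count_ab=10, total=100))  # False
```

In the first case, learning that a student is Chinese does not change the belief that the student is female; in the second case it lowers that belief from 1/2 to 1/4, so the events are not independent.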