Statistical Determination

BBNs can be used as valid representations of statistical determination. Under statistical determination, the probability of an event is determined by the chance of experiencing or selecting that event from a population of possible events or from a stochastic process. There are obvious overlaps between statistical and causal models of determination. Statistical models of determination are subsumed by causal models, since causal models must also admit chance (subjective probabilities might reflect the randomness of experience).

Statistical models employ statistics that measure aggregates of multiple instances of individual phenomena. For example, means, medians and standard deviations are used to characterise populations and samples from populations. Such statistics do not represent directly physical, and hence objective, quantities and so do not offer causal explanations for individual events. Likewise, the parameters of statistical distributions may not admit a physical interpretation, since they may be merely mathematical contrivances.

Example: a piece of software may be subjected to repeated demands. When a demand fails the failure is noted and the testing continues. Each demand is a Bernoulli trial and is sampled without replacement from an infinitely large population of possible demands, so the trials are independent and the probability of failure per demand, p, is the same for every demand. The chance of m failures from n demands is then defined by the binomial distribution with parameters n and p.

Here the probability of m failures is wholly determined by the number of demands sampled, n, and the chance, p, of a failure on any single demand. Each individual failure may have been caused by combinations of particular faults and triggering events, but this is not admitted within the statistical model. Instead we can reason either about the probability distribution of the number of failures in the sample or about individual statistics characterising that distribution, such as the sample variance.
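
The binomial example above can be sketched in a few lines of Python. This is a minimal illustration only: the function name binomial_pmf and the values n = 10 and p = 0.1 are assumptions chosen for the example, not figures from the text.

```python
from math import comb

def binomial_pmf(m, n, p):
    """Probability of exactly m failures in n independent demands,
    each failing with probability p (hypothetical helper)."""
    if not 0 <= m <= n:
        return 0.0  # impossible outcomes carry no probability
    return comb(n, m) * p**m * (1 - p)**(n - m)

# Assumed illustrative values: 10 demands, 10% chance of failure per demand.
n, p = 10, 0.1

# Full distribution over the number of failures m = 0..n.
distribution = [binomial_pmf(m, n, p) for m in range(n + 1)]

# Summary statistics characterising the distribution.
mean = sum(m * q for m, q in enumerate(distribution))
variance = sum((m - mean) ** 2 * q for m, q in enumerate(distribution))
```

Nothing in the calculation refers to particular faults or triggering events: the distribution and its summary statistics depend only on n and p.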

In statistical models the probabilities are wholly defined by the parameters of the statistical model adopted. Other variables cannot be admitted except by extending the model by conditioning the parameters on other, perhaps causal, variables.

Example: Consider again the example above. We might choose to condition the probability of failure, p, on the faults in the software and the probability with which those faults would be triggered. In this way we extend the statistical model by introducing prior variables which might be considered causal in nature.
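
One way such an extension might look is sketched below. It rests on assumptions that are not in the text: faults are taken to be triggered independently on each demand, a demand fails if at least one fault is triggered, and the prior over fault scenarios (fault_scenarios) is invented purely for illustration.

```python
from math import comb, prod

def failure_probability(trigger_probs):
    """p for one demand, ASSUMING each fault is triggered independently
    and a demand fails if at least one fault is triggered."""
    return 1.0 - prod(1.0 - t for t in trigger_probs)

def binomial_pmf(m, n, p):
    """Probability of exactly m failures in n independent demands."""
    if not 0 <= m <= n:
        return 0.0
    return comb(n, m) * p**m * (1 - p)**(n - m)

# Hypothetical prior over fault scenarios:
# (prior probability, trigger probability of each fault present).
fault_scenarios = [
    (0.5, []),            # no faults present
    (0.3, [0.01]),        # one fault
    (0.2, [0.01, 0.05]),  # two faults
]

def marginal_pmf(m, n, scenarios):
    """P(m failures in n demands), averaging the binomial over the
    prior on fault scenarios."""
    return sum(weight * binomial_pmf(m, n, failure_probability(triggers))
               for weight, triggers in scenarios)
```

Here p is no longer a free parameter. It is determined by parent variables (which faults are present and how readily each is triggered), and the binomial model supplies the probabilities of the failure count given those parents.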

Despite the differences between causal and statistical determination, there are no special dangers in treating the two as identical. However, a clear advantage of statistical models is that the chances they generate can be used to fill in the node probability tables in a BBN. Causal determination might complement this advantage by providing sensible interpretations for the parameters.

When encoding statistical distributions in BBNs we must be careful about the introduction of contradictions. For instance, when using the binomial distribution the proposition

p(n+1 failures | n trials, p)

is a contradiction, since we could never have more failures than trials. Under these circumstances

p(n+1 failures | n trials, p)

would be set to zero since contradictions are impossible.
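
As a rough sketch of how a node probability table for a "failures" node might be generated from the binomial model, with contradictory entries set to zero, consider the following. The function build_npt, the discretisation of p and the table layout are assumptions for illustration and do not follow the format of any particular BBN tool.

```python
from math import comb

def build_npt(max_trials, p_values):
    """Return {(n, p): [P(m failures | n trials, p) for m in 0..max_trials]}.

    The failures node needs the same states (0..max_trials) under every
    parent configuration, so states with m > n exist in the table even
    though they are contradictions; they are assigned probability zero.
    """
    npt = {}
    for n in range(max_trials + 1):
        for p in p_values:
            column = []
            for m in range(max_trials + 1):
                if m > n:
                    column.append(0.0)  # more failures than trials
                else:
                    column.append(comb(n, m) * p**m * (1 - p)**(n - m))
            npt[(n, p)] = column
    return npt

# Hypothetical discretisation: up to 3 trials, two candidate values of p.
npt = build_npt(max_trials=3, p_values=[0.1, 0.5])
```

Each column still sums to one, because the zero entries only cover outcomes that the binomial distribution already excludes.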