Difficulties in assessing variance, covariance, and correlation


It has been shown that people have great difficulty estimating statistical variance. Their estimates are influenced by the mean of the stimuli: rather than the variance itself, what people appear to estimate is the coefficient of variation (the standard deviation divided by the mean). Peterson and Beach (1967) explain this as follows:

"Think of the top of a forest. The tree tops seem to form a fairly smooth surface, considering that the trees may be 60 or 70 feet tall. Now, look at your desk top. In all probability it is littered with many objects and if a cloth were thrown over it the surface would seem very bumpy and variable. The forest top is far more variable than the surface of your desk, but not relative to the sizes of the objects being considered."
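The forest and desk comparison can be put into numbers with a minimal sketch. The heights below are hypothetical values chosen only for illustration (they are not data from Peterson and Beach), and the function name is likewise an assumption of this example.

```python
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean."""
    return pstdev(values) / mean(values)


# Hypothetical heights in feet, invented for this illustration.
tree_heights = [60, 62, 65, 67, 70]         # tree tops in a forest
desk_heights = [0.05, 0.2, 0.4, 0.75, 1.0]  # objects lying on a desk

tree_sd = pstdev(tree_heights)
desk_sd = pstdev(desk_heights)
tree_cv = coefficient_of_variation(tree_heights)
desk_cv = coefficient_of_variation(desk_heights)

print(f"forest: sd = {tree_sd:.2f} ft, cv = {tree_cv:.3f}")
print(f"desk:   sd = {desk_sd:.2f} ft, cv = {desk_cv:.3f}")
```

With these made-up numbers the forest has the larger standard deviation but the smaller coefficient of variation, which matches the intuition that the desk "looks" more variable.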

Experiments concerning bivariate observations include studies of people's ability to recognise functional relationships presented in simple 2 x 2 contingency tables. Typically, these tables summarise the number of instances of the presence and absence of some variable X, apparently associated with the presence or absence of some variable Y (e.g. X as a disease and Y as a symptom). In many cases, it is found that people's judgemental strategies ignore one or more of the four cells. Most commonly, there is a virtually exclusive reliance on the size of the "present-present" cell relative to the entire population. In other words, if there are more people with the disease who also have the symptom than people with the disease but without the symptom, then the conclusion is that the relationship is positive. But valid inferences in such cases can only be made by considering all four cells, for example by comparing the proportion of diseased people showing the symptom with the proportion of non-diseased people showing the symptom.
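The four-cell comparison can be sketched as follows. The cell counts are hypothetical, and the function names are assumptions of this example. The phi coefficient is included as one standard summary of association in a 2 x 2 table; the text above does not name a particular measure.

```python
from math import sqrt


def symptom_rates(a, b, c, d):
    """Return (P(symptom | disease), P(symptom | no disease)).

    a: disease and symptom        b: disease, no symptom
    c: no disease, symptom        d: no disease, no symptom
    """
    return a / (a + b), c / (c + d)


def phi_coefficient(a, b, c, d):
    """Phi coefficient of a 2 x 2 table; 0 means no association."""
    denominator = sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return (a * d - b * c) / denominator


# Hypothetical table 1: a large "present-present" cell, yet no relationship,
# because the symptom is equally common with and without the disease.
no_link = (80, 20, 40, 10)

# Hypothetical table 2: same diseased row, but the symptom is rare otherwise.
link = (80, 20, 10, 40)

for name, table in (("no_link", no_link), ("link", link)):
    with_disease, without_disease = symptom_rates(*table)
    print(name, with_disease, without_disease, round(phi_coefficient(*table), 3))
```

Both tables have identical "present-present" and "present-absent" cells, so a judge who looks only at the diseased row would treat them alike; only the non-diseased row distinguishes them.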

A different example of this in terms of everyday inference concerns answers to the question: "Does God answer prayers?"

If you consult the "present-present" cell only, you may answer "yes" if, when you have asked God for something, it has happened. The skeptic may query this by asking how often you asked for something and it did not happen. But this comparison of only two cells is inadequate. Although it seems crazy, data from the "absent-absent" cell (i.e. things that did not happen and were not prayed for) must still be considered, as well as the things that did happen and were not prayed for.

Experiments on covariance assessment have also compared data-based correlation estimates, where the data are pairs of numbers or sounds, with theory-based estimates, where no data are presented. Subjects used a simple rating scale to describe their subjective impression of the strength and direction (positive or negative) of relationships between pairs of variables.

It was found that in the data-based experiments, subjects had difficulty recognising positive relationships with correlations of less than 0.6-0.7. Correlations in the range of 0.2-0.4 were barely detected, and correlations of 0.6-0.8 were underestimated. Only correlations of over 0.85 were consistently rated positive. With theory-based estimates, on the other hand, positive and negative correlations were correctly recognised, and so were the relative magnitudes of the correlations. In the theory-based experiments, covariation estimates were based on subjects' prior expectations or theories rather than on any immediately available data, e.g. expectations about alternative measures of honesty, or about personal attitudes, habits or preferences. These estimates were compared with "objective" correlations taken from previous empirical studies.
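To get a feel for what number pairs at these correlation levels look like, the sketch below generates pairs with a chosen target correlation and checks the sample value. How the original experiments constructed their stimuli is not described here, so the generating method (a standard normal variable plus independent noise), the function names and the sample size are all assumptions of this example.

```python
import random
from math import sqrt


def correlated_pairs(rho, n, seed=0):
    """Generate n (x, y) pairs whose population correlation is rho."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        x = rng.gauss(0, 1)
        noise = rng.gauss(0, 1)
        # Mixing x with independent noise gives corr(x, y) = rho.
        y = rho * x + sqrt(1 - rho ** 2) * noise
        pairs.append((x, y))
    return pairs


def pearson_r(pairs):
    """Sample Pearson correlation coefficient of a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    return sxy / sqrt(sxx * syy)


# Target correlations picked from the ranges discussed above.
for rho in (0.3, 0.7, 0.9):
    sample = correlated_pairs(rho, n=10_000)
    print(rho, round(pearson_r(sample), 2))
```

Plotting or simply reading down a short list of such pairs at 0.3 versus 0.9 gives some sense of why the weaker relationships were barely detected.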