MHPE 441: Medical Decision Making

Lecture notes: Week 11

Quantitative Judgment without Bayes

As we've said before, Bayes' Theorem provides a normative method for updating one's beliefs with new information. However, Bayes' Theorem becomes unmanageable when there are many, possibly correlated, cues to a judgment, because applying it requires knowing how the cues depend on one another. In these situations, we turn to other models: regression, probabilistic mental models, and neural networks. This lecture discusses all three, though we're really only going to spend our time on regression, which is the most common.

Multiple regression

Multiple regression is a statistical procedure in which we try to express the variable we want to predict (which I'll call the "outcome" variable for short) as a function of the cues we know. We look for the function that either minimizes the squared error (the difference between what we predict and the actual outcome, squared and summed over cases) or maximizes the probability that the observed outcomes would have resulted given the cues.

Most multiple regression is multiple linear regression. We assume that the function relating the cues to the outcome is linear; that is, we predict the outcome by adding up some multiple of each cue. For example, we might predict a judgment of how attractive a college candidate is, on a scale of roughly 0-200, using the equation SAT*0.1 + GPA*25 - 60. This says that increasing your GPA by 1 point will increase your attractiveness by 25 points, while increasing your SAT by 100 points will increase your attractiveness by 10 points. The raw coefficients might suggest that GPA is 250 times more important than SAT scores (25 versus 0.1), but this isn't true, because GPA has much less room to vary than SAT scores.
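A minimal Python sketch of the example equation, using the coefficients above. The SAT range of 400-1600 and the GPA range of 0-4 are assumptions added here (not part of the notes) to illustrate the point about room to vary.

```python
def attractiveness(sat: float, gpa: float) -> float:
    """Linear judgment model from the example: SAT*0.1 + GPA*25 - 60."""
    return sat * 0.1 + gpa * 25 - 60

# Marginal effects: change one cue while holding the other fixed.
gpa_effect = attractiveness(1200, 3.5) - attractiveness(1200, 2.5)  # one GPA point
sat_effect = attractiveness(1300, 3.0) - attractiveness(1200, 3.0)  # 100 SAT points

# Assumed ranges (not from the notes): SAT 400-1600, GPA 0-4.
# Multiplying each coefficient by its cue's range shows how far
# each cue can actually move the prediction.
sat_span = (1600 - 400) * 0.1
gpa_span = (4 - 0) * 25

print(f"1 GPA point: {gpa_effect:.1f}, 100 SAT points: {sat_effect:.1f}")
print(f"Full SAT range: {sat_span:.1f}, full GPA range: {gpa_span:.1f}")
```

Across these assumed ranges, SAT can move the prediction by 120 points and GPA by 100, so the 250-to-1 ratio of raw coefficients says little about which cue matters more.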

One typically feeds cue and outcome information for a number of cases into a computer, and the computer generates the coefficients for each cue.
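As a sketch of what the computer does, the following Python code uses NumPy's ordinary least-squares solver (`np.linalg.lstsq`) to recover coefficients from simulated cases. The data are invented: outcomes are generated from the attractiveness equation plus random noise, so the fitted coefficients should land near 0.1, 25, and -60.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
n_cases = 200

# Simulated cues for each case (hypothetical data).
sat = rng.uniform(400, 1600, n_cases)
gpa = rng.uniform(2.0, 4.0, n_cases)

# Simulated outcome: the example equation plus normally distributed error.
outcome = sat * 0.1 + gpa * 25 - 60 + rng.normal(0, 5, n_cases)

# Design matrix: one column per cue, plus a column of ones for the intercept.
X = np.column_stack([sat, gpa, np.ones(n_cases)])

# Least squares: choose coefficients that minimize the sum of squared errors.
coefs, _, _, _ = np.linalg.lstsq(X, outcome, rcond=None)
sat_coef, gpa_coef, intercept = coefs

predicted = X @ coefs
sse = np.sum((outcome - predicted) ** 2)
print(f"SAT: {sat_coef:.3f}, GPA: {gpa_coef:.2f}, intercept: {intercept:.1f}")
```

The estimates will not match the generating equation exactly, because the noise added to each case pulls them slightly off.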

For example, the predictive value of a positive test (PVP) is a function of the prevalence and the test characteristics. According to Bayes' Theorem, it is:

PVP = [prevalence*sensitivity]/[prevalence*sensitivity + (1-prevalence)*(1-specificity)]
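A minimal Python version of this formula, using made-up test characteristics for illustration (prevalence 1%, sensitivity 90%, specificity 95%):

```python
def predictive_value_positive(prevalence: float, sensitivity: float,
                              specificity: float) -> float:
    """Bayes' Theorem: probability of disease given a positive test."""
    true_pos = prevalence * sensitivity               # diseased and test positive
    false_pos = (1 - prevalence) * (1 - specificity)  # healthy but test positive
    return true_pos / (true_pos + false_pos)

pvp = predictive_value_positive(0.01, 0.90, 0.95)
print(f"PVP = {pvp:.3f}")  # 0.009 / 0.0585, about 0.154
```

Even with a fairly accurate test, only about 15% of positive results in this hypothetical case come from patients who have the disease, because the condition is rare.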

If we instead tried to do a multiple regression, we'd get an equation like this:

Predicted PVP = a*prevalence + b*sensitivity + c*specificity + d

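This looks quite different, and it will be systematically biased, because the true relationship is not linear. On the other hand, its predictions will usually be close, and this kind of approximation is especially valuable in the realm of multiple correlated tests, where Bayes' Theorem is hard to apply. Linear equations usually capture most of the important relationships.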
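To see both the bias and the closeness, the Python sketch below computes the exact PVP over a grid of hypothetical situations (prevalence 5-50%, sensitivity and specificity 70-99%), fits the linear equation above by least squares, and reports how far the linear predictions fall from the Bayesian answer. The grid ranges are arbitrary choices for illustration.

```python
import itertools

import numpy as np

def pvp_exact(prev, sens, spec):
    """Predictive value of a positive test via Bayes' Theorem."""
    return prev * sens / (prev * sens + (1 - prev) * (1 - spec))

# Hypothetical grid of test situations (full factorial design).
grid = list(itertools.product(np.linspace(0.05, 0.50, 10),
                              np.linspace(0.70, 0.99, 10),
                              np.linspace(0.70, 0.99, 10)))
cues = np.array(grid)  # columns: prevalence, sensitivity, specificity
truth = np.array([pvp_exact(*row) for row in grid])

# Fit Predicted PVP = a*prevalence + b*sensitivity + c*specificity + d.
X = np.column_stack([cues, np.ones(len(cues))])
(a, b, c, d), *_ = np.linalg.lstsq(X, truth, rcond=None)
linear = X @ np.array([a, b, c, d])

residuals = truth - linear
r_squared = 1 - np.sum(residuals ** 2) / np.sum((truth - truth.mean()) ** 2)
print(f"a={a:.2f}, b={b:.2f}, c={c:.2f}, d={d:.2f}")
print(f"R^2 = {r_squared:.2f}, largest error = {np.max(np.abs(residuals)):.2f}")
```

The fitted coefficients take the expected signs (PVP rises with prevalence and specificity), but the errors are a deterministic function of the cues rather than random noise, which is what "systematically biased" means here.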

A major finding regarding these linear models is that their judgments are almost always better than judgments made by people! The models predict the outcomes better than the intuitive judgment of a human judge because they (1) derive more appropriate weights for each cue (the regression coefficients) and (2) apply these weights more consistently. Human judges are often swayed from consistency by irrelevant individual features of the case they're judging. If we can't model the outcomes themselves and can only model the judge's judgments (called bootstrapping or policy capturing), we still gain the benefit of (2).
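A small simulation can illustrate benefit (2). In this Python sketch, a hypothetical judge weighs two standardized cues sensibly but inconsistently, modeled as random noise added to each judgment. Regressing the judge's judgments on the cues yields a model of the judge, and that model tracks the outcome better than the judge does, because it applies the judge's own weights without the case-to-case noise. All weights and noise levels are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=1)
n_cases = 1000

# Two standardized cues and an outcome that depends on both (hypothetical).
cues = rng.normal(size=(n_cases, 2))
outcome = cues[:, 0] + cues[:, 1] + rng.normal(size=n_cases)

# The judge's implicit policy (weights 0.8 and 1.2) plus case-to-case inconsistency.
judgments = 0.8 * cues[:, 0] + 1.2 * cues[:, 1] + rng.normal(size=n_cases)

# Policy capturing: regress the judge's judgments (not the outcome) on the cues.
X = np.column_stack([cues, np.ones(n_cases)])
judge_weights, *_ = np.linalg.lstsq(X, judgments, rcond=None)
model_of_judge = X @ judge_weights

judge_r = np.corrcoef(judgments, outcome)[0, 1]
model_r = np.corrcoef(model_of_judge, outcome)[0, 1]
print(f"Judge vs. outcome: r = {judge_r:.2f}")
print(f"Model of judge vs. outcome: r = {model_r:.2f}")
```

The model never sees the outcome; its advantage comes entirely from applying the captured policy consistently.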

A second major finding is that, in practice, most of the benefit of linear models depends on having the right cues and giving them the right sign (e.g., a better MCAT score predicts more success in medical school, not less); the actual coefficients are less important. Robyn Dawes has shown that in some cases, unit-weighted models (where all coefficients are 1 or -1) or even random-weighted models (where coefficients are determined randomly, apart from sign) can outperform human judges. (In these models, the cues are standardized, by subtracting each cue's mean and dividing by its standard deviation, so that they're all on a similar scale.)
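The Python sketch below compares three models on simulated data: regression weights, unit weights, and random weights with the correct sign. The cues (an MCAT-like score, a GPA-like score, and a hypothetical cue that predicts worse success) and their effects are invented. Each cue is first converted to a z-score so that unit weights are meaningful.

```python
import numpy as np

rng = np.random.default_rng(seed=2)
n = 1000

# Hypothetical raw cues on very different scales.
mcat = rng.normal(500, 10, n)
gpa = rng.normal(3.5, 0.3, n)
negative_cue = rng.normal(20, 3, n)  # a cue that predicts worse success

def zscore(x):
    """Standardize: subtract the mean, divide by the standard deviation."""
    return (x - x.mean()) / x.std()

Z = np.column_stack([zscore(mcat), zscore(gpa), zscore(negative_cue)])
signs = np.array([1, 1, -1])  # all the unit/random models need to know

# Simulated success in medical school (weights are made up).
success = Z @ np.array([0.6, 0.3, -0.4]) + rng.normal(size=n)

# 1. Regression weights, fit to the outcome (in-sample).
X = np.column_stack([Z, np.ones(n)])
beta, *_ = np.linalg.lstsq(X, success, rcond=None)
regression_pred = X @ beta

# 2. Unit weights: +1 or -1 per standardized cue.
unit_pred = Z @ signs

# 3. Random weights: random magnitudes, correct signs.
random_pred = Z @ (signs * rng.uniform(0.1, 1.0, 3))

for name, pred in [("regression", regression_pred),
                   ("unit", unit_pred),
                   ("random", random_pred)]:
    r = np.corrcoef(pred, success)[0, 1]
    print(f"{name:>10}: r = {r:.2f}")
```

Because the regression is fit and evaluated on the same cases, its correlation is an optimistic upper bound; in this setup the unit-weighted composite comes close to it.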