But here's another excerpt.
This one shows how we supplemented our use of conventional "statistical significance"/NHT testing of the study results with use of Bayesian likelihood ratios.
We did use former, but I think the latter are more useful generally for conveying practical strength of evidence & also for assessing the relative plausibility of competing hypotheses, an objective central to empirical inquiry for which NHT/"statistical significance" is ill-suited (see Goodman, S.N., Introduction to Bayesian methods I: measuring the strength of evidence, Clin Trials 2, 282 - 290 (2005); Edwards, W., Lindman, H. & Savage, L.J., Bayesian Statistical Inference in Psychological Research., Psych Rev 70, 193 - 242 (1963)). Anyone who disagrees that Likelihood ratios are cool is a Marxist!
Oh, BTW: "IPCI" refers to "identity-protective cognition impacat," which is the average percentage-point difference in the probability that a subject type (judge, lawyer, student, member of the public, or house pet) would be to find a statutory violation when doing so affirmed rather than defied his or her cultural worldview.
* * *
c. Judges vs. members of the public using Bayesian methods. As an alternative to assessing the improbability of the “null hypothesis,” one can use Bayesian methods to assess the strength of the evidence in relation to competing hypothesized IPCIs. Under Bayes’s Theorem the likelihood ratio reflects how much more consistent an observed outcome is with one hypothesis than a rival one. It is the factor in proportion to which one should adjust one’s assessment of the relative probability (expressed in odds) of one hypothesis in relation to another.
Imagine, for example, that we are shown two opaque canvas bags, labeled “B1”and “B2,” each of which is filled with marbles (we use “canvas bags” for this example in anticipation of the reasonable concern that Bayes’s Theorem might apply only to marble-filled urns). We are not told which is which, but one bag, it is stipulated, contains 75% red marbles and 25% blue, and the other 75% blue and 25% red. We are instructed to “sample” the contents of the bags by drawing one marble from each, after which we should make our best estimate of the probability that B1 is the bag containing mostly blue marbles and B2 the one containing mostly red. We extract a blue marble from B1 and a red one from B2.
Bayes’s Theorem furnishes logical instructions on how to use this “new evidence” to revise our estimates of the probability of the hypothesis that B1 is the bag containing mostly blue marbles (and hence B2 mostly red). If we assume that that hypothesis is true, then the probability that we would have drawn a blue marble from B1 is 3/4 or 0.75, as is the probability that we would have drawn a red marble from B2. The joint probability of these independent events—that is, the probability of the two occurring together, as they did—is 3/4 x 3/4 or 9/16. If we assume that hypothesis “B1 is the one that contains mostly blue marbles” is false, then the joint probability of drawing a blue marble from B1 followed by a red marble from B2 would be 1/4 x 1/4, or 1/16. Other possible combinations of colors could have occurred, of course (indeed, there are four possible combinations for such a trial). But if we were to repeat this “experiment” over and over (with the marbles being replaced and the labels on the bags being randomly reassigned after each trial), then we would expect the sequence “blue, red” to occur nine times more often when the bag containing mostly blue marbles is the one labeled “B1” than when it is the bag labeled “B2.” Because “blue, red” is the outcome we observed in our trial, we should revise our estimate of the probability of the hypothesis “B1 contains mostly blue marbles” by a factor of 9—from odds of 1:1 (50%) to 9:1 (90%).
We can use precisely the same logic to assess the relative probability of hypothesized judge and pubic IPCIs. In effect, one can imagine each subject-type as an opaque vessel containing some propensity to engage in identity-protective cognition. The strengths of those propensities—the subject types’ “true” IPCIs—are not amenable to direct inspection, but we can sample observable manifestations of them by performing this study’s statutory interpretation experiment. Calculating the relative likelihood of the observed results under competing hypotheses, we can construct a likelihood ratio that conveys how much more consistent the evidence is with one hypothesized subject-type IPCI than with another.
Figure 8 illustrates the use of this method to test two competing hypotheses about the public’s “true” IPCI: that members of the public would be 25 percentage points more likely to find a violation when doing so is culturally affirming, and alternatively that they would be only 15 percentage points more likely to do so. To make the rival hypothesis commensurable with the study results, we can represent each as a probability distribution with the predicted IPCI as its mean and a standard error equivalent to the one observed in the experimental results. Within any one such distribution, the relative probability of alternative IPCIs (e.g., 15% and 25%) can be determined by assessing their relative “heights” on that particular curve. Likewise, the relative probability of observing any particular IPCI under alternative distributions another can be determined by comparing the ratio of the heights for the probability density distributions in question.
The public IPCI was 22%. The probability of observing such a result (or any in close proximity to it) is eight times more likely under the more extreme “public IPCI = 25%” hypothesis than it is under the more modest “public IPCI = 15%” hypothesis (Figure 8). This the Bayesian likelihood ratio, or the factor in proportion to which one should modify one’s assessment of the relative probability that the “true” public IPCI is 25 as opposed to 15 percentage points.
We will use the same process to assess the weight of four competing hypothesis about the vulnerability of judges to identity protective cognition. The first is that judges will be “unaffected” (IPCI = 0%). This prediction, of course, appears similar to the “null hypothesis.” But whereas “null hypothesis testing” purports to specify only whether the null hypothesis can be “rejected,” Bayesian methods can be used to obtain a genuine assessment of the strength of the evidence in support of there being “no effect” if that is a genuine hypothesis of interest, as it is here. The remaining three hypotheses, the plausibility of which will be tested relative to the “IPCI = 0%” hypothesis are that that judges will be “just as affected as the public” (IPCI = 22%); that judges will be moderately affected (IPCI = 10%); and that judges will be affected to only a comparatively mild degree (IPCI = 5%).
The results are reflected in Figure 9. Not surprisingly, the experimental data are much more supportive of the first hypothesis—that judges would be unaffected by the experimental manipulation—than with the second—that they would be “as affected as much as the public.” Indeed, because the probability that we would have observed the actual experimental result if the latter hypothesis is true are astronomically low, there is little practical value in assigning a likelihood ratio to how much more strongly the evidence supports the hypothesis that judges were “unaffected” by the experimental manipulation.
Of course, members of the public were influenced by their cultural predispositions to a strikingly large extent. To learn that the evidence strongly disfavors the inference that judges are that biased does not in itself give us much insight into whether judges possess the capacity for impartial decisionmaking that their duties demand. It was precisely for that reason that less extreme IPCIs were also hypothesized.
Even those predictions, however, proved to be less supported by the evidence than was the hypothesis that judges would be unaffected by identity-protective reasoning. The evidence was 20 times more consistent with the “judge IPCI = 0” hypothesis than the “judge IPCI = 10%” hypothesis. The weight of the evidence was not as decided but still favored—by a factor of about three—the “judge IPCI = 0” hypothesis over the “judge IPCI = 5%” hypothesis (Figure 9).