The "asymmetry thesis": another PMRP issue that won't go away

I feel like I've done 10^8 posts on this .... That's wrong: I counted, and in fact I've done 10.3^14.

But that's because it's a difficult question. Or at least it is if one treats it as one of "measurement" & "weight of the evidence."  I remain convinced that it is not of great practical significance--that is, even if "motivated reasoning" and like dynamics are "asymmetric" across the ideological spectrum (or cultural spectra) that define the groups polarized on policy-consequential facts, the evidence is overwhelming and undeniable that members of all such groups are subject to this dynamic, & to an extent that makes addressing its general impact -- rather than singling out one or another group as "anti-science" etc. -- the proper normative aim for those dedicated to advancing enlightened self-govt.

But issues of "measurement" & "weight of the evidence" etc. are still, in my view, perfectly legitimate matters of scholarly inquiry. Indeed, pursuit of them in this case will, I'm sure, enlarge knowledge, theoretical and practical.

"Asymmetry" is an open question--& not just in the sense that nothing in science is ever resolved but in the sense that those on both "sides" (i.e., those who believe politically motivated reasoning is symmetric and those who believe it is asymmetric) ought to wonder enough about the correctness of their own position to wish that they had more evidence.

Here's an excerpt from my The Politically Motivated Reasoning Paradigm survey/synthesis essay addressing the state of the "debate":

4. Asymmetry thesis

The “factual polarization” associated with politically motivated reasoning is pervasive in U.S. political life. But whether politically motivated reasoning is uniform across opposing cultural groups is a matter of considerable debate (Mooney 2012).

In the spirit of the classic “authoritarian personality” thesis (Adorno 1950), one group of scholars has forcefully advanced the claim that it is not. Known as the “asymmetry thesis,” their position links biased processing of political information with characteristics associated with right-wing political orientations. Their studies emphasize correlations in observational studies between conventional ideological measures and scores on self-report reasoning-style scales such as “need for closure” and “need for cognition” and on personality-trait scales such as “openness to experience” (Jost, Glaser, Kruglanski & Sulloway 2003; Jost, Hennes & Lavine 2013).

But the research that the “neo-authoritarian personality” school features supplies weak evidence for the asymmetry thesis. First, the reasoning style measures that they feature are of questionable validity. It is a staple of cognitive psychology that defects in information processing are not open to introspective observation or control (Pronin 2007) –a conclusion that applies to individuals high as well as more modest in cognitive proficiency (West, Meserve & Stanovich 2012). There is thus little reason to believe a person’s own perception of the quality of his reasoning is a valid measure of the same.

Indeed, tests that seek to validate such self-report reasoning style scales consistently find them to be inferior to performance-based measures, such as the Cognitive Reflection Test and Numeracy, in predicting the disposition to resort to conscious, effortful information processing (Toplak, West & Stanovich 2011; Liberali, Reyna, Furlan, Stein & Pardo 2012). Those measures, when applied to valid general population samples, show no meaningful correlation with party affiliation or liberal-conservative ideology (Kahan 2013; Baron 2015).

More importantly, there is no evidence that individual differences in reasoning style predict vulnerability to politically motivated reasoning. On the contrary, as will be discussed in the next part, evidence suggests that proficiency in dispositions such as cognitive reflection, numeracy, and science comprehension magnifies politically motivated reasoning (Fig. 6).

Ultimately, the only way to determine if politically motivated reasoning is asymmetric with respect to ideology or other diverse systems of identity-defining commitments is through valid experiments. There is a collection of intriguing experiments that variously purport to show that one or another form of judgment—e.g., moral evolution, willingness to espouse counter-attitudinal positions, the political valence of positions formed while intoxicated, individual differences in activation of “brain regions” etc.—is ideologically asymmetric or symmetric (Thórisdóttir & Jost 2011; Nam, Jost & Van Bavel 2013; Eidelman et al. 2012; Crawford & Brandt 2013; Schreiber, Fonzo et al. 2013). These studies vary dramatically in validity and insight. But even the very best and genuinely informative ones (e.g., Conway, Gornick, et al. 2015; Liu & Ditto 2013; Crawford 2012) are in fact examining a form of information processing distinct from PMRP and with methods other than the PMRP design or its equivalent.

One study that did use the PMRP design found no support for the “asymmetry thesis” (Kahan 2013). In it, individuals of left- and right-wing political outlooks displayed perfectly symmetric forms of politically motivated reasoning in evaluating evidence that people who reject their group’s position on climate change have been found to engage in open-minded evaluation of evidence (Figure 5).

But that’s a single study, one that like any other is open to reasonable alternative explanations that themselves can inform future studies. In sum, it is certainly reasonable to view the “asymmetry thesis” issue as unresolved. The only important point is that progress in resolving it is unlikely to occur unless it is studied with designs that reflect the PMRP design or ones equivalently suited to support inferences consistent with the PMRP model.


Adorno, T.W. The Authoritarian personality (Harper, New York, 1950).

Baron, J. Supplement to Deppe et al. (2015). Judgment and Decision Making 10, 2 (2015).

Conway, L.G., Gornick, L.J., Houck, S.C., Anderson, C., Stockert, J., Sessoms, D. & McCue, K. Are Conservatives Really More Simple‐Minded than Liberals? The Domain Specificity of Complex Thinking. Political Psychology (2015), advance on-line, DOI: 10.1111/pops.12304.

Crawford, J.T. The ideologically objectionable premise model: Predicting biased political judgments on the left and right. Journal of Experimental Social Psychology 48, 138-151 (2012).

Eidelman, S., Crandall, C.S., Goodman, J.A. & Blanchar, J.C. Low-Effort Thought Promotes Political Conservatism. Pers. Soc. Psychol. B. (2012).

Jost, J.T., Glaser, J., Kruglanski, A.W. & Sulloway, F.J. Political Conservatism as Motivated Social Cognition. Psychological Bulletin 129, 339-375 (2003).

Jost, J.T., Hennes, E.P. & Lavine, H. “Hot” political cognition: Its self-, group-, and system-serving purposes. in Oxford handbook of social cognition (ed. D.E. Carlson) 851-875 (Oxford University Press, New York, 2013).

Kahan, D. M. Ideology, Motivated Reasoning, and Cognitive Reflection. Judgment and Decision Making, 8, 407-424 (2013).

Liberali, J.M., Reyna, V.F., Furlan, S., Stein, L.M. & Pardo, S.T. Individual Differences in Numeracy and Cognitive Reflection, with Implications for Biases and Fallacies in Probability Judgment. Journal of Behavioral Decision Making 25, 361-381 (2012).

Nam, H.H., Jost, J.T. & Van Bavel, J.J. “Not for All the Tea in China!” Political Ideology and the Avoidance of Dissonance. PLoS ONE 8(4), e59837, doi:10.1371/journal.pone.0059837 (2013).

Pronin, E. Perception and misperception of bias in human judgment. Trends in cognitive sciences 11, 37-43 (2007).

Thórisdóttir, H. & Jost, J.T. Motivated Closed-Mindedness Mediates the Effect of Threat on Political Conservatism. Political Psychology 32, 785-811 (2011).

Toplak, M., West, R. & Stanovich, K. The Cognitive Reflection Test as a predictor of performance on heuristics-and-biases tasks. Memory & Cognition 39, 1275-1289 (2011).

West, R.F., Meserve, R.J. & Stanovich, K.E. Cognitive sophistication does not attenuate the bias blind spot. Journal of Personality and Social Psychology 103, 506 (2012).



Weekend update: "Color" preprint of " 'Ideology' vs. 'Situation Sense' "!

I've posted a revised "preprint" version of Kahan, D.M., Hoffman, D.A., Evans, D., Devins, N., Lucci, E.A. & Cheng, K. 'Ideology' or 'Situation Sense'? An Experimental Investigation of Motivated Reasoning and Professional Judgment, U. Pa. L. Rev. 164 (in press).

It is prettttttttttttttttttty darn close to final.

Main difference is that it has color rather than B&W graphics.  I have a feeling, w/ all the advances in information technology associated with "our internet," & w/ humans now having walked on the moon & all, that I might still live to see the day when all scholarly journals use color graphics (at least for their on-line versions; I think I've already lived long enough to see the day when no one reads the "hardcopy"/"print" versions of journals!).... Call me a dreamer!

I'm sure, too, you all remember but in case not:  This is the study that examines a sample of judges, lawyers, law students & ordinary people to test competing theories about how identity-protective cognition relates to critical reasoning & professional judgment. 

We find that judges & lawyers who are as culturally polarized on societal risks--like climate change & marijuana legalization--as are members of the general population converge in readings of manifestly ambiguous statutes despite experimental manipulations that were intended to and did polarize culturally diverse members of the public (and to a modest extent culturally diverse law students).

We view this result as most consistent with the theory that professional judgment furnishes experts with a degree of immunity from "identity-protective reasoning" when they perform "in-domain" but not "out-of-domain" decisionmaking tasks.

But as I emphasized in another recent post (one that presents an excerpt from another "in press" paper, The Politically Motivated Reasoning Paradigm), the "weight" of the evidence the study furnishes in this regard-- particularly as it relates to other types of experts like scientists who study contested societal risks--is indeed modest.  More study is called for!

I'm sure I'll live long enough to see this & every other interesting question about cognition definitively resolved too.  At which point, life will be so damn boring that people will stop fretting about its finite duration.

Anyway, happy clicking on graphics!


1. Summary data

2. Multivariate regression model estimates

3. "Weight of the evidence" likelihood ratios

4. Data-collection process

David Hoffman carefully extracts cultural worldview from a state supreme court judge



Solving 2 nasty confounds: The "Politically Motivated Reasoning Paradigm [PMRP] Design"

Okay, so “yesterday,” I discussed the significance of two “confounds” in studies of “politically motivated reasoning.”

“Politically motivated reasoning” is the tendency of individuals to conform their assessment of the significance of evidence on contested societal risks and like facts to positions that are congenial to their political or cultural outlooks.

The “confounds” were heterogeneous priors and pretreatment effects. “Today” I want to address how to avoid the nasty effects of these confounds.

The inference-defeating consequences of heterogeneous priors and pretreatment effects are associated with a particular kind of study design. 

In it, the researcher exposes individuals of opposing political or cultural identities to counter-attitudinal information on a hotly contested topic such as gun control or climate change. Typically, the information is in the form of empirical studies or advocacy materials, real or fictional. If the information exposure fails to narrow, or even widens, the gap in the positions of subjects of opposing identities, this outcome is treated as evidence of politically motivated reasoning.

But as I explained in the last post, this inference is unsound.  

Imagine, e.g., that members of one politically identifiable group might be more uniformly committed to “their side’s” position than those of another, some of whose members might be weakly supportive of the former’s position. If so, we would expect members of the latter group to be overrepresented among the subjects who “change their minds” when members of both groups are exposed to evidence more supportive of the other group’s position.  This is the “heterogeneous priors” confound.

You can't judge an experiment by its results; only by its design . . . .

Alternatively, a greater proportion of one group might already have been exposed to evidence equivalent to that featured in the study design.  In that case, fewer members of that group would be expected to change their mind—not because they were biased but because they would have already adjusted their beliefs to take account of it. This is the “pretreatment effect” confound.

Put these two confounds together, and it’s clear that, under the design I described, no outcome is genuinely inconsistent with subjects having assessed the information in the “politically unbiased” manner associated with Bayesian information processing (Druckman, Fein & Leeper 2012; Druckman 2012; Bullock 2009; Gerber & Green 1999).

The solution, then, is to change the design.

That’s one of the central points of The Politically Motivated Reasoning Paradigm (in press).  In that paper, I describe studies (e.g., Uhlmann, Pizarro, Tannenbaum, & Ditto 2009; Bolsen, Druckman & Cook 2014; Scurich & Shniderman 2014) that use a common strategy to avoid the confounding effects of heterogeneous priors and pretreatment effects.  I refer to it as the “PMRP” (for “Politically Motivated Reasoning Paradigm”) design.

Under the PMRP design, the researcher manipulates the subjects’ perception of the consequences of crediting one and the same piece of evidence.  What’s compared is not individual subjects’ reported beliefs before and after being exposed to information but rather the weight or significance subjects of opposing predispositions attach to the evidence conditional on the experimental manipulation (cf. Koehler 1993). If subjects credit the evidence when they perceive it is consistent with their political predispositions but dismiss it when it’s not, then we can be confident that it is their politically biased weighing of evidence and not any discrepancy in priors or pre-study exposure to evidence that is driving subjects of opposing cultural or political identities apart.
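Here's a minimal sketch of the comparison the PMRP design calls for. Everything in it is made up for illustration: the 1-6 "how convincing is this evidence?" ratings, the two identity labels, and the condition names are all hypothetical, not data from any study. The point is only the shape of the analysis: the evidence is identical across conditions, so an unbiased subject's rating shouldn't depend on which side she is told it supports.

```python
from statistics import mean

# HYPOTHETICAL ratings (1-6 scale) of how convincing subjects found
# one and the same piece of evidence.
# Key: (subject's identity, side the evidence was described as supporting)
ratings = {
    ("left", "supports_left"): [5, 6, 5, 4],
    ("left", "supports_right"): [2, 3, 2, 3],
    ("right", "supports_left"): [2, 2, 3, 3],
    ("right", "supports_right"): [5, 5, 6, 4],
}

cell_means = {cell: mean(values) for cell, values in ratings.items()}


def congeniality_shift(identity):
    """Mean rating when the evidence is congenial minus mean rating when it is not."""
    other = "right" if identity == "left" else "left"
    congenial = cell_means[(identity, f"supports_{identity}")]
    uncongenial = cell_means[(identity, f"supports_{other}")]
    return congenial - uncongenial


shifts = {identity: congeniality_shift(identity) for identity in ("left", "right")}

for identity, shift in shifts.items():
    print(f"{identity}: congeniality shift = {shift:+.2f}")
```

A shift near zero for both groups is what unbiased weighing would look like. A positive shift for both, as in these invented numbers, is the signature of politically motivated reasoning, and it can't be attributed to priors or pre-study exposure because neither differs across the randomly assigned conditions.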

One CCP study used the PMRP design to examine how study subjects of opposing cultural identities would assess the behavior of political protestors (Kahan, Hoffman, Braman, Evans & Rachlinski 2012). Instructed to adopt the perspective of jurors in a civil case, the subjects examined a digital recording of demonstrators alleged to have assaulted passersby. The cause and identity of the demonstrators were manipulated: in one condition, they were described as “anti-abortion protestors” assembled outside the entrance to an abortion clinic; in the other, they were described as “gay-rights advocates” protesting the military’s “Don’t ask, don’t tell” policy outside a military-recruitment center.

Subjects of opposing “cultural worldviews” who were assigned to the same experimental condition—and who thus believed they were watching the same type of protest—reported forming opposing perceptions of whether the protestors “blocked” and “screamed in the face” of pedestrians trying to access the facility. At the same time, subjects who were assigned to different conditions—and who thus believed they were watching different types of protests—formed perceptions comparably different from those of subjects who shared their cultural worldviews.

In line with these opposing perceptions, the results in the two conditions produced mirror-image states of polarization on whether the behavior of the protestors met the factual preconditions for liability. 

But that outcome—an increased state of political polarization, in effect, in “beliefs”—is not, in my view, an essential one under the PMRP design. Indeed, if the issue featured in a study is familiar (like whether human beings are causing climate change, or whether permitting individuals to carry concealed firearms in public increases or decreases crime), we shouldn’t expect a one-shot exposure to evidence in the lab to change subjects' “positions.”

The only thing that matters is whether subjects of opposing outlooks opportunistically shifted the weight  (or in Bayesian terms, the likelihood ratio) they assigned to one and the same piece of evidence based on its congruence with their political predispositions.  If that’s how individuals of opposing cultural identities behave outside the lab, then contrary to what would occur under a Bayesian model of information processing they will not converge on politically contested facts no matter how much valid evidence they are furnished with.

Or won’t unless & until something is done in the world that changes the stake individuals with outlooks like those have in conforming their assessment of evidence to the positions then associated with their cultural identities (Kahan 2015).
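To make the non-convergence point concrete, here's a minimal sketch. Everything in it is an assumption for illustration: the two hypothetical citizens, their prior odds, the stream of 20 equally weighty studies, and especially the "motivated" rule, under which the doubter simply assigns a likelihood ratio of 1 to evidence that cuts against his group's position.

```python
def to_probability(odds):
    """Convert odds in favor of a hypothesis into a probability."""
    return odds / (1.0 + odds)


def final_probability(prior_odds, assigned_lrs):
    """Multiply the prior odds by each assigned likelihood ratio in turn."""
    odds = prior_odds
    for lr in assigned_lrs:
        odds *= lr
    return to_probability(odds)


TRUE_LR = 3.0   # hypothetical weight each study actually deserves
N_STUDIES = 20  # hypothetical number of studies both citizens see

# Hypothetical prior odds in favor of the contested fact
prior_odds = {"believer": 10.0, "doubter": 0.1}

# Bayesian processing: both citizens give every study its true weight.
bayesian = {
    name: final_probability(odds, [TRUE_LR] * N_STUDIES)
    for name, odds in prior_odds.items()
}

# Motivated processing (assumed rule): the doubter dismisses uncongenial
# evidence, i.e. assigns it a likelihood ratio of 1.
assigned_lr = {"believer": TRUE_LR, "doubter": 1.0}
motivated = {
    name: final_probability(odds, [assigned_lr[name]] * N_STUDIES)
    for name, odds in prior_odds.items()
}

bayesian_gap = abs(bayesian["believer"] - bayesian["doubter"])
motivated_gap = abs(motivated["believer"] - motivated["doubter"])
print(f"gap under Bayesian processing:  {bayesian_gap:.6f}")
print(f"gap under motivated processing: {motivated_gap:.6f}")
```

Under the Bayesian rule the two citizens end up practically indistinguishable despite starting far apart. Under the motivated rule the doubter never budges, no matter how large N_STUDIES gets.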

The PMRP design is definitely not the only one that validly measures politically motivated reasoning. Indeed, the consistency of findings of studies that reflect the PMRP design and those based on other designs (e.g., Binning, Brick, Cohen & Sherman 2015; Nyhan & Reifler 2015; Druckman & Bolsen 2011; Bullock 2007; Cohen 2003) furnishes more reason for confidence that the results of both are valid. Nevertheless, the test that the PMRP design is self-consciously constructed to pass—demonstration that individuals are opportunistically adjusting the weight they assign evidence to conform it to their political identities—supplies the proper standard for assessing whether the design of any particular study supports an inference of politically motivated reasoning.


Binning, K.R., Brick, C., Cohen, G.L. & Sherman, D.K. Going Along Versus Getting it Right: The Role of Self-Integrity in Political Conformity. Journal of Experimental Social Psychology 56, 73-88 (2015).

Bolsen, T., Druckman, J.N. & Cook, F.L. The influence of partisan motivated reasoning on public opinion. Polit. Behav. 36, 235-262 (2014).

Bullock, J. The enduring importance of false political beliefs. Unpublished Manuscript, Stanford University  (2007).

Bullock, J.G. Partisan Bias and the Bayesian Ideal in the Study of Public Opinion. The Journal of Politics 71, 1109-1124 (2009).

Cohen, G.L. Party over Policy: The Dominating Impact of Group Influence on Political Beliefs. J. Personality & Soc. Psych. 85, 808-822 (2003).

Druckman, J.N. & Bolsen, T. Framing, Motivated Reasoning, and Opinions About Emergent Technologies. Journal of Communication 61, 659-688 (2011).

Druckman, J.N., Fein, J. & Leeper, T.J. A source of bias in public opinion stability. American Political Science Review 106, 430-454 (2012).

Druckman, J.N. The Politics of Motivation. Critical Review 24, 199-216 (2012).

Gerber, A. & Green, D. Misperceptions about Perceptual Bias. Annual Review of Political Science 2, 189-210 (1999).

Kahan, D. M. The Politically Motivated Reasoning Paradigm. Emerging Trends in Social & Behavioral Sciences (in press).

Kahan, D. M. What is the “science of science communication”? J. Sci. Comm., 14(3), 1-12 (2015).

Kahan, D. M., Hoffman, D. A., Braman, D., Evans, D., & Rachlinski, J. J. They Saw a Protest: Cognitive Illiberalism and the Speech-Conduct Distinction. Stan. L. Rev., 64, 851-906 (2012).

Nyhan, B. & Reifler, J. The roles of information deficits and identity threat in the prevalence of misperceptions (2015).

Scurich, N. & Shniderman, A.B. The Selective Allure of Neuroscientific Explanations. PLoS One 9 (2014).

Uhlmann, E.L., Pizarro, D.A., Tannenbaum, D. & Ditto, P.H. The motivated use of moral principles. Judgment and Decision Making 4 (2009).


Testing for "politically motivated reasoning": 2 nasty confounds

The paper I posted “yesterday”—“The Politically Motivated Reasoning Paradigm”—is mainly about what “politically motivated reasoning” is and how to design studies to test whether it is affecting citizens’ assessment of evidence and by how much. 

The paper  is concerned, in particular, with two confounds—alternative explanations, essentially—that typically constrain the inferences that can be drawn from such studies.  The problems are heterogeneous priors and pretreatment effects (Druckman, Fein & Leeper 2012; Druckman 2012; Bullock 2009; Gerber & Green 1999).

Rather than describe these constraints abstractly, let me try to illustrate the problem they present.

Imagine a researcher is doing an experiment on “politically motivated reasoning”—the asserted tendency of individuals to conform evidence on disputed risks or other policy-relevant facts to the positions that are associated with their political outlooks.

She collects information on the subjects' “beliefs” in, say, “human caused global warming” and the strength of those beliefs (reflected in their reported probability that humans are the principal cause of it). She then presents the subjects with evidence—in the form of a study that suggests human activity is the principal cause of global warming--and measures their beliefs and their confidence in those beliefs again.

This is what she observes: 

Obviously, the subjects have become even more sharply divided. The difference in the proportion of Democrats and Republicans who accept AGW widened, as did the difference in their respective estimates of the probability of AGW.

Does the result support an inference that the subjects selectively credited or discredited the evidence consistent with their political predispositions?

Not really, no.

The claim that individuals are engaged in “politically motivated reasoning” implies they aren’t assessing the information in an unbiased manner, uninfluenced by the relationship between that information and outcomes congenial to their political views.

We can represent this kind of “unbiased” information processing in a barebones Bayesian model, in which individuals revise their existing belief in the probability of a hypothesis, expressed in odds, by a factor equivalent to how much more consistent the new information is with that hypothesis than with a rival one. That factor is known as the “likelihood ratio,” and conceptually speaking reflects the “weight” of the new information with respect to the competing hypotheses.
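In symbols, the barebones model is just the odds form of Bayes' theorem. The notation is mine, not anything from the paper: H is the hypothesis, \lnot H its rival, and E the new information.

```latex
\underbrace{\frac{P(H \mid E)}{P(\lnot H \mid E)}}_{\text{posterior odds}}
\;=\;
\underbrace{\frac{P(H)}{P(\lnot H)}}_{\text{prior odds}}
\;\times\;
\underbrace{\frac{P(E \mid H)}{P(E \mid \lnot H)}}_{\text{likelihood ratio}}
```

A likelihood ratio above 1 moves the odds toward H, one below 1 moves them toward the rival hypothesis, and a likelihood ratio of exactly 1 leaves them where they were.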

The distinctive feature of “politically motivated reasoning” is the endogeneity of the likelihood ratio and individuals’ political predispositions.  The political congeniality of crediting the evidence determines the weight they assign it.  Because “whose side does this evidence support—yours or mine?” is a criterion unrelated to its validity, individuals who reason this way will fail to converge on the best understanding of the best available evidence.

But in the hypothetical study I described, we really don’t know if that’s happening.  Certainly, we would expect to see a result like the one reported—partisans becoming even more “polarized” as they examine the “same” evidence--if they were engaged in politically motivated reasoning.

But we could in fact see exactly this dynamic consistent with the unbiased, Bayesian information-processing model.

As a simplification, imagine the members of a group of deliberating citizens, Rita, Ron, and Rose—all of whom are Republicans—and Donny, Dave, Daphne—all Democrats.  Each has a “belief” about the contribution of human beings to “human caused climate change,” and each has a sense of how confident they are about their beliefs—a sensibility we can represent in terms of how probable they think it is (expressed in odds) that human beings are the principal cause of climate change.

The table to the left represents this information 

Now imagine that they are shown a study.  The study presents evidence supporting the conclusion that humans are the principal cause of climate change. 

Critically, all of the individuals in this group agree about the weight properly afforded the evidence in the study!

They all agree, let’s posit, that the study has modest weight—a likelihood ratio of 3, let’s say, which means that it is three times more consistent with the hypothesis that human beings are responsible for climate change than with the contrary hypothesis (don’t confuse likelihood ratios with “p-values” please; the latter have nothing to do with the inferential weight evidence bears).

In other words, none of them adjusts the likelihood ratio or weight afforded to the evidence to fit their predispositions.

Nevertheless, the results of the hypothetical study I described could still display the polarization the researcher found!

This table shows how: 

First, the individuals in this "sample" started with different priors.  Daphne, e.g., put the odds that human beings were causing climate change at 2:1 against (0.5:1 in favor) before she got the information.  Rita’s prior odds were 1000:1 against (.001:1 in favor).

When they both afforded the new information a likelihood ratio of 3, Daphne flipped from the view that human beings “probably” weren’t responsible for climate change to the view that they probably were (1.5:1 or 3:2 in favor).  But because Rita was more strongly convinced that human beings weren’t causing climate change, she persisted in her belief that humans probably weren’t responsible for climate change even after appropriately adjusting her odds against downward (from 1000:1 to about 333:1) (Bullock 2009).
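The arithmetic for Daphne and Rita can be checked in a few lines. The priors and the likelihood ratio of 3 are the ones posited above; the function and variable names are just mine for illustration.

```python
def update_odds(prior_odds_in_favor, likelihood_ratio):
    """Odds form of Bayes' theorem: posterior odds = prior odds x likelihood ratio."""
    return prior_odds_in_favor * likelihood_ratio


LIKELIHOOD_RATIO = 3.0  # the weight everyone agrees the study deserves

# Prior odds IN FAVOR of human-caused climate change
priors_in_favor = {
    "Daphne": 0.5,   # 2:1 against
    "Rita": 0.001,   # 1000:1 against
}

posteriors = {
    name: update_odds(odds, LIKELIHOOD_RATIO)
    for name, odds in priors_in_favor.items()
}

for name, odds in posteriors.items():
    view = "probably are" if odds > 1 else "probably aren't"
    print(f"{name}: {odds:g}:1 in favor (1:{1 / odds:.0f} against), "
          f"so humans {view} responsible")
```

Same evidence, same weight, and yet only Daphne crosses the even-odds line. Rita moves by exactly the same factor but started too far away for it to show up in a yes/no "belief" measure.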

Second, the individuals in our sample started with differing amounts of knowledge about the existing evidence on climate change.  

In particular, Ron and Rose, it turns out, already knew about the evidence that the researcher showed them in the experiment! That's hardly implausible: members of the public are constantly being bombarded with information on climate change and similarly contentious topics.  Their priors—10:1 against human-caused climate change, and 2:1 in favor, respectively--already reflected their unbiased (I’m positing) assessment of that information (or its practical equivalent).

They thus assigned the evidence a likelihood ratio of “1” in reporting their "after evidence" beliefs in the study not because they were conforming the likelihood ratio to their predispositions—indeed, they agree that the evidence is 3x more consistent with the hypothesis that humans are causing climate change than that they are not—but because their priors already reflected having given the information that weight when they previously encountered it in the real world.

If the “outcome variable” of the study is “what percentage of Republicans and Democrats think human activity is a principal cause of climate change,” then we will see polarization even with Bayesian information processing—i.e., without the sort of selective crediting of information that is the signature of politically motivated reasoning--because of the heterogeneity of the group members' priors.

Likewise, if we examine the “mean” probabilities assigned to AGW by the Democrats and Republicans, we find the differential grew in the information-exposed condition.  The reason, however, wasn't differences in how much weight they gave the information, but pre-treatment (pre-study) differences in their exposure to information equivalent to that conveyed to them in the experiment (Druckman, Fein & Leeper 2012).
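Here's a sketch that puts the two confounds together for the whole group. The priors for Rita, Ron, Rose, and Daphne are the ones given in the text. The priors for Donny and Dave are HYPOTHETICAL placeholders I've made up so the example runs, so the exact numbers it prints are illustrative only. Everyone agrees the study deserves a likelihood ratio of 3; Ron and Rose, having already seen equivalent evidence, effectively update by a factor of 1.

```python
LIKELIHOOD_RATIO = 3.0  # weight everyone agrees the study deserves

# name: (party, prior odds in favor of AGW, already exposed to the evidence?)
citizens = {
    "Rita": ("R", 0.001, False),  # 1000:1 against
    "Ron": ("R", 0.1, True),      # 10:1 against, pretreated
    "Rose": ("R", 2.0, True),     # 2:1 in favor, pretreated
    "Donny": ("D", 5.0, False),   # HYPOTHETICAL prior, not from the post
    "Dave": ("D", 3.0, False),    # HYPOTHETICAL prior, not from the post
    "Daphne": ("D", 0.5, False),  # 2:1 against
}


def to_probability(odds):
    return odds / (1.0 + odds)


def posterior_odds(prior_odds, pretreated):
    # Pretreated citizens' priors already reflect the evidence,
    # so seeing it again changes their odds by a factor of 1.
    factor = 1.0 if pretreated else LIKELIHOOD_RATIO
    return prior_odds * factor


def party_summary(odds_by_name, party):
    odds = [o for name, o in odds_by_name.items() if citizens[name][0] == party]
    share_accepting = sum(o > 1 for o in odds) / len(odds)
    mean_probability = sum(to_probability(o) for o in odds) / len(odds)
    return share_accepting, mean_probability


def party_gaps(odds_by_name):
    r_share, r_prob = party_summary(odds_by_name, "R")
    d_share, d_prob = party_summary(odds_by_name, "D")
    return d_share - r_share, d_prob - r_prob


before = {name: prior for name, (party, prior, pretreated) in citizens.items()}
after = {
    name: posterior_odds(prior, pretreated)
    for name, (party, prior, pretreated) in citizens.items()
}

share_gap_before, prob_gap_before = party_gaps(before)
share_gap_after, prob_gap_after = party_gaps(after)

print(f"gap in share accepting AGW: {share_gap_before:.2f} -> {share_gap_after:.2f}")
print(f"gap in mean probability:    {prob_gap_before:.2f} -> {prob_gap_after:.2f}")
```

Both gaps widen even though nobody in the group fitted the weight of the evidence to his or her predispositions.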

In sum, given the study design, we can’t draw confident inferences that the subjects engaged in politically motivated reasoning.  They could have.  But because of the confounds of heterogeneous priors and pretreatment exposure to information, we could have ended up with exactly these results even if they were engaged in unbiased, Bayesian information processing.

To draw confident inferences, then, we need a better study design for politically motivated reasoning—one that avoids these confounds.

I describe that design in the “Politically Motivated Reasoning Paradigm” paper.  I call it the “Politically Motivated Reasoning Paradigm” (PMRP) design.

I’ll say more about it . . . “tomorrow”!


Druckman, J.N. The Politics of Motivation. Critical Review 24, 199-216 (2012).

Druckman, J.N., Fein, J. & Leeper, T.J. A source of bias in public opinion stability. American Political Science Review 106, 430-454 (2012).

Bullock, J.G. Partisan Bias and the Bayesian Ideal in the Study of Public Opinion. The Journal of Politics 71, 1109-1124 (2009).

Gerber, A. & Green, D. Misperceptions about Perceptual Bias. Annual Review of Political Science 2, 189-210 (1999).

Kahan, D.M. The "Politically Motivated Reasoning Paradigm." Emerging Trends in Social & Behavioral Sciences (in press).




New paper: "The Politically Motivated Reasoning Paradigm"

What is it, how do you measure it, is it ideologically symmetric, do any of the herbal supplements advertised as counteracting it really work, etc.  Take a look & find out.

Still time for revisions, so comments welcome!


"*Scientists* & identity-protective cognition? Well, on the one hand ... on the other hand ... on the *other* other hand ..." A fragment

Scientific proof that "skeptical" scientists are biased!

From something I'm working on. I'll post the rest of it "tomorrow," in fact.  But likely this section will end up on the cutting room floor (that's okay; there's lots of stuff down there & eventually I expect to find use for most of it someplace; is a bit of fire hazard, though . . . .)

6. Professional judgment

Ordinary members of the public predictably fail to get the benefit of the best available scientific evidence when their collective deliberations are pervaded by politically motivated reasoning. But even more disturbingly, politically motivated reasoning might be thought to diminish the quality of the best scientific evidence available to citizens in a democratic society (Curry 2013).

Not only do scientists—like everyone else—have cultural identities. They are also highly proficient in the forms of System 2 information processing known to magnify politically motivated reasoning.   Logically, then, it might seem to follow that scientists’ factual beliefs about contested societal risks are likely skewed by the stake they have in conforming information to the positions associated with their cultural groups.

But a contrary inference would be just as “logical.” The studies linking politically motivated reasoning with the disposition to use System 2 information processing have been conducted on general public samples, none of which would have had enough scientists in them to detect whether being one matters. Unlike nonscientists with high CRT or Numeracy scores, scientists use professional judgment when they evaluate evidence relevant to disputed policy-relevant facts. Professional judgment consists in habits of mind, acquired through training and experience and distinctively suited to specialized forms of decisionmaking.  For risk experts, those habits of mind confer resistance to many cognitive biases that can distort the public’s perceptions (Margolis 1996).  It is perfectly plausible to believe that one of the biases that professional judgment can protect risk experts from is “politically motivated reasoning.”

Here, too, neither values nor positions on disputed policies can help decide between these competing empirical claims. Only evidence can.  To date, however, there are few studies of how scientists might be affected by politically motivated reasoning, and the inferences they support are equivocal.

Some observational studies find correlations between the positions of scientists on contested risk issues and their cultural or political orientations (Bolsen, Druckman, & Cook 2015; Carlton, Perry-Hill, Huber & Prokopy 2015).  The correlations, however, are much less dramatic than ones observed in general-population samples.  In addition, with one exception (Slovic, Malmfors et al. 1995), these studies have not examined scientists’ perceptions of facts in their own domains of expertise.

This is an important point. Professional judgment inevitably comprises not just conscious analytical reasoning proficiencies but perceptive sensibilities that activate those proficiencies when they are needed (Bedard & Biggs 1991; Marcum 2012). Necessarily preconscious (Margolis 1996), these sensibilities reflect the assimilation of the problem at hand to an amply stocked inventory of prototypes. But because these prototypes reflect the salient features of problems distinctive of the expert’s field, the immunity from bias that professional judgment confers can’t be expected to operate reliably outside the domain of her expertise (Dane & Pratt 2007).

A study that illustrates this point examined legal professionals.  In it, lawyers and judges, as well as a sample of law students and members of the public, were instructed to perform a set of statutory interpretation problems. Consistent with the PMRP design, the facts of the problems—involving behavior that benefited either illegal aliens or “border fence” construction workers; either a pro-choice or pro-life family counseling clinic—were manipulated in a manner designed to provoke responses consistent with identity protective cognition in competing cultural groups.  The manipulation had exactly that effect on members of the public and on law students.  But it didn’t on either judges or lawyers:  despite the ambiguity of the statutes and the differences in their own cultural values, those study subjects converged in their responses, just as one would predict if one expected their judgments to be synchronized by the common influence of professional judgment. Nevertheless, this relative degree of resistance to identity-protective reasoning was confined to legal-reasoning tasks: the judges and lawyers’ respective perceptions of disputed societal risks—from climate change to marijuana legalization—reflected the same identity-protective patterns observed in the general public and student samples (Kahan, Hoffman, Evans, Lucci, Devins & Cheng in press). Extrapolating, then, we might expect to see the same effect in risk experts: politically motivated divisions on policy-relevant facts outside the boundaries of their specific field of expertise; but convergence guided by professional judgment inside of them.

Or alternatively we might expect convergence not on positions that are true necessarily but that are so intimately bound up with a field’s own sense of identity that acceptance of them has become a marker of basic competence (and hence a precondition of recognition and status) within it.  In Koehler (1993), scientists active in either defending or discrediting scientific proof of “parapsychology” were instructed to review the methods of a fictional ESP study. The result of the study was experimentally manipulated: Half the scientists got one that purported to find evidence supporting ESP, the other half one that purported to find evidence not supporting it. The scientists’ assessments of the quality of the study’s methods turned out to be strongly correlated with the fit between the represented result and the scientists’ existing positions on the scientific validity of parapsychology, although Koehler found that this effect was in fact substantially more dramatic among the “skeptic” than the “non-skeptic” scientists.

Koehler’s study reflects the core element of the PMRP design: the outcome measure was the weight that members of opposing groups gave to one and the same piece of evidence conditional on the significance of crediting it. Because the significance was varied in relation to the subjects’ prior beliefs and not their stake in some goal independent of forming an accurate assessment, the study can be, and normally is, understood to be a demonstration of confirmation bias.  But obviously, the “prior beliefs” in this case were ones integral to membership in opposing groups, the identity-defining significance of which for the subjects was attested to by how much time and energy they had devoted to promoting public acceptance of their respective groups’ core tenets. Extrapolating, then, one might infer that professional judgment might indeed fail to insulate from the biasing effects of identity-protective cognition scientists whose professional status has become strongly linked with particular factual claims.

So we are left with only competing plausible conjectures.  There’s nothing at all unusual about that. Indeed, it is the occasion for empirical inquiry—which here would take the form of the use of the PMRP design or one of equivalent validity to assess the vulnerability of scientists to politically motivated reasoning—both in and outside of the domains of their expertise, and with and without the pressure to affirm “professional-identity-defining” beliefs.


Curry, J. Scientists and Motivated Reasoning. Climate Etc. (Aug. 20, 2013)

Bedard, J.C. & Biggs, S.F. Pattern recognition, hypotheses generation, and auditor performance in an analytical task. Accounting Review, 622-642 (1991).

Bolsen, T., Druckman, J.N. & Cook, F.L. Citizens’, scientists’, and policy advisors’ beliefs about global warming. The ANNALS of the American Academy of Political and Social Science 658, 271-295 (2015).

Carlton, J.S., Perry-Hill, R., Huber, M. & Prokopy, L.S. The climate change consensus extends beyond climate scientists. Environmental Research Letters 10, 094025 (2015).

Dane, E. & Pratt, M.G. Exploring Intuition and its Role in Managerial Decision Making. Academy of Management Review 32, 33-54 (2007).

Kahan, D.M., Hoffman, D.A., Evans, D., Devins, N., Lucci, E.A. & Cheng, K. 'Ideology' or 'Situation Sense'? An Experimental Investigation of Motivated Reasoning and Professional Judgment. U. Pa. L. Rev. 164 (in press).

Koehler, J.J. The Influence of Prior Beliefs on Scientific Judgments of Evidence Quality. Org. Behavior & Human Decision Processes 56, 28-55 (1993).

Marcum, J.A. An integrated model of clinical reasoning: dual-process theory of cognition and metacognition. Journal of Evaluation in Clinical Practice 18, 954-961 (2012).

Margolis, H. Dealing with risk : why the public and the experts disagree on environmental issues (University of Chicago Press, Chicago, IL, 1996).

Margolis, H. Patterns, thinking, and cognition : a theory of judgment (University of Chicago Press, Chicago, 1987).

Slovic, P., Malmfors, T., Krewski, D., Mertz, C.K., Neil, N. & Bartlett, S. Intuitive toxicology .2. Expert and lay judgments of chemical risks in Canada. Risk Analysis 15, 661-675 (1995).




Disentanglement principle corollary no. 16a: "You don't have to choose ... between being a reality tv star & being excited to learn what science knows (including what it knows about how people come to know what's known by science)"

Sometimes 1 or 2 of the 14 billion regular followers of this blog ask, "are there really 14 billion regular followers of this blog?..." 

Yeah. There really are!


"Hey Joe": "Practical scholarship" on climate "science communication"

Sorry for lack of context here, but my guess is that it will become clear enough after a few sentences.

Dear Joe:

I apologize for disparaging your work at the Society for Risk Analysis session yesterday.  You perceived my remarks that way, and on reflection I can see why you did, & why others likely formed the same impression.  I truly regret that.

In fact, it wasn’t your work that I meant to be criticizing. 

My intention was to respond to the argument you presented (with the admirable degree of clarity I wish I had been able to summon in response) in favor of “practical scholarship.”  Because you see, I don’t think the sort of work you defended is either practical or scholarly.

You  proposed to those in the room that the empirical study of climate science communication should be evaluated in light of its contribution to a “goal” of promoting a “world war II scale mobilization” of public opinion (I encourage you to post your slides; they were very well done). 

Research aimed at identifying the significance of values & science comprehension for public conflict on climate change (the subject of the panel we were both on; great new research unveiled by the Shi, Visschers, Siegrist team!) doesn’t meet this criterion, you made clear. Indeed, it detracts from it, because, in your opinion, it implies change will take a “long time” (I disagree it implies any such thing but that’s another matter).

As an example of research that is “practical,” you offered your own, which you characterized as aimed at convincing democratic representatives that their prospects for re-election depend on honoring the sorts of “public preferences” revealed by the structured preference-elicitation  methods you described.

You also stated that your work, along with that of others, is intended to “create cover” for officials to take positions supportive of climate change policies (a common refrain among researchers who generate endless streams of public opinion polls purporting to find that there is in fact widespread public consensus for one or another climate change mitigation initiative). 

We should all pitch in to help achieve this result, you exhorted.

Again, to be clear, my point is that this vision of empirical work on science communication is neither “scholarly” nor “practical.”

Scholarship—of the empirical variety, in any event—tries to help people figure out what’s true, particularly under conditions in which there are multiple plausible understandings of phenomena of consequence.  That’s what the scholarship on the relationship between “values” and “science literacy” that you disparaged is about.  The occasion for that scholarly inquiry is a practical one: to figure out what sorts of dynamics are blocking public engagement with the best available evidence on climate change.

What’s definitely not practical (as Theda Skocpol has noted) is to think that public opinion researchers can be mobilized into a project to “show” elected officials what the public “really” wants.

Elected officials are in the profession of satisfying the expectations of their constituents. They invest plenty of money, most of the time wisely, to figure out how to do that.

They know that surveys purporting to show that a “majority” of Republicans support “the EPA's greenhouse gas emission standards” are measuring non-opinion.   They know too that the sort of preference-elicitation methods you demonstrated—however truly valuable they might be for learning about cognition—are not modeling the decisionmaking dynamics that determine election outcomes. 

Most importantly, they know—because those who agree with your conception of “practical scholarship” are constantly proclaiming this-- that your goal is to create an impression in these actors for your own purposes: to help “shove” them into supporting a particular set of policies (enough with these “nudges” already, you inspiringly proclaimed: we are facing the moral equivalent of Hitler invading Europe!), not help them get re-elected. 

They know, in short, that “non-opinion” survey methods are actually intended to message them!  And I would have sort of thought this was obvious, but it’s not a very good “messaging strategy” to incessantly go on & on within earshot of Republicans about “strategies” for “overcoming” the “Republicans' cognitive resistance to climate mitigation.”

The targeted politicians (Democrat and Republican) therefore sensibly discount (ignore really) everything produced by researchers who are following this "message the politicians" strategy.  They listen instead to the professionals, who tell them something very different from what these "practical scholars" are saying (over & over & over; “keep repeating—that it hasn't worked yet is proof that we just need to do it for longer!,”--another refrain inside this bubble) .  Politicians who take what these researchers say at face value, they’ve observed, get knocked out of office. 

I believe there is plenty that science communication researchers  can do to help actual people, including elected officials, promote science-informed decisionmaking relating to climate change by collaborating with them to adapt and test lab insights to their real-world problems. 

The form of research that I think is best for that aims to help those decisionmakers change the meaning of climate change in their communities, so that discussions of it no longer are perceived as being about “whose side are you on” but instead about “what do we know, what more do we need to know, and what should we do.”

That research doesn't try to conjure a new world into existence by disseminating "studies" that constantly purport to find it already exists. 

It tries to supply people who actually are acting to make such a world with empirical information that they can use to exercise their judgment as best as they can.

Indeed, what motivated my rebuke of you yesterday was frustration at how closely aligned the program you defended (very clearly, very articulately) is with divisive forms of partisan advocacy that actually perpetuate the social meanings that make climate change a “struggle for the soul of America” rather than a practical problem that all Americans, regardless of their cultural identities, have a common interest in fighting. 

Frustration too at how much the sort of "practical" "scholarship" you called for is distracting and diverting and confusing people who are looking to empirical researchers for help.

At how self-defeating it obviously is ever to propose that a criterion other than “figuring out & sharing one’s best understanding of the truth on contested empirical issues” could possibly be practical.   

How twisted it is to call that singularly unscientific orientation  “science communication” research!

It's pretty simple really: Tell people what they need to know, not what they want to hear.

That’s both ethical and practical.

Again, sorry I disparaged your scholarly work, which I think can teach people a lot about how people think. 

The intended target was your conception of “practical scholarship.”  And I did very much intend to be critical of that view and of those who are propagating the mindset you very much evinced in your talk.




p.s. My slides from talk on the challenge of "unconfounding" knowledge & identity in measuring "climate change science comprehension."


Mine goes to 11 ... or 10, at least, for now

What to do when stuck in Ft. Lauderdale airport b/c missing connecting flight to Keys?....

See what happens when the "Rules of Evidence Are Impossible CBR Simulator" is expanded from "8 item of proof" size cases to "10 item of proof" size ones!

Lots of people, no doubt thinking of the wildly popular "Miller-Sanjurjo Turing Machine" (MSTM), have been writing to ask if a version of the CBR simulator will be made available for home use by CCPB subscribers... Stay tuned!


Cultural "fact polarization" trumps cultural "value" polarization -- a fragment

Working on this.  Rest "tomorrow."

1. The new politics of “fact polarization”

Polarization over questions of fact is one of the signature features of contemporary democratic political life.  Citizens divided over the relative weight of “liberty” and “equality” are less sharply divided today over the justice of progressive taxation (Moore 2015) than over the evidence that human  CO2 emissions are driving up global temperatures (Frankovic 2015).  Democrats and Republicans argue less strenuously about whether states should be permitted to require the "reading of the Lord's prayer" in school than whether permitting citizens to carry concealed handguns in public increases homicide rates—by multiplying the number of firearms in society—or instead decreases them by equipping law-abiding citizens to protect themselves from predation (Newport 2015).

Members of cultural groups that confer status to women for their mastery of domestic roles love their daughters as much as members of those who celebrate the world of commerce and public affairs as status-conferring arenas for men and women alike (Luker 1984). Yet the two cannot agree about the consequences of universally immunizing middle-school girls against the human papilloma virus: does that policy promote the girls’ health by protecting them later in life from an extremely prevalent sexually  transmitted disease linked to cervical cancer; or endanger them by lulling them into unprotected sex right now, thereby increasing their risks of becoming pregnant and of contracting other, even more deadly STDs (Kahan, Braman, Cohen, Gastil & Slovic 2010)?

These are admittedly complex questions.  But they are empirical ones. Values can’t supply the answers; only evidence can. The evidence that is relevant to any one of these factual issues, moreover, is completely distinct from the evidence relevant to any of the others.  There is simply no logical reason, in sum, for positions on these and various other policy-relevant facts (the safety of deep geologic isolation of nuclear wastes, the deterrent impact of the death penalty, the efficacy of invasive forms of surveillance to combat terrorism, etc.) to cluster at all, much less to form packages of beliefs that so strongly unite citizens of shared cultural commitments and so persistently divide citizens of opposing ones.

But there is a psychological explanation for today’s politics of “fact polarization.”  Or at least a very strong candidate explanation, the emergence of which has supplied an energizing focus for research and debate in the decision sciences over the course of the last decade. . . . 


Frankovic, K. Most republicans do not think humans are causing climate change. YouGov. (2015).

General Social Survey (2014).

Luker, K. Abortion and the politics of motherhood (University of California Press, Berkeley, 1984).



Weekend update: Is critical reasoning domain independent or domain specific?... a fragment of an incomplete rumination

An adaptation of a piece of correspondence--one no longer, really, than this-- w/ a thoughtful person who proposed that people have "corrective mechanisms" for the kind of "likelihood ratio cascade" that I identified with "coherence based reasoning" and that I  asserted makes "rules of evidence" impossible:

What are these corrective mechanisms?

I ask not because I doubt they exist but because I suspect that they do -- & that their operation has evaded full understanding because of a mistaken assumption central to the contemporary study of cognition.

That assumption is that reasoning proficiencies--the capacity to recognize covariance, give proper effect to base rates, distinguish systematic relationships from chance co-occurrences, & perform like mental operations essential to making valid inferences--are more or less discrete, stand-alone "modules" within a person's cognitive repertoire.

If the modules are there, and are properly calibrated, a person will reliably summon them for any particular task that she happens to be doing that depends on that sort of mental operation.

Call this the "domain independent" conception (DI) of cognitive proficiency. DI is presupposed by standardized assessments like the Cognitive Reflection Test (Frederick 2005) and Numeracy (Peters et al. 2006), which purport to measure the specified latent reasoning capacities "in general," that is, abstracted from anything in particular one might use them for.

Another conception sees cognitive proficiency as intrinsically domain specific. On this view--call it the DS conception--it's not accurate to envision reasoning abilities of the sort I described as existing independently of the activities that people use them for (cf. Hetherington 2011).

Accordingly, a person who performs miserably in a context-free assessment of, say, the kind of logical-reasoning proficiency measured by an abstract version of the Wason Selection Task-- one involving cards with vowels and numbers on either side -- might in fact always (or nearly always!) perform that sort of mental operation correctly in all the real-world contexts that she is used to encountering that require it. In fact, people do very well at the Wason Selection Task when it is styled as something more familiar--like detecting a norm violator (Gigerenzer & Hug 1992).
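For anyone who hasn't met the abstract version of the task, here is a minimal sketch of its logic. It assumes the usual textbook rule ("if a card has a vowel on one side, it has an even number on the other"), which is my gloss and not something spelled out above; the function name and the four sample cards are likewise just illustrative.

```python
VOWELS = set("AEIOU")

def cards_to_turn(visible_faces):
    """Return the faces that must be flipped to test 'if vowel, then even number'.

    Assumption: each card has a letter on one side and a number on the other.
    """
    must_flip = []
    for face in visible_faces:
        if face.isalpha():
            # A vowel could hide an odd number, which would falsify the rule.
            if face.upper() in VOWELS:
                must_flip.append(face)
        elif int(face) % 2 == 1:
            # An odd number could hide a vowel, which would also falsify it.
            must_flip.append(face)
        # Consonants and even numbers cannot falsify the rule, so skip them.
    return must_flip

print(cards_to_turn(["E", "K", "4", "7"]))  # ['E', '7']
```

The characteristic mistake is to pick the even number instead of the odd one; the same conditional logic, dressed up as spotting a rule-breaker, gives people far less trouble.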

In sum, reasoning proficiencies are not stand-alone modules but integral components of action-enabling mental routines that are reliably summoned to mind by a person's perception of the sorts of recurring problem situations those routines, including their embedded reasoning proficiencies, help her to negotiate.

DS is suspicious of standardized assessments, including the usual stylized word problems that are thought by decision scientists to evince one or another type of "cognitive bias."  By (very deliberately) effacing the contextual cues that summon to mind the mental routines and embedded reasoning proficiencies necessary to address recurring problem situations, such tests grossly overstate the "boundedness" of human rationality (Gigerenzer 2000).

Indeed, by abstracting from any particular use to which people might put the reasoning proficiencies they are evaluating, such assessments and problems are actually measuring only how good people are at doing tests. In fact, people can train themselves to become very proficient at a difficult type of reasoning task for purposes of taking an exam on it and then evince complete innocence of that same sort of knowledge in the real-world settings where it actually applies (DiSessa 1982)!

DI and DS have different accounts of "expertise" in fields that involve reasoning tasks that are vulnerable to recurring cognitive biases. DI  identifies that expertise with the cultivation of general, context-free habits of mind that evince the disposition to use "conscious, effortful" ("system 2") forms of information processing (Sunstein 2005).

DS, in contrast, asserts that "expertise" consists in the possession of mental routines, and their embedded reasoning proficiencies, specifically suited for specialized tasks. Those mental routines include the calibration of rapid, intuitive, pre-conscious, affective forms of cognition (or better, recognition) that reliably alert the expert to the need to bring certain conscious, effortful mental operations to bear on the problem at hand. The proper integration of reciprocal intuitive and conscious forms of cognition tailored to specialized tasks is the essence of professional judgment.

Nonexperts can be expected to display one or another bias when confronted with those same problems.  But the reason isn't that the nonexpert "thinks differently" from the expert; it's that the expert has acquired through training and experience mental routines suited to do things that are different from anything the ordinary person has occasions to do in his or her life  (Margolis 1987, 1993, 1996). 

Indeed, if one confronts an expert with a problem divorced from all the cues that reliably activate the cognitive proficiencies she uses when she performs professional tasks, one is likely to find that the expert, too, is vulnerable to all manner of cognitive bias.

But if one infers from that that the expert therefore can't be expected to resist those biases in her professional domain, one is making DI's signature mistake of assuming that reasoning proficiencies are stand-alone modules that exist independent of mental routines specifically suited for doing particular things (cf. Kahan, Hoffman, Evans, Lucci, Devins & Cheng in press) ....

Or that at least is what a DS proponent would say.

She might, then, also agree that the reason-eviscerating quality of "coherence based reasoning" supplies us with grounds to professionalize fact-finding in legal proceedings.

Not because "jurors" or other "nonexperts" are "stupid." But because it is stupid to think that doing what is required to make accurate findings of fact in legal proceedings does not depend on the cultivation of habits of mind specifically suited for that task.

I tend to think the DS proponent comes closer to getting it right. But of course, I'm not really sure.


DiSessa, A.A. Unlearning Aristotelian Physics: A Study of Knowledge‐Based Learning. Cognitive science 6, 37-75 (1982).

Frederick, S. Cognitive Reflection and Decision Making. Journal of Economic Perspectives 19, 25-42 (2005).

Gigerenzer, G. Adaptive thinking : rationality in the real world (Oxford University Press, New York, 2000).

Gigerenzer, G. & Hug, K. Domain-specific reasoning: Social contracts, cheating, and perspective change. Cognition 43, 127-171 (1992). 

Hetherington, S.C. How to know : a practicalist conception of knowledge (J. Wiley, Chichester, West Sussex, U.K. ; Malden, MA, 2011).

Kahan, D.M., Hoffman, D.A., Evans, D., Devins, N., Lucci, E.A. & Cheng, K. 'Ideology' or 'Situation Sense'? An Experimental Investigation of Motivated Reasoning and Professional Judgment. U. Pa. L. Rev. 164 (in press).

Margolis, H. Dealing with risk : why the public and the experts disagree on environmental issues (University of Chicago Press, Chicago, IL, 1996).

Margolis, H. Paradigms and Barriers (1993).

Margolis, H. Patterns, thinking, and cognition : a theory of judgment (University of Chicago Press, Chicago, 1987).

Peters, E., Västfjäll, D., Slovic, P., Mertz, C.K., Mazzocco, K. & Dickert, S. Numeracy and Decision Making. Psychol Sci 17, 407-413 (2006).

Sunstein, C.R. Laws of fear : beyond the precautionary principle (Cambridge University Press, Cambridge, UK ; New York, 2005). 



"Inherent internal contradictions" don't cause bad institutions to collapse; they just suck ... "Rules of evidence are impossible," part 3 (another report for Law & Cognition seminar)

Time for part 3 of this series: Are Rules of Evidence Impossible?

The answer is yes, as I said at the very beginning.

But I didn’t say why & still haven’t.

Instead, I spent the first two parts laying the groundwork necessary for explanation.  Maybe you can build the argument on top of it yourself at this point?! If so, skip ahead to “. . . guess what?”—or even skip the rest of this post altogether & apply your reason to something likely to teach you something new!

But in the event you can’t guess the ending, or simply need your “memory refreshed” (see Fed. R. Evid. 612), a recap:

Where were we? In the first part, I described a conception of the practice of using “rules of evidence”—the Bayesian Cognitive Correction Model (BCCM). 

BCCM conceives of rules of evidence as instruments for “cognitively fine tuning” adjudication. By selectively admitting and excluding items of proof, courts can use the rules to neutralize the accuracy-diminishing impact of one or another form of biased information processing--from identity-protective reasoning to the availability effect, from hindsight bias to base-rate neglect, etc.  The threat these dynamics pose to accurate factfinding is their tendency to induce the factfinder to systematically misestimate the weight, or in Bayesian terms the “likelihood ratio” (LR), to be assigned items of proof (Kahan 2015). 
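To make the "likelihood ratio" talk concrete, here is a minimal sketch of Bayesian updating in odds form: posterior odds equal prior odds multiplied by the LR of each item of proof. The function names and every number below are hypothetical, chosen only to show how misestimating a single LR moves the bottom line.

```python
def posterior_odds(prior_odds, likelihood_ratios):
    """Bayes' rule in odds form: multiply prior odds by each item's LR."""
    odds = prior_odds
    for lr in likelihood_ratios:
        odds *= lr
    return odds

def odds_to_probability(odds):
    """Convert odds in favor of a proposition into a probability."""
    return odds / (1.0 + odds)

# Hypothetical case: even prior odds and three items of proof.
true_lrs = [2.0, 0.5, 3.0]    # weights a calibrated factfinder would assign
biased_lrs = [2.0, 0.5, 6.0]  # same proof, with the third item overweighted

calibrated = odds_to_probability(posterior_odds(1.0, true_lrs))
distorted = odds_to_probability(posterior_odds(1.0, biased_lrs))
print(f"calibrated: {calibrated:.3f}, distorted: {distorted:.3f}")
# calibrated: 0.750, distorted: 0.857
```

An item with LR = 1 leaves the odds untouched, which is why excluding a piece of evidence can be treated as constraining the factfinder to give it LR = 1.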

In part 2, I discussed a cognitive dynamic that has that sort of consequence: “coherence based reasoning” (CBR).

Under CBR (Simon 2004; Simon, Pham, Quang & Holyoak 2001; Carlson & Russo 2001), the factfinder’s motivation to find “coherence” in the trial proof creates a looping feedback effect. 

Once the factfinder forms the perception that the accumulated weight of the evidence supports one side, he begins to inflate or discount the weight of successive items of proof as necessary to conform them to that position.  He also turns around and revisits already-considered items of proof and reweights them to make sure they fit that position, too. 

His reward is an exaggerated degree of confidence in the correctness of that outcome, and thus the peace of mind that comes from never ever having to worry that maybe, just maybe he got the wrong answer.

The practical consequences are two.  First, by virtue of the exaggerated certainty the factfinder has in the result, he will sometimes rule in favor of a party that hasn’t carried its burden under a heightened standard of proof like, say, “beyond a reasonable doubt,” which reflects the law’s aversion to “Type 1” errors when citizens’ liberty is at stake.

Second, what position the factfinder comes to be convinced is right will be arbitrarily sensitive to the order of proof.  The same strong piece of evidence that a factfinder dismisses as inconsistent with what she is now committed to believing is true could have triggered a “likelihood ratio cascade” in exactly the opposite direction had that item of proof appeared “sooner”-- in which case the confidence it instilled in its proponent's case would have infected the factfinder's evaluation of all the remaining items of proof.
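Both consequences can be seen in a toy model. What follows is a deliberately stripped-down, deterministic sketch and not the simulator from the earlier post: it assumes the factfinder shifts each new item's log-LR toward whatever side she currently favors, in proportion to a made-up `pull` parameter, and it leaves out the step of going back to reweight earlier items. All names and numbers are hypothetical.

```python
import math

def bayes_log_odds(true_lrs):
    """Log-odds a calibrated factfinder reaches from even prior odds."""
    return sum(math.log(lr) for lr in true_lrs)

def cbr_log_odds(true_lrs, pull=0.5):
    """Log-odds when each item's weight is bent toward the current leaning.

    `pull` is an assumed parameter: 0 reproduces the Bayesian result, and
    larger values mean stronger inflation or discounting of later items.
    """
    log_odds = 0.0  # even prior odds
    for lr in true_lrs:
        # Congruent items get inflated, incongruent ones discounted.
        perceived = math.log(lr) + pull * log_odds
        log_odds += perceived
    return log_odds

# Four items of proof whose true weights cancel out exactly.
proof = [4.0, 2.0, 0.5, 0.25]
strong_pro_first = cbr_log_odds(proof)
strong_con_first = cbr_log_odds(list(reversed(proof)))
print(bayes_log_odds(proof), strong_pro_first, strong_con_first)
```

In this sketch the proof is perfectly balanced for a calibrated factfinder, yet the simulated one ends up confidently on one side when the strong pro item comes first and confidently on the other when the order is reversed.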

If you hung around after class last time for the “extra credit”/“optional” discussion, I used a computer simulation to illustrate these chaotic effects, and to show why we should expect the accuracy-eviscerating consequences of them to be visited disproportionately on innocent defendants in criminal proceedings.

This is definitely the sort of insult to rational-truth-seeking that BCCM was designed to rectify!

But guess what?

It can’t! The threat CBR poses to accuracy is one the BCCM conception of “rules of evidence” can’t possibly counteract!

As I explained in part 1, BCCM consists of three basic elements:

  1. Rule 401, understood as a presumption that evidence with LR ≠ 1 is admissible (Lempert 1977);

  2. a conception of “unfair prejudice” under Rule 403 that identifies it as the tendency of a piece of relevant evidence to induce a flesh-and-blood factfinder to assign incorrect LRs to it or other items of proof (Lempert 1977); and

  3. a strategy for Rule 403 weighing that directs the court to exclude “relevant” evidence when the tendency it has to induce the factfinder to assign the wrong LR to that or other pieces of evidence diminishes accurate assessment of the trial proof to a greater extent than constraining the factfinder to effectively treat the evidence in question as having no weight at all, or LR = 1 (Kahan 2010).

The problem is that CBR injects this “marginal probative value vs. marginal prejudice” apparatus with a form of self-contradiction, both logical and practical.

There isn’t normally any such contradiction. 

Imagine, e.g., that a court was worried that evidence of a product redesign intended to avoid a harmful malfunction might trigger “hindsight bias,” which consists in the tendency to inflate the LRs associated with items of proof that bear on how readily one might have been able to predict the need for and utility of such a design ex ante (Kamin & Rachlinski 1995).  (Such evidence is in theory—but not in practice— “categorically excluded” under Rule 407, when the correction was made after the injury to the plaintiff; but in any case, Rule 407 wouldn’t apply, only Rule 403 would, if the change in product design were made after injuries to third parties but before the plaintiff herself was injured by the original product—even though the same “hindsight bias” risk would be presented).

“All” the judge has to do in that case is compare the marginal accuracy-diminishing impact of [1] giving no weight at all to the evidence (LR = 1) on the "facts of consequence" it should otherwise have made "more probable" (e.g., the actual existence of alternative designs and their cost-effectiveness) and [2] the inflationary effect of admitting it on the LRs assigned to the evidence bearing on every other fact of consequence (e.g., what a reasonable manufacturer would have concluded about the level of risk and feasibility of alternative designs at the time the original product was designed).

[Figure: The BCCM conception of 403 "marginal probity vs. marginal prejudice" balancing!]

A thoughtful person might wonder about the capacity of a judge to make that determination accurately, particularly because weighing the “marginal accuracy diminishing impact” associated with admission and with exclusion, respectively, actually requires the judge to gauge the relative strength of all the remaining evidence in the case. See Old Chief v. U.S., 519 U.S. 172, 182-85 (1997).

But making such a determination is not, in theory at least, impossible.
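To make the hindsight-bias balancing concrete, here is a minimal sketch. Everything in it is an assumption made for illustration: the function names, the error metric (the gap between perceived and true posterior odds in log10 units), and the idea that the bias inflates every other item's LR by one fixed `inflation` multiplier. The point it shows is that the "prejudice" term does not depend on how probative the challenged item is.

```python
import math

def posterior_odds(lrs, prior_odds=1.0):
    """Multiply prior odds by each likelihood ratio in turn."""
    odds = prior_odds
    for lr in lrs:
        odds *= lr
    return odds

def log_error(perceived_odds, true_odds):
    """Distance between perceived and true odds, in log10 units (assumed metric)."""
    return abs(math.log10(perceived_odds) - math.log10(true_odds))

def should_exclude(item_lr, other_lrs, inflation):
    """Hypothetical BCCM Rule 403 test for a bias that is independent of item_lr."""
    true_odds = posterior_odds([item_lr] + other_lrs)
    # Admit: the item gets its true LR, but every other LR is inflated.
    admit_odds = posterior_odds([item_lr] + [lr * inflation for lr in other_lrs])
    # Exclude: the item is treated as LR = 1; the other LRs are assessed correctly.
    exclude_odds = posterior_odds(other_lrs)
    return log_error(admit_odds, true_odds) > log_error(exclude_odds, true_odds)

others = [2.0, 0.5, 3.0]                    # made-up LRs for the rest of the proof
print(should_exclude(1.5, others, 1.3))     # weakly probative item
print(should_exclude(8.0, others, 1.3))     # strongly probative item
```

Under these made-up numbers the weakly probative item gets excluded and the strongly probative one gets admitted, which is how the balancing is supposed to come out when the two quantities vary independently.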

What is impossible is doing this same kind of analysis when the source of the “prejudice” is CBR.  When a judge uses BCCM to manage the impact of hindsight bias (or any other type of dynamic inimical to rational information-processing), “marginal probative value” and “marginal prejudice”—the quantities she must balance—are independent.

But when the bias the judge is trying to contain is CBR, “marginal probative value” and “marginal prejudice” are interdependent—and indeed positively correlated.

What triggers the “likelihood ratio cascade” that is characteristic of CBR as a cognitive bias is the correct LR the factfinder assigned whatever item of proof induced the factfinder to form the impression that one side’s position was stronger than the other’s. Indeed, the higher (or lower) the “true” LR of that item of proof, the more confident the factfinder will be in the position that evidence supports, and hence the more biased the factfinder will thereafter be in assessment of the weight due other pieces of evidence (or equivalently, the more indifferent she'll become to the risk of erring in the direction of that position (Scurich 2012)).

To put it plainly, CBR creates a war between the two foundational “rules of evidence”: the more relevant evidence is under Rule 401 the more unfairly prejudicial it becomes for purposes of Rule 403.  To stave off the effects of CBR on accurate factfinding, the court would have to exclude from the case the evidence most integral to reaching an accurate determination of the facts.

Maybe an illustration would be useful?

This is one case plucked from the sort of simulation that I ran yesterday:

[Figure: The sad result of trying to do BCCM 403 balancing here...]

It shows how, as a result of CBR, a case that was in fact a “dead heat” can transmute into one in which the factfinder forms a supremely confident judgment that the facts supporting one side’s case are “true.”

The source of the problem, of course, is that the very “first” item of proof had LR = 25, initiating a “likelihood ratio cascade” as reflected in the discrepancy between the "true" LRs—tLRs—and "biased" perceived LRs—pLRs—for each subsequent item of proof.

A judge applying the BCCM conception of Rule 403 would thus recognize that "item of proof No. 1" is injecting a huge degree of “prejudice” into the case. She should thus exclude proof item No. 1, but only if she concludes that admitting it will diminish the accuracy of the outcome more than preventing the factfinder from giving this highly probative piece of evidence any effect whatsoever.

When the judge engages in this balancing, she will in fact observe that the effect of excluding that evidence distorts the accuracy of the outcome just as much as admitting it does--but in the opposite direction. In this simulated case, assigning item No. 1 an LR = 1—the formal effect of excluding it—now induces the factfinder to conclude that the odds against that party’s position being true are 5.9x10^2:1, or that there is effectively a 0% chance that that party’s case is well-founded.

That’s because the very next item of proof has LR = 0.04 (the inverse of LR = 25), and thus triggers a form of “rolling confirmation bias” that undervalues every subsequent item of proof.

So if the judge were to exclude item No. 1 b/c of its tendency to excite CBR, she’d now face the same issue again in ruling on a motion to exclude item No. 2.

And guess what? If she assesses the impact of excluding that super probative piece of evidence (one that favored one party’s position 25x more than the other’s), she’ll again find that the “accuracy diminishing impact” of doing so is as high as not excluding: the remaining evidence in the case is configured so that the factfinder is impelled to a super-confident conclusion in favor of the first party once more!

And so forth and so on.
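Here is a rough sketch of that back-and-forth. The figure from the post isn't reproduced here, so the six LRs below are made up (they multiply to 1, i.e., a "dead heat," and start with LR = 25 followed by LR = 0.04). The `cbr_factor` function is likewise an assumption: one simple form that leaves LRs alone at odds of 1:1 and doubles (or halves) them at 10:1 (or 1:10). "Excluding" an item is modeled by dropping it from the list, so the factfinder never sees it.

```python
import math

def cbr_factor(odds):
    """Assumed bias multiplier: 1 at odds 1:1, 2 at 10:1, 1/2 at 1:10."""
    bans = math.log10(odds)
    return 1.0 + bans if bans >= 0 else 1.0 / (1.0 - bans)

def perceived_odds(true_lrs, prior_odds=1.0):
    """Update sequentially, distorting each true LR by the current odds."""
    odds = prior_odds
    for t_lr in true_lrs:
        odds *= t_lr * cbr_factor(odds)   # factor uses the odds *before* this item
    return odds

# Hypothetical dead-heat case: the true LRs multiply to 1.
case = [25.0, 0.04, 4.0, 0.25, 2.0, 0.5]

admit_all = perceived_odds(case)
exclude_first = perceived_odds(case[1:])       # item No. 1 kept from the factfinder
exclude_first_two = perceived_odds(case[2:])   # items No. 1 and No. 2 kept out

for label, odds in [("admit all", admit_all),
                    ("exclude No. 1", exclude_first),
                    ("exclude No. 1 and No. 2", exclude_first_two)]:
    print(f"{label}: perceived odds = {odds:.4g}")
```

With these assumed inputs the perceived odds favor the first party when everything comes in, swing hard against it when item No. 1 is excluded, and swing back in its favor when item No. 2 goes too. The exact magnitudes depend entirely on the assumed factor and LRs.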

As this illustration should remind you, CBR also has the effect of making outcomes arbitrarily sensitive to the order of proof. 

Imagine item 1 and item 2 had been “encountered” in the opposite “order” (whether by virtue of the point at which they were introduced at trial, the relative salience of them to the factfinder as he or she reflected on the proof as a whole, or the role that post-trial deliberations had in determining the sequence with which particular items of proof were evaluated). 

The factfinder in that case would indeed have formed just as confident a judgment--but one in support of the opposite party:

[Figure: Again, 403 balancing is impossible here--it is self-contradictory!]

Again, the judge will be confronted with the question whether the very “first” item of proof—what was item No. 2 in the last version of this illustration—should be excluded under Rule 403. When she works this out, moreover, she’ll end up discovering that the consequence of excluding it is the same as was the consequence of excluding item No. 1—LR = 25—in our alternative-universe version of the case: a mirror-image degree of confidence on the factfinder's part about the strength of the opposing party’s case.  And so on and so forth.

See what’s going on?

The only way for the judge to assure that this case gets decided “accurately” is to exclude every single piece of evidence from the trial, remitting the jury to its priors—1:1—which, by sheer accident, just happened to reflect the posterior odds a “rational factfinder” would have ended up with after fairly assigning each piece of evidence its “true” LR.

Not much point having a trial at all under those circumstances!

Of course, the evidence, when properly considered, might have more decisively supported one side or the other.  But what a more dynamic simulation--one that samples from all the various distributions of case strength one cares to imagine-- shows us is that there’s still no guarantee the factfinder would have formed an accurate impression of the strength of the evidence in that circumstance either.

To assure an accurate result in such a case, the judge, under the BCCM conception of the rules of evidence, would still have been obliged to try to deflect the accuracy-vitiating impact of CBR away from the factfinder’s appraisal of the evidence by Rule 403 balancing.

And the pieces of evidence that the judge would be required in such a case to exclude would be the ones most entitled to be given a high degree of weight by a rational factfinder!  The impact of doing so would be to skew consideration of the remainder of the evidence without offsetting exclusions of similarly highly relevant pieces of proof. . . . 

Again, no point in even having  a trial if that’s how things are going to work. The judge should just enter judgment for the party she thinks “deserves” to win.

There is of course no reason to believe a judge could “cognitively fine-tune” a case with the precision that this illustration envisions.  But all that means is that the best a real judge can ever do will always generate an outcome that we have less reason to be confident is “right” than we would have had had the judge just decided the stupid case herself on the basis of her own best judgment of the evidence.

Of course, why should we assume the judge herself could make an accurate assessment, or reasonably accurate one, of the trial proof?  Won’t she be influenced by CBR too—in a way that distorts her capacity to do the sort of “marginal probative value vs. marginal prejudice” weighing that the BCCM conception of Rule 403 imagines?

If you go down this route, then you again ought to conclude that “rules of evidence are impossible” even without contemplating the uniquely malicious propensities of CBR.  Because if this is how you see things (Schauer 2006), there will be just as much reason to think that the judge’s performance of such balancing will be affected by all the other forms of cognitive bias that she is trying to counteract by use of BCCM’s conception of Rule 403 balancing.

I think that anxiety is in fact extravagant—indeed silly.

There is plenty of evidence that judges, by virtue of professionalization, develop habits of mind that reasonably insulate them from one or another familiar form of cognitive bias when the judges are making in-domain decisions—i.e., engaging in the sort of reasoning they are supposed to as judges (Kahan, Hoffman, et al. in press; Guthrie, Rachlinski & Wistrich 2007).

That’s how professional judgment works generally!

But now that I’ve reminded you of this, maybe you can see what the “solution” is to the “impossibility” of the rules of evidence?

Even a jurist with exquisite professional judgment cannot conceivably perform the kind of “cognitive fine-tuning” envisioned by the “rules of evidence” -- the whole enterprise is impossible.

But what makes such fine tuning necessary in the first place is the law’s use of  non-professional decisionmakers divorced from any of the kinds of insights and tools that professional legal truthseekers would actually use.

Jurors aren’t stupid.  They are equipped with all the forms of practical judgment that they need to be successful in their everyday lives.

What's stupid is to think that making reliable assessments of fact in the artificial environment of a courtroom adversarial proceeding is one of the things everyday life equips them to do.

Indeed, it's absurd to think that that environment is conducive to the accurate determination of facts by anyone.

A procedural mechanism that was suited for accurately determining the sorts of facts relevant to legal determinations would have to look different from anything we see in everyday life, b/c making those sorts of determinations isn't something that everyday life requires.

No more than having to practice medicine, repair foreign automobiles, or write publicly accessible accounts of relativity is (btw, happy birthday Die Feldgleichungen der Gravitation).

Ordinary, sensible people rely on professionals -- those who dedicate themselves to acquiring expert knowledge and corresponding forms of reasoning proficiency -- to perform specialized tasks like these.

The “rules of evidence” are impossible because the mechanism we rely on to determine the “truth” in legal proceedings—an adversary system with lay factfinders—is intrinsically flawed. 

No amount of fine-tuning by “rules of evidence” will  ever make that system capable of delivering the accurate determinations of their rights and obligations that citizens of an enlightened democratic state are entitled to.

We need to get rid of the current system of adjudication and replace it with a professionalized system that avails itself of everything we know about how the world works, including how human beings reason and how they can be trained to reason when doing  specialized tasks.

And we need to replace, too, the system of legal scholarship that generates the form of expertise that consists in being able to tell soothing, tranquilizing, narcotizing just-so stories about how well suited the “adversary system” would be for truth-seeking with just a little bit more "cognitive fine tuning" to be implemented through the rules of evidence.

That element of our legal culture is as antagonistic to the goal of truth-seeking as any of the myriad defects of the adversary system itself. . . .

The end!


Guthrie, C., Rachlinski, J.J. & Wistrich, A.J. Blinking on the bench: How judges decide cases. Cornell Law Rev 93, 1-43 (2007).

Kahan, D.M. The Economics—Conventional, Behavioral, and Political—of "Subsequent Remedial Measures" Evidence. Columbia Law Rev 110, 1616-1653 (2010).

Kahan, D.M., Hoffman, D.A., Evans, D., Devins, N., Lucci, E.A. & Cheng, K. 'Ideology' or 'Situation Sense'? An Experimental Investigation of Motivated Reasoning and Professional Judgment. U. Pa. L. Rev. 164 (in press).

Kahan, D.M. Laws of cognition and the cognition of law. Cognition 135, 56-60 (2015).

Kamin, K.A. & Rachlinski, J.J. Ex Post ≠ Ex Ante - Determining Liability in Hindsight. Law Human Behav 19, 89-104 (1995).

Lempert, R.O. Modeling Relevance. Mich. L. Rev. 75, 1021-57 (1977).

Pennington, N. & Hastie, R. A Cognitive Theory of Juror Decision Making: The Story Model. Cardozo L. Rev. 13, 519-557 (1991).

Schauer, F. On the Supposed Jury-Dependence of Evidence Law. U. Pa. L. Rev. 155, 165-202 (2006).

Scurich, N. The Dynamics of Reasonable Doubt. (Ph.D. dissertation, University of Southern California, 2012). 

Simon, D. A Third View of the Black Box: Cognitive Coherence in Legal Decision Making. Univ. Chi. L.Rev. 71, 511-586 (2004).

Simon, D., Pham, L.B., Le, Q.A. & Holyoak, K.J. The Emergence of Coherence over the Course of Decisionmaking. J. Experimental Psych. 27, 1250-1260 (2001).


Check out wild & crazy "coherence based reasoning"! Are rules of evidence "impossible"?, part 2 (another report from Law & Cognition seminar)

[Figure: If you want to do BCCM, you definitely should draw lots of little diagrams like this]

This is part 2 in a 3-part series, the basic upshot of which is that “rules of evidence” are impossible.

A recap. Last time I outlined a conception of “the rules of evidence” I called the “Bayesian Cognitive Correction Model” or BCCM.  BCCM envisions judges using the rules to “cognitively fine-tune” trial proofs in the interest of simulating/stimulating jury fact-finding more consistent with a proper Bayesian assessment of all the evidence in a case. 

Cognitive dynamics like hindsight bias and identity-protective cognition can be conceptualized as inducing the factfinder to over- or undervalue evidence relative to its “true” weight—or likelihood ratio (LR).  Under Rule 403, judges should thus exclude an admittedly “relevant” item of proof (Rule 401: LR ≠ 1) when the tendency of that item of proof to induce jurors to over- or undervalue other items of proof (i.e., to assign them LRs that differ from 1 more or less than they actually do) impedes verdict accuracy more than constraining the factfinder to assign the item of proof in question no weight at all (LR = 1).

“Coherence based reasoning”—CBR—is one of the kinds of cognitive biases a judge would have to use the BCCM strategy to contain.  This part in the series describes CBR and the distinctive threat it poses to rational factfinding in adjudication.

Today's episode. CBR can be viewed as an information-processing dynamic rooted in aversion to residual uncertainty.

[Figure: Good study!]

A factfinder, we can imagine, might initiate her assessment of the evidence in a reasonably unbiased fashion, assigning modestly probative pieces of evidence more or less the likelihood ratios they are due.

But should she encounter a piece of evidence that is much more consistent with one party’s position, the resulting confidence in that party’s case (a state that ought to be only provisional, in a Bayesian sense) will dispose her to assign the next piece of evidence a likelihood ratio supportive of the same inference—viz., that that party’s position is “true.”  As a result, she’ll be all the more confident in the merit of that party’s case—and thus all the more motivated to adjust the weight assigned the next piece of evidence to fit her “provisional” assessment, and so forth and so on  (Carlson & Russo 2001). 

Once she has completed her evaluation of trial proof, moreover, she will be motivated to revisit earlier-considered pieces of evidence, readjusting the weight she assigned them so that they now fit with what has emerged as the more strongly supported position (Simon, Pham, Quang & Holyoak 2001; Holyoak & Simon 1999; Pennington & Hastie 1991). When she concludes, she will necessarily have formed an inflated assessment of the probability of the facts that support the party whose “strong” piece of evidence initiated this “likelihood ratio cascade.”

What does this matter?

Well, to start, in the law, the party who bears the “burden of proof” will often be entitled to win only if she establishes the facts essential to her position to a heightened degree of certainty like “beyond a reasonable doubt.”  One practical consequence of the overconfidence associated with CBR, then, will be to induce the factfinder to decide in favor of a party whose evidence, if evaluated in an unbiased fashion, would not have satisfied the relevant proof standard (Simon 2004).  Indeed, one really cool set of experiments (Scurich 2012) suggests that "coherence based reasoning" effects might actually reflect a dissonance-avoidance mechanism that manifests itself in factfinders reducing the standard of proof after exposure to highly probative items of proof! 

But even more disconcertingly, CBR makes the outcome sensitive to the order in which critical pieces of evidence are considered (Carlson, Meloy & Russo 2006). 

A piece of evidence that merits considerable weight might be assigned a likelihood ratio of 1 or < 1 if the factfinder considers it after having already assigned a low probability to the position it supports.  In that event, the evidence will do nothing to shake the factfinder’s confidence in the opposing position.

But had the factfinder considered that same piece of evidence “earlier”—before she had formed a confident estimation of the cumulative strength of the previously considered proof—she might well have given that piece of evidence the greater weight it was due. 

[Figure: Once a BCCM practitioner draws *this* diagram, though, she'll freak out]

If that had happened, she would then have been motivated to assign subsequent pieces of proof likelihood ratios higher than they in fact merited. Likewise, to achieve a “coherent” view of the evidence as a whole, she would have been motivated to revisit and revise upward the weight assigned to earlier considered, equivocal items of proof.  The final result would thus have been a highly confident determination in exactly the opposite direction from the one she in fact reached.

This is not the way things should work if one is engaged in Bayesian information processing—or at least any normatively defensible understanding of Bayesian information processing geared to reaching an accurate result!

Indeed, this is the sort of spectacle that BCCM directs the judge to preempt by the judicious use of Rule 403 to exclude evidence the “prejudicial” effect of which “outweighs” its “probative value.”

But it turns out that using the rules of evidence to neutralize CBR in that way is IMPOSSIBLE!

Why? I’ll explain that in Part 3!

# # #

But right now I’d like to have some more, “extra-credit”/“optional” fun w/ CBR! It turns out it is possible & very enlightening to create a simulation to model the accuracy-annihilating effects I described above.

Actually, I’m just going to model a “tame” version of CBR—what Carlson & Russo call “biased predecisional processing.” Basically, it’s the “rolling confirmation bias” of CBR without the “looping back” that occurs when the factfinder decides for good measure to reassess the more-or-less unbiased LRs she awarded to items of proof before she became confident enough to start distorting all the proof to fit one position. 

Imagine that a factfinder begins with the view that the “truth” is equally likely to reside in either party’s case—i.e., prior odds of 1:1. The case consists of eight “pieces” of evidence, four pro-prosecutor (likelihood ratio > 1) and four pro-defendant (likelihood ratio <1). 

The factfinder makes an unbiased assessment of the “first” piece of evidence she considers, and forms a revised assessment of the odds that reflects its “true” likelihood ratio.  As a result of CBR, however, her assessment of the likelihood ratio of the next piece of evidence—and every piece thereafter—will be biased by her resulting perception that one side’s case is in fact “stronger” than the other’s.

To operationalize this, we need to specify a “CBR factor” of some sort that reflects the disposition of the factfinder to adjust the likelihood ratios of successive pieces of proof up or down to match her evolving (and self-reinforcing!) perception of the strength disparity in the parties’ cases.

Imagine the factfinder misestimates the likelihood ratio of all pieces of evidence by a continuous amount that results in her over-valuing or under-valuing an item of proof by a factor of 2 at the point she becomes convinced that the odds in favor of one party’s position rather than the other’s position being “true” have reached 10:1.

What justifies selecting this particular “CBR factor”? Well, I suppose nothing, really, besides that it supplies a fairly tractable starting point for thinking critically about the practical upshot of CBR. 

But also, it’s cool to use this function b/c it reflects a “weight of the evidence” metric developed by Turing and Good to help them break the Enigma code! 

For Turing and Good, a piece of evidence with a likelihood ratio of 10 was judged to have a weight of “1 ban.” They referred to a piece of proof that had a weight 1/10 that big as a “deciban”—and were motivated to use that as the fundamental unit of evidentiary currency in their code-breaking system based on their seat-of-the-pants conjecture that a “deciban” was the smallest shift in the relative likelihoods of two hypotheses that human beings could plausibly perceive (Good 1985).
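For anyone who wants the arithmetic: weight of evidence in bans is just the base-10 logarithm of the likelihood ratio, and a deciban is a tenth of a ban. A minimal sketch (the function names are mine, not Turing and Good's):

```python
import math

def weight_in_decibans(likelihood_ratio):
    """Weight of evidence: 10 * log10(LR), so LR = 10 weighs 10 decibans (1 ban)."""
    return 10.0 * math.log10(likelihood_ratio)

def lr_from_decibans(decibans):
    """Inverse conversion: the likelihood ratio carrying a given weight."""
    return 10.0 ** (decibans / 10.0)

print(weight_in_decibans(10.0))   # one ban, expressed in decibans
print(lr_from_decibans(1.0))      # the LR worth a single deciban, roughly 1.26
print(weight_in_decibans(0.1))    # evidence against the hypothesis has negative weight
```

One handy property of the log scale is that weights add where likelihood ratios multiply, so offsetting items of proof sum to zero.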

So with this “CBR factor,” I am effectively imputing to the factfinder a disposition to “add to” (or subtract from) an item of proof one “deciban”—the smallest humanly discernible “evidentiary weight,” in Turing and Good’s opinion—for every 1-unit increase (1:1 to 2:1; 2:1 to 3:1, etc.) or decrease (1:1 to 1:2; 1:3 to 1:4, etc.) in the “odds” of that party’s position being true.
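The verbal description doesn't pin down a single formula, so what follows is only one candidate, offered as an assumption: a multiplier of 1 + log10(odds) when the odds favor the proposition, and its mirror image when they don't. It matches the two stated anchor points (no distortion at 1:1, a factor of 2 at 10:1), but the function actually used for the figures may well differ.

```python
import math

def cbr_factor(odds):
    """Assumed bias multiplier keyed to the factfinder's current odds.

    odds = 1   -> 1.0 (no distortion)
    odds = 10  -> 2.0 (LRs doubled)
    odds = 0.1 -> 0.5 (LRs halved)
    """
    bans = math.log10(odds)          # current odds expressed in bans
    if bans >= 0:
        return 1.0 + bans
    return 1.0 / (1.0 - bans)        # mirror image for odds below 1:1

def perceived_lr(true_lr, current_odds):
    """pLR = tLR scaled by the (assumed) CBR factor."""
    return true_lr * cbr_factor(current_odds)

for odds in (0.1, 1.0, 2.0, 10.0, 100.0):
    print(f"odds {odds:>6}: factor {cbr_factor(odds):.3f}, "
          f"pLR for tLR=3: {perceived_lr(3.0, odds):.3f}")
```

Note that the same multiplier inflates pro-prosecution items and shrinks the impact of pro-defense ones (an LR of 0.5 perceived at odds of 10:1 becomes 1.0, i.e., no weight at all), which is the "rolling confirmation bias" in miniature.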

And this figure illustrates how this distorting potential can be affected by CBR generally:

In the “unbiased” table, “prior” reflects the factfinder’s current estimate of the probability of the “prosecutor’s” position being true, and “post odds” the revised estimate based on the weight of the current “item” of proof, which is assigned the likelihood ratio indicated in the “LR” column.  The “post %” column transforms the revised estimate of the probability of “guilt” into a percentage. 

I’ve selected an equal number of pro-prosecution (LR >1) and pro-defense (LR<1) items of proof, and arranged them so they are perfectly offsetting—resulting in a final estimate of guilt of 1:1 or 50%.

In the “coherence based reasoning” table, “tLR” is the “true likelihood ratio” and “pLR” the perceived likelihood ratio assigned the current item of proof. The latter is derived by applying the CBR factor to the former.  When the odds are 1:1, CBR is 1, resulting in no adjustment of the weight of the evidence. But as soon as the odds shift in one party’s favor, the CBR factor biases the assessment of the next item of proof accordingly.

As can be seen, the impact of CBR in this case is to push the factfinder to an inflated estimate of the odds of the prosecution’s position being true, which the factfinder puts at 29:1 or 97% by the “end” of the case.

But things could have been otherwise. Consider:

I’ve now swapped the “order” of proof items “4” and “8,” respectively.  That doesn't make any difference, of course, if one is "processing" the evidence the way a Bayesian would; but it does if one is CBRing.

The reason is that the factfinder now “encounters” the defendant’s strongest item of proof -- LR = 0.1—earlier than the prosecution’s strongest—LR = 10.0.

Indeed, it was precisely because the factfinder encountered the prosecutor’s best item of proof “early” in the previous case that she was launched into a self-reinforcing spiral of overvaluation that made her convinced that a dead-heat case was a runaway winner for the prosecutor.

The effect when the proof is reordered this way is exactly the opposite: a devaluation cascade that convinces the factfinder that the odds in favor of the prosecutor’s case are infinitesimally small!
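The two orderings can be mimicked in a few lines. The tables from the figures aren't reproduced here, so the eight LRs below are invented (perfectly offsetting, with LR = 10 in slot 4 and LR = 0.1 in slot 8), and `cbr_factor` is the same assumed form as before (1 at 1:1, 2 at 10:1). Don't expect the output to match the 29:1 figure; the sketch only shows the direction of the effect.

```python
import math

def cbr_factor(odds):
    """Assumed bias multiplier: 1 at odds 1:1, 2 at 10:1, 1/2 at 1:10."""
    bans = math.log10(odds)
    return 1.0 + bans if bans >= 0 else 1.0 / (1.0 - bans)

def run_case(true_lrs, biased):
    """Return final odds after sequential updating from prior odds of 1:1."""
    odds = 1.0
    for t_lr in true_lrs:
        p_lr = t_lr * cbr_factor(odds) if biased else t_lr
        odds *= p_lr
    return odds

# Hypothetical, perfectly offsetting proof: four items > 1, four items < 1.
original = [2.0, 0.5, 4.0, 10.0, 0.25, 0.2, 5.0, 0.1]
swapped = list(original)
swapped[3], swapped[7] = swapped[7], swapped[3]   # swap items "4" and "8"

for label, lrs in (("original", original), ("swapped", swapped)):
    print(label,
          "| Bayesian odds:", round(run_case(lrs, biased=False), 3),
          "| CBR odds:", round(run_case(lrs, biased=True), 3))
```

With these made-up numbers the Bayesian odds come out at 1:1 under both orderings, while the CBR odds end far above 1:1 in the first ordering and below 1:1 in the second.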

These illustrations are static, and based on “pieces” of evidence with stipulated LRs “considered” in a specified order (one that could reflect the happenstance of when particular pieces register in the mind of the factfinder, or are featured in post-trial deliberations, as well as when they are “introduced” into evidence at trial—who the hell knows!).

But we can construct a simulation that randomizes those values in order to get a better feel for the potentially chaotic effect that CBR injects into evidence assessments. 

The simulation constructs trial proofs for 100 criminal cases, each consisting of eight pieces of evidence. Half of the 800 pieces of evidence reflect LRs drawn randomly from a uniform distribution between 0.05 and 0.95; these are “pro-defense” pieces of evidence. Half reflect LRs drawn randomly from a uniform distribution between 1.05 and 20. They are “pro-prosecution” pieces.

We can then compare the “true” strength of the evidence in the 100 cases —the probability of guilt determined by Bayesian weighting of each one’s eight pieces of evidence—to the “biased” assessment generated when the likelihood ratios for each piece of evidence are adjusted in a manner consistent with CBR.
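A bare-bones version of such a simulation might look like the sketch below. The LR ranges and the 100 x 8 design come from the description above; the rest is assumed: four pro-defense and four pro-prosecution items per case, a random order of consideration, prior odds of 1:1, a fixed seed, and the same hypothetical `cbr_factor` (1 at 1:1, 2 at 10:1).

```python
import math
import random

def cbr_factor(odds):
    """Assumed bias multiplier: 1 at odds 1:1, 2 at 10:1, 1/2 at 1:10."""
    bans = math.log10(odds)
    return 1.0 + bans if bans >= 0 else 1.0 / (1.0 - bans)

def simulate_case(rng):
    """Return (true probability, CBR-biased probability) for one 8-item case."""
    lrs = ([rng.uniform(0.05, 0.95) for _ in range(4)] +    # pro-defense items
           [rng.uniform(1.05, 20.0) for _ in range(4)])     # pro-prosecution items
    rng.shuffle(lrs)                                         # assumed: random order
    true_odds = biased_odds = 1.0
    for lr in lrs:
        true_odds *= lr
        biased_odds *= lr * cbr_factor(biased_odds)
    return true_odds / (1 + true_odds), biased_odds / (1 + biased_odds)

rng = random.Random(2015)            # arbitrary seed, for reproducibility only
results = [simulate_case(rng) for _ in range(100)]

# Count "close" cases: probability strictly between 0.05 and 0.95.
close_true = sum(0.05 < t < 0.95 for t, _ in results)
close_biased = sum(0.05 < b < 0.95 for _, b in results)
print("close cases, Bayesian:", close_true, "| close cases, CBR:", close_biased)
```

Wrapping the last few lines in a loop over fresh seeds would give the repeated-trials version of the same exercise.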

This figure compares the relative distribution of outcomes in the 100 cases:


As one would expect, a factfinder whose evaluation is influenced by CBR will encounter many fewer “close” cases than will one that engages in unbiased Bayesian updating.

This tendency to form overconfident judgments will, in turn, affect the accuracy of case outcomes.  Let’s assume, consistent with the “beyond a reasonable doubt” standard, that the prosecution is entitled to prevail only when the probability of its case being “true” is ≥ 0.95.  In that case, we are likely to see this sort of divergence between outcomes informed by rational information processing and outcomes informed by CBR:


The overall “error rate” is “only” about 0.16.  But there are 7x as many incorrect convictions as incorrect acquittals.  The "false conviction" rate is 0.21, whereas the "false acquittal" rate is 0.04....

The reason for the asymmetry between false convictions and false acquittals is pretty straightforward. In the CBR-influenced cases, there are a substantial number of “close” cases that the factfinder concluded “strongly” supported one side or the other. Which side—prosecution or defendant—got the benefit of this overconfidence is roughly equally divided.  However, a defendant is no less entitled to win when the factfinder assesses the strength of the evidence to be 0.5 or 0.6 than when the factfinder assesses the strength of the evidence as 0.05 or 0.06.  Accordingly, in all the genuinely “close” cases in which CBR induced the factfinder to form an overstated sense of confidence in the weakness of the prosecution’s case, the resulting judgment of “acquittal” was still the correct one.  But by the same token, the result was incorrect in every close case in which CBR induced the factfinder to form an exaggerated sense of confidence in the strength of the prosecution’s case.  The proportion of cases, in sum, in which CBR can generate a “wrong” answer is much higher in ones that defendants deserve to win than in ones in which the prosecution does.

This feature of the model is an artifact of the strong “Type 1” error bias of the “beyond a reasonable doubt” standard.  The “preponderance of the evidence” standard, in contrast, is theoretically neutral between “Type 1” and “Type 2” errors.  Accordingly, were we to treat the simulated cases as “civil” rather than “criminal” ones, the false “liability” outcomes and false “no liability” ones would be closer to the overall error rate of 16%.
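The bookkeeping behind those error categories is simple enough to sketch. The five probability pairs below are invented purely to exercise the function, and the thresholds (0.95 for "beyond a reasonable doubt," 0.5 for "preponderance") are the stylized ones used in this post; the function name and category labels are mine.

```python
def classify(true_probs, perceived_probs, threshold):
    """Tally outcomes by comparing the deserved result with the CBR-driven one."""
    counts = {"correct": 0, "false_conviction": 0, "false_acquittal": 0}
    for t, p in zip(true_probs, perceived_probs):
        deserved = t >= threshold     # what unbiased Bayesian weighting supports
        decided = p >= threshold      # what the biased factfinder concludes
        if deserved == decided:
            counts["correct"] += 1
        elif decided:
            counts["false_conviction"] += 1
        else:
            counts["false_acquittal"] += 1
    return counts

# Hypothetical cases: (true probability, perceived probability) of guilt/liability.
true_probs = [0.55, 0.60, 0.97, 0.99, 0.30]
perceived_probs = [0.97, 0.10, 0.99, 0.90, 0.05]

print("criminal (0.95):", classify(true_probs, perceived_probs, 0.95))
print("civil (0.50):   ", classify(true_probs, perceived_probs, 0.50))
```

Notice how the second case (true probability 0.60, perceived 0.10) counts as a correct acquittal under the 0.95 standard but as an error under the 0.5 standard: overconfidence in the defendant's favor is harmless only when the defendant was entitled to win anyway.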

Okay, I did this simulation once for 100 cases.  But let’s do it 1,000 times for 100 cases—so that we have a full-blown Monte Carlo simulation of the resplendent CBR at work!

These are the kernel distributions for the “accurate outcome,” “false acquittal,” and “false conviction” rates over 1000 trials of 100 cases each:

Okay—see you later!


Carlson, K.A., Meloy, M.G. & Russo, J.E. Leader‐driven primacy: using attribute order to affect consumer choice. Journal of Consumer Research 32, 513-518 (2006).

Carlson, K.A. & Russo, J.E. Biased interpretation of evidence by mock jurors. Journal of Experimental Psychology: Applied 7, 91-103 (2001).

I.J. Good, Weight of Evidence: A Brief Survey, in Bayesian Statistics 2: Proceedings of the Second Valencia International Meeting (J.M. Bernardo, et al. eds., 1985).

Keith J. Holyoak & Dan Simon, Bidirectional Reasoning in Decision Making by Constraint Satisfaction,  J. Experimental Psych. 128, 3-31 (1999).

Kahan, D.M. Laws of cognition and the cognition of law. Cognition 135, 56-60 (2015). 

Pennington, N. & Hastie, R. A Cognitive Theory of Juror Decision Making: The Story Model. Cardozo L. Rev. 13, 519-557 (1991).

Simon, D. A Third View of the Black Box: Cognitive Coherence in Legal Decision Making. Univ. Chi. L.Rev. 71, 511-586 (2004).

Scurich, N. The Dynamics of Reasonable Doubt. (Ph.D. dissertation, University of Southern California, 2012). 

Simon, D., Pham, L.B., Le, Q.A. & Holyoak, K.J. The Emergence of Coherence over the Course of Decisionmaking. J. Experimental Psych. 27, 1250-1260 (2001).

CBR ... frankenstein's monster of law & psychology...



Report from "Law & Cognition" class: Are “rules of evidence impossible”? Part 1 

Well, I didn't do a good job of sharing the to & fro of this semester's Law & Cognition seminar w/ the 14 billion of you who signed up to take the course on-line. I'm happy to refund your enrollment fees--I actually parlayed them into a sum 10^3 x as large by betting incredulous behavioral economists that P(H|HHH) < P(H) when sampling from finite sequences w/o replacement-- but stay tuned & I'll try to fill you in over time...

If you’re a Bayesian, you’ll easily get how the Federal Rules of Evidence work. 

But if you accept that “coherence-based reasoning” characterizes juries’ assessments of facts (Simon, Pham, Le & Holyoak 2001; Carlson & Russo 2001), you’ll likely conclude that administering the Rules of Evidence is impossible.

Or so it seems to me.  I’ll explain but it will take some time—about 3 posts’ worth.

The "Rules of Evidence Impossibility Proof"--Paaaaaaart 1!

There are really only two major rules of evidence. There are a whole bunch of others but they are just variations on a theme.

The first is Rule 401, which states that evidence is “relevant” (and hence presumptively admissible under Rule 402) if it “has any tendency to make a fact  [of consequence to the litigation] more or less probable” in the assessment of a reasonable factfinder.

As Richard Lempert observed (1977) in his classic paper Modeling Relevance, Rule 401 bears a natural Bayesian interpretation.

The “likelihood ratio” rendering of Bayes’s Theorem—Posterior odds = Prior odds x Likelihood Ratio—says that one should update one’s existing or “prior” assessment of the probability of some hypothesis (expressed in odds) by a factor that reflects how much more consistent the new information is with that hypothesis than with some rival hypothesis.  If this factor—the likelihood ratio—is greater than one, the probability of the hypothesis increases; if it is less than one, it decreases.

Accordingly, by defining as “relevant” any evidence that gives us reason to treat a “fact of consequence” as “more or less probable,” Rule 401 indicates that evidence should be treated as relevant (and thus presumptively admissible) so long as it has a likelihood ratio different from 1—the factor by which one should revise one’s prior odds when new evidence is equally consistent with the hypothesis and with its negation.
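To make the odds form of Bayes's Theorem concrete, here is a minimal Python sketch. The function names and the numbers (prior odds of 1:4, likelihood ratios of 3 and 1) are illustrative assumptions of mine, not anything drawn from Lempert or from the text of the Rules.

```python
def posterior_odds(prior_odds, likelihood_ratio):
    """Odds form of Bayes's Theorem: posterior odds = prior odds x likelihood ratio."""
    return prior_odds * likelihood_ratio


def odds_to_probability(odds):
    """Convert odds in favor of a hypothesis into a probability."""
    return odds / (1.0 + odds)


def is_relevant(likelihood_ratio):
    """Bayesian reading of Rule 401 (an interpretation, not the rule's text):
    evidence is relevant when its likelihood ratio differs from 1."""
    return likelihood_ratio != 1.0


prior = 0.25  # hypothetical prior odds of 1:4, i.e. a probability of 0.20

# Evidence three times more consistent with the hypothesis than with its rival:
# the probability of the hypothesis goes up.
print(odds_to_probability(posterior_odds(prior, 3.0)))

# Evidence equally consistent with both: the probability does not move.
print(odds_to_probability(posterior_odds(prior, 1.0)))
```

On this reading, an item with a likelihood ratio of exactly 1 is the only kind that leaves the factfinder where she started, which is why it is the only kind that fails the relevance test.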


Second is Rule 403, which states that “relevant evidence” should be excluded if its “probative value is substantially outweighed by . . . unfair prejudice.”  Evidence is understood to be “unfairly prejudicial” when (the Advisory Committee Notes tell us) it has a “tendency to suggest decision on an improper basis.” 

There’s a natural Bayesian rendering of this concept, too: because the proper basis for decision reflects the updating of one’s priors by a factor equal to the product of the likelihood ratios associated with all the (independent) items of proof, evidence is prejudicial when it induces the factfinder to weight items of proof in a manner inconsistent with their true likelihood ratios.

[Image: Lempert circa 1977 (outside Studio 54, during break from forensic science investigation of then-still unsolved Son of Sam killing spree)]

An example would be evidence that excites a conscious intention--born perhaps of animus, or alternatively of sympathy--to reach a particular result regardless of the Bayesian import of the proof in the case.

More interestingly, a piece of evidence might be “unfairly prejudicial” if it triggers some unconscious bias that skews the assignment of the likelihood ratio to that or another piece of evidence (Gold 1982).

E.g., it is sometimes said (I think without much basis) that jurors “overvalue” evidence of character traits—that is, that they assign to a party’s disposition a likelihood ratio, or degree of weight, incommensurate with what it is actually due when assessing the probability that the party acted in a manner that reflected such a disposition on a particular occasion (see Fed. R. Evid. 404).

Or the “unfairly prejudicial effect” might consist in the tendency of evidence to excite cognitive dynamics that bias the weight assigned other pieces of evidence (or all of it). Evidence that an accident occurred, e.g., might trigger “hindsight bias,” causing the factfinder to assign more weight than is warranted to evidence that bears on how readily that accident could have been foreseen before its occurrence (Kamin & Rachlinski 1995).

By the same token, evidence that excites “identity-protective cognition” might unconsciously motivate a factfinder to selectively credit or dismiss (i.e., opportunistically adjust the likelihood ratio of) all the evidence in the case in a manner geared to reaching an outcome that affirms rather than denigrates the factfinder’s cultural identity (Kahan 2015).

Rule 403 directs the judge to weigh probity and prejudice.

Again, there’s a Bayesian rendering: a court should exclude a “relevant” item of proof as “unfairly prejudicial” when the marginal distortion of accuracy associated with the incorrect likelihood ratio that admitting it will induce the factfinder to assign to that or any other items of proof is bigger than the marginal distortion of accuracy associated with constraining the factfinder to assign that item of proof a likelihood ratio of 1, which is the practical effect of excluding it (Kahan 2010).  

[Image: click me & behold what it looks like to do Bayesian analysis of evidence rules *after* emerging from a night of partying at Studio 54 circa 1977!]

If you work this out, you’ll see (perhaps counterintuitively, perhaps not!) that courts should be much more reluctant to exclude evidence on Rule 403 grounds in otherwise close cases. As cases become progressively closer, the risk of error associated with under-valuing (by failing to consider) relevant evidence increases faster than the risk of error associated with over-valuing that or other pieces of evidence: from the point of view of deciding a case, being “overconfident” is harmless so long as one gets the right result. Likewise, the risk that admitting "prejudicial" evidence will result in error increases more rapidly as the remaining proof becomes weaker: that's the situation in which a factfinder is most likely to decide for a party that she wouldn't have but for her biased over-valuing of that item of proof or others (Kahan 2010).
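Here is a toy illustration of that trade-off. Everything in it is an assumption made for the sake of the example rather than a rendering of the analysis in Kahan (2010): a preponderance-style decision rule (posterior odds above 1), an item of proof whose true likelihood ratio is 2, and a factfinder who over-values it at 6. Excluding the item is modeled as forcing its likelihood ratio to 1.

```python
def verdict(other_proof_odds, likelihood_ratio, threshold=1.0):
    """Decide for the plaintiff when posterior odds exceed the threshold."""
    posterior = other_proof_odds * likelihood_ratio
    return "plaintiff" if posterior > threshold else "defendant"


def compare(other_proof_odds, true_lr, perceived_lr):
    """Compare admitting an over-valued item with excluding it.

    'correct' is the verdict a factfinder would reach using the true
    likelihood ratio; admission uses the distorted ratio; exclusion
    uses a ratio of 1 (the item is simply not considered).
    """
    correct = verdict(other_proof_odds, true_lr)
    return {
        "correct": correct,
        "admit_ok": verdict(other_proof_odds, perceived_lr) == correct,
        "exclude_ok": verdict(other_proof_odds, 1.0) == correct,
    }


# Hypothetical close case: the rest of the proof puts the odds at 0.8.
print(compare(0.8, true_lr=2.0, perceived_lr=6.0))

# Hypothetical weak case: the rest of the proof puts the odds at 0.3.
print(compare(0.3, true_lr=2.0, perceived_lr=6.0))
```

With these made-up numbers, admitting the over-valued item in the close case leaves the factfinder overconfident but right, while excluding it flips the verdict to the wrong side. In the weak case the pattern reverses: admission produces the error and exclusion avoids it.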

For an alternative analysis, consider Friedman (2003). I think he's wrong but for sure maybe I am! You tell me!

The point is how cool it is--how much structure & discipline it adds to the analysis--to conceptualize Rules of Evidence as an instrument for closing the gap between what a normatively desirable Bayesian assessment of trial proof would yield and what a psychologically realistic account of human information processing tells us to expect (someday, of course, we'll replace human legal decisionmakers with AI evidence-rule robots! but we aren't quite there yet ...).

Let's call this approach to understanding/perfecting evidence law the "Bayesian Cognitive Correction Model" (BCCM).

But is BCCM itself psychologically realistic?  

Is it plausible to think a court can reliably “maximize” the accuracy of adjudication by this sort of cognitive fine-tuning of the trial proof?

Not if you think that coherence-based reasoning  (CBR) is one of the reasoning deficiencies that a court needs to anticipate and offset by this strategy.

I’ll describe how CBR works in part 2 of this series—and then get to the “impossibility proof” in part 3!


Carlson, K.A. & Russo, J.E. Biased interpretation of evidence by mock jurors. Journal of Experimental Psychology: Applied 7, 91-103 (2001).

Friedman, R.D. Minimizing the Jury Over-valuation Concern. Mich. State L. Rev. 2003, 967-986 (2003).

Gold, V.J. Federal Rule of Evidence 403: Observations on the Nature of Unfairly Prejudicial Evidence. Wash. L. Rev. 58, 497 (1982).

Kahan, D.M. The Economics—Conventional, Behavioral, and Political—of "Subsequent Remedial Measures" Evidence. Columbia Law Rev 110, 1616-1653 (2010).

Kahan, D.M. Laws of cognition and the cognition of law. Cognition 135, 56-60 (2015).

Kamin, K.A. & Rachlinski, J.J. Ex Post ≠ Ex Ante - Determining Liability in Hindsight. Law Human Behav 19, 89-104 (1995).

Lempert, R.O. Modeling Relevance. Mich. L. Rev. 75, 1021-57 (1977).

Simon, D., Pham, L.B., Le, Q.A. & Holyoak, K.J. The Emergence of Coherence over the Course of Decisionmaking. J. Experimental Psych. 27, 1250-1260 (2001).


My remote post-it notes for my HLS African-American teachers


ISO: A reliable & valid public "science literacy" measure

From revision to “Ordinary Science Intelligence”: A Science-Comprehension Measure for Study of Risk and Science Communication, with Notes on Evolution and Climate Change . . . .

 2. What and why?

The validity of any science-comprehension instrument must be evaluated in relation to its purpose. The quality of the decisions ordinary individuals make in myriad ordinary roles--from consumer to business owner or employee, from parent to citizen--will depend on their ability to recognize and give proper effect to all manner of valid scientific information (Dewey 1910; Baron 1993). It is variance in this form of ordinary science intelligence--and not variance in the forms or levels of comprehension distinctive of trained scientists, or the aptitudes of prospective science students--that OSI_2.0 is intended to measure.

This capacity will certainly entail knowledge of certain basic scientific facts or principles. But it will demand as well various forms of mental acuity essential to the acquisition and effective use of additional scientific information. A public science-comprehension instrument cannot be expected to discern proficiency in any one of these reasoning skills with the precision of an instrument dedicated specifically to measuring that particular form of cognition. It must be capable, however, of assessing the facility with which these skills and dispositions are used in combination to enable individuals to successfully incorporate valid scientific knowledge into their everyday decisions.

A valid and reliable measure of such a disposition could be expected to contribute to the advancement of knowledge in numerous ways. For one thing, it would facilitate evaluation of science education across societies and within particular ones over time (National Science Board 2014). It would also enable scholars of public risk perception and science communication to more confidently test competing conjectures about the relevance of public science comprehension to variance in--indeed, persistent conflict over--contested risks, such as climate change (Hamilton 2011; Hamilton, Cutler & Schaefer 2012), and controversial science issues such as human evolution (Miller, Scott & Okamoto 2006). Such a measure would also promote ongoing examination of how science comprehension influences public attitudes toward science more generally, including confidence in scientific institutions and support for governmental funding of basic science research (e.g., Gauchat 2011; Allum, Sturgis, Tabourazi, & Brunton-Smith 2008). These results, in turn, would enable more critical assessments of the sorts of science competencies that are genuinely essential to successful everyday decisionmaking in various domains--personal, professional, and civic (Toumey 2011).

In fact, it has long been recognized that a valid and reliable public science-comprehension instrument would secure all of these benefits. The motivation for the research reported in this paper is widespread doubt among scholars that prevailing measures of public “science literacy” possess the properties of reliability and validity necessary to attain these ends (e.g., Stocklmayer & Bryant 2012; Roos 2012; Guterbock et al. 2011; Pardo & Calvo 2004). OSI_2.0 was developed to remedy these defects.

The goal of this paper is not only to apprise researchers of OSI_2.0’s desirable characteristics in relation to other measures typically featured in studies of risk and science communication. It is also to stimulate these researchers and others to adapt and refine OSI_2.0, or simply devise a superior alternative from scratch, so that researchers studying how risk perception and science communication interact with science comprehension can ultimately obtain the benefit of a scale more distinctively suited to their substantive interests than are existing ones.


Allum, N., Sturgis, P., Tabourazi, D. & Brunton-Smith, I. Science knowledge and attitudes across cultures: a meta-analysis. Public Understanding of Science 17, 35-54 (2008).

Baron, J. Why Teach Thinking? An Essay. Applied Psychology 42, 191-214 (1993).

Dewey, J. Science as Subject-matter and as Method. Science 31, 121-127 (1910).

Gauchat, G. The cultural authority of science: Public trust and acceptance of organized science. Public Understanding of Science 20, 751-770 (2011).

Hamilton, L.C. Education, politics and opinions about climate change evidence for interaction effects. Climatic Change 104, 231-242 (2011).

Hamilton, L.C., Cutler, M.J. & Schaefer, A. Public knowledge and concern about polar-region warming. Polar Geography 35, 155-168 (2012).

Miller, J.D., Scott, E.C. & Okamoto, S. Public acceptance of evolution. Science 313, 765 (2006).

National Science Board. Science and Engineering Indicators, 2014 (National Science Foundation, Arlington, Va., 2014).

Pardo, R. & Calvo, F. The Cognitive Dimension of Public Perceptions of Science: Methodological Issues. Public Understanding of Science 13, 203-227 (2004).

Roos, J.M. Measuring science or religion? A measurement analysis of the National Science Foundation sponsored science literacy scale 2006–2010. Public Understanding of Science (2012).

Stocklmayer, S.M. & Bryant, C. Science and the Public--What should people know? International Journal of Science Education, Part B 2, 81-101 (2012).


The "living shorelines" science communication problem: individual cognition situated in collective action

Extending its Southeast Florida Evidence-based Science Communication Initiative, CCP is embarking on a field-research project on "living shoreline" alternatives/supplements to "hardened armoring" strategies for offsetting the risks of rising sea levels. The interesting thing about the project (or one of the billion interesting things about it) is that it features the interaction of knowledge and expectations.

"Living shorelines" offer the potential for considerable collective benefits.  But individuals who learn of these potential benefits will necessarily recognize that the benefit they can expect to realize from taking or supporting action to implement this strategy is highly contingent on the intention of others to do the same. Accordingly, "solving" this "communication problem" necessarily involves structuring a communication process in which parties learn simultaneously about both the utility of "living shorelines" and the intentions of other parties to contribute to implementing them.

The project thus highlights one of the central features of the "science of science communication" as a "new political science": its focus not only on promoting clarity of exposition and public comprehension but on attending as well to the myriad social processes by which members of the public come to know what's known by science and give it due effect in their lives.

Elevating “Living Shorelines” with Evidence-based Science Communication

1. Overview. The urgency of substantial public investments to offset the impact of rising sea levels associated with climate change is no longer a matter of contention for coastal communities in Florida.  What remains uncertain is only the precise form of such undertakings.

This project will use evidence-based science communication to enrich public engagement with “living shoreline” alternatives (e.g., mangrove habitats, oyster beds, dune and wetland restoration) to “hardened armoring” strategies (concrete seawalls, bunkers, etc.). “Living shorelines” offer comparable protection while avoiding negative environmental effects--beachfront erosion, the loss of shoreline vegetation, resulting disruption of natural ecosystems, and visual blight--that themselves diminish community wellbeing. The prospect that communities in Southern Florida will make optimal use of “living shorelines,” however, depends on cultivating awareness of their myriad benefits among a diffuse set of interlocking public constituencies. The aim of the proposed initiative is to generate the forms of information and community interactions necessary to enable “living shorelines” to assume the profile that they should in ongoing democratic deliberations over local climate adaptation. . . .

3. Raising the profile of “living shorelines.” There are numerous “living shoreline” alternatives to hardened armoring strategies. Mangroves--densely clumped shrubs of thick green shoots atop nests of partially submerged roots--have traditionally combatted the impact of rising sea levels by countering erosion and dissipating storm surges. Coral reefs furnish similar protection. Sand dunes provide a natural fortification, while wetland restorations create a buffer. There are also many “hybrid” strategies such as rutted walls congenial to vegetation, and rock sills supportive of oyster beds. These options, too, reduce reliance on the forms of hardened armoring that impose the greatest ecological costs.

As a policy option, however, living shoreline strategies face two disadvantages. The first is the longer time horizon for return on investment. A concrete seawall begins to generate benefits immediately, while natural-shoreline alternatives attain maximum benefit only after a period of years. This delay in value is ultimately offset by the need to augment or replace hardened armoring as sea levels continue to rise; the protective capacity of natural barriers “rises” naturally along with sea level, and such barriers thus have a longer lifespan. However, the natural bias of popular political processes to value short- over long-term gains and to excessively discount future costs handicaps “living shorelines” relative to their competitors.

The second is the diffuse benefits that living shorelines confer. Obviously, they protect coastal property residents. But they also confer value on a wide range of third parties--individuals who enjoy natural beach habitats, but also businesses, such as tourism and fishing, that depend on the ecological systems disrupted by armoring.

In addition, the value of coastal property will often be higher in a community that makes extensive use of “living shorelines”, which tend to be more aesthetically pleasing than concrete barriers and bunkers.  But the individual property owner who invests in erecting and maintaining a living shoreline alternative won’t enjoy this benefit unless other owners in his or her residential area take the same action.  As with any public good, the private incentive to contribute will lag behind the social benefit.

The remedy for overcoming these two challenges is to simultaneously widen and target public appreciation of the benefits of natural shoreline protections. The constituencies that would enjoy the externalized benefits of natural shoreline strategies--particularly the commercial ones--must be alerted to the stake they have in the selection of this form of coastal property protection. Likewise, business interests, including construction firms, must be furnished with a vivid appreciation of the benefits they could realize by servicing the demand for “living shorelines” protections, including both their creation and their maintenance. Recognizing that local coastal property owners lack adequate incentives to invest in natural coastline protections on their own, these interests could be expected to undertake the burden of advocating supplemental public investments. The voice of these groups in public deliberations will help to offset the natural tendency of democratic processes to overvalue short- over longer-term interests--as would the participation of financial institutions and other actors that naturally discount the current value of community assets and businesses appropriately based on the anticipated need for future infrastructure support. The prospect of public subsidies can in turn be used to reinforce the incentives of local property owners, whose consciousness of the prospect of widespread use of natural shoreline protections will supply them with motivation to support public provisioning and to make the personal investments necessary to implement this form of climate adaptation.

The project is geared toward stimulating these processes of public engagement.  By furnishing the various constituencies involved with the forms of information most suited to enabling their recognition of the benefits of natural shoreline strategies, the project will elevate the profile of this strategy and put it on an equal footing with hardened armoring in public deliberations aimed at identifying the best, science-informed policies for protecting communities from rising sea levels and other climate impacts.

4.  Evidence-based science communication and living shorelines. . . . .

[T]he challenge of elevating the profile of “living shorelines” features the same core structural elements that have been the focus of CCP’s science-communication support research on behalf of the Southeast Florida Regional Climate Compact. Science communication, this work suggests, should be guided by a “multi-public” model.  First are proximate information evaluators: typically government decisionmakers, their primary focus is on the content of policy-relevant science. Next are intermediate evaluators, who consist largely of organized nongovernmental groups, including ones representing formal and informal networks of local businesses, local property owners, and environmental and conservation organizations: their focus is primarily on how proposed policies affect their distinctive goals and interests. Finally there are remote evaluators: ordinary citizens, whose engagement with policy deliberations is only intermittent and who use heuristic strategies to assure themselves of the validity of the science that informs proposed policies.

The current project will use this model to guide development of communication materials suited to the public constituencies whose engagement is essential to elevating the deliberative profile of “living shorelines.”  Proximate evaluators here comprise the government officials—mainly county land use staff but also elected municipal officials—and also homeowners, including homeowner associations, in a position to make personal investments in “living shorelines” protections. With respect to these information consumers, the project would focus on maximizing comprehension  of the information embodied in TNC’s computer simulations. Existing research identifies systematic differences in how people engage quantitative information. Experimental studies would be conducted to fashion graphic presentation modes that anticipate these diverse information-processing styles.

The intermediate evaluators in this context consist of the wide range of private groups that stand to benefit indirectly from significant investment in “living shorelines.”  These groups will be furnished information in structured deliberations that conform to validated protocols for promoting open-minded engagement with scientific information.

These sessions, moreover, will themselves be used to generate materials that can be used to develop information appropriate for remote evaluators. Research conducted by CCP in field-based science communication initiatives suggests that the most important cue that ordinary citizens use to assess major policy proposals is the position of other private citizens they view as socially competent and informed and whose basic outlooks they share.  In particular, the attitude that these individuals evince through their words and actions vouches for the validity of policy-relevant science that ordinary members of the public do not have either the time or expertise to assess on their own.

From experience in previous evidence-based science communication projects, CCP has learned that interactions taking the form of the proposed structured deliberations among intermediate evaluators furnish a rich source of content for fashioning materials that can perform this vouching function.  The participants in such deliberations are highly likely to possess the characteristics and backgrounds associated with the socially competent, knowledgeable sources whose vouching for policy-relevant science helps orient ordinary citizens.

Moreover, the participants in such sessions are likely to be socially diverse.  This feature of such sessions is highly desirable because the identity of individuals who perform this critical vouching function, work in and outside the lab confirms, varies across diverse cultural subcommunities. In addition, being able to see individuals who perform this role within one community deliberating constructively with their counterparts in others assures ordinary citizens from all of these communities that positions on the issue at hand are not associated with membership in competing cultural groups. This effect, CCP field research suggests, has been instrumental to the success of the diverse member communities of the Southeast Florida Climate Compact in protecting their deliberations from the influences that polarize citizens generally over climate change science.

Accordingly, using methods developed in earlier field work, CCP will use the intermediate evaluator deliberations to develop video and other materials that can be used to test how members of the public react as they learn about “living shorelines” as a policy option for their communities. The results of such tests can then be incorporated into communication materials geared to generating positive, self-reinforcing forms of interactions among the members of those communities.

Finally, evidence of the positive interactions of all these groups can be used to help form the state of shared expectations necessary to assure that “living shorelines” receive attention in public deliberation commensurate with the value they can confer on the well-being of communities that use this option. . . .


CCP Lab Meeting # 9073 ... 


Another day, another lecture

This one at Annenberg Public Policy Center last week, to discuss progress in one of our collaborative initiatives: evidence-based science documentary filmmaking.

We got to talk about the Pakistani Dr & Kentucky Farmer, of course, and also how much Krista would like a cool documentary on evolution.

Slides here.


Making sense of the " 'hot hand fallacy' fallacy," part 1

It never fails! My own best efforts (here & here) to explain the startling and increasingly notorious paper by Miller & Sanjurjo have prompted the authors to step forward and try to restore the usual state of perfect comprehension enjoyed by the 14.3 billion regular readers of this blog. They have determined, in fact, that it will take three separate guest posts to undo the confusion, so apparently I've carried out my plan to a [GV]T. 

As cool as the result of the M&S paper is, I myself remain fascinated by what it tells us about cognition, particularly among those with exquisitely fine-tuned statistical intuitions.  How did the analytical error they uncovered in the classic "hot hand fallacy" studies remain undetected for some thirty years, and why does it continue to provoke stubborn resistance on the part of very very smart people??  To Miller & Sanjurjo's credit, they have happily and persistently shouldered the immense burden of explication necessary to break the grip of the pesky intuition that their result "just can't be right!"

 Joshua B. Miller & Adam Sanjurjo

Thanks for the invitation to post here Dan!

Here’s our plan for the upcoming 3 posts:

  1.  Today’s plan: A bit of the history of the hot hand fallacy, then clearly stating the bias we find, explaining why it invalidates the main conclusion of the original hot hand fallacy study (1985), and further, showing that correcting for the bias flips the conclusion of the original data, so that it now can be used as evidence supporting the existence of meaningfully large hot hand shooting.

  2. Next post: Provide a deeper understanding of how the bias emerges.

  3. Final post: Go deeper into potential implications for research on the hot hand effect, hot hand beliefs, and the gambler’s fallacy.

Part I

In the seminal hot hand fallacy paper, Gilovich, Vallone and Tversky (1985; “GVT”, also see the 1989 Tversky & Gilovich “Cold Facts” summary paper) set out to conduct a truly informative scientific test of hot hand shooting. After studying two types of in-game shooting data, they conducted a controlled shooting study (experiment) with the Cornell University men’s and women’s basketball teams. This was an effective "...method for eliminating the effects of shot selection and defensive pressure" that were present as confounds in their analysis of game data (we will return to the issue of game data in a follow-up post; for now click to the first page of Dixit & Nalebuff’s 1991 classic book “Thinking Strategically”, and this comment on Andrew Gelman’s blog).  While the common use of the term “hot hand” shooting is vague and complex, everybody agrees that it refers to a temporary elevation in a player’s ability, i.e. the probability of a successful shot.  Because the hot state is unobservable to the researcher (though perhaps not to the player/teammate/coach!), we cannot simply measure a player’s probability of success in the hot state; we need an operational definition.  A natural idea is to take a streak of sufficient length as a good signal of whether or not a player is in the hot state, and define a player as having the hot hand if his/her probability of success is greater after a streak of successful shots (hits) than after a streak of unsuccessful shots (misses).  GVT designed a test for this.

[Image: Adam Sanjurjo enjoying snacks in green room before Oprah Winfrey show appearance]

Suppose we wanted to test whether Stephen Curry has the hot hand; how would we apply GVT’s test to Curry?  The answer is that we would have Curry attempt 100 shots at locations from which he is expected to have a 50% chance of success (like a coin).  Next, we would calculate Curry’s field goal percentage on those shots that immediately follow a streak of successful shots (hits), and test whether it is bigger than his field goal percentage on those shots that immediately follow a streak of unsuccessful shots (misses); the larger the difference that we observe, the stronger the evidence of the hot hand.  GVT performed this test on the Cornell players, and found that this difference in field goal percentages was statistically significant for only one of the 26 players (two sample t-test), which is consistent with the chance variation that the coin model predicts.

Now, one can ask oneself: if Stephen Curry doesn’t get hot, that is, for each of his 100 shot attempts he has exactly a 50% chance of hitting his next shot, then what would I expect his field goal percentage to be when he is on a streak of three (or more) hits? Similarly, what would I expect his field goal percentage to be when he is on a streak of three (or more) misses?

Following GVT’s analysis, one can form two groups of shots:

Group “3hits”: all shots in which the previous three shots (or more) were a hit,

Group “3misses”: all shots in which the previous three shots (or more) were a miss,

[Image: M&S working paper (5000th printing; currently sold out)]

From here, it is natural to reason as follows: if Stephen Curry always has the same chance of success, then he is like a coin, so we can consider each group of shots as independent; after all, each shot has been assigned at random to one of three groups: “3hits,” “3misses,” or neither.  So far this reasoning is correct.  Now, GVT (implicitly) took this intuitive reasoning one step further: because all shots, which are independent, have been assigned at random to each of the groups, we should expect the field goal percentages to be the same in each group.  This is the part that is wrong.

[Image: Joshua Miller, in Las Vegas after winning $5 million from economists who accepted his challenge to bet against P(H|HHH) < P(H) when sampling from finite sequence of coin tosses]

Where does this seemingly fine thinking go wrong?  The first clue that there is a problem is that the variable that is being used to assign shots to groups is also showing up as a response variable in the computation of the field goal percentage, though this does not fully explain the problem.  The key issue is that there is a bias in how shots are being selected for each group.  Let’s see this by first focusing on the “3hits” group. Under the assumptions of GVT’s statistical test, Stephen Curry has a 50% chance of success on each shot, i.e. he is like a coin: heads for hit, and tails for miss.  Now, suppose we plan on flipping a coin 100 times, then selecting at random among the flips that are immediately preceded by three consecutive heads, and finally checking to see if the flip we selected is a heads, or a tails. Now, before we flip, what is the probability that the flip we end up selecting is a heads?  The answer is that this probability is not 0.50, but 0.46!  Herein lies the selection bias.  The flips that are being selected for analysis are precisely the flips that are immediately preceded by three consecutive heads.  Now, returning to the world of basketball shots, this way of selecting shots for analysis implies that for the “3hits” group, there would be a 0.46 chance that the shot we are selecting is a hit, and for the “3misses” group, there would be a 0.54 chance that the shot we are selecting is a hit.
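The selection procedure just described is easy to simulate. The sketch below is a minimal Monte Carlo check, not code from the M&S paper; the function names, trial count, and seed are assumptions made for illustration. It flips a fair coin 100 times, picks at random one flip that is immediately preceded by three consecutive heads, and records whether the picked flip is heads.

```python
import random


def pick_flip_after_streak(rng, n_flips=100, streak=3):
    """Flip a fair coin n_flips times, then select at random one flip that is
    immediately preceded by `streak` consecutive heads.

    Returns True if the selected flip is heads, False if it is tails, and
    None in the rare sequence where no flip qualifies.
    """
    flips = [rng.random() < 0.5 for _ in range(n_flips)]
    candidates = [i for i in range(streak, n_flips)
                  if all(flips[i - streak:i])]
    if not candidates:
        return None
    return flips[rng.choice(candidates)]


def estimate_heads_probability(n_trials=10000, seed=1):
    """Monte Carlo estimate of the chance that the selected flip is heads."""
    rng = random.Random(seed)
    outcomes = [pick_flip_after_streak(rng) for _ in range(n_trials)]
    outcomes = [o for o in outcomes if o is not None]
    return sum(outcomes) / len(outcomes)


if __name__ == "__main__":
    print(estimate_heads_probability())
```

With enough trials the estimate should settle near the 0.46 figure given above rather than 0.50. By symmetry, running the same procedure on flips preceded by three consecutive tails gives the complementary figure near 0.54.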

Therefore, if Stephen Curry does not get hot, i.e. if he always has a 50% chance of success for the 100 shots we study, we should expect him to shoot 46% after a streak of three or more hits, and 54% after a streak of three or more misses. This is the order of magnitude of the bias that was built into the original hot hand study, and this is the bias that is depicted in Figure 2 on page 13 of our new paper, and a simpler version of this figure is below. This bias is large in basketball terms: a difference of more than 8 percentage points is nearly the difference between the median NBA three-point shooter and the very best. Another way to look at this bias is to imagine what would happen if we were to invite 100 players to participate in GVT’s experiment, with each player shooting from positions in which the chance of success on each shot were 50%. For each player, check to see if his/her field goal percentage after a streak of three or more hits is higher than his/her field goal percentage after a streak of three or more misses. For how many players should we expect this to be true? Correct answer: 40 out of 100 players.
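
The 100-player thought experiment can be sketched the same way. Again, this is an illustrative simulation under stated assumptions, not the analysis in the paper: every simulated player takes 100 independent shots with a 50% chance of success, the helper names are hypothetical, and players who never produce both a three-hit streak and a three-miss streak are simply dropped.

```python
import random


def pct_after(shots, k, outcome):
    """Hit rate on shots immediately preceded by k consecutive `outcome`s."""
    picked = [
        shots[i]
        for i in range(k, len(shots))
        if all(s == outcome for s in shots[i - k:i])
    ]
    return sum(picked) / len(picked) if picked else None


def simulate_players(n_players=5000, n_shots=100, k=3, seed=0):
    """Return (share of players who shoot better after hit streaks, mean difference).

    Each player is a fair coin, so any gap between the two conditional
    percentages comes from how the shots are selected, not from a hot hand.
    """
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_players):
        shots = [rng.random() < 0.5 for _ in range(n_shots)]
        after_hits = pct_after(shots, k, True)
        after_misses = pct_after(shots, k, False)
        if after_hits is not None and after_misses is not None:
            diffs.append(after_hits - after_misses)
    share_better = sum(d > 0 for d in diffs) / len(diffs)
    mean_diff = sum(diffs) / len(diffs)
    return share_better, mean_diff


share, gap = simulate_players()
print(f"share shooting better after hit streaks: {share:.2f}")
print(f"mean (after hits - after misses): {gap:+.3f}")
```

Under these assumptions the share should fall well short of one half, and the mean difference should be negative by roughly the 8 percentage points discussed above.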

[Image caption: Adam Sanjurjo Hermida, professional tennis player currently ranked 624th in world. Very hot hand predicted by M&S sometime in April 2016]

This selection bias is large enough to invalidate the main conclusion of GVT's original study, without having to analyze any data. However, beyond this “negative” message, there is also a way forward. Namely, we can re-analyze the original Cornell dataset, but in a way invulnerable to the bias. It turns out that when we do this, we find considerable evidence of the hot hand in this data. First, if we look at Table 4 in GVT (page 307), we see that, on average, players shot around 3.5 percentage points better when on a hit streak of three or more shots, and that 64% of the players shot better when on a hit streak than when on a miss streak. While GVT do not directly analyze these summary averages, given our knowledge of the bias, they are telling (in fact, you can do much more with Table 4; see Kenny LJ respond to his own question here). With the correct analysis (described in the next post), there is statistically significant evidence of the hot hand in the original data set, and, as can be seen in Table 2 on page 23 of our new paper, the point estimate of the average hot hand effect size is large (further details in our “Cold Shower” paper here). If one adjusts for the bias, what one now finds is that: (1) hitting a streak of three or more