
Thursday
Dec 10, 2015

Disentanglement principle corollary no. 16a: "You don't have to choose ... between being a reality tv star & being excited to learn what science knows (including what it knows about how people come to know what's known by science)"

Sometimes 1 or 2 of the 14 billion regular followers of this blog ask, "are there really 14 billion regular followers of this blog?..."

Yeah. There really are!

Tuesday
Dec 8, 2015

"Hey Joe": "Practical scholarship" on climate "science communication"

Sorry for lack of context here, but my guess is that it will become clear enough after a few sentences.

Dear Joe:

I apologize for disparaging your work at the Society for Risk Analysis session yesterday.  You perceived my remarks that way, and on reflection I can see why you did, & why others likely formed the same impression.  I truly regret that.

In fact, it wasn’t your work that I meant to be criticizing. 

My intention was to respond to the argument you presented (with the admirable degree of clarity I wish I had been able to summon in response) in favor of “practical scholarship.”  Because you see, I don’t think the sort of work you defended is either practical or scholarly.

You proposed to those in the room that the empirical study of climate science communication should be evaluated in light of its contribution to a “goal” of promoting a “World War II-scale mobilization” of public opinion (I encourage you to post your slides; they were very well done).

Research aimed at identifying the significance of values & science comprehension for public conflict on climate change (the subject of the panel we were both on; great new research unveiled by the Shi, Visschers, Siegrist team!) doesn’t meet this criterion, you made clear. Indeed, it detracts from it, because, in your opinion, it implies change will take a “long time” (I disagree it implies any such thing but that’s another matter).

As an example of research that is “practical,” you offered your own, which you characterized as aimed at convincing democratic representatives that their prospects for re-election depend on honoring the sorts of “public preferences” revealed by the structured preference-elicitation  methods you described.

You also stated that your work, along with that of others, is intended to “create cover” for officials to take positions supportive of climate change policies (a common refrain among researchers who generate endless streams of public opinion polls purporting to find that there is in fact widespread public consensus for one or another climate change mitigation initiative).

We should all pitch in to help achieve this result, you exhorted.

Again, to be clear, my point is that this vision of empirical work on science communication is neither “scholarly” nor “practical.”

Scholarship—of the empirical variety, in any event—tries to help people figure out what’s true, particularly under conditions in which there are multiple plausible understandings of phenomena of consequence.  That’s what the scholarship on the relationship between “values” and “science literacy” that you disparaged is about.  The occasion for that scholarly inquiry is a practical one: to figure out what sorts of dynamics are blocking public engagement with the best available evidence on climate change.

What’s definitely not practical (as Theda Skocpol has noted) is to think that public opinion researchers can be mobilized into a project to “show” elected officials what the public “really” wants.

Elected officials are in the profession of satisfying the expectations of their constituents. They invest plenty of money, most of the time wisely, to figure out how to do that.

They know that surveys purporting to show that a “majority” of Republicans support “the EPA's greenhouse gas emission standards” are measuring non-opinion.   They know too that the sort of preference-elicitation methods you demonstrated—however truly valuable they might be for learning about cognition—are not modeling the decisionmaking dynamics that determine election outcomes. 

Most importantly, they know—because those who agree with your conception of “practical scholarship” are constantly proclaiming this-- that your goal is to create an impression in these actors for your own purposes: to help “shove” them into supporting a particular set of policies (enough with these “nudges” already, you inspiringly proclaimed: we are facing the moral equivalent of Hitler invading Europe!), not help them get re-elected. 

They know, in short, that “non-opinion” survey methods are actually intended to message them!  And I would have sort of thought this was obvious, but it’s not a very good “messaging strategy” to incessantly go on & on within earshot of Republicans about “strategies” for “overcoming” the “Republicans' cognitive resistance to climate mitigation.”

The targeted politicians (Democrat and Republican) therefore sensibly discount (ignore, really) everything produced by researchers who are following this "message the politicians" strategy.  They listen instead to the professionals, who tell them something very different from what these "practical scholars" are saying (over & over & over; “keep repeating—that it hasn't worked yet is proof that we just need to do it for longer!”--another refrain inside this bubble).  Politicians who take what these researchers say at face value, they’ve observed, get knocked out of office.

I believe there is plenty that science communication researchers  can do to help actual people, including elected officials, promote science-informed decisionmaking relating to climate change by collaborating with them to adapt and test lab insights to their real-world problems. 

The form of research that I think is best for that aims to help those decisionmakers change the meaning of climate change in their communities, so that discussions of it no longer are perceived as being about “whose side are you on” but instead about “what do we know, what more do we need to know, and what should we do.”

That research doesn't try to conjure a new world into existence by disseminating "studies" that constantly purport to find it already exists.

It tries to supply people who actually are acting to make such a world with empirical information that they can use to exercise their judgment as best as they can.

Indeed, what motivated my rebuke of you yesterday was frustration at how closely aligned the program you defended (very clearly, very articulately) is with divisive forms of partisan advocacy that actually perpetuate the social meanings that make climate change a “struggle for the soul of America” rather than a practical problem that all Americans, regardless of their cultural identities, have a common interest in fighting.

Frustration too at how much the sort of "practical" "scholarship" you called for is distracting and diverting and confusing people who are looking to empirical researchers for help.

At how self-defeating it obviously is ever to propose that a criterion other than “figuring out & sharing one’s best understanding of the truth on contested empirical issues” could possibly be practical.   

How twisted it is to call that singularly unscientific orientation  “science communication” research!

It's pretty simple really: Tell people what they need to know, not what they want to hear.

That’s both ethical and practical.

Again, sorry I disparaged your scholarly work, which I think can teach people a lot about how people think. 

The intended target was your conception of “practical scholarship.”  And I did very much intend to be critical of that view and of those who are propagating the mindset you very much evinced in your talk.

Yours,

Dan

 

p.s. My slides from my talk on the challenge of "unconfounding" knowledge & identity in measuring "climate change science comprehension."

Wednesday
Dec 2, 2015

Mine goes to 11 ... or 10, at least, for now

What to do when stuck in the Ft. Lauderdale airport b/c of a missed connecting flight to the Keys?....

See what happens when the "Rules of Evidence Are Impossible CBR Simulator" is expanded from "8 items of proof"-size cases to "10 items of proof"-size ones!

Lots of people, no doubt thinking of the wildly popular "Miller-Sanjurjo Turing Machine" (MSTM), have been writing to ask if a version of the CBR simulator will be made available for home use by CCPB subscribers... Stay tuned!

Tuesday
Dec 1, 2015

Cultural "fact polarization" trumps cultural "value" polarization -- a fragment

Working on this.  Rest "tomorrow."

1. The new politics of “fact polarization”

Polarization over questions of fact is one of the signature features of contemporary democratic political life.  Citizens divided over the relative weight of “liberty” and “equality” are less sharply divided today over the justice of progressive taxation (Moore 2015) than over the evidence that human  CO2 emissions are driving up global temperatures (Frankovic 2015).  Democrats and Republicans argue less strenuously about whether states should be permitted to require the "reading of the Lord's prayer" in school than whether permitting citizens to carry concealed handguns in public increases homicide rates—by multiplying the number of firearms in society—or instead decreases them by equipping law-abiding citizens to protect themselves from predation (Newport 2015).

Members of cultural groups that confer status to women for their mastery of domestic roles love their daughters as much as members of those who celebrate the world of commerce and public affairs as status-conferring arenas for men and women alike (Luker 1984). Yet the two cannot agree about the consequences of universally immunizing middle-school girls against the human papilloma virus: does that policy promote the girls’ health by protecting them later in life from an extremely prevalent sexually  transmitted disease linked to cervical cancer; or endanger them by lulling them into unprotected sex right now, thereby increasing their risks of becoming pregnant and of contracting other, even more deadly STDs (Kahan, Braman, Cohen, Gastil & Slovic 2010)?

These are admittedly complex questions.  But they are empirical ones. Values can’t supply the answers; only evidence can. The evidence that is relevant to any one of these factual issues, moreover, is completely distinct from the evidence relevant to any of the others.  There is simply no logical reason, in sum, for positions on these and various other policy-relevant facts (the safety of deep geologic isolation of nuclear wastes, the deterrent impact of the death penalty, the efficacy of invasive forms of surveillance to combat terrorism, etc.) to cluster at all, much less to form packages of beliefs that so strongly unite citizens of shared cultural commitments and so persistently divide citizens of opposing ones.

But there is a psychological explanation for today’s politics of “fact polarization.”  Or at least a very strong candidate explanation, the emergence of which has supplied an energizing focus for research and debate in the decision sciences over the course of the last decade. . . . 

Refs

Frankovic, K. Most Republicans do not think humans are causing climate change. YouGov (2015).

General Social Survey (2014).

Luker, K. Abortion and the politics of motherhood (University of California Press, Berkeley, 1984).

 

Saturday
Nov 28, 2015

Weekend update: Is critical reasoning domain independent or domain specific?... a fragment of an incomplete rumination

An adaptation of a piece of correspondence--one no longer, really, than this-- w/ a thoughtful person who proposed that people have "corrective mechanisms" for the kind of "likelihood ratio cascade" that I identified with "coherence based reasoning" and that I  asserted makes "rules of evidence" impossible:

What are these corrective mechanisms?

I ask not because I doubt they exist but because I suspect that they do -- & that their operation has evaded full understanding because of a mistaken assumption central to the contemporary study of cognition.

That assumption is that reasoning proficiencies--the capacity to recognize covariance, give proper effect to base rates, distinguish systematic relationships from chance co-occurrences, & perform like mental operations essential to making valid inferences--are more or less discrete, stand-alone "modules" within a person's cognitive repertoire.

If the modules are there, and are properly calibrated, a person will reliably summon them for any particular task that she happens to be doing that depends on that sort of mental operation.

Call this the "domain independent" conception (DI) of cognitive proficiency. DI is presupposed by standardized assessments like the Cognitive Reflection Test (Frederick 2005) and Numeracy (Peters et al. 2006), which purport to measure the specified latent reasoning capacities "in general," that is, abstracted from anything in particular one might use them for.

Another conception sees cognitive proficiency as intrinsically domain specific. On this view--call it the DS conception--it's not accurate to envision reasoning abilities of the sort I described as existing independently of the activities that people use them for (cf. Hetherington 2011).

Accordingly, a person who performs miserably in a context-free assessment of, say, the kind of logical-reasoning proficiency measured by an abstract version of the Wason Selection Task-- one involving cards with vowels and numbers on either side -- might in fact always (or nearly always!) perform that sort of mental operation correctly in all the real-world contexts that she is used to encountering that require it. In fact, people do very well at the Wason Selection Task when it is styled as something more familiar--like detecting a norm violator (Gigerenzer & Hug 1992).

In sum, reasoning proficiencies are not stand-alone modules but integral components of action-enabling mental routines that are reliably summoned to mind by a person's perception of the sorts of recurring problem situations those routines, including their embedded reasoning proficiencies, help her to negotiate.

DS is suspicious of standardized assessments, including the usual stylized word problems that are thought by decision scientists to evince one or another type of "cognitive bias."  By (very deliberately) effacing the contextual cues that summon to mind the mental routines and embedded reasoning proficiencies necessary to address recurring problem situations, such tests grossly overstate the "boundedness" of human rationality (Gigerenzer 2000).

Indeed, by abstracting from any particular use to which people might put the reasoning proficiencies they are evaluating, such assessments and problems are actually measuring only how good people are at doing tests. In fact, people can train themselves to become very proficient at a difficult type of reasoning task for purposes of taking an exam on it and then evince complete innocence of that same sort of knowledge in the real-world settings where it actually applies (DiSessa 1982)!

DI and DS have different accounts of "expertise" in fields that involve reasoning tasks that are vulnerable to recurring cognitive biases. DI  identifies that expertise with the cultivation of general, context-free habits of mind that evince the disposition to use "conscious, effortful" ("system 2") forms of information processing (Sunstein 2005).

DS, in contrast, asserts that "expertise" consists in the possession of  mental routines, and their embedded reasoning proficiencies, specifically suited for specialized tasks. Those mental routines  include the calibration of rapid, intuitive, pre-conscious, affective forms of cognition (or better, recognition) that reliably alert the expert to the need to bring certain conscious, effortful mental operations to bear on the problem at hand. The proper integration of reciprocal forms of intuitive and conscious forms of cognition tailored to specialized tasks is the essence of professional judgment.

Nonexperts can be expected to display one or another bias when confronted with those same problems.  But the reason isn't that the nonexpert "thinks differently" from the expert; it's that the expert has acquired through training and experience mental routines suited to do things that are different from anything the ordinary person has occasion to do in his or her life (Margolis 1987, 1993, 1996).

Indeed, if one confronts an expert with a problem divorced from all the cues that reliably activate the cognitive proficiencies she uses when she performs professional tasks, one is likely to find that the expert, too, is vulnerable to all manner of cognitive bias.

But if one infers from that that the expert therefore can't be expected to resist those biases in her professional domain, one is making DI's signature mistake of assuming that reasoning proficiencies are stand-alone modules that exist independent of mental routines specifically suited for doing particular things (cf. Kahan, Hoffman, Evans, Devins, Lucci & Cheng in press) ....

Or at least that is what a DS proponent would say.

She might then agree, too, that the reason-eviscerating quality of "coherence based reasoning" supplies us with grounds to professionalize fact-finding in legal proceedings.

Not because "jurors" or other "nonexperts" are "stupid." But because it is stupid to think that doing what is required to make accurate findings of fact in legal proceedings does not depend on the cultivation of habits of mind specifically suited for that task.

I tend to think the DS proponent comes closer to getting it right. But of course, I'm not really sure.

References

DiSessa, A.A. Unlearning Aristotelian Physics: A Study of Knowledge‐Based Learning. Cognitive science 6, 37-75 (1982).

Frederick, S. Cognitive Reflection and Decision Making. Journal of Economic Perspectives 19, 25-42 (2005).

Gigerenzer, G. Adaptive thinking : rationality in the real world (Oxford University Press, New York, 2000).

Gigerenzer, G. & Hug, K. Domain-specific reasoning: Social contracts, cheating, and perspective change. Cognition 43, 127-171 (1992). 

Hetherington, S.C. How to know : a practicalist conception of knowledge (J. Wiley, Chichester, West Sussex, U.K. ; Malden, MA, 2011).

Kahan, D.M., Hoffman, D.A., Evans, D., Devins, N., Lucci, E.A. & Cheng, K. 'Ideology' or 'Situation Sense'? An Experimental Investigation of Motivated Reasoning and Professional Judgment. U. Pa. L. Rev. 164 (in press).

Margolis, H. Dealing with risk : why the public and the experts disagree on environmental issues (University of Chicago Press, Chicago, IL, 1996).

Margolis, H. Paradigms and Barriers (1993).

Margolis, H. Patterns, thinking, and cognition : a theory of judgment (University of Chicago Press, Chicago, 1987).

Peters, E., Västfjäll, D., Slovic, P., Mertz, C.K., Mazzocco, K. & Dickert, S. Numeracy and Decision Making. Psychol Sci 17, 407-413 (2006).

Sunstein, C.R. Laws of fear : beyond the precautionary principle (Cambridge University Press, Cambridge, UK ; New York, 2005). 

 

Wednesday
Nov 25, 2015

"Inherent internal contradictions" don't cause bad institutions to collapse; they just suck ... "Rules of evidence are impossible," part 3 (another report for Law & Cognition seminar)

Nope. Can't be done. Impossible.

Time for part 3 of this series: Are Rules of Evidence Impossible?

The answer is yes, as I said at the very beginning.

But I didn’t say why & still haven’t.

Instead, I spent the first two parts laying the groundwork necessary for explanation.  Maybe you can build the argument on top of it yourself at this point?! If so, skip ahead to “. . . guess what?”—or even skip the rest of this post altogether & apply your reason to something likely to teach you something new!

But in the event you can’t guess the ending, or simply need your “memory refreshed” (see Fed. R. Evid. 612), a recap:

Where were we? In the first part, I described a conception of the practice of using “rules of evidence”—the Bayesian Cognitive Correction Model (BCCM). 

BCCM conceives of rules of evidence as instruments for “cognitively fine tuning” adjudication. By selectively admitting and excluding items of proof, courts can use the rules to neutralize the accuracy-diminishing impact of one or another form of biased information processing--from identity-protective reasoning to the availability effect, from hindsight bias to baserate neglect, etc.  The threat these dynamics pose to accurate factfinding is their tendency to induce the factfinder to systematically misestimate the weight, or in Bayesian terms the “likelihood ratio” (LR), to be assigned items of proof (Kahan 2015).
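To make the Bayesian framing concrete, here is a minimal numeric sketch (the numbers are mine and purely illustrative, not from any CCP study) of how a misestimated LR moves the posterior odds away from where they ought to be:

```python
# Toy illustration (assumed numbers): updating prior odds with a true vs. a
# biased likelihood ratio for a single item of proof.
prior_odds = 1.0        # factfinder starts at 1:1
true_LR = 4.0           # item is really 4x more consistent with liability
biased_LR = 8.0         # e.g., hindsight bias doubles its perceived weight

posterior_true = prior_odds * true_LR      # 4:1
posterior_biased = prior_odds * biased_LR  # 8:1

print(posterior_true / (1 + posterior_true))      # 0.80
print(posterior_biased / (1 + posterior_biased))  # ~0.89
```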

In part 2, I discussed a cognitive dynamic that has that sort of consequence: “coherence based reasoning” (CBR).

Under CBR (Simon 2004; Simon, Pham, Quang & Holyoak 2001; Carlson & Russo 2001), the factfinder’s motivation to find “coherence” in the trial proof creates a looping feedback effect.

Once the factfinder forms the perception that the accumulated weight of the evidence supports one side, he begins to inflate or discount the weight of successive items of proof as necessary to conform them to that position.  He also turns around and revisits already-considered items of proof and reweights them to make sure they fit that position, too. 

His reward is an exaggerated degree of confidence in the correctness of that outcome—and thus the peace of mind that comes from never ever having to worry that maybe, just maybe, he got the wrong answer.

The practical consequences are two.  First, by virtue of the exaggerated certainty the factfinder has in the result, he will sometimes rule in favor of a party that hasn’t carried its burden under a heightened standard of proof like, say, “beyond a reasonable doubt,” which reflects the law’s aversion to “Type 1” errors when citizens’ liberty is at stake.

Second, what position the factfinder comes to be convinced is right will be arbitrarily sensitive to the order of proof.  The same strong piece of evidence that a factfinder dismisses as inconsistent with what she is now committed to believing is true could have triggered a “likelihood ratio cascade” in exactly the opposite direction had that item of proof appeared “sooner”-- in which case the confidence it instilled in its proponent's case would have infected the factfinder's evaluation of all the remaining items of proof.

If you hung around after class last time for the “extra credit”/“optional” discussion, I used a computer simulation to illustrate these chaotic effects, and to show why we should expect the accuracy-eviscerating consequences of them to be visited disproportionately on innocent defendants in criminal proceedings.

This is definitely the sort of insult to rational-truth-seeking that BCCM was designed to rectify!

But guess what?

It can’t! The threat CBR poses to accuracy is one the BCCM conception of “rules of evidence” can’t possibly counteract!

As I explained in part 1, BCCM consists of three basic elements:

  1. Rule 401, understood as a presumption that evidence with LR ≠ 1 is admissible (Lempert 1977);

  2. a conception of “unfair prejudice” under Rule 403 that identifies it as the tendency of a piece of relevant evidence to induce a flesh-and-blood factfinder to assign incorrect LRs to it or other items of proof (Lempert 1977); and
       
  3. a strategy for Rule 403 weighing that directs the court to exclude “relevant” evidence when the tendency it has to induce the factfinder to assign the wrong LR to that or other pieces of evidence diminishes accurate assessment of the trial proof to a greater extent than constraining the factfinder to effectively treat the evidence in question as having no weight at all, or LR = 1 (Kahan 2010).

The problem is that CBR injects this “marginal probative value vs. marginal prejudice” apparatus with a form of self-contradiction, both logical and practical.

There isn’t normally any such contradiction. 

Imagine, e.g., that a court was worried that evidence of a product redesign intended to avoid a harmful malfunction might trigger “hindsight bias,” which consists in the tendency to inflate the LRs associated with items of proof that bear on how readily one might have been able to predict the need for and utility of such a design ex ante (Kamin & Rachlinski 1995).  (Such evidence is in theory—but not in practice— “categorically excluded” under Rule 407, when the correction was made after the injury to the plaintiff; but in any case, Rule 407 wouldn’t apply, only Rule 403 would, if the change in product design were made after injuries to third parties but before the plaintiff herself was injured by the original product—even though the same “hindsight bias” risk would be presented).

“All” the judge has to do in that case is compare the marginal accuracy-diminishing impact of [1] giving no weight at all to the evidence (LR = 1) on the "facts of consequence" it should otherwise have made "more probable" (e.g., the actual existence of alternative designs and their cost-effectiveness) and [2] the inflationary effect of admitting it on the LRs assigned to the evidence bearing on every other fact of consequence (e.g., what a reasonable manufacturer would have concluded about the level of risk and feasibility of alternative designs at the time the original product was designed).
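Here is a minimal sketch of that comparison in code. Everything in it is assumed for illustration (the true LRs, the disputed item's LR, the size of the "hindsight inflation"); the point is only the structure of the BCCM judgment: exclusion zeroes out the disputed item's weight, admission lets it in but inflates the weight of the other pro-plaintiff proof, and the judge asks which move drags the outcome further from the rational Bayesian result.

```python
# Toy sketch of BCCM-style Rule 403 balancing (every number here is assumed).
def posterior_prob(lrs, prior_odds=1.0):
    """Posterior probability from prior odds and a list of item LRs."""
    odds = prior_odds
    for lr in lrs:
        odds *= lr
    return odds / (1 + odds)

other_items = [3.0, 2.0, 0.5, 1.5]   # true LRs of the rest of the trial proof
redesign_lr = 2.5                    # true LR of the disputed redesign evidence
hindsight_inflation = 1.6            # assumed biasing effect of admitting it

p_rational = posterior_prob(other_items + [redesign_lr])

# Exclusion: the disputed item is constrained to LR = 1 (no weight at all).
p_exclude = posterior_prob(other_items + [1.0])

# Admission: the item gets its true LR, but the pro-plaintiff items are
# inflated by the hindsight factor.
biased_items = [lr * hindsight_inflation if lr > 1 else lr for lr in other_items]
p_admit = posterior_prob(biased_items + [redesign_lr])

print(f"rational: {p_rational:.2f}")
print(f"exclude:  {p_exclude:.2f} (off by {abs(p_exclude - p_rational):.2f})")
print(f"admit:    {p_admit:.2f} (off by {abs(p_admit - p_rational):.2f})")
# Under BCCM the judge excludes only if the "exclude" error is the smaller one.
```

In this particular toy configuration admission turns out to be the lesser evil; with different assumed numbers exclusion would win. That case-by-case comparison is exactly what BCCM imagines the judge doing.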

A thoughtful person might wonder about the capacity of a judge to make that determination accurately, particularly because weighing the “marginal accuracy diminishing impact” associated with admission and with exclusion, respectively, actually requires the judge to gauge the relative strength of all the remaining evidence in the case. See Old Chief v. U.S., 519 U.S. 172, 182-85 (1997).

But making such a determination is not, in theory at least, impossible.

What is impossible is doing this same kind of analysis when the source of the “prejudice” is CBR.  When a judge uses BCCM to manage the impact of hindsight bias (or any other type of dynamic inimical to rational information-processing), “marginal probative value” and “marginal prejudice”—the quantities she must balance—are independent.

But when the bias the judge is trying to contain is CBR, “marginal probative value” and “marginal prejudice” are interdependent—and indeed positively correlated.

What triggers the “likelihood ratio cascade” that is characteristic of CBR as a cognitive bias is the correct LR the factfinder assigned whatever item of proof induced the factfinder to form the impression that one side’s position was stronger than the other’s. Indeed, the higher (or lower) the “true” LR of that item of proof, the more confident the factfinder will be in the position that evidence supports, and hence the more biased the factfinder will thereafter be in assessment of the weight due other pieces of evidence (or equivalently, the more indifferent she'll become to the risk of erring in the direction of that position (Scurich 2012)).

To put it plainly, CBR creates a war between the two foundational “rules of evidence”: the more relevant evidence is under Rule 401 the more unfairly prejudicial it becomes for purposes of Rule 403.  To stave off the effects of CBR on accurate factfinding, the court would have to exclude from the case the evidence most integral to reaching an accurate determination of the facts.

Maybe an illustration would be useful?

This is one case plucked from the sort of simulation that I ran yesterday:

It shows how, as a result of CBR, a case that was in fact a “dead heat” can transmute into one in which the factfinder forms a supremely confident judgment that the facts supporting one side’s case are “true.”

The source of the problem, of course, is that the very “first” item of proof had LR = 25, initiating a “likelihood ratio cascade” as reflected in the discrepancy between the "true" LRs—tLRs—and "biased" perceived LRs—pLRs—for each subsequent item of proof.

A judge applying the BCCM conception of Rule 403 would thus recognize that "item of proof No. 1" is injecting a huge degree of “prejudice” into the case. She should thus exclude proof item No. 1, but only if she concludes that doing so will diminish the accuracy of the outcome less than preventing the factfinder from giving this highly probative piece of evidence any effect whatsoever.

When the judge engages in this balancing, she will in fact observe that the effect of excluding that evidence distorts the accuracy of the outcome just as much as admitting it does--but in the opposite direction. In this simulated case, assigning item No. 1 an LR = 1—the formal effect of excluding it—now induces the factfinder to conclude that the odds against that party’s position being true are 5.9x10^2:1, or that there is effectively a 0% chance that that party’s case is well-founded.

That’s because the very next item of proof has LR = 0.04 (the inverse of LR = 25), and thus triggers a form of “rolling confirmation bias” that undervalues every subsequent item of proof.

So if the judge were to exclude item No. 1 b/c of its tendency to excite CBR, the same issue would confront her again in ruling on a motion to exclude item No. 2.

And guess what? If she assesses the impact of excluding that super probative piece of evidence (one that favored one party’s position 25x more than the other’s), she’ll again find that the “accuracy diminishing impact” of doing so is as high as that of not excluding it: the remaining evidence in the case is configured so that the factfinder is impelled to a super-confident conclusion in favor of the first party once more!

And so forth and so on.
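For anyone who wants to poke at the mechanics, here is a bare-bones sketch of the bind. The eight item LRs and the bias function (perceived LR = true LR x 2^log10(current odds)) are my own assumptions rather than the values in the simulated case pictured above, so the odds won't match the 5.9x10^2:1 figure; the qualitative flip is the point.

```python
# Toy sketch of the Rule 403 bind CBR creates (assumed LRs and bias function).
from math import log10

def cbr_odds(true_lrs, prior_odds=1.0):
    """Sequential updating in which each item's LR gets nudged toward the side
    the factfinder currently thinks is ahead (an assumed bias function)."""
    odds = prior_odds
    for tlr in true_lrs:
        odds *= tlr * 2 ** log10(odds)   # perceived LR = true LR x CBR factor
    return odds

# A true "dead heat": the eight items offset one another exactly (product = 1).
proof = [25, 0.04, 5, 0.2, 2, 0.5, 4, 0.25]

print(cbr_odds(proof))               # item 1 admitted (LR = 25): cascade toward guilt
print(cbr_odds([1.0] + proof[1:]))   # item 1 excluded (LR = 1): item 2 (LR = 0.04)
                                     # now launches a cascade the other way
```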

As this illustration should remind you, CBR also has the effect of making outcomes arbitrarily sensitive to the order of proof. 

Imagine item 1 and item 2 had been “encountered” in the opposite “order” (whether by virtue of the point at which they were introduced at trial, the relative salience of them to the factfinder as he or she reflected on the proof as a whole, or the role that post-trial deliberations had in determining the sequence with which particular items of proof were evaluated). 

The factfinder in that case would indeed have formed just as confident a judgment--but one in support of the opposite party:

Again, the judge will be confronted with the question whether the very “first” item of proof—what was item No. 2 in the last version of this illustration—should be excluded under Rule 403. When she works this out, moreover, she’ll end up discovering that the consequence of excluding it is the same as was the consequence of excluding item No. 1—LR = 25—in our alternative-universe version of the case: a mirror-image degree of confidence on the factfinder's part about the strength of the opposing party’s case.  And so on and so forth.

See what’s going on?

The only way for the judge to assure that this case gets decided “accurately” is to exclude every single piece of evidence from the trial, remitting the jury to its priors—1:1—which, by sheer accident, just happened to reflect the posterior odds a “rational factfinder” would have ended up with after fairly assigning each piece of evidence its “true” LR.

Not much point having a trial at all under those circumstances!

Of course, the evidence, when properly considered, might have more decisively supported one side or the other.  But what a more dynamic simulation--one that samples from all the various distributions of case strength one cares to imagine--shows us is that there’s still no guarantee the factfinder would have formed an accurate impression of the strength of the evidence in that circumstance either.

To assure an accurate result in such a case, the judge, under the BCCM conception of the rules of evidence, would still have been obliged to try to deflect the accuracy-vitiating impact of CBR away from the factfinder’s appraisal of the evidence by Rule 403 balancing.

And the pieces of evidence that the judge would be required in such a case to exclude would be the ones most entitled to be given a high degree of weight by a rational factfinder!  The impact of doing so would be to skew consideration of the remainder of the evidence without offsetting exclusions of similarly highly relevant pieces of proof. . . . 

Again, no point in even having  a trial if that’s how things are going to work. The judge should just enter judgment for the party she thinks “deserves” to win.

There is of course no reason to believe a judge could “cognitively fine-tune” a case with the precision that this illustration envisions.  But all that means is that the best a real judge can ever do will always generate an outcome that we have less reason to be confident is “right” than we would have had had the judge just decided the stupid case herself on the basis of her own best judgment of the evidence.

Of course, why should we assume the judge herself could make an accurate assessment, or reasonably accurate one, of the trial proof?  Won’t she be influenced by CBR too—in a way that distorts her capacity to do the sort of “marginal probative value vs. marginal prejudice” weighing that the BCCM conception of Rule 403 imagines?

If you go down this route, then you again ought to conclude that “rules of evidence are impossible” even without contemplating the uniquely malicious propensities of CBR.  Because if this is how you see things (Schauer 2006), there will be just as much reason to think that the judge’s performance of such balancing will be affected by all the other forms of cognitive bias that she is trying to counteract by use of BCCM’s conception of Rule 403 balancing.

I think that anxiety is in fact extravagant—indeed silly.

There is plenty of evidence that judges, by virtue of professionalization, develop habits of mind that reasonably insulate them from one or another familiar form of cognitive bias when the judges are making in-domain decisions—i.e., engaging in the sort of reasoning they are supposed to as judges (Kahan, Hoffman, et al. in press; Guthrie, Rachlinski & Wistrich 2007).

That’s how professional judgment works generally!

But now that I’ve reminded you of this, maybe you can see what the “solution” is to the “impossibility” of the rules of evidence?

Even a jurist with exquisite professional judgment cannot conceivably perform the kind of “cognitive fine-tuning” envisioned by the “rules of evidence” -- the whole enterprise is impossible.

But what makes such fine tuning necessary in the first place is the law’s use of  non-professional decisionmakers divorced from any of the kinds of insights and tools that professional legal truthseekers would actually use.

Jurors aren’t stupid.  They are equipped with all the forms of practical judgment that they need to be successful in their everyday lives.

What's stupid is to think that making reliable assessments of fact in the artificial environment of a courtroom adversarial proceeding is one of the things everyday life equips them to do.

Indeed, it's absurd to think that that environment is conducive to the accurate determination of facts by anyone.

A procedural mechanism that was suited for accurately determining the sorts of facts relevant to legal determinations would have to look different from anything we see in everyday life, b/c making those sorts of determinations isn't something that everyday life requires.

No more than having to practice medicine, repair foreign automobiles, or write publicly accessible accounts of relativity is (btw, happy birthday Die Feldgleichungen der Gravitation).

Ordinary, sensible people rely on professionals -- those who dedicate themselves to acquiring expert knowledge and corresponding forms of reasoning proficiency -- to perform specialized tasks like these.

The “rules of evidence” are impossible because the mechanism we rely on to determine the “truth” in legal proceedings—an adversary system with lay factfinders—is intrinsically flawed. 

No amount of fine-tuning by “rules of evidence” will  ever make that system capable of delivering the accurate determinations of their rights and obligations that citizens of an enlightened democratic state are entitled to.

We need to get rid of the current system of adjudication and replace it with a professionalized system that avails itself of everything we know about how the world works, including how human beings reason and how they can be trained to reason when doing  specialized tasks.

And we need to replace, too, the system of legal scholarship that generates the form of expertise that consists in being able to tell soothing, tranquilizing, narcotizing just-so stories about how well suited the “adversary system” would be for truth-seeking with just a little bit more "cognitive fine tuning" to be implemented through the rules of evidence.

That element of our legal culture is as antagonistic to the goal of truth-seeking as any of the myriad defects of the adversary system itself. . . .

The end!

References

Guthrie, C., Rachlinski, J.J. & Wistrich, A.J. Blinking on the bench: How judges decide cases. Cornell Law Rev 93, 1-43 (2007).

Kahan, D.M. The Economics—Conventional, Behavioral, and Political—of "Subsequent Remedial Measures" Evidence. Columbia Law Rev 110, 1616-1653 (2010).

Kahan, D.M., Hoffman, D.A., Evans, D., Devins, N., Lucci, E.A. & Cheng, K. 'Ideology' or 'Situation Sense'? An Experimental Investigation of Motivated Reasoning and Professional Judgment. U. Pa. L. Rev. 164 (in press).

Kahan, D.M. Laws of cognition and the cognition of law. Cognition 135, 56-60 (2015).

Kamin, K.A. & Rachlinski, J.J. Ex Post ≠ Ex Ante - Determining Liability in Hindsight. Law Human Behav 19, 89-104 (1995).

Lempert, R.O. Modeling Relevance. Mich. L. Rev. 75, 1021-57 (1977).

Pennington, N. & Hastie, R. A Cognitive Theory of Juror Decision Making: The Story Model. Cardozo L. Rev. 13, 519-557 (1991).

Schauer, F. On the Supposed Jury-Dependence of Evidence Law. U. Pa. L. Rev. 155, 165-202 (2006).


Scurich, N. The Dynamics of Reasonable Doubt. (Ph.D. dissertation, University of Southern California, 2012). 

Simon, D. A Third View of the Black Box: Cognitive Coherence in Legal Decision Making. Univ. Chi. L.Rev. 71, 511-586 (2004).


Simon, D., Pham, L.B., Le, Q.A. & Holyoak, K.J. The Emergence of Coherence over the Course of Decisionmaking. J. Experimental Psych. 27, 1250-1260 (2001).

Monday
Nov 23, 2015

Check out wild & crazy "coherence based reasoning"! Are rules of evidence "impossible"?, part 2 (another report from Law & Cognition seminar)

This is part 2 in a 3-part series, the basic upshot of which is that “rules of evidence” are impossible.

A recap. Last time I outlined a conception of “the rules of evidence” I called the “Bayesian Cognitive Correction Model” or BCCM.  BCCM envisions judges using the rules to “cognitively fine-tune” trial proofs in the interest of simulating/stimulating jury fact-finding more consistent with a proper Bayesian assessment of all the evidence in a case. 

Cognitive dynamics like hindsight bias and identity-protective cognition can be conceptualized as inducing the factfinder to over- or undervalue evidence relative to its "true" weight—or likelihood ratio (LR).  Under Rule 403, judges should thus exclude an admittedly "relevant" item of proof (Rule 401: LR ≠ 1) when the tendency of that item of proof to induce jurors to over- or undervalue other items of proof (i.e., to assign them LRs higher or lower than their true values) impedes verdict accuracy more than constraining the factfinder to assign the item of proof in question no weight at all (LR = 1).

“Coherence based reasoning”—CBR—is one of the kinds of cognitive biases a judge would have to use the BCCM strategy to contain.  This part of the series describes CBR and the distinctive threat it poses to rational factfinding in adjudication.

Today's episode. CBR can be viewed as an information-processing dynamic rooted in aversion to residual uncertainty.

A factfinder, we can imagine, might initiate her assessment of the evidence in a reasonably unbiased fashion, assigning modestly probative pieces of evidence more or less the likelihood ratios they are due.

But should she encounter a piece of evidence that is much more consistent with one party’s position, the resulting confidence in that party’s case (a state that ought to be only provisional, in a Bayesian sense) will dispose her to assign the next piece of evidence a likelihood ratio supportive of the same inference—viz., that that party’s position is “true.”  As a result, she’ll be all the more confident in the merit of that party’s case—and thus all the more motivated to adjust the weight assigned the next piece of evidence to fit her “provisional” assessment, and so forth and so on  (Carlson & Russo 2001). 

Once she has completed her evaluation of trial proof, moreover, she will be motivated to revisit earlier-considered pieces of evidence, readjusting the weight she assigned them so that they now fit with what has emerged as the more strongly supported position (Simon, Pham, Quang & Holyoak 2001; Holyoak & Simon 1999; Pennington & Hastie 1991). When she concludes, she will necessarily have formed an inflated assessment of the probability of the facts that support the party whose “strong” piece of evidence initiated this “likelihood ratio cascade.”

What does this matter?

Well, to start, in the law, the party who bears the “burden of proof” will often be entitled to win only if she establishes the facts essential to her position to a heightened degree of certainty like “beyond a reasonable doubt.”  One practical consequence of the overconfidence associated with CBR, then, will be to induce the factfinder to decide in favor of a party whose evidence, if evaluated in an unbiased fashion, would not have satisfied the relevant proof standard (Simon 2004).  Indeed, one really cool set of experiments (Scurich 2012) suggests that "coherence based reasoning" effects might actually reflect a dissonance-avoidance mechanism that manifests itself in factfinders reducing the standard of proof after exposure to highly probative items of proof! 

But even more disconcertingly, CBR makes the outcome sensitive to the order in which critical pieces of evidence are considered (Carlson, Meloy & Russo 2006). 

A  piece of evidence that merits considerable weight might be assigned a likelihood ratio of  1 or < 1 if the factfinder considers it after having already assigned a low probability to the position it supports.  In that event, the evidence will do nothing to shake the factfinder’s confidence in the opposition position.

But had the factfinder considered that same piece of evidence “earlier”—before she had formed a confident estimation of the cumulative strength of the previously considered proof—she might well have given that piece of evidence the greater weight it was due. 

If that had happened, she would then have been motivated to assign subsequent pieces of proof likelihood ratios higher than they in fact merited. Likewise, to achieve a “coherent” view of the evidence as a whole, she would have been motivated to revisit and revise upward the weight assigned to earlier considered, equivocal items of proof.  The final result would thus have been a highly confident determination in exactly the opposite direction from the one she in fact reached.

This is not the way things should work if one is engaged in Bayesian information processing—or at least any normatively defensible understanding of Bayesian information processing geared to reaching an accurate result!

Indeed, this is the sort of spectacle that BCCM directs the judge to preempt by the judicious use of Rule 403 to exclude evidence the “prejudicial” effect of which “outweighs” its “probative value.”

But it turns out that using the rules of evidence to neutralize CBR in that way is IMPOSSIBLE!

Why? I’ll explain that in Part 3!

# # #

But right now I’d like to have some more “extra-credit”/“optional” fun w/ CBR! It turns out it is possible & very enlightening to create a simulation to model the accuracy-annihilating effects I described above.

Actually, I’m just going to model a “tame” version of CBR—what Carlson & Russo call “biased predecisional processing.” Basically, it’s the “rolling confirmation bias” of CBR without the “looping back” that occurs when the factfinder decides for good measure to reassess the more-or-less unbiased LRs she awarded to items of proof before she became confident enough to start distorting all the proof to fit one position. 

Imagine that a factfinder begins with the view that the “truth” is equally likely to reside in either party’s case—i.e., prior odds of 1:1. The case consists of eight “pieces” of evidence, four pro-prosecutor (likelihood ratio > 1) and four pro-defendant (likelihood ratio <1). 

The factfinder makes an unbiased assessment of the “first” piece of evidence she considers, and forms a revised assessment of the odds that reflects its “true” likelihood ratio.  As a result of CBR, however, her assessment of the likelihood ratio of the next piece of evidence—and every piece thereafter—will be biased by her resulting perception that one side’s case is in fact “stronger” than the other’s.

To operationalize this, we need to specify a “CBR factor” of some sort that reflects the disposition of the factfinder to adjust the likelihood ratios of successive pieces of proof up or down to match her evolving (and self-reinforcing!) perception of the strength disparity in the parties’ cases.

Imagine the factfinder misestimates the likelihood ratio of all pieces of evidence by a continuous amount that results in her over-valuing or under-valuing an item of proof by a factor of 2 at the point she becomes convinced that the odds in favor of one party’s position rather than the other’s position being “true” have reached 10:1.

What justifies selecting this particular “CBR factor”? Well, I suppose nothing, really, besides that it supplies a fairly tractable starting point for thinking critically about the practical upshot of CBR. 

But also, it’s cool to use this function b/c it reflects a “weight of the evidence” metric developed by Turing and Good to help them break the Enigma code! 

For Turing and Good, a piece of evidence with a likelihood ratio of 10 was judged to have a weight of “1 ban.” They referred to a piece of proof whose weight was 1/10 that big as a “deciban”—and were motivated to use that as the fundamental unit of evidentiary currency in their code-breaking system based on their seat-of-the-pants conjecture that a “deciban” was the smallest shift in the relative likelihoods of two hypotheses that human beings could plausibly perceive (Good 1985).

So with this “CBR factor,” I am effectively imputing to the factfinder a disposition to “add to” (or subtract from) an item of proof one “deciban”—the smallest humanly discernible “evidentiary weight,” in Turing and Good’s opinion—for every 1-unit increase (1:1 to 2:1; 2:1 to 3:1, etc.) or decrease (1:1 to 1:2; 1:3 to 1:4, etc.) in the “odds” of that party’s position being true.
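Here is what that kind of biased predecisional processing looks like in code. A caveat: the post doesn't pin the CBR factor down as a formula, so the version below (perceived LR = true LR x 2^log10(current odds), which yields a factor of 2 once the odds hit 10:1) is my reading of it, and the eight item LRs are invented rather than taken from the tables in the figures.

```python
# Sketch of "rolling confirmation bias": each item's perceived LR (pLR) is the
# true LR (tLR) shifted toward whichever side the factfinder currently favors.
# The 2 ** log10(odds) bias function is an assumption, not the post's formula.
from math import log10

def bayes_odds(true_lrs, prior_odds=1.0):
    odds = prior_odds
    for tlr in true_lrs:
        odds *= tlr                       # unbiased Bayesian updating
    return odds

def cbr_odds(true_lrs, prior_odds=1.0):
    odds = prior_odds
    for tlr in true_lrs:
        plr = tlr * 2 ** log10(odds)      # biased upward if "prosecution" is ahead,
        odds *= plr                       # downward if "defense" is ahead
    return odds

# Eight offsetting items of proof (assumed values): a true dead heat.
proof = [25, 0.04, 5, 0.2, 2, 0.5, 4, 0.25]

print(bayes_odds(proof))                  # 1.0 -> 1:1, 50%
print(cbr_odds(proof))                    # lopsided odds favoring the prosecution
print(cbr_odds(list(reversed(proof))))    # same items, reverse order: lopsided
                                          # odds favoring the defense
```

The reversed-order run is the same phenomenon the swapped tables below illustrate.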

And this figure illustrates the distortion such a CBR factor can produce:

In the “unbiased” table, “prior” reflects the factfinder’s current estimate of the probability of the “prosecutor’s” position being true, and “post odds” the revised estimate based on the weight of the current “item” of proof, which is assigned the likelihood ratio indicated in the “LR” column.  The “post %” column transforms the revised estimate of the probability of “guilt” into a percentage.

I’ve selected an equal number of pro-prosecution (LR >1) and pro-defense (LR<1) items of proof, and arranged them so they are perfectly offsetting—resulting in a final estimate of guilt of 1:1 or 50%.

In the “coherence based reasoning” table, “tLR” is the “true likelihood ratio” and “pLR” the perceived likelihood ratio assigned the current item of proof. The latter is derived by applying the CBR factor to the former.  When the odds are 1:1, CBR is 1, resulting in no adjustment of the weight of the evidence. But as soon as the odds shift in one party’s favor, the CBR factor biases the assessment of the next item of proof accordingly.

As can be seen, the impact of CBR in this case is to push the factfinder to an inflated estimate of the probability that the prosecution’s position is true, which the factfinder puts at 29:1, or 97%, by the “end” of the case.

But things could have been otherwise. Consider:

I’ve now swapped the “order” of proof items “4” and “8,” respectively.  That doesn't make any difference, of course, if one is "processing" the evidence the way a Bayesian would; but it does if one is CBRing.

The reason is that the factfinder now “encounters” the defendant’s strongest item of proof -- LR = 0.1—earlier than the prosecution’s strongest—LR = 10.0.

Indeed, it was precisely because the factfinder encountered the prosecutor’s best item of proof “early” in the previous case that she was launched into a self-reinforcing spiral of overvaluation that made her convinced that a dead-heat case was a runaway winner for the prosecutor.

The effect when the proof is reordered this way is exactly the opposite: a devaluation cascade that convinces the factfinder that the odds in favor of the prosecutor’s case are infinitesimally small!

These illustrations are static, and based on “pieces” of evidence with stipulated LRs “considered” in a specified order (one that could reflect the happenstance of when particular pieces register in the mind of the factfinder, or are featured in post-trial deliberations, as well as when they are “introduced” into evidence at trial—who the hell knows!).

But we can construct a simulation that randomizes those values in order to get a better feel for the potentially chaotic effect that CBR injects into evidence assessments. 

The simulation constructs trial proofs for 100 criminal cases, each consisting of eight pieces of evidence. Half of the 800 pieces of evidence reflect LRs drawn randomly from a uniform distribution between 0.05 and 0.95; these are “pro-defense” pieces of evidence. Half reflect LRs drawn randomly from a uniform distribution between 1.05 and 20. They are “pro-prosecution” pieces.

We can then compare the “true” strength of the evidence in the 100 cases —the probability of guilt determined by Bayesian weighting of each one’s eight pieces of evidence—to the “biased” assessment generated when the likelihood ratios for each piece of evidence are adjusted in a manner consistent with CBR.
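Here is a rough, self-contained reconstruction of that sort of simulation. The CBR bias function (the same 2^log10(odds) assumption as in the earlier sketches), the random seed, and the way the error rates are tallied are my choices, so the exact numbers it prints won't match the figures reported below; the pattern (far more false convictions than false acquittals under a 0.95 "beyond a reasonable doubt" threshold) is what it is meant to reproduce.

```python
# Rough reconstruction (assumed details) of the 100-case simulation described
# above: 8 items per case, half pro-defense (LR ~ U(0.05, 0.95)), half
# pro-prosecution (LR ~ U(1.05, 20)), evaluated with and without CBR bias.
import random
from math import log10

def prob(odds):
    return odds / (1 + odds)

def simulate_case(rng):
    lrs = [rng.uniform(0.05, 0.95) for _ in range(4)] + \
          [rng.uniform(1.05, 20.0) for _ in range(4)]
    rng.shuffle(lrs)                             # order of proof is arbitrary
    true_odds = cbr_odds = 1.0
    for tlr in lrs:
        true_odds *= tlr                         # unbiased Bayesian factfinder
        cbr_odds *= tlr * 2 ** log10(cbr_odds)   # CBR-biased factfinder
    return prob(true_odds), prob(cbr_odds)

def error_rates(n_cases=100, brd=0.95, seed=1):
    rng = random.Random(seed)
    false_conviction = false_acquittal = 0
    for _ in range(n_cases):
        p_true, p_cbr = simulate_case(rng)
        should_convict = p_true >= brd           # "beyond a reasonable doubt"
        does_convict = p_cbr >= brd
        false_conviction += (does_convict and not should_convict)
        false_acquittal += (should_convict and not does_convict)
    return false_conviction / n_cases, false_acquittal / n_cases

print(error_rates())
```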

This figure compares the relative distribution of outcomes in the 100 cases:

 

As one would expect, a factfinder whose evaluation is influenced by CBR will encounter many fewer “close” cases than will one that engages in unbiased Bayesian updating.

This tendency to form overconfident judgments will, in turn, affect the accuracy of case outcomes.  Let’s assume, consistent with the “beyond a reasonable doubt” standard, that the prosecution is entitled to prevail only when the probability of its case being “true” is ≥ 0.95.  In that case, we are likely to see this sort of divergence between outcomes informed by rational information processing and outcomes informed by CBR:

 

The overall “error rate” is “only” about 0.16.  But there are 7x as many incorrect convictions as incorrect acquittals.  The "false conviction" rate is 0.21, whereas the "false acquittal" rate is 0.04....

The reason for the asymmetry between false convictions and false acquittals is pretty straightforward. In the CBR-influenced cases, there are a substantial number of “close” cases that the factfinder concluded “strongly” supported one side or the other. Which side—prosecution or defendant—got the benefit of this overconfidence is roughly equally divided.  However, a defendant is no less entitled to win when the factfinder assesses the strength of the evidence to be 0.5 or 0.6 than when the factfinder assesses the strength of the evidence as 0.05 or 0.06.  Accordingly, in all the genuinely “close” cases in which CBR induced the factfinder to form an overstated sense of confidence in the weakness of the prosecution’s case, the resulting judgment of “acquittal” was still the correct one.  But by the same token, the result was incorrect in every close case in which CBR induced the factfinder to form an exaggerated sense of confidence in the strength of the prosecution’s case.  The proportion of cases, in sum, in which CBR can generate a “wrong” answer is much higher in ones that defendants deserve to win than in ones in which the prosecution does.

This feature of the model is an artifact of the strong “Type 1” error bias of the “beyond a reasonable doubt” standard.  The “preponderance of the evidence” standard, in contrast, is theoretically neutral between “Type 1” and “Type 2” errors.  Accordingly, were we to treat the simulated cases as “civil” rather than “criminal” ones, the false “liability” outcomes and false “no liability” ones would be closer to the overall error rate of 16%.

Okay, I did this simulation once for 100 cases.  But let’s do it 1,000 times for 100 cases—so that we have a full-blown Monte Carlo simulation of the resplendent CBR at work!

These are the kernel distributions for the “accurate outcome,” “false acquittal,” and “false conviction” rates over 1000 trials of 100 cases each:

Okay—see you later!

Refs

Carlson, K.A., Meloy, M.G. & Russo, J.E. Leader‐driven primacy: using attribute order to affect consumer choice. Journal of Consumer Research 32, 513-518 (2006).

Carlson, K.A. & Russo, J.E. Biased interpretation of evidence by mock jurors. Journal of Experimental Psychology: Applied 7, 91-103 (2001).

Good, I.J. Weight of Evidence: A Brief Survey. in Bayesian Statistics 2: Proceedings of the Second Valencia International Meeting (Bernardo, J.M., et al. eds., 1985).

Holyoak, K.J. & Simon, D. Bidirectional Reasoning in Decision Making by Constraint Satisfaction. J. Experimental Psych. 128, 3-31 (1999).

Kahan, D.M. Laws of cognition and the cognition of law. Cognition 135, 56-60 (2015). 

Pennington, N. & Hastie, R. A Cognitive Theory of Juror Decision Making: The Story Model. Cardozo L. Rev. 13, 519-557 (1991).


Simon, D. A Third View of the Black Box: Cognitive Coherence in Legal Decision Making. Univ. Chi. L.Rev. 71, 511-586 (2004).

Scurich, N. The Dynamics of Reasonable Doubt. (Ph.D. dissertation, University of Southern California, 2012). 

Simon, D., Pham, L.B., Le, Q.A. & Holyoak, K.J. The Emergence of Coherence over the Course of Decisionmaking. J. Experimental Psych. 27, 1250-1260 (2001).



 

Sunday
Nov222015

Report from "Law & Cognition" class: Are “rules of evidence impossible”? Part 1 

Well, I didn't do a good job of sharing the to & fro of this semester's Law & Cognition seminar w/ the 14 billion of you who signed up to take the course on-line. I'm happy to refund your enrollment fees--I actually parlayed them into a sum 10^3 x as large by betting incredulous behavioral economists that P(H|HHH) < P(H) when sampling from finite sequences w/o replacement-- but stay tuned & I'll try to fill you in over time...

If you’re a Bayesian, you’ll easily get how the Federal Rules of Evidence work. 

But if you accept that “coherence based reasoning” characterizes juries’ assessments of facts (Simon, Pham, Quang & Holyoak 2001; Carlson & Russo 2001), you’ll likely conclude that administering the Rules of Evidence is impossible.

Or so it seems to me.  I’ll explain but it will take some time—about 3 posts’ worth.

The "Rules of Evidence Impossibility Proof"--Paaaaaaart 1!

There are really only two major rules of evidence. There are a whole bunch of others but they are just variations on a theme.

The first is Rule 401, which states that evidence is “relevant” (and hence presumptively admissible under Rule 402) if it “has any tendency to make a fact  [of consequence to the litigation] more or less probable” in the assessment of a reasonable factfinder.

As Richard Lempert observed (1977) in his classic paper Modeling Relevance, Rule 401 bears a natural Bayesian interpretation.

The “likelihood ratio” rendering of Bayes’s Theorem—Posterior odds = Prior odds x Likelihood Ratio—says that one should update one’s existing or “prior” assessment of the probability of some hypothesis (expressed in odds) by a factor that reflects how much more consistent the new information is with that hypothesis than with some rival hypothesis.  If this factor—the likelihood ratio—is greater than one, the probability of the hypothesis increases; if it is less than one, it decreases.

Accordingly, by defining as “relevant” any evidence that gives us reason to treat a “fact of consequence” as “more or less probable,” Rule 401 indicates that evidence should be treated as relevant (and thus presumptively admissible) so long as it has a likelihood ratio different from 1—the factor by which one should revise one’s prior odds when new evidence is equally consistent with the hypothesis and with its negation.
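For the programming-inclined, here is a toy numerical rendering of that reading of Rule 401. The prior odds and likelihood ratios are made up purely for illustration:

```python
# Toy illustration of the likelihood-ratio reading of Rule 401 (made-up numbers).
def posterior_odds(prior_odds, likelihood_ratios):
    # Posterior odds = prior odds x product of the LRs of (independent) items of proof
    for lr in likelihood_ratios:
        prior_odds *= lr
    return prior_odds

def relevant(lr):
    # Rule 401, Bayesian-style: an item is "relevant" iff its LR differs from 1
    return lr != 1

prior = 1 / 3                     # hypothetical prior odds of 1:3
proof = [4.0, 0.5, 1.0]           # hypothetical LRs for three items of evidence
print([relevant(lr) for lr in proof])    # [True, True, False]: the third item is irrelevant
odds = posterior_odds(prior, proof)
print(odds, odds / (1 + odds))           # posterior odds & the corresponding probability
```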

Simple!

Second is Rule 403, which states that “relevant evidence” should be excluded if its “probative value is substantially outweighed by . . . unfair prejudice.”  Evidence is understood to be “unfairly prejudicial” when (the Advisory Committee Notes tell us) it has a “tendency to suggest decision on an improper basis.” 

There’s a natural Bayesian rendering of this concept, too: because the proper basis for decision reflects the updating of one’s priors by a factor equal to the product of the likelihood ratios associated with all the (independent) items of proof, evidence is prejudicial when it induces the factfinder to assign items of proof weights inconsistent with their true likelihood ratios.

Lempert circa 1977 (outside Studio 54, during break from forensic science investigation of then-still unsolved Son of Sam killing spree)

An example would be evidence that excites a conscious intention—born perhaps of animus, or alternatively of sympathy—to reach a particular result regardless of the Bayesian import of the proof in the case.

More interestingly, a piece of evidence might be “unfairly prejudicial” if it triggers some unconscious bias that skews the assignment of the likelihood ratio to that or another piece of evidence (Gold 1982).

E.g., it is sometimes said (I think without much basis) that jurors “overvalue” evidence of character traits—that is, that they assign to a party’s disposition a likelihood ratio, or degree of weight, incommensurate with what it is actually due when assessing the probability that the party acted in a manner that reflected such a disposition on a particular occasion (see Fed. R. Evid. 404).

Or the “unfairly prejudicial effect” might consist in the tendency of evidence to excite cognitive dynamics that bias the weight assigned other pieces of evidence (or all of it).  Evidence that an accident occurred, e.g., might trigger  “hindsight bias,” causing the factfinder to assign more weight than is warranted to evidence that bears on how readily that accident could have been foreseen before its occurrence (Kamin & Rachlinski 1995).

By the same token, evidence that excites “identity-protective cognition” might unconsciously motivate a factfinder to selectively credit or dismiss (i.e., opportunistically adjust the likelihood ratio of) all the evidence in the case in a manner geared to reaching an outcome that affirms rather than denigrates the factfinder’s cultural identity (Kahan 2015).

Rule 403 directs the judge to weigh probative value against prejudice.

Again, there’s a Bayesian rendering: a court should exclude a “relevant” item of proof as “unfairly prejudicial” when the marginal distortion of accuracy associated with the incorrect likelihood ratio that admitting it will induce the factfinder to assign to that or any other items of proof is bigger than the marginal distortion of accuracy associated with constraining the factfinder to assign that item of proof a likelihood ratio of 1, which is the practical effect of excluding it (Kahan 2010).  

click me & behold what it looks like to do Bayesian analysis of evidence rules *after* emerging from a night of partying at Studio 54 circa 1977!

If you work this out, you’ll see (perhaps counterintuitively, perhaps not!) that courts should be much more reluctant to exclude evidence on Rule 403 grounds in otherwise close cases. As cases become progressively closer, the risk of error associated with under-valuing (by failing to consider) relevant evidence increases faster than the risk of error associated with over-valuing that or other pieces of evidence: from the point of view of deciding a case, being “overconfident” is harmless so long as one gets the right result. Likewise, the risk that admitting "prejudicial" evidence will result in error increases more rapidly as the remaining proof becomes weaker: that's the situation in which a factfinder is most likely to decide for a party that she wouldn't have but for her biased over-valuing of that item of proof or others (Kahan 2010).
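Here is a toy sweep (again, made-up numbers, and emphatically not the model in Kahan 2010) that shows the two kinds of error the balancing has to trade off: admitting an item the factfinder will over-value versus constraining it to a likelihood ratio of 1 by excluding it.

```python
# Toy sweep of the Rule 403 balance (illustrative only). "Remaining proof" fixes
# the prior; the contested item is assumed to have a true LR of 4 but, if admitted,
# to be over-valued at an LR of 12; if excluded, it effectively gets an LR of 1.
import numpy as np

BRD_PROB = 0.95
L_TRUE, L_BIASED, L_EXCLUDED = 4.0, 12.0, 1.0   # assumed values, purely illustrative

def convicts(prior_prob, lr):
    odds = prior_prob / (1 - prior_prob) * lr
    return odds / (1 + odds) >= BRD_PROB

for p0 in np.linspace(0.05, 0.94, 10):                 # strength of the remaining proof
    correct = convicts(p0, L_TRUE)                     # what proper Bayesian weighting implies
    err_admit = convicts(p0, L_BIASED) != correct      # error if the item is admitted & over-valued
    err_exclude = convicts(p0, L_EXCLUDED) != correct  # error if it is excluded (LR forced to 1)
    print(f"remaining proof {p0:.2f}: error if admitted={err_admit}, error if excluded={err_exclude}")
```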

For an alternative analysis, consider Friedman (2003). I think he's wrong but for sure maybe I am! You tell me!

The point is how cool it is-- how much structure & discipline it adds to the analysis-- to conceptualize Rules of Evidence as an instrument for closing the gap between what a normatively desirable Bayesian assessment of trial proof would yield and what a psychologically realistic account of human information processing tells us to expect (someday, of course, we'll replace human legal decisionmakers with AI evidence-rule robots! but we aren't quite there yet ...).

Let's call this approach to understanding/perfecting evidence law the "Bayesian Cognitive Correction Model" (BCCM).

But is BCCM itself psychologically realistic?  

Is it plausible to think a court can reliably “maximize” the accuracy of adjudication by this sort of cognitive fine-tuning of the trial proof?

Not if you think that coherence-based reasoning  (CBR) is one of the reasoning deficiencies that a court needs to anticipate and offset by this strategy.

I’ll describe how CBR works in part 2 of this series—and then get to the “impossibility proof” in part 3!

References

Carlson, K.A. & Russo, J.E. Biased interpretation of evidence by mock jurors. Journal of Experimental Psychology: Applied 7, 91-103 (2001).

Friedman, R.D. Minimizing the Jury Over-valuation Concern. Mich. State L. Rev. 2003, 967-986 (2003).

Gold, V.J. Federal Rule of Evidence 403: Observations on the Nature of Unfairly Prejudicial Evidence. Wash. L. Rev. 58, 497 (1982).


Kahan, D.M. The Economics—Conventional, Behavioral, and Political—of "Subsequent Remedial Measures" Evidence. Columbia Law Rev 110, 1616-1653 (2010).

Kahan, D.M. Laws of cognition and the cognition of law. Cognition 135, 56-60 (2015).

Kamin, K.A. & Rachlinski, J.J. Ex Post ≠ Ex Ante - Determining Liability in Hindsight. Law Human Behav 19, 89-104 (1995).

Lempert, R.O. Modeling Relevance. Mich. L. Rev. 75, 1021-57 (1977).

Simon, D., Pham, L.B., Le, Q.A. & Holyoak, K.J. The Emergence of Coherence over the Course of Decisionmaking. J. Experimental Psych. 27, 1250-1260 (2001).

Friday
Nov202015

My remote post-it notes for my HLS African-American teachers

Monday
Nov162015

ISO: A reliable & valid public "science literacy" measure

From revision to “Ordinary Science Intelligence”: A Science-Comprehension Measure for Study of Risk and Science Communication, with Notes on Evolution and Climate Change . . . .

 2. What and why?

The validity of any science-comprehension instrument must be evaluated in relation to its purpose. The quality of the decisions ordinary individuals make in myriad ordinary roles—from consumer to business owner or employee, from parent to citizen—will depend on their ability to recognize and give proper effect to all manner of valid scientific information (Dewey 1910; Baron 1993). It is variance in this form of ordinary science intelligence—and not variance in the forms or levels of comprehension distinctive of trained scientists, or the aptitudes of prospective science students—that OSI_2.0 is intended to measure.

This capacity will certainly entail knowledge of certain basic scientific facts or principles. But it will demand as well various forms of mental acuity essential to the acquisition and effective use of additional scientific information. A public science-comprehension instrument cannot be expected to discern proficiency in any one of these reasoning skills with the precision of an instrument dedicated specifically to measuring that particular form of cognition. It must be capable, however, of assessing the facility with which these skills and dispositions are used in combination to enable individuals to successfully incorporate valid scientific knowledge into their everyday decisions.

A valid and reliable measure of such a disposition could be expected to contribute to the advancement of knowledge in numerous ways. For one thing, it would facilitate evaluation of science education across societies and within particular ones over time (National Science Board 2014). It would also enable scholars of public risk perception and science communication to more confidently test competing conjectures about the relevance of public science comprehension to variance in—indeed, persistent conflict over—contested risks, such as climate change (Hamilton 2011; Hamilton, Cutler & Schaefer 2012), and controversial science issues such as human evolution (Miller, Scott & Okamoto 2006). Such a measure would also promote ongoing examination of how science comprehension influences public attitudes toward science more generally, including confidence in scientific institutions and support for governmental funding of basic science research (e.g., Gauchat 2011; Allum, Sturgis, Tabourazi, & Brunton-Smith 2008). These results, in turn, would enable more critical assessments of the sorts of science competencies that are genuinely essential to successful everyday decisionmaking in various domains—personal, professional, and civic (Toumey 2011).

In fact, it has long been recognized that a valid and reliable public science-comprehension instrument would secure all of these benefits. The motivation for the research reported in this paper is widespread doubt among scholars that prevailing measures of public “science literacy” possess the properties of reliability and validity necessary to attain these ends (e.g., Stocklmayer & Bryant 2012; Roos 2012; Guterbock et al. 2011; Pardo & Calvo 2004). OSI_2.0 was developed to remedy these defects.

The goal of this paper is not only to apprise researchers of OSI_2.0’s desirable characteristics in relation to other measures typically featured in studies of risk and science communication. It is also to stimulate these researchers and others to adapt and refine OSI_2.0, or simply devise a superior alternative from scratch, so that researchers studying how risk perception and science communication interact with science comprehension can ultimately obtain the benefit of a scale more distinctively suited to their substantive interests than are existing ones.

References 

Allum, N., Sturgis, P., Tabourazi, D. & Brunton-Smith, I. Science knowledge and attitudes across cultures: a meta-analysis. Public Understanding of Science 17, 35-54 (2008).

Baron, J. Why Teach Thinking? An Essay. Applied Psychology 42, 191-214 (1993).

Dewey, J. Science as Subject-matter and as Method. Science 31, 121-127 (1910).

Gauchat, G. The cultural authority of science: Public trust and acceptance of organized science. Public Understanding of Science 20, 751-770 (2011).

Hamilton, L.C. Education, politics and opinions about climate change evidence for interaction effects. Climatic Change 104, 231-242 (2011).

Hamilton, L.C., Cutler, M.J. & Schaefer, A. Public knowledge and concern about polar-region warming. Polar Geography 35, 155-168 (2012).

Miller, J.D., Scott, E.C. & Okamoto, S. Public acceptance of evolution. Science 313, 765 (2006).

National Science Board. Science and Engineering Indicators, 2014 (National Science Foundation, Arlington, Va., 2014).

Pardo, R. & Calvo, F. The Cognitive Dimension of Public Perceptions of Science: Methodological Issues. Public Understanding of Science 13, 203-227 (2004).

Roos, J.M. Measuring science or religion? A measurement analysis of the National Science Foundation sponsored science literacy scale 2006–2010. Public Understanding of Science (2012).

Stocklmayer, S.M. & Bryant, C. Science and the Public—What should people know? International Journal of Science Education, Part B 2, 81-101 (2012).

Thursday
Nov122015

The "living shorelines" science communication problem: individual cognition situated in collective action

Extending its Southeast Florida Evidence-based Science Communication Initiative, CCP is embarking on a field-research project on "living shoreline" alternatives/supplements to "hardened armoring" strategies for offsetting the risks of rising sea levels. The interesting thing about the project (or one of the billion interesting things about it) is that it features the interaction of knowledge and expectations.  

"Living shorelines" offer the potential for considerable collective benefits.  But individuals who learn of these potential benefits will necessarily recognize that the benefit they can expect to realize from taking or supporting action to implement this strategy is highly contingent on the intention of others to do the same. Accordingly, "solving" this "communication problem" necessarily involves structuring acommunication process in which parties learn simultaneously about both the utility of "living shorelines" and the intentions of other parties to contribute to implementing them.

The project thus highlights one of the central features of the "science of science communication" as a "new political science": its focus not only on promoting clarity of exposition and public comprehension but on attending as well to the myriad social processes by which members of the public come to know what's known by science and give it due effect in their lives.

Elevating “Living Shorelines” with Evidence-based Science Communication

1. Overview. The urgency of substantial public investments to offset the impact of rising sea levels associated with climate change is no longer a matter of contention for coastal communities in Florida.  What remains uncertain is only the precise form of such undertakings.

This project will use evidence-based science communication to enrich public engagement with “living shoreline” alternatives (e.g., mangrove habitats, oyster beds, dune and wetland restoration) to “hardened armoring” strategies (concrete seawalls, bunkers, etc.). “Living shorelines” offer comparable protection while avoiding negative environmental effects--beachfront erosion, the loss of shoreline vegetation, the resulting disruption of natural ecosystems, and visual blight—that themselves diminish community wellbeing.  The prospect that communities in Southern Florida will make optimal use of “living shorelines,” however, depends on cultivating awareness of their myriad benefits among a diffuse set of interlocking public constituencies.  The aim of the proposed initiative is to generate the forms of information and community interactions necessary to enable “living shorelines” to assume the profile they should in ongoing democratic deliberations over local climate adaptation. . . .

3. Raising the profile of “living shorelines.” There are numerous “living shoreline” alternatives to hardened armoring strategies. Mangroves—densely clumped shrubs of thick green shoots atop nests of partially submerged roots—have traditionally combatted the impact of rising sea levels by countering erosion and dissipating storm surges. Coral reefs furnish similar protection. Sand dunes provide a natural fortification, while wetland restorations create a buffer. There are also many “hybrid” strategies such as rutted walls congenial to vegetation, and rock sills supportive of oyster beds.  These options, too, reduce reliance on the forms of hardened armoring that impose the greatest ecological costs.

As a policy option, however, living shoreline strategies face two disadvantages. The first is the longer time horizon for return on investment. A concrete seawall begins to generate benefits immediately, while natural-shoreline alternatives attain maximum benefit only after a period of years.  This delay in value is ultimately offset by the need to augment or replace hardened armoring as sea levels continue to rise; the protective capacity of natural barriers “rises” naturally along with sea level, giving them a longer lifespan. However, the natural bias of popular political processes to value short-term over long-term gains and to excessively discount future costs handicaps “living shorelines” relative to their competitors.

The second is the diffuse benefits that living shorelines confer. Obviously, they protect coastal property residents. But they also confer value on a wide range of third parties—individuals who enjoy natural beach habitats, but also businesses, such as tourism and fishing, that depend on the ecological systems disrupted by armoring. 

In addition, the value of coastal property will often be higher in a community that makes extensive use of “living shorelines”, which tend to be more aesthetically pleasing than concrete barriers and bunkers.  But the individual property owner who invests in erecting and maintaining a living shoreline alternative won’t enjoy this benefit unless other owners in his or her residential area take the same action.  As with any public good, the private incentive to contribute will lag behind the social benefit.

The remedy for overcoming these two challenges is to simultaneously widen and target public appreciation of the benefits of natural shoreline protections. The constituencies that would enjoy the externalized benefits of natural shoreline strategies—particularly the commercial ones—must be alerted to the stake they have in the selection of this form of coastal property protection.  Likewise, business interests, including construction firms, must be furnished with a vivid appreciation of the benefits they could realize by servicing the demand for “living shorelines” protections, including both their creation and their maintenance.  Recognizing that local coastal property owners lack adequate incentives to invest in natural coastline protections on their own, these interests could be expected to undertake the burden of advocating supplemental public investments. The voice of these groups in public deliberations will help to offset the natural tendency of democratic processes to overvalue short- over longer-term interests—as would the participation of financial institutions and other actors that naturally discount the current value of community assets and businesses appropriately based on the anticipated need for future infrastructure support. The prospect of public subsidies can in turn be used to reinforce the incentives of local property owners, whose consciousness of the prospect of widespread use of natural shoreline protections will supply them with motivation to support public provisioning and to make the personal investments necessary to implement this form of climate adaptation.

The project is geared toward stimulating these processes of public engagement.  By furnishing the various constituencies involved with the forms of information most suited to enabling their recognition of the benefits of natural shoreline strategies, the project will elevate the profile of this strategy and put it on an equal footing with hardened armoring in public deliberations aimed at identifying the best, science-informed policies for protecting communities from rising sea levels and other climate impacts.

4.  Evidence-based science communication and living shorelines. . . . .

[T]he challenge of elevating the profile of “living shorelines” features the same core structural elements that have been the focus of CCP’s science-communication support research on behalf of Southeast Florida Regional Climate Compact. Science communication, this work suggests, should be guided by a “multi-public” model.  First are proximate information evaluators: typically government decisionmakers, whose primary focus is on the content of policy-relevant science. Next are intermediate evaluators, who consist largely of organized nongovernmental groups, including ones representing formal and informal networks of local businesses, local property owners, and environmental and conservation organizations: their focus is primarily on how proposed policies affect their distinctive goals and interests. Finally there are remote evaluators: ordinary citizens, whose engagement with policy deliberations is only intermittent and who use heuristic strategies to assure themselves of the validity of the science that informs proposed policies.

The current project will use this model to guide development of communication materials suited to the public constituencies whose engagement is essential to elevating the deliberative profile of “living shorelines.”  Proximate evaluators here comprise the government officials—mainly county land use staff but also elected municipal officials—and also homeowners, including homeowner associations, in a position to make personal investments in “living shorelines” protections. With respect to these information consumers, the project would focus on maximizing comprehension  of the information embodied in TNC’s computer simulations. Existing research identifies systematic differences in how people engage quantitative information. Experimental studies would be conducted to fashion graphic presentation modes that anticipate these diverse information-processing styles.

The intermediate evaluators in this context consist of the wide range of private groups that stand to benefit indirectly from significant investment in “living shorelines.”  These groups will be furnished information in structured deliberations that conform to validated protocols for promoting open-minded engagement with scientific information. 

These sessions, moreover, will themselves be used to generate materials that can be used to develop information appropriate for remote evaluators. Research conducted by CCP in field-based science communication initiatives suggests that the most important cue that ordinary citizens use to assess major policy proposals is the position of other private citizens they view as socially competent and informed and whose basic outlooks they share.  In particular, the attitude that these individuals evince through their words and actions vouches for the validity of policy-relevant science that ordinary members of the public do not have either the time or expertise to assess on their own.

From experience in previous evidence-based science communication projects, CCP has learned that interactions taking the form of the proposed structured deliberations among intermediate evaluators furnish a rich source of raw material for fashioning communications that can be used to perform this vouching function.  The participants in such deliberations are highly likely to possess the characteristics and backgrounds associated with the socially competent, knowledgeable sources whose vouching for policy-relevant science helps orient ordinary citizens.

Moreover, the participants in such sessions are likely to be socially diverse.  This feature of such sessions is highly desirable because the identity of the individuals who perform this critical vouching function, as work in and outside the lab confirms, varies across diverse cultural subcommunities. In addition, being able to see individuals who perform this role within one community deliberating constructively with their counterparts in others assures ordinary citizens from all of these communities that positions on the issue at hand are not associated with membership in competing cultural groups. This effect, CCP field research suggests, has been instrumental to the success of the diverse member communities of the Southeast Florida Climate Compact in protecting their deliberations from the influences that polarize citizens generally over climate change science.

Accordingly, using methods developed in earlier field work, CCP will use the intermediate evaluator deliberations to develop video and other materials that can be used to test how members of the public react as they learn about “living shorelines” as a policy option for their communities. The results of such tests can then be incorporated into communication materials geared to generating positive, self-reinforcing forms of interactions among the members of those communities.

Finally, evidence of the positive interactions of all these groups can be used to help form the state of shared expectations necessary to assure that “living shorelines” receive attention in public deliberation commensurate with the value they can confer on the well-being of communities that use this option. . . .

Wednesday
Nov112015

CCP Lab Meeting # 9073 ... 

Tuesday
Nov102015

Another day, another lecture

This one at Annenberg Public Policy Center last week, to discuss progress in one of our collaborative initiatives: evidence-based science documentary filmmaking.

We got to talk about the Pakistani Dr & Kentucky Farmer, of course, and also how much Krista would like a cool documentary on evolution.

Slides here.

Monday
Nov092015

Making sense of the " 'hot hand fallacy' fallacy," part 1

It never fails! My own best efforts (here & here) to explain the startling and increasingly notorious paper by Miller & Sanjurjo have prompted the authors to step forward and try to restore the usual state of perfect comprehension enjoyed by the 14.3 billion regular readers of this blog. They have determined, in fact, that it will take three separate guest posts to undo the confusion, so apparently I've carried out my plan to a [GV]T. 

As cool as the result of the M&S paper is, I myself remain fascinated by what it tells us about cognition, particularly among those with exquisitely fine-tuned statistical intuitions.  How did the analytical error they uncovered in the classic "hot hand fallacy" studies remain undetected for some thirty years, and why does it continue to provoke stubborn resistance on the part of very very smart people??  To Miller & Sanjurjo's credit, they have happily and persistently shouldered the immense burden of explication necessary to break the grip of the pesky intuition that their result "just can't be right!"

 Joshua B. Miller & Adam Sanjurjo

Thanks for the invitation to post here Dan!

Here’s our plan for the upcoming 3 posts:

  1.  Today’s plan: A bit of the history of the hot hand fallacy, then clearly stating the bias we find, explaining why it invalidates the main conclusion of the original hot hand fallacy study (1985), and further, showing that correcting for the bias flips the conclusion of the original data, so that it now can be used as evidence supporting the existence of meaningfully large hot hand shooting.

  2. Next post: Provide a deeper understanding of how the bias emerges.

  3. Final post: Go deeper into potential implications for research on the hot hand effect, hot hand beliefs, and the gambler’s fallacy.

Part I

In the seminal hot hand fallacy paper, Gilovich, Vallone and Tversky (1985; “GVT”, also see the 1989 Tversky & Gilovich “Cold Facts” summary paper) set out to conduct a truly informative scientific test of hot hand shooting. After studying two types of in-game shooting data, they conducted a controlled shooting study (experiment) with the Cornell University men’s and women’s basketball teams. This was an effective "...method for eliminating the effects of shot selection and defensive pressure" that were present as confounds in their analysis of game data (we will return to the issue of game data in a follow up post; for now click to the first page of Dixit & Nalebuff’s 1991 classic book “Thinking Strategically”, and this comment on Andrew Gelman’s blog).  While the common use of the term “hot hand” shooting is vague and complex, everybody agrees that it refers to a temporary elevation in a player’s ability, i.e. the probability of a successful shot.  Because the hot state is unobservable to the researcher (though perhaps not to the player/teammate/coach!), we cannot simply measure a player’s probability of success in the hot state; we need an operational definition.  A natural idea is to take a streak of sufficient length as a good signal of whether or not a player is in the hot state, and define a player as having the hot hand if his/her probability of success is greater after a streak of successful shots (hits), than after a streak of unsuccessful shots (misses).  GVT designed a test for this.

Adam Sanjurjo enjoying snacks in green room before Oprah Winfrey show appearance

Suppose we wanted to test whether Stephen Curry has the hot hand; how would we apply GVT’s test to Curry?  The answer is that we would have Curry attempt 100 shots at locations from which he is expected to have a 50% chance of success (like a coin).  Next, we would calculate Curry’s field goal percentage on those shots that immediately follow a streak of successful shots (hits), and test whether it is bigger than his field goal percentage on those shots that immediately follow a streak of unsuccessful shots (misses); the larger the difference that we observe, the stronger the evidence of the hot hand.  GVT performed this test on the Cornell players, and found that this difference in field goal percentages was statistically significant for only one of the 26 players (two sample t-test), which is consistent with the chance variation that the coin model predicts.

Now, one can ask oneself: if Stephen Curry doesn’t get hot, that is, for each of his 100 shot attempts he has exactly a 50% chance of hitting his next shot, then what would I expect his field goal percentage to be when he is on a streak of three (or more) hits? Similarly, what would I expect his field goal percentage to be when he is on a streak of three (or more) misses?

Following GVT’s analysis, one can form two groups of shots:

Group “3hits”: all shots in which the previous three shots (or more) were a hit,

Group “3misses”: all shots in which the previous three shots (or more) were a miss,

M&S working paper (5000th printing; currently sold out)

From here, it is natural to reason as follows: if Stephen Curry always has the same chance of success, then he is like a coin, so we can consider each group of shots as independent; after all, each shot has been assigned at random to one of three groups: “3hits,” “3misses,” or neither.  So far this reasoning is correct.  Now, GVT (implicitly) took this intuitive reasoning one step further: because all shots, which are independent, have been assigned at random to each of the groups, we should expect the field goal percentages to be the same in each group.  This is the part that is wrong.

Joshua Miller, in Las Vegas after winning $5 million from economists who accepted his challenge to bet against P(H|HHH) < P(H) when sampling from finite sequence of coin tosses

Where does this seemingly fine thinking go wrong?  The first clue that there is a problem is that the variable that is being used to assign shots to groups is also showing up as a response variable in the computation of the field goal percentage, though this does not fully explain the problem.  The key issue is that there is a bias in how shots are being selected for each group.  Let’s see this by first focusing on the “3hits” group. Under the assumptions of GVT’s statistical test, Stephen Curry has a 50% chance of success on each shot, i.e. he is like a coin: heads for hit, and tails for miss.  Now, suppose we plan on flipping a coin 100 times, then selecting at random among the flips that are immediately preceded by three consecutive heads, and finally checking to see if the flip we selected is a heads, or a tails. Now, before we flip, what is the probability that the flip we end up selecting is a heads?  The answer is that this probability is not 0.50, but 0.46!  Herein lies the selection bias.  The flips that are being selected for analysis are precisely the flips that are immediately preceded by three consecutive heads.  Now, returning to the world of basketball shots, this way of selecting shots for analysis implies that for the “3hits” group, there would be a 0.46 chance that the shot we are selecting is a hit, and for the “3misses” group, there would be a 0.54 chance that the shot we are selecting is a hit.
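If you would like to check the 0.46 figure for yourself, a few lines of simulation of the selection procedure just described will do it (this is only a numerical sketch of the thought experiment, not the analysis in our paper):

```python
# Quick numerical check of the selection bias described above: flip a fair coin
# 100 times, look only at the flips immediately preceded by three consecutive
# heads, and record the proportion of those flips that are heads (sequences with
# no qualifying flip are skipped).
import random

def post_streak_proportion(n=100, k=3):
    flips = [random.random() < 0.5 for _ in range(n)]      # True = heads
    eligible = [flips[i] for i in range(k, n) if all(flips[i - k:i])]
    return sum(eligible) / len(eligible) if eligible else None

props = [p for p in (post_streak_proportion() for _ in range(100_000)) if p is not None]
print(sum(props) / len(props))    # comes out near 0.46, not 0.50
```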

Therefore, if Stephen Curry does not get hot, i.e. if he always has a 50% chance of success for the 100 shots we study, we should expect him to shoot 46% after a streak of three or more hits, and 54% after a streak of three or more misses.  This is the order of magnitude of the bias that was built into the original hot hand study, and this is the bias that is depicted in Figure 2 on page 13 of our new paper, and a simpler version of this figure is below. This bias is large in basketball terms: a difference of more than 8 percentage points is nearly the difference between the median NBA Three Point shooter, and the very best.   Another way to look at this bias is to imagine what would happen if we were to invite 100 players to participate in GVT’s experiment, with each player shooting from positions in which the chance of success on each shot were 50%.  For each player check to see if his/her field goal percentage after a streak of three or more hits is higher than his/her field goal percentage after a streak of three or more misses.  For how many players should we expect this to be true? Correct answer: 40 out of 100 players. 
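And here is the same kind of sketch for the 100-player thought experiment (again just a numerical check of the answer given above, using the "three or more" streak definition):

```python
# 100-shot "coin" players: for each, compare FG% after streaks of three or more
# hits with FG% after streaks of three or more misses, and record whether the
# former is higher.
import random

def looks_hot(n=100, k=3):
    shots = [random.random() < 0.5 for _ in range(n)]       # True = hit
    after_hits = [shots[i] for i in range(k, n) if all(shots[i - k:i])]
    after_misses = [shots[i] for i in range(k, n) if not any(shots[i - k:i])]
    if not after_hits or not after_misses:
        return None                                         # comparison undefined for this player
    return sum(after_hits) / len(after_hits) > sum(after_misses) / len(after_misses)

flags = [f for f in (looks_hot() for _ in range(10_000)) if f is not None]
print(sum(flags) / len(flags))    # roughly 0.40, per the answer given above -- not 0.50
```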

Adam Sanjurjo Hermida, professional tennis player currently ranked 624th in world. Very hot hand predicted by M&S sometime in April 2016

This selection bias is large enough to invalidate the main conclusion of GVT's original study, without having to analyze any data.  However, beyond this “negative” message, there is also a way forward.  Namely, we can re-analyze the original Cornell dataset, but in a way invulnerable to the bias.  It turns out that when we do this, we find considerable evidence of the hot hand in this data. First, if we look at Table 4 in GVT (page 307), we see that, on average, players shot around 3.5 percentage points better when on a hit streak of three or more shots, and that 64% of the players shot better when on a hit streak than when on a miss streak. While GVT do not directly analyze these summary averages, given our knowledge of the bias, they are telling (in fact, you can do much more with Table 4; see Kenny LJ respond to his own question here).  With the correct analysis (described in the next post), there is statistically significant evidence of the hot hand in the original data set, and, as can be seen in Table 2 on page 23 of our new paper, the point estimate of the average hot hand effect size is large (further details in our “Cold Shower” paper here). If one adjusts for the bias, what one now finds is that: (1) hitting a streak of three or more shots in a row is associated with an expected 10 percentage points boost in a player’s field goal percentage, (2) 76% of players have a higher field goal percentage when on a hit vs. miss streak, (3) and 4 out of 26 players have a large enough effect to be individually significant by conventional statistical standards (p<.05), which itself is a statistically significant result on the number of significant effects, by conventional standards. 

In a later post, we will return to the details of GVT’s paper, and talk about the evidence for the hot hand found across other datasets. If you prefer not to wait, please take a look at our Cold Shower paper, and related comments on Gelman’s blog.

In the next installment, we will discuss the counter-intuitive probability problem that reveals the bias, and explain what is driving the selection bias there.  We will then discuss some common misconceptions about the nature of the selection bias, and some very interesting connections with classic probability paradoxes.

Sunday
Nov012015

Weekend update: talking it up & listening too

Reports on road shows:

1. Carnegie Mellon PCR series:

Great event! Passionate, curious, excited audience eager to contribute to the project of fixing the science communication problem.

This is the future of the Liberal Republic of Science: a society filled with culturally diverse citizens whose common interest in enjoying the benefit of all the knowledge their way of life makes possible is secured by scientists, science communication professionals, educators, and public officials using and extending the "new political science" of science communication.

 

Slides here.

2. 10th Annual Conference on Empirical Legal Studies:

I did a presentation on "'Ideology' or 'Situation Sense?'," the CCP study on the interaction of cultural worldviews and legal reasoning in the public, law students, lawyers & judges, respectively.  Lots of great feedback.

Slides here.

A small selection of other papers definitely worth taking a look at (a very frustrating element of a conference like this is having to choose between concurrent sessions featuring really interesting stuff):

Chen, Moskowitz & Shue, Decision-Making Under the Gambler's Fallacy: Evidence from Asylum Judges, Loan Officers, and Baseball Umpires
Thorley, Green et al., Please Recuse Yourself: A Field Experiment Exploring the Relationship between Campaign Donations and Judicial Recusal
MacDonald, Fagan & Geller, The Effects of Local Police Surges on Crime and Arrests in New York City
Ramseyer, Nuclear Power and the Mob: Extortion and Social Capital in Japan
Scurich, Jurors’ Presumption of Innocence: Impact on Cumulative Evidence Evaluation and Verdicts
Sommers, Perplexing Public Attitudes Toward Consent: Implications for Sex, Law, and Society
Robertson, 535 Felons? An Empirical Investigation into the Law of Political Corruption 
Baker & Malani, Do Judges Really Care About Law? Evidence from Circuit Split Data 

 

Wednesday
Oct282015

On the road *again*...

talk today at CMU, 5:30:

 

 

Monday
Oct262015

What's the deal w/ Norwegian public opinion on climate change?? What's the deal with ours?

Was just reading a really cool article, Aasen, M. The polarization of public concern about climate change in Norway. Climate Policy (2015), advance online publication.

Constructing Individualism and Egalitarianism scales with items from Norwegian Gallup polls conducted between 2003 and 2011, Aasen does find that both dispositions predict differences in concern w/ climate change -- less for the former, more for the latter.  

Climate change concern was measured with the single item 'How concerned are you about climate change?' The response categories were 'Quite concerned', 'Very concerned', 'A little concerned', and 'Not at all concerned.' Assuming, as seems certain!, that Norwegians have attitudes about climate change, it's pretty safe to expect a single item like this to tap into them in the same way that the Industrial Strength Risk Perception Measure would.  Aasen likely handicapped her detection of the strength of the influences she measured, however, by dichotomizing this measure ('Quite concerned' & 'Very concerned' vs. 'A little concerned' & 'Not at all concerned') rather than treating it as a 4-point ordinal one.

Aasen's "individualism" scale was apparently substantially more reliable than her "egalitarianism" one  (the α's are reported as "> 0.70" and "> 0.30," respectively).  But assuming the indicators have the requisite relationship with the underlying disposition, low reliability doesn't bias results; it just attenuates their observed strength.
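A quick simulation (toy data, obviously not Aasen's) makes both attenuation points concrete: dichotomizing the outcome and measuring the predictor with an unreliable scale each shrink the observed association without changing its direction:

```python
# Toy illustration (simulated data, not Aasen's): both kinds of information loss
# discussed above -- a dichotomized outcome and an unreliable predictor scale --
# attenuate the observed association rather than bias or reverse it.
import numpy as np

rng = np.random.default_rng(7)
n = 5000
worldview = rng.normal(size=n)                               # true disposition (e.g., individualism)
concern = -0.5 * worldview + rng.normal(size=n)              # latent climate concern
ordinal = np.digitize(concern, np.quantile(concern, [0.25, 0.5, 0.75]))   # 4-point item
binary = (ordinal >= 2).astype(float)                        # dichotomized version
noisy_scale = worldview + rng.normal(scale=1.5, size=n)      # low-reliability measure of the disposition

print(np.corrcoef(worldview, ordinal)[0, 1])      # baseline association
print(np.corrcoef(worldview, binary)[0, 1])       # smaller in magnitude: dichotomization
print(np.corrcoef(noisy_scale, ordinal)[0, 1])    # smaller in magnitude: unreliable predictor
```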

So it's pretty cool to now see evidence of the same sorts of cultural divisions in Norway as we see in the US (Kahan et al. 2012), UK (Kahan et al. 2015), Australia (Guy, Kashima & O'Neill 2014), & Switzerland (Shi et al. 2015), etc.  Maybe Aasen will follow up by adapting the "cultural cognition worldview" scales for a Norwegian sample!

But what really got my attention was the overall level of concern in the sample:

Yes, "individualism" and "Hierarchy" (the attitude opposite in valence to "egalitarianism") predict a steeper decline in concern after 2007, and obviously explain a lot more variance in 2011 than in 2003.

But look, first, at how modest "concern" was even for the most "egalitarian" and "communitarian" (opposite of individualistic) respondents; and, second, at the universality of the decline in concern since 2007.

Hmmm.

The climate-concern item seems to be the international equivalent of a Gallup item that asks U.S. respondents "how worried" they are about "global warming" or "climate change" ("great deal," "fair amount," "only a little," or "not at all").  Here's what U.S. responses (combining the equivalent response categories) look like (with the period that overlaps w/ Aasen's data bounded by dotted lines):

 

You can see that the divide along "individualist-communitarian" and "egalitarian-hierarchy" lines in Norway is less extreme than the Democrat-Republican one in the U.S.  Actually, if we had data for the U.S. respondents' cultural worldviews, the greater degree of polarization in the U.S. would be shown to be even more substantial. 

But again, that's not as intriguing to me as what the data show about the relative levels of "concern"/"worry" in the two nations.  The U.S. population is not particularly "worried" on average, but apparently Norwegians are even less "concerned," as can be seen in this composite graphic, which charts the corresponding sets of responses for both nations, respectively, in the years for which there are data (note: Aasen supplied me with the Norwegian means; this Figure supersedes a slightly but not materially different one reflecting estimates from the model presented in the paper):

The trends are very comparable, and maybe the question wording or some cross-cultural exchange rate in how respondents indicate their attitudes explains the gap.

But clearly (by this measure at least) Norway is not more concerned than the U.S., which according to common wisdom "leads the world in climate denial."  

Indeed, the segment of society most culturally predisposed to worry about climate change in Norway is no more concerned than the "average" American.

So what's going on in that country?!

Maybe we can entice Aasen into a guest post.  I've already offered her the standard MOP$50,000.00 fee (payable in future stock options in CCP, Inc.), but I'm confident she, like other guests, will waive the fee to affirm that enlarging human knowledge is her only motivation for being a scholar  (of course, there is still ambiguity, given the fame & celebrity endorsements, particularly in Macao, that comes with being a CCP Blog guest poster).

We'll see what she says!

But in the meantime, this very interesting & cool paper supplies material for a fresh lesson about the dangers of "selecting on the dependent variable" in the science of science communication: If one tests one's theory of U.S. public opinion on climate change by considering only how well it "fits" the data in the U.S., then obviously one will be excluding the possibility of observing both comparable states of public opinion in societies where the asserted explanation ("balanced media norms," a creeping public "anti-science" sensibility, Republican brains, etc.) doesn't apply and divergent states of public opinion in societies in which the asserted explanation applies just as well (Shehata & Hopmann 2012).

Refs

Aasen, M. The polarization of public concern about climate change in Norway. Climate Policy (2015), advance online publication.

Guy, S., Kashima, Y., Walker, I. & O'Neill, S. Investigating the effects of knowledge and ideology on climate change beliefs. European Journal of Social Psychology 44, 421-429 (2014).

Kahan, D.M., Hank, J.-S., Tarantola, T., Silva, C. & Braman, D. Geoengineering and Climate Change Polarization: Testing a Two-Channel Model of Science Communication. Annals of the American Academy of Political and Social Science 658, 192-222 (2015).

Kahan, D.M., Peters, E., Wittlin, M., Slovic, P., Ouellette, L.L., Braman, D. & Mandel, G. The polarizing impact of science literacy and numeracy on perceived climate change risks. Nature Climate Change 2, 732-735 (2012).

Shehata, A. & Hopmann, D.N. Framing Climate Change: a Study of US and Swedish Coverage of Global Warming. Journalism Studies 13, 175-192 (2012).

Shi, J., Visschers, V.H.M. & Siegrist, M. Public Perception of Climate Change: The Importance of Knowledge and Cultural Worldviews. Risk Analysis (2015), advance online publication.

 

Friday
Oct232015

Is there diminishing utility in the consumption of the science of science communication?

Apparently not!

Or at least not at Cornell University, where I gave 3 lectures Thurs. & had follow up meetings w/ folks Friday.

This is a university that gets the importance of integrating the practice of science and science-informed policymaking with the science of science communication.  The number of scholars across various departments in both the natural and social sciences who are applying themselves to this objective in their scholarship and pedagogy is pretty amazing.

Brief report:

No. 1 was a talk for the Global Leadership Fellows affiliated with the Cornell Alliance for Science (“a global initiative for science-based communication”).  B/c the Fellows--an amazingly smart & talented group of science communication professionals & students--were going to tail me for the rest of the day, I thought I should pose a couple of questions that they could think about & that I’d answer in later lectures. Of course, I asked them for their own answers in the meantime. Since their answers were, predictably, better than the ones I was going to give, I just substituted theirs for mine later in the day--who would notice, right?

The questions were:

1. Do U.S. farmers believe in climate change? &

2. Do evolution non-believers enjoy watching documentaries on human evolution?

The Fellows were very curious about these issues.

Slides here.

No. 2 was a lecture to the class “The GMO Debate: Science, Society, and Global Impacts.”  The title of my talk was “Are GMOs toxic for the science communication environment? Vice versa?”  I think I might have been the first person to break the news to them that there isn’t any public contestation over GM foods in the U.S.

Slides here.

No. 3 was a public lecture.  Discussed the “science communication measurement problem,” “the disentanglement principle,” and “cognitive dualism & communicative pluralism.”

Slides here.

Wednesday
Oct212015

Can I make you curious about science curiosity? . . .

If so, then maybe you'll stay tuned. An excerpt from something I'm working on:

. . . . As conceptualized here, science curiosity is not a transient state (see generally Loewenstein 1994), but instead a general disposition, variable in intensity across persons, that reflects the motivation to seek out and consume scientific information for personal pleasure.

A valid measure of this disposition could be expected to make myriad contributions to knowledge.  Such an instrument could be used to improve science education, for example, by facilitating investigation of the forms of pedagogy most likely to promote the development of science curiosity and harness it to promote learning (Blalock, Lichtenstein, Owen & Pruski 2008).  A science curiosity measure could likewise be used by science journalists, science filmmakers, and similar professionals to perfect the appeal of their work to those individuals who value it the most (Nisbet & Aufderheide 2009). Those who study the science of science communication (Fischhoff & Scheufele 2013; Kahan 2015) could also use a science curiosity measure to deepen their understanding of how public interest in science shapes the responsiveness of democratically accountable institutions to policy-relevant evidence.

Indeed, the benefits of measuring science curiosity are so numerous and so substantial that it would be natural to assume researchers must have created such a measure long ago.  But the plain truth is that they have not.  “Science attitude” measures abound. But every serious attempt to assess their performance has concluded that they are psychometrically weak and, more importantly, not genuinely predictive of what they are supposed to be assessing—namely, the disposition to seek out and consume scientific information for personal satisfaction.

We report the results of a research measure consciously designed to remedy this deficit....

References 

Blalock, C.L., Lichtenstein, M.J., Owen, S., Pruski, L., Marshall, C. & Toepperwein, M. In Pursuit of Validity: A comprehensive review of science attitude instruments 1935–2005. International Journal of Science Education 30, 961-977 (2008).

Fischhoff, B. & Scheufele, D.A. The science of science communication. Proceedings of the National Academy of Sciences 110, 14031-14032 (2013).

Loewenstein, G. The psychology of curiosity: A review and reinterpretation. Psychological Bulletin 116, 75 (1994).

Nisbet, M.C. & Aufderheide, P. Documentary Film: Towards a Research Agenda on Forms, Functions, and Impacts. Mass Communication and Society 12, 450-456 (2009).


 

 

 

 

 

Friday
Oct162015

Coming soon ... the Science Curiosity Index/Ludwick Quotient

Been busy at work on the CCP "Evidence-based Science Filmmaking Initiative" (ESFI), and hence neglecting the 14 billion readers of this blog... Sorry!

Am hoping what we will have to say on the progress we've been making will compensate.  More on that soon-- very soon.

But just to feed you enough information to prevent utter starvation, the coolest thing so far is a behaviorally validated Science Curiosity Index (SCI), which measures the disposition to seek out & consume science information for personal satisfaction.  It's amazing what science curiosity, which is definitely not the same thing as the science-comprehension disposition measured by Ordinary Science Intelligence, tells us about how people process information about contested science issues.

Some of us in the lab have taken to calling the SCI measure the "Ludwick Quotient" (LQ).

But more soon-- very soon, I promise!
