popular papers

What Is the "Science of Science Communication"?

Climate-Science Communication and the Measurement Problem

Ideology, Motivated Cognition, and Cognitive Reflection: An Experimental Study

Motivated Numeracy and Enlightened Self-Government

Making Climate Science Communication Evidence-based—All the Way Down 

Neutral Principles, Motivated Cognition, and Some Problems for Constitutional Law 

Cultural Cognition of Scientific Consensus

The Tragedy of the Risk-Perception Commons: Science Literacy and Climate Change

"They Saw a Protest": Cognitive Illiberalism and the Speech-Conduct Distinction 

Geoengineering and the Science Communication Environment: a Cross-Cultural Experiment

Fixing the Communications Failure

Why We Are Poles Apart on Climate Change

The Cognitively Illiberal State 

Who Fears the HPV Vaccine, Who Doesn't, and Why? An Experimental Study

Cultural Cognition of the Risks and Benefits of Nanotechnology

Whose Eyes Are You Going to Believe? An Empirical Examination of Scott v. Harris

Cultural Cognition and Public Policy

Culture, Cognition, and Consent: Who Perceives What, and Why, in "Acquaintance Rape" Cases

Culture and Identity-Protective Cognition: Explaining the White Male Effect

Fear of Democracy: A Cultural Evaluation of Sunstein on Risk

Cultural Cognition as a Conception of the Cultural Theory of Risk


The "judicial behavior" measurement problem: What does it *mean* to say that "ideology" explains judicial decisions? 

This is another excerpt from the latest CCP paper, "Ideology" or "Situation Sense"? An Experimental Investigation of Motivated Reasoning and Professional Judgment, Univ. Pa. L. Rev. (in Press). It presents what I consider to be the major methodological defect in observational--or correlational--studies that purport to find that "ideological" motivations explain variation in judicial decisions: the failure to specify a cogent theory of what counts as an "ideological" as opposed to a legal or jurisprudential motivation, and a resulting failure to specify what sorts of evidence would support an inference of "ideological" motivations.

A. Observational studies

Associated with the disciplines of political science and economics, studies that use observational methods make up the largest share of the literature on the impact of ideological motivations on judicial decisionmaking. Such studies use correlational analyses—in the form of multivariate regression models—that treat the “ideology” of individual judges as an “independent variable” the impact of which on case outcomes is assessed after partialing out or “controlling for” additional influences represented by other “independent variables.”  

There are different methods for measuring judges’ “ideologies,” including (in the case of federal judges) the party of the appointing President  and (in the case of Supreme Court Justices) the covariance of votes among judges who can be understood to be aligned along some unobserved or latent ideological continuum.  Such studies tend to find that “ideology” so measured explains a “statistically significant” increment of variance in judicial determinations. Studies looking at the decisions of federal courts of appeals, which assign cases to three-judge panels for determination, also find that the impact of ideology so measured can be either accentuated or muted depending on the ideological composition of judges on the particular panel.

Critics of these studies identify methodological problems that they believe constrain the strength of the inferences that can be drawn from them.  The most obvious of these is the sampling bias introduced by parties’ self-conscious selection of cases for litigation. . . . 

Another, more subtle, but equally serious problem for observational studies of judicial ideology is the classification of “case outcomes.” In order to measure the impact of a judge’s “ideology” on decisionmaking, it is necessary to determine which outcomes are consistent with that judge’s ideology and which ones are not. Scholars doing observational studies generally classify outcomes as “liberal” or “conservative” based on the type of case and the prevailing party: for example, decisions favoring the government in “criminal” cases are deemed “conservative” and those favoring the defendant “liberal”; in labor law cases, outcomes are “conservative” if they favor “management,” and “liberal” if they favor unions, and so forth.
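For concreteness, the coding convention just described can be sketched in a few lines of Python. This is an illustration only: the rule table, the function names, and the toy decisions are hypothetical, and they cover just the two case types mentioned above, not any actual study's codebook.

```python
# Hypothetical coding rules: (case type, prevailing party) -> label.
# Only the two case types named in the text are included.
CODING_RULES = {
    ("criminal", "government"): "conservative",
    ("criminal", "defendant"): "liberal",
    ("labor", "management"): "conservative",
    ("labor", "union"): "liberal",
}


def code_outcome(case_type, prevailing_party):
    """Return the ideological label assigned to a case outcome.

    Returns None when the pair is not covered by the rules above.
    """
    return CODING_RULES.get((case_type.lower(), prevailing_party.lower()))


def share_liberal(decisions):
    """Fraction of codable decisions labeled 'liberal' (None if none are codable)."""
    labels = [code_outcome(case_type, party) for case_type, party in decisions]
    coded = [label for label in labels if label is not None]
    if not coded:
        return None
    return sum(label == "liberal" for label in coded) / len(coded)


# Toy votes for one imaginary judge (made-up data, for illustration only).
toy_decisions = [
    ("criminal", "defendant"),
    ("criminal", "government"),
    ("labor", "union"),
    ("labor", "management"),
]
toy_share = share_liberal(toy_decisions)
```

Note that the only inputs to `code_outcome` are the case type and the prevailing party; nothing about the reasoning behind a decision enters the coding.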

The crudeness of this scheme not only injects noise into empirical analyses of case outcomes but also biases them toward overstated estimates of the impact of “ideology” on judicial decisionmaking. It is a well known feature of the Anglo-American system of law that it frequently demands that judges resort to normative reasoning. There is no way for highly general concepts such as “fraud,” “unreasonable seizure,” “unlawful restraint of trade,” “fair use,” “materiality,” “freedom of speech,” and the like to be made operative in particular cases without specifying what states of affairs those legal provisions should be trying to promote. Under the “common law” style of reasoning dominant in Anglo-American law, the sorts of moral judgments that judges exercise to supply content to these types of concepts are not unconstrained; shared understandings of the general aim of the enacting legislature or other law promulgator, the appropriate deference to be afforded to previous elaborations of the content of the legal concept in question, and conformity to broader normative precepts that structure the law (“notice and opportunity to be heard,” “due process,” “like cases treated alike” etc.) limit the available interpretive options. But in ruling out many solutions, the sources of valid normative inspiration that judges can draw on often do not rule only one in.

In this environment, it is perfectly commonplace for judges who have competing “jurisprudential” orientations to disagree on what normative theory should animate a particular legal provision. It is not a surprise, either, that in those instances the competing orientations that guide judges will be correlated with alternative political philosophies or orientations on the part of the judges in question. Justice Douglas had a populist “economic decentralization” conception of “restraint of trade” for purposes of the Sherman Act; Professor and then Judge Robert Bork subscribed to an economic, “consumer welfare” alternative. These positions undoubtedly cohered with their respective political “ideologies,” too, and likely did as well with the “ideologies” of judges who championed one versus the other understanding of how U.S. antitrust law should be structured. But those who understand how the law works--and the contribution that judges, using normative theories, play in imparting content to it--would not characterize this debate as reflecting extralegal “ideological” considerations as opposed to the perfectly ordinary, acceptable exercise of jurisprudential judgments. Multivariate regression models are not necessary to ferret out the contribution that value-laden theories make to how judges decide these cases; judges openly admit that they are using such theories. Regardless of which President appointed these judges to the federal bench, no lawyer understands judges engaged in this sort of reasoning to be invoking “personal political preferences.”

An entirely different matter would have been presented, however, had Justice Douglas or Judge Bork proposed deciding an antitrust, labor law, free speech, criminal law or any other sort of case based on the religious affiliation of the litigants or on the contribution a particular outcome would have made to the electoral prospects of a candidate for President. The Sherman Act, the Wagner Act, the First Amendment, and even myriad criminal law statutes  all demand the use of the form of guided normative theorizing we are describing. But the bare desire to use legal outcomes in particular cases (or in large classes of them) to disadvantage those who subscribe to a disfavored view of the best life or to advance the cause of a particular political party is plainly outside the range of considerations that can validly be appealed to in the exercise of normative reasoning intrinsic to law. Whether in the form of regression coefficient correlations, law-enforcement wiretaps, or anonymously leaked emails, evidence that judges of particular ideologies were being influenced by such considerations would be a ground for intense concern.

There is a distinction, in sum, between resort to normative considerations that are internal to law and ones external to it. The former are licit, the latter illicit, from the perspective that lawyers and judges in the U.S. system of justice share of what counts as valid legal reasoning.

The “prevailing party” outcome-classification scheme used in observational studies of judicial ideology is blind to the distinction. As a result, such studies will count in their estimates of the influence of “ideology” perfectly mundane associations between the jurisprudential philosophies of judges deciding cases on the basis of normative considerations internal to law and the party of the Presidents who appointed them or the voting records of those judges and judges who feel likewise about the normative theories that inform labor law, free speech cases, criminal cases and the like.  

The correlations that these researchers report could also be capturing judges’ reliance on illicit political considerations, external to the law. But (critics point out) there is no way to know whether this is the case, or to what extent, given the indiscriminate coding of outcome variables that these studies employ.

Some candid adherents to the “ideology thesis”  have acknowledged this point.  But they have not supplied a response to what critics would identify as the significance of this concession. When observational-study proponents declare that they are finding that “ideology” accounts for judges’ decisions, they say they are measuring the extent to which those judges are not deciding cases on the basis of “law.” That is what gives this entire body of literature its currency—its “shock value.” But to the extent that the observational-study scholars are finding that judges who have different judicial philosophies will sometimes validly interpret the law to support different conclusions, then they are telling us something that already is clear— something, in fact, that the very judges whose behavior is being "explained" plainly say when they justify their decisions—and that gives no one any reason to be concerned about the quality of judicial decisionmaking.


Fun webinar event on politicization of science-- c'mon, sign up! 

I don't have any time today to say anything -- interesting or not -- b/c I'm so busy preparing for this cool "webinar" on politicization of science.

Sign up-- you can ask really hard questions & try to stump the participants (or easy ones--those are even harder to get right).  Plus it's free!


If you think local action focused on adaptation is not the path for promoting engagement with climate-change policymaking at the national level, you are wrong. So wrong.




We are *all* Pakistani Drs/Kentucky Farmers, Part 2: Kant's perspective(s)

This is an excerpt from another bit of correspondence with a group of very talented and reflective scholars who are at the beginning of an important research program to explain "disbelief in" human evolution. In addition, because "we may [must] regard the present state of the universe as the effect of its past and the cause of its future," this post is also a companion to yesterday's, which responded to  Adam Laats' request for less exotic (or less exotic seeming) examples of people using cognitive dualism than furnished us by the Pakistani Dr & the Kentucky Farmer. No doubt it will be the progenitor of "tomorrow's" post too; but you know that will say more about me than it does about the "Big Bang...."

I agree of course that figuring out what people "know" about the rudiments of evolutionary science has to be part of any informative research program here.  But I understand your project to be how to "explain nonacceptance" of or "disbelief in" what is known.

So fine, go ahead and develop valid measures for assessing evolutionary science knowledge. But don't embark on the actual project until you have answered the question the unreflective disregard of which is exactly what has rendered previous “nonacceptance” research programs so utterly unsatisfactory: what is it exactly that is being explained?

Isn't the Pakistani Dr's (or the Kentucky Farmer's or Krista's) "cognitive dualism" just a special instance of the perspectival dualism that Kant understands to be integral to human reason?

In the Groundwork for the Metaphysics of Morals and in both the 1st and 2d Critiques, Kant distinguishes two “self” perspectives: the phenomenal one, in which we regard ourselves and all other human beings, along with everything else in the universe, to be subject to immutable and deterministic laws of nature; and the “noumenal” one, in which we regard ourselves (and all other human beings) as possessing an autonomous will that prescribes laws for itself independently of nature so conceived.

No dummy, Kant obviously can see the "contradictory" stances on human autonomy embodied in the perspectives of our "phenomenal" and "noumenal" (not to be confused w/ the admittedly closely related "Neumenal") selves.

But he is not troubled by it.

The respective “beliefs” about human autonomy associated with the phenomenal and noumenal perspectives are, for him, built-in components of mental routines that enable the 2 things reasoning beings use their reason for: to acquire knowledge of how the world works; and to live a meaningful life within it.

Because there’s no contradiction between these reason-informed activities, there’s no practical—no experienced, no real -- contradiction between the sets of action-enabling mental states associated with  them.

Obviously, Kant's dualism has a very big point of contact with debates about "free will" & "determinism," and the coherence of "compatibilist" solutions, and whatnot.  

But as I read Kant, his dualism implies these debates are ill-formed. The participants in them are engaging the question whether human beings are subject to deterministic natural laws in a manner that abstracts from what the answer allows reasoning people to do.

That feature of the "determinism-free will" debate renders it "metaphysical" -- not in the sense Kant had in mind but in the sense that logical positivist philosophers did when they tried to clear from the field of science entangling conceptualist underbrush that served no purpose except to trip people up as they tried to advance knowledge by ordered and systematic thinking.

I strongly suspect that those who have dedicated their scholarly energy to "solving" the "problem" of "why the presentation of evolution in class frequently does not achieve acceptance of the evolutionary theory" among students who display comprehension of it are mired in exactly that sort of thicket.

Both the Pakistani Dr and Krista "reject" human evolution in converging with other free, reasoning persons on a particular shared account of what makes life meaningful.  They then both turn around and use evolutionary science (including its applicability to human beings because it simply "doesn't work," they both agree, to exempt human speciation from evolutionary dynamics—just as it doesn't work to exempt human beings from natural necessity generally if one is doing science) when they use their reason to be members of science-trained professions, the practice of which is enabled by evolutionary science.

In behaving in this way, they are doing nothing different from what any scientist or any other human being does in adopting Kant's "phenomenal perspective" to know what science knows about the operation of objects in the world while adopting Kant's "noumenal one" to live meaningful lives as persons who make judgments of value.

Only a very remarkable, and disturbing, form of selective perception can explain why so many people find the cognitive dualism of the Pakistani Dr or Krista so peculiar and even offensive.  Their reaction suggests a widespread deficit in the form of civic education needed to equip people to  honor their duty as citizens of a liberal democracy (or as subjects in Kant's "Kingdom of Ends") to respect the choices that other free and reasoning individuals make about how to live.

Is it really surprising, then, that those who have committed themselves to "solving" the chimera of Krista's "nonacceptance problem" can't see the very real problem with a conception of science education that tries to change who people are rather than enlarge what they know?



We are *all* Pakistani Drs/Kentucky Farmers, Part 1: Manny's perspective(s)

Quite reasonably, Adam Laats at "I love you but you are going to Hell" asked me if I could come up with additional, less exotic examples of people using cognitive dualism than the Pakistani Dr & the Kentucky Farmer. Here's a start...

(Image: Krista's boyfriend, Manny)

So I was talking with Krista, the high school senior and aspiring veterinarian featured in Hermann's "cognitive apartheid" study, about how puzzling it is to me & everyone else I know that she could get a perfect score on her evolutionary-science exam and still not believe in human evolution. She told me I should go ask her boyfriend Manny for help because he was "really good at explaining stuff."

It turned out that Manny, like Krista, had “aced” the AP physics course at their high school.  

I thus asked Manny how he reconciled what he had learned about the “Big Bang” with his religious conviction that God created the universe and everything in it. 

He replied, “What the hell are you talking about, dude? I’m an atheist!”

“Oh, sorry,” I said, “I just assumed that if you were Krista’s boyfriend, you must be religious too. . . .”

“Well, that was a pretty stupid assumption,” he replied. “Sure, we have different opinions about religion but it’s not like people around here cut each other's heads off over disagreements like that,” he said, fiddling with his iPhone as he spoke.

“Not only do I believe everything I learned in the AP Physics course,”  Manny continued with a demonic grin, “but I also believe that the course explains everything in the universe, including this conversation.”

“Seriously?,” I asked.

“Yes, seriously,” he replied. “In fact, one of the questions on the AP Physics exam was, ‘We may regard the present state of the universe as the effect of its past and the cause of its future—true or false?’ Pretty obvious the answer’s ‘true,’ don’t you agree?," he asked.

“Yes, I guess so,” I said.

“Of course you agree—you have no choice in the matter!,” he stated with a smirk.

“So I guess you are going to be a scientist when you grow up then?...,” I asked.

“Nope. A moral philosopher,” Manny answered.

“A moral philosopher– how can that be?!,” I asked. “If human behavior, along with everything else, can be linked to the impact of natural laws acting on successive states of the universe all the way back to the Big Bang, isn’t it silly to sit around philosophizing about how we ought to live? What ‘choice in the matter’ do we have?”

“That’s the sort of argument that seems really really clever when you are in junior high,” he replied. “Obviously people make reasoned decisions about how to live all the time.”

“But aren’t you contradicting yourself?,” I asked.  “You said you believe ‘we may regard the present state of the universe as the effect of its past and the cause of its future’—and yet you also are now saying that humans reasonably form their own life plans . . . .”

“Yes, I’m saying both of those things, but if you think that’s a ‘contradiction,’ you really are dense,” Manny said. “What I believe about the impact of natural laws on human beings and everything else in the universe, on the one hand, and what I believe about the power of free and reasoning human beings to decide how to live, on the other, are entirely different things.”

“I don’t get it,” I said.

“What sort of sorry ass excuse for an education did you receive?,” Manny asked. “Didn’t you ever read Kant’s Groundwork for the Metaphysic of Morals? Or how about either of his first two "Critiques"-- of Pure Reason & of Practical Reason?”

“Ummm … Sure, but remind me– it’s been a while — what did he say that’s relevant here?,” I asked.

Manny sighed. “Kant posits that human rationality involves a fundamentally dualistic self-perspective: as a member of the ‘sensible world,’ we perceive our actions, like everything else, to be caused by external forces of nature; but as a member of the ‘intelligible world,’ we perceive our actions as the product of our autonomous or self-determining wills.”

“Oh, right,” I said.

“You sound like you don’t get it,” Manny sneered. “But if you want ’empirical proof,’ just look at how every scientist lives her life. Yes, she believes that ‘the present state of the universe is the effect of its past and the cause of its future’—at work, where that belief enables her to contribute to the advancement of scientific knowledge. Nor does she bother concocting some metaphysical 'humans are exempted' proviso as she's doing so, because obviously that doesn't work.” 

“But at home,” Manny continued, “that same scientist disbelieves human actions are caused by deterministic natural laws. The belief that human beings have the capacity to choose how to live is woven into the mosaic of desires, emotions, and moral evaluations that enables her to be a parent who takes pride in the accomplishments of her children; or to be a citizen who decides she should do something to fight the threat that global warming poses to her community or to humanity or whatever.”

“Don’t you see,” Manny resumed after a pause, “we are all Pakistani Drs!  Actually, I know you don’t see that; perhaps that is something you’d like to study sometime. Now if you’ll excuse me, I’m supposed to meet Krista so we can watch the latest episode of Mythbusters.”

Boy, those teenagers--such "know-it-alls"!


Science of Science Communication 2.0, Session 11.2: Wilner questions "consensus messaging" pedagogy for climate-science education along with her own previous views on teaching students to "believe in," not just comprehend, evolution...

I was so foolish to think that Science of Science Communication 2.0 had reached the end or at least the summer intersession!  

"Science communication honeybadger" Tamar Wilner,  having only appeared to have lapsed into a state of permanent unconsciousness after chomping off the head of the "belief vs. knowledge" viper in teaching evolution, suddenly awoke and started in chewing again, this time on whether "consensus messaging" is an effective/appropriate form of climate-science education for secondary school students.... Yow--she's fearless! 

Her appetite was so voracious, moreover, that she ended up devouring at least part of her own previous (qualified) endorsement of teaching students to "believe in" evolution.

See for yourself! 

And add your own witty voice-over commentary below but only if you are willing to reciprocate the contribution she is making to the class's reflective engagement with complexity.



The law's "neutrality communication problem"

This is an excerpt from the latest CCP paper, "Ideology" or "Situation Sense"? An Experimental Investigation of Motivated Reasoning and Professional Judgment, Univ. Pa. L. Rev. (in Press).

We have suggested that the results of this study furnish evidence relevant to assessing whether identity-protective cognition affects expert scientists. We now want to show how understanding the role of identity-protective cognition in conflicts over policy-relevant science can be used to highlight the practical significance of our study results for the administration of justice.

There is an obvious sense in which the results of this study can be understood as good news for the justice system. The perception that judges are “just politicians in robes” is, as we noted, commonplace. The popular view that judges decide cases on the basis of political or cultural commitment extrinsic to law is both understandable and distressing. Yet in an experiment designed to avoid methodological limitations associated with studies that have purported to corroborate this anxiety, we found evidence that judges of diverse cultural outlooks can be expected to converge on results in cases that predictably divide the public. Their job is to decide those sorts of cases neutrally, and our evidence supports the inference that they have both the capacity and disposition to carry it out.

That such a result defies public perceptions should not come as any sort of surprise. Numerous studies have found that members of the general public themselves can be expected to conform their assessments of evidence and their interpretation of rules to the stake they have in legal outcomes that affirm the status of their groups and their own standing within them. These studies, we have emphasized, are not a reliable basis for drawing inferences about the in-domain reasoning processes of judges. But the one sort of inference that they do support is that members of the public can be expected to perceive judges to be biased in cases the outcomes of which are invested with antagonistic cultural meanings even when the outcomes of those cases reflect neutral decisionmaking.

That conclusion is, in fact, the bad news associated with our study results: the reliable convergence of culturally diverse judges on genuinely neutral outcomes has no connection at all to how untrained members of the public perceive the neutrality of those judges’ decisions. Again, because citizens lack the elements of professional judgment--the “situation sense”--that lawyers and judges acquire through their training and experience, citizens don’t have the capacity to discern those aspects of the case and the governing legal rules pertinent to assessing the neutrality or validity of judicial resolutions of them. On the contrary, in precisely those cases in which public anxiety about the cultural neutrality of the law is likely to be highest, identity-protective cognition will predictably disable members of the public from using their usually reliable lay prototypes of valid decisionmaking to assess case outcomes. In that circumstance, no matter how expertly and impartially judges decide, one or another segment of the public will be disposed to see judges’ decisions as “politically biased” whenever courts are called upon to resolve culturally fraught controversies.

This problem is exactly parallel to the one that scientists face when empirical issues on which they possess expertise become entangled in culturally contested meanings. Obviously, doing valid science does not in itself communicate the validity of scientific research: people lack the expertise to see validity for themselves; they must rely on cues and processes that help them to reliably recognize who knows what about what. The capacity of members of the public to interpret those cues is compromised when propositions of risk or fact become symbols of the status of competing cultural groups. In that sort of “polluted science communication environment,” just doing valid science--including the part of valid science that consists in communicating validity to other scientists--will do nothing to silence public confusion and agitation.

Fixing this science communication problem is the aim of a new science of science communication. This subdivision of decision science uses empirical methods to identify the various dynamics that enable people to recognize as valid scientific insights that they could never verify for themselves. It aims, too, to understand, empirically, how those processes can be disrupted, and how society can effectively preempt such disruptions and counteract them when strategies of prevention fail.

Exploiting the benefits of the science of science communication will demand appropriate adjustments to myriad institutional practices. The sorts of conscious interventions necessary to protect the science communication environment from contamination are not self-executing. An integral part of the science of science communication, then, is to identify programs of implementation that appropriately reconfigure the processes for science-informed policymaking, the norms of science-generating and -consuming professions, and the structure of university training of scientists and public-policymaking professionals.

The law has a similar communication problem. Doing and communicating neutral decisionmaking are as different from one another as doing and communicating valid science. Just as solving the science communication problem demands scientific knowledge and appropriate institutional reforms, so solving the law’s neutrality communication problem will require appropriate acquisition and use of empirical knowledge of a sort aimed at expanding understanding of how people come to recognize the neutrality of the law and what law should do to make its neutrality fully recognizable.

There is one critical difference, however, between the science communication problem and the neutrality communication problem. Unlike scientists, judges are expected both to make valid decisions and communicate the validity of their work to the public. It is widely recognized that the experience of liberal neutrality in law depends on the public’s confidence that the law is genuinely impartial. The practice of reason-giving reflected in judicial opinions is understood to be intrinsic to the rule of law precisely because public assurance of the law’s neutrality depends on the public’s access to a reasoned account of the neutral, impartial grounds for courts’ decisions.

The legal profession is doing well, our study suggests, in equipping those of its number who serve as judges to be neutral decisionmakers. But the very ubiquity and persistence of conflict over whether judges are in fact deciding cases on neutral grounds is a testament to how little the profession knows, and how poorly equipped its members are, to communicate the neutrality of the law. That deficit in lawyers’ “situation sense” is itself a barrier to citizens’ enjoyment of the value that neutral judicial decisionmaking confers on them.


Revisiting "cultural cognition as a conception of the cultural theory of risk"

I'm going to be giving a presentation at the 6th Annual Mary Douglas Seminar series at University College London next month.  I'm pretty psyched, b/c I've known about the series and always been really envious of the participants for their chance to exchange ideas with one another on the significance of Douglas's work for making sense of public conflict over risk and related topics.

I'll be presenting a paper-- which I'll post in 2-3 weeks-- that extends/updates/qualifies an earlier one I did on the relationship between "cultural cognition" & Douglas & Wildavsky's cultural theory of risk.

Can ‘cultural cognition’ help solve CTR’s ‘mechanisms problem’?

My paper will address the contribution ‘cultural cognition’ makes to remedying a deficit in Cultural Theory relating to the psychological and behavioral mechanisms that connect cultural worldviews to individual risk perceptions. Indeed, ‘cultural cognition’ was self-consciously designed to forge the connection between the cultural and psychometric theories of risk that Douglas (1997) proposed in her essay ‘The Depoliticization of Risk.’ Prepared specifically for the conference, my paper will use this theme to animate a brief survey of ‘cultural cognition’ studies. It will also present new data suggesting how cultural cognition dynamics might be understood to support the so-called ‘mobility thesis’ (Rayner 1992), which sees institutions (or social contexts more generally) rather than individuals as the agents through which opposing worldviews operate to generate variance in risk perceptions. ‘Cultural cognition’ does not furnish a unique solution to Cultural Theory’s ‘mechanisms problem’; but without a solution, Cultural Theory, I will argue, cannot be expected to sustain a meaningful empirical research program for investigating societal conflict over risk.

Best thing: Steve Rayner will be my commentator -- maybe I'll just cede all my time to him so I don't make the mistake of talking too much & depriving myself & others of any of the benefit of hearing what he has to say.




*Look* at my data: Graphic data postcards from " 'Ideology' or 'Situation Sense' "

This post collects a bunch of graphic presentations of data from the latest CCP paper “Ideology” or “Situation Sense”? An Experimental Investigation of Motivated Reasoning and Professional Judgment.

Graphic presentation of data is the common carrier of reason in empirical studies.  It’s what makes it possible for any curious, reflective person to critically engage the study findings independently of their facility with statistics. 

In my opinion, scholars who rely on statistical analyses that are not accessible to all curious, reflective people are engaged in a species of intimidation, not communication.  There’s a very high likelihood, too, that they themselves don’t really get what they are doing.

1. Showing the data.  Pretty much always the first step in competent data reporting is to show the reader the “raw data.” 

If someone has done a valid experiment to test some hypothesis, then he or she should be able to show readers—just by holding the data out in front of their eyes—that the experiment either supports or undermines that hypothesis. 

[Figure: an example involving a scatterplot w/ lowess lines superimposed.]

The point of applying a statistical model is to discipline and extend the inference one is drawing from results one can actually see; it isn’t to magically cause to appear out of a mass of tangled observations a result that can’t otherwise be seen.

So how to “show the raw data”?  This is not as straightforward as it sounds!

If one does an experiment, e.g., in which one posits that there will be an interaction between predictors (say, “cognitive reflection” and “political outlooks,” or "religiosity" & "science comprehension") that varies in relation to experimental treatments, one has to figure out a way to display the observations that makes that pattern (or the lack thereof) visible to the naked eye. That can be darn tricky!  Maybe some sort of appropriately color-coded scatter plot will work, or, if there are too many observations or too many points of contrast to make that feasible, lowess regression lines will help.

[Figure: too many observations & predictors here to do a scatterplot, but the lowess lines furnish honest testimony on what the raw data look like.]

“Showing the data” wasn’t very hard, though, in “Ideology” or “Situation Sense.”  

The rival hypotheses in that paper had to do with the relative responsiveness of different subject types—judges, lawyers, law students, and members of the public—to experimental manipulations designed to trigger identity-protective reasoning in two statutory interpretation problems. As discussed previously, the manipulations altered the identities in a manner expected to generate this form of bias among egalitarian communitarians and hierarchical individualists in one case, and among egalitarian individualists and hierarchical communitarians in the other.

So the simple thing to do was just to show for each subject type (judge, lawyer, student, and member of the public) the impact of the experimental assignment on the proportion of subjects with the relevant worldview (determined by their score on the two worldview scales) who construed the statute to have been violated: 

These results make it apparent that the experimental assignment affected the interpretations of members of the public of opposing worldviews, particularly in the “Littering” problem (which involved whether leaving reusable water containers in a desert constituted “discarding . . . debris” in a wildlife preserve).

Judges and lawyers, in contrast, were not affected to a meaningful degree (or in the patterns suggestive of identity-protective cognition) in either the “Littering” problem or the “Disclosure” problem (the latter of which involved a statutory ambiguity relating to release of law enforcement investigatory information to a member of the public).

Law students were somewhere in between.

These results were consistent with the hypothesis that the sort of “professional judgment” lawyers and judges acquire through training and experience (and which law students possess in an incipient form) protects them from the impact of identity-protective cognition in cases that predictably polarize culturally diverse members of the public.

2. Simulating the statistical model. The apparent corroboration of that hypothesis was probed more systematically with a multivariate regression model designed to assess the respective impacts of subject-type, cultural worldview, and experimental assignment on the subjects’ responses.

One way in which the model enhances our insight relative to inspection of the raw data is by measuring the impact of cultural worldviews as continuous variables. So in addition to helping us overcome the anxiety that what looks like signal is just noise, the model measures the impact of the cultural worldviews in a manner more sensitive to the varying intensity of individuals’ commitments than does simply assigning individuals to “groups” based on their scores in relation to the means on the two scales.

[Figure: regression model output. Almost certainly a researcher *should* give you this; but if that's all he or she gives you, demand your $ back.]

The model is a monster, particularly given the number of cross-product interaction terms needed to estimate the effects of experimental assignment and worldview separately for each subject type.

Someone who understands multivariate regression analysis can, with patience and persistence, make sense of the coefficient and standard error for each predictor.

But even that person will not be able to assess from the face of the regression output what all this information signifies in relation to the study hypotheses.

Accordingly, a researcher who proclaims that his or her hypothesis is “confirmed” (or worse, “proven” etc.) by the signs and “statistical significance” of the regression model coefficients (even one that is much simpler than this) is engaged in an embarrassing display of handwaving (& someone who does that after reporting a pile of fit statistics of the sort associated with an ANOVA -- ones that don’t convey anything about effect size -- is not even getting that close to relating something of value).

The necessary information has to be extracted from the model by using it to generate outcomes that reflect those combinations of predictor values relevant to the study hypotheses.

One way to do this is by monte carlo simulation.  In effect, a monte carlo simulation uses the specified predictor values to estimate the outcome a zillion times (1000 times actually is sufficient), adding to each estimate a random increment calibrated to the measurement error of the relevant predictors.

In the end, one gets a bell-curved distribution of values that indicates the relative probability of outcomes associated with the specified combination of predictors. The most likely outcome is the mean one, at the peak of the curve; values progressively larger or smaller are progressively less likely. One can, if one wants, figure out the 0.95 CI by identifying the values at the 2.5th and 97.5th percentiles.
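To make the simulation step concrete, here is a minimal sketch in Python. It is not the code used in the paper: the coefficient values, standard errors, and predictor names are made up for illustration, and it assumes, for simplicity, that each coefficient can be drawn independently from a normal distribution centered on its estimate (a fuller treatment would draw them jointly using the model's covariance matrix).

```python
import math
import random

def simulate_probabilities(coefs, std_errs, x, n_sims=1000, seed=0):
    """Simulate predicted probabilities from a logistic regression.

    Each simulation draws a fresh set of coefficients around the estimates
    and converts the resulting linear predictor into a probability.
    """
    rng = random.Random(seed)
    sims = []
    for _ in range(n_sims):
        # ASSUMPTION: independent normal draws per coefficient (covariance ignored).
        draw = [rng.gauss(b, se) for b, se in zip(coefs, std_errs)]
        linear = sum(b * xi for b, xi in zip(draw, x))
        sims.append(1.0 / (1.0 + math.exp(-linear)))
    return sims

def percentile(sorted_vals, q):
    """Nearest-rank percentile (q in [0, 100]) of an already sorted list."""
    rank = math.ceil(q / 100 * len(sorted_vals)) - 1
    return sorted_vals[min(len(sorted_vals) - 1, max(0, rank))]

def summarize(sims):
    """Return the mean and the 2.5th / 97.5th percentiles (the 0.95 interval)."""
    ordered = sorted(sims)
    mean = sum(ordered) / len(ordered)
    return mean, percentile(ordered, 2.5), percentile(ordered, 97.5)

# Hypothetical estimates: intercept, experimental assignment, worldview score.
coefs = [-0.5, 1.2, 0.8]
std_errs = [0.2, 0.3, 0.25]
x = [1.0, 1.0, 1.0]  # the combination of predictor values of interest

mean, lower, upper = summarize(simulate_probabilities(coefs, std_errs, x))
print(f"mean = {mean:.2f}, 0.95 interval = [{lower:.2f}, {upper:.2f}]")
```

Plotting a histogram or density of the simulated values for each combination of predictors one cares about is what yields the sort of picture discussed here.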

But the best thing about using monte carlo simulations (particularly for logistic regression, which estimates the probability of one outcome of a dichotomous variable) is that the resulting probability distributions can be graphically displayed in a manner that enables any reflective, curious person to see exactly what the model has to say about the inference one is using it to assess (King, Tomz & Wittenberg 2000).

Here, e.g., it can be seen, from how spaced out the probability distributions are, how unlikely an egalitarian communitarian member of the public is to agree with a hierarchical individualist one in a particular version of the “Littering” problem -- or with a member of the public who shares his or her values but who was assigned to the other version.

Likewise, it can be seen from how bunched together the probability distributions are just how low the probability is that judges of opposing worldviews will disagree. Same for lawyers.

Again, students are in the middle.

[Figure: some e.g.'s of using MC simulation to estimate the effect size of the experimental manipulation and the difference in it across subject types.]

One can also use the model to estimate the size of the differences in the impact of the experimental manipulation on various types of subjects, or the average impact on one or another subject across the two problems.

This is not only 10^9 times more informative for any curious, reasoning being -- one who actually would like to think for him or herself rather than be told what to think by someone who probably doesn’t really know what he or she is doing -- than being shown a regression output with a bunch of asterisks; it’s 10^6 times more informative than being told “the effect is x%, p < 0.05,” and 10^3 times more than being told “p%, ± q% at 0.95” (Gelman, Pasarica & Dodhia 2002).

3. Likelihood ratios.  But in my view, the very best thing we did in “Ideology” or “Situation Sense” was graphically display the likelihood ratios for opposing hypotheses relating to the effect of identity-protective cognition on particular subject types.

I’ve already posted an excerpt from the paper that addresses what we were doing here.

But in sum, a likelihood ratio specifies how much more consistent a piece of evidence is with one hypothesis than another and is the factor in proportion to which one revises one’s assessment of the probability of that hypothesis under Bayes’s Theorem.
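As a small illustration of that updating rule, here is a sketch in Python. The function name and the numbers are hypothetical; the only thing it relies on is the odds form of Bayes's Theorem, in which posterior odds equal prior odds multiplied by the likelihood ratio.

```python
def update_probability(prior, likelihood_ratio):
    """Revise the probability of a hypothesis given a likelihood ratio.

    Uses the odds form of Bayes's Theorem:
    posterior odds = prior odds * likelihood ratio.
    """
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1.0 + posterior_odds)

# Hypothetical example: someone who starts at even odds (0.5) and sees evidence
# five times more consistent with the hypothesis than with its rival.
print(update_probability(0.5, 5.0))
```

Note that the same piece of evidence moves different readers to different places, depending on where they started; the likelihood ratio is what they have in common.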

As such, it characterizes the weight of a piece of evidence -- something that a p-value, contrary to an obscenely prevalent misconception, does not do, no matter how friggin’ small it is (Good 1985).

Where one is doing an experiment or otherwise making an empirical estimate subject to measurement error, the likelihood ratio just is the relative probabilities of observing the experimental result under the relevant hypotheses (Goodman 1999a, 1999b, 2005).

One can visualize that by juxtaposing the probability distributions associated with the relevant hypotheses -- and comparing how likely the observed experimental result is under the respective distributions.

If we assume the distributions have the same standard error (which determines the slope, basically, of the bell curve) as the experimental result, the ratio of the heights of the observed result on the two distributions is the likelihood ratio associated with the experiment for the rival hypotheses (Morey 2014).
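The "ratio of the heights" idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the paper's computation: the two hypothesized effects, the observed effect, and the standard error are invented numbers, and each hypothesis is represented by a normal curve centered on the effect it predicts.

```python
import math

def normal_pdf(x, mean, se):
    """Height of a normal (bell) curve with the given mean and standard error at x."""
    z = (x - mean) / se
    return math.exp(-0.5 * z * z) / (se * math.sqrt(2.0 * math.pi))

def likelihood_ratio(observed, mean_h1, mean_h2, se):
    """How many times more consistent the observed result is with H1 than with H2."""
    return normal_pdf(observed, mean_h1, se) / normal_pdf(observed, mean_h2, se)

# Hypothetical numbers, in percentage points:
# H1 predicts no effect (0), H2 predicts a 25-point effect,
# the experiment observed a 5-point effect with a standard error of 8.
lr = likelihood_ratio(observed=5.0, mean_h1=0.0, mean_h2=25.0, se=8.0)
print(f"The result is about {lr:.1f} times more consistent with H1 than with H2")
```

A result that falls exactly halfway between the two predicted effects sits at the same height on both curves, so its likelihood ratio is 1: it carries no weight one way or the other.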

In my view, researchers ought to convey the likelihood ratio or its conceptual equivalent. By doing that, they make it plain for the reader exactly what an empirical finding (if based on valid methods) truly is: not conclusive “proof” of any particular proposition, but evidence of some degree of probative force to be added, along with all the other evidence one has and ever will get one's hands on, to the scale one is using to weigh the relative strength of competing hypotheses.

The menagerie of fit statistics (p-values, chi-squares, omnibus F-statistics, etc) associated with conventional null hypothesis testing obscures that -- indeed, necessarily fails to convey the information one would need to treat empirical data that way.

But even if a researcher is considerate and reflective enough to use a form of statistical analysis that yields the weight of the evidence, there is still the task of making that information comprehensible to the curious, reflective reader who is not trained in statistics.

Graphic display is the way to do that.

So, do you get it?

If not, and you are carrying through on your end of the bargain to apply your reason here (if, understandably, the discussion of the monte carlo simulations is too compact for you to fully grasp here, then go to the relevant discussion of them in the paper; for more on the logic of likelihood ratios and the graphic presentation of them, go back to the previous post; read all this closely, & think things through; you can’t learn anything if you don’t make the effort to teach yourself), then the inaccessibility of my statistics is my problem, not yours.

Tell me and I’ll try even harder.


Gelman, A., Pasarica, C. & Dodhia, R. Let's Practice What We Preach: Turning Tables into Graphs. Am Stat 56, 121-130 (2002).

Good, I.J. Weight of evidence: A brief survey. in Bayesian statistics 2: Proceedings of the Second Valencia International Meeting (ed. J.M. Bernardo, M.H. DeGroot, D.V. Lindley & A.F.M. Smith) 249-270 (Elsevier, North-Holland, 1985).

Goodman, S.N. Introduction to Bayesian methods I: measuring the strength of evidence. Clin Trials 2, 282-290 (2005).

Goodman, S.N. Toward evidence-based medical statistics. 1: The P value fallacy. Ann Int Med 130, 995-1004 (1999a).

Goodman, S.N. Toward evidence-based medical statistics. 2: The Bayes factor. Ann Int Med 130, 1005-1013 (1999b).

Kahan, D., Hoffman, D., Evans, D., Lucci, E., Devins, N., Cheng, K. “Ideology” or “Situation Sense”? An Experimental Investigation of Motivated Reasoning and Professional Judgment. Univ. Pa. L. Rev. (in press).

King, G., Tomz, M. & Wittenberg, J. Making the Most of Statistical Analyses: Improving Interpretation and Presentation. Am. J. Pol. Sci 44, 347-361 (2000).

Morey, R. Bayes factor t-tests, part 1. Bayes Factor Blog. (2014).


The making of a Pakistani Veterinarian in Kentucky: cognitive apartheid vs. cognitive dualism 

Krista is a high school senior who lives in a Southern U.S. state. As one might guess, the vast majority of her classmates identify themselves as religious and regularly attend church.

She excels in the study of science. She is one of a handful of students in her school who is enrolled in an Advanced Placement biology course. She also volunteers as a “peer tutor” for students in a basic science course that covers the origin of the universe and the natural history of living organisms on earth.

Her goal is to become a veterinarian.

But she "rejects" evolution as contrary to her faith: God made “man” in “his own image”; to believe “that apes and humans have a common ancestor,” she states, “would be wrong.”

Krista was one of the subjects interviewed in the qualitative component of a study conducted by Ronald Hermann (2012), a researcher interested in the attitudes of students who learn evolutionary science but don’t “believe in” or “accept” it.

Hermann selected Krista for the interview, in fact, because she obtained a near-perfect score on an evolutionary-science test.  

The test was the principal element of the quantitative component of Hermann’s study.  His results in this respect corroborated what numerous previous studies have established: that there is no correlation between students’ “beliefs” about evolution and their comprehension of concepts such as natural selection, random mutation, and genetic variance.

Hermann’s motivating hypothesis was that students in Krista’s situation would display a form of intellectual resistance dubbed “cognitive apartheid” (Cobern 1996).

The “cognitive apartheid” thesis is an alternative to another position—“cognitive assimilation” let’s call it—that imagines that teaching non-believing students evolutionary science will “change their minds” about the role of divine agency in the creation of our species.

According to the “cognitive apartheid” view, religious students consciously and effortfully segregate evolutionary-science insights. They reliably summon them from some walled off mental “compartment” to pass their examinations but otherwise block integration of them into their mental lives and ultimately expel them altogether upon completion of their educations (Cobern 1996).

This account arguably fit the perspective of one of the students featured in the qualitative component of Hermann’s study.

“The science stuff we learn about evolution and stuff like that all the time,” explained Aidan, a star athlete with a 4.0 grade point average, “I understand it, but I definitely don’t believe in it.” “I just block it out and do it because, I mean, otherwise I fail or something like that, and I’m not going to sacrifice that.” 

But “cognitive apartheid” clearly didn’t capture the complexity of Krista’s thinking.

To be sure, she had elected, very self-consciously, to persist in her state of “disbelief” as a matter of religious conviction.

She recounted, for example,  her abortive attempt to reconcile evolutionary science with her faith by positing the applicability of evolution to animals but not human beings.  On reflection, she concluded that approach just “doesn’t work”—either for making sense of evolution or for preserving her “relationship with God” (“or whatever,” she adds; she is an honest-to-god teenager).

But at the same time, it was clear there was nothing about Krista’s adoption of this stance that entailed quarantining evolutionary science in some “exam use only” mental chamber or barring integration of it into her life goals generally.

Her willingness to tutor less advanced students, for example, hardly evinced the begrudging, “under protest” mindset that the “cognitive apartheid” model envisions.

Like Aidan, Krista did explain—in terms that showed she regarded the point as stunningly obvious—that she saw learning evolutionary science as essential to academic success: “For the AP bio test . . . you can’t write on there, God created humans and all the things cause they’ll just be, like, zero [score].”

But asked whether she therefore planned to put evolutionary science out of her mind once she had finished the course, her reply revealed that she viewed the answer to that question to be stunningly obvious, too: No, of course not, “cause I like animals” too much to “forget” evolution, and besides “I like learning about that stuff anyway.”

Both the “cognitive assimilation” and “cognitive apartheid” accounts envision "beliefs" as stand-alone mental objects that reflect simple “on/off,” “accept/reject” states in relation to states of affairs.

This picture makes little sense, though, as a psychological matter.

People's minds are not proposition registries.  

Rather they comprise multi-faceted ensembles of mental states—desires, emotions, moral appraisals, and the like—distinctly suited for enabling people to do things.  When embedded in such complexes, beliefs cannot be identified with reference solely to their objects; they can be individuated only in relation to the actions they enable  (cf. Hetherington 2011).

Krista’s life plan involves two goals: to be a person who has a particular religious identity; and to be a certain type of science-trained professional—a veterinarian.

A state of “disbelief in” evolution will be integral to the mental routines that enable her to achieve the former end: treating it as “wrong” to view apes and humans as having descended from a common ancestor will help her to maintain her “relationship with God” and, no doubt, a  larger community of people who share a sense of the best way to live.

At the same time, a “belief” that animals evolved—that it “makes sense” to view “cats and dogs” as having “share[d] a common ancestor at some point,” and that it “doesn’t work” to think of human beings as being uniquely exempted from the same dynamics of speciation—will reside in the cluster of intentional states that enable her to be a science-trained professional. 

In other words, like Everhart & Hameed's Pakistani Dr (2013), she will disbelieve evolution “at home,” and believe it “at work.”  But she will experience these states as “entirely different things” because they cannot in fact be individuated independently of the action-enabling aggregations of mental states in which they are embedded.

The "cognitive apartheid" framework misleadingly suggests that the "knowledge" of evolution that a "nonbelieving" student like Krista acquires reflects a less genuine and lasting engagement than does the form of "belief" to which the "cognitive assimilation" view aspires.  

The truth is that most of Krista's classmates who profess “belief” in evolution will indeed quickly forget what they learned about the modern synthesis in high school--assuming they learned anything to begin with.  Nor will they ever use that "belief" to do anything meaningful in their lives.

Krista, in contrast, will reliably use her retained comprehension of evolutionary science as necessary to be a good veterinarian.

Just as important, her genuine comprehension of the theory of evolution will inform her understanding of herself as a member of a profession whose expertise originates in the distinctive, scientific way of knowing that generated that theory, including its account of the natural history of human beings.

[Figure: Kentucky Farmer sure is excited about the development of this climate-change resistant chicken!]

She'll carry on a conversation in the morning with the scholarly researcher about her disbelief in evolution while making use of evolutionary insights to determine whether to tolerate or suppress the fever of the researcher's ailing dog  (LeGrand & Brown 2002).

Later in the day, she'll nod agreeably as Kentucky Farmer explains why there's no evidence for climate change as she treats his climate-change resilient genetically engineered chickens.

And because she really does love animals and  “like[s] learning about that stuff anyway,” she'll prop herself up comfortably in her study to read Bolhuis & Giraldeau's The Behavior of Animals: Mechanisms, Function and Evolution after returning home from church on Sunday.

This cognitive dualist stance toward evolution will not involve any contradiction in Krista's “beliefs” so long as the practical ends enabled by the mental routines in which those beliefs reside do not themselves interfere with one another.

They obviously don't have to. But they might.

To her immense disappointment, Krista might discover that she can’t enjoy both a religious identity in which denying evolution expresses her “relationship with God or whatever” and a professional one in which affirmation of evolution expresses her “love of animals” and her pleasure in “learning about stuff like” the “big bang” and natural selection.

If so, she tells the interviewer, she’ll “be upset.”

The source of this upsetting incompatibility, however, will not be any sort of logical or psychological contradiction.

Rather it will be an imperfection in the constitution of an aspiring Liberal Republic of Science that hasn’t yet acquired the knowledge, created the institutions, and cultivated the public mores necessary to quiet the forms of cultural status competition that force diverse citizens to choose between using their reason to know what is known by science and using it to express their defining moral commitments (Elsdon-Baker 2015; Hameed 2015; Kahan 2015; Long 2011; Kahan in press).


Bolhuis, J.J. & Giraldeau, L.-A. The behavior of animals: mechanisms, function, and evolution (Blackwell Malden MA, 2005).

Cobern, W.W. Worldview theory and conceptual change in science education. Science Education 80, 579-610 (1996).

Elsdon-Baker, F. Creating creationists: The influence of ‘issues framing’ on our understanding of public perceptions of clash narratives between evolutionary science and belief. Public Understanding of Science  (2015).

Everhart, D. & Hameed, S. Muslims and evolution: a study of Pakistani physicians in the United States. Evo Edu Outreach 6, 1-8 (2013).

Hameed, S. Making sense of Islamic creationism in Europe. Public Understanding of Science 24, 388-399 (2015).

Hermann, R.S. Cognitive apartheid: On the manner in which high school students understand evolution without believing in evolution. Evo Edu Outreach 5, 619-628 (2012).

Hetherington, S.C. How to know: a practicalist conception of knowledge (J. Wiley, Chichester, West Sussex, U.K.; Malden, MA, 2011).

Kahan, D.M. Climate-Science Communication and the Measurement Problem. Advances in Political Psychology 36, 1-43 (2015).

Kahan, D.M. What is the "science of science communication?" J. Sci. Comm (in press).

LeGrand, E.K. & Brown, C.C. Darwinian medicine: applications of evolutionary biology for veterinarians. The Canadian Veterinary Journal 43, 556-559 (2002).

Long, D.E. Evolution and religion in American education: an ethnography (Springer, Dordrecht, 2011).


Special feature: Insights on S. Ct. prediction models from someone who knows what he is talking about

I did a couple of posts commenting (one here and another here) on the performance of computer models designed to predict the outcomes of Supreme Court cases. Taking the bait, someone who actually knows something about this issue felt obliged to step in and enlighten me, along with the 14 billion regular readers of this blog, 12 billion of whom rely exclusively on the site for information on all subjects.  So read and learn!  I've already updated my own views on the subject based on the analysis and will have something to say "tomorrow."

A Response: Computer Programs and Predicting Supreme Court Decisions

Justin Wedeking, University of Kentucky

In Professor Kahan’s recent post (hereafter Kahan) he tackles two Supreme Court forecasting models.  For clarity I’ll use the same labels. The first model – “Lexy” or “Lexy1” - refers to the forecasting challenge from the 2002 Term that pitted “machine” against legal experts (Martin, Quinn, Ruger, and Kim 2004; Ruger et al. 2004).  The second model – “Lexy2”- is the recent (and still ongoing) effort by Katz, Bommarito and Blackman (2014).[1]  The goal of this “reply” post is to offer some thoughts on Kahan’s critiques as well as on these forecasting models that will hopefully reshape how we think about Court forecasts. 

There appear to be two main issues in Kahan’s post.  First, Kahan’s primary concern appears to be that neither attempt at forecasting true, “out of sample” cases does “very well.” A related, and close, secondary concern is that this failure to do well is problematic for various scholars’ claims made with respect to what he calls “the ideology thesis” -- which can be thought of as the claim that judges’ decisions are driven more by their own ideology (or personal policy preferences) than “the law.”  A perceived lack of evidence for “the ideology thesis” is potentially damning for scholars who believe that ideology is a major factor in Supreme Court decision making.  Namely, it suggests that we know relatively little about decision making.

With respect to Kahan’s first point, I do not have any strong disagreements but rather three points that suggest more caution is needed before forming conclusions about forecasting models.  The rest of the post is divided into three sections:

  • In section one, I identify and discuss different criteria for determining when we have a successful prediction;
  • In section two, I take a closer look at what is being predicted (i.e., the dependent variable) and offer a few thoughts;
  • In the third section, I close with some thoughts about the models and machine learning algorithms used in Lexy1 and Lexy2.

Regarding Kahan’s argument on the ideology thesis, I will save my thoughts for a later date.

Keep reading (or else you will forever be denied enlightenment!)


Univ. of Colorado deja vu! (sort of): c'mon down & hear about what the "science of science communication" has to say about the "science communication problem"

I'm on my way back to Boulder. But am going to refrain, during the trip, from writing a blog (was working on one on graphic data presentation) so that I don't miss any connecting flights this time.

Anyway, if in area, come to tomorrow's lecture at Univ. of Colorado. This one won't be on professional judgment & motivated reasoning (as last one was) but on science of science communication.  Haven't figured out exactly what I'll talk about, but I have a feeling Pakistani Dr and Kentucky Farmer will insist on making an appearance.

If not in area, then sign up for webcast.

Also, I discovered last time that they serve these great brownies at the post-lecture reception! 


A question to be answered at the very beginning of an "acceptance of evolution" research program

From correspondence between me and a group of very accomplished and reflective researchers who are now examining "acceptance" of human evolution:

What does it actually mean for someone to "believe in" evolution or "accept" evolution independently of how that person answers the belief/acceptance question as it is typically posed in an opinion poll or (invalid) "science literacy" quiz?

In general, is it sensible -- philosophically or psychologically -- to characterize as being the "same type of mental phenomenon" (1) an intentional state that reflects assent to or affirmation of some factual proposition that has no connection to any activity other than exactly that -- a disposition to express assent or affirmation to a survey or quiz item; and (2) an intentional state that reflects assent to or affirmation of some factual proposition that enables some independent, goal-focused activity?

E.g., right now I "believe" or "accept" that I'm sitting in a chair in front of my computer. That belief is bundled in w/ a bunch of intentional states that enable me to correspond with you.  

At the same time, if I check some registry in my mind, I can confirm I "believe"  that "Columbus sailed to America in 1492."  But that "belief" isn't enabling me to do anything; I never use it to do anything, in fact.

I'm sure there's some meaning in the proposition "Columbus sailed to America in 1492" & some meaning in the proposition "I believe Columbus sailed to America in 1492."  But I think it is facile to say that the intentional state that characterizes my assent to that proposition is the "same kind" of intentional state that characterizes my assent to the proposition that I'm sitting in a chair right now.

If those are "different kinds" of intentional states, then which of those two or which third one are you interested in studying when you try to explain "nonacceptance" of evolution?  

If the intentional state you are interested in studying, moreover, isn't one that enables someone to do things (scientific research on the natural history of humans, practice certain types of medicine, educate science students, transmit scientific information etc) that can be done properly only with an "action-enabling" sort of assent to evolution -- why exactly do you want to explain that?

I'm not saying there can be no worthwhile answer to that last question -- just that, by hypothesis, the answer can't be that you are trying to explain variance in any sort of intentional state necessary to do anything that depends on "accepting" the best available evidence of the natural history of human beings.

Would it be bothersome to discover that the form of intentional state of "acceptance" that is measured by the "46% believe..." opinion poll finding is one that has nothing to do with enabling anything? Or anything other than conveying that one has the sort of cultural identity enabled by answering a survey or "science literacy" quiz item in a particular way by persons who either never do anything that depends on using the best evidence of the natural history of human beings or who do assent to or believe in evolution when they are doing those things?

I'm pretty sure most scholars who conceive of  "nonacceptance of evolution" as a "problem" to be "solved" never think about these things.  I think that is itself a phenomenon that it would be interesting to study!  

But in any case, I am pretty sure it is not possible to chart a reliable course for a research program here w/o having satisfactory answers to these questions.



Science of Science Communication 2.0, Session 14: science documentaries!!!!

this is the end, my only friend(s): the last session of "Science of Science Communication 2.0." In this session, we examine science documentaries. Reading list here, & study/discussion questions below.

Obviously, if you have contributions to make to previous sessions -- including the last one on science journalism or one before that on climate-science teaching-- the on-line sessions are still "ongoing."

But enlighten us, please, on this topic & others!


"Cognitive dualism" research program: a fragment ...

From something that collaborators and I are working on . . . .

We propose a multi-stage research program to investigate "cognitive dualism." This dynamic is marked by simultaneous states of apparent belief and disbelief of empirically derived facts. Conspicuous examples include scientists or doctors who reject evolution yet use evolutionary science in their research or professional decision-making, and farmers who dismiss evidence of climate change while using it to guide their commercial activities. Dominant psychological accounts attribute cognitive dualism to one or another reasoning deficit such as dissonance avoidance. Our project, in contrast, builds on work that links cognitive dualism to rational information processing. People use their reason for a plurality of ends enabled by distinct clusters of intentional states (emotions, desires, moral appraisals, and the like). The opposing beliefs characteristic of cognitive dualism, we surmise, exist only within these clusters; where the ends that they enable—such as belonging to a cultural community and occupying a professional role—are practically compatible, the characterization of the beliefs as “contradictory” reflects a psychologically arbitrary criterion of identity. The proposed program will test this hypothesis in relation to rival accounts of cognitive dualism and identify prescriptions for communicating science geared to accommodating rather than antagonizing this dynamic.



Weekend update: Science of Science Communication 2.0 -- Tamar Wilner confronts the "belief-comprehension" question in teaching evolutionary science

Showing why she has come to be known as the "science communication honeybadger," Tamar Wilner bites into the #scicomm puff adder of a question for teaching evolution: what's the goal, "belief" or "comprehension"?

This was one of the central questions posed by the Session 10 reading materials.

I've only posted an image of the beginning of her essay-- go to her site to read it.  I will leave the comment field open here, though, in case anyone wants to share their reaction (if they disagree with Tamar, I'm not responsible for her devouring them with the enthusiasm that a honeybadger displays for sticking her face into a hive of angry bees so she can have a nice larvae dessert after her main course of poisonous snake).


Science of Science Communication 2.0, Session 12: science journalism!

Okay-- the "Science of Science Communication 2.0" session all 7,937 real-space & on-line enrollees have been waiting for: science journalism! Reading list here, & study/discussion questions below.



Using likelihood ratios -- not pee values -- to weigh the evidence on judges & motivated reasoning

But here's another excerpt.

This one shows how we supplemented our use of conventional "statistical significance"/NHT testing of the study results with use of Bayesian likelihood ratios.

We did use the former, but I think the latter are more useful generally for conveying practical strength of evidence & also for assessing the relative plausibility of competing hypotheses, an objective central to empirical inquiry for which NHT/"statistical significance" is ill-suited (see Goodman, S.N., Introduction to Bayesian methods I: measuring the strength of evidence, Clin Trials 2, 282 - 290 (2005); Edwards, W., Lindman, H. & Savage, L.J., Bayesian Statistical Inference in Psychological Research, Psych Rev 70, 193 - 242 (1963)). Anyone who disagrees that likelihood ratios are cool is a Marxist!

Oh, BTW: "IPCI" refers to "identity-protective cognition impact," which is the average percentage-point difference in the probability that a subject type (judge, lawyer, student, member of the public, or house pet) would find a statutory violation when doing so affirmed rather than defied his or her cultural worldview.

* * *

c. Judges vs. members of the public using Bayesian methods. As an alternative to assessing the improbability of the “null hypothesis,” one can use Bayesian methods to assess the strength of the evidence in relation to competing hypothesized IPCIs.  Under Bayes’s Theorem the likelihood ratio reflects how much more consistent an observed outcome is with one hypothesis than a rival one. It is the factor in proportion to which one should adjust one’s assessment of the relative probability (expressed in odds) of one hypothesis in relation to another.  

Imagine, for example, that we are shown two opaque canvas bags, labeled “B1” and “B2,” each of which is filled with marbles (we use “canvas bags” for this example in anticipation of the reasonable concern that Bayes’s Theorem might apply only to marble-filled urns).  We are not told which is which, but one bag, it is stipulated, contains 75% red marbles and 25% blue, and the other 75% blue and 25% red. We are instructed to “sample” the contents of the bags by drawing one marble from each, after which we should make our best estimate of the probability that B1 is the bag containing mostly blue marbles and B2 the one containing mostly red.  We extract a blue marble from B1 and a red one from B2.

Bayes’s Theorem furnishes logical instructions on how to use this “new evidence” to revise our estimates of the probability of the hypothesis that B1 is the bag containing mostly blue marbles (and hence B2 mostly red).  If we assume that that hypothesis is true, then the probability that we would have drawn a blue marble from B1 is 3/4 or 0.75, as is the probability that we would have drawn a red marble from B2.  The joint probability of these independent events—that is, the probability of the two occurring together, as they did—is 3/4 x 3/4 or 9/16. If we assume that hypothesis “B1 is the one that contains mostly blue marbles” is false, then the joint probability of drawing a blue marble from B1 followed by a red marble from B2 would be 1/4 x 1/4, or 1/16.  Other possible combinations of colors could have occurred, of course (indeed, there are four possible combinations for such a trial). But if we were to repeat this “experiment” over and over (with the marbles being replaced and the labels on the bags being randomly reassigned after each trial), then we would expect the sequence “blue, red” to occur nine times more often when the bag containing mostly blue marbles is the one labeled “B1” than when it is the bag labeled “B2.” Because “blue, red” is the outcome we observed in our trial, we should revise our estimate of the probability of the hypothesis “B1 contains mostly blue marbles” by a factor of 9—from odds of 1:1 (50%) to 9:1 (90%).
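The marble arithmetic can be checked with a few lines of Python. This is only a minimal sketch of the odds form of Bayes's Theorem; the function and variable names are made up for illustration, and the 1:1 prior is the one stipulated in the example.

```python
from fractions import Fraction


def posterior_probability(prior_prob, likelihood_h, likelihood_not_h):
    """Update P(H) using the odds form of Bayes's Theorem.

    Returns the likelihood ratio and the posterior probability of H.
    """
    prior_odds = prior_prob / (1 - prior_prob)
    likelihood_ratio = likelihood_h / likelihood_not_h
    posterior_odds = prior_odds * likelihood_ratio
    return likelihood_ratio, posterior_odds / (1 + posterior_odds)


# H: "B1 is the mostly-blue bag." Evidence: blue from B1, red from B2.
p_evidence_if_h = Fraction(3, 4) * Fraction(3, 4)      # 9/16
p_evidence_if_not_h = Fraction(1, 4) * Fraction(1, 4)  # 1/16

marble_lr, marble_posterior = posterior_probability(
    Fraction(1, 2), p_evidence_if_h, p_evidence_if_not_h
)
print(marble_lr, marble_posterior)  # 9 9/10
```

Exact fractions are used so that the 9:1 odds and the 90% probability come out without floating-point noise.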

We can use precisely the same logic to assess the relative probability of hypothesized judge and public IPCIs. In effect, one can imagine each subject-type as an opaque vessel containing some propensity to engage in identity-protective cognition.  The strengths of those propensities (the subject types’ “true” IPCIs) are not amenable to direct inspection, but we can sample observable manifestations of them by performing this study’s statutory interpretation experiment.  Calculating the relative likelihood of the observed results under competing hypotheses, we can construct a likelihood ratio that conveys how much more consistent the evidence is with one hypothesized subject-type IPCI than with another.

Figure 8 illustrates the use of this method to test two competing hypotheses about the public’s “true” IPCI: that members of the public would be 25 percentage points more likely to find a violation when doing so is culturally affirming, and alternatively that they would be only 15 percentage points more likely to do so. To make the rival hypotheses commensurable with the study results, we can represent each as a probability distribution with the predicted IPCI as its mean and a standard error equivalent to the one observed in the experimental results. Within any one such distribution, the relative probability of alternative IPCIs (e.g., 15% and 25%) can be determined by assessing their relative “heights” on that particular curve.  Likewise, the relative probability of observing any particular IPCI under alternative distributions can be determined by comparing the heights of the probability density distributions in question at that IPCI.

The public IPCI was 22%. Observing such a result (or any in close proximity to it) is eight times more likely under the more extreme “public IPCI = 25%” hypothesis than it is under the more modest “public IPCI = 15%” hypothesis (Figure 8).  This is the Bayesian likelihood ratio, or the factor in proportion to which one should modify one’s assessment of the relative probability that the “true” public IPCI is 25 as opposed to 15 percentage points.

We will use the same process to assess the weight of four competing hypotheses about the vulnerability of judges to identity-protective cognition. The first is that judges will be “unaffected” (IPCI = 0%). This prediction, of course, appears similar to the “null hypothesis.” But whereas “null hypothesis testing” purports to specify only whether the null hypothesis can be “rejected,” Bayesian methods can be used to obtain a genuine assessment of the strength of the evidence in support of there being “no effect” if that is a genuine hypothesis of interest, as it is here.  The remaining three hypotheses, the plausibility of which will be tested relative to the “IPCI = 0%” hypothesis, are that judges will be “just as affected as the public” (IPCI = 22%); that judges will be moderately affected (IPCI = 10%); and that judges will be affected to only a comparatively mild degree (IPCI = 5%).

The results are reflected in Figure 9. Not surprisingly, the experimental data are much more supportive of the first hypothesis (that judges would be unaffected by the experimental manipulation) than of the second (that they would be “just as affected as the public”). Indeed, because the probability that we would have observed the actual experimental result if the latter hypothesis were true is astronomically low, there is little practical value in assigning a likelihood ratio to how much more strongly the evidence supports the hypothesis that judges were “unaffected” by the experimental manipulation.

Of course, members of the public were influenced by their cultural predispositions to a strikingly large extent. To learn that the evidence strongly disfavors the inference that judges are that biased does not in itself give us much insight into whether judges possess the capacity for impartial decisionmaking that their duties demand. It was precisely for that reason that less extreme IPCIs were also hypothesized.

Even those predictions, however, proved to be less supported by the evidence than was the hypothesis that judges would be unaffected by identity-protective reasoning. The evidence was 20 times more consistent with the “judge IPCI = 0” hypothesis than the “judge IPCI = 10%” hypothesis. The weight of the evidence was not as decided but still favored—by a factor of about three—the “judge IPCI = 0” hypothesis over the “judge IPCI = 5%” hypothesis (Figure 9).



Why it is a *mistake* to draw inferences about the (in domain) reasoning of experts from studies of motivated reasoning on the part of the general public

This is an excerpt from “ ‘Ideology’ or ‘Situation Sense’? An Experimental Investigation of Motivated Reasoning and Professional Judgment.” That paper reports the results of a CCP study designed to test whether judges are vulnerable to motivated reasoning.

As described in more detail in a previous post, the answer turned out to be "yes and no": yes when they assessed societal risks like climate change and marijuana legalization, on which judges, like members of the public, polarized along cultural lines; but no when those same judges analyzed statutory interpretation problems that were designed to and did trigger ideologically biased reasoning in members of the public who shared those judges' values.

This excerpt discusses the implications of this finding for the question whether scientists should be viewed as vulnerable to ideologically motivated reasoning when they are making in-domain judgments relating to climate change and other societal risks.


* * *

C. Motivated reasoning, professional judgment & political conflict 

... Sensibly, citizens tend to treat “scientific consensus” on environmental risk and other highly technical matters as a reliable normative guide for decisionmaking, collective and individual. But what makes it sensible for them to do so is that the method of inquiry scientists themselves use does not afford existing “scientific consensus” any particular weight. On the contrary, the entitlement of any previously supported proposition to continued assent is, for science, conditional on its permanent amenability to re-examination and revision in light of new evidence.

If, then, there were reason to believe that scientists themselves were being unconsciously motivated to discount evidence challenging “consensus” positions on issues like climate change, say, or nuclear power or GM foods, by their cultural outlooks, that would be a reason for treating apparent scientific-consensus positions as a less reliable guide for decisionmaking.

Various commentators, including some scientists, now assert that identity-protective reasoning has pervasively distorted the findings of climate scientists, making their conclusions, as reflected in reports like those issued by the Intergovernmental Panel on Climate Change, the National Academy of Sciences, and the Royal Society, unreliable.

Obviously, the best way to test this claim is by conducting valid empirical studies of the scientists whose findings on risk or other policy-relevant facts are being challenged on this basis. But we believe our study, although confined to judges and lawyers, furnishes at least some evidence for discounting the likelihood of the hypothesis that climate scientists or other comparable experts are being influenced by identity-protective reasoning.

The reason is the connection between our study results and the theory of professional judgment on which the study was founded.

As explained, the theoretical basis for our study design and hypotheses was the account of professional judgment most conspicuously associated with the work of Howard Margolis. Margolis treats professional judgment as consisting in the acquisition of specialized prototypes that enable those possessing the relevant form of expertise to converge on the recognition of phenomena of consequence to their special decisionmaking responsibilities.

Margolis used this account of professional judgment among scientists to help explain lay-expert conflicts over environmental risk. Nonexperts necessarily lack the expert prototypes that figure in expert pattern recognition. Nevertheless, members of the public possess other forms of prototypes—ones consisting of what expert judgments look like—that help them to recognize “who knows what about what.” Their adroit use of these prototypes, through the cognitive process of pattern recognition, enables them to reliably converge on what experts know, and thus to get the benefit of it for their own decisionmaking, despite their inability to corroborate (or even genuinely comprehend) that knowledge for themselves.

Nevertheless, in Margolis’s scheme, the bridging function that these “expertise prototypes” play in connecting lay judgments to expert ones can be disrupted. Such sources of disruption create fissures between expert and lay judgment and resulting forms of public conflict over environmental risk.

Identity-protective cognition can be understood to be a disrupting influence of this character. When a fact subject to expert judgment (Is the earth heating up and are humans causing that? Does permitting citizens to carry handguns in public make crime rates go up or down? Does the HPV vaccine protect adolescent girls from a cancer-causing disease—or lull them into sexual promiscuity that increases their risk of pregnancy or other STDs?) becomes entangled in antagonistic cultural meanings, positions on that fact can become transformed into badges of membership in and loyalty to opposing groups. At that point the stake people have in protecting their status in their group will compete with, and likely overwhelm, the one they have in forming perceptions that align with expert judgments.

As we have noted, there is a striking affinity between the account Margolis gives of pattern recognition in expert judgment among scientists and other professionals and Karl Llewellyn’s account of “situation sense” as a professionalized recognition capacity that enables lawyers and judges to converge on appropriate legal outcomes despite the indeterminacy of formal legal rules. We would surmise, based on this study and previous ones, a parallel account of public conflict over judicial decisions.

Lacking lawyers’ “situation sense,” members of the public will not reliably be able to make sense of the application of legal rules. But members of the public will presumably have acquired lay prototypes that enable them, most of the time anyway, to recognize the validity of legal decisions despite their own inability to verify their correctness or comprehend their relationship to relevant sources of legal authority.

But just like their capacity to recognize the validity of scientific expert judgments, the public’s capacity to recognize the validity of expert legal determinations will be vulnerable to conditions that excite identity-protective reasoning. When that happens, culturally diverse citizens will experience disagreement and conflict over legal determinations that do not generate such disagreement among legal decisionmakers.

This was the basic theoretical account that informed our study. It was the basis for our prediction that judges, as experts possessing professional judgment, would be largely immune to identity-protective cognition when making in-domain decisions. By access to their stock of shared prototypes, judges and lawyers could be expected to reliably attend only to the legally pertinent aspects of controversies and disregard the impertinent ones that predictably generate identity-protective cognition in members of the public—and thus resist cultural polarization themselves in their expert determinations. That is exactly the result we found in the study.

Because this result was derived from and corroborates surmises about a more general account of the relationship between identity-protective reasoning and professional judgment, it seems reasonable to imagine that the same relationship between the two would be observed among other types of experts, including scientists studying climate change and other societal risks. Public conflict over climate change and like issues, on this account, reflects a reasoning distortion peculiar to those who lack access to the prototypes or patterns that enable experts to see how particular problems should be solved. But since the experts do possess access to those prototypes, their reasoning, one would thus predict, is immune to this same form of disruption when they are making in-domain decisions.

This is the basis for our conclusion that the current study furnishes reason for discounting the assertion that scientists and other risk-assessment experts should be distrusted because of their vulnerability to identity-protective cognition. Discount does not mean dismiss, of course. Any judgment anyone forms on the basis of this study would obviously be subject to revision on the basis of evidence of even stronger probative value—the strongest, again, being the results of a study of the relevant class of professionals.

At a minimum, though, this study shows that existing work on the impact of identity-protective cognition on members of the public has no probative value in assessing whether the in-domain judgments of climate scientists or other risk-assessment professionals are being distorted by this form of bias. Generalizing from studies of members of the public to these experts would reflect the same question-begging mistake as generalizing from such studies to judges. The results of this study help to illustrate that commentators who rely on experiments involving general-public samples to infer that judges are influenced by identity-protective cognition are making a mistake. Those who rely on how members of the public reason to draw inferences about the in-domain judgments of scientists are making one, too.

* * *

Now here is one more thing that is worth noting & that is noted (but perhaps not stressed enough) in a portion of the paper not excerpted here: the conclusion that professional judgment insulates experts from identity-protective cognition (the species of motivated reasoning associated with ideologically biased information processing) either in whole or in part does not mean that those experts are not subject to other cognitive biases that might distort their judgments in distinct or even closely analogous ways! There is a rich literature on this.  For a really great example, see Koehler, J.J., The Influence of Prior Beliefs on Scientific Judgments of Evidence Quality. Org. Behavior & Human Decision Processes 56, 28-55 (1993).

Dynamics of cognition need to be considered with appropriate specificity--at least if the goal is to be clear and to figure out what is actually going on.



So are humans very good at designing computer programs to predict Supreme Court decisions? You tell me!

I’ve posted previously about the quality of “computer models” developed by political scientists for predicting judicial decisions by the U.S. Supreme Court. So this is, in part, an update, in which I report what I’ve learned since then.

As explained previously, the models are integral to the empirical proof that these scholars offer in favor of their hypothesis that judicial decisionmaking generally is driven by “ideology” rather than “law.”

That proof is “observational” in nature: i.e., it relies not on experiments but on correlational analyses that relate case outcomes to various “independent variables” or predictors.  Those predictors, of course, include “ideology” (measured variously by the party of the President who appointed the sitting judges, the composition of the Senate that confirmed them, and, in the case of the Supreme Court, the Justices’ own subsequent voting records), the “statistical significance” of which, “controlling for” the other predictors, is thought to corroborate the hypothesis that judges are indeed relying on “ideology” rather than “law” in making decisions.

Commentators have raised lots of pretty serious objections going to the internal validity of these studies. Among the difficulties are sampling biases arising from the decision of litigants to file or not file cases (Kastellec & Lax 2008), and outcome “coding” decisions that (it is said) inappropriately count as illicit “ideological” influences what actually are perfectly legitimate differences of opinion among judges over which legally relevant considerations should be controlling in particular areas of law (Edwards & Livermore 2008; Shapiro 2009, 2010).

But the main issue that concerns me is the external validity of these studies: they don’t, it seems to me, predict case outcomes very well at all.

That was the point of my previous post.  In it, I noted the inexplicable failure of scholars and commentators to recognize that a computer model that beat a group of supposed “experts” in a widely heralded (e.g., Ayres 2007) contest to predict Supreme Court decisions (Ruger et al. 2004) itself failed to do better than chance.

It’s kind of astonishing, actually, but the reason that this evaded notice is that the scholars and commentators either didn’t get or didn’t get the significance of the (well known!) fact that the U.S. Supreme Court, which has a discretionary docket, reverses (i.e., overturns the decision of the lower court) in well over 50% of the cases. 

Under these circumstances, it is a mistake (plain and simple) to gauge the predictive power of the model by assessing whether it does better than “tossing a coin.” 

Because it is already known that the process in question disproportionately favors one outcome, the model, to have any value, has to outperform someone who simply chooses the most likely outcome (here, reverse) in all cases (Long 1997; Pampel 2000).

The greater-than 50% predictive success rate of following that obvious strategy is how well someone could be expected to do by chance. Anyone who randomly varied her decisions between “reverse” and “affirm” would do worse than chance—just like the non-expert law professors who got their asses whupped by the computer, who I have in fact befriended and learned is named  “Lexy," in the widely (and embarrassingly!) heralded contest.

The problem, as I pointed out in the post, is that Lexy’s “75%” success rate (as compared to the “experts’” 59%) was not significantly better-- practically or statistically (“p = 0.58”) -- than the 72% reversal rate for the SCOTUS Term in question.

A non-expert who had the sense to recognize that she was no expert would have correctly “predicted” 49 of the 68 decisions that year, just two fewer than Lexy managed to predict.
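Here's a rough check on those figures. The post doesn't say which test produced "p = 0.58," so the one-sample z-test below (normal approximation, two-sided) is purely my assumption; it happens to land in the same place. The function name is made up for illustration.

```python
import math


def one_sample_z_test(successes, n, baseline_rate):
    """Two-sided z-test of an observed success rate against a fixed baseline.

    Uses the normal approximation to the binomial (an assumption; the
    original analysis may have used a different test).
    """
    observed = successes / n
    se = math.sqrt(baseline_rate * (1 - baseline_rate) / n)
    z = (observed - baseline_rate) / se
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return z, p_two_sided


n_decisions = 68
reversal_rate = 0.72

# "Always predict reverse" gets every reversed case right.
always_reverse_correct = round(reversal_rate * n_decisions)
lexy_correct = 51  # 75% of 68 decisions

z, p_value = one_sample_z_test(lexy_correct, n_decisions, reversal_rate)
print(always_reverse_correct, lexy_correct - always_reverse_correct)  # 49 2
print(round(p_value, 2))
```

On these inputs the p-value comes out at roughly 0.58, i.e., a 75% hit rate over 68 cases is entirely unremarkable against a 72% baseline.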

I was moved to write the post by a recent recounting of Lexy’s triumph, but I figured that surely in the intervening years (the contest was 13 yrs ago!) the field would have made some advances.

A couple of scholars in the area happily guided me to a cool working paper by Katz, Bommarito & Blackman (2014), who indeed demonstrate the considerable progress that this form of research has made.

KBB discuss the performance of  a model, whose name I’ve learned (from communication with that computer, whom I met while playing on-line poker against her) is Lexy2.

Lexy2 was fed a diet of several hundred cases decided from 1946 to 1953 (her “training set”), and then turned loose to “predict” the outcomes in 7000 more  cases from the years 1953 to 2013 (technically, that’s “retrodiction,” but same thing, since no one “told” Lexy2 how those cases came out before she guessed; they weren’t part of her training set).

Lexy2 got 70% of the case outcomes right over that time. 

KBB, to their credit (and my relief; I found it disorienting, frankly, that so many scholars seemed to be overlooking the obvious failure of Lexy1 in the big “showdown” against the “experts”), focus considerable attention on the difference between Lexy2’s predictive-success rate and the Court’s reversal rate, which they report was 60% over the period in question.

Their working paper (which is under review somewhere and so will surely be even more illuminating still when it is published) includes some really cool graphics, two of which I’ve superimposed to illustrate the true predictive value of Lexy2:

As can be seen, variability in Lexy2’s predictive success rate ("KBB" in the graphic) is driven largely by variability in the Court’s reversal rate.

Still  70% vs. 60% is a “statistically significant” difference—but with 7000+ observations, pretty much anything even 1% different from 60% would be. 

The real question  is whether the 10-percentage-point margin over chance is practically significant. 

(Of course, it's also worth pointing out that trends in the reversal rate should be incorporated into evaluation of Lexy2's performance so we can be sure her success in periods when reversal might have been persistently less frequent doesn't subsidize predictive failure during periods when the reversal rate was persistently higher; impossible to say from eyeballing, but it kind of looks like Lexy2 did better before 1988, when the Court still had a considerable mandatory appellate jurisdiction, than it has done with today's wholly discretionary one. But leave that aside for now.)

How should we assess the practical significance of Lexy2's predictive acumen?

If it helps, one way to think about it is that Lexy2 in effect correctly predicted “25%” of the cases (10 percentage points out of the 40% “affirmed” cases) that “Mr. Non-expert,” who would have wisely predicted "reverse" in all cases, would have missed. Called “adjusted count R²,” this is a logistic regression equivalent of R² for linear regression.
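For anyone who wants the arithmetic spelled out, here's a minimal sketch of the adjusted count R² calculation using the rates KBB report. The function name is my own; the formula is the standard one (share of the modal-guess errors that the model eliminates).

```python
def adjusted_count_r2(model_accuracy, modal_rate):
    """Fraction of the 'always guess the modal outcome' errors the model fixes.

    model_accuracy: share of cases the model predicts correctly.
    modal_rate: share of cases falling in the most common outcome category.
    """
    return (model_accuracy - modal_rate) / (1 - modal_rate)


lexy2_adjusted_r2 = adjusted_count_r2(model_accuracy=0.70, modal_rate=0.60)
print(round(lexy2_adjusted_r2, 2))  # 0.25
```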

But I think an even more interesting way to gauge Lexy2’s performance is to compare it to the original Lexy’s.

As I noted, Lexy didn’t genuinely do better than chance.

Lexy2 did, but the comparison is not really fair to the original Lexy.

Lexy2 got to compete against "Mr. Chance" (the guy who predicts reverse in every case) for 60 terms, during which the average number of decisions was 128 cases as compared to 68 in the single term in which Lexy competed. Lexy2 thus had a much more substantial period to prove her mettle!

So one thing we can do is see how well we'd expect Lexy2 to perform against Mr. Chance in an "average" Supreme Court Term.  

Using the 0.60 reverse rate KBB report for their prediction (or retrodiction) sample & the 0.70 prediction-success rate they report for Lexy2, I simulated 5000 "75-decision" Terms--75 being about average for the modern Supreme Court, which is very lazy in historical terms.

Here's a graphic summary of the results:

In the 5000 simulated 75-decision Terms, Lexy2 beats Mr. Chance in 88%. In other words, the odds are a bit better than 7:1 that in a given Term Lexy2 will rack up a score of correct predictions that exceeds Mr. Chance's by at least 1.

But what if we want (for bookmaking purposes, say) to determine the spread -- that is, the margin by which Lexy2 will defeat Mr. Chance in a given term?

Remember that Lexy "won" against Mr. Chance in their one contest, but by a pretty unimpressive 3 percentage points (which with N = 68 was, of course, not even close to "significant"). 

If we look at the distribution of outcomes in 5000 simulated 75-decision terms, Lexy2 beats Mr. Chance by 10% in 50% of the 75-decision terms & fails to beat Mr. Chance by at least 10% in 50%. Not surprising; something would definitely be wrong with the simulation if matters were otherwise! But in any given term, then, Lexy2 is "even money" at +10 pct.

The odds of Lexy2 winning by 5% or more over Mr. Chance (4 more correct predictions in a 75-decision Term) are around 3:1.  That is, in about 75% (73.9% to be meaninglessly more exact) of the 75-decision Supreme Court Terms, Lexy2 wins by at least +5 pct.

The odds are about 3:1 against Lexy2 beating Mr. Chance by 15 pct points. 

Obviously the odds are higher than 3:1 that Lexy2 will eclipse the 3-pct-point win eked out by the original Lexy in her single contest with Mr. Chance. The odds of that are, according to this simulation, about 5:1. 
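For anyone who wants to play bookmaker at home, here's a minimal sketch of a simulation along these lines. It is not the code behind the graphic: I'm assuming that in each simulated Term Lexy2's correct predictions and the Court's reversals are independent binomial draws (at 0.70 and 0.60), and the function name, seed, and defaults are made up for illustration.

```python
import random


def simulate_terms(n_terms=5000, decisions=75, p_model=0.70,
                   p_reverse=0.60, seed=1):
    """Simulate per-Term margins (in cases) of the model over 'always reverse'.

    Assumption: the model's hits and the Court's reversals are independent
    binomial draws within each Term.
    """
    rng = random.Random(seed)
    margins = []
    for _ in range(n_terms):
        model_correct = sum(rng.random() < p_model for _ in range(decisions))
        # Mr. Chance is right exactly as often as the Court reverses.
        chance_correct = sum(rng.random() < p_reverse for _ in range(decisions))
        margins.append(model_correct - chance_correct)
    return margins


margins = simulate_terms()
share_wins = sum(m >= 1 for m in margins) / len(margins)
share_wins_by_4 = sum(m >= 4 for m in margins) / len(margins)
print(f"wins by at least 1 case:  {share_wins:.1%}")
print(f"wins by at least 4 cases: {share_wins_by_4:.1%}")
```

Under those assumptions the shares should come out in the neighborhood of the 88% and 75% figures above, with the exact values wobbling a bit from seed to seed.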

But what if we want to test the relative strength of the competing hypotheses (a) that “Lexy 2 is no better than the original 9001 series Lexy” and (b) that Lexy2 enjoys, oh, a “5-pct point advantage over Lexy” in a 75-decision term? 

To do that, we have to figure out the relative likelihood of the observed data-- that is, the results reported in KBB -- under the competing hypotheses.  Can we do that?  

Well, consider:


Here I've juxtaposed the probability distributions associated with  the "Lexy2 is no better than Lexy" hypothesis and  the "Lexy2 will outperform Lexy by 5 pct points" hypothesis. 

The proponents of those hypotheses are asserting that on average Lexy2 will beat “Mr. Chance” by 3%, Lexy’s advantage in her single term of competition, and 8% (+5% more), respectively, in a 75-decision term. 

Those “averages” are means that sit atop probability distributions characterized by standard errors of 0.09. By my calculation (corroborated, happily, by the simulation), that is the standard error of the difference in success rates for both 0.72 and 0.75, on the one hand, and 0.60, on the other, in a 75-decision Term.
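For readers who want to see the shape of such a calculation, here is a minimal sketch of the textbook standard error for a difference between two proportions. It assumes the two predictors' success counts are independent and that each is scored over the same number of decisions. That is an assumption of this sketch, not a description of the calculation behind the 0.09 figure, and the function name is illustrative.

```python
import math


def se_of_difference(p1, p2, n):
    """Standard error of (p1_hat - p2_hat) for two independent proportions,
    each estimated from n decisions."""
    return math.sqrt(p1 * (1 - p1) / n + p2 * (1 - p2) / n)


# Success rates quoted in the text, each compared with 0.60 over a 75-decision Term
for rate in (0.72, 0.75):
    print(rate, round(se_of_difference(rate, 0.60, 75), 3))
```

For these inputs the formula returns values a little under 0.08, somewhat below the 0.09 quoted, so read it as an illustration of the method and not as a reproduction of that number.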

The ratio of the densities at 0.10, the observed data, for the "Lexy2 +5 " hypothesis & the "Lexy2 no better"  hypothesis is 1.2.  That's the equivalent of the Bayesian likelihood ratio, or the factor by which we should update our prior odds of Hypothesis 2 rather than Hypothesis 1 being correct (Goodman 1999, 2005; Edwards, Lindman & Savage 1963; Good 1985). 

That's close enough to a likelihood ratio of 1 to justify the conclusion that the evidence is about equally consistent with both hypotheses: “Lexy2 is no better” and “Lexy2 +5 over Lexy.”
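The density-ratio idea can be sketched in a few lines. This is a hedged illustration that assumes both hypotheses are represented by normal distributions with the means (0.03 and 0.08) and the standard error (0.09) given above, evaluated at the observed difference of 0.10. The normal shape and the function names are assumptions of the sketch.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and sd at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))


def likelihood_ratio(observed, mean_h2, mean_h1, se):
    """How much more probable the observed value is under H2 than under H1."""
    return normal_pdf(observed, mean_h2, se) / normal_pdf(observed, mean_h1, se)


def update_odds(prior_odds, lr):
    """Bayesian updating: posterior odds = prior odds x likelihood ratio."""
    return prior_odds * lr


# H1: "Lexy2 is no better" (mean +0.03); H2: "Lexy2 +5" (mean +0.08)
lr = likelihood_ratio(observed=0.10, mean_h2=0.08, mean_h1=0.03, se=0.09)
print(f"likelihood ratio: {lr:.2f}")
print(f"posterior odds from even prior odds: {update_odds(1.0, lr):.2f}")
```

With these inputs a plain normal approximation gives a ratio modestly above 1. It need not match the 1.2 reported above exactly, since the result is sensitive to the standard error and distribution shape assumed, but the qualitative conclusion is the same: someone who started at even odds would barely move.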

Is this “Bayes factor” (Goodman 1999, 2005) approach the right way to assess things? 

I’m not 100% sure, but this is how I see things for now, subject to revision, of course, if someone shows me that I made a mistake or that there is a better way to think about this problem.

In any case, the assessment has caused me to revise upward my estimation of the ability of Lexy!  I really have no good reason to think Lexy isn’t just as good as Lexy2.  Indeed, it’s not 100% clear from the graphics in KBB, but it looks to me that Lexy's 75% "prediction success" rate probably exceeded that of Lexy2 in 2002-03, the one year in which Lexy competed! 

At the same time, this analysis makes me think a tad bit less than I initially did of the ability of Lexy2 (& only a tad; it's obviously an admirable thinking machine).

Again, Lexy2, despite “outperforming” Mr. Chance by +10 pct over 60 terms, shouldn’t be expected to do any better than the original Lexy in any given Term. 

More importantly, being only a 7:1 favorite to beat chance by at least a single decision, & only a 3:1 favorite to beat chance by 4 decisions or more (+5%), in an average 75-decision Term just doesn’t strike me as super impressive. 

Or in any case, if that is what the political scientists’ “we’ve proven it: judges are ideological!” claim comes down to, it’s kind of underwhelming.  

I mean, shouldn’t we see stronger evidence of a stronger effect than that? Especially for the U.S. Supreme Court, which people understandably suspect of being “more political” than all the other courts that political scientists also purport to find are deciding cases on an ideological basis?

It’s a result that’s sufficiently borderline, I’d say, to need help from another form of testing—like an experiment. 

No empirical method is perfect. They are all strategies for conjuring observable proxies of processes that in fact we cannot observe directly.

Accordingly, the only “gold standard,” methodologically speaking, is convergent validity: when multiple (valid) methods reinforce one another, then we can be more confident in all of them; if they don’t agree, then we should be wary about picking just one as better than another.

The quest for convergent validity was one of the central motivations for our study (discussed in my post “yesterday”), which used experimental methods to probe the “ideology thesis,” the political science conclusion based on observational studies.

That our study (Kahan, Hoffman, Evans, Lucci, Devins & Cheng in press) came to a result so decidedly unsupportive of the claim that judges are ideologically biased in their reasoning reinforces my conclusion that the evidence observational researchers have come up with so far doesn’t add much to whatever grounds one otherwise would have had for believing that judges are or are not “neutral umpires.” 

But I'm really not sure.  What do you think?


 Ayres, I. How computers routed the experts. Financial Times ‘FT Magazine,’ Apr. 31, 2007.

Edwards, H.T. & Livermore, M.A. Pitfalls of empirical studies that attempt to understand the factors affecting appellate decisionmaking. Duke LJ 58, 1895 (2008).

Edwards, W., Lindman, H. & Savage, L.J. Bayesian Statistical Inference in Psychological Research. Psych Rev 70, 193 - 242 (1963).

Good, I.J. Weight of evidence: A brief survey. in Bayesian statistics 2: Proceedings of the Second Valencia International Meeting (ed. J.M. Bernardo, M.H. DeGroot, D.V. Lindley & A.F.M. Smith) 249-270 (Elsevier, North-Holland, 1985).

Goodman, S.N. Introduction to Bayesian methods I: measuring the strength of evidence. Clin Trials 2, 282 - 290 (2005).

Goodman, S.N. Toward evidence-based medical statistics. 2: The Bayes factor. Annals of internal medicine 130, 1005-1013 (1999).

Kahan, Hoffman, Evans, Lucci, Devins & Cheng. “Ideology” or “Situation Sense”: An Experimental Investigation of Motivated Reasoning and Professional Judgment. U. Penn. L. Rev. (in press).

Kastellec, J.P. & Lax, J.R. Case selection and the study of judicial politics. Journal of Empirical Legal Studies 5, 407-446 (2008).

Katz, D.M., Bommarito, M.J. & Blackman, J. Predicting the Behavior of the Supreme Court of the United States: A General Approach (July 21, 2014). Available at SSRN.

Long, J.S. Regression models for categorical and limited dependent variables (Sage Publications, Thousand Oaks, 1997).

Pampel, F.C. Logistic regression : a primer (Sage Publications, Thousand Oaks, Calif., 2000).

Shapiro, C. Coding Complexity: Bringing Law to the Empirical Analysis of the Supreme Court. Hastings Law Journal 60 (2009).

Shapiro, C. The Context of Ideology: Law, Politics, and Empirical Legal Scholarship. Missouri Law Review 75 (2010).

