Lack of discriminant validity saga #9312 (or "Let's just make this blog into a blog on Gelman's blog!," episode #612)

And since we are on the topic (of lack of discriminant validity): "disgust sensitivity" is correlated "significantly" not only w/ fear of GM food but also w/ fear of plummeting elevators, crashing airplanes, accidental swim pool drownings, & life-threatening carjackings ...

Who'd have thunk it!


Is fit-statistic anarchy the answer to tyranny of the p-value?

Now we are getting somewhere!

But note how much weight this proposal places on (or how much confidence it expresses in) the inferential literacy of referees. If we cut social science loose from the p-value in favor of the gestalt judgment of reviewers and editors, what's to prevent a dictatorship of confirmation bias?


Cross-cultural cultural cognition's latest conquest: Slovakia!

An interesting article on "emerging technology" risk perceptions, this paper also joins the ranks of ones reporting the application of the Cultural Cognition Worldview scales to non-US samples. In addition to the US, studies based on these measures have been carried out in England, Switzerland, Australia, Norway, ... Am I forgetting any others? Probably. If another comes to me, I'll modify the list.

The paper examined risk perceptions of both nanotechnology and the HPV vaccine.  One of the studies tested for biased assimilation--by examining whether information exposure generated polarization (cf. Kahan et al. 2009). Another looked at how culturally identifiable advocates influenced credibility (cf. Kahan et al. 2010).

There were robust cultural worldview effects in both studies.  The "cultural credibility" effect was also replicated (sadly, though, the article has only minimal discussion of how the authors created "culturally identifiable" advocates, nor did they reproduce the stimulus material used to do so). There wasn't a "culturally biased assimilation" effect, however.

The results in Kostovičová et al. suggested a good deal of U.S.-Slovakia correspondence on the impact of cultural worldviews on the risks examined, but not a perfect one.

Actually, no one should be surprised if the results of studies on non-US samples differ from the ones performed on US samples.  As I've argued before, there's nothing in the theory of Cultural Cognition that compels inter-cultural uniformity on risk/worldview mappings; the theory predicts there will be conflicts among competing cultural groups, but anticipates that the issues that provoke such conflict will vary across societies in a manner that reflects their distinctive histories. Indeed, a large part of the value of "C4" (cross-cultural cultural cognition) is that it equips researchers with a metric for examining such differences.

The paper also reports a bunch of interesting findings on the interaction of worldviews and characteristics such as gender and prior familiarity with the risk being analyzed.

Pretty cool stuff!

Take a look & see what you think.


Kahan, D. M., Braman, D., Slovic, P., Gastil, J., & Cohen, G. (2009). Cultural Cognition of the Risks and Benefits of Nanotechnology. Nature Nanotechnology, 4(2), 87-91.

Kahan, D., Braman, D., Cohen, G., Gastil, J., & Slovic, P. (2010). Who Fears the HPV Vaccine, Who Doesn’t, and Why? An Experimental Study of the Mechanisms of Cultural Cognition. Law and Human Behavior, 34(6), 501-516. doi:10.1007/s10979-009-9201-0

Kostovičová, L., Bašnáková, J., & Bačová, V. (2017). Predicting Perception of Risks and Benefits within Novel Domains. Studia Psychologica, 59(3), 176-192.


How should I be updating views on impact of fake news based on new evidence?

So . . . here are my “fake-news priors,” which are informed by the study of cultural cognition & affiliated types of politically motivated reasoning, and which are spelled out at (slightly) greater length in a paper entitled, “Misconceptions, Misinformation, and the Logic of Identity-Protective Cognition”:

A great deal of the time, if not all of it, misinformation is not something that happens to the mass public but rather something that its members are complicit in producing as a result of identity-protective cognition. Persons using this mode of reasoning are not trying to form an accurate understanding of the facts in support of a decision that can be made only with the benefit of the best available evidence. Instead they are using their reasoning to cultivate an affective stance that expresses their identity and their solidarity with others who share their commitments (Kahan 2015, 2017). Individuals are quite able to accomplish this aim by selectively crediting and dismissing genuine information. Yet the same mechanisms of information processing will also impel them to credit misinformation suited to gratifying their identity-expressive aims.

Will the motivated public’s attraction to misinformation change the world in any particular way? It no doubt has (Flynn, Nyhan & Reifler 2017). But precisely because individuals’ cultural predispositions exist independently of, and are cognitively prior to, the misinformation they consume for identity-protective purposes (§ 2, supra), what these individuals do with misinformation in most circumstances will not differ from what they would have done without it.

But here are a couple of empirical studies that address incidence and effect of fake news.


My question is, how should I revise my priors based on these studies & by how much? What sort of likelihood ratios should I assign them, bearing in mind that the entire exercise is in the nature of a heuristic, designed to discipline and extend thoughts & inferences?
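For anyone who wants to play along at home, here is a minimal sketch of the bookkeeping involved. It assumes the studies can be treated as independent pieces of evidence, so that posterior odds equal prior odds multiplied by each likelihood ratio in turn. The function name and every number below are hypothetical, picked only to illustrate the arithmetic; they are not estimates drawn from the studies.

```python
import math

def update_probability(prior_prob, likelihood_ratios):
    """Apply each likelihood ratio in turn and return the posterior probability.

    Works on the log-odds scale: log posterior odds equal
    log prior odds plus the sum of the log likelihood ratios.
    """
    log_odds = math.log(prior_prob / (1.0 - prior_prob))
    for lr in likelihood_ratios:
        log_odds += math.log(lr)
    odds = math.exp(log_odds)
    return odds / (1.0 + odds)

# Hypothetical inputs, for illustration only.
prior = 0.20            # prior probability assigned to some hypothesis
study_lrs = [2.0, 0.5]  # one study favors the hypothesis 2:1, the other disfavors it 1:2

posterior = update_probability(prior, study_lrs)
print(f"prior = {prior:.2f}, posterior = {posterior:.2f}")
```

Working in log-odds makes it easy to see how much each study moves the needle: a likelihood ratio of 1 contributes nothing, and two studies with reciprocal ratios cancel out.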


Allcott, H., & Gentzkow, M. (2017). Media and Fake News in the 2016 Election. J. Econ. Perspectives, 31, 211-236.

Flynn, D. J., Nyhan, B., & Reifler, J. (2017). The Nature and Origins of Misperceptions: Understanding False and Unsupported Beliefs About Politics. Political Psychology, 38, 127-150. doi:10.1111/pops.12394

Kahan, D. M. (2015). Climate-Science Communication and the Measurement Problem. Advances in Political Psychology, 36, 1-43. doi:10.1111/pops.12244

Kahan, D. M. (2017). The expressive rationality of inaccurate perceptions. Behavioral and Brain Sciences, 40. doi:10.1017/S0140525X15002332

Kahan, Dan M., Misconceptions, Misinformation, and the Logic of Identity-Protective Cognition (May 24, 2017). Available at SSRN:

Pennycook, Gordon and Rand, David G., Who Falls for Fake News? The Roles of Analytic Thinking, Motivated Reasoning, Political Ideology, and Bullshit Receptivity (September 12, 2017). Available at SSRN:


97% (p < 0.01) of social scientists don't agree on p-value threshold

This pre-print responds to the recent Nature Human Behaviour article/manifesto (pre-print here) that recommended a "change to P< 0.005" be implemented in "fields where the threshold for defining statistical significance for new discoveries is [now] P < 0.05":

Notwithstanding the conservative, lawyerly tone of the piece ("insufficient evidence ... not strong enough ... evaluated before large-scale changes"), the radical bottom line is in the bottom line: there shouldn't be any single standard for "significance"; rather, researchers should use their reason to identify and explain whatever statistical test they use to guard against type 1 error.

Indeed, if one wants to see a defense of replacing p-values with Bayesian "weight of the evidence" statistics, one should read (or re-read) the Nature Human Behaviour piece, which pictures the p < 0.005 standard as a self-punishing, "the worse the better" historical segue to Bayes Factors.  

So embracing Bayes was the cost of getting 72 scholars to agree to continuing the tyranny of p-values, while disclaiming Bayes was the cost of getting another 88 to agree that p-values shouldn't be treated as a threshold screen for publication.





Are you curious to see what Financial Times says about curiosity?

cool! And if you now want to read the study he was referring to, it's right here (no paywall!)

The conservation of perplexity . . .

Every time one feels one has made progress by examining an important question empirically, at least one more important, unanswered empirical question reveals itself.


WSMD? JA! Various summary stats on disgust and gm food risks

This is approximately the 9,999th episode in the insanely popular CCP series, "Wanna see more data? Just ask!," the game in which commentators compete for world-wide recognition and fame by proposing amazingly clever hypotheses that can be tested by re-analyzing data collected in one or another CCP study. For "WSMD?, JA!" rules and conditions (including the mandatory release from defamation claims), click here.

So . . . new CCP subscriber @Zach (membership # 14,000,000,041) has asked for some summary statistics on various of the relationships modeled in “Yesterday’s”™ post on “biased assimilation, disgust, & ideology”. The queries seem aimed at a distinctive interpretation (or in any case, an interpretation; no one else has offered any!) of the data presented in the previous post.

Therefore, I’ll supply the data he’s requested, as I understand the requests:

@Zach:  What does the Disgust (z-score) vs Left_right plot look like for GM foods for your sample? I don't see it in either your previous post on the subject or your working paper (from the left panel of Fig 4 in your paper I would guess it's flat). 

I’m understanding this to mean, What does the distribution of z-scored responses look like for “how disgusted are you with GM foods?” (6-point: “not at all” to “extremely”). This is a simple one:


It should be obvious, then, that there’s no partisan influence on disgust toward GM foods. No surprise!

@Zach:  For interpreting this data, it might be useful to see the exact distribution of Disgust ratings ("absolute" Disgust) used to generate the Disgust (z-score). It looks like it's asymmetrical, but it would be good to see how much.

Here I think @Zach is asking to see the frequencies with which each of the “disgust” response categories was selected (I’m reading “asymmetrical” to mean skewed, and “absolute disgust” to mean “normally distributed”).  Again, not too hard a request to satisfy:

Next @Zach states,

Similar to [above], it might be interesting to see a version of these plots with an absolute x-scale (e.g. Disgust in units of the first figure in your previous post). Are there trends with "absolute" Disgust and how quickly the lines for the two study assessments deviate?

I’m not 100% sure what @Zach has in mind here. . . . Does he want to see the distributions featured in his first request after responses to “disgust” are transformed back from z-scores to raw scores?  There’s nothing interesting to see in that case: the distribution is the same whether “disgust” is presented in raw form or the z-score transformed one.
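The point that z-scoring leaves the shape of a distribution untouched can be checked directly. The sketch below uses made-up responses on a 6-point scale (not the CCP data) and hypothetical helper names; it standardizes the responses and compares the skewness before and after. Because a z-score is just a shift and a rescaling, the two values should agree up to floating-point error.

```python
import statistics

def z_scores(values):
    """Standardize values to mean 0 and (population) standard deviation 1."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]

def skewness(values):
    """Population skewness: the mean of the cubed standardized values."""
    return statistics.fmean(z ** 3 for z in z_scores(values))

# Hypothetical 6-point "disgust" responses, skewed toward "not at all".
raw = [1] * 40 + [2] * 25 + [3] * 15 + [4] * 10 + [5] * 6 + [6] * 4
standardized = z_scores(raw)

print("skewness of raw scores:", skewness(raw))
print("skewness of z-scores:  ", skewness(standardized))
```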

But @Zach might be suspicious of the “smoothness” of the regression analyses featured in “Yesterday’s”™ post. The linear regression constrains the relationship to appear linear when maybe it really wasn’t in raw form, in which case the linear model of the impact of disgust on GM food concerns would be misspecified. So here is a locally weighted regression plot:

 What does this (on its own or in combination with the other bit of information presented here) signify?  I’m not sure! But @Zach apparently had a hypothesis here, albeit one not completely spelled out, about what this way of reporting the “raw data” would look like.  So I’ll leave it to him & interested others to spell out their interpretations here.
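For readers who want to try a locally weighted regression on their own data, here is a bare-bones sketch. It is a simplified version of the technique (tricube weights and a local straight-line fit, without the robustness iterations found in full LOWESS implementations), and it is not the code used to produce the plot in this post. The function name, the `frac` parameter, and the data are all hypothetical.

```python
def lowess(xs, ys, frac=0.5):
    """Locally weighted linear regression; returns a fitted value at each x.

    frac is the share of observations used in each local fit.
    """
    n = len(xs)
    k = min(n, max(2, int(round(frac * n))))
    fitted = []
    for x0 in xs:
        # Bandwidth: distance from x0 to its k-th nearest observation.
        dists = sorted(abs(x - x0) for x in xs)
        h = dists[k - 1] or 1e-12
        # Tricube weights: near points count a lot, far points not at all.
        w = [(1.0 - min(abs(x - x0) / h, 1.0) ** 3) ** 3 for x in xs]
        sw = sum(w)
        mx = sum(wi * x for wi, x in zip(w, xs)) / sw
        my = sum(wi * y for wi, y in zip(w, ys)) / sw
        sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
        sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
        # Fall back to the local weighted mean if all weight sits on one x.
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (x0 - mx))
    return fitted

# Hypothetical data: concern is flat at low disgust and rises at high disgust.
disgust_z = [-1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
concern = [2.0, 2.1, 1.9, 2.0, 2.4, 3.0, 3.6, 4.1, 4.8, 5.2]

for x, fit in zip(disgust_z, lowess(disgust_z, concern)):
    print(f"{x:5.1f}  {fit:.2f}")
```

If the fitted values trace out something close to a straight line, the linear model is probably not badly misspecified; a pronounced bend would suggest otherwise.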

Oh -- @Zach gestures toward his answer—

To combine 2 & 3, if the distribution of "absolute" Disgust is asymmetrical and weighted towards neutral, does that help explain how close the two "safe" and "not safe" branches stick together at low Disgust (z-score)? I.e. the opinion may be extreme for the sample, but on average the person still isn't too disgusted with GM foods?

Does @Zach see this in the data presented in this post? If so, what’s the upshot? Same if the distributions defy his surmise—what additional insight can we derive, if any, from these distributions?


More from ongoing investigation of biased assimilation, disgust, & ideology

"Yesterday," I served up some data on the relationship between disgust and ideological outlooks. The findings were relevant to assessing whether disgust sensibilities mediate only conservative or instead both conservative and liberal appraisals of empirical evidence of the riskiness of behavior that offends their respective values.

Here are some related data.

Study essentials:
  1. Subjects exposed to disgust stimulus (one that shows target of disgust judgment in vivid display).
  2. Subjects then instructed to rate relative persuasiveness of pairs of risk-perception studies that use different methods & reach opposing results.
  3. The subjects' evaluations of the relative quality of methods of studies are then measured conditional on manipulated conclusion (“not safe”/“safe”) of studies.
Results analyzed separately in terms of political outlooks & GM food-disgust rating:



The Stockholm syndrome in action: I find Lodge view more persuasive as 3-day conference goes on

Back from Stockholm. Here’s a delayed postcard:

So in my talk, I presented 4 points—

--aided with discussion of 2 CCP studies (Kahan, Peters et al. 2017; Kahan, Landrum et al. 2017) (slides here).

As previously mentioned, Milton Lodge was among the collection of great scholars who participated in the Wenner-Gren Foundation’s “Knowledge resistance and how to cure it“ symposium. (Lodge also gets conference “outstanding teacher” award for conducting a tag-team-style presentation with one of his students, who did a great job).

I had the honor of being on the same panel as Lodge, who summarized his & Taber’s own body of research (2013) on politically motivated reasoning.  Lodge definitely understood that the thrust of my remarks (an understanding likely aided by his having read them in various forms elsewhere) meant that he and I “had a disagreement.”

That disagreement boils down to how we should view the complicity of “System 2” reasoning in politically distorted information processing. Lodge & Taber (2013) push hard the view that once a partisan has been endowed with motivations that run in one direction or the other, it’s confirmation bias—a system 1 mechanism—that does all the distorting of information processing.

My & my collaborators’ position, in contrast, is that individuals who are high in System 2 reasoning have a more fine-tuned “System 1” reasoning capacity that unconsciously discerns the types of situations in which the use of “System 2” needs to be brought to bear to solve an information-processing problem in politically congenial terms. Once engaged, partisans’ “System 2” will generate decision-making confabulations for dismissing evidence that blocks the result they are predisposed to accept.

We had a very brief exchange on this in connection with the motivated numeracy (MN) paper.  Persuaded to an extent by what Lodge was saying, I agreed that the MN result would likely be as consistent with his position as with ours if the result was a consequence of high-numeracy subjects “tuning out” and lapsing into congenial heuristic reasoning when confronted with information that, improperly interpreted, supported positions on gun control at odds with subjects’ political affiliations & outlooks.

In contrast, the study results would lean our way if the high-numeracy subjects were being alerted by unconscious System 1 sensibilities to use System 2 to rationalize away information that they did recognize as contrary to their political predispositions.

I think on reflection that the design of the MN study doesn’t shed a lot of light on which interpretation is correct.

But I’d also say that our interpretation (that highly proficient reasoners were using their cognitive advantage to reject evidence as flawed when it challenged their viewpoint) was consistent with other papers that examined motivated system 2 reasoning (including Kahan, 2013).

Anyway, it takes only one thoughtful engagement of this sort to make a 3-day conference worthwhile.  And this time I was lucky enough to be involved in more than one, thanks to the conference organizers who really did a great job.


Kahan, D.M. Ideology, Motivated Reasoning, and Cognitive Reflection. Judgment and Decision Making 8, 407-424 (2013).

Kahan, D.M., Landrum, A., Carpenter, K., Helft, L. & Hall Jamieson, K. Science Curiosity and Political Information Processing. Political Psychology 38, 179-199 (2017).

Kahan, D.M., Peters, E., Dawson, E.C. & Slovic, P. Motivated numeracy and enlightened self-government. Behavioural Public Policy 1, 54-86 (2017).

Lodge, M. & Taber, C.S. The rationalizing voter (Cambridge University Press, Cambridge ; New York, 2013).


Precis for Clarendon lectures this Nov. at Oxford

Should all sound familiar to 14 billion regular subscribers.

“Cognition, freedom, and truth in the liberal state”


This series of lectures will use the laws of cognition to cast a critical eye on the cognition of law. Using experimental data, statistical models, and other sources, the lectures will probe how legal decisionmakers perceive facts and law and how the public perceives what legal decisionmakers are doing.  The unifying theme of the lectures will be that simply doing impartial law is insufficient to communicate the law’s impartiality to those who must obey it, and hence insufficient to deliver the assurance of neutrality on which the law’s legitimacy depends.  The lecture series will propose a new science of law, the aim of which is to endow law with the resources necessary to bridge this critical gap between professional and lay perspectives.

Lecture I: Laws of cognition and the “neutrality communication” problem

This lecture will present a simple model for systematizing the interaction between mechanisms of cognition and legal decisionmaking (cf. Kahan 2015).  It will then use the model to examine one such mechanism: cultural cognition.  The research in this area, I would argue, furnishes reasonable grounds to suspect that legal decisionmakers (juries, in particular) are vulnerable to biased decisionmaking that undermines the goals of accuracy and liberal neutrality. But even more decisively, the research supports the conclusion that the law lacks the resources (at present, anyway) for communicating accuracy and fairness to culturally diverse citizens, who as a result of cultural cognition will perceive legal decisionmaking to be mistaken and unfair no matter how accurate and impartial it actually is. This is the law’s “neutrality communication problem,” which is akin to science’s “validity communication problem” on issues like climate change (cf. Kahan et al. 2012; Kahan 2011; Kahan 2010).

Lecture II: The “rules of evidence” impossibility theorem

This lecture will adopt a critical stance toward a position, dominant in the study of evidence law, that I will call the “cognitive fine-tuning” thesis (CFT).  CFT posits that the recurring decisionmaking miscues associated with bounded rationality (such as hindsight bias, the availability effect, probability neglect, representativeness bias, etc.) can be managed through judges’ adroit application of evidence and other procedural rules.  Focusing on “coherence based reasoning” (CBT), I will argue that CFT is a conceit.  CBT refers to a form of “rolling confirmation bias” in which exposure to a compelling piece of evidence triggers the motivation to conform evaluations of the strength of all subsequent, independent pieces of evidence to the position that compelling item of proof supports. Grounded in aversion to residual uncertainty, CBT results in overconfident judgments, and also makes outcomes vulnerable to arbitrary influences such as order of proof (Kahan 2015).  What makes CBT resist CFT is that the triggering mechanism is admittedly valid evidence; indeed, the stronger (more probative) the item of proof is, the more likely it is to trigger the accuracy-distorting confirmation-bias cascade associated with CBT.  Accordingly, to counteract CBT, judges, using “cognitive fine tuning,” would have to exclude the most probative pieces of proof from the case, guaranteeing an outcome that is uninformed by the evidence most essential to an accurate judgment.  Symptomatic of the dilemmas that managing cognitive biases entails, this contradiction exposes the fundamental antagonism between rational truth-seeking and an adversary system that relies on lay factfinders (obviously, this is more an issue in the US than in the UK, which has restricted use of the jury system to criminal cases, although criminal law is in any case exactly the domain in which “the impossibility” of CFT ought to concern us the most, if we value liberty).

Lecture III: Cognitive legal realism: the science of law and professional judgment 

This lecture will offer prescriptions responsive to the difficulties canvassed in the first two.  One of these is the enlargement of the domain of professional judgment in law. Professional judgment consists in habits of mind suited to specialized tasks; one of the core elements of professional judgment is the immunity it confers to various recurring cognitive biases when experts are making in-domain decisions.  Experimental evidence shows that judges are relatively less vulnerable to all manner of bias—including cultural cognition (Kahan et al. in press)—when making legal determinations, both factual and legal.  The congeniality of professional judgment to rational truth-seeking should be maximized by the abandonment not only of the jury (nonprofessionals) but also the adversary system, a mode of evidence development inimical to the dependence of professional judgment on valid methods of information processing. But to supplement the enlargement of professional judgment of law, there must also be a corresponding enlargement in receptivity to evidence-based methods of legal decisionmaking.  The validity of legal professional judgment (even more than its reliability; right now lawyers’ professional judgment is reliable but not valid w/r/t the aims of truth and liberty) depends on its conformity to processes geared to the aims of the law.  Those aims, in a liberal state, are truth and impartiality.  How to attain those ends—and in particular how to devise effective means for communicating the neutrality of genuinely neutral law—present empirical challenges, ones for which the competing conjectures of experienced practitioners need to be tested by the methods of disciplined observation and inference that are the signature of science.  The legal-reform project of the 21st century is to develop a new cognitive legal realism that “brings the culture of science to law” (National Science Foundation 2009).

The end!


Kahan, D. Fixing the Communications Failure. Nature 463, 296-297 (2010).

Kahan, D.M. Laws of cognition and the cognition of law. Cognition 135, 56-60 (2015).

Kahan, D.M. The Supreme Court 2010 Term—Foreword: Neutral Principles, Motivated Cognition, and Some Problems for Constitutional Law. Harv. L. Rev. 126, 1-77 (2011).

Kahan, D.M., Hoffman, D.A., Evans, D., Devins, N., Lucci, E.A. & Cheng, K. 'Ideology' or 'Situation Sense'? An Experimental Investigation of Motivated Reasoning and Professional Judgment. U. Pa. L. Rev. 164, 349-438

National Science Foundation. Strengthening Forensic Science in the United States: A Path Forward (National Academies Press, Washington, D.C., 2009).


Disgust and the right-left asymmetry thesis ... some preliminary data

Being a faithful servant of this blog's 14 billion regular readers (there’s another billion or so on any day who are just passing through), I am continuing my study of disgust and its role in conflict over decision-relevant science.

We already have produced one working paper on how disgust and perceptions of the risks of GM foods & vaccines all fit together. The answer, I suppose, was “not very well.”

So here’s some new data, collected by CCP & the Annenberg Public Policy Center, to think about.

A couple of yrs ago, there was a debate on this site about whether disgust sensibilities and risk perceptions were concentrated on the right: a kind of asymmetry claim for disgust as a mediator of attitudes toward particular taboo activities.  Well, we decided to try our hand at generating some evidence that might shed light on this question.

First, we measured how disgusted our subjects (drawn from a diverse national panel) felt toward certain objects. Take a look:

I think the disgust-asymmetricists would be surprised to see that guns elicit expressions of disgust that are as large as the amount elicited by prostitution. Surprised too to see that marijuana is at the bottom of this ranking, but as disgust-eliciting, more or less, as nuclear power plants.

(BTW, to help you evaluate whether a self-report measure of disgust is valid, consider Russell & Giner-Sorolla 2013 & Gutierrez et al. 2012.)

Next we plotted out these diverse expressions of disgust with left-right political outlooks:

Huh.  Sure looks like disgust symmetry rather than asymmetry.

Finally, we did an experiment in which subjects were furnished empirical studies that variously found these activities to be harmful or inert (even beneficial) in relation to societal wellbeing. We manipulated which study reached which conclusion, and measured the reactions of two separate groups of subjects to whether the study in question was sound and persuasive:

Classic biased assimilation (Lord, Ross & Lepper 1979): these results look a lot like what you get when you observe how conventional measures of left-right political outlooks provoke biased information evaluations of these activities. Basically, people are deciding which study they find more persuasive, less biased, etc. based on whether the activity strikes them as disgusting.

Well, this one is still in the shop. But the apparatus that's being constructed there is starting to look recognizable--& recognizably symmetric with respect to ideology.

Thoughts, anyone?


Gutierrez, R., Giner-Sorolla, R. & Vasiljevic, M. Just an anger synonym? Moral context influences predictors of disgust word use. Cognition Emotion 26, 53-64 (2012).

Lord, C.G., Ross, L. & Lepper, M.R. Biased Assimilation and Attitude Polarization - Effects of Prior Theories on Subsequently Considered Evidence. Journal of Personality and Social Psychology 37, 2098-2109 (1979).

Russell, P.S. & Giner-Sorolla, R. Bodily moral disgust: What it is, how it is different from anger, and why it is an unreasoned emotion. Psychological Bulletin 139, 328 (2013).


♪ "I'm goin' to Jackson..." Oh, wait--I meant ♪"goin' to Stockholm"

Started the trip to Stockholm for conference on misinformation on/misunderstanding of science & what to do about the same.

Some really cool researchers will be there.  They include my collaborator Ellen Peters. Also Milton Lodge, whom I've never met in person.  Plus whole bunch more.

I've previewed my remarks in an earlier post, and for once the talk I've prepared matches the abstract I submitted.  But will nevertheless send postcard(s) to describe reaction to what I have to say, & also the interesting points made by others in their presentations.

Don't know if there is a "hash-tag" for twitter but if there is I'll tweet it.

See ya!


Long weekend reading: on MS2R & fake news

Authors find less not more receptivity to fake news among most cognitively proficient (M Turk) subjects. What does this tell us about "motivated system 2 reasoning"(MS2R)?



Weekend update: ". . . replication malpractice . . ."

Sometimes 140 characters (or fewer) convey the essential information as effectively as 2500 words.


How to see replication (protective eyegear required)

This is the key finding from “Rumors of ‘non-replication’. . . Greatly Exaggerated” (Kahan & Peters 2017).


Basically the idea was to display the essential comparative information from the studies in commensurable terms and in a form as economical as possible.

What is the essential information?

Well, remember, in both studies we have separate conditions in which the covariance-detection problem is being solved (click on the inset to refresh your memory of how the problem is set up).

First, there’s the politically neutral skin rash condition, in which, not surprisingly, high-numeracy subjects perform much better than low-numeracy ones.  (Panels (A) and (D)).

Second, there’s the “identity affirmed” condition.  That means that from the point of view of “one side”—either left-leaning subjects or right-leaning ones—the result in the covariance-detection problem, properly interpreted, generates an ideologically congenial answer on the effect of a ban on carrying concealed firearms.

For left-leaning subjects, that result would be that crime increases, whereas for the right-leaning ones, the identity-affirming result would be that crime actually decreases. By aggregating the responses of both right- and left-leaning subjects for whom the experiment produced this result, we can graph the impact of it in one panel for each study—(B) and (E).

Note that in those two panels, the high-numeracy subjects continue to outperform the low-numeracy ones. In short, high-numeracy subjects are better at ferreting out information that supports their “side” than are low-numeracy ones.

Of course, where one side (left- or right-leaning) is in a position to see the result as identity affirming, the other necessarily is in a position to  see the result as identity threatening. That information, too, can be plotted on one graph per study ((C) & (F)) if the responses of ideologically diverse subjects who face that situation are aggregated.

Note that, in contrast with the preceding conditions, high-numeracy subjects no longer do significantly better than low-numeracy ones, either statistically or practically.  Either they have been lulled into the characteristic “heuristic” mode of information processing or (more likely) they are using their cognitive-proficiency advantage to “rationalize” selecting the “wrong” answer.

Whichever it is, we now see a model of how partisans exposed to the same information assign opposing significance to it and thus end up even more polarized. In Bayesian terms, the reason isn’t that they have different priors; it’s that the subjects are assigning different likelihood ratios, i.e., different weights to one and the same piece of evidence (Kahan, Peters, Dawson & Slovic 2017; Kahan 2016).
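A toy calculation makes the Bayesian point concrete. In the sketch below, two groups start from the same prior but assign opposite likelihood ratios to one and the same piece of evidence, so their posteriors move apart. The function name and all of the numbers are hypothetical, picked for illustration rather than taken from the study.

```python
def posterior_probability(prior_prob, likelihood_ratio):
    """Bayes in odds form: posterior odds = prior odds x likelihood ratio."""
    prior_odds = prior_prob / (1.0 - prior_prob)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1.0 + posterior_odds)

# Hypothetical values: both groups begin undecided about whether a ban reduces crime.
shared_prior = 0.5
lr_identity_affirmed = 3.0        # evidence treated as strong support
lr_identity_threatened = 1 / 3.0  # same evidence treated as pointing the other way

affirmed = posterior_probability(shared_prior, lr_identity_affirmed)
threatened = posterior_probability(shared_prior, lr_identity_threatened)

print(f"shared prior:        {shared_prior:.2f}")
print(f"identity affirmed:   {affirmed:.2f}")
print(f"identity threatened: {threatened:.2f}")
```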

That’s the essential information. What made the presentation relatively economical was the aggregation of responses of right- and left-leaning subjects. The effect could be shown for each “side” separately, but that would require either doubling the number of graphs or creating a super-duper busy single one.

Note, too, that the amenability of the data to this sort of reporting was facilitated by running Monte Carlo simulations, which in generating 5000 or so results for each model made it possible to represent the results in each condition as a probability density distribution for subjects whose political outlooks and numeracy varied in the manner most pertinent to the study hypotheses (King, Tomz & Wittenberg 2000).
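For readers who haven’t seen this kind of simulation, here is a rough sketch of the general idea in Python. It is not the code or the model used in the studies: the two-coefficient logit model, the estimates, the covariance matrix, and the subject profile are all hypothetical. Coefficient vectors are drawn from a normal distribution centered on the estimates, and each draw is turned into a predicted probability, which yields a whole distribution of predictions rather than a single point.

```python
import math
import random
import statistics


def simulate_predicted_probabilities(beta_hat, cov, x, n_draws=5000, seed=1):
    """Draw coefficient vectors from a bivariate normal centered on the
    estimates and return the predicted probability for profile x in each draw."""
    rng = random.Random(seed)
    # Cholesky factor of the 2x2 covariance matrix, written out by hand.
    l11 = math.sqrt(cov[0][0])
    l21 = cov[1][0] / l11
    l22 = math.sqrt(cov[1][1] - l21 ** 2)
    draws = []
    for _ in range(n_draws):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        b0 = beta_hat[0] + l11 * z1
        b1 = beta_hat[1] + l21 * z1 + l22 * z2
        linear = b0 * x[0] + b1 * x[1]
        draws.append(1.0 / (1.0 + math.exp(-linear)))  # inverse logit
    return draws


# Hypothetical logit estimates (intercept, numeracy slope) and their covariance.
beta_hat = [-0.5, 0.3]
cov = [[0.04, -0.005], [-0.005, 0.01]]

# Hypothetical profile of interest: constant term = 1, numeracy score = 2.
probs = simulate_predicted_probabilities(beta_hat, cov, x=[1.0, 2.0])

ordered = sorted(probs)
low = ordered[int(0.025 * len(ordered))]
high = ordered[int(0.975 * len(ordered))]
print(f"mean predicted probability: {statistics.mean(probs):.3f}")
print(f"central 95% of simulated values: {low:.3f} to {high:.3f}")
```

Plotting `probs` as a density for each type of subject is what makes it possible to compare conditions as overlapping or non-overlapping distributions.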

Pretty fun, don’t you think?


Kahan, D.M. & Peters, E. Rumors of the Non-replication of the “Motivated Numeracy Effect” Are Greatly Exaggerated. CCP Working paper No. 324 (2017) available at

Kahan, D.M. The Politically Motivated Reasoning Paradigm, Part 1: What Politically Motivated Reasoning Is and How to Measure It. in Emerging Trends in the Social and Behavioral Sciences (John Wiley & Sons, Inc., 2016).

Kahan, D.M., Peters, E., Dawson, E.C. & Slovic, P. Motivated numeracy and enlightened self-government. Behavioural Public Policy 1, 54-86 (2017).

King, G., Tomz, M. & Wittenberg, J. Making the Most of Statistical Analyses: Improving Interpretation and Presentation. Am. J. Pol. Sci. 44, 347-361 (2000), available at


"Non-replicated"? The "motivated numeracy effect"?! Forgeddaboutit! 

Limited edition--hurry up & get yours now for free!


The earth is (still) round, even at P < 0.005

In a paper forthcoming in Nature Human Behaviour (I think it is still “in press”), a large & distinguished group of social scientists propose nudging (shoving?) the traditional NHST threshold from p ≤ 0.05 to p ≤ 0.005. A response to the so-called “replication crisis,” this “simple step would immediately improve the reproducibility of scientific research in many fields,” the authors (all 72 of them!) write.

To disagree with a panel of experts this distinguished & this large is a daunting task.  Nevertheless, I do disagree.  Here’s why:

1. There is no reason to think a p-value threshold of 0.005 would improve the ratio of valid to invalid studies; it would just make all studies, good as well as bad, cost a hell of a lot more.

The only difference between a bad study at p ≤ 0.05 and a bad study at p ≤ 0.005 is sample size.  The same for a good study in which p ≤ 0.005 rather than p ≤ 0.05.
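A back-of-the-envelope sketch of how the sample-size requirement moves with the threshold, in Python. This is a textbook normal-approximation formula, not anything from the paper under discussion; the two-sample z-test, the 80% power target, and the effect size are all assumptions chosen for illustration.

```python
from statistics import NormalDist


def required_n_per_group(effect_size: float, alpha: float, power: float = 0.80) -> float:
    """Approximate per-group n for a two-sided, two-sample z-test on a
    standardized mean difference: n = 2 * ((z_alpha/2 + z_power) / d)^2."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return 2 * ((z_alpha + z_power) / effect_size) ** 2


d = 0.3  # hypothetical standardized effect size
n_05 = required_n_per_group(d, alpha=0.05)
n_005 = required_n_per_group(d, alpha=0.005)
print(f"n per group at p <= 0.05:  {n_05:.0f}")
print(f"n per group at p <= 0.005: {n_005:.0f}")
print(f"ratio: {n_005 / n_05:.2f}")
```

Under these assumptions the ratio does not depend on the hypothetical effect size, because d cancels. The same multiplier applies whether or not the underlying inference strategy is any good.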

What makes an empirical study “good” or “bad” is the quality of the inference strategy—i.e., the practical logic that connects measured observables to the not-directly observables of interest.

If a researcher can persuade reviewers to accept a goofy theory for a bad study (say, one on the impact of “himmicanes” on storm-evacuation advisories, the effect of ovulation on women’s voting behavior, or the influence of egalitarian sensibilities on the rate of altercations between economy class and business class airline passengers) at p ≤ 0.05, then the only thing that researcher has to do to get the study published at p ≤ 0.005 is collect more observations.

Of course, because sample recruitment is costly, forcing researchers to recruit massive samples will make it harder for researchers to generate bad studies.

But for the same reason, a p ≤ 0.005 standard will make it much harder for researchers doing good studies (ones that rest on plausible mechanisms) to generate publishable papers, too.

Accordingly, to believe that p ≤ 0.005 will improve the ratio of good studies to bad, one has to believe that scholars doing good studies will be more likely to get their hands on the necessary research funding than will scholars doing bad studies.

That’s not particularly plausible: if it were, then funders would be favoring good over bad research already—at p ≤ 0.05.

At the end of the day, a p ≤ 0.005 standard will simply reduce the stock of papers deemed publishable—period—with no meaningful impact on the overall quality of research.

2. It’s not the case that a p ≤ 0.005 standard will “dramatically reduce the reporting of false-positive results—studies that claim to find an effect when there is none—and so make more studies reproducible.”

The mistake here is to think that there will be fewer borderline studies at p ≤ 0.005 than at p ≤ 0.05.

P is a random variable. Thus, if one starts with a p ≤ 0.05 standard for publication, there is a 50% chance that a study finding that is “significant” at p = 0.05 will be “nonsignificant” at the p ≤ 0.05 threshold on the next trial, even assuming both studies were conducted identically & flawlessly and the true effect is exactly the size the first study observed. (That so many replicators don’t seem to get this boggles one’s mind.)

If the industry norm is adjusted to p ≤ 0.005, we’ll simply see another random distribution of p-values, now scattered around the new 0.005 cutoff. So again, if a paper reports a finding at p = 0.005, there will be a 50% chance that the next, replication trial will produce a result that’s not significant at p ≤ 0.005.
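The 50% figure can be checked with a few lines of Python. This is a sketch under stated assumptions, not a general result: a two-sided z-test, an exact replication with the same sample size, and a true effect equal to what the original study observed. The function name is made up for this example.

```python
from statistics import NormalDist


def replication_probability(original_p: float, alpha: float) -> float:
    """Chance that an identical replication is significant at `alpha`
    (two-sided z-test), ASSUMING the true effect equals the original estimate."""
    z = NormalDist()
    z_true = z.inv_cdf(1 - original_p / 2)  # original z-score, treated as the truth
    z_crit = z.inv_cdf(1 - alpha / 2)       # significance cutoff
    # Replication z-score ~ Normal(z_true, 1); it is significant if |z| > z_crit.
    return (1 - z.cdf(z_crit - z_true)) + z.cdf(-z_crit - z_true)


for threshold in (0.05, 0.005):
    prob = replication_probability(original_p=threshold, alpha=threshold)
    print(f"original p = {threshold}: P(replication significant) = {prob:.3f}")
```

A result that sits exactly on the cutoff is a coin flip on the next try whichever cutoff is chosen, because the replication estimate is equally likely to land above or below the original one.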

Certifying reproducibility won’t be any “easier” or any more certain. And for the reasons stated above, there will be no more reason to assume that studies that either clear or just fall short of clearing the bar at p ≤ 0.005 are any more valid than ones that occupy the same position in relation to p ≤ 0.05.

3. The problem of NHST cannot be fixed with more NHST.

Finally and most importantly, the p ≤ 0.005 standard misdiagnoses the problem behind the replication crisis: the malignant craft norm of NHST.

Part of the malignancy is that mechanical rules like p ≤ 0.005 create a thought-free, “which button do I push” mentality: researchers expect publication for research findings that meet this standard whether or not the study is internally valid (i.e., not goofy). They don’t think about how much more probable a particular hypothesis is than is the null, or even whether the null is uniquely associated with some competing theory of the observed effect.

A practice that would tell us exactly those things is better not only substantively but also culturally, because it forces the researcher to think about exactly those things.
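As a toy illustration of what such a practice reports, here is a sketch in Python of a likelihood ratio between two point hypotheses. It is deliberately simpler than a full Bayes Factor, which would average over a prior on effect sizes, and it assumes the estimate is normally distributed around the true effect. The function name and all numbers are hypothetical.

```python
from statistics import NormalDist


def likelihood_ratio(estimate: float, std_error: float,
                     h1_effect: float, h0_effect: float = 0.0) -> float:
    """How many times more probable the observed estimate is under H1 than H0,
    assuming the estimate is normally distributed around the true effect."""
    under_h1 = NormalDist(mu=h1_effect, sigma=std_error).pdf(estimate)
    under_h0 = NormalDist(mu=h0_effect, sigma=std_error).pdf(estimate)
    return under_h1 / under_h0


# Hypothetical study: an estimate of 2.0 with a standard error of 1.0
# (two-sided p of about 0.046). The theory predicts an effect of 2.0;
# the null predicts 0.
print(f"{likelihood_ratio(2.0, 1.0, h1_effect=2.0):.2f}")

# Same data, but now the competing theory predicts 1.0 rather than 0.
print(f"{likelihood_ratio(2.0, 1.0, h1_effect=2.0, h0_effect=1.0):.2f}")
```

The second call is the point of the exercise: the weight of the same “significant” result changes once the researcher has to say what the rival theory actually predicts.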

Ironically, it is clear that a substantial fraction of the “Gang of 72” believes that p-value-driven NHST should be abandoned in favor of some type of “weight of the evidence” measure, such as the Bayes Factor. They signed on to the article, apparently, because they believed, in effect, that ratcheting up (down?) the p-value norm would generate even more evidence of the defects of any sort of threshold for NHST, and thus contribute to more widespread appreciation of the advantages of a “weight of the evidence” alternative.

All I can say about that is that researchers have for decades understood the inferential barrenness of p-values and advocated for one or another Bayesian alternative instead.

Their advocacy has gotten nowhere: we’ve lived through decades of defective null hypothesis testing and the response has always been “more of the same.”

What is the theory of disciplinary history that predicts a sudden radicalization of the “which button do I push” proletariat of social science?

As intriguing and well-intentioned as the p ≤ 0.005 proposal is, arguments about standards aren’t going to break the NHST norm.

“It must get worse in order to get better” is no longer the right attitude.

Only demonstrating the superiority of a “weight of the evidence” alternative by doing it—and even more importantly teaching it to the next generation of social science researchers—can really be expected to initiate the revolution that the social sciences need.   




Science literacy & polarization--what replication crisis?


Weekend update: Where I'll be this Fall
