
Weekend update: "Knowing disbelief in evolution"-- a fragment

Covers familiar ground for the 14.6 billion regular readers of this blog, but for the benefit of the 2 or so billion nonregulars who tune in on a given day here is a portion of the Measurement Problem paper exposing the invalidity of the NSF Science Indicators' "evolution" measure.  What is obsessing and confounding me -- as I indicated in the recent "What exactly is going on in their heads?" post--is how to understand and make sense of the perspective of the "knowing disbeliever": in that context, the individual who displays high comprehension of the mechanisms and consequences of human-caused climate change but "disbelieves it"; here, the bright student who (unlike the vast majority of people who say they "believe in" evolution) displays comprehension of the modern synthesis, and who might well go on to be a scientist or other professional who uses such knowledge, but who nevertheless "disbelieves" evolution. . . .

2.  What does “belief in evolution” measure?

But forget climate change for a moment and consider instead another controversial part of science: the theory of evolution. Around once a year, Gallup or another major commercial survey firm releases a poll showing that approximately 45% of the U.S. public rejects the proposition that human beings evolved from another species of animal. The news is inevitably greeted by widespread expressions of dismay from media commentators, who lament what this finding says about the state of science education in our country.

Actually, it doesn’t say anything. There are many ways to assess the quality of instruction that U.S. students receive in science.  But what fraction of them say they “believe” in evolution is not one of them.

Numerous studies have found that profession of “belief” in evolution has no correlation with understanding of basic evolutionary science. Individuals who say they “believe” are no more likely than those who say they “don’t” to give the correct responses to questions pertaining to natural selection, random mutation, and genetic variance—the core elements of the modern synthesis (Shtulman 2006; Demastes, Settlage & Good 1995; Bishop & Anderson 1990).

Nor can any valid inference be drawn from a U.S. survey respondent's profession of “belief” in human evolution about his or her comprehension of science generally.  The former is not a measure of the latter.

To demonstrate this point requires a measure of science comprehension.  Since Dewey (1910), general education has been understood to have the aim of imparting the capacity to recognize and use pertinent scientific information in ordinary decisionmaking: personal, professional, and civic (Baron 1993).  Someone who attains this form of “ordinary science intelligence” will no doubt have acquired knowledge of a variety of important scientific findings.  But to expand and use what she knows, she will also have to possess certain qualities of mind: critical reasoning skills essential to drawing valid inferences from evidence; a faculty of cognitive perception calibrated to discerning when a problem demands such reasoning; and the intrinsic motivation to perform the effortful information processing such analytical tasks entail (Stanovich 2011).

The aim of a valid science comprehension instrument is to measure these attributes.  Rather than certifying familiarity with some canonical set of facts or abstract principles, we want satisfactory performance on the instrument to vouch for an aptitude comprising the “ordinary science intelligence” combination of knowledge, skills, and dispositions.

Such an instrument can be constructed by synthesizing items from standard “science literacy” and critical reasoning measures (cf. Kahan, Peters et al. 2012). These include the National Science Foundation’s Science Indicators (2014) and Pew Research Center’s “Science and Technology” battery (2013), both of which emphasize knowledge of core scientific propositions from the physical and biological sciences; the Lipkus/Peters Numeracy scale, which assesses quantitative reasoning proficiency (Lipkus et al. 2001; Peters et al. 2006; Weller et al. 2012); and Frederick’s Cognitive Reflection Test, which measures the disposition to consciously interrogate intuitive or pre-existing beliefs in light of available information (Frederick 2005; Kahneman 1998).

The resulting 18-item “Ordinary Science Intelligence” scale is highly reliable (α = 0.83) and displays a unidimensional covariance structure when administered to a representative general population sample (N = 2000).[1] Scored with Item Response Theory to enhance its discrimination across the range of the underlying latent (not directly observable) aptitude that it can be viewed as measuring, OSI strongly predicts proficiency on tasks such as covariance detection, a form of reasoning elemental to properly drawing causal inferences from data (Stanovich 2009).  It also correlates (r = 0.40, p < 0.01) with Baron’s Actively Open-minded Thinking test, which measures a person’s commitment to applying her analytical capacities to find and properly interpret evidence (Haron, Ritov & Mellers 2013; Baron 2008).
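For readers unfamiliar with the reliability statistic reported above, here is a minimal sketch of how Cronbach's α is conventionally computed from a respondent-by-item score matrix. The function name and the tiny response matrix are made up purely for illustration; they are not the OSI data, and nothing here reproduces the reported α = 0.83.

```python
from statistics import variance

def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores.

    alpha = k/(k-1) * (1 - sum(item variances) / variance(total scores))
    """
    k = len(item_scores[0])
    # Sample variance of each item across respondents.
    item_vars = [variance([row[i] for row in item_scores]) for i in range(k)]
    # Sample variance of each respondent's summed score.
    total_var = variance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Made-up toy matrix (5 respondents x 4 right/wrong items), NOT the OSI data.
toy_responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]

alpha = cronbach_alpha(toy_responses)
print(f"alpha for the toy matrix: {alpha:.2f}")
```

The more the items rise and fall together across respondents, the larger the variance of the summed score relative to the item variances, and the closer α gets to 1.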

 Consistent with the goal of discerning differing levels of this proficiency (Embretson & Reise 2000), OSI contains items that span a broad range in difficulty.  For example, the NSF Indicator Item “Electrons”—“Electrons are smaller than atoms—true or false?”—is comparatively easy (Figure 1). Even at the mean level of science comprehension, test-takers from a general population sample are approximately 70% likely to get the “right” answer.  Only someone a full standard deviation below the mean is more likely than not to get it wrong.

“Nitrogen,” the Pew multiple choice item on which gas is most prevalent in the atmosphere, is relatively difficult (Figure 1).  Someone with a mean OSI score is only about 20% likely to give the correct response. A test taker has to possess an OSI aptitude one standard deviation above the mean before he or she is more likely than not to supply the correct response.
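To make the idea of an item's "difficulty" concrete, here is a rough sketch of an item response function. It assumes a two-parameter logistic (2PL) form, which is an assumption on my part for illustration; the text above does not specify which IRT model was fit. The parameters are backed out of the two approximate figures given for each item (the probability at the OSI mean and the point at which a correct answer becomes more likely than not), so they are illustrative values, not estimates from the data.

```python
import math

def logit(p):
    """Log-odds of a probability."""
    return math.log(p / (1 - p))

def irf(theta, a, b):
    """2PL item response function: P(correct | theta).

    theta: latent aptitude in standard-deviation units (mean = 0)
    a: discrimination (slope); b: difficulty (theta at which P = 0.5)
    """
    return 1 / (1 + math.exp(-a * (theta - b)))

def params_from_two_points(p_at_mean, theta_at_half):
    """Solve for (a, b) given P(theta = 0) and the theta at which P = 0.5."""
    b = theta_at_half
    a = logit(p_at_mean) / (0 - b)  # from logit(P(0)) = a * (0 - b)
    return a, b

# Illustrative parameters implied by the approximate figures in the text.
electron = params_from_two_points(p_at_mean=0.70, theta_at_half=-1.0)
nitrogen = params_from_two_points(p_at_mean=0.20, theta_at_half=1.0)

for name, (a, b) in (("Electron", electron), ("Nitrogen", nitrogen)):
    probs = [round(irf(t, a, b), 2) for t in (-2, -1, 0, 1, 2)]
    print(name, "P(correct) at theta = -2..2:", probs)
```

On this sketch an easy item is simply one whose curve crosses 50% well below the mean, and a hard item is one whose curve crosses 50% well above it.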

“Conditional Probability” is a Numeracy battery item (Weller et al. 2012). It requires a test-taker to determine the probability that a woman who is selected randomly from the population and who tests positive for breast cancer in fact has the disease; to do so, the test-taker must appropriately combine information about the population frequency of breast cancer with information about the accuracy rate of the screening test. A problem that assesses facility in drawing the sort of inferences reflecting the logic of Bayes’s Theorem, Conditional Probability turns out to be super hard. At the mean level of OSI, there is virtually no chance a person will get this one right.  Even those over two standard deviations above the mean are still no more likely to get it right than to get it wrong (Figure 1).
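Here is a small sketch of the kind of calculation the item calls for. The prevalence and test-accuracy figures below are hypothetical numbers chosen for illustration; they are not the values used in the actual Numeracy item.

```python
def posterior_given_positive(prevalence, sensitivity, false_positive_rate):
    """P(disease | positive test) via Bayes's Theorem."""
    true_positives = prevalence * sensitivity
    false_positives = (1 - prevalence) * false_positive_rate
    return true_positives / (true_positives + false_positives)

# Hypothetical inputs: 1% of women have the disease, the test detects 90% of
# real cases, and it wrongly flags 9% of women who do not have the disease.
p = posterior_given_positive(prevalence=0.01,
                             sensitivity=0.90,
                             false_positive_rate=0.09)
print(f"P(disease | positive) = {p:.3f}")
```

With these made-up inputs the answer comes out to roughly 9%, far below the 90% "accuracy" figure that intuition tends to seize on, which is one way to see why items of this form are so hard.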


With this form of item response analysis (Embretson & Reise 2000), we can do two things. One is identify invalid items—ones that don’t genuinely measure the underlying disposition in an acceptably discerning manner. We’ll recognize an invalid item if the probability of answering it correctly doesn’t bear the sort of relationship with OSI that valid items do.

The NSF Indicator’s “Evolution” item—“human beings, as we know them today, developed from earlier species of animals, true or false?”—is pretty marginal in that regard. People who vary in science comprehension, we’ve seen, vary correspondingly in their ability to answer questions that pertain to their capacity to recognize and give effect to valid empirical evidence. The probability of getting the answer “right” on “Evolution,” in contrast, varies relatively little across the range of OSI (Figure 1). In addition, the probability of getting the right answer is relatively close to 50% at both one standard deviation below and one standard deviation above the OSI mean, as well as at every point in between. The relative unresponsiveness of  the item to differences in science comprehension, then, is reason to infer that it is either not measuring anything or is measuring something that is independent of science comprehension.

Second, item-response functions can be used to identify items that are “biased” in relation to a subgroup.  “Bias” in this context is used not in its everyday moral sense, in which it connotes animus, but rather in its measurement sense, where it signifies a systematic skew toward either high or low readings in relation to the quantity being assessed.  If an examination of an item’s response profile shows that it tracks the underlying latent disposition in one group but not in another, then that item is biased in relation to members of the latter group—and thus not a valid measure of the disposition for a test population that includes them (Osterlind & Everson 2009).

That’s clearly true for the NSF’s Evolution item as applied to individuals who are relatively religious.  Such individuals—who we can identify with a latent disposition scale that combines self-reported church attendance, frequency of prayer, and perceived importance of religion in one’s life (α = 0.86)—respond the same as relatively nonreligious ones with respect to Electron, Nitrogen, and Conditional Probability. That is, in both groups, the probability of giving the correct response varies in the same manner with respect to the underlying science comprehension disposition that OSI measures (Figure 2).

Their performance on the Evolution item, however, is clearly discrepant. One might conclude that Evolution is validly measuring science comprehension for non-religious test takers, although in that case it is a very easy question:  the likelihood a nonreligious individual with a mean OSI score will get the “right” answer is 80%—even higher than the likelihood that this person would respond correctly to the relatively simple Electron item.

In contrast, for a relatively religious individual  with a mean OSI score, the probability of giving the correct response is around 30%.  This 50 percentage-point differential tells us that Evolution does not have the same relationship to the latent OSI disposition in these two groups.

Indeed, it is obvious that Evolution has no relation to OSI whatsoever in relatively religious respondents.  For such individuals, the predicted probability of giving the correct answer does not increase as individuals display a higher degree of science comprehension. On the contrary, it trends slightly downward, suggesting that religious individuals highest in OSI are even more likely to get the question “wrong.”

It should be obvious but just to be clear: these patterns have nothing to do with any correlation between OSI and religiosity. There is in fact a modest negative correlation between the two (r = -0.17, p  < 0.01).  But the “differential item functioning” test (Osterlind & Everson 2009) I’m applying identifies differences among religious and nonreligious individuals of the same OSI level. The difference in performance on the item speaks to the adequacy of Evolution as a measure of knowledge and reasoning capacity and not to the relative quality of those characteristics among members of the two groups.
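The logic of comparing groups at the same OSI level can be sketched with two group-specific response curves. The probabilities at the OSI mean (80% and 30%) come from the discussion above; the logistic form and both slope values are hypothetical, chosen only to mimic a rising curve for one group and a slightly declining one for the other. This is an illustration of the idea, not the analysis actually run.

```python
import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

def logit(p):
    return math.log(p / (1 - p))

def response_curve(p_at_mean, slope):
    """Logistic curve P(correct | theta) that equals p_at_mean at theta = 0."""
    intercept = logit(p_at_mean)
    return lambda theta: sigmoid(intercept + slope * theta)

# Probabilities at the mean are from the text; the slopes are HYPOTHETICAL.
nonreligious = response_curve(p_at_mean=0.80, slope=0.8)
religious = response_curve(p_at_mean=0.30, slope=-0.1)

def dif_gap(theta):
    """Gap in P(correct) between the groups at the SAME aptitude level."""
    return nonreligious(theta) - religious(theta)

for theta in (-1.0, 0.0, 1.0, 2.0):
    print(f"theta = {theta:+.1f}: gap = {dif_gap(theta):.2f}")
```

Because the comparison is made at matched values of theta, any gap reflects how the item behaves in each group, not a difference in the groups' average science comprehension.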

The bias with respect to religious individuals—and hence the invalidity of the item as a measure of OSI for a general population sample—is most striking in relation to respondents’ performance on Conditional Probability. There is about a 70% (± 10 percentage points, at the 0.95 level of confidence) probability that someone two and a quarter standard deviations above the mean on OSI will answer this extremely difficult question correctly. Of course, there aren’t many people two and a quarter standard deviations above the mean (the 99th percentile), but certainly they do exist, and they are not dramatically less likely to be above average in religiosity.  Yet if one of these exceptionally science-comprehending individuals is relatively religious, the probability that he or she will give the right answer to the NSF Evolution item is about 25% (± 10 percentage points, at the 0.95 level of confidence)—compared to 80% for the moderately nonreligious person who is merely average in OSI and whose probability of answering Conditional Probability correctly is epsilon. 

Under these conditions, one would have to possess a very low OSI score (or a very strong unconscious motivation to misinterpret these results (Kahan, Peters, et al. 2013)) to conclude that a “belief in evolution” item like the one in the NSF Indicators battery validly measures science comprehension in a general population test sample.  It is much more plausible to view it as measuring something else: a form of cultural identity that either does or does not feature religiosity (cf. Roos 2012).

One way to corroborate this surmise is to administer to a general population sample a variant of the NSF’s Evolution item designed to disentangle what a person knows about science from who he or she is culturally speaking.  When the clause, “[a]ccording to the theory of evolution  . . .” introduces the proposition “human beings, as we know them today, developed from earlier species of animals” (NSF 2006, 2014), the discrepancy between relatively religious and relatively non-religious test-takers disappears! Freed from having to choose between conveying what they understand to be the position of science and making a profession of “belief” that denigrates their identities, religious test-takers of varying levels of OSI now respond very closely to how nonreligious ones of corresponding OSI levels do. The profile of the item response curve—a positive slope in relation to OSI for both groups—supports the inference that answering this variant of Evolution correctly occupies the same relation to OSI as do the other items in the scale. However, this particular member of the scale turns out to be even easier—even less diagnostic of anything other than a dismally low comprehension level in those who get it wrong—than the simple NSF Indicator Electron item.

As I mentioned, there is no correlation between saying one “believes” in evolution and meaningful comprehension of natural selection and the other elements of the modern synthesis. Sadly, the proportion who can give a cogent and accurate account of these mechanisms is low among both “believers” and “nonbelievers,” even in highly educated samples, including college biology students (Bishop & Anderson 1990).  Increasing the share of the population that comprehends these important—indeed, astonishing and awe-inspiring—scientific insights is very much a proper goal for those who want to improve the science education that Americans receive.

The incidence of “disbelief” in evolution in the U.S. population, moreover, poses no barrier to attaining it. This conclusion, too, has been demonstrated by outstanding empirical research in the field of education science (Lawson & Worsnop 1992).  The most effective way to teach the modern synthesis to high school and college students who “do not believe” in evolution, this research suggests, is to focus on exactly the same thing one should focus on to teach evolutionary science to those who say they do “believe” but very likely don’t understand it: the correction of various naive misconceptions that concern the tendency of people to attribute evolution not to supernatural forces but to functionalist mechanisms and to the hereditability of acquired traits (Demastes, Settlage & Good 1995; Bishop & Anderson 1990).

Not surprisingly, the students most able to master the basic elements of evolutionary science are those who demonstrate the highest proficiency in the sort of critical reasoning dispositions on which science comprehension depends. Yet even among these students, learning the modern synthesis does not make a student who started out professing “not to believe in” evolution any more likely to say she now does “believe in” it (Lawson & Worsnop 1992).

Indeed, treating profession of “belief” as one of the objectives of instruction is thought to make it less likely that students will learn the modern synthesis.  “[E]very teacher who has addressed the issue of special creation and evolution in the classroom,” the authors of one study (Lawson & Worsnop 1992, p. 165) conclude,

already knows that highly religious students are not likely to change their belief in special creation as a consequence of relative brief lessons on evolution. Our suggestion is that it is best not to try to [change students’ beliefs], not directly at least. Rather, our experience and results suggest to us that a more prudent plan would be to utilize instruction time, much as we did, to explore the alternatives, their predicted consequences, and the evidence in a hypothetico-deductive way in an effort to provoke argumentation and the use of reflective thought. Thus, the primary aims of the lesson should not be to convince students of one belief or another, but, instead, to help students (a) gain a better understanding of how scientists compare alternative hypotheses, their predicated consequences, and the evidence to arrive at belief and (b) acquire skill in the use of this important reasoning pattern—a pattern that appears to be necessary for independent learning and critical thought.

This research is to the science of science communication’s “measurement problem” what the double slit experiment is to quantum mechanics’.  All students, including the ones most readily disposed to learn science, can be expected to protect their cultural identities from the threat that denigrating cultural meanings pose to it.  But all such students—all of them—can also be expected to use their reasoning aptitudes to acquire understanding of what is known to science.  They can and will do both—at the very same time.  But only when the dualistic quality of their reason as collective-knowledge acquirers and identity-protectors is not interfered with by forms of assessment that stray from science comprehension and intrude into the domain of cultural identity and expression.  A simple (and simple-minded) test can be expected to force disclosure of only one side of their reason.  And what enables the most exquisitely designed course to succeed in engaging the student’s reason as an acquirer of collective knowledge is exactly the care and skill with which the educator avoids provoking the student into using her reason for purposes of identity-protection only.


[1] The items comprising the OSI scale appear in the Appendix. The psychometric performance of the OSI scale is presented in greater detail in Kahan (2014).


Baron, J. Why Teach Thinking? An Essay. Applied Psychology 42, 191-214 (1993).

Bishop, B.A. & Anderson, C.W. Student conceptions of natural selection and its role in evolution. Journal of Research in Science Teaching 27, 415-427 (1990).

Demastes, S.S., Settlage, J. & Good, R. Students' conceptions of natural selection and its role in evolution: Cases of replication and comparison. Journal of Research in Science Teaching 32, 535-550 (1995).

Dewey, J. Science as Subject-matter and as Method. Science 31, 121-127 (1910).

Embretson, S.E. & Reise, S.P. Item response theory for psychologists (L. Erlbaum Associates, Mahwah, N.J., 2000).

Kahan, D.M. “Ordinary Science Intelligence”: A Science Comprehension Measure for Use in the Study of Risk Perception and Science Communication. Cultural Cognition Project Working Paper No. 112  (2014).

Kahan, D.M., Peters, E., Dawson, E. & Slovic, P. Motivated Numeracy and Enlightened Self Government. Cultural Cognition Project Working Paper No. 116  (2013).

Kahan, D.M., Peters, E., Wittlin, M., Slovic, P., Ouellette, L.L., Braman, D. & Mandel, G. The polarizing impact of science literacy and numeracy on perceived climate change risks. Nature Climate Change 2, 732-735 (2012).

Lawson, A.E. & Worsnop, W.A. Learning about evolution and rejecting a belief in special creation: Effects of reflective reasoning skill, prior knowledge, prior belief and religious commitment. Journal of Research in Science Teaching 29, 143-166 (1992).

Lipkus, I.M., Samsa, G. & Rimer, B.K. General Performance on a Numeracy Scale among Highly Educated Samples. Medical Decision Making 21, 37-44 (2001).

National Science Foundation. Science and Engineering Indicators (Wash. D.C. 2014). 

National Science Foundation. Science and Engineering Indicators (Wash. D.C. 2006). 

Osterlind, S.J. & Everson, H.T. Differential item functioning (SAGE, Thousand Oaks, Calif., 2009).

Peters, E., Västfjäll, D., Slovic, P., Mertz, C.K., Mazzocco, K. & Dickert, S. Numeracy and Decision Making. Psychol Sci 17, 407-413 (2006).

Pew Research Center for the People & the Press. Public's Knowledge of Science and Technology (Pew Research Center, Washington D.C., 2013).

Roos, J.M. Measuring science or religion? A measurement analysis of the National Science Foundation sponsored science literacy scale 2006–2010. Public Understanding of Science  (2012).

Shuman, H. Interpreting the Poll Results Better. Public Perspective 1, 87-88 (1998).

Stanovich, K.E. What intelligence tests miss : the psychology of rational thought (Yale University Press, New Haven, 2009). 

Weller, J.A., Dieckmann, N.F., Tusler, M., Mertz, C., Burns, W.J. & Peters, E. Development and testing of an abbreviated numeracy scale: A rasch analysis approach. Journal of Behavioral Decision Making 26, 198-212 (2012).



Weekend update: "Culture is prior to fact" & what that implies about resolving political conflict over risk

The idea that cultural cognition and related dynamics are peculiar to "unsettled" issues, or ones where the scientific evidence is not yet "clearly established," is a recurring theme.  For some reason, the recent "What exactly is going on in their heads?" post has stimulated many commentators -- in the discussion thread & in correspondence -- to advance this claim.  In fact, that view is at odds with the central tenet of cultural cognition as a research program.

The cultural cognition thesis asserts that "culture is prior to fact" in a cognitive sense: the capacity of individuals to recognize the validity of evidence on risks and like policy-relevant facts depends on cognitive faculties that themselves are oriented by cultural affiliations. Because cultural norms and practices certify that evidence has the qualities that entitle it to being credited consistent with science's criteria for valid proof, ordinary members of the public won't be able to recognize that scientific evidence is "clear" or "settled" unless doing so is compatible with their cultural identities. 

Below I reproduce one relatively early formulation of this position. It is from Kahan, D.M. & Braman, D. Cultural Cognition and Public Policy. Yale J. L. & Pub. Pol'y 24, 147-170 (2006).

In this essay, Don "Shotgun" Braman & I characterize the "cultural cognition thesis" as a "conjecture."  I am happy to have it continue to be characterized as such -- indeed, prefer that it forever be referred to as "conjectural" no matter how much evidence is adduced to support it than that it be referred to as "proven" or "established" or the like, a way of talking that reflects a vulgar heuristic substitute for science's own way of knowing, which treats every current best understanding as provisional and as subject to modification and even rejection in light of additional evidence. 

But in fact, since this essay was published, the Cultural Cognition Project has conducted numerous experiments that support the "cultural cognition thesis."  These experiments present evidence on mechanisms of cognition the operation of which implies that "clear" or valid evidence can be recognized as such only when assent to it affirms rather than denigrates perceivers' cultural identities.  Such mechanisms include (1) culturally biased search and assimilation; (2) cultural source credibility; (3) the cultural availability effect; and (4) culturally motivated system 2 reasoning.  

As the excerpt emphasizes (and as is documented in its many footnotes, which are not reproduced here), all of these involve extensions of well-established existing psychological dynamics.  The nerve of the cultural cognition research program has been simply to demonstrate important interactions between known cognitive mechanisms and cultural outlooks, a process that we hypothesize accounts for persistent political conflict on risk and other policy-relevant facts that admit of scientific investigation.

Knowing what I (provisionally) do now, there are collateral elements of the account below that I would qualify or possibly even disavow! I'm sure I'll continue to discover holes and gaps and false starts in the future, too--and I look forward to that.


Public disagreement about the consequences of law is not just a puzzle to be explained but a problem to be solved. The prospects for enlightened democratic decisionmaking obviously depend on some reliable mechanism for resolving such disputes and resolving them accurately. Because such disagreements turn on empirical claims that admit of scientific investigation, the conventional prescription is the pursuit and dissemination of scientifically sound information.

The hope that democracy can be enlightened in such a straightforward manner, however, turns out to be an idle one. Like most heuristics, cultural cognition is also a bias. By virtue of the power that cultural cognition exerts over belief formation, public dispute can be expected to persist on questions like the deterrent effect of capital punishment, the danger posed by global warming, the utility or futility of gun control, and the like, even after the truth of the matter has been conclusively established.

Imagine—very counterfactually—that all citizens are perfect Bayesians. That is, whenever they are apprised of reliable information, they readily update their prior factual beliefs in a manner that appropriately integrates this new information with all existing information at their disposal.

Even under these circumstances, conclusive discovery of the truth is no guarantee that citizens will converge on true beliefs about the consequences of contested public policies. For while Bayesianism tells individuals what to do with relevant and reliable information, it doesn’t tell them when they should regard information as relevant and reliable. Individuals can be expected to give dispositive empirical information the weight that it is due in a rational-decisionmaking calculus only if they recognize sound information when they see it.

The phenomenon of cultural cognition suggests they won’t. The same psychological and social processes that induce individuals to form factual beliefs consistent with their cultural orientation will also prevent them from perceiving contrary empirical data to be credible. Cognitive-dissonance avoidance will steel individuals to resist empirical data that either threatens practices they revere or bolsters ones they despise, particularly when accepting such data would force them to disagree with individuals they respect. The cultural judgments embedded in affect will speak more authoritatively than contrary data as individuals gauge what practices are dangerous and what practices are not. And the culturally partisan foundation of trust will make them dismiss contrary data as unreliable if they perceive that it originates from persons who don’t harbor their own cultural commitments.
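The point can be put in a toy calculation. Two citizens who both update exactly as Bayes's Theorem prescribes, and who start from the same prior, will still end up in different places if they assign different weight (a different likelihood ratio) to the same study. The function name and every number below are hypothetical, chosen only to illustrate the argument.

```python
def bayes_update(prior, likelihood_ratio):
    """Posterior probability from a prior and a likelihood ratio.

    posterior odds = prior odds * likelihood ratio
    """
    prior_odds = prior / (1 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)

shared_prior = 0.5  # both citizens begin undecided (hypothetical)

# Same study, different perceived reliability (hypothetical weights):
# one citizen finds it culturally congenial and treats it as strong evidence,
# the other regards its source as untrustworthy and treats it as uninformative.
credits_study = bayes_update(shared_prior, likelihood_ratio=4.0)
discounts_study = bayes_update(shared_prior, likelihood_ratio=1.0)

print(f"citizen who credits the study:   {credits_study:.2f}")
print(f"citizen who discounts the study: {discounts_study:.2f}")
```

Both citizens apply the updating rule flawlessly. The divergence enters entirely through the likelihood ratio, which is the one input that Bayesianism itself does not supply.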

This picture is borne out by additional well-established psychological and social mechanisms. One constraint on the disposition of individuals to accept empirical evidence that contradicts their culturally conditioned beliefs is the phenomenon of biased assimilation. This phenomenon refers to the tendency of individuals to condition their acceptance of new information as reliable based on its conformity to their prior beliefs. This disposition to reject empirical data that contradict one’s prior belief (for example, that the death penalty does or doesn’t deter crime) is likely to be especially pronounced when that belief is strongly connected to an individual’s cultural identity, for then the forces of cognitive dissonance avoidance that explain biased assimilation are likely to be most strongly aroused.

Two additional mechanisms reinforce the tendency to see new information as unreliable when it challenges a culturally congenial belief. The first is naïve realism. This phenomenon refers to the disposition of individuals to view the factual beliefs that predominate in their own cultural group as the product of “objective” assessment, and to attribute the contrary factual beliefs of their cultural and ideological adversaries to the biasing influence of their worldviews. Under these conditions, evidence of the truth will never travel across the boundary line that separates a factually enlightened cultural group from a factually benighted one.

Indeed, far from being admitted entry, the truth will be held up at the border precisely because it originates from an alien cultural destination. The second mechanism that constrains societal transmission of truth—reactive devaluation—is the tendency of individuals who belong to a group to dismiss the persuasiveness of evidence proffered by their adversaries in settings of intergroup conflict.

We have been focusing on the impact of cultural cognition as a bias in the public’s recognition of empirically sound information. But it would be a mistake to infer that the immunity of social and natural scientists to such bias improves the prospects for truth, once discovered, to penetrate public debate.

This would be a mistake, first, because scientists aren’t immune to the dynamics we have identified. Like everyone else, scientists (quite understandably, even rationally) rely heavily on their priors when evaluating the reliability of new information. In one ingenious study, for example, scientists were asked to judge the experimental and statistical methods of what was represented to be a real study of the phenomenon of ESP. Those who received the version of the fictitious study that found evidence of ESP rated the methods to be low in quality, whereas those who received the version that found no evidence of ESP rated the methods to be high in quality, even though the methods were in fact independent of the conclusion. Other studies showing that cultural worldviews explain variance in risk perceptions not just among lay persons but also among scientists who specialize in risk evaluation fortify the conclusion that for scientists, too, cultural cognition operates as an information-processing filter.

But second and more important, any special resistance scientists might have to the biasing effect of cultural cognition is beside the point. The issue is whether the discovery and dissemination of empirically sound information can, on its own, be expected to protect democratic policymaking from the distorting effect of culturally polarized beliefs among citizens and their representatives.

Again (for the umpteenth time), ordinary citizens aren’t in a position to determine for themselves whether this or that scientific study of the impact of gun control laws, of the deterrent effect of the death penalty, of the threat posed by global warming, et cetera, is sound. Scientific consensus, when it exists, determines beliefs in society at large only by virtue of social norms and practices that endow scientists with deference-compelling authority on the issues to which they speak. When they address matters that have no particular cultural valence within the group-grid matrix (What are the relative water-repellent qualities of different synthetic fabrics? Has Fermat’s Last Theorem been solved?), the operation of these norms and practices is unremarkable and essentially invisible.

But when scientists speak to policy issues that are culturally disputed, then their truth-certifying credentials are necessarily put on trial. For many citizens, men and women in white lab coats speak with less authority than (mostly) men and women in black frocks. And even those who believe the scientists will still have to choose which scientists to believe. The laws of probability, not to mention the professional incentives toward contrarianism, assure that even in the face of widespread professional consensus there will be outliers. Citizens (again!) lack the capacity to decide for themselves whose work has more merit. They have no choice but to defer to those whom they trust to tell them which scientists to believe. And the people they trust are inevitably the ones whose cultural values they share, and who are inclined to credit or dismiss scientific evidence based on its conformity to their cultural priors.

These arguments are necessarily interpretative and conjectural. But in the spirit of (casual) empirical verification, we invite those who are skeptical to perform this thought experiment. Ask yourself whether you think there is any credible scientific ground for believing that global warming is/isn’t a serious threat; that the death penalty does/doesn’t deter; that gun control does/doesn’t reduce violent crime; that abortion is/isn’t safer than childbirth. If you believe the truth has been established on any one of these issues, ask yourself why it hasn’t dispelled public disagreement. If you catch yourself speculating about the possible hidden cognitive motivations the disbelievers might have by virtue of their cultural commitments, you may proceed to the next Part of this Essay (although not until you’ve reflected on why you think you know the truth and whether your cultural commitments might have anything to do with that belief).  If, in contrast, you are tempted to answer, “Because the information isn’t accessible to members of the public,” then please go back to the beginning of this Essay and start over.


Nothing in our account implies either that there is no truth of the matter on disputed empirical policy issues or that the public cannot be made receptive to that truth. Like at least some other cognitive biases, cultural cognition can be counteracted. . . .  



For what it's worth: breaking down "belief in" GW vs. "belief in" AGW as function of partisanship & OCSI

As a result of (a) my aggregation of responses to the two-part question used to assess "belief in" human-caused global warming and (b) my failure to indicate that in the Figure label, there was some understandable confusion in the discussion in response to the "What exactly is going on in their heads?" post.

This should help.

Again, the "belief in" question I used -- patterned on standard opinion polling ones used by firms like Pew & Gallup-- has two parts:

  1. "From what you’ve read and heard, is there solid evidence that the average temperature on earth has been getting warmer over the past few decades?" [YES/NO]
  2. If yes: "Do you believe that the earth is getting warmer (a) mostly because of human activity such as burning fossil fuels or (b) mostly because of natural patterns in the earth’s environment?"
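For anyone who wants to see how the two parts combine, here is a minimal sketch in Python. The function name, the category labels, and the four sample responses are hypothetical, made up for illustration; they are not the survey's actual coding scheme or data.

```python
from collections import Counter

def classify(q1_warming, q2_cause=None):
    """Map answers to the two-part item onto one of three categories.

    q1_warming: True if the respondent says there is solid evidence of warming.
    q2_cause:   "human" or "natural"; only asked when q1_warming is True.
    """
    if not q1_warming:
        return "no warming"
    if q2_cause == "human":
        return "warming, mostly human-caused"
    return "warming, mostly natural"

# Hypothetical responses, for illustration only (not survey data).
responses = [(True, "human"), (True, "natural"), (False, None), (True, "human")]
counts = Counter(classify(q1, q2) for q1, q2 in responses)

# "Belief in" global warming counts everyone who said yes to question 1;
# "belief in" human-caused warming counts only the yes-then-(a) group.
believes_gw = sum(1 for q1, _ in responses if q1)
believes_agw = counts["warming, mostly human-caused"]
print(believes_gw, believes_agw)
```

Reporting only the last number, without saying so, is the kind of aggregation that can blur the difference between the two "beliefs."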

Among the people (N = 2000, nationally representative) who took the "Ordinary climate science intelligence" assessment, here is the breakdown for question (1) for respondents defined by their scores in relation to the mean on a "right-left" outlook scale (one that combined responses to items on party allegiance and liberal-conservative ideology):

These results are consistent with what US general public opinion surveys have shown for the better part of a decade.

Here are the "item response" profiles-- plots of the predicted probability of answering these questions as indicated -- for subjects of opposing political outlooks in relation to their scores on the OCSI scale:

As can be seen, both the probability of "believing in" global warming and the probability of "believing in" human-caused global warming (among those who believe in global warming) become more politically polarized as individuals score higher on OCSI.

Note that OCSI itself is made up of items relating to the mechanisms and consequences of human-caused global warming.  Items on "belief in" global warming -- human or otherwise -- are not part of the scale, since the point was to see if comprehension of the mechanisms and consequences of human-caused climate change, on the one hand, has any particular connection to "belief in" human-caused global warming, on the other. The former clearly doesn't "cause" the latter!

I've disabled comments here in order to prevent "forking" the discussion going on in connection with the "What's going on ..." post.  So feel free to dispense your wisdom on these data there.


Conditional probability is hard -- but teaching it *shouldn't* be!

So, consider these two problems: 

A. Which is more difficult?

B. Which is it easier to teach someone to do correctly?

My answers: BAYES is more difficult but also easier to teach someone to do correctly. 

Does that seem plausible to you? I won't be surprised if you say no, particularly if your answer reflects experience in seeing how poorly people do with conditional probability problems.

But if you disagree with me, I do want to challenge your sense of what the problem is.

Okay, so here are some data.

For sure, BAYES is harder.  In a diverse sample of 1,000 adults (over half of whom had either a four-year college or post-graduate degree), only 3% got the correct answer (50%). For COVARY, 55% got the correct answer (“patients administered the new treatment were not more likely to survive”).

This is not surprising. BAYES involves conditional probability, a concept that most people find very counterintuitive.  There is a strong tendency to treat the accuracy rate of the witness’s color discernment-- 90% --  as the likelihood that the bus is blue.  

That was the modal answer—one supplied by 34% of the respondents—within the sample here. This response ignores information about the base rate of blue versus green buses.  

Another 23% picked 10%--the base rate frequency of blue buses. They thus ignored the additional information associated with the witness’s perception of the color of the bus.

How to combine the base rate information with the accuracy of the witness’s perception of color (or their equivalent in other problems that involve the same general type of reasoning task) is reflected in Bayes’s Theorem, a set of logical operations that most people find utterly baffling.

COVARY is a standard “covariance detection” problem.  It’s not as hard as BAYES, but it’s still pretty difficult!

Many people (usually most; this fairly well educated sample did better than a representative sample would) use one of two heuristics to analyze a problem that has the formal characteristics of this one (Arkes & Harkness 1983).  The first, and most common, simply involves comparing the number of “survivors” to the number of “nonsurvivors” in the treatment condition.  The second involves comparing in addition the number of survivors in the treatment and the number of survivors in the control.

Both of these approaches generate the wrong answer—that patients given the new treatment were more likely to survive than those who didn’t receive it—for the data generated in this hypothetical experiment.

What’s important is the ratio of survivors to nonsurvivors in the two experimental groups.  In the group whose members received the treatment, patients were about three times more likely to survive than to die (223:75 = 2.97:1).  In the untreated group, however, patients were just over five times more likely to survive than to die (107:21 = 5.10:1).
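The arithmetic is easy to check by hand, but here is a minimal Python sketch of it anyway. The counts are the ones quoted above; the dictionary layout and function names are my own, purely for illustration.

```python
# Counts quoted in the text for the hypothetical experiment.
treated = {"survived": 223, "died": 75}
untreated = {"survived": 107, "died": 21}

def survival_odds(group):
    """Ratio of survivors to nonsurvivors within one group."""
    return group["survived"] / group["died"]

def survival_rate(group):
    """Share of the group that survived."""
    return group["survived"] / (group["survived"] + group["died"])

treated_odds = survival_odds(treated)
untreated_odds = survival_odds(untreated)

# Both the odds and the plain survival rate point the same way:
# the untreated group did better.
print(round(treated_odds, 2), round(untreated_odds, 2))
print(round(survival_rate(treated), 3), round(survival_rate(untreated), 3))
```

Whether one works with odds (survivors per nonsurvivor) or with survival rates (survivors per patient), the comparison comes out the same here: the treated group fares worse.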

Pretty much anyone who got the wrong answer can see why the correct one is right once the difference in the “likelihood ratios” (which is actually an important common element in conditional probability and covariance problems) is pointed out. 

The math is pretty tame (a fifth grader should be able to handle it), and the inferential logic (the essence of the sort of causal inference strategy that informs controlled experimentation) pretty much explains itself.

The reason such a significant number of people get the answer wrong is that they don’t reliably recognize that they have to compare the ratios of positive to negative outcomes. They effectively succumb to the temptation to settle for “hypothesis-confirming” evidence without probing for the disconfirming evidence that one can extract only by making use of all the available information in the 2x2 contingency table.
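To make the contrast concrete, here is a rough Python sketch that applies the two defective heuristics and the full-table comparison to the same 2x2 data. The function names and the way I have encoded each heuristic are my own reading of the descriptions above, not anything taken from Arkes & Harkness.

```python
# 2x2 contingency table from the text: (survived, died) per condition.
table = {"treatment": (223, 75), "control": (107, 21)}

def heuristic_one(t):
    """Look only at the treatment row: more survivors than nonsurvivors?"""
    survived, died = t["treatment"]
    return survived > died

def heuristic_two(t):
    """Heuristic one, plus: more survivors in treatment than in control?"""
    return heuristic_one(t) and t["treatment"][0] > t["control"][0]

def full_table(t):
    """Compare survivor-to-nonsurvivor ratios across both rows."""
    treat_survived, treat_died = t["treatment"]
    ctrl_survived, ctrl_died = t["control"]
    return treat_survived / treat_died > ctrl_survived / ctrl_died

# Each rule answers the question "does the treatment appear to help?"
for rule in (heuristic_one, heuristic_two, full_table):
    print(rule.__name__, "says treatment helps:", rule(table))
```

Only the last rule uses all four cells, and it is the only one that comes back with the right answer for these data.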

Now, why do I feel that it is nevertheless easier to teach people how to solve conditional probability problems of the sort reflected in BAYES than to teach them how to reliably solve covariance-detection ones of the sort reflected in COVARY?

The answer has to do with what someone has to learn to consistently get the problems right.

Doing conditional probability problems is actually easy once one grasps why the base rate matters—and enabling someone to grasp that turns out to be super easy too with the right pedagogical techniques.

The most important of these is to illustrate how a conditional probability problem can be conceived of as a population-sampling one (Spiegelhalter, Pearson & Short 2011).

In BAYES, we are told that 90% of the buses that could have struck Bill are green, and 10% of them are blue.

Accordingly, if we imagine a simulation in which Bill was hit by 100 city buses drawn at random, we’d expect him to be run down by a green bus 90 times and a blue one 10 times.

If we add Wally to the simulation, we’ll expect him correctly to perceive 81 (90%) of the 90 green buses that struck Bill to be green and incorrectly to perceive 9 (10%) of them to be blue.

Likewise, we’ll expect him to correctly perceive 9 of the 10 blue buses (90%) that hit Bill to be blue, but incorrectly perceive 1 of them (10%) to be green.

Overall, then, in 100 trials, Wally will perceive Bill to have been hit 18 times by a blue bus. Nine of those will be cases in which Wally correctly perceived a blue bus to be blue.  But nine will be cases in which Wally incorrectly perceived as blue a bus that was in fact green.

Because in our 100-trial simulation, the number of times Wally was correct when he identified the bus that hit Bill as blue is exactly equal to the number of times he was incorrect, Bill will have been hit by a blue bus 50% of the time and by a green one 50% of time in all the cases in which Wally perceives Bill was hit by a blue bus.
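Here is the same 100-trial bookkeeping as a short Python sketch. The function and parameter names are mine, and the integer arithmetic is chosen so that the default numbers (100 trials, 10% blue, 90% accuracy) divide evenly; other inputs might need rounding that this sketch does not handle.

```python
from fractions import Fraction

def natural_frequencies(trials=100, blue_pct=10, accuracy_pct=90):
    """Count the cases in which Wally says "blue" and how many are truly blue."""
    blue = trials * blue_pct // 100              # buses that really are blue
    green = trials - blue                        # buses that really are green
    blue_called_blue = blue * accuracy_pct // 100             # correct "blue" calls
    green_called_blue = green * (100 - accuracy_pct) // 100   # mistaken "blue" calls
    called_blue = blue_called_blue + green_called_blue
    # Share of "blue" reports in which the bus really was blue.
    return blue_called_blue, green_called_blue, Fraction(blue_called_blue, called_blue)

correct, mistaken, share_truly_blue = natural_frequencies()
print(correct, mistaken, share_truly_blue)
```

With the defaults, the correct and mistaken "blue" reports are equally numerous, which is all the 50% answer amounts to.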

This “natural frequency” strategy for analyzing conditional probability problems has been shown to be an effective pedagogical tool in experimental studies (Sedlmeier & Gigerenzer 2001; Kurzenhäuser & Hoffrage 2002; Wheaton, Lee & Deshmukh 2009). 

After using it to help someone grasp the conceptual logic of conditional probability, one can also connect the steps involved to a very straightforward rendering of Bayes’s Theorem: prior odds x likelihood ratio = revised (posterior) odds.

In this rendering, the base rate is represented in terms of the odds that a particular proposition or hypothesis is true: here, independently of Wally’s observation, we’d compute the odds that the bus that struck Bill was blue at 10:90 (“10 in 100”) or 1:9.

The new information or evidence is represented as a likelihood ratio, which reflects how much more consistent that evidence is with the hypothesis or proposition in question being true than with its negation (or some alternative hypothesis) being true.

Wally is able correctly to distinguish blue from green 90% of the time.

So if the bus that struck Bill was in fact blue, we’d expect Wally to perceive it as blue 9 times out of 10, whereas if the bus that struck Bill was in fact green, we’d expect Wally to perceive it as blue only 1 time out of 10. 

Because Wally is nine times (9 vs. 1 or 90% vs. 10%) more likely to perceive a bus was “blue” when it was truly blue than when it was in fact green, the likelihood ratio is 9.

“Multiplying” the prior odds by the likelihood ratio involves computing the product of (1) the element of the odds expression that corresponds to the hypothesis  and (2) the likelihood ratio value. 

Here the prior odds were 1:9 that the bus that struck Bill was blue.  Nine (likelihood ratio) times one (from 1:9) equals 9.

The revised odds that the bus that struck Bill was blue are thus 1:9 x 9 = 9:9 or 1:1, which is equivalent to 50%.
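The "prior odds x likelihood ratio = revised odds" steps can be sketched in a few lines of Python. The function names are hypothetical, and exact fractions are used only to keep 1:9 from turning into a messy decimal.

```python
from fractions import Fraction

def posterior_odds(prior_odds, likelihood_ratio):
    """Odds form of Bayes's theorem: prior odds x likelihood ratio."""
    return prior_odds * likelihood_ratio

def odds_to_probability(odds):
    """Convert odds of a:b (written as the fraction a/b) to a probability."""
    return odds / (1 + odds)

prior = Fraction(1, 9)                              # blue vs. green buses: 10:90
likelihood_ratio = Fraction(9, 10) / Fraction(1, 10)  # "blue" report if blue vs. if green

revised = posterior_odds(prior, likelihood_ratio)
print(revised, odds_to_probability(revised))
```

Feeding in the prior alone (without the likelihood ratio) gives back the 10% base rate, and the 90% accuracy figure only enters through the likelihood ratio, which is one way to see why neither of the two popular wrong answers can be right.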

I’m not saying that one exposure to this sort of exercise will be sufficient to reliably program someone to do conditional probability problems.

But I am saying that students of even middling levels of numeracy can be expected over the course of a reasonable number of repetitions to develop a reliable facility with conditional probability. The “natural frequencies” representation of the elements of the problem makes sense, and students can see which parts of that conceptualization map onto the “prior odds x likelihood ratio = revised odds” rendering of Bayes’s theorem and why.

If you want to make it even easier for this sort of lesson to take hold, & related hardwiring to settle in, give your students this cool Bayes's calculator.

Students can’t be expected, in contrast, to see why any of the other more complex but logically equivalent renderings of Bayes’s Theorem actually makes sense.  They thus can't be expected to retain them, to become adept at heuristically deploying them, or to experience the sort of improvement in discernment and reasoning that occurs as one assimilates statistical concepts.  

Teachers who try to get students to learn to apply these formalisms, then, are doing a shitty job!

Now what about covariance?

Actually, there’s really nothing to it from an instructional point of view.  It explains itself, as I said.

But that’s exactly the problem: facility with it is not a matter of learning how to do any particular thing.

Rather it is a matter of reliably recognizing when one is dealing with a problem in which the sort of steps necessary to detect covariance have to be done.

The typical reaction of someone when it's pointed out that he or she got the covariance problem wrong is an instant recognition of the mistake, and the sense that the error was a result of an uncharacteristic lapse or even a “trick” on the part of the examiner. 

But in fact, in order to make reliable causal inferences based on observation in their everyday life, people will constantly be required to detect covariance.  If they are unable to see the need for, or just lack the motivation to perform, the necessary operations even when all the essential information has been pre-packaged for them into a 2x2 contingency table, then the likelihood that they will lapse into the defective heuristic alternative when they encounter covariance-detection problems in the wild is very very high (Stanovich 2009).

How likely someone is to get the right answer in the covariance problem is associated with their numeracy. The standard numeracy scale (e.g., Peters et al. 2006) is a measure not so much of math skill as of the capacity to reliably recognize when a quantitative reasoning problem requires one or another type of effortful analysis akin to what's involved in detecting covariance.

Frankly, I’m pessimistic that I can instill that sort of capacity in students.  That's not because I have a modest sense of my abilities as a teacher.  It’s because I have due respect for the difficulty that many indisputably great researchers and teachers have encountered in trying to come up with pedagogical techniques that are as successful in imparting critical reasoning dispositions in students as the “natural frequencies” strategy is for imparting a reliable facility in them to do conditional probability problems.

Of course, in order for students to successfully use the “natural frequencies” strategy and (after they become comfortable with it) the prior odds x likelihood ratio = revised odds rendering of Bayes’s theorem, they must reliably recognize conditional probability problems when they see them. 

But in my experience, at least, that’s not a big deal. When a conditional probability problem makes its appearance, one is about as likely to overlook it as one is to fail to notice that a mother black bear w/ its cub or a snarling honey badger has appeared alongside the trail during a hike in the woods.

Which then leads me to the question, how can it be that only 3% of a sample as well educated and intelligent as the one I tested can do a conditional probability problem as simple as the one I put in this battery?

Doesn't that mean that too many math teachers are failing to use the empirical knowledge that has been developed by great education researchers & teachers?

Or am I (once again; it happens!) missing something?


Arkes, H.R. & Harkness, A.R. Estimates of Contingency Between Two Dichotomous Variables. J. Experimental Psychol. 112, 117-135 (1983).

Kurzenhäuser, S. & Hoffrage, U. Teaching Bayesian reasoning: an evaluation of a classroom tutorial for medical students. Medical Teacher 24, 516-521 (2002).

Peters, E., Västfjäll, D., Slovic, P., Mertz, C.K., Mazzocco, K. & Dickert, S. Numeracy and Decision Making. Psychol Sci 17, 407-413 (2006).

Sedlmeier, P. & Gigerenzer, G. Teaching Bayesian reasoning in less than two hours. Journal of Experimental Psychology: General 130, 380-400 (2001).

Spiegelhalter, D., Pearson, M. & Short, I. Visualizing Uncertainty About the Future. Science 333, 1393-1400 (2011).

Stanovich, K.E. What intelligence tests miss : the psychology of rational thought (Yale University Press, New Haven, 2009).

Wheaton, K.J., Lee, J. & Deshmukh, H. Teaching Bayesian Statistics To Intelligence Analysts: Lessons Learned. J. Strategic Sec. 2, 39-58 (2009).




"What exactly is going on in their heads?" (And in mine?) Explaining "knowing disbelief" of climate change

During my trip to Australia, I presented The Measurement Problem twice in one day, first at Monash University and then at RMIT University (slides here). I should have presented two separate lectures but I’m obsessed—disturbed even—by the results of the MP study so I couldn’t resist the opportunity to collect two sets of reactions.

In fact, I spent the several hours between the lectures discussing the challenges of measuring popular climate-science comprehension with University of Melbourne psychologist Yoshi Kashima, co-author of the very interesting study Guy, S., Kashima, Y., Walker, I. & O'Neill, S. Investigating the effects of knowledge and ideology on climate change beliefs. European Journal of Social Psychology 44, 421-429 (2014).

The challenges, we agreed, are two.

The first is just to do it. 

If you want to figure out what people know about the mechanisms of climate change, asking them whether they “believe in” human-caused global warming definitely doesn’t work.  The answer they give you to that question tells you who they are: it is an indicator of their cultural identity uninformed by and uncorrelated with any meaningful understanding of evidence or facts.

Same for pretty much any question that people recognize as asking them to “take a position” on climate change.

To find out what people actually know, you have to design questions that make it possible for them to reveal what they understand without having to declare whose side they are on in the pointless and demeaning cultural status competition that the “climate change question” has become in the US—and Australia, the UK, and many other liberal democracies.

This is a hard thing to do! 

But once that is accomplished, the second challenge emerges: to make sense of the surprising picture that one can see after disentangling people's comprehension of climate change from their cultural identities.

As I explained in my Monash and RMIT lectures, ordinary members of the public—no matter “whose side” they are on—don’t know very much about the basic mechanisms of climate change.  That’s hardly a surprise given the polluted state of the science communication environment they inhabit.

What’s genuinely difficult to sort out, though, is how diverse citizens can actually be on different sides given how uniform their (mis)understandings are.

Regardless of whether they say they “believe in” climate change, most citizens’ responses to the “Ordinary Climate Science Intelligence” (OCSI) assessment suggest they are disposed to blame human activity for all manner of adverse climate impacts, including ones wholly at odds with the mechanisms of global warming.

This result suggests that what’s being measured when one disentangles knowledge from identity is a general affective orientation, one that in fact reflects a widespread apprehension of danger.

The only individuals whose responses don’t display this generic affective orientation are ones who score highest on a general science comprehension assessment—the “Ordinary science intelligence” scale (OSI_2.0).  These respondents can successfully distinguish the climate impacts that scientists attribute to human activity from ones they don’t.

This discriminating pattern, moreover, characterizes the responses of the most science-comprehending members of the sample regardless of their cultural or political outlooks.

Yet even those individuals still don’t uniformly agree that human activity is causing global warming.

On the contrary, these citizens—the ones, again, who display the highest degree of science comprehension generally & of the mechanisms of climate change in particular—are also the most politically polarized on whether global warming is occurring at all.

Maybe not so surprising: what people “believe” about climate change, after all, doesn’t reflect what they know; it expresses who they are.

But still, what is going on inside their heads?

This is what one curious and perceptive member of the audience asked me at RMIT.  How, he asked, can someone simultaneously display comprehension of human-caused global warming and say he or she doesn't “believe in” it?

In fact, this was exactly what Yoshi and I had been struggling with in the hours before the RMIT talk.

Because I thought the questioner and other members of the audience deserved to get the benefit of Yoshi’s expansive knowledge and reflective mind, too, I asked Yoshi to come to the front and respond, which he kindly—and articulately—did.

Now, however, I’ll try my hand. 

In fact, I don’t have an answer that I’d expect the questioner to be satisfied with. That’s because I still don’t have an answer that satisfies me.

But here is something in the nature of a report on the state of my ongoing effort to develop a set of candidate accounts suitable for further exploration and testing.

Consider these four general cases of simultaneously “knowing” and “disbelieving”:

1. “Fuck you & the horse you rode in on!” (FYATHYRIO).  Imagine someone with an “Obama was born in Kenya!” bumper sticker. He in fact doesn’t believe that assertion but is nonetheless making it to convey his antagonism toward a segment of society. Displaying the sticker is a way to participate in denigration of that group’s status. Indeed, his expectation that others (those whom he is denigrating and others who wish to denigrate them) will recognize that he knows the proposition is false is integral to the attitude he intends to convey.  There is no genuine contradiction, in this case, between any sets of beliefs in the person’s mind.

2. Compartmentalization.  In this case, there is a genuine contradiction, but it is suppressed through effortful dissonance-avoiding routines.  The paradigmatic case would be the closeted gay man (or the “passing” Jew) who belongs to a homophobic (or anti-Semitic) group.  He participates in condemnation and even persecution of gays (or Jews) in contexts in which he understands and presents himself to be a member of the persecuting group, yet in other contexts, out of the viewing of that group’s members, he inhabits the identity, and engages in the behavior, he condemns.  The individual recognizes the contradiction but avoids conscious engagement with it through habits of behavior and mind that rigidly separate his experience of the identities that harbor the contradictory assessments.  He might be successful in maintaining the separation or he might not, and for longer or shorter periods of time, but the effort of sustaining it will take a toll on his psychic wellbeing (Roccas & Brewer 2002).

3. Partitioning. In this case, too, the contradiction is real and a consequence, effectively, of a failure of information access or retrieval.  Think of the expert who possesses specialized knowledge and reasoning proficiencies appropriate to solving a particular type of problem.  Her expertise consists in large part in recognizing or assenting to propositions that evade the comprehension of the nonexpert.  The accessing of such knowledge, however, is associated with certain recurring situational cues; in the absence of those, the cognitive processes necessary to activate the expert’s consciousness and appropriate use of her specialized knowledge will fail. The expert will effectively believe in or assent to some proposition that is contrary to the one that she can accurately be understood to “know.”  The contradiction is thus in the nature of a cognitive bias. The expert will herself, when made aware of the contradiction, regard it as an error (Lewandowsky & Kirsner 2000).

4. Dualism. The contradiction here is once again only apparent—except that it is likely not even to appear to be one to the person holding the views in question. 

Everhart & Hameed (2013) describe the Muslim medical doctor who when asked states that he “rejects Darwinian evolution”: “Man was made by Allah—he did not descend from monkeys!” Nevertheless, the Dr. can readily identify applications of evolutionary science in his own specialty (say, oncology).  He also is familiar with and genuinely excited by medical science innovations, such as stem-cell therapies, that presuppose and build on the insights of evolutionary science.

With prodding, he might see that he is both “rejecting” and “accepting” a single set of propositions about the natural history of human beings.  But the identity of the propositions in this sense does not correspond to any identity of propositions within the inventory of beliefs, assessments, and attitudes that he makes use of in his everyday life.

Within that inventory, the “theory of evolution” he “rejects” and the “theory of evolution” he "accepts" are distinct mental objects (Hameed 2014).  He accesses them as appropriate to enable him to inhabit the respective identities to which they relate (D’Andrade 1981). 

Integral to the “theory of evolution” he “rejects” is a secular cultural meaning that denigrates his religious identity. His “rejection” of that object expresses—in his own consciousness, and in the perception of others—who he is as a Muslim. 

The “theory of evolution” he “accepts” is an element of the expert understandings he uses as a professional. It is also a symbol of the special mastery of his craft, a power that entitles those who practice it to esteem.  “Accepting” that object enables him to be a doctor. 

The “rejected” and “accepted” theories of evolution are understandings he accesses “at home” and “at work,” respectively.

But the context-specificity of his engagement with these understandings is not compartmentalization: there is no antagonism between the two distinct mental objects; no experience of dissonance in holding the sets of beliefs and appraisals that correspond to them; no need effortfully to cordon these sets off from one another. They are "entirely different things!" (he explains with exasperation to the still puzzled interviewer). 

It’s actually unusual for the two mental objects to come within sight of one another. “Home” and “work” are distinct locations, not only physically but socially: negotiating them demands knowledge of, and facility with, sets of facts, appraisals, and the like suited to the activities distinctive of each.

But if the distinct mental objects that are both called "theories of evolution" are summoned to appear at once, as they might be during the interview with the researcher, there is no drama or crisis of any sort. “What in the world is the problem,” the Dr. wonders, as the seemingly obtuse interviewer continues to press him for an explanation.

So what should we make of the highly science comprehending individual who gets a perfect score on the OCSI but who, consistent with his cultural identity, states, “There is no credible evidence that human activity is causing climate change”?

I feel fairly confident that what’s “going on” in his or her head is neither FYATHYRIO nor “compartmentalization.”

I doubt, too, that this is an instance of “partitioning.”

“Dualism” seems like a better fit to me.  I think something like this occurs in Florida and other states, where citizens who are polarized on “climate change” make use of climate science in local decisionmaking.

But I do not feel particularly confident about this account—in part because even after constructing it, I still myself am left wondering, “But what exactly is going on in their heads?”

It’s not unusual—indeed, it is motivating and exhilarating—to discover that one’s understanding of some phenomenon that one is studying involves some imperfection or puzzle.

Nevertheless, in this case, I am also a bit unsettled. The thing to be explained took me by surprise, and I don’t feel that I actually have figured out the significance of it for other things that I do feel I know.

But after my talk at RMIT, I put all of this behind me, and proceeded to my next stop, where I delivered a lecture on “cultural cognition” and “the tragedy of the science communications commons.” 

You see, I am able to compartmentalize . . . .


D'Andrade, R.G. The cultural part of cognition. Cognitive science 5, 179-195 (1981).

Everhart, D. & Hameed, S. Muslims and evolution: a study of Pakistani physicians in the United States. Evo. Edu. Outreach 6, 1-8 (2013).

Hameed, S. Making sense of Islamic creationism in Europe. Unpublished manuscript (2014).

Kahan, D. M. Climate Science Communication and the Measurement Problem, Advances in Pol. Psych. (in press).

Lewandowsky, S., & Kirsner, Kim. Knowledge partitioning: Context-dependent use of expertise. Memory & Cognition 28, 295-305 (2000).

Roccas, S. & Brewer, M.B. Social identity complexity. Pers Soc Psychol Rev 6, 88-106 (2002).


I ♥ Item Response Theory -- and you can too!

As the 14 billion readers of this blog are aware, I’ve been working for the last 37 years—making steady progress all the while—on developing a “public science comprehension measure” suited for use in the study of public risk perception and science communication.

The most recent version of the resulting scale, “Ordinary Science Intelligence 2.0” (OSI_2.0), informs the study reported in Climate Science Communication and the Measurement Problem. That paper also presents the results of a prototype public climate-science comprehension instrument, the “Ordinary Climate Science Intelligence” (OCSI_0.01).

Both scales were developed and scored using Item Response Theory.

Since I’m stuck on an 18-hour flight to Australia & don’t have much else to do (shouldn’t we touch down in Macao or the Netherlands Antilles or some other place with a casino to refuel?!), I thought I’d post something (something pretty basic, but the internet is your oyster if you want more) on IRT and how cool it is.

Like other scaling strategies, IRT conceives of responses to questionnaire items as manifest or observable indicators of an otherwise latent or unobserved disposition or capacity.  When the items are appropriately combined, the resulting scale will be responsive to the items’ covariance, which reflects their shared correlation with the latent disposition. At the same time, the scale will be relatively unaffected by the portions of variance in each item that are random in relation to the latent disposition and that should more or less cancel each other out when the items are aggregated.

By concentrating the common signal associated with the items and muting the noise peculiar to each, the scale furnishes a more sensitive measure than any one item (DeVellis 2012).
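Here is a minimal simulation sketch of that signal-concentrating, noise-muting idea. Everything in it is an assumption made for illustration (continuous item scores, one shared latent disposition, equal noise on every item, the sample size, the seed); it is not drawn from the OSI_2.0 data.

```python
import random
import statistics


def pearson(x, y):
    """Plain Pearson correlation between two equal-length lists."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def simulate(n_people=2000, n_items=8, noise_sd=1.5, seed=0):
    """Return (corr of one item with latent, corr of summed scale with latent)."""
    rng = random.Random(seed)
    latent = [rng.gauss(0, 1) for _ in range(n_people)]
    # Each hypothetical item = shared latent signal + its own independent noise.
    items = [[t + rng.gauss(0, noise_sd) for t in latent] for _ in range(n_items)]
    # The scale is just the sum of each person's item scores.
    scale = [sum(person) for person in zip(*items)]
    return pearson(latent, items[0]), pearson(latent, scale)


r_item, r_scale = simulate()
print(f"single item vs latent: {r_item:.2f}")
print(f"summed scale vs latent: {r_scale:.2f}")
```

The summed scale tracks the latent disposition more closely than any one item because the item-specific noise partly averages away while the shared signal accumulates.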

While various scaling methods tend to differ in the assumptions they make about the relative strength or weight of individual items, nearly all treat items as making fungible contributions to measurement of the latent variable conceived of as some undifferentiated quantity that varies across persons.

IRT, in contrast, envisions the latent disposition as a graded continuum along which individuals can be arrayed. It models the individual items as varying in measurement precision across the range of that continuum, and weights the items appropriately in aggregating responses to them to form a scale (Embretson & Reise 2000). 

The difference in these strategies will matter most when the point of making measurements is not simply to characterize the manner in which the latent disposition (“cultural individualism,” say) varies relative to one or another individual characteristic within a sample (“global warming risk concern”) but to rank particular sample members (“law school applicants”) in relation to the disposition (“critical reasoning ability”). 

In the former case, I’ll do fine with measures that enable me to sum up the “amount” of the disposition across groups and relate them to levels of some covariate of interest.  But in the latter case I’ll also value measures that enable me to discriminate between varying levels of the disposition at all the various points where accurate sorting of the respondents or test takers matters to me.

IRT is thus far and away the dominant scaling strategy in the design and grading of standardized knowledge assessments, which are all about ranking individuals in relation to some aptitude or skill of interest.

Not surprisingly, then, if one is trying to figure out how to create a valid public science comprehension instrument, one can learn a ton from looking at the work of researchers who use IRT to construct standardized assessments. 

Indeed, it’s weird to me, as I said in a previous post, that the development of public science comprehension instruments like the NSF Indicators (2014: ch. 7), and research on public understanding of science generally, has made so little use of this body of knowledge.

I used IRT to help construct OSI_2.0.

Below are the “item response curves” of four OSI_2.0 items, calibrated to the ability level of a general population sample.  The curves (derived via a form of logistic regression) plot the probability of getting the “correct” answer to the specified items in relation to the latent “ordinary science intelligence” disposition. (If you want item wording, check out the working paper.)

One can see the relative “difficulty” of these items by observing the location of their respective “response curves” along the x-axis: the further to the right a curve sits, the “harder” the item is.

Accordingly, “Prob1_nsf,” one of the NSF Indicators “science methods” questions, is by far the easiest: a test taker has to be about one standard deviation below the mean on OSI before he or she is more likely than not to get this one wrong.

“Cond_prob,” a Bayesian conditional probability item from the Lipkus/Peters Numeracy battery, is hardest: one has to have a total score two standard deviations above the mean before one has a better than 50% chance of getting this one right (why are conditional probability problems so hard? SENCER should figure out how to teach teachers to teach Bayes’s Theorem more effectively!).

“Copernicus_nsf,” the “earth around the sun or sun around the earth?” item, and “Widgets_CRT,” a Cognitive Reflection Test item, are in between.
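For readers who want to see what an item response curve is mechanically, here is a minimal sketch using the two-parameter logistic (2PL) form. The item names and parameter values are hypothetical, picked only so that the easy item crosses 50% at -1 SD and the hard one at +2 SD as in the description above; they are not the fitted OSI_2.0 parameters, and the model actually used for the figure may differ.

```python
import math


def p_correct(theta, difficulty, discrimination=1.0):
    """2PL item response curve: probability of a correct answer at ability theta.

    difficulty is the ability level at which the probability is exactly 50%;
    discrimination controls how steep the curve is around that point.
    """
    return 1.0 / (1.0 + math.exp(-discrimination * (theta - difficulty)))


# Hypothetical difficulties (in SD units of the latent disposition).
ITEMS = {"easy_item": -1.0, "middling_item": 0.5, "hard_item": 2.0}

for name, b in ITEMS.items():
    curve = [round(p_correct(theta, b), 2) for theta in (-2, -1, 0, 1, 2)]
    print(f"{name:14s} P(correct) at theta=-2..2: {curve}")
```

Shifting the difficulty parameter slides the whole curve left or right without changing its shape, which is why "harder" items sit further to the right.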

It's because IRT scoring weights items in relation to their difficulty (and, if one desires, in relation to their “discrimination,” which refers to the steepness of the item-response curve slope: the steeper the curve, the more diagnostic a correct response is of the disposition level of the respondent) that one can use it to gauge a scale's variable measurement precision across the range of the relevant latent disposition.

All 4 of these OSI_2.0 items are reliable indicators of the latent disposition in question (if they weren’t, the curves would be flatter).  But because they vary in difficulty, they generate more information about the relative level of OSI among heterogeneous test takers than would a scale that consisted, say, of four items of middling difficulty, not to mention four that were all uniformly easy or hard.

Indeed, consider:

The figures illustrate the variable measurement precision of two instruments: the NSF Indicators battery, formed by combining its nine “factual knowledge” and three “science methods” items; and a long (10-item) version of Frederick’s Cognitive Reflection Test (Frederick 2005). 

The “Test Information Curves” plotted in the left panel illustrate the relative measurement precision of each in relation to the latent dispositions each is measuring. Note, the disposition isn’t the same one for both scales; by plotting the curves on one graph, I am enabling comparative assessment of the measurement precision of the two instruments in relation to the distinct latent traits that they respectively assess.

“Information” units are the inverse of the scale's measurement variance, a concept that I think isn’t particularly informative (as it were) for those who haven’t used IRT extensively enough to experience the kind of cognitive rewiring that occurs as one becomes proficient with a statistical tool. 

So the right-hand panel conveys the same information for each assessment in the form of a variable “reliability coefficient.”  It’s not the norm for IRT write-ups, but I think it’s easier for reflective people generally to grasp.

The reliability coefficient is conceptualized as the proportion of the variance in the observed score that can be attributed to variance in the "true score" or actual disposition levels of the examined subjects.  A test that was perfectly reliable—that had no measurement error in relation to the latent disposition—would have a coefficient of 1.0. 

Usually 0.7 is considered decent enough, although for “high stakes” testing like the SAT, 0.8 would probably be the lowest anyone would tolerate.

Ordinarily, when one is assessing the performance of a latent-variable scale, one would have a reliability coefficient—like Cronbach’s α, something I’ve mentioned now and again—that characterizes the measurement precision of the instrument as a whole.

But with IRT, the reliability coefficient is a continuous variable: one can compute it—and hence gauge the measurement precision of the instrument—at any specified point along the range of the latent disposition the instrument is measuring.
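One common way to turn information into a reliability coefficient at a given point on the continuum can be sketched as follows. It assumes the latent disposition is standardized (variance of 1) and treats 1/information as the error variance there; whether the figures in this post were produced with exactly this conversion is an assumption on my part.

```python
def conditional_reliability(information, trait_variance=1.0):
    """Reliability at one point on the latent continuum.

    Error variance is taken to be 1 / information, and reliability is the
    share of observed variance (true + error) that is true-score variance.
    """
    if information <= 0:
        raise ValueError("information must be positive")
    error_variance = 1.0 / information
    return trait_variance / (trait_variance + error_variance)


def information_needed(reliability, trait_variance=1.0):
    """Invert the formula: information required to reach a target reliability."""
    if not 0 < reliability < 1:
        raise ValueError("reliability must be strictly between 0 and 1")
    return reliability / ((1.0 - reliability) * trait_variance)


for target in (0.7, 0.8):
    print(f"reliability {target}: information of about {information_needed(target):.2f}")
```

Under these assumptions a "decent" 0.7 corresponds to information of roughly 2.3, and 0.8 to about 4, so the information curve and the reliability curve are two views of the same thing.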

What one can see from the Figure, then, is that these two scales, while comparable in “reliability,” actually radically differ with respect to the levels of the latent disposition in relation to which they are meaningfully assessing individual differences. 

The NSF Indicators battery is concentrating most of its discrimination within the space between -1.0 and -2.0 SDs.  So it will do a really really good job in distinguishing people who are merely awful from those who are outrageously awful.

You can be pretty confident that someone who scores above the mean on the test is at least average.  But the measurement beyond that is so pervaded with error as to make it completely arbitrary to treat differences in scores as representing genuinely different levels in ability.

The test is just too darn easy! 

This is one of the complaints that people who study public science comprehension have about the Indicators battery (but one they don’t voice nearly as often as they ought to).

The CRT has the opposite problem! 

If you want to separate out Albert Einstein from Johnny von Neumann, you probably can with this instrument! (Actually, you will be able to do that only if “cognitive reflection” is the construct that corresponds to what makes them geniuses; that’s doubtful.) The long CRT furnishes a high degree of measurement reliability way out into the Mr. Spock Zone of +3 SDs, where only about 0.13% (as in “thirteen hundredths of one percent”) of the human population (as in roughly 1 person in 740) hangs out.

In truth, I can’t believe that there really is any value in distinguishing between levels of reflection beyond +2.0 (about the 98th percentile) if one is studying the characteristics of critical reasoning in the general population. Indeed, I think you can do just fine in investigating critical reasoning generally, as opposed to grading exams or assessing admissions applications etc., with an instrument that maintains its reliability out to 1.8 (96th percentile).
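The SD-to-percentile conversions used in this discussion can be checked with a few lines, assuming the latent disposition is scored as a standard normal variable (mean 0, SD 1). That is the usual convention for IRT scores, but it is an assumption here.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, SD 1


def percentile(z):
    """Percent of the population scoring below z standard deviations."""
    return 100.0 * STD_NORMAL.cdf(z)


def share_above(z):
    """Fraction of the population scoring above z standard deviations."""
    return 1.0 - STD_NORMAL.cdf(z)


for z in (1.8, 2.0, 3.0):
    above = share_above(z)
    print(f"+{z} SD: percentile {percentile(z):.1f}, "
          f"about 1 person in {round(1 / above)} scores higher")
```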

There’d be plenty of value for general research purposes, however, in being able to distinguish people whose cognitive reflection level is a respectable average from those whose level qualifies them as legally brain dead.

But you can’t with this instrument: there’s zero discrimination below the population mean.

Too friggin’ hard!

The 10-item battery was supposed to remedy this feature of the standard 3-item version but really doesn't do the trick—because the seven new items were all comparably difficult to the original three.

Now, take a look at this:

These are the test information and IRT reliability coefficients for OSI_2.0 as well as for each of the different sets of items it comprises.

The scale has its highest level of precision at about +1 SD, but has relatively decent reliability continuously from -2.0 to +2.0.  It accomplishes that precisely because it combines sets of items that vary in difficulty.  This is all very deliberate: using IRT in scale development made it possible to select an array of items from different measures to attain decent reliability across the range of the latent "ordinary science intelligence" disposition.

Is it “okay” to combine the measures this way?  Yes, but only if it is defensible to understand them as measuring the same thing—a single, common latent disposition.

That’s a psychometric quality of a latent variable measurement instrument that IRT presupposes (or in any case, can’t itself definitively establish), so one uses different tools to assess that.

Factor analysis, the uses and abuses of which I’ve also discussed a bit before, is one method of investigating whether a set of indicators measure a single latent variable.

I’ve gone on too long—we are almost ready to land!—to say more about how it works (and how it doesn’t work if one has a “which button do I push” conception of statistics).  But just to round things out, here is the output from a common-factor analysis (CFA) of OSI_2.0. 

It suggests that a single factor or unobserved variable accounts for 87% of the variance in responses to the items, as compared to a residual second factor that explains another 7%. That’s pretty strong evidence that treating OSI_2.0 as a “unidimensional” scale—or a measure of a single latent disposition—is warranted.
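To give a feel for what "share of variance accounted for by the first factor" means, here is a rough sketch that uses a principal-components shortcut: the largest eigenvalue of the item correlation matrix divided by the number of items. This is not the common-factor analysis reported above (which models only shared variance and will give different numbers), and the correlation matrix is hypothetical.

```python
def leading_eigenvalue(matrix, iterations=200):
    """Largest eigenvalue of a symmetric matrix, found by power iteration."""
    n = len(matrix)
    vec = [1.0 / n ** 0.5] * n
    for _ in range(iterations):
        product = [sum(row[j] * vec[j] for j in range(n)) for row in matrix]
        norm = sum(x * x for x in product) ** 0.5
        vec = [x / norm for x in product]
    # Rayleigh quotient of the converged unit vector gives the eigenvalue.
    product = [sum(row[j] * vec[j] for j in range(n)) for row in matrix]
    return sum(v * p for v, p in zip(vec, product))


def first_component_share(corr):
    """Share of total standardized variance captured by the first component."""
    return leading_eigenvalue(corr) / len(corr)


# Hypothetical correlations among four items (not the OSI_2.0 data).
CORR = [
    [1.0, 0.6, 0.5, 0.4],
    [0.6, 1.0, 0.5, 0.4],
    [0.5, 0.5, 1.0, 0.3],
    [0.4, 0.4, 0.3, 1.0],
]

print(f"first component share: {first_component_share(CORR):.2f}")
```

The stronger and more uniform the correlations among items, the larger this share becomes, which is the intuition behind reading a dominant first factor as evidence of unidimensionality.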

At this point, the only question is whether what it is measuring is really “ordinary science intelligence,” or the combination of knowledge, motivations, and reasoning dispositions that I’m positing enable an ordinary citizen to recognize and give proper effect to valid scientific evidence in ordinary decisionmaking contexts.

That’s a question about the “external validity” of the scale.

I say something about that, too, in “ ‘Ordinary Science Intelligence’: A Science Comprehension Measure for Use in the Study of Risk Perception and Science Communication,” CCP Working Paper No. 112.

I won’t say more now (they just told us to turn off electronic devices. . .) except to note that to me one of the most interesting questions is whether OSI_2.0 is a measure of ordinary science intelligence or simply a measure of intelligence.

A reflective commentator put this question to me.  As I told him/her, that’s a challenging issue, not only for OSI_2.0 but for all sorts of measures that purport to be assessing one or another critical reasoning proficiency . . . .

Holy smokes--is that George Freeman?!


DeVellis, R.F. Scale development : theory and applications (SAGE, Thousand Oaks, Calif., 2012).

Embretson, S.E. & Reise, S.P. Item response theory for psychologists (L. Erlbaum Associates, Mahwah, N.J., 2000).

Frederick, S. Cognitive Reflection and Decision Making. Journal of Economic Perspectives 19, 25-42 (2005).

National Science Foundation. Science and Engineering Indicators (Wash. D.C. 2014).



Don't even think of going there: the "whose is bigger" question on climate science literacy

A curious correspondent posed these questions to me relating to scores on the "ordinary climate science intelligence" assessment:

My question is about the last figure in your posting here on your OCSI instrument and results.

The last figure is a histogram of the No. correct (on your OCSI instrument?) by personal beliefs about warming causes (human, natural, no warming).

I have several questions:

1. INTERPRETATION of final figure. Am I interpreting your result correctly by concluding that it shows that you found that those believing in no warming had more correct than those who believed in natural causes of warming, who, in turn, scored higher than those who believed in human caused warming?

I am just asking about the absolute differences, not their statistical significance.

2. SAMPLE. How big was it and who were they? (undergrads, Mechanical Turk, something else, national representative...).

3. STATS. Were the differences in that final figure significant? And, regardless of significance, can you send along the effect sizes?

My responses:

You can get more information on the OCSI scale here: "Climate Science Communication and the Measurement Problem," Advances in Pol. Psych. (forthcoming).  But on your queries:

1. Interpretation. The last figure is a bar chart w/ number of correct for rspts who answered standard "belief in" climate change items that asked "[f]rom what you’ve read and heard, is there solid evidence that the average temperature on earth has been getting warmer over the past few decades" [yes/no]; and (if yes), "Do you believe that the earth is getting warmer (a) mostly because of human activity such as burning fossil fuels or (b) mostly because of natural patterns in the earth’s environment?"

You are eyeballing the differences in mean scores for the 3 groups-- "no warming," "naturally caused warming" and "human warming."  

But my interpretation would be that everyone did about the same.  Among all respondents -- regardless of the answer they gave to "believe in" global warming items -- there was a strong tendency to attribute to climate scientists pretty much any conclusion that *sounded* consistent with global warming being a serious environmental risk.  Only respondents who were high in science comprehension generally avoided that mistake -- that is, identified accurately which "high risk" conclusions climate scientists have endorsed & which ones not.  Those rspts successfully did that regardless of how they answered the "believe in" question.  

That's why I think the responses members of the public give to surveys that ask whether they "believe in" human-caused global warming are eliciting an expression of an outlook or attitude that is wholly unrelated to what anyone knows or doesn't know about climate science or science generally.  Social scientists (myself included) and pollsters haven’t really understood in the past what items like this are actually measuring:  not what you know, but who you are.

2. Sample. US general population sample.  Stratified for national representativeness.  Recruited for on-line study by the firm YouGov, which uses sampling strategies shown to generate election result estimates at least as reliable as those generated by the major polling firms that still use random-digit dial (I'm basing this on Nate Silver's rankings).  In my view, only YG & GfK use on-line sampling techniques that are valid for studying the effect of individual differences -- cognitive & political -- on risk perceptions.  Mturk is definitely not valid for this form of research.

3. Stats. The diff between "no warming" & "human-caused warming" rspts was significant statistically -- but not practically. N = 2000 so even small differences will be statistically significant.  The difference in the mean scores of those 2 groups of rspts was a whopping 1/3 of 1 SD.  Whether rspts were in "no warming," "human caused warming" or "natural warming" classes explained about 1% of the variance in the OCSI scores:

I reported "number of correct" in the figure b/c I figured that would be easier for readers to grasp but I scored results of the climate science literacy test with an IRT model and standardized the scores (so mean = 0, of course).  In regression output, belief in "human warming" is the reference group--so their score is actually the constant. 

The constant & the regression coefficients are thus the fractions of a standard deviation below or above average the different groups' performances were!

You can easily compute the means: human warmers = -0.12; natural warmers is 0.07; and no warmers 0.14.
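The arithmetic is just "constant plus coefficient." In the sketch below the constant is the reference-group mean quoted above, while the two coefficients are back-derived from the three means given in this post (0.07 and 0.14 minus -0.12) rather than copied from the regression output, so treat them as illustrative.

```python
# Mean of the reference group ("human warming"), in SD units, as quoted above.
CONSTANT = -0.12

# Dummy-variable coefficients: back-derived from the quoted means, not taken
# from the actual regression table.
COEFFICIENTS = {"natural warming": 0.19, "no warming": 0.26}


def group_means(constant, coefficients, reference="human warming"):
    """With a dummy-coded predictor, each group mean = constant + its coefficient."""
    means = {reference: constant}
    for group, coef in coefficients.items():
        means[group] = constant + coef
    return means


MEANS = group_means(CONSTANT, COEFFICIENTS)
for group, mean in MEANS.items():
    print(f"{group:16s} {mean:+.2f} SD")

gap = MEANS["no warming"] - MEANS["human warming"]
print(f"gap between 'no warming' and 'human warming': {gap:.2f} SD")
```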

It would be just as embarrassing --just as childish -- for "skeptics" to seize on these results as evidence that skeptics "know more" climate science as it would be for "believers" to keep insisting that a knowledge disparity explains the conflict over climate change in US society.

So don't go there, pls...

But if you have thoughts, reactions, comments, suggestions, disagreements, etc. -- particularly based on analyses as they appear in draft paper -- please do share them w/ me.


Cross-cultural cultural cognition road trip: Australia

I was soooo psyched that Guy, S., Kashima, Y., Walker, I. & O'Neill, S. Investigating the effects of knowledge and ideology on climate change beliefs. European Journal of Social Psychology 44, 421-429 (2014) were able to make good use of the cultural cognition worldview scales in their study of Australians' perceptions of beliefs about climate change that I've decided to go & investigate matters there for myself.

I'll be giving these open lectures in Melbourne next week:

Thursday, Aug. 14

12-2:00 pm: "What is 'cultural cognition'? I'll show you!"
Monash Univ. Building 8, Theatre R7

6-7:00 pm: "Climate science communication & the Measurement Problem"
RMIT Univ., Kaleide Theatre

I'm very excited about the trip.  In addition to exchanging ideas with scholars and curious people generally, I look forward to meeting some of the local luminaries including:


Carl Williams: Local business community leader


Johnny Ibrahim: Nightclub owner

Ron Clarke: C'mon! Everyone knows him!!!!



What would a *valid* measure of climate-science literacy reveal? Guy et al., Effects of knowledge & ideology part 2

This is part 2 of my “journal club report”  on the very interesting paper Guy, S., Kashima, Y., Walker, I. & O'Neill, S. Investigating the effects of knowledge and ideology on climate change beliefs. European Journal of Social Psychology 44, 421-429 (2014).

GKW&O correlate a sample of 300 Australians’ “climate literacy” scores with their cultural worldviews & their “belief in” human-caused climate change and related perceptions.

Last time I explained why I didn’t understand how GKW&O could construe their data as suggesting that “knowledge can play a useful role in reducing the impact of ideologies on climate change opinion.”

In some sense, this statement is a tautology: insofar as “knowledge” is defined as accepting evidence that “human beings are causing climate change,” then, of course, increasing the “knowledge” of individuals who are ideologically predisposed to be skeptical will “reduce” their skepticism (that’s what GKW&O are getting at) and thus mute ideological polarization.

That claim is empty: it's like saying "getting skeptics to believe evidence of climate change would help to counteract skepticism."

The question is how to “increase knowledge” of those who are culturally predisposed to dismiss valid evidence of climate change. 

GKW&O imply that all one has to do is communicate the “facts” about climate change to them. 

But nothing in their data suggests that would be a particularly useful strategy. 

That’s what climate advocates have been focusing on for over a decade.  And notwithstanding that, people remain culturally polarized on what the facts are.

The best explanation for that—one supported by ample observational and experimental data—is that individuals selectively credit or discredit information on climate change based on its consistency with their cultural predispositions.

If this is what's going on, then one would expect to see a correlation between ideology (or cultural worldviews) & "knowledge" of the evidence of human-caused climate change.

That’s exactly what GKW&O’s own data in fact show.

Maybe I’m missing something and either they or others will point out what it is!

Okay-- that was last time!

But now I’d like to address GKW&O's “climate literacy” scale.

I’m really interested in this aspect of their cool paper b/c how to measure people’s comprehension of climate change science is a problem I myself have been trying to solve recently.

Validly measuring what people actually understand about climate change is in fact a very difficult thing to do! 

There are two related reasons for this.  One is that, in general, people’s perceptions of societal risks reflect general affective orientations, pro- or con-, toward the putative risk source.  Any more specific perception one assesses (how large the risk is, whether there are any offsetting benefits, etc.) will be an expression of that (Loewenstein et al. 2001).

Accordingly, if one tries to measure what people “know” about the putative risk source in question, what one's really likely to be measuring is just their pro- or con- affect toward it.  There's little reason to think their emotional response to the risk source reflects genuine comprehension of the evidence.  On the contrary, people’s understanding of what the “evidence” is on an environmental and health risk (nuclear power generation, smoking, contaminated ground water, etc.) is more likely to be a consequence of than a cause of their affective orientation toward it (Slovic et al. 2004).

The second problem—one that clearly comes into play with climate change—is that individuals’ affective orientation toward the putative risk source is itself likely to be a measure or expression of their cultural worldview, which invests the asserted risk with cultural meanings.

Affect—a spontaneous perception or feeling—is the cognitive mechanism through which worldviews shape risk perceptions (Peters, Burraston, & Mertz 2004; Kahan 2009).

Accordingly, when one asks people whether they “agree” or “disagree” with propositions relating to a putative risk source, the responses will tend to reflect their worldviews. Such items won’t be measuring what people know; they will be measuring, in effect, who they are, culturally speaking.

This is exactly what scholarly researchers who’ve investigated public “climate literacy” have repeatedly found (Tobler, Visschers, & Siegrist 2012; Reynolds et al. 2010; Bostrom et al. 1994).  Their studies have found that the individuals who tend to get the right answer to questions about the contribution of human activities to climate change (e.g., that burning fossil fuels increases global temperatures) are also highly likely to give the wrong answers to questions about the contribution of other environmentally damaging behaviors that are in fact unrelated to climate change (e.g., industrial sulfur emissions).

The people who tend to get the latter questions right, moreover, are less likely to correctly identify which human activities do in fact contribute to global warming.

The conclusion of these studies is that what people “believe” about climate change doesn’t reflect what they “know” but rather reflects a more general affective orientation, pro- or con-, toward environmental risk, the sort of stance that is itself known to be associated with competing worldviews.

In my Measurement Problem paper, I present the results of a “climate science comprehension” test that includes various features designed to unconfound or disentangle affective indicators of people’s identities from their actual knowledge of climate science. The items were more fine-grained than “are humans causing climate change,” and thus less proximate to the antagonistic meanings that evoke identity-expressive responses to questions about this topic.

In addition, the “true-false” propositions comprising the battery were introduced with the phrase “Climate scientists believe . . . .” This device, which has been used to mitigate the cultural bias of test items on evolution when administered to highly religious test takers, distances the respondent from the response, so that someone who is culturally predisposed to skepticism can reveal his or her awareness of the prevailing expert opinion without being put in the position of making an “affirmation” of personal belief that denigrates his or her identity.

This strategy seemed to work pretty well.  I found that there wasn’t the sort of bimodal distribution that one gets when responses to test items reflect the opposing affective orientations of test-takers.

Even more important, scores on the instrument increased in step with respondents’ scores on a general science comprehension test regardless of their political ideology.

This is important, first, because it helps to validate the instrument—one would expect those who are better able to acquire scientific information generally would acquire more of it about climate change in particular.

Second and even more important, this result confirmed that the test was genuinely measuring what people know and not who they are.  Because items on “belief in” climate change do measure cultural identity rather than knowledge, responses to them tend to become more polarized as people become more proficient in one or another of the reasoning dispositions associated with science comprehension.  In the Measurement Problem “climate science literacy” battery, high science-comprehending test-takers scored highest precisely because they consistently gave correct answers to items that they would have gotten wrong if they were responding to them in a manner that expressed their cultural identities.

Constructing a test that disentangled "knowledge" from "identity," of course, confirmed that in fact what people "believe" about human-caused climate change has zero to do with what they know.

But my scale is admittedly a proto-assessment instrument, a work-in-progress.

I was excited, then, to see the GKW&O results to compare them with my own.

GKW&O treat their “climate literacy” battery as if it were a valid measure of knowledge (they call it a “specific [climate change] knowledge” measure, in fact).

Did they succeed, though, in overcoming the problem researchers have had with the entanglement between affect and identity, on the one hand, and knowledge, on the other?

Frankly, I can’t tell.  They don’t report enough summary data about the responses to the items in their battery, including their individual correlations with “belief in” climate change and with cultural worldviews.

But there is good reason to think they didn’t succeed.

GKW&O asked respondents to indicate which of nine human activities are & which are not “causes” of climate change: 

  • nuclear power generation (false cause)
  • depletion of ozone in the upper atmosphere (false cause)
  • pollution/emissions from business and industry (true cause)
  • destruction of forests (true cause)
  • people driving their cars (true cause)
  • people heating and cooling their homes (true cause)
  • use of chemicals to destroy insect pests (false cause)
  • use of aerosol spray cans (false cause)
  • use of coal and oil by utilities or electric companies (true cause)

They reported that the “true” cause items and the “false” cause ones (labeled as such above) did not form a reliable, unitary scale:

Internal reliability was somewhat less than satisfactory (α = .60). To investigate this issue, items were divided to form two subscales according to whether they represented ‘causes’ or ‘non causes’ and then reanalyzed. This considerably improved the reliability of the scales (α = .89 for ‘knowledge of causes’ scale and α = .75 for the ‘knowledge of non causes’ scale). However, the distributions of the separated scales were highly skewed. Thus, it was decided to proceed with the 9-item knowledge scale, which had a more acceptable distribution.

In other words, the item covariances were more consistent with the inference that they were measuring two separate dispositions: one to correctly identify “true causes” and the other to correctly identify “false causes.”  

The items didn’t form a reliable measure of a single latent trait—one reflecting a disposition to give consistently correct responses on the “causes” of climate change—because respondents who did well on the “true cause” scale were not the ones who did well on the “false cause” ones & vice versa.
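A small simulation sketch shows how this pattern can arise. It assumes continuous item scores, five "true cause" items driven by one latent disposition, and four "false cause" items driven by a second, independent one. All of these choices (including the noise level and seed) are hypothetical, and the setup is a simplification of, not a reanalysis of, GKW&O's data.

```python
import random
import statistics


def cronbach_alpha(items):
    """Cronbach's alpha; `items` is a list of per-item score lists."""
    k = len(items)
    item_variance_sum = sum(statistics.pvariance(scores) for scores in items)
    totals = [sum(person) for person in zip(*items)]
    return k / (k - 1) * (1 - item_variance_sum / statistics.pvariance(totals))


def simulate_items(latent, n_items, noise_sd, rng):
    """Each simulated item = the shared latent score + independent noise."""
    return [[t + rng.gauss(0, noise_sd) for t in latent] for _ in range(n_items)]


rng = random.Random(1)
n_people = 2000
# Two independent latent dispositions (a simplifying assumption).
disposition_a = [rng.gauss(0, 1) for _ in range(n_people)]
disposition_b = [rng.gauss(0, 1) for _ in range(n_people)]

true_cause = simulate_items(disposition_a, 5, 0.7, rng)
false_cause = simulate_items(disposition_b, 4, 0.7, rng)

alpha_true = cronbach_alpha(true_cause)
alpha_false = cronbach_alpha(false_cause)
alpha_all = cronbach_alpha(true_cause + false_cause)

print(f"true-cause subscale alpha:  {alpha_true:.2f}")
print(f"false-cause subscale alpha: {alpha_false:.2f}")
print(f"combined 9-item alpha:      {alpha_all:.2f}")
```

Each subscale hangs together well on its own, but lumping them together drags alpha down because items from different subscales barely covary. If the two dispositions were negatively related, as they would be for oppositely valenced affect, the combined alpha would fall further still.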

Who were these two groups of respondents?  It’s not possible to say because, again, GKW&O didn’t report enough summary data for a reader to figure this out.

But the pattern is certainly consistent with what one would expect to see if individuals culturally predisposed to climate belief did better on the “true cause” items and those culturally predisposed to climate skepticism better on the “false cause” ones.

In that case, one would conclude that the GKW&O “climate literacy” battery isn’t a valid measure of knowledge at all; it would be just a conglomeration of two oppositely valenced affective measures.

GKW&O report that the “score” on their conglomerate battery did correlate negatively with both cultural “hierarchy” and cultural “individualism.”

This could have happened, consistent with my surmise, because the conglomerate scale had more “true cause” than “false cause” items, and thus more climate-concerned than climate-skeptical affect items.  The effect this imbalance would have created in the correlation between “number correct” and the cultural worldview scales would have been magnified if on, say, the “nuclear power” question, subjects of both types were more evenly divided (a result I’ve sometimes observed in my own work).

But I am admittedly conjecturing here in trying to discern exactly why GKW&O’s “specific knowledge” battery failed to display the characteristics one would demand of a valid measure of climate-science knowledge.  The paper didn’t report enough results to be sure.

I hope GKW&O will say more about this issue—maybe even in a guest blog here!—since these are really interesting issues and knowing more about their cool data would definitely help me and others who are struggling to try to overcome the obstacles I identified to constructing a valid climate-science comprehension measure.

I’m still working on this problem, btw!

So in closing, I’ll show you the results of some additional candidate “climate science literacy” items that I recently tested on a diverse sample of Southeast Floridians.

I used the same “identity-knowledge disentanglement” strategy with these as I did with items in the Measurement Problem battery.  I think it worked in that respect.

And I think the results support the following inferences:

1. Neither Rs nor Ds know very much about climate change.

2. Both have “gotten the memo” that climate scientists believe that humans are causing climate change and that we face serious risks as a result.

3. It’s crazy to think that ideological variance in “belief in” human-caused climate change has anything to do with a knowledge disparity between Rs and Ds.

What do you think?


 Bostrom, A., Morgan, M.G., Fischhoff, B. & Read, D. What Do People Know About Global Climate Change? 1. Mental Models. Risk Analysis 14, 959-970 (1994). 

Kahan, D.M. Nanotechnology and society: The evolution of risk perceptions. Nature Nanotechnology 4, 705-706 (2009).

Loewenstein, G.F., Weber, E.U., Hsee, C.K. & Welch, N. Risk as Feelings. Psychological Bulletin 127, 267-287 (2001).

Peters, E.M., Burraston, B. & Mertz, C.K. An Emotion-Based Model of Risk Perception and Stigma Susceptibility: Cognitive Appraisals of Emotion, Affective Reactivity, Worldviews, and Risk Perceptions in the Generation of Technological Stigma. Risk Analysis 24, 1349-1367 (2004).

Reynolds, T.W., Bostrom, A., Read, D. & Morgan, M.G. Now What Do People Know About Global Climate Change? Survey Studies of Educated Laypeople. Risk Analysis 30, 1520-1538 (2010).

Slovic, P., Finucane, M.L., Peters, E. & MacGregor, D.G. Risk as Analysis and Risk as Feelings: Some Thoughts About Affect, Reason, Risk, and Rationality. Risk Analysis 24, 311-322 (2004).

Tobler, C., Visschers, V.H.M. & Siegrist, M. Addressing climate change: Determinants of consumers' willingness to act and to support policy measures. Journal of Environmental Psychology 32, 197-207 (2012).



Does "climate science literacy trump ideology" in Australia? Not as far as I can tell! Guy et al., Effects of knowledge & ideology part 1

It was so darn much fun to report my impressions on Stocklmayer, S. M., & Bryant, C. Science and the Public—What should people know?, International Journal of Science Education, Part B, 2(1), 81-101 (2012), that I thought I’d tell you all about another cool article I read recently:

Guy, S., Kashima, Y., Walker, I. & O'Neill, S. Investigating the effects of knowledge and ideology on climate change beliefs. European Journal of Social Psychology 44, 421-429 (2014).


GKW&O report the results of an observational study (a survey, essentially!) on the respective contributions that cultural cognition worldviews and “climate science literacy” make to belief in human-caused global warming and to understanding of the risks it poses.

Performing various univariate and multivariate analyses, they conclude that both cultural worldviews and climate science literacy (let’s call it) have an effect.

Might not sound particularly surprising.

But it is critical to understand that the GKW&O study is a contribution to an ongoing scholarly conversation.

It is a response, in fact, to Cultural Cognition Project (CCP) researchers and others who’ve conducted studies showing that greater “science literacy,” and higher proficiency in related forms of scientific reasoning (such as numeracy and critical reflection), magnify cultural polarization on climate change risks and related facts.

The results of these other studies are thought to offer support for the “cultural cognition thesis” (CCT), which states, in effect, that “culture is prior to fact.”

Individuals’ defining group commitments, according to CCT, orient the faculties they use to make sense of evidence about the dangers they face and how to abate them.

As a result, individuals can be expected to comprehend and give appropriate effect to scientific evidence only when engaging that information is compatible with their cultural identities.  If the information is entangled in social meanings that threaten the status of their group or their standing within it, they will use their reasoning powers to resist crediting that information.

Of course, “information” can make a difference!  But for that to happen, the entanglement of positions in antagonistic cultural meanings must first be dissolved, so that individuals will be relieved of the psychic incentives to construe information in an identity-protective way.

GKW&O meant to take issue with CCT.

The more general forms of science comprehension that figured in the CCP and other studies, GKW&O maintain, are only “proxy measures” for climate science comprehension.  Because GKW&O measure the latter directly, they believe their findings supply stronger, more reliable insights into the relative impact of “knowledge” and “ideology” (or culture) on climate change beliefs.

Based on their results, GKW&O maintain that it would be a mistake to conclude that “ideology trumps scientific literacy.”

“The findings of our study indicate that knowledge can play a useful role in reducing the impact of ideologies on climate change opinion.”


There are many things to like about this paper! 

I counted 487 such things in total & obviously I don’t have time to identify all of them. I work for a living, after all.

But one includes the successful use of the cultural cognition worldview scales in a study of the risk perceptions of Australians.

Oh, did I not say that GKW&O collected their data from Australian respondents?  I should have!

I’ve discussed elsewhere some “cross-cultural cultural cognition” item development I had helped work on.  Some of that work involved consultation with a team of researchers adapting the cultural cognition scales for use with Australian samples.

So it’s really cool now to see Australian researchers using the worldview measures (which GKW&O report demonstrated a very high degree of scale reliability) in an actual risk-perception study.

Another cool thing has to do with the GKW&O “climate literacy” battery.  In fact, there are multiple cool things about that part of the study.

I’m very excited about this aspect of the paper because, as is well known to all 16 billion readers of this blog (we are up 4 billion! I attribute this to the Ebola outbreak; for obvious reasons, this blog is the number one hit when people do a google search for “Ebola risk”), I myself have been studying climate science comprehension and its relation to political polarization on “belief” in human-caused climate change and related matters.  I find it quite interesting to juxtapose the results of GKW&O with the ones I obtained.


But before I get to that, I want to say a little more about exactly what the GKW&O results were.

In fact, the data GKW&O report don’t support the conclusion that GKW&O themselves derive from them. 

On the contrary, they reinforce the cultural cognition thesis.

GKW&O are incorrect when they state that general science comprehension was conceptualized as a “proxy” for climate change literacy in the CCP study to which they are responding: Kahan, D.M., Peters, E., Wittlin, M., Slovic, P., Ouellette, L.L., Braman, D. & Mandel, G. The polarizing impact of science literacy and numeracy on perceived climate change risks. Nature Climate Change 2, 732-735 (2012) (the Nature Climate Change study).

On the contrary, the entire premise of the Nature Climate Change study was that members of distinct cultural groups differ in their climate science literacy:  they are polarized on what the best available evidence on climate change signifies.

The point of the study was to test competing hypotheses about why we aren’t seeing public convergence in people’s understanding of the best available evidence on global warming and the dangers it poses.

One hypothesis—the “science comprehension thesis” (SCT)—was that the evidence was too hard for people to get. 

People don’t know very much science.  What’s more, they don’t think in the systematic, analytical fashion necessary to make sense of empirical evidence but instead rely on emotional heuristics, including “what do people like me think?!”

A general science comprehension predictor was used in the study because it is the appropriate one for testing the SCT hypothesis.

If SCT is right—if public confusion and conflict over climate change is a consequence of their over-reliance on heuristic substitutes for comprehension of the evidence—then we would expect polarization to abate as members of culturally diverse groups become more science literate and more adept at the forms of critical reasoning necessary to understand climate science.

But that’s not so. Instead, as general science comprehension increases, people become more polarized in their understandings of the significance of the best evidence on climate change.

So this evidence counts against the SCT explanation for public controversy over climate change.

By the same token, this evidence supports the “cultural cognition thesis” (that “culture is prior to fact”): if critical reasoning is oriented by and otherwise enabled by cultural commitments, then we’d expect people who are more proficient at scientific reasoning to be even more adept at using their knowledge and reasoning skills to find & construe evidence supportive of their group’s position.

There is nothing in GKW&O that is at all at odds with these inferences. 

On the contrary, the evidence they report is precisely what one would expect if one started with the cultural cognition thesis.

They found that there was in fact a strong correlation between their respondents’ cultural worldviews and their “climate science literacy.” 

[Figure: Hamilton et al.: More science literacy, more polarization on what climate science says]

That is what the cultural cognition thesis predicts: culturally diverse individuals will fit their understanding of the evidence to the position that predominates in their group.

It's exactly what other studies have found.

And it was, as I said, the premise of the Nature Climate Change study.

Of course, in itself, this correlation is consistent with SCT, too, insofar as cultural cognition could be understood to be a heuristic reasoning alternative to understanding and making use of valid scientific information.

But that’s the alternative explanation that the  Nature Climate Change study—and others—suggest is unsupported: if it were true, then we’d expect culturally diverse people to converge in their assessments of climate change evidence, not become even more polarized, as they become more science comprehending.

The basis for GKW&O’s own interpretation of their data (that it suggests “information” can “offset” or “attenuate” the polarizing impact of cultural worldviews) consists in a series of multivariate regression analyses. The analyses, however, just don't support their inference.

There is, of course, nothing at all surprising about finding a correlation between “climate science literacy”—defined as agreement with claims about how human activity is affecting the climate—and “belief in human caused climate change.”

Indeed, it is almost certainly a mistake to treat them as distinct.  People generally form generic affective orientations toward risks. The answers they give to more fine-grained questions (ones relating to specific consequences or causal mechanisms etc.) are just different expressions of that orientation.

In our study of science comprehension & climate change beliefs, we used the “Industrial Strength Risk Perception Measure” because it has already been shown to correlate 0.80 or higher w/ any more specific “climate change” question one might ask that is recognizable to people, including whether global warming is occurring, whether humans are causing it, and whether it is going to have bad consequences. 

Psychometrically, all of these questions measure the same thing.

GKW&O conclude that the effects of cultural worldviews and climate-science literacy on climate change “beliefs” are “additive” because their climate-science literacy scale correlates with “belief climate change is occurring” and “belief climate change is human caused” even after “controlling” for cultural worldviews.

But obviously when you put one measure of an unobserved or latent variable on the right-hand (“independent variable”) side of a regression formula and another measure of it on the left (“dependent” or “outcome variable”) side, the former is going to “explain” the latter better than anything else you include in the model! 

At that point, variance in the unobserved variable (here an affective attitude toward climate change) is being used to “explain” variance in itself.
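
Here is a small simulation sketch of that problem in Python. Everything in it is hypothetical (the effect sizes, the noise levels, the variable names); it is not a reanalysis of GKW&O's data. A worldview score drives a latent affect toward climate change, and “literacy” and “belief” are simply two noisy readings of that same affect. The sketch then computes the partial correlation between the two readings “controlling” for worldview.

```python
import random
from math import sqrt
from statistics import fmean


def pearson(x, y):
    """Plain Pearson correlation between two equal-length lists."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def partial_corr(x, y, z):
    """Correlation of x and y after removing the linear effect of z from both."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


rng = random.Random(1)
n = 5000

# Hypothetical worldview score (higher = more hierarchical/individualist).
worldview = [rng.gauss(0, 1) for _ in range(n)]
# Hypothetical latent affect toward climate change, partly driven by worldview.
affect = [-0.7 * w + rng.gauss(0, 1) for w in worldview]
# Two noisy indicators of the SAME latent affect.
literacy = [a + rng.gauss(0, 0.5) for a in affect]
belief = [a + rng.gauss(0, 0.5) for a in affect]

r_world_lit = pearson(worldview, literacy)
r_lit_belief_given_world = partial_corr(literacy, belief, worldview)

print(round(r_world_lit, 2), round(r_lit_belief_given_world, 2))
```

In this toy world “literacy” has no causal effect on “belief” at all, yet it remains a strong predictor of belief after worldview is partialed out, because both are measuring the same thing. A sizable “literacy” coefficient in a model that controls for worldview therefore can't, by itself, tell us that knowledge is doing independent causal work.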

The question is: what explains variance in the latent or unobserved variable for which “belief” in human caused climate change and the climate literacy scale items are both just indicators?

As noted, GKW&O’s own data support the inference that cultural worldviews (or really the latent sense of group identity for which the worldview variables are indicators!) do.

GKW&O also present a regression analysis of “beliefs” in climate change that shows that there are small interactions between the cultural outlook scales and their measure of climate-science literacy. 

Because in one of the models, the interaction between climate-science literacy and Individualism was negative, they conclude that “knowledge dampen[s] the negative influence of individualist ideology on belief in climate change.”

An interaction measures the effect of one predictor conditional on the level of the other.  So what GKW&O are reporting is that if relatively individualist people could be made to believe in evidence that humans cause climate change, that increased belief would have an even bigger impact on whether they believe climate change is happening than it would on relative communitarian people.

It’s very likely that this result is a mathematical artifact: since communitarians already strongly believe in climate change, modeling a world in which communitarians believe even more strongly that humans are causing it necessarily has little impact; individualists, in contrast, are highly skeptical of climate change, so if one posits conditions in which individualists “believe” more strongly that humans are causing climate change, there is still room left in the scale for their “belief in human caused climate change” to increase.
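
The ceiling logic can be shown with a few lines of arithmetic. This is a sketch only: the 1-7 response scale and the starting points are made-up numbers, not values reported by GKW&O.

```python
def observed(latent, lo=1.0, hi=7.0):
    """Map a latent attitude onto a bounded response scale by clipping."""
    return max(lo, min(hi, latent))


# Hypothetical: the same boost in "humans cause it" belief for everyone.
shift = 1.5
# Hypothetical starting points on the "climate change is happening" scale.
communitarian_start = 6.5  # already near the top of the scale
individualist_start = 3.0  # well below the ceiling

gain_comm = observed(communitarian_start + shift) - observed(communitarian_start)
gain_ind = observed(individualist_start + shift) - observed(individualist_start)

print(gain_comm, gain_ind)  # 0.5 1.5
```

The identical latent shift registers as a gain three times larger for the individualist, purely because the communitarian runs out of scale. A regression fit to such responses would report an interaction between “knowledge” and individualism even though the underlying process is the same for both groups.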

But even if we take the result at face value, it doesn’t detract at all from the cultural cognition thesis.

Yes, if a greater proportion of individualists could be made to believe that scientific evidence shows humans are causing climate change, then more of them would believe in climate change.

The question, though, is why don’t they already believe the evidence? 

GKW&O’s own data suggest that cultural worldviews “explain” variance in acceptance of evidence on climate change. 

And we know that it’s not plausible to say that the reason individualists don’t believe the scientific evidence is that they can’t understand it: in the real world, as they become more science comprehending and better at critical reasoning, persons with these outlooks become even more skeptical.

Finally, there are now considerable experimental data showing that people—of all cultural outlooks—selectively credit and discredit evidence on climate change and other culturally polarized issues conditional on whether it supports or conflicts with the view that is predominant in their group.  Indeed, the more science comprehending, the more numerate, and the more cognitively reflective they are, the more aggressively they culturally filter their appraisals of empirical evidence.

GKW&O in fact recognize all of this.

At the end of the paper, they qualify their own conclusion that “specific climate change knowledge positively influences people’s belief in climate change,” by noting that “it is possible the reverse is true”: their correlational data are just as consistent with the inference that individuals are selectively crediting or discrediting evidence based on its cultural congeniality, a process that would produce precisely the correlation they observe between cultural worldviews and “climate science literacy.” 

As I indicated, that’s the causal inference best supported by experimental data.

But none of this detracts from how interesting the study is, and in particular how intriguing GKW&O’s data on climate-science literacy are.

I’ll have more to say about that “tomorrow”! 

Part 2


Scaling up the SENCER solution to the "self-measurement paradox"

I wasn’t kidnapped by aliens (outerspace or undocumented) last week but I nevertheless had an experience that was just as interesting.

I attended the annual SENCER—Science Education for New Civic Engagements and Responsibilities--Summer Institute.

This was my second Summer Institute -- I wrote home about my experience last yr too.

Basically, the raison d'etre of this super cool organization is to obliterate the “self-measurement paradox”: the bizarre and scandalous failure of professions that traffic in scientific knowledge to use science's signature methods of producing knowledge to assess and refine their own craft norms.

(In what one might have suspected was clever self-parody but for the earnest intelligence with which the information was presented and the seriousness with which it was received, the Institute opened with a great 15-minute session on the latest data on effective notetaking strategies--so that we could be sure to maximize retention of all the insights to be imparted in the ensuing lectures, seminars, and workshops.)

Up to now, SENCER has pursued this mission—relentlessly, doggedly—mainly in the domain of science & math instruction.

Its members are constantly creating, testing, tweaking, and sharing their experiences with teaching techniques (grading ones too) and tools for empirically assessing them.

A highlight at this yr’s Institute was a status report from a team at West Point, which is in its third year in a project to “SENCERize” its curriculum.

But lately SENCER has been broadening out. 

It has already made a foray into popular science culture: we heard from KQED's Sue Ellen McCann and Andrea Aust about that flagship PBS station's use of empirical methods to make their programs as engaging and accessible to as large and diverse an audience as possible.

And this year, one of the major themes was how to advance empirical understanding of the processes by which scientific knowledge is recognized and given proper effect in public decisionmaking.

That’s a major interest of mine, of course.  Indeed, what made me so excited about the program last year was the prospect of using the “SENCER model” (which itself involves creating models for adaptation and use by others) to bridge the unconscionable gap between the practices of science and science-informed policymaking, on the one hand, and the science of science communication, on the other.

So I was really psyched to participate this year in various Institute programs dedicated to focusing SENCER members’ attention on this objective.

There were various sessions relating, essentially, to the need for developing an "evidence based" politics in support of evidence-based policymaking.

I played a lead role in three.

In one, I oversaw participants’ engagement with a pair of “vaccine risk case studies” (materials here).

Case study number 1 featured the introduction of the HPV vaccine into the U.S.  The materials were designed to enable participants to assess who knew what about what—including other relevant actors’ intentions—as of late 2005.

Merck, manufacturer of the HPV vaccine Gardasil, was then planning to apply for fast-track FDA approval of a girls-only HPV shot.

It was also seeking the assistance of women’s groups to organize a nationwide press for adoption of state legislation mandating vaccination (of girls) as a condition of middle school enrollment.

Women’s health advocates were trying to figure out whether to accept Merck’s proposal.

Religious and social groups had signaled that they were not opposed to approval of the vaccine but would oppose mandatory vaccination legislation.

At least some public health officials were worried that this process (geared, they thought, to enabling Merck to create a dominant market position for Gardasil before GlaxoSmithKline obtained approval for its rival HPV vaccine) was likely to enmesh the HPV vaccine in toxic political controversy.

But what were those nervous Nellies & Nigels so concerned about?

Just a few years earlier, the CDC had recommended that the HBV vaccine (for hepatitis-b, also a sexually transmitted disease) be included as a universal childhood vaccination, and nearly all the states had added it to their mandatory school-enrollment schedules without fuss.

In addition, there was survey evidence showing that parents would happily accept the HPV vaccine for their daughters if that’s what their pediatricians recommended.

But sure enough, the FDA's approval of Gardasil for girls only, followed by the CDC's recommendation that the vaccine be added to the universal vaccination schedule and then by the Merck-sponsored legislative campaign, ignited a polarizing firestorm of cultural controversy.  Not only did only 1 state enact an HPV mandate (that would be Virginia, in exchange for Merck's promise to build a huge manufacturing plant there; maybe they gave the Governor a Rolex too?), but to this day vaccination rates remain anemic (not only for girls but for boys, too; the vaccine was approved for them just 3 yrs after approval for girls) b/c of continuing ambivalence about the shot's safety and efficacy.

Could the relevant actors have reasonably anticipated the controversy that unfolded over 2007-2010?  Whose responsibility was it to try to get more info—and who was supposed to do what with it?

Case study 2 involved childhood vaccinations.

The materials were aimed at assessing whether the relevant players—here, government public health agencies, advocacy groups, medical professional associations, philanthropic groups, and the news media—are responding appropriately now to anxiety over public “vaccine hesitancy” in a manner that takes account of the lessons to be learned from the HPV disaster.

Doesn't look like it to me....

The discussion was great -- I was really impressed by how readily the participants saw the complexity of the issues (aversion to recognition of complexity is actually the root of all social dysfunction, in my view; that's such a simple & obvious point, why argue it?).

My second session was a keynote talk (slides here) on the “Science Communication Measurement Problem.” I shared with the audience data showing that climate-science communication is hobbled by the failure of those engaged in it to disentangle -- both for purposes of measurement and for purposes of practical action -- people's knowledge from their expression of their cultural identities.

Unsurprisingly, people ooo'ed & ahhhh'ed when I displayed my sexy item response curves!

Finally, there was a session in the nature of a seminar or group discussion about how to carry over to the political realm the insights that science educators, formal and informal, have acquired about promoting public engagement with controversial science issues.

Science teachers, museum directors, and extension professionals, among others, all shared their experiences with the phenomenon of  knowledge-identity “entanglement”--and techniques they've used to dissolve it. 

We came away with a rich set of conjectures, and a shared sense of resolve to test them with structured, empirical research programs.

Beyond that, we had nothing in common--no disciplinary or institutional affiliations, no set of cultural commitments, no cause.

Believe it or not, that's why I find SENCER so inspiring.

The talented and passionate people who are part of SENCER, I've learned, care about only one thing: using science to dispel any obstacle to the acquisition of scientific knowledge by free and reasoning individuals--students, professionals, educators, citizens--to use as they see fit.

The spirit of SENCER is a monument to the affinity of science and liberal values. 


How would scientists do on a public "science literacy" test -- and should we care? Stocklmayer & Bryant vs. NSF Indicators “Science literacy” scale part 2

So . . . this is the second post on the interesting paper Stocklmayer, S. M., & Bryant, C. Science and the Public—What should people know?, International Journal of Science Education, Part B, 2(1), 81-101 (2012)

Skip ahead to the bolded red text if you still vividly remember the first (as if it were posted “only yesterday”) or simply don’t care what it said & want to go straight to something quite interesting: the results of S&B's administration of a public “science literacy” test to trained scientists.

But by way of review, S&B don’t much like the NSF Science Indicators “factual knowledge” questions, the standard “science literacy” scale used in studies of public science comprehension.

The basic thrust of their critique is that the Indicators battery is both undertheorized and unvalidated.

It’s “undertheorized” in the sense that no serious attention went into what the test was supposed to be measuring or why.

Its inventors viewed public “science literacy” to be essential to informed personal decisionmaking, enlightened self-government, and a productive national economy. But they didn’t address what kinds of scientific knowledge conduce to these ends, or why the odd collection of true-false items featured in the Indicators (“Lasers work by focusing sound waves”; “The center of the earth is very hot”) should be expected to assess test takers’ possession of such knowledge.

The NSF “science literacy” test is unvalidated in the sense that no evidence was offered (either upon its introduction or thereafter) that scores on it are meaningfully correlated with giving proper effect to scientific information in any particular setting.

S&B propose that the Indicators battery be scrapped in favor of an assessment that reflects an “assets-based model of knowledge.” Instead of certifying test takers’ assimilation of some canonical set of propositions, the aim of such an instrument would be to gauge capacities essential to acquiring and effectively using scientific information in ordinary decisionmaking.

I went through S&B’s arguments to this effect last time, and why I found them persuasive. 

I did take issue, however, with their conclusion that the Indicators should simply be abandoned. Better, I think, would be for scholars to go ahead and use the Indicators battery but supplement it as necessary with items that validly measure the aspects of science comprehension genuinely relevant to their analyses.

It is more realistic to think a decent successor to the Indicators battery would evolve from this sort of process than it is to believe that a valid, new science comprehension scale will be invented from scratch.  The expected reward to scholars who contribute to development of the latter would be too low to justify the expected cost they’d incur, which would include having to endure the unwarranted but predictable resistance of many other scholars who are professionally invested in the Indicators battery.

Okay!  But I put off for “today’s” post a discussion of S&B’s very interesting original study, which consisted of the administration of the Indicators battery (supplemented with some related Eurobarometer “factual knowledge” items) to a group of 500 scientists.

The scientists generally outscored members of the public, although not by a very large margin (remember, one problem with the NSF battery is that it's too easy: the average score is too high to enable meaningful investigation of variance).

But the more interesting thing was how readily scientists who gave the “wrong” answer were able to offer a cogent account of why their response should in fact be regarded as correct.

For example, it is false to say that “the center of the earth is very hot,” one scientist pointed out, if we compare the temperature there to that on the surface of the sun or other stars.

Not true, 29% of the sample said, in response to the statement, “It is the father’s genes that determine whether the baby is a boy or girl”—not because “it is the mother’s genes” that do so but because it is the father’s chromosome that does.

No fair-minded grader would conclude that these scientists’ responses betray lack of comprehension of the relevant “facts.”  That their answers would be scored “incorrect” if they were among the test takers in an Indicators sample, S&B conclude, “cast[s] further doubts upon the value of such a survey as a tool for measurement of public ‘knowledge.’ ”

If I were asked my opinion in a survey, I’d “strongly disagree” with this conclusion!

Indeed, in my view, the idea that the validity of a public science comprehension instrument should be assessed by administering it to a sample of scientists reflects the very sort of misunderstandings—conceptual and psychometric—that S&B convincingly argue are reflected in the Indicators battery.

S&B sensibly advocate an “assets-based” assessment as opposed to a knowledge-inventory one.

Under the former, the value of test items consists not in their corroboration of a test taker's catechistic retention of a list of "foundational facts" but rather in the contribution those items make to measuring a personal trait or capacity essential to acquiring and using relevant scientific information.

The way to validate any particular item, then, isn't to show that 100%--or any particular percent—of scientists would “agree” with the response scored as “correct.”

It is to show that that response genuinely correlates with the relevant comprehension capacity within the intended sample of test takers.

Indeed, while such an outcome is unlikely, an item could be valid even if the response scored as “correct” is indisputably wrong, so long as test takers with the relevant comprehension capacity are more likely to select that response.

This point actually came up in connection with my proto-climate-science comprehension measure.

That instrument contained the item “Climate scientists believe that if the North Pole icecap melted as a result of human-caused global warming, global sea levels would rise—true or false?”

“False” was scored as correct, consistent with public-education outreach material prepared by NOAA and NASA and others, which explain that the “North Pole ice cap,” unlike the South Pole one, “is already floating,” and thus, like “an ice cube melting in a glass full of water,” already displaces a volume of water equivalent to the amount it would add when unfrozen. 

But an adroit reader of this blog—perhaps a climate scientist or maybe just a well educated nonexpert—objected that in fact floating sea ice has slightly less salinity than sea water, and as a result of some interesting mechanism or another displaces a teeny tiny bit less water than it would add if melted. Global sea levels would thus rise about 1/100th of “half a hair’s breadth”—the width of a human cell, within an order of magnitude—if the North Pole melted.

Disturbed to learn so many eminent science authorities were disseminating incorrect information in educational outreach materials, the blog reader prevailed on NOAA to change the answer it gives in its “Frequently asked questions about the Arctic” page gives to the question, “Will sea levels rise if the North Pole continues to melt?”  

Before, the agency said that there'd be "no effect" if the "North Pole ice cap melts"; now the page says there would be "little effect."

So at least on NOAA’s site (I haven’t checked to see if all the other agencies and public educators have changed their materials) “little effect” is now the “correct answer.” It is one, sadly, that NOAA apparently expects members of the public to assimilate in a completely unreflective way, since the agency gives no account of why, if the “North Pole is already floating,” it wouldn’t behave just like an “ice cube floating in a glass of water.”


But as I also explained, among the general-population sample of test takers to whom I administered my proto-assessment, answering “true” rather than "false" to the “North Pole” item predicted a three times greater likelihood of incorrectly responding “true” as well to two other items: one stating that scientists expected global warming from CO2 emissions to “reduce photosynthesis by plants” ("photosynthesis"); and another that scientists believe global warming will "increase the risk of skin cancer” ("skin cancer").

If we assume that people who responded “false” to "photosynthesis" and "skin cancer" have a better grasp of the mechanisms of climate change than those who responded “true” to those items, then a “false” response to "North Pole” is a better indicator—or observable manifestation—of the latent or unobserved form of science comprehension that the “climate literacy” proto-assessment was designed to measure.
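The comparison behind that "three times greater likelihood" can be sketched in a few lines of Python. The counts below are invented purely for illustration (they are not the data from my proto-assessment), and the function name is hypothetical. The sketch only shows what is being computed: how much more often respondents who answered "true" to "North Pole" also answered "true" to another item, relative to those who answered "false."

```python
def relative_likelihood(other_true_among_np_true, np_true_total,
                        other_true_among_np_false, np_false_total):
    """Ratio of P(other item = "true" | North Pole = "true")
    to P(other item = "true" | North Pole = "false")."""
    p_given_true = other_true_among_np_true / np_true_total
    p_given_false = other_true_among_np_false / np_false_total
    return p_given_true / p_given_false


# Hypothetical counts, NOT the actual survey data:
# 60 of 100 "true" responders and 20 of 100 "false" responders
# also answered "true" to the other item.
ratio = relative_likelihood(60, 100, 20, 100)
print(round(ratio, 2))
```

A ratio well above 1 is what supports treating "false" as the better indicator of the latent comprehension being measured; a ratio near 1 would mean the item tells us little either way.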

Maybe some tiny fraction of the people who answered "true" to "North Pole" were aware of the cool information about the nano-sized differential between the volume of water the North Pole ice cap would add if melted and the amount of water it displaces when frozen. But many times more than that no doubt simply didn't know that the North Pole is just an ice cube floating in the Arctic Ocean or didn't know that ice displaces a volume of water equivalent to the volume it would occupy when melted.

For that reason, when administered to a general population sample, the instrument will do a better job in identifying those who get the mechanisms of human-caused climate change, and who can actually reason about information relating to them, if “false” rather than “true” is treated as correct.

This is simplifying a bit: the issue is not merely whether there is a positive correlation between the answer deemed "correct" and superior performance on a validated test as a whole but whether the answer deemed correct makes scores on the test more reliable--for which such a correlation is necessary but not sufficient. But the same point applies: the response that makes a validated instrument more reliable could in theory be shown to be wrong or no "more" right than an alternative response the crediting of which would reduce the reliability of the instrument. 
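One way to make the reliability point concrete is to score the same responses under two different answer keys and compare a reliability index. The sketch below uses Cronbach's alpha, which is an assumption on my part about which index to use, and the eight respondents' answers are made up for illustration (they are not survey data). The names `cronbach_alpha`, `score`, `KEY_FALSE`, and `KEY_TRUE` are hypothetical.

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """scores: one list of 0/1 item scores per respondent."""
    k = len(scores[0])
    item_vars = [pvariance([row[i] for row in scores]) for i in range(k)]
    total_var = pvariance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


def score(responses, key):
    """Turn raw "T"/"F" answers into 0/1 scores under an answer key."""
    return [[int(ans == right) for ans, right in zip(row, key)]
            for row in responses]


# Hypothetical answers to (North Pole, photosynthesis, skin cancer).
responses = [
    "FFF", "FFF", "FFT", "FFF",
    "TTT", "TTT", "TTF", "TFT",
]

KEY_FALSE = "FFF"  # "false" scored as correct on North Pole
KEY_TRUE = "TFF"   # "true" scored as correct on North Pole

alpha_false_key = cronbach_alpha(score(responses, KEY_FALSE))
alpha_true_key = cronbach_alpha(score(responses, KEY_TRUE))
print(alpha_false_key > alpha_true_key)
```

In this toy data set, crediting "true" on "North Pole" makes that item run against the other two, so the scale's internal consistency collapses. That is the sense in which a response can be the one to score as "correct" whether or not an expert could defend the alternative.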

The only person who would object to this understanding of how to score standardized test responses is someone who makes the mistake of thinking that a science-comprehension assessment is supposed to certify assimilation of some inventory of canonical “facts” rather than measure a latent or unobserved capacity to acquire and use scientific knowledge.

S&B don’t make that mistake. On the contrary, they assert that those who constructed the Indicators made it, and criticize the Indicators battery (and related Eurobarometer “factual knowledge” items) on that ground.

So I'm puzzled why they think it "casts further doubt" on the test to show that the "facts" in its science literacy inventory are ones that scientists themselves might dispute.

Indeed, it is well known to experts in the design of assessments that sufficiently knowledgeable people will frequently be able to come up with perfectly acceptable accounts of why a “wrong” response to one or another test item could reasonably be seen as correct. 

Again, so long as it is less likely that any particular test taker who selected that response had such an account in mind than that he or she simply lacked some relevant form of comprehension, then giving credit for the “wrong” answer to all who selected it will make the results less accurate.

Obviously, it would be a huge error to equate error with lack of knowledge when a “public science comprehension” assessment is administered to a group of expert scientists.  As S&B discovered, it is highly likely that the test takers in such a sample will in fact be able to give a satisfactory account of why any “wrong” answer they select should be viewed as correct.

But just as obviously, it would be a mistake to assume that when a public science comprehension test is administered to members of the public the small fraction who say the “Sun go[es] around the Earth” rather than “Earth goes around the Sun” are more likely than not conjuring up the defense of such an answer that the astronomer Sir Fred Hoyle could have given: namely, that a geocentric model of planetary motion is no less "correct" than a "heliocentric" one; the latter, Hoyle points out in his essay on Copernicus,  is justified on grounds of its predictive fecundity, not its superior accuracy.  

True, if Hoyle by some chance happened to be among the members of the public randomly recruited to take the test, his science comprehension might end up being underestimated.

Only someone who doesn’t understand that a public science comprehension measure isn’t designed to assess the comprehension level of trained scientists, however, could possibly treat that as evidence that a particular item is invalid.

S&B certainly wouldn’t make that mistake either.  The most important criticism they make of the Indicators is that insufficient attention was given in designing it to identifying what ordinary members of the public have to know, and what capacities they must have, in order to acquire and use scientific information relevant to ordinary decisionmaking in a technologically advanced, liberal democratic society.

So for this reason, too, I don't see why they would think the results of their scientist survey "cast[s] further doubts upon the value of [the Indicators]." A valid public science comprehension measure would surely produce the same amusing spectacle if administered to a group of trained scientists--so the demonstration is neither here nor there if we are trying to figure out whether and how to improve upon the Indicators.

As I said, I really like the S&B paper, and hope that other researchers take its central message to heart: that the study of public science comprehension is being stunted for want of a defensibly theorized, empirically validated instrument.

I’m pretty sure if they do, though, they’ll see why administering existing or prospective instruments to trained scientists is not a very useful way to proceed. 

Undertheorized and unvalidated: Stocklmayer & Bryant vs. NSF Indicators “Science literacy” scale part I

The paper isn’t exactly hot off the press, but someone recently lowered my entropy by sending me a copy of Stocklmayer, S. M., & Bryant, C. Science and the Public: What should people know?, International Journal of Science Education, Part B, 2(1), 81-101 (2012).

Cool article!

The piece critiques the NSF’s Science Indicators “factual knowledge” questions.

As is well known to the 9.8 billion readers of this blog (we’re down another couple billion this month; the usual summer-holiday lull, I’m sure), the Indicators battery is pretty much the standard measure for public “science literacy.”

The NSF items figure prominently in the scholarly risk perception/science communication literature. 

With modest additions and variations, they also furnish a benchmark for various governmental and other official and semi-official assessments of “science literacy” across nations and within particular ones over time.

I myself don’t think the Indicators battery is invalid or worthless or anything like that.

But like pretty much everyone I know who uses empirical methods to study public science comprehension, I do find the scale unsatisfying.

What exactly a public science comprehension scale should measure is itself a difficult and interesting question. But whatever answer one chooses, there is little reason to think the Indicators’ battery could be getting at that.

The Indicators battery seems to reduce “science literacy” to a sort of catechistic assimilation of propositions and principles: “The earth goes around the sun, not the other way ’round” [check]; “electrons are smaller than atoms” [check]; “antibiotics don’t kill viruses; they kill bacteria!” [check!].

We might expect that an individual equipped to reliably engage scientific knowledge in making personal life decisions, in carrying out responsibilities inside of a business or as part of a profession, in participating in democratic deliberations, or in enjoying contemplation of the astonishing discoveries human beings have made about the workings of nature will have become familiar with all or most of these propositions.

But simply being familiar with all of them doesn’t in itself furnish assurance that she’ll be able to do any of these things.

What does is a capacity: one consisting of the combination of knowledge, analytical skills, and intellectual dispositions necessary to acquire, recognize, and use pertinent scientific or empirical information in specified contexts.  It’s hardly obvious that a high score on the NSF’s “science literacy” test (the mean number of correct responses in a general population sample is about 6 of 9) reliably measures any such capacity, and indeed no one to my knowledge has ever compiled evidence suggesting that it does.

This—with a lot more texture, nuance, and reflection blended in—is the basic thrust of the S&B paper.

The first part of S&B consists of a very detailed and engaging account of the pedigree and career of the Indicators’ factual-knowledge items (along with various closely related ones used to supplement them in large-scale recurring public data collections like the Eurobarometer).

What’s evident is how painfully innocent of psychometric and basic test theory this process has been.

The items, at least on S&B’s telling, seem to have been selected casually, more or less on the basis of the gut feelings and discussions of small groups of scientists and science authorities.

Aside from anodyne pronouncements on the importance of “public understanding of science” to “national prosperity,” “the quality of public and private decision-making,” and “enriching the life of the individual,” they made no real effort to articulate the ends served by public “science literacy.” As a result, they offered no cogent account of the sorts of knowledge, skills, dispositions, and the like that securing the same would entail.

Necessarily, too, they failed to identify the constructs—conceptual representations of particular skills and dispositions—an appropriately designed public science comprehension scale should measure. 

Early developers of the scale reported Cronbach’s alpha and like descriptive statistics, and even performed factor analysis that lent support to the inference that the NSF “science literacy” scale was indeed measuring something.

But without any theoretical referent for what the scale was supposed to measure and why, there was necessarily no assurance that what was being measured by it was connected to even the thinly specified objectives its proponents had in mind.

So that’s the basic story of the first part of the S&B article; the last part consists in some related prescriptions.

Sensibly, S&B call for putting first things first: before developing a measure, one must thoughtfully (not breezily, superficially) address what the public needs to know and why: what elements of science comprehension are genuinely important in one or another of the contexts, to one or another of the roles and capacities, in which ordinary (nonexpert) members of the public make use of scientific information?

S&B suggest, again sensibly, that defensible answers to these questions will likely support what the Programme for International Student Assessment characterizes as an “assets-based model of knowledge” that emphasizes “the skills people bring to bear on scientific issues that they deal with in their daily lives.”  (Actually, the disconnect between the study of public science comprehension and the vast research that informs standardized testing, which reflects an awe-inspiring level of psychometric sophistication, is really odd!) 

Because no simple inventory of “factual knowledge” questions is likely to vouch for test takers’ possession of such a capacity, S&B propose throwing out the NSF Indicators battery rather than simply supplementing it (as has been proposed) with additional "factual knowledge" items on “topics of flight, pH, fish gills, lightning and thunder and so on.”

Frankly, I doubt that the Indicators battery will ever be scrapped. By virtue of sheer path dependence, the Indicators battery confers value as a common standard that could not easily, and certainly not quickly, be replaced. 

In addition, there is a collective action problem: the cost of generating a superior, “assets-based” science comprehension measure—including not only the toil involved in the unglamorous work of item development, but also the need to forgo participating instead in exchanges more central to the interest and attention of most scholars—would be borne entirely by those who create such a scale, while the benefits of a better measure would be enjoyed disproportionately by other scholars who’d then be able to use it.

I think it is very possible, though, that the NSF Indicators battery can be made to evolve toward a scale that would have the theoretical and practical qualities that S&B envision.

As they investigate particular issues (e.g., the relationship between science comprehension and climate change polarization), scholars will likely find it useful to enrich the NSF Indicators battery through progressive additions and supplementations, particularly with items that are known to reliably measure the reasoning skills and dispositions necessary to recognize and make use of valid empirical information in everyday decisionmaking contexts.

That, anyway, is the sort of process I see myself as trying to contribute to by tooling around with and sharing information on an “Ordinary science intelligence” instrument for use in risk perception and science communication studies.

Even that process, though, won’t happen unless scholars and others interested in public science comprehension candidly acknowledge the sorts of criticisms S&B are making of the Indicators battery; unless they have the sort of meaningful discussion S&B propose about who needs to know what about science and why; and unless scholars who use the Indicators battery in public science comprehension research explicitly address whether the battery can reasonably be understood to be measuring the forms of knowledge and types of reasoning dispositions on which their own analyses depend.

So I am really glad S&B wrote this article!

Nevertheless, “tomorrow,” I’ll talk about another part of the S&B piece—a survey they conducted of 500 scientists to whom they administered the Indicators’ “factual knowledge” items—that I think is very very cool but actually out of keeping with the central message of their paper! 


How to achieve "operational validity"? Translation Science!

Never fails! By recklessly holding forth on a topic that is obviously more complicated than I am making it out to be, I have again provoked a reflective, informed response from someone who really knows something! 

Recently I dashed off a maddeningly abstract post on the “operational validity” of empirical science communication studies. A study has “high” operational validity, I suggested, if it furnishes empirical support for a science-communication practice that real-world actors can themselves apply and expect to work more or less “as is”; such a study has “low operational validity” if additional empirical studies must still be performed (likely in field rather than lab settings) before the study’s insights, as important as they might be, can be reliably brought to bear on one or another real-world science communication problem. 

I wanted to distinguish the contribution that this concept, adapted from managerial studies (Schellenberger 1974), makes to assessment of a study’s practical value from those made by assessments of the study’s “internal” and “external” validity.  

For a study to be of practical value, we must be confident from the nature of its design that its results can be attributed to the mechanisms the researcher purports to be examining and not some other ones (“internal validity”).  In addition, we must be confident that the mechanisms being investigated are ones of consequence to the real-world communication dynamics that we want to understand and influence—that the study is modeling that and not something unrelated to it (“external validity”).

But even then, the study might not tell real-world communicators exactly what to do in any particular real-world setting.  

Indeed, to be confident that she had in fact isolated the relevant mechanisms, and was genuinely observing their responsiveness to influences of interest, the researcher might well have resorted (justifiably!) to devices intended to disconnect the study from the cacophony of real-world conditions that account for our uncertainty about these things in everyday life.

In this sense, low operational validity is often built into strategies for assuring internal and external validity (particularly the former).

That’s not bad, necessarily.

It just means that even after we have gained the insight that can be attained from a study that has availed itself of the observational and inferential advantages furnished by use of a simplified “lab” model, there is still work to be done: work to determine how the dynamics observed in the lab can reliably be reproduced in any particular setting.  We need at that point to do studies of higher “operational validity” that build on what we have learned from lab studies.

How should we go about doing studies that add high operational validity to studies of insights gained “in the lab”?

Science communication scholar Neil Stenhouse has something to say about that!


How to achieve operational validity: Translation Science

Neil Stenhouse

It is very unlikely that any real organization would want to use the stimuli from a messaging study, for example, without at least a few substantial changes. They would certainly want their organization to be identified as the source of the message. These changes would influence the effect the messages had on their audience. What kind of changes would the organization want to make? How much would that change the effects of the message? How could the message be made acceptable and usable by these organizations, yet still retain the effectiveness it had in previous experiments?

Communication practitioners wanting to put social science insights to use could very well ask questions like: how do you use the insights of cultural cognition experiments to design an effective large-scale messaging campaign for the Environmental Defense Fund? Alternatively, how do you use these insights to design a town hall meeting on climate change in Winchester, VA? How could you take a short passage about geoengineering, for example, that had a depolarizing effect on hierarchs and egalitarians (Kahan et al., 2012), and design a meeting that had a similar depolarizing effect? And if you did so, how well would it work? 

I recently wrote a paper about research designed to answer questions like these (Stenhouse, 2014). It turns out that at least in one discipline, people are already doing a substantial amount of research that tests not only which kinds of interventions are effective, but figures out the nitty gritty points of what’s needed to effectively transplant the core of the lab-tested intervention into actual operational use in the real world. It addresses an important part of Dan’s concern with making communication research “evidence-based all the way down” (Kahan, 2013).

In public health, there is a whole subdiscipline – and multiple journals – on what is known as translation science, or implementation science (Glasgow et al., 2012). Researchers in public policy and international development are beginning to address this also (Cartwright & Hardie, 2012; Woolcock, 2013).

Translation science can be summarized with an example of testing an exercise program. With traditional public health research, a research team, often from a university, would design an exercise program, implement it, and measure and carefully document the results. Who lost weight? How much? Do they intend to keep exercising? And so on.

With translation research, as well as these kinds of outcomes, there is an additional focus on recording and describing the things involved in implementing these programs in the field, at scale (Glasgow et al., 1999).

For example, the research team might take their exercise program to a sample of the kinds of organizations that would be delivering the intervention if its use actually became widespread – e.g. hospital staff, community health organizations, church recreation group organizers (Bopp et al., 2007). The researchers would aim to answer questions like: how many of the organizations we approached actually wanted to implement the intervention?

Some organizations might be against it, for cost reasons, or political reasons (e.g. perhaps a hospital’s doctors have pre-existing arrangements with the providers of another intervention).

When an organization agrees to use an intervention, do they implement it correctly? Perhaps the intervention has multiple complex steps, and busy hospital staff may occasionally make errors that cause the intervention to be ineffective.

In short, traditional tests measure whether something works in the lab, under ideal, controlled conditions. Translation science measures whether something works in the real world, under typical real-world conditions (Flay, 1986; Glasgow et al., 2003). And in addition, by measuring the things that can be expected to affect whether it works in the real world – such as whether organizations like it, or how easy it is to implement – translation science can help figure out how to make interventions more likely to work in the real world.

For example, if researchers find out that an intervention is difficult for hospital staff to implement, and find out precisely which part is most difficult to understand, then they might be able to find a way of making it simpler without compromising the efficacy of the intervention.

Translation science provides the “operational validity” Dan was talking about. It answers questions like: What does it even look like when you try to put the results of experiments into real-world practice? How do you do that? What goes wrong? How can you fix it so it works anyway?

These kinds of questions are important for anyone who wants their insights to be applied in the real world – and especially important if you want them to be applied at scale. I think many researchers on climate communication would be in the latter category. While good traditional research can help us understand a lot about human psychology and behavior, it only does part of the job in putting that knowledge to use.

One question likely to come up is: Why should social scientists do this work, as opposed to the practitioners themselves?

I argue that they should do this work for the same reasons they should do any work – their skill in recording, conceptualizing and describing social processes (Stenhouse, 2014).

If we want rigorous, generalizable, cumulative knowledge about human behavior, we need social scientists. If we want rigorous, generalizable, cumulative knowledge about how to apply social interventions, we need social scientists there too. We need people who understand both the inner workings of the intervention and the context in which it is deployed, so that they can effectively negotiate between the two in creating the optimal solution.

Questions about division of labor here are certainly open to debate. Should all social scientists doing work with an applied purpose do some translation research? Should some specialize in lab work, and others in translation science, and occasionally collaborate?

These questions, as well as questions about how to shift academic incentives to reward translation science adequately, remain to be decided.

However, I would argue that especially in areas with urgent applied purposes, people are currently not doing nearly enough of this kind of work. We want our findings to be applied in the real world. Currently there are gaps in our knowledge of how to translate our findings to the real world, and other disciplines provide practical ideas for how to fill those gaps in our knowledge. We are not doing our jobs properly if all of us refuse to try taking those steps.

Neil Stenhouse is a PhD candidate from the George Mason University Center for Climate Change Communication.

Bopp, M., Wilcox, S., Hooker, S. P., Butler, K., McClorin, L., Laken, M., ... & Parra-Medina, D. (2007). Using the RE-AIM framework to evaluate a physical activity intervention in churches. Preventing Chronic Disease, 4(4).

Cartwright, N., & Hardie, J. (2012). Evidence-based policy: A practical guide to doing it better. Oxford University Press.

Flay, B. R. (1986). Efficacy and effectiveness trials (and other phases of research) in the development of health promotion programs. Preventive Medicine, 15(5), 451-474.

Glasgow, R. E., Lichtenstein, E., & Marcus, A. C. (2003). Why don't we see more translation of health promotion research to practice? Rethinking the efficacy-to-effectiveness transition. American Journal of Public Health, 93(8), 1261-1267.

Glasgow, R. E., Vinson, C., Chambers, D., Khoury, M. J., Kaplan, R. M., & Hunter, C. (2012). National Institutes of Health approaches to dissemination and implementation science: current and future directions. American Journal of Public Health, 102(7), 1274-1281.

Glasgow, R. E., Vogt, T. M., & Boles, S. M. (1999). Evaluating the public health impact of health promotion interventions: the RE-AIM framework. American Journal of Public Health, 89(9), 1322-1327.

Kahan, D. M. (2013). Making climate-science communication evidence-based, all the way down. Culture, Politics and Climate Change. London: Routledge. Available at: http://papers.ssrn.com/sol3/papers.cfm.

Kahan, D. M., Jenkins-Smith, H., Tarantola, T., Silva, C. L., & Braman, D. (2012). Geoengineering and climate change polarization: Testing a two-channel model of science communication. Annals of the American Academy of Political and Social Science.

Schellenberger, R. E. (1974). Criteria for assessing model validity for managerial purposes. Decision Sciences, 5(4), 644-653. doi: 10.1111/j.1540-5915.1974.tb00643.x

Stenhouse, N. (2014). Spreading success beyond the laboratory: Applying the RE-AIM framework for effective environmental communication interventions at scale. Paper to be presented at the 2014 National Communication Association Annual Convention.

Woolcock, M. (2013). Using case studies to explore the external validity of ‘complex’ development interventions. Evaluation, 19(3), 229-248.





Constructing an "Ordinary climate science intelligence" assessment: a fragment ...

From Climate Science Communication and the Measurement Problem, Advances in Pol. Psych. (forthcoming):

6.  Measuring what people know about climate science

What do members of the public know about scientific evidence on climate science? Asking whether they “believe in” human-caused climate change does not measure that.  But that does not mean what they know cannot be measured.

a. A disentanglement experiment: the “Ordinary Climate Science Intelligence” instrument. Just as general science comprehension can be measured with a valid instrument, so can comprehension of the science on climate change in particular. Doing so requires items the responses to which validly and reliably indicate test-takers’ climate science comprehension level.

The idea of “climate science comprehension” is hardly straightforward. If one means by it the understanding of and facility with relevant bodies of knowledge essential to doing climate science research, then any valid instrument is certain to show that the level of climate science comprehension is effectively zero in all but a very tiny fraction of the population.

But there are many settings in which the quality of non-experts’ comprehension of much more basic elements of climate science will be of practical concern. A high school science teacher, for example, might aim to impart an admittedly non-expert level of comprehension in students for the sake of equipping and motivating them to build on it in advanced studies. Likewise, without being experts themselves, ordinary members of the public can be expected to benefit from a level of comprehension that enables them reliably to recognize and give proper effect to valid climate science that bears on their decisionmaking, whether as homeowners, businesspeople, or democratic citizens.

Assume, then, that our goal is to form an “ordinary climate science intelligence” (OCSI) instrument.  Its aim would certainly not be to certify possession of the knowledge and reasoning dispositions that a climate scientist’s professional judgment comprises.  It will come closer to the sort of instrument a high school teacher might use, but even here no doubt fall short of delivering a sufficiently complete and discerning measure of the elements of comprehension he or she is properly concerned to instill in students.  What the OCSI should adequately measure—at least this would be the aspiration of it—is a form of competence in grasping and making use of climate science that an  ordinary person would benefit from in the course of participating in ordinary decisionmaking, individual and collective.

There are two challenges in constructing such an instrument.  The first and most obvious is the relationship between climate change risk perceptions and individuals’ cultural identities.  To be valid, the items that the assessment comprises must be constructed to measure what people know about climate science and not who they are.

A second, related problem is the potential for confounding climate science comprehension with an affective orientation toward global warming risk.  Perceptions of societal risk generally are indicators of a general affective orientation. The feelings that a putative risk source evokes are more likely to shape than be shaped by individuals’ assessments of all manner of factual information pertaining to it (Loewenstein et al. 2001; Slovic et al. 2004).  There is an ambiguity, then, as to whether items that elicit affirmation or rejection of factual propositions relating to climate change are measuring genuine comprehension or instead only the correspondence between the propositions in question and the valence of respondents’ affective orientations toward global warming. Existing studies have found, for example, that individuals disposed to affirm accurate propositions relating to climate change—that burning fossil fuels contributes to global warming, for example—are highly likely to affirm many inaccurate ones—e.g., that atmospheric emissions of sulfur do as well—if those statements evince concern over environmental risks generally (Tobler, Visschers & Siegrist 2012; Reynolds et al. 2010).

Two steps were taken to address these challenges in constructing an OCSI instrument, which was then administered to the same survey participants whose general science comprehension was measured with the OSI scale. The first was to rely on an array of items the correct responses to which were reasonably balanced between opposing affective orientations toward the risk of global warming. The multiple-choice item “[w]hat gas do most scientists believe causes temperatures in the atmosphere to rise” (“Carbon”) and the true-false one “human-caused global warming will result in flooding of many coastal regions” (“Floods”) evince concern over global warming and thus could be expected to be answered correctly by respondents affectively predisposed to perceive climate change risks as high. The same affective orientation, however, could be expected to incline respondents to give the incorrect answer to items such as “human-caused global warming will increase the risk of skin cancer in human beings” (“Cancer”) and “the increase of atmospheric carbon dioxide associated with the burning of fossil fuels will reduce photosynthesis by plants” (“Photosynthesis”). By the same token, those respondents affectively disposed to be skeptical of climate change risks could be expected to supply the correct answer to Cancer and Photosynthesis but the wrong ones to Carbon and Floods. The only respondents one would expect to be likely to answer all four correctly are ones who know and are disposed to give the correct response independent of their affective orientations.
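The balancing logic can be made concrete with a small sketch. It is only an illustration: the three stylized respondent types and the true/false encoding of the four items are simplifying assumptions introduced here, not part of the OCSI instrument or its scoring.

```python
# Hypothetical encoding of the four items named above. True means the answer
# that expresses concern about global warming is also the correct answer.
CONCERN_ANSWER_IS_CORRECT = {
    "Carbon": True,
    "Floods": True,
    "Cancer": False,
    "Photosynthesis": False,
}


def score(respondent_type):
    """Count correct answers for a stylized (assumed) respondent type.

    "concerned": always gives the answer that expresses concern.
    "skeptical": always gives the answer that expresses skepticism.
    "knows":     answers from knowledge, independent of affect.
    """
    total = 0
    for concern_correct in CONCERN_ANSWER_IS_CORRECT.values():
        if respondent_type == "knows":
            correct = True
        elif respondent_type == "concerned":
            correct = concern_correct
        elif respondent_type == "skeptical":
            correct = not concern_correct
        else:
            raise ValueError(f"unknown respondent type: {respondent_type}")
        total += int(correct)
    return total


scores = {t: score(t) for t in ("concerned", "skeptical", "knows")}
print(scores)  # {'concerned': 2, 'skeptical': 2, 'knows': 4}
```

Because the items are balanced, answering purely on affect earns the same half-credit on either side; only knowledge separates respondents.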

The aim of disentangling (unconfounding) affective orientation and knowledge was complemented by a more general assessment-construction tenet, which counsels use of items that feature incorrect responses that are likely to seem correct to those who do not genuinely possess the knowledge or aptitude being assessed (Osterlind 1998). Because the recent hurricanes Sandy and Irene both provoked considerable media discussion of the impact of climate change, the true-false item “[h]uman-caused global warming has increased the number and severity of hurricanes around the world in recent decades” was expected to elicit an incorrect response from many climate-concerned respondents of low or modest comprehension (who presumably would be unaware of the information the IPCC 5th Assessment (2013, I: TS p. 73) relied upon in expressing “low confidence” in “attributions of changes in tropical cyclone activity to human influence” to date, based on “low level of agreement between studies”). Similarly, the attention furnished in the media to the genuine decrease in the rate at which global temperatures increased in the last 15 years was expected to tempt respondents, particularly ones affectively disposed toward climate-change skepticism, to give the incorrect response to the true-false item “globally averaged surface air temperatures were higher for the first decade of the twenty-first century (2000-2009) than for the last decade of the twentieth century (1990-1999).”

The second step taken to address the distinctive challenge of constructing a valid OCSI assessment was to introduce the majority of items with the clause “Climate scientists believe that  . . . .” The goal was to reproduce the effect of the clause “According to the theory of evolution . . .” in eliminating the response differential among religious and nonreligious individuals to the NSF Indicators’ Evolution item.  It is plausible to attribute this result to the clause’s removal of the conflict relatively religious respondents experience between offering a response that expresses their identity and one that signifies their familiarity with a prevailing or consensus position in science.  It was anticipated that using the “Climate scientists believe” clause (and similar formulations in other items) would enable respondents whose identity is expressed by disbelief in human-caused global warming to answer  OCSI items based instead on their understanding of the state of the best currently available scientific evidence.

To be sure, this device created the possibility that respondents who disagree with climate scientists’ assessment of the best available evidence could nevertheless affirm propositions that presuppose human-caused climate change.  One reason not to expect such a result is that public opinion studies consistently find that members of the public on both sides of the climate debate  don’t think their side’s position is contrary to scientific consensus (Kahan et al. 2011).

It might well be the case, however, that what such studies are measuring is not ordinary citizens’ knowledge of the state of scientific opinion but their commitment to expressing who they are when addressing questions equivalent to “belief in” global warming. If their OCSI responses show that individuals whose cultural identity is expressed by denying the existence of human-caused global warming nevertheless do know what scientists believe about climate change, then this would be evidence that it is the “who are you, whose side are you on” question and not the “what do you know” one that they are answering when they address the issue of global warming in political settings.

Ultimately, the value of the information yielded by the OCSI responses does not depend on whether citizens “believe” what they say they know “climate scientists believe.” Whether they do or not, their answers would necessarily remain valid measures of what such respondents understand to be scientists’ view of the best available evidence. Correct perception of the weight of scientific opinion is itself a critical form of science comprehension, particularly for individuals in their capacity as democratic citizens. Items that successfully unconfound who are you, whose side are you on from what do you know enable a valid measure of this form of climate science comprehension.

Achieving this sort of decoupling was, it is important to reiterate, the overriding motivation behind construction of the OCSI measure. The OCSI measure is at best only a proto-assessment instrument. A fully satisfactory “climate science comprehension” instrument would need to be simultaneously broader—encompassing more knowledge domains—and more focused—more calibrated to one or another of the settings or roles in which such knowledge is useful.

But validly assessing climate-science comprehension in any setting will require disentangling knowledge and identity.  The construction of the OCSI instrument was thus in the nature of an experiment—the construction of a model of a real-world assessment instrument—aimed at testing whether it is possible to measure what people know about climate change without exciting the cultural meanings that force them to pick sides in a cultural status conflict.


 IPCC. Climate Change 2013: The Physical Science Basis, Working Group I Contribution to the Fifth Assessment Report of the Intergovernmental Panel on Climate Change (Cambridge University Press, Cambridge, England, 2013).

Kahan, D.M., Jenkins-Smith, H. & Braman, D. Cultural Cognition of Scientific Consensus. J. Risk Res. 14, 147-174 (2011).

Loewenstein, G.F., Weber, E.U., Hsee, C.K. & Welch, N. Risk as Feelings. Psychological Bulletin 127, 267-287 (2001).

Osterlind, S.J. Constructing test items : multiple-choice, constructed-response, performance, and other formats (Kluwer Academic Publishers, Boston, 1998).

Reynolds, T. W., Bostrom, A., Read, D., & Morgan, M. G. (2010). Now What Do People Know About Global Climate Change? Survey Studies of Educated Laypeople. Risk Analysis, 30(10), 1520-1538. doi: 10.1111/j.1539-6924.2010.01448.x

Tobler, C., Visschers, V.H.M. & Siegrist, M. Addressing climate change: Determinants of consumers' willingness to act and to support policy measures. Journal of Environmental Psychology 32, 197-207 (2012).

Slovic, P., Finucane, M.L., Peters, E. & MacGregor, D.G. Risk as Analysis and Risk as Feelings: Some Thoughts About Affect, Reason, Risk, and Rationality. Risk Analysis 24, 311-322 (2004).


Measuring "ordinary science intelligence": a look under the hood of OSI_2.0

As the 12 billion readers of this blog (we are down 2 billion, apparently because we’ve been blocked in the Netherlands Antilles & Macao. . .) know, I have been working on & reporting various analyses involving an “ordinary science intelligence” (OSI) science-comprehension measure.

Indeed, one post describing how it relates to political outlooks triggered some really weird events—more than once in fact!

But in any case, I’ve now assembled a set of analyses and put them into one document, which you can download if you like here.

The document briefly describes the history of the scale, which for now I’m calling OSI_2.0 to signify that it is the successor of the science comprehension instrument (henceforward “OSI_1.0”) featured in “The polarizing impact of science literacy and numeracy on perceived climate change risks,” Nature Climate Change 2, 732-735 (2012).

Like OSI_1.0, _2.0 is a synthesis of existing science literacy and critical reasoning scales.  But as explained in the technical notes, OSI_2.0 combines items that were drawn from a wider array of sources and selected on the basis of a more systematic assessment of their contribution to the scale’s performance.

The goal of OSI_2.0 is to assess the  capacity of individuals to recognize and give proper effect to valid scientific evidence relevant to their “ordinary” or everyday decisions—whether as consumers or business owners, parents or citizens. 

A measure of that sort of facility with science—rather than, say, the one a trained scientist or even a college or high school science student has—best fits the mission of OSI_2.0, which is to enable “empirical investigation of how individual differences in science comprehension contribute to variance in public perceptions of risk and like facts.”

Here are some of the things you, as a regular reader of this blog who has already been exposed to one or another feature of OSI_2.0, can learn from the document:

1. The items and their derivation.  The current scale consists of 18 items drawn from the NSF Indicators, the Pew Science & Technology battery, the Lipkus/Peters Numeracy scale, and Frederick’s Cognitive Reflection Test.  My next goal is to create a short-form version that performs comparably well;  8 items would be great & even 10 much better. . . . But in any case, the current 18 and their sources are specifically identified.

2. The psychometric properties of the scale. The covariance structure, including dimensionality and reliability, is set forth, of course. But the cool thing here, in my view, is the grounding of the scale in Item Response Theory.

mmmmmm... item response curves ...

There are lots of valid ways to combine or aggregate individual items, conceived of as observable or manifest “indicators,” into a scale conceived of as measuring some unobserved or latent disposition or trait.

The distinctive thing about IRT is the emphasis it puts on assessing how each item contributes to the scale’s measurement precision along the range of the disposition treated as a continuous variable. This is a nice property, in particular, when one is designing some sort of knowledge or aptitude assessment instrument, where one would like to be confident not only that one is reliably relating variance in the disposition as a whole to some outcome variable of interest but also that one is reliably assessing individual differences in levels of the disposition within the range of interest (usually the entire range).

IRT information curves for OSI_2.0 & components thereof

IRT is a great scale development tool because it helps to inform decisions not only about whether items are valid indicators but how much relative value they are contributing.

One thing you can see with IRT is that, as it is measured by the OSI_2.0 scale at least, the sort of “basic fact” items (“Electrons are smaller than atoms—true or false?”; “Does the Earth go around the Sun, or does the Sun go around the Earth?”) are contributing mainly to measurement discrimination at low levels of “ordinary science intelligence.”

One gets credit for those, certainly, but not as much as for correctly responding to the sorts of quantitative and critical reasoning items that come from the Numeracy scale and the Cognitive Reflection Test.

That’s as it should be in my view: a person who has the capacity to recognize and make use of valid science will no doubt have used it to acquire knowledge of a variety of basic propositions relating to the physical and biological sciences; but what we care about—what we want to certify and measure—is her ability to enlarge that stock of knowledge and use it appropriately to advance her ends.

3. External validity. The technical notes report analyses that show that OSI_2.0 is, unsurprisingly, correlated with education and with open-mindedness (as measured by Baron’s Actively Open-minded Thinking scale) but doesn’t reduce to them and in fact more accurately predicts performance on tasks that demand or display a distinctive science-comprehension capacity (like covariance detection).

4. Other covariates. There are correlations with race and gender but they are actually pretty small. None with political outlooks (but note: I didn’t even check for a correlation with belonging to the Tea Party—I’ve learned my lesson! Actually, I can probably be coaxed into checking & reporting this; what “identity with the Tea Party” measures is a pretty interesting question! But I’ll do it in a post in the middle of the night & written in pig latin to be sure to avoid a repeat of the sad spectacle that occurred the last time.).

"patterns ... everywhere in nature ... what about the stock market?!"

5. The science-comprehension invalidity of “belief in” questions relating to evolution and global warming. The notes illustrate the analytical/practical utility of OSI_2.0 by showing how the scale can be used to assess whether variance in responses to standard survey items on evolution and global warming reflects differences in science comprehension. It doesn’t!

That, of course, is the conclusion of my new paper Climate Science Communication and the Measurement Problem, which uses OSI_2.0 to measure science comprehension.

But the data in the notes present a compact rehearsal of the findings discussed there and also add factor analyses, which reinforce the conclusion that “belief in” evolution and “belief in” global warming items are in fact indicators of latent “group identity” variables that feature religiosity and right-left political outlooks, respectively, and not indicators of the latent “ordinary science intelligence” capacity measured by the OSI_2.0 scale.

The analyses were informed by interesting feedback I received on a post on factor analysis and scale dimensionality—maybe the commentators on that one will benefit me with additional feedback!
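To make the IRT point in item 2 concrete, here is a minimal sketch of how item information varies along the latent trait. It assumes a two-parameter logistic (2PL) model, which is one common IRT model and not necessarily the one used in the technical notes, and the item parameters are invented for illustration rather than taken from OSI_2.0.

```python
import math


def p_correct(theta, a, b):
    """2PL probability of a correct response at latent level theta.

    a is the item's discrimination, b its difficulty.
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def item_information(theta, a, b):
    """Fisher information of a 2PL item: a^2 * P * (1 - P)."""
    p = p_correct(theta, a, b)
    return a * a * p * (1.0 - p)


# Hypothetical parameters, NOT estimates from OSI_2.0: an easy "basic fact"
# item and a harder, more discriminating "reasoning" item.
ITEMS = {
    "basic_fact": {"a": 1.2, "b": -1.5},
    "reasoning": {"a": 1.8, "b": 1.0},
}


def most_informative(theta):
    """Name of the item contributing the most information at theta."""
    return max(ITEMS, key=lambda name: item_information(theta, **ITEMS[name]))


for theta in (-2.0, 0.0, 2.0):
    print(theta, most_informative(theta))
```

Under these assumed parameters the easy item carries most of the information at low trait levels and the harder item takes over higher up, which is the pattern described above for the "basic fact" items versus the Numeracy and Cognitive Reflection Test items.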



"Bounded rationality": the Grigori Rasputin of explanations for public perceptions of climate change risk

Another excerpt from Climate Science Communication and the Measurement Problem. 

4.  Is identity-protective cognition irrational?

The idea that “disbelief” in global warming is attributable to low “science literacy” is not the only explanation for public conflict over climate change that fails to survive an encounter with actual evidence. The same is true for the proposition that such controversy is a consequence of “bounded rationality.”

Indeed, the “bounded rationality thesis” (BRT) is probably the most popular explanation for public controversy over climate change.  Members of the public, BRT stresses, rely on “simplifying heuristics” that reflect the emotional vividness or intensity of their reactions to putative risk sources (Marx, Weber, Orlove, Leiserowitz, Krantz, Roncoli & Phillips 2007) but that often have “little correspondence to more objective measures of risk” (Weber 2006).  Those more objective measures, which “quantify either the statistical unpredictability of outcomes or the magnitude or likelihood of adverse consequences” (id.), are the ones that scientists employ. Using them demands an alternative “analytical processing” style that is acquired through scientific training and that “counteract[s] the emotionally comforting desire for confirmation of one’s beliefs” (Weber & Stern 2011).

BRT is very plausible, because it reflects a genuine and genuinely important body of work on the role that overreliance on heuristic (or “System 1”) reasoning as opposed to conscious, analytic (“System 2”) reasoning plays in all manner of cognitive bias (Frederick 2005; Kahneman 2003). But many more surmises about how the world works are plausible than are true (Watts 2011). That is why it makes sense for science communication researchers, when they are offering advice to science communicators, to clearly identify accounts like BRT as “conjectures” in need of empirical testing rather than as tested “explanations.”

BRT generates a straightforward hypothesis about perception of climate change risks. If the reason ordinary citizens are less concerned about climate change than they should be is that they over-rely on heuristic, System 1 forms of reasoning, then one would expect climate concern to be higher among the individuals most able and disposed to use analytical, System 2 forms of reasoning. In addition, because these conscious, effortful forms of analytical reasoning are posited to “counteract the emotionally comforting desire for confirmation of one’s beliefs” (Weber & Stern 2011), one would also predict that polarization ought to dissipate among culturally diverse individuals whose proficiency in System 2 reasoning is comparably high.

This manifestly does not occur.  Multiple studies, using a variety of cognitive proficiency measures, have shown that individuals disposed to be skeptical of climate change become more so as their proficiency and disposition to use the forms of reasoning associated with System 2 increase (Hamilton, Cutler & Schaefer 2012; Kahan, Peters et al. 2012; Hamilton 2011).  In part for this reason—and in part because those who are culturally predisposed to be worried about climate change do become more alarmed as they become more proficient in analytical reasoning—polarization is in fact higher among individuals who are disposed to make use of System 2, analytic reasoning than it is among those disposed to rely on System 1, heuristic reasoning (Kahan, Peters et al. 2012).  This is the result observed among individuals who are highest in OSI, which in fact includes Numeracy and Cognitive Reflection Test items shown to predict resistance to System 1 cognitive biases (Figure 6).
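The difference between what BRT predicts about polarization and what the studies cited above report can be expressed as the sign of an interaction term. The toy model below is only a sketch: the linear form, the 0-1 scales, and every coefficient are assumptions chosen for illustration, not estimates from any of the cited data.

```python
def predicted_concern(identity, proficiency, interaction):
    """Toy linear model of climate concern on a 0-1 scale (all values assumed).

    identity:    +1 if culturally predisposed to concern, -1 if to skepticism
    proficiency: 0.0 (low System 2 proficiency) to 1.0 (high)
    interaction: how the identity effect changes as proficiency rises
    """
    baseline = 0.5
    identity_effect = 0.2 + interaction * proficiency
    return baseline + identity * identity_effect


def polarization_gap(proficiency, interaction):
    """Difference in predicted concern between the two identity groups."""
    return (predicted_concern(+1, proficiency, interaction)
            - predicted_concern(-1, proficiency, interaction))


# Hypothetical interaction coefficients for the two accounts.
BRT_INTERACTION = -0.2       # BRT: the gap shrinks as proficiency rises
IDENTITY_INTERACTION = 0.2   # identity protection: the gap widens

for label, coef in (("BRT", BRT_INTERACTION),
                    ("identity-protective", IDENTITY_INTERACTION)):
    low = polarization_gap(0.0, coef)
    high = polarization_gap(1.0, coef)
    print(f"{label}: gap at low proficiency {low:.2f}, at high {high:.2f}")
```

The sketch leaves out BRT's other prediction, that concern rises with proficiency overall; it isolates only the prediction about polarization, which is the one the reported pattern contradicts.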

The source of the public conflict over climate change is not too little rationality but in a sense too much. Ordinary members of the public are too good at extracting from information the significance it has in their everyday lives. What an ordinary person does—as consumer, voter, or participant in public discussions—is too inconsequential to affect either the climate or climate-change policymaking. Accordingly, if her actions in one of those capacities reflect a misunderstanding of the basic facts on global warming, neither she nor anyone she cares about will face any greater risk. But because positions on climate change have become such a readily identifiable indicator of one’s cultural commitments, adopting a stance toward climate change that deviates from the one that prevails among her closest associates could have devastating consequences, psychic and material. Thus, it is perfectly rational—perfectly in line with using information appropriately to achieve an important personal end—for that individual to attend to information in a manner that more reliably connects her beliefs about climate change to the ones that predominate among her peers than to the best available scientific evidence (Kahan, 2012).

If that person happens to enjoy greater proficiency in the skills and dispositions necessary to make sense of such evidence, then she can simply use those capacities to do an even better job at forming identity-protective beliefs. That people high in numeracy, cognitive reflection, and like dispositions use these abilities to find and credit evidence supportive of the position that predominates in their cultural group and to explain away the rest has been demonstrated experimentally (Kahan, Peters, Dawson & Slovic 2013; Kahan 2013b). Proficiency in the sort of reasoning that is indeed indispensable for genuine science comprehension does not bring the beliefs of individuals on climate change into greater conformity with those of scientists; it merely makes those individuals’ beliefs even more reliable indicators or measures of the relationship between those beliefs and the identities of those who share their defining commitments.

When “what do you believe” about a societal risk validly measures “who are you?,” or “whose side are you on?,” identity-protective cognition is not a breakdown in individual reason but a form of it. Without question, this style of reasoning is collectively disastrous: the more proficiently it is exercised by the citizens of a culturally diverse democratic society, the less likely they are to converge on scientific evidence essential to protecting them from harm. But the predictable tragedy of this outcome does not counteract the incentive individuals face to use their reason for identity protection.  Only changing what that question measures—and what answers to it express about people—can. 


Frederick, S. Cognitive Reflection and Decision Making. Journal of Economic Perspectives 19, 25-42 (2005).

Hamilton, L.C. Education, politics and opinions about climate change: evidence for interaction effects. Climatic Change 104, 231-242 (2011).

Hamilton, L.C., Cutler, M.J. & Schaefer, A. Public knowledge and concern about polar-region warming. Polar Geography 35, 155-168 (2012).

Kahan, D.M. Ideology, Motivated Reasoning, and Cognitive Reflection. Judgment and Decision Making 8, 407-424 (2013b).

Kahan, D.M., Peters, E., Dawson, E. & Slovic, P. Motivated Numeracy and Enlightened Self-Government. Cultural Cognition Project Working Paper No. 116 (2013).

Kahan, D.M., Peters, E., Wittlin, M., Slovic, P., Ouellette, L.L., Braman, D. & Mandel, G. The polarizing impact of science literacy and numeracy on perceived climate change risks. Nature Climate Change 2, 732-735 (2012).

Kahneman, D. Maps of Bounded Rationality: Psychology for Behavioral Economics. Am Econ Rev 93, 1449-1475 (2003).

Marx, S.M., Weber, E.U., Orlove, B.S., Leiserowitz, A., Krantz, D.H., Roncoli, C. & Phillips, J. Communication and mental processes: Experiential and analytic processing of uncertain climate information. Global Environ Chang 17, 47-58 (2007).

Weber, E. Experience-Based and Description-Based Perceptions of Long-Term Risk: Why Global Warming does not Scare us (Yet). Climatic Change 77, 103-120 (2006).

Weber, E.U. & Stern, P.C. Public Understanding of Climate Change in the United States. Am. Psychologist 66, 315-328 (2011).


Five theses on climate science communication (lecture summary & slides)

The following is the outline of a lecture that I gave at the super awesome Royal Canadian Institute for the Advancement of Science on June 25, 2014 (slides here). The audience comprised a large group of people united by their curiosity and love of science but otherwise as diverse as the pluralistic democracy in which they live; it was an honor to be able to engage them in conversation. My paper Climate science communication and the measurement problem elaborates on the themes and presents additional data. 


What ordinary members of the public “believe” about climate change doesn’t reflect what they know; it expresses who they are.  

Responses to survey questions on “belief in” evolution have no correlation with understanding of evolution or with comprehension of science generally.  Instead, they indicate a cultural identity that features religiosity.

The same goes for survey questions on “belief in” human-caused climate change. Responses to them are interchangeable with responses to survey items used to measure political and cultural outlooks and they have no correlation either to understanding of climate science or science comprehension generally.


Public confusion over climate is not a consequence of defects in rationality; it is a consequence of the rational effect people give to information when they live in a world in which competing positions on disputed risks express membership in opposing cultural groups.

“Bounded rationality”—or limitations in the capacity of most people to give appropriate effect to scientific information on risk—is the most popular explanation for persistent public confusion over climate change. But the durability of this claim itself reflects a form of persistent inattention to empirical evidence, which shows that political polarization over global warming is most intense among those segments of the population whose critical reasoning proficiencies make them the least prone to cognitive bias.

The BR hypothesis misunderstands what ordinary people are doing when they engage information on climate change and other culturally disputed risk issues.  They can’t plausibly be understood to be trying to minimize their exposure to the danger those risk sources pose, since their personal beliefs and actions are too inconsequential to have any impact. 

The positions they take will be understood, however, to signify their membership in and loyalty to one or another competing cultural group. To protect their standing in such a group—membership in which is vital to their emotional & material well-being—individuals can be expected to give to information the effect that aligns them most reliably with their group.  The more acute their powers of reasoning, moreover, the better a job they will do in this regard.

The problem is not too little rationality but too much in a world in which positions on risks and other policy-relevant facts have become entangled in cultural status competition.


Communicating valid science about climate change (or about the expert consensus of climate scientists) won’t dispel public conflict; only dissolving the connection between positions on the issue and membership in competing cultural groups will.


If individuals are using their reason to fit information to the positions that reinforce their connection to identity-defining groups, then bombarding them with more and more information won’t diminish polarization. Indeed, studies show that individuals selectively credit and discredit all manner of evidence—including scientific-consensus “messaging” campaigns—in patterns that enable them to persist in identity-defining beliefs.

Because that form of reasoning is rational—because it promotes individuals’ well-being at a personal level—the only way to prevent it is to change the relationship that holding positions on global warming has with the identities of culturally diverse citizens.


Ordinary members of the public already know everything they need to about climate science; the only thing they don’t know (yet) is that the people they recognize as competent and informed use climate science in making important decisions.

Survey items that assess “belief in” human-caused global warming don’t measure what people know about climate change, but that doesn’t mean nothing can.

As is the case for assessing knowledge relating to evolution, it is possible to design a “climate science literacy” instrument that disentangles expressions of knowledge from expressions of group identity.

The administration of such a test to a nationally representative sample shows that in fact there is little meaningful difference among culturally diverse citizens, who uniformly understand climate change to be a serious risk.

That shared understanding does not lead to popular political support for policies to mitigate climate change, however, because the question “climate change” poses as a political issue is the same one posed by the survey measures of what people “believe” about it: not what do you know but who are you, whose side are you on?

People recognize and make use of all manner of decision-relevant science not by “understanding” it but by aligning their own behavior consistently with that of people they trust and recognize as socially competent.

The actors that members of diverse groups look to in fact are already making extensive use of climate science in their individual and collective decisionmaking.

Climate science communicators ought to be making it easier for members of all groups to see that.  Instead, they are trapped in forms of advocacy—including perpetual, carnival-like “debates”—that fill the science communication environment with toxic forms of cultural animosity.


What needs to be communicated to ordinary decisionmakers is normal climate science; what needs to be communicated to ordinary people is that using climate science is as normal for people like them as using the myriad other kinds of science they rely on to make their lives go well.

Practical decisionmakers of all sorts eagerly seek and use information about climate science. The scientists who furnish that information to them (e.g., those at NCAR and the ones in the Department of Agriculture) do an outstanding job.

But what ordinary people, in their capacity as citizens, need to know is not “normal climate science”; it is the normality of climate science. They need to be shown that those whom they trust and recognize as competent already are using climate science in their practical decisionmaking.

That is the form of information that ordinary members of the public ordinarily rely on to align themselves with the best available scientific evidence.

It is also the only signal that can be expected to break through and dispel the noise of cultural antagonism that is now preventing constructive public engagement with climate science.

There are small enclaves in which enlightened democratic leaders are enabling ordinary people to communicate the normality of climate science to one another.

The rest of us should follow their example.


3 kinds of validity: Internal, external & operational

Some of the thoughtful things people said in connection with my 3-part series on the “external validity” of science-communication studies made me realize that it would be  helpful to say a bit more about that concept and its connection to doing evidence-based science communication.

In the posts, I described “internal validity” as referring to qualities of the design that support drawing inferences about what is happening in the study, and “external validity” as referring to qualities of the design that support drawing inferences from the study to the real-world dynamics it is supposed to be modeling.

I’m going to stick with that.

But what makes me want to elaborate is that I noticed some people understood me to be referring to “external validity” more broadly as the amenability of a science-communication study to immediate or direct application.  I was thought to be saying “be careful: you can’t just take the stimulus of a ‘framing’ experiment or whathaveyou, send it to people in the mail or wave it around, etc., and expect to see the results from the lab reproduced in the world.”

I would (often) say that!

But I’d say it about many studies that are externally valid.

That is, these studies are modeling something of consequence in the world, and telling us things about how those dynamics work that it is important to know.  But they aren’t always telling us what to do to make effective use of that knowledge in the world.

That’s usually a separate question, requiring separate study. 

This is the very point I stress in my paper, “Making Climate Science Communication Evidence-based—All the Way Down.” There I say there must be no story-telling anywhere in an evidence-based system of science communication.

It’s a mistake—an abuse of decision-science—for someone (anyone, including a social scientist) to reach into the grab-bag of mechanisms, pull out a few, fabricate a recommendation for some complicated phenomenon, and sell it to people as “empirically grounded” etc.

Because there are in fact so many real mechanisms of cognition that play a role in one or another aspect of risk perception and the like, there will always be more plausible accounts of some problem—like the persistence of public conflict over climate change—than are true!

Such accounts are thus conjectures or hypotheses that warrant study, and should be clearly designated as such.

The hypotheses have to be tested—with internally and externally valid methods—designed to generate evidence that warrants treating one or another conjecture as more worthy of being credited than another.

Very very important!

But almost never enough. 

The kinds of studies that help to decide between competing plausible mechanisms in science communication typically use simplified models of the real-world problem in question.  The models deliberately abstract away from the cacophony of influences in those settings that make it impossible to be sure what’s going on.

An internally valid study is one that has successfully isolated competing mechanisms from these confounding effects and generated observations that give us more reason to credit one, and less reason to credit the other, than we otherwise would have had.

(Yes, one can test “one” mechanism against the “null” but then one is in effect testing that mechanism against all others. Such designs frequently founder on the shoals of internal validity precisely because, when they “reject the null,” they fail to rule out that some other plausible mechanism could have produced the same effect. I’ll elaborate on why it makes more sense to use designs that examine the relative strength of competing mechanisms instead “tomorrow.”)
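To make the parenthetical concrete, here is a toy sketch in Python. It is not taken from any study discussed here. It assumes a hypothetical experiment in which 62 of 100 subjects show some effect, and two hypothetical mechanisms, A and B, that predict effect rates of 0.60 and 0.65; every one of those numbers is invented for illustration. The likelihood ratio indicates how much more consistent the observation is with one account than with another.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials with rate p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# All values below are invented for illustration.
N_SUBJECTS = 100      # hypothetical sample size
N_EFFECT = 62         # hypothetical number of subjects showing the effect
P_NULL = 0.50         # "no effect" benchmark
P_MECH_A = 0.60       # effect rate predicted by hypothetical mechanism A
P_MECH_B = 0.65       # effect rate predicted by hypothetical mechanism B

like_null = binom_pmf(N_EFFECT, N_SUBJECTS, P_NULL)
like_a = binom_pmf(N_EFFECT, N_SUBJECTS, P_MECH_A)
like_b = binom_pmf(N_EFFECT, N_SUBJECTS, P_MECH_B)

# How much more probable is the observation under A than under the null?
lr_a_vs_null = like_a / like_null
# How much more probable is the observation under A than under B?
lr_a_vs_b = like_a / like_b

print(f"A vs null: {lr_a_vs_null:.2f}")
print(f"A vs B:    {lr_a_vs_b:.2f}")
```

Under these made-up assumptions, the observation clearly favors mechanism A over the null, yet barely discriminates between A and B. That is the situation a design built around competing mechanisms is meant to avoid.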

Such a study is useful, of course, only if the mechanisms that are being tested really are of consequence in the real-world, and only if the simplifying model hasn’t abstracted away from influences of consequence for the operation of those mechanisms.

That’s the focus of external validity.

But once someone has done all that—guess what?

Such a study won’t (or at least almost never will) tell a real-world communicator “what to do.”

How could it? The researcher, in order to be confident that she is observing the influence of the mechanisms of interest and that they are behaving in ways responsive to whatever experimental manipulation she performed, has deliberately created a model that abstracts away from all the myriad influences that apply in any particular real-world setting.

If the study succeeds, it helps to identify what plausible mechanisms of consequence a real-world communicator should be addressing and—just as importantly—which plausibly consequential ones he should in fact ignore.

But there will also be more plausible ways to engage that mechanism, in a manner that reproduces in the world the results the experimenter observed in the lab, than are true!

The only way to connect the insight generated by the lab study to the real-world is to do in the real-world exactly what was done in the lab to sort through the surplus of plausible conjectures: that is, by constructing internally and externally valid field studies that give real-world communicators more reason to believe than they had before that one plausible conjecture about how to engage the communication mechanism of consequence is more likely correct than another one.

In other words, evidence-based science communication practice must be evidence-based all the way down.

No story telling in lieu of internally and externally valid studies of the mechanisms of cognition that one might surmise are at work.

And no story telling about how a lab study supports one or another real-world strategy for communication.

Researchers who carry on as if their lab studies support concrete prescriptions in particular real-world settings are being irresponsible.  They should instead be telling real-world communicators exactly what I’m saying here: that field testing, informed by the judgment of those who have experience in the relevant domain, is necessary.

And if they have the time, inclination, and patience, they should then offer to help carry out such studies.

This is the m.o. of the Southeast Florida Evidence-based Science Communication Initiative that the Cultural Cognition Project, with very generous and much appreciated funding from the Skoll Global Threats Fund, is carrying out in support of the science-communication efforts of the Southeast Florida Climate Compact.

But now, getting back to the “external validity” concept, it should be easier to see that when I say a study is "externally invalid," I’m not saying that it doesn’t generate an immediately operational communication strategy in the field.  

It won't.

But the same can be said for almost all externally valid lab studies.

When I say that a study isn’t “externally valid,” I’m saying it is in fact not modeling the real-world dynamics of consequence.  Accordingly, I mean to be asserting that it furnishes no reliable guidance at all.

So to be clear about all this, let’s add a new term to the discussion: operational validity.

“Operational validity,” a term I’m adapting from Schellenberger (1974), refers to that quality of a study design that supports the inference that doing what was done in the study will itself generate in the real-world the effects observed in the study.

A study has “high operational validity” if in fact it tests a communication-related technique that real-world actors can themselves apply and expect to work.  For the most part, those will be field-based studies.

A study that is internally and externally valid has “low operational validity” if, in order for it to contribute to science communication in the real-world, additional empirical studies connecting that study’s insights to one or another real-world communication setting will still need to be performed. 

A study with “low operational validity” can still be quite useful.

Indeed, there is often no realistic way to get to the point where one can conduct studies with high operational validity without first doing the sort of stripped-down, pristine “low operational validity” lab studies suited to winnowing down the class of cognitive mechanisms plausibly responsible for any science-communication problem.

But the fact is that when researchers have generated these sorts of studies, more empirical work must still be done before a responsible science-communication advisor can purport to answer the “what do I do?” question (or answer it other than by saying "you tell me!  & I'll measure ...").


Three distinct concepts: internal validity; external validity; operational validity.

All three matter.

This is, admittedly, too abstract a discussion.  I should illustrate.  But I’ve spent enough time on this post (about 25 mins; 30 mins is the limit).

If there is interesting discussion, then maybe I’ll do another post calling attention to examples suggested by others or crafted by me.


Schellenberger, R.E. Criteria for Assessing Model Validity for Managerial Purposes. Decision Sciences 5, 644-653 (1974).



Climate science literacy, critical reasoning, and independent thinking ...

Who you are, not what you know...

My paper “Climate Science Communication and the Measurement Problem” features a “climate science literacy” (CSL) test.

I’ve posted bits & pieces of the paper & described some of the data it contains.  But I really haven’t discussed in the blog what I regard as the most important thing about the CSL results.

This has to do with the relationship between the CSL scores, critical reasoning, and independent or non-conformist thinking.  I’ll say something—I doubt the last thing—about that now!

1. The point of the exercise: disentangling knowledge from identity. I’ll start with the basic point of the CSL—or really the basic point of the study that featured it and the Measurement Problem paper.

Obviously (to whom? the 14 billion regular readers of this blog!), I am not persuaded that conflict over culturally disputed risks in general and climate change in particular originates in public misunderstandings of the science or the weight of scientific opinion on those issues. 

That gets things completely backwards, in fact: It is precisely because there is cultural conflict that there is so much public confusion about what the best available evidence is on the small (it is small) class of issues that display this weird, pathological profile.

Given the stake they have in protecting their status in these groups, people can be expected to attend to evidence—including evidence about the “weight of scientific opinion” (“scientific consensus”)—in a manner that reliably connects their beliefs to the position that prevails in their identity-defining groups.

But there are two ways (at least) to understand the effect of this sort of identity-protective reasoning.  In one, the motivated assimilation of information to the positions that predominate in their affinity groups generates widespread confusion over what “position” is supported by the best available scientific evidence.

Call this the “unitary conception” of the science communication problem.

Under the alternative “dualist conception,” “positions” on societal risk issues become bifurcated.  They are known to be both badges of group membership and matters open to scientific investigation.

Applying their reason, individuals will form accurate comprehensions of both positions.  

Which they will act on or express, however, depends on what sort of “knowledge transaction” they are in.  If individuals are in a transaction where their success depends on forming and acting on the position that accurately expresses who they are, then that “position” is the one that will govern the manner in which they process and use information.

If, in contrast, they are in a “knowledge transaction” where their success depends on forming and acting on the positions that are supported by the best available evidence, then that is the “position” that will orient their reasoning.

For most people, most of the time, getting the “identity-expressive position” right will matter most. Whereas people have a tremendous stake in their standing in cultural affinity groups, their personal behavior has no meaningful impact on the danger that climate change or other societal risks pose to them or others they care about.

But still, every one of them does have an entirely separate understanding of the “best-available-evidence” position.  We don’t see that—we see only cultural polarization on an issue like climate change—because politics confronts them with “identity-expressive” knowledge transactions only.

So too do valid methods of public opinion study (observational and experimental) geared to modeling the dynamics of cultural conflict over climate science.

Politics and valid studies both assess citizens' climate-science knowledge with questions that measure who they are, whose side they are on.

But if we could form a reliable and valid measure that disentangles what people know from who they are, we would then see that these are entirely different things, entirely independent objects of their reasoning.

Or so says the "dualist" view of the science communication problem.

The aim of the “climate science literacy” or CSL measure that I constructed was to see if it was possible to achieve exactly this kind of disentanglement of knowledge and identity on climate change.

I refer to the CSL measure, in the paper and in this blog, as a “proto-” climate-science literacy instrument.  That’s because it's only a step toward developing a fully satisfactory instrument for measuring what people know about climate science. 

Indeed, the idea that there could be an instrument of that sort is absurd. There would have to be a variety, geared to assessing the sort of knowledge that individuals in various settings and roles (“high school student,” “business decisionmaker,” “policymaker,” “citizen” etc.) have to have.

But if the “dualist” conception of the science communication problem is correct, then in any such setting, a CSL, to be valid, would have to be designed to measure what people know and not who they are.

Seeing whether that could be done was the mission of my CSL measure. In that respect, there is nothing “proto-” about it.   

2. The strategy

The strategy I followed to construct a CSL of this sort is discussed, of course, in the paper.  But that strategy consisted of basically two things.

The first was an effort to create a set of items that would avoid equating “climate science literacy” with an affective orientation toward climate change. 

For the most part, that’s what perceptions of societal risks are: feelings with a particular valence and intensity.  As such, these affective orientations are more likely to shape understandings of information than be shaped by them.

The affective orientation toward climate change expresses who people are as members of opposing cultural groups engaged in a persistent and ugly form of status competition.  If we ask “climate science literacy” questions the answers to which clearly correspond to the ones people use to express their group identities, their answers will tell us who they are, and not necessarily what they know.

To avoid this confound, I tried to select a set of items the correct responses to which were balanced with respect to the affective attitudes of “concern” and “skepticism.”  Scoring high on the test, then, would be possible only for those whose answers were not “entangled” in the sort of affective reaction that defines who they are, culturally speaking.
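The balancing idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the four items are hypothetical placeholders (not the actual CSL items), and the `Item` fields, `score`, and `affect_only` are names invented here. Each item records the correct answer and the answer a respondent would give if guided purely by "concern"; a purely "skeptical" respondent is assumed to give the opposite answer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    text: str            # placeholder wording, not a real CSL item
    key: bool            # the correct true/false answer
    concern_says: bool   # answer expressing "concern"; "skepticism" is the opposite


# Hypothetical balanced set: the key agrees with "concern" on half the
# items and with "skepticism" on the other half.
ITEMS = [
    Item("placeholder item 1", key=True, concern_says=True),
    Item("placeholder item 2", key=False, concern_says=True),
    Item("placeholder item 3", key=True, concern_says=False),
    Item("placeholder item 4", key=False, concern_says=False),
]


def score(answers, items=ITEMS):
    """Fraction of items answered in agreement with the key."""
    return sum(a == it.key for a, it in zip(answers, items)) / len(items)


def affect_only(items, concerned):
    """Answers of a respondent who consults only affect, never knowledge."""
    return [it.concern_says if concerned else not it.concern_says for it in items]


concerned_score = score(affect_only(ITEMS, concerned=True))
skeptic_score = score(affect_only(ITEMS, concerned=False))
informed_score = score([it.key for it in ITEMS])

print(concerned_score, skeptic_score, informed_score)
```

In this toy setup, answering from affect alone earns the same middling score whichever side one is on; only answers that track the key, independently of affect, score high.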

Second, I used a semantic device that has proven successful in disentangling identity and knowledge in measuring people’s positions on evolution.

As I’ve discussed in this blog (and as I illustrate with data in the paper), the true-false question “humans evolved from another species of animal” doesn’t measure understanding of evolution or science comprehension generally.  Rather it measures a form of identity indicated by religiosity.

But if one simply prefaces the statement “According to the theory of evolution,” the question elicits responses that don’t vary based on respondents’ religiosity. Because it doesn’t force them to renounce who they are, the reworded question makes it possible for religious respondents to indicate what they know about the position of science.  (The question is then revealed, too, to be far too easy to tell us anything interesting about how well the person answering it comprehends science.)

I thus used this same device in constructing the CSL items. I either prefaced true-false ones with the phrase “Climate scientists believe . . .” or used some other form of wording that clearly separated “knowledge” from “belief.”

3. The “results”

The results strongly supported the “dualistic” position, i.e., that what people know about climate change is unrelated to their “belief in” human-caused climate change.  Their position on that measures who they are in the same manner as items involving their political outlooks generally.

In this way, it becomes possible to see that the cultural polarization that attends climate change is also not a consequence of the effect that cultural cognition has on people’s comprehension of climate science.

It is a consequence of the question that “climate change” poses to ordinary citizens.

Democratic politics is one of the “knowledge transactions” that measures who one is, whose side one is on, not what one knows about the weight of the best scientific evidence.

People on both sides of the issue, it turns out, don’t know very much at all about climate science.

But if democratic politics were asking them “what they know,” the answer would be a bipartisan chorus of, “We are in deep shit.”

So climate communicators should be working on changing the meaning of the question—on creating conditions that, like the reworded evolution question and related classroom instructional techniques in that setting, make it possible for citizens to express what they know without renouncing who they are.

If you want to see how that's done, book yourself a flight down to SE Florida.  Right now.

4. The “holy shit!” part: the vindication of reason as a source of independent thinking

Now, finally, I get to what for me is the most gratifying part: the vindication of critical reasoning.

The CSL measure featured in the paper is positively correlated with science comprehension in both “liberal Democrats” and “conservative Republicans”!

Why is this so amazing?

As the 14 billion regular readers of this blog know, a signature of the pathology that has infected public discourse on climate change is the impact of science comprehension in magnifying polarization.

The individuals whose science comprehension and critical reasoning dispositions are most acute are the most polarized.

What you *know*-- not who you are!

Experiments show that individuals high in the dispositions measured by science literacy batteries, the Cognitive Reflection Test, the Numeracy scale and the like use their reasoning proficiency to selectively conform their assessment of evidence to the position that predominates in their group.

Polarization over climate change is not a sign that people in our society lack science comprehension.

It is proof of how hostile the putrid spectacle of cultural status competition is to the value our society should be getting from the science intelligence it manages to impart in its citizens.

As the 14 billion regular readers know, too, this doesn’t amuse me.  On the contrary, it fills me with despair.

I was heartened in a simple “methods” sense that the CSL had the indicated relationship with science comprehension.  That the two rise in tandem helps to validate the CSL as a measure of what people know, and to corroborate the conclusion that “what do you believe about climate change?,” on which polarization increases as people become more science comprehending, measures nothing other than who they are, what side they are on.
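The two patterns described above can be checked with a short Python sketch. The rows below are invented for illustration and are not the study's data; the column layout and the helper names (`pearson`, `csl_correlation`, `belief_gap`) are likewise assumptions made here. The sketch computes the correlation between science comprehension and CSL score within each group, and the between-group gap in "belief" at low versus high comprehension.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Invented rows: (group, science_comprehension, csl_score, believes_human_caused)
ROWS = [
    ("liberal", 1, 3, 0),
    ("liberal", 2, 4, 1),
    ("liberal", 3, 6, 1),
    ("liberal", 4, 7, 1),
    ("conservative", 1, 3, 1),
    ("conservative", 2, 5, 0),
    ("conservative", 3, 6, 0),
    ("conservative", 4, 8, 0),
]


def csl_correlation(group):
    """Correlation of science comprehension with CSL score inside one group."""
    sub = [r for r in ROWS if r[0] == group]
    return pearson([r[1] for r in sub], [r[2] for r in sub])


def belief_gap(min_comp, max_comp):
    """Absolute between-group difference in belief rate within a comprehension band."""
    def rate(group):
        sub = [r[3] for r in ROWS if r[0] == group and min_comp <= r[1] <= max_comp]
        return sum(sub) / len(sub)
    return abs(rate("liberal") - rate("conservative"))


r_lib = csl_correlation("liberal")
r_con = csl_correlation("conservative")
gap_low = belief_gap(1, 2)    # lower-comprehension respondents
gap_high = belief_gap(3, 4)   # higher-comprehension respondents

print(f"CSL vs comprehension: liberal r={r_lib:.2f}, conservative r={r_con:.2f}")
print(f"belief gap: low={gap_low:.2f}, high={gap_high:.2f}")
```

With these made-up rows, CSL rises with comprehension in both groups while the belief gap widens, which is the shape of the result described here; real data would of course need a proper sample and significance testing.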

But on an emotional level, I was much more than simply heartened.

I was elated to see the vitality of reason and critical thinking as a source of independent thinking and open-mindedness—to be assured that in fact this aspect of our intelligence hadn’t been annihilated by the sickness of cultural status competition, if it ever existed in the first place.

Remember, the CSL was deliberately designed to disentangle knowledge from identity. 

One of the central devices used to achieve this effect was to balance the items so that respondents’ affective orientation toward climate change—concern or skepticism—would be uncorrelated with their CSL scores.

Thus, to do well on the CSL, individuals had to answer the questions independently of their affective orientations, and hence independently of the source of them: their cultural identities.

The people who did that the most successfully were those who scored the highest in science comprehension, a disposition that features critical reasoning skills like cognitive reflection and numeracy, as well as substantive science knowledge.

More later on this, but look: here are your Ludwicks!

This is what happens when one measures what people know.

But this is how it can be, too, in our political life.

If we can just make democratic politics into the sort of “knowledge-assessment transaction” that doesn’t force people to choose between expressing what they know and expressing who they are.