Who distrusts whom about what in the climate science debate?

I had the privilege of being part of a panel discussion last Fri. at the great “Scienceonline Climate” conference in Wash. D.C. The other panel members were Tom Armstrong, Director of National Coordination for the U.S. Global Change Research Program in the Office of Science and Technology Policy; and Michael Mann, Distinguished Professor of Meteorology & Director, Earth System Science Center at Penn State University; an author of the Observed Climate Variability and Change chapter of the Intergovernmental Panel on Climate Change (IPCC) Third Scientific Assessment Report in 2001; organizing committee chair for the National Academy of Sciences Frontiers of Science in 2003; and contributing scientist to the 2007 Nobel Peace Prize awarded to the IPCC. Pretty cool!

Topic was “Credibility, Trust, Goodwill, and Persuasion.” Moderator Liz Neely (who expended most of her energy skillfully moderating the length of my answers to questions) framed the discussion around the recent blogosphere conflagration ignited by Tamsin Edwards’ column in the Guardian.

Edwards seemed to pin the blame for persistent public controversy over what’s known about climate change on climate scientists themselves, arguing that “advocacy by climate scientists has damaged trust in the science.”

Naturally, her comments provoked a barrage of counterarguments from climate scientists and others, many of whom argued that climate scientists are uniquely situated to guide public deliberations into alignment with the best available scientific evidence.

All very interesting!

But I have a different take from those on both sides. 

Indeed, the take is sufficiently removed from what both seem to assume about how scientists' position-taking influences public beliefs about climate change and other issues that I really just want to put that whole debate aside.

Instead I'll rehearse the points I tried to inject into the panel discussion (slides here).

If I can manage to get those points across, I think it won’t really be necessary, even, for me to say what I think about the contending claims about the role of “scientist advocacy” in the climate debate.  That’ll be clear enough.

Those points reduce to three:

1. Members of the public do trust scientists.

2. Members of culturally opposing groups distrust each other when they perceive their status is at risk in debates over public policy.

3. When facts become entangled in cultural status conflicts, members of opposing groups (all of whom do trust scientists) will form divergent perceptions of what scientists believe.

To make out these three points, I focused on two CCP studies, and an indisputable but tremendously important and easily ignored fact.

The first study examined “who believes what and why” about the HPV vaccine. In it we found that members of the cultural groups who are most polarized on the risks and benefits of the HPV vaccine both treat the positions of public health experts as the most decisive factor.

Members of both groups have predispositions—ones that both shape their existing beliefs and motivate them to credit and discredit evidence selectively, in patterns that amplify polarization when they are exposed to information.

But members of both groups trust public health experts to identify what sorts of treatments are best for their children. They will thus completely change their positions if a trusted public health expert is identified as the source of evidence contrary to their cultural predispositions.

Of course, members of the public tend to trust experts whose cultural values they share. Accordingly, if they are presented with multiple putative experts of opposing cultural values, then they will identify the one whom they (tacitly!) perceive has values closest to their own as the real expert—the one who really knows what he’s talking about and can be trusted—and do what he (we used only white males in the study to avoid any confounds relating to race and gender) says.

There is only one circumstance in which these dynamics produce polarization: when members of the public form the perception that the position they are culturally predisposed to accept is being uniformly advanced by experts whose values they share and that the position they are culturally predisposed to reject is being uniformly advanced by experts whose values they reject.

That was the one we got in the real world...

The second study examined “cultural cognition of scientific consensus.” In that one, we examined how individuals identify expert scientists on culturally charged issues—viz., climate change, gun control, and nuclear waste disposal.

We found that when shown a single scientist with credentials that conventionally denote expertise —a PhD from a recognized major university, a position on the faculty of such a university, and membership in the National Academy of Sciences—individuals readily identified that scientist as an “expert” on the issue in question.

But only if that scientist was depicted as endorsing the position that predominates among members of the subjects’ own cultural group. Otherwise, subjects dismissed the scientist’s views on the ground that he was not a genuine “expert” on the topic in question.

We offered the experiment as a model of how people process information about what “expert consensus” is in the real world.  When presented with information that is probative of what experts believe, people have to decide what significance to give it.  If, like the vast majority of our subjects, they credit evidence that is genuinely probative of expert opinion only when that evidence (including the position of a scientist with relevant credentials) matches the position that predominates in their cultural group, they will end up culturally polarized on what expert consensus is.

Our study found that to be the case too. On all three of the risk issues in question—climate change, nuclear waste disposal, and laws allowing citizens to carry concealed handguns—the members of our nationally representative sample all believed that “scientific consensus” was consistent with the position that predominates in their cultural group. They were all correct, too—1/3 of the time, at least if we use National Academy of Sciences expert consensus reports as our benchmark of what “expert consensus” is.


These studies, I submit, support points (1)-(3). 

No group's members understand themselves to be taking positions contrary to what expert scientists advocate.  They all believe that the position that predominates in their group is consistent with the views of expert scientists on the risks in question.

In other words, they recognize that science is a source of valid knowledge that they otherwise couldn’t obtain by their own devices, and that in fact one would have to be a real idiot to say, “Screw the scientists—I know what the truth is on climate, nuclear power, gun control, HPV vaccine etc & they don’t!”

That’s the way members of the public are.  Some people aren’t like that in our society—they don’t trust what scientists say on these kinds of issues. But they are really a teeny tiny minority (ordinary members of the public on both sides of these issues would regard them as oddballs, whack jobs, wing nuts, etc).

The tiny fraction of the population who “don’t trust scientists” aren’t playing any significant role in generating public conflict on climate or any of these other issues.

The reason we have these conflicts is that positions on these issues have become symbols of membership in, and loyalty to, the groups in question.

Citizens have become convinced that people with values different from theirs are using claims about danger and risk to advance policies intended to denigrate their way of life and make them the objects of contempt and ridicule. As a result, these debates are pervaded by the distrust that citizens of opposing values have for one another when they perceive that a policy issue is a contest over the status of contending cultural groups.

When that happens, individuals don’t stop trusting scientists.  Rather, as a result of cultural cognition and like forms of motivated reasoning, they (all of them!) unconsciously conform the evidence of “what expert scientists believe” to their stake in protecting the status of their group and their own standing within it.

That pressure, moreover, doesn’t reliably lead them to the truth. Indeed, it makes it inevitable that individuals of diverse outlooks will all suffer because of the barrier it creates between democratic deliberations and the best available scientific evidence.

As I indicated, I also relied on a very obvious but tremendously important and easily ignored fact: that this sort of entanglement of “what scientists believe” and cultural status conflict is not normal.

It is pathological, both in the sense of being bad and being rare.

The number of consequential insights from decision-relevant science that generate cultural conflict is tiny—minuscule—relative to the number that don’t. There’s no meaningful cultural conflict over pasteurization of milk, high-power transmission lines, fluoridation of water, cancer from cell phones (yes, some people in little enclaves are arguing about this—they get news coverage precisely because the media knows viewers in most parts of the country will find the protestors exotic, like strange species in a zoo), or even the regulation of emissions from formaldehyde, etc., etc.

Moreover, there’s nothing about any particular issue that makes cultural conflict about it “necessary” or “inevitable.” Indeed, some of the ones I listed are sources of real cultural conflict in Europe; all they have to do is look over here to see that things could have been otherwise.

And all we have to do is look around to see that things could have been otherwise for some of the issues that we are culturally divided on.

The HBV vaccine—the one that immunizes children against hepatitis B—is no different in any material respect from the HPV vaccine. Like the HPV vaccine, the HBV vaccine protects people from a sexually transmitted disease. Like the HPV vaccine, it has been identified by the CDC as appropriate for inclusion in the schedule of universal childhood vaccinations. But unlike the HPV vaccine, there is no controversy—cultural or otherwise—surrounding the HBV vaccine. It is on the list of “mandatory” vaccinations that are a condition of school enrollment in the vast majority of states; vaccination rates are consistently above 90% (they are less than 30% in the target population for HPV) – and were so every year (2007–2011) in which proposals to make the HPV vaccine mandatory were a matter of intense controversy throughout the U.S.

The introduction and subsequent career of the HBV vaccine have been, thankfully, free of the distrust that culturally diverse groups experience toward each other when they are trying to make sense of what the scientific evidence is on the HPV vaccine. Accordingly, members of those groups, all of whom trust scientists, are able reliably to see what the weight of scientific opinion is on that question.

So want to fix the science communication problem?

Then for sure deal with the trust issue!

But not the nonexistent one that supposedly exists between scientists and the public. 

The real one--between opposing cultural groups locked in needless, mindless, illiberal forms of status conflict that disable the rational faculties that ordinary citizens of all cultural outlooks ordinarily and reliably use to recognize what is known to science.


So what is "the best available scientific evidence" anyway?

A thoughtful person in the comment thread emanating (and emanating & emanating & emanating) from the last post asked me a question that was interesting, difficult, and important enough that I concluded it deserved its own post.

The question

... in your initial post you mention "best available evidence" no less than six times. And you may also have reiterated the phrase in some of your comments.

Perhaps you have identified your criteria for determining what constitutes "best available evidence" elsewhere; but for the benefit of those of us who might have missed it, perhaps you would be kind enough to articulate your criteria and/or source(s) for us. 

It is a rather nebulous phrase; however, I suppose it works as a very confident, if not all encompassing, modifier.  But as far as I can see, your post doesn't tell us specifically what "evidence" you are referring to (whether "best available" or not!)

Is "best available evidence" a new, improved "reframing" of the so-called "consensus" (that is not really holding up too well, these days)? Is it simply a way of sweeping aside the validity of any acknowledgement/discussion of the uncertainties? Or is it something completely different?!

My answer:

Well, to start, I most certainly do think there is such a thing as "best available scientific evidence." Sometimes people seem to think “cultural cognition” implies that there “is no real truth” or that it is "impossible for anyone to say because it all depends on one's values," etc. How absurd!

But I certainly don't have a set of criteria for identifying the “best available scientific evidence.” Rather I have an ability, one that is generally reliable but far from perfect, for recognizing it.  

I think that is all anyone has—all anyone possibly could have that could be of use to him or her in trying to be guided by what science knows.

For sure, I can identify a bunch of things that are part of what I'm seeing when I perceive what I believe is the best available scientific evidence.  These include, first and foremost, the origination of the scientific understanding in question in the methods of empirical observation and inference that are the signature of science's way of knowing.

But those things I'm noticing (and there are obviously many more than that) don't add up to some sort of test or algorithm. (If you think it is puzzling that one might be able reliably to recognize things w/o being able to offer up any set of necessary and sufficient conditions or criteria for identifying them, you should learn about the fascinating profession of chick sexing!)

Moreover, even the things I'm seeing are usually being glimpsed only 2nd hand.  That is, I'm "taking it on someone's word" that all of the methods used are the proper and valid ones, and have actually been carried out and carried out properly and so on. 

As I said, I don't mean to be speaking only for myself here.  Everyone is constrained to recognize the best available scientific evidence.

That everyone includes scientists, too. Nullius in verba--the Royal Society motto that translates to "take no one's word for it"--can't literally mean what it says: even Nobel Prize winners would never be able to make a contribution to their fields--their lives are too short, and their brains too small--if they insisted on "figuring out everything for themselves" before adding to what's known within their areas of specialty.

What the motto is best understood as meaning is don't take the word of anyone except those whose claim to knowledge is based on science's way of knowing--by disciplined observation and inference-- as opposed to some other, nonempirical way grounded in the authority of a particular person's or institution's privileged insight.

Amen! But even identifying those people whose knowledge reflects science's empirical way of knowing requires (and always has) a reliably trained sense of recognition!

So no definition or logical algorithm for identification -- yet I and you and everyone else all manage pretty well in recognizing the best available scientific evidence in all sorts of domains in which we must make decisions, individual and collective (and even in domains in which we might even be able to contribute to what is known through science).

I find this recognition faculty to be a remarkable tribute to the rationality of our species, one that fills me with awe and with a deep, instinctive sense that I must try to respect the reason of others and their freedom to exercise it.

I understand disputes like climate change to be a consequence of conditions that disable this remarkable recognition faculty.

Chief among those is the entanglement of risks & other policy-relevant facts in antagonistic cultural meanings.

This entanglement generates persistent division, in part b/c people typically exercise their "what is known to science" recognition faculty within cultural affinity groups, whose members they understand and trust well enough to be able to figure out who really knows what about what (and who is really just full of shit). If those groups end up transmitting opposing accounts of what the best available scientific evidence is on a particular policy-relevant fact, those who belong to them will end up persistently divided about what expert scientists believe.

Even more important, the entanglement of facts with culturally antagonistic meanings generates division b/c people will often have a more powerful psychic stake in forming and persisting in beliefs that fit their group identities than in "getting the right answer" from science's point of view, or in aligning themselves correctly w/ what the 'best scientific evidence is.”

After all, I can’t hurt myself or anyone else by making a mistake about what the best evidence is on climate change; I don’t matter enough as consumer, voter, “big mouth” etc. to have an impact, no matter what "mistake" I make in acting on a mistaken view of what is going on.

But if I take the wrong position on the issue relative to the one that predominates in my group, I might well cost myself the trust and respect of many on whose support I depend, emotionally, materially, and otherwise.

The disablement of our reason--of our ability to recognize reliably (or reasonably reliably!) what is known to science--not only makes us stupid. It makes us likely to live lives that are much less prosperous and safe.

It also has the ugly consequence of making us suspicious of one another, and anxious that our group, our identities, are under assault, and our status put in jeopardy by the enactment of laws that, on their face, seem to be about risk reduction, but that are regarded too as symbols of the contempt that others have for our values and ways of life.

Hence, the “pollution” of the “science communication environment” with these toxic cultural meanings deprives us of both of the major benefits of the Liberal Republic of Science: knowledge that we can use to improve our lives, individually and collectively; and the assurance that we will not, in submitting to legal obligation, be forced to acquiesce in a moral or political orthodoxy hostile to the view of the best life that we have the right as free and reasoning beings to choose for ourselves!

Well, I want to know, of course, what you think of all this.

But first, back to the questions that motivated the last post.

To answer them, I hope I've now shown you, you won't have to agree with me about what the "best available scientific evidence" is on climate change.  

Indeed, the science of science communication doesn't presuppose anything about the content of the best decision-relevant scientific evidence.  It assumes only two things: (1) that there is such a thing; and (2) that the question of how to enable its reliable apprehension by people who stand to benefit from it admits of and demands scientific inquiry. 

But here goes:

Climate skeptics (or the ones who are acting in good faith, and I fully believe that includes the vast majority of ordinary people -- 50% of them pretty much -- in our society who say they don't believe in AGW or accept that it poses significant risks to human wellbeing) believe that their position on climate change is based on the best available scientific evidence -- just as I believe mine is!

So: how do they explain why so many of their reasonable fellow citizens reject their view of what the best evidence on climate science is?

And what do they think should be done?

Not about climate change! 

About the science communication problem--by which I mean precisely the influences that are preventing us, as free reasoning people, from converging on the best available scientific evidence on climate change and a small number of other consequential issues (nuclear power, the HPV vaccine, the lethality of cats for birds, etc)? Converging in the way that we normally do on so many other consequential issues--so many many many more that no one could ever count them!?

I hope they have answers that aren't as poor, as devoid of evidence, as the ones in the blog post I critiqued, in which a skeptic offered a facile, evidence-free account of how people form perceptions of risk--an account that turned on the very same imaginative, just-so aggregation of mechanisms that get recycled among those trying, without the benefit or hindrance of empirical studies, to explain why so many people don't accept scientific evidence on the sources and consequences of climate change.

I hope that they have some thoughts here, not because I am naive enough to think they -- any more than anyone on the other side -- will magically step forward and use what they know to dispel the cloud of toxic partisan confusion that is preventing us from seeing what is known here.

I hope that because I would like to think that once we get this sad matter behind us, and resume the patterns of trust and reciprocal cooperation that normally characterize the nonpathological state in which we are able to recognize the best available scientific evidence, there will be some better science of science communication evidence for us all to share with each other on how to negotiate the profound and historic challenge we face in communicating what's known to science within a liberal democratic society.



What "climate skeptics" have in common with "believers": a stubborn attraction to evidence-free, just-so stories about the formation of public risk perceptions

My aim in studying the science of science communication is to advance practical understanding of how to promote constructive public engagement with the best available evidence—not to promote public acceptance of particular conclusions about what that evidence signifies or public support for any particular set of public policies.

When I address the sources of persistent public conflict over climate change, though, it seems pretty clear to me that those with a practical interest in using the best evidence on science communication are themselves predominantly focused on dispelling what they see as a failure on the part of the public to credit valid evidence on the extent, sources, and deleterious consequences of anthropogenic global warming.

I certainly have no problem with that! On the contrary, I'm eager to help them, both because I believe their efforts will promote more enlightened policymaking on climate change and because I believe their self-conscious use of evidence-based methods of science communication will itself enlarge knowledge on how to promote constructive public engagement with decision-relevant science generally. 

Indeed, I am generally willing and eager to counsel policy advocates no matter what their aim so long as they are seeking to achieve it by enhancing reasoned public engagement with valid scientific evidence (and I am decidedly uninterested in helping, and adamantly unwilling to help, anyone who wants to achieve a policy outcome, no matter how much I support the same, by means that involve misrepresenting evidence, manipulating the public, or otherwise bypassing ordinary citizens' use of their own reasoning powers to make up their own minds).

One thing that puzzles me, though, is why those who are skeptical about climate change don’t seem nearly as interested in practical science communication of this sort.

Actually, it’s clear enough that climate skeptics are interested in the sort of work that I and other researchers engaged in the empirical study of science communication do. I often observe them reflecting thoughtfully about that work, and I even engage them from time to time in interesting, informative discussion of these studies.

But I don’t see skeptics grappling in the earnest—even obsessive, anxious—way that climate-change policy advocates are with the task of figuring out how to promote better public understanding.

That seems weird to me. 

After all, there is a symmetry in the position of “believers” and “skeptics” in this regard. 

They disagree about what conclusion the best scientific evidence on climate change supports, obviously. But they both have to confront that approximately 50% of the U.S. public disagrees with their position on that.

The U.S. public has been and remains deeply divided on whether climate change is occurring, why, and what the impact of this will be (over this entire period, there’s also been a recurring, cyclical interest in proclaiming, on the basis of utterly inconclusive tidbits of information, that public conflict is dissipating and being superseded by an emerging popular demand for “decisive action” in response to the climate crisis; I’m not sure what explains this strange dynamic).

The obvious consequence of such confusion is divisive, disheartening conflict, and a disturbingly high likelihood that popularly accountable policymaking institutions will as a result fail to adopt policies consistent with the best available scientific evidence.

Don’t skeptics want to do something about this?

A great many of them honestly believe that the best available evidence supports their views (I really don’t doubt this is so). So why aren’t they holding conferences dedicated to making sense of the best available evidence on public science communication and how to use that evidence to guide the public toward a state of shared understanding more consistent with it?

I often ask skeptics who comment on blog posts here this question, and I feel like I have yet to get a satisfying answer.

But maybe my mystification reflects biased sampling on my part.

Maybe, despite my desire to engage constructively with anyone whose own practical aims involve promoting constructive public engagement with scientific evidence, I am still being exposed to an unrepresentative segment of the population who fit that description, one over-representing climate-change believers.

I happened across something that made me think that might be so.

It consists of a blog post from a skeptic who is trying to explain to others who share the same orientation why it is that such a large fraction of the U.S. population believes that climate change resulting from fossil fuel consumption poses serious risks to human wellbeing.

As earnest and reflective as the account was, this climate skeptic’s account deployed exactly the same facile set of just-so tropes—constructed from the same evidence-free style of selective synthesizing of decision-science mechanisms—that continue to dominate, and distort, the thinking of climate change believers when they are addressing the “science communication problem.”


Why do people believe that global warming has already created bigger storms? Because when "experts" repeatedly tell us that global warming will wreck the Earth, we start to fit each bad storm into the disaster narrative that's already in our heads.

Also, attention-seeking media wail about increased property damage from hurricanes. . . .

Also, thanks to modern media and camera phones, we hear more about storms, and see the damage. People think Hurricane Katrina, which killed 1,800 people, was the deadliest storm ever. But the 1900 Galveston hurricane killed 10,000 people. We just didn't have so much media then.

Here they are, all the usual “culprits”: a “boundedly rational” public, whose reliance on heuristic forms of information-processing is being exploited by strategic misinformers, systematically biased by “unbalanced” media coverage, and amplified by social media.

Every single element of this account—while plausible on its own—is in fact contrary to the best available evidence on public risk perception and the dynamics of science communication. 

  • Blaming the media is pretty weak. The claim that "unbalanced" media coverage causes public controversy on climate change science is incompatible with cross-cultural evidence, which shows that U.S. coverage is no different from coverage in other nations in which the public isn't polarized (e.g., Sweden). Indeed, the "media misinformation" claim has causation upside down, as Kevin Arceneaux’s recent post helps to show. The media cover competing claims about the evidence because climate change is entangled in culturally antagonistic meanings, which in turn create persistent public demand for information on the nature of the conflict and for evidence that readers who hold the relevant cultural identities can use to satisfy their interest in persisting in beliefs consistent with their identities.
  • The “internet echo chamber” hypothesis is similarly devoid of evidence. There are plenty of evidence-based sources that address and dispel the general claim that the internet reinforces partisan exposure to and processing of evidence (sources that apparently can’t penetrate the internet echo chamber, which continues to propagate the echo-chamber claim despite the absence of evidence).

But here's one really simple way to tell that the blog writer's explanation of why people are overestimating the risks of climate change is patent B.S.: it is constructed out of exactly the same mechanisms that so many theorists on the other side of the debate imaginatively combine to explain why people are underestimating exactly the same risks. 

This is the tell-tale signature of a just-so story: it can explain anything one sees and its opposite equally well!

So what to say?

Well, it turns out that despite their disagreement about what the best scientific evidence on climate change signifies--about what the facts are, and about what policy responses are appropriately responsive to them—advocates in the “believer” and “skeptic” camps have some important common science communication interests.

They both have an interest in understanding it and using it, as I indicated at the outset.

But beyond that, they both have a stake in freeing themselves from the temptation to be regaled by story tellers, who, despite the abundance of evidence that now exists, remain committed to perpetually recycling empirically discredited just-so stories rather than making use of and extending the best available evidence on what the science communication problem consists in and how to fix it.


Partisan Media Are Not Destroying America

At the risk of creating an expectation for edification that we'll never again approach satisfying, CCP Blog again brings you an exclusive guest post by a foremost scholarly expert on an issue that everyone everywhere is astonishingly confused about! The expert is political scientist Kevin Arceneaux of Temple University. The issue is whether partisan cable news and related media outlets are driving conflict over climate change and other divisive issues by misinforming credulous members of the public and otherwise fanning the flames of political polarization. I've questioned this widely held view myself (see, e.g., here & here).  But no one listens to me, of course.  Well now Arceneaux--employing the novel strategy of actually bringing evidence derived from valid empirical methods to bear--will straighten everything out once and for all. His post furnishes a preview--again, exclusively for the 14 billion readers of the CCP Blog!--of his soon-to-be-published book, Changing Minds or Changing Channels? (Univ. Chicago Press 2013), co-authored with Martin Johnson. (Psssst ... you can actually download a couple of chapters in draft right now for free! Don't tell anybody!)

Kevin Arceneaux:

There is little doubt that the American legislative process has become more partisan and polarized. But is the same true for the mass public? For the most part, it seems that most Americans remain middle of the road. Rather than becoming more polarized, people mostly seem to have brought their policy positions in line with their partisan identification.

Despite the empirical evidence, many—especially pundits—cannot shake the notion that Americans are becoming more politically extreme and divided. Not only do many in the chattering class take mass polarization as a self-evident fact, the culprit is equally self-evident: the partisan news media.

On some level, I understand why this is such a popular conclusion. If political elites are so polarized, and clearly they are, it only seems intuitive that the same must be true for the mass citizenry. What’s more, people tend to overestimate the effects of media content on others, and what is the mass public if not masses of other people?

Nonetheless, in our soon-to-be published book Changing Minds or Changing Channels?, Martin Johnson and I challenge the conventional wisdom that Fox News and MSNBC are responsible for polarizing the country.

We must keep in mind that in spite of their visibility to people like us who are politically engaged, relatively few people tune into shows like The O’Reilly Factor or The Rachel Maddow Show. For instance, voter turnout in the 2012 presidential election was roughly 12 times the size of the top-rated partisan talk show audiences on Fox News and MSNBC.

More important, people choose whether to watch partisan news. The type of person who gravitates to partisan news shows is more politically and ideologically motivated than those who choose to watch mainstream news or tune out the news altogether, partisan or otherwise. People are not passive or particularly open-minded when it comes to political controversies. Not only do they choose what to watch on television, but they also choose whether to accept or reject the messages they receive from the television shows they watch.

In short, two forces simultaneously limit and blunt the effects of partisan news media. First, partisan news shows cannot polarize—in a direct sense—the multitude of Americans who do not tune into these shows. Second, the sort of people who actively choose to watch partisan news are precisely the sort of people who already possess strong opinions on politics and precisely the sort of people who should be less swayed by the content they view on these shows.

Wait—you may be thinking—don’t studies conclusively show that Fox News viewers know less about foreign events and express more conservative opinions on important policy issues like climate change?

The fact that people select into partisan news audiences also makes it difficult to study the effects of these shows. If people tune into Fox News because they care more about domestic political debates than foreign events or because they have conservative views, we would expect them to know less about foreign policy and distrust climate scientists even if Fox News did not exist.

What these studies do not and cannot tell us is the “counterfactual”:  What would Fox News viewers know and believe about politics if we lived in a world without Fox News?

The counterfactual is, of course, unknowable, and the central goal of causal inference is finding a way to estimate it. It turns out that observational designs do a terrible job at this.

Consequently, Martin and I turned to randomized experiments to investigate the effects of partisan media. By randomly assigning subjects to treatment and control groups, we are able to simulate the counterfactual by creating equivalent groups that experience different states of the world (e.g., one in which they watch Fox News and one in which they do not).
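A toy simulation can show why random assignment matters here. All the numbers below are invented for illustration (they are not data from the book): even if watching partisan news had no causal effect at all, a naive comparison of viewers and non-viewers would look like a large effect whenever viewers self-select, while random assignment recovers the true (null) effect.

```python
import random

random.seed(1)

N = 50_000          # simulated respondents per design
TRUE_EFFECT = 0.0   # assume, for illustration, watching has no causal effect

def diff_in_means(assign_randomly):
    """Naive viewer-vs-nonviewer comparison under two designs.

    Each respondent has a latent prior attitude. In the observational
    design, people with stronger priors are more likely to watch; in the
    experimental design, watching is a coin flip.
    """
    watched, control = [], []
    for _ in range(N):
        prior = random.gauss(0, 1)               # latent attitude
        watches = (random.random() < 0.5) if assign_randomly else (prior > 0)
        outcome = prior + (TRUE_EFFECT if watches else 0) + random.gauss(0, 0.2)
        (watched if watches else control).append(outcome)
    return sum(watched) / len(watched) - sum(control) / len(control)

naive = diff_in_means(assign_randomly=False)
experimental = diff_in_means(assign_randomly=True)
print(f"observational estimate: {naive:+.2f}")        # ~ +1.6: pure selection bias
print(f"experimental estimate: {experimental:+.2f}")  # ~ 0: the true effect
```

The coin flip breaks the link between who watches and what they already believed, which is exactly what lets the two randomized groups stand in for the unobservable counterfactual.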

Using randomized experiments to study media effects has a long and successful history.

However, without modifications, the standard experimental design that assigns one group of subjects to a control condition (e.g., no partisan news) and another to a treatment condition (e.g., partisan news) would not help us understand how selectivity—these choices we know viewers are making—influences the effects of partisan news shows. Forced exposure experiments (as we call them) allow one to estimate the effects of media content under the assumption that everyone is exposed to it. The current media environment, rife with abundant choice, makes it impossible for anyone to assume even a majority of viewers are exposed to a type of program, let alone everyone.

So, we modified the forced exposure experiment in two ways, which I'll describe in turn.

The first modification involved creating a research design we call the Selective Exposure Experiment to compare a world where people had to watch partisan news to one that more closely approximates the one in which we live, where people can choose to watch entertainment programming instead. This experimental design starts with the forced exposure experimental design as its foundation. We randomly assigned some people to watch partisan news and some people to a control group where they could only watch an entertainment show.

These conditions allow us to estimate the effects of partisan news if people had no choice but to watch it. To get at the effects of selectivity, we randomly assigned a final group of subjects to a condition where they could watch any of the programs in the forced exposure conditions at will. We gave these subjects a remote control and allowed them to explore the partisan news programs and entertainment shows just as they would at home. They were free to watch all of a show, none of it, or flip back and forth among shows if that’s what they wanted to do.

The Selective Exposure Experiments taught us that the presence of choice blunted the effects of partisan news shows. To take one example from the book, we conducted an experiment in which some people watched a likeminded, or proattitudinal, news program (e.g., a conservative watching Fox) about the health care debate back in 2010; others watched an oppositional, or counterattitudinal, news program (e.g., a liberal watching Fox) on the same topic; others watched basic cable entertainment fare, devoid of politics; and finally, a group of subjects were allowed to choose among these shows freely.

The figure below summarizes the results from this Selective Exposure Experiment. The bars represent how polarized liberals and conservatives are after completing the viewing condition.

Across a number of aspects of the health care debate—how people rate the major political parties' ability to deal with the issue, the personal impact of the policy, and the wisdom of the public option, the individual mandate, and the plan to raise taxes on the wealthy—forced exposure to both pro- and counterattitudinal shows increased polarization. So it is clear that partisan shows can polarize.

However, subjects in the choice condition were much less polarized. Keep in mind that subjects in the choice condition only had four options from which to choose. Had we given subjects over 100 channels to choose from, as is commonplace in most households today, we can only imagine that these effects would have been even smaller.

Figure 4.2 in Arceneaux and Johnson (2013)

Next, we wished to sort out why we observed smaller effects in the choice condition. Undoubtedly, part of the explanation has to be that with fewer people watching, one should observe smaller overall effects. Recall, though, that we also anticipate that those who seek out partisan news—news-seekers as Markus Prior calls them—should be less susceptible to partisan news effects.
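The first part of that explanation is simple arithmetic. With hypothetical numbers (not taken from the book): if forced exposure moves watchers by some amount, but only a fraction of subjects in the choice condition actually watch, the group-level effect shrinks in proportion to that fraction.

```python
# Hypothetical illustration of dilution in the choice condition.
forced_effect = 0.8      # assumed polarization effect on those who actually watch
share_watching = 0.25    # assumed share choosing partisan news over entertainment

# Group-level effect is a weighted average: non-watchers contribute zero.
choice_effect = share_watching * forced_effect + (1 - share_watching) * 0.0
print(choice_effect)     # 0.2 -- one quarter of the forced-exposure effect
```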

It was to investigate this hypothesis that we devised our second modification of the standard forced-exposure experiment.

In a design we call the Participant Preference Experiment, we measured people’s viewing preferences before randomly assigning them to view a proattitudinal, counterattitudinal, or entertainment show. Measuring viewing preferences before exposure to the stimuli allows us to gauge whether news-seekers react differently to partisan news than entertainment-seekers.
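The logic of that design can be sketched in a few lines, again with invented susceptibility numbers rather than the book's data: classify each subject by measured preference, randomize within each preference group, and then compare the estimated treatment effect across groups.

```python
import random

random.seed(7)

# Hypothetical parameters: how much one partisan-news exposure shifts
# attitude extremity for each preference type (invented for illustration).
SUSCEPTIBILITY = {"news-seeker": 0.1, "entertainment-seeker": 0.8}

def subgroup_effects(n_per_group=20_000):
    """Randomize within each measured-preference group, then compare
    treated vs. control means to estimate the effect for that group."""
    effects = {}
    for group, shift in SUSCEPTIBILITY.items():
        treated, control = [], []
        for _ in range(n_per_group):
            baseline = random.gauss(0, 1)        # pre-measured attitude extremity
            if random.random() < 0.5:            # random assignment to partisan news
                treated.append(baseline + shift + random.gauss(0, 0.2))
            else:
                control.append(baseline + random.gauss(0, 0.2))
        effects[group] = sum(treated) / len(treated) - sum(control) / len(control)
    return effects

fx = subgroup_effects()
print(fx)  # entertainment-seekers show the larger estimated effect
```

Because preference is measured before assignment, the difference between the two subgroup effects estimates how preference moderates the impact of exposure.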

The figure below shows the results from one of these experiments. The news programs in these experiments focused on the controversy around raising taxes on the top income earners. Across a number of issue questions on the topic, we find that partisan news shows do more to polarize entertainment-seekers forced to watch them than they do news-seekers who often watch these shows.

Figure 4.4 in Arceneaux and Johnson (2013)

Note that the proattitudinal program had almost no effect on news-seekers, while the counterattitudinal show did. If people tend to gravitate toward likeminded news programming and entertainment seekers tend to tune out news, then these findings suggest that the direct effects of partisan news should be minimal.

As an aside, notice that the counterattitudinal news programming across all of these studies, if anything, polarizes those who are forced to watch it. Not only is this finding consistent with our thesis that people are not passive, blank slates (they can reject messages with which they disagree!), but it also undermines the Pollyanna notion that if people would just listen to the other side, the country would be a more tolerant and moderate place.

Finally, let me be clear that Martin and I are not arguing that partisan news shows have no effects. For one, they seem to lead many people to perceive that the country is more polarized, even if it isn’t. For another, they may have indirect effects on politics by energizing viewers (if not changing their minds) to contact their elected officials and vocalize their extreme opinions. Fox and MSNBC may indeed be a polarizing force in politics, but it is unlikely that they are causing masses of people to become more and more extreme.


More on disgust: Both liberals and conservatives *feel* it, but what contribution is it really making to their moral appraisals?

It’s been far far too long-- over a week!-- since we discussed disgust and its relationship to political ideology. Part of the reason is that after the guest post by Yoel Inbar, the prospects for finding someone who could actually say anything that would enlarge the knowledge of this site's 14 billion regular readers (NOTE: JOKE; DO NOT CIRCULATE OR ATTRIBUTE “14 billion" FIGURE) seemed extremely remote. But we did it! Today, yet another sterling guest post on this topic from Dr. Sophie Russell, a psychologist at the University of Surrey.

Russell has published a number of extremely important studies on the contribution that emotions make to moral judgment. She also is the co-author—along with Roger Giner-Sorolla, another leading moral psychologist who has collaborated with Russell in the study of disgust—of an important review paper that concludes that disgust is a highly unreliable source of moral guidance generally and a source of moral perception distinctively inimical to the values of a “liberal society because it ignores factors . . . such as intentionality, harm, and justifiability.” That paper figured in the interesting discussion of Inbar’s essay. Now she offers her own views:

Sophie Russell:

So, is disgust reserved for conservatives? My answer is no. Rather, liberals and conservatives may show differences in the associations between their disgust and their moral judgement.

People feel disgust toward many different acts (such as incest, sexual fetishes, eating lab-grown meat, etc.), but this does not necessarily mean that they think those acts are morally wrong too.

I think what we should be asking ourselves is how easily individuals can separate their feelings of disgust from judgements of wrongdoing.

One thing that is clear from some of our research is that disgust has a different relationship with moral judgement than anger, in terms of how intertwined they are.  For example, we have found that after individuals consider the current context they change their feelings of anger but not their feelings of disgust toward harmful acts and bodily norm violations, and changes in anger relate to changes in moral judgement (Russell & Giner-Sorolla, 2011).

In another line of research we have also found that feelings of anger are associated with the ability to come up with mitigating circumstances for immoral acts, but disgust is unrelated to whether or not people can imagine mitigating circumstances (Piazza, Russell, & Sousa, 2012). The story from both lines of research is that in general people can disentangle their feelings of disgust from judgements of wrongness, while this is not the case with anger. It seems as if their feelings of disgust remain. So, should we care if someone finds something disgusting? I think we should still be concerned, because disgust is a withdrawal emotion: people will still want to avoid the person or thing they find disgusting; they just may not have the moral conviction that others need to agree with them.

Our findings follow on from a long laundry list of appraisals that work to make sure that anger is properly directed: Is the behaviour justified? Is it intentional? Is it harmful? Is it unfair? (see Russell & Giner-Sorolla, 2013 for a review). It is less clear how we assess whether something is disgusting in the current context; that is, what essence or concept makes something disgusting in a given context. It seems as if judgements of disgust are tied to the specific person or object, whilst anger is associated with more abstract appraisals of the current situation.

Supporting this distinction through the analysis of post-hoc justifications, we have found that people find it very hard to articulate why they think non-normative sexual acts are disgusting (Russell & Giner-Sorolla, 2011).

I think this effect will be the same for both conservatives and liberals because essentially this phrase ‘X is disgusting’ serves a very strong communicative function and we are not pushed/motivated to explain what we mean.  For this reason we may use this phrase towards things that are not literally evoking the disgust emotion, in order to signal that we want to break off all ties from this thing.

Both conservatives and liberals use this phrase frequently because of its potency, but this phrase does not necessarily mean that they actually feel physical revulsion.

I think another difference between anger and disgust that can cause a divide between conservatives and liberals is that anger is mainly relevant when there is a clear victim while disgust is relevant to “victimless” acts between consenting individuals (Piazza & Russell, in preparation).  

For example, in this research we looked at the impact of individuals giving consent to a range of sexual behaviours, such as necrophilia, incest, and sexual relations with a transgender individual. We found that people feel significantly more anger toward a wrongdoer when consent is absent versus present, and this relationship is mediated by justice appraisals.

On the other hand, individuals feel significantly more disgust when the recipient of wrongdoing consents to the action versus not; thus, we feel disgust towards both people who consented to the act. This relationship is mediated by judgments of perverse character, which supports the view that disgust is based on judgments of the person or object, rather than the outcome or situation. Thus, it seems as if anger is the more relevant emotion when there is a clear victim.

So, my conclusion is that for both liberals and conservatives, disgust is focused on the person while anger is focused on the circumstances and consequences, which is problematic if we want people to consider changes across time, context, and relationships.

On a separate note, something that is also interesting to me, and that I would like to leave with you, is that when I include things like political orientation or disgust sensitivity as moderators in the studies I conduct in the UK, I find that they have very little to no influence on the effects that I find. However, if I include them whilst collecting an American Mturk sample, they gain importance. So, I am really interested to know what you think about this.


Piazza, J., Russell, P.S. & Sousa, P. Moral emotions and the envisaging of mitigating circumstances for wrongdoing. Cognition & Emotion 27, 707-722 (2012).


Homework assignment: what's the relationship between science literacy & persistent political conflict over decision-relevant science?

I've agreed to do a talk at the annual American Geophysical Union meeting in December. It will be part of a collection on "climate science literacy."

Here's the synopsis I submitted:

The value of civic science literacy

The persistence of public conflict over climate change is commonly understood to be evidence of the cost democracy bears as a result of the failure of citizens to recognize the best available decision-relevant science. This conclusion is true; what’s not is the usual understanding of cause and effect that accompanies this perspective. Ordinarily, the inability of citizens to comprehend decision-relevant science is identified as the source of persistent political conflict over climate change (along with myriad other issues that feature disputed facts that admit of scientific investigation). The truth, however, is that it is the persistence of public conflict that disables citizens from recognizing and making effective use of decision-relevant science. As a result, efforts to promote civic science literacy can’t be expected to dissipate such conflict. Instead, the root, cultural and psychological sources of such conflict must themselves be extinguished (with the use of tools and strategies themselves identified through valid scientific inquiry) so that our democracy can realize the value of educators' considerable skills in making citizens science literate. 

I have ideas along these lines -- ones that have figured in various papers I've written, informed various studies I've worked on, and appeared in one or another blog posts on this site.

But I haven't come close to working all this out.  

What's more, I worry (as always) that I could be completely wrong about everything.

So I welcome reflections by others on the basic claim expressed here: reflections on how to convey it effectively; on what to do about the practical problem it reflects; but also on how to continue to probe and test whether it is true and to help identify any alternative account that is even better founded and furnishes an even more useful guide to action.

So get going-- don't put this off until the day before the talk & pull an all-nighter!



Can we SENCERize the communication of science?

I had the tremendous privilege—which yielded an even larger benefit in enlargement of personal knowledge—of being able to participate in the SENCER summer institute at Santa Clara University last week.

SENCER—which stands for Science Education for New Civic Engagements and Responsibilities—is an integrated set of practical research initiatives aimed at promoting the development and use of scientific knowledge on how to teach science.  It is actually one of a family of programs created to carry out the broader mission of the National Center for Science and Civic Engagement, “to inspire, support, and disseminate campus-based science education reform strategies that strengthen learning and build civic accountability among students in colleges and universities.”

It’s not amusing that those whose job it is to impart knowledge of empirical methods so infrequently even ask themselves whether their own methods for doing so—from the mode of teaching they use in the classroom, to the materials and exercises they assign to students, to the examinations they administer to test student comprehension—are valid and reliable.

On the contrary, it’s an outright scandal that demeans the culture of science.

SENCER comprises a sprawling, relentless, and expanding array of resources aimed at dissolving this embarrassing contradiction. These include a growing stockpile of empirical research findings; a trove of practical materials designed to enable use of this knowledge to improve science education; the sponsorship of regular events at which such knowledge is shared and plans for enlarging it formulated; a set of regional centers that coordinate efforts to promote evidence-based methods in the teaching of science; and most important of all a critical mass of intelligent and passionate people committed to the program’s ends.

The occasion for SENCER—the peculiar insularity, from empirical evidence relating to the realization of its own goals, of a craft dedicated to propagating valid empirical methods—is not unique to science education.

It is at the root, too, of what I have called the science communication problem—the failure of ample, compelling, readily accessible and indeed widely disseminated evidence to quiet persistent public controversy over risks and other facts to which that evidence directly speaks. Climate change is, of course, the most conspicuous example of the science communication problem but it is hardly the only consequential instance of it.

Immense resources are being dedicated to solving this problem and appropriately so.

But the aggressive resistance to evidence-based practice that pervades the climate-change advocacy community and their counterparts on other issues means that the vast majority of these resources are simply wasted. 

I’m not kidding: hundreds and hundreds of millions of dollars are foreseeably expended on programs that are certain not to have any positive impact (aside from raising the profile of those who operate the programs)—not so much because the initiatives being sponsored are ill-considered (although many indisputably are!) but because those who are being awarded the money to carry them out aren’t genuinely committed to (or maybe just not genuinely capable of) considering empirical evidence.

They don’t meaningfully engage existing evidence on communication dynamics to determine what psychological and political mechanisms their initiatives presuppose and what is known about those mechanisms.

They don’t carry out their initiatives in a manner that is geared to generating what might be called programmatic evidence in the form of pretest results or early-return data that can be used to refine and calibrate communication efforts as they are unfolding.

And worst of all, they lack any protocols that assure information on the impact of their efforts (including the lack thereof) is collected, preserved, and freely distributed in the manner that enables the progressive accretion of knowledge.

Instead, every surmise from every source—no matter how innocent of the conclusions of those who have previously used scientific methods to test theirs—is created equal in the world of science communication advocacy. 

Every day is a new day, to be experienced free of the burden to take seriously what was learned (from failure as well as success) the day before.

I have written a paper about this.

So has Amy Luers, in a perceptive, evidence-informed article in Climatic Change that was addressed specifically to the foundations that are the primary sources of support for efforts to promote constructive engagement with climate science.

Her article is evidence of a heartening awareness that the evidence-free culture that has characterized science communication in this area of public policy and others is barren of the supportive practices and habits and outlooks that nourish growth of empirical knowledge.

Maybe things will change.

But there are still other science-communication professions that are puzzlingly—unacceptably, intolerably!—innocent of science in their own operations.

Science journalism—including (here) popular science writing and science documentary production as well as science news writing—is one. 

I have said before that I regard these professionals with awe—and gratitude, too.  Much as the bumblebee defies the calculations of physicists who insist that its capacity for flight defies physical laws, so science journalists seem to defy basic mechanisms of psychology by creating a form of commensurability in understanding that enables the curious nonscientist to participate in—and thus experience the wonder of—what scientists, by applying their highly specialized knowledge, discover about the mysteries of nature.

There is no communication alchemy involved here. Using a form of professional judgment exquisitely tuned by experience, the science journalist mines the fields of common cultural understanding for the resources needed to construct this remarkably engineered bridge of insight.

Yet how to do what they do constantly confronts the members of this special profession with factual questions to which they themselves do not have confident answers—or about which they have confident but conflicting opinions.

Do norms of journalistic neutrality—such as “balanced” coverage of science issues that generate controversy, within science or without—distort public understanding or help inform curious individuals of the nature of competing claims?

Is the segment of the population that experiences wonder and awe at scientific discovery more culturally diverse than the current regular audience for the highest quality science documentaries? If so, do those programs convey meanings collateral to their core scientific content that constrain the size and diversity of their audience?

(These are issues that figured, actually, in two of the sessions of my Science of Science Communication course from last spring; I am delinquent in my promise to report on the nature of those sessions.)

These are empirical questions, and the answers journalists give them would be better if they had evidence generated specifically to inform the ongoing collective discussion and practice that are the source of their craft knowledge.  But instead we see here, too, the sort of “every-conjecture-created-equal,” “every-day-a-new-day” style of engagement that is the signature of evidence-free, nonscientific thought, which by its nature is incapable of creating incremental enlargement of knowledge.

I could go on; not just about science journalism, but about many other science-communication professions that are evidence-free about the nature of their own practices. Like the law, e.g.

But the point is that these professions, too, are ripe for SENCERizing.  They need to be fortified with the sorts of resources and programs that SENCER comprises.  And to get that fortification they require a core of practitioners who not only agree with this philosophy—I think they all already do, actually—but also structures of collective action that will, through the dynamics of reciprocity, create the self-reinforcing contributions of those practitioners to those resources and programs.

SENCER itself might well be a vehicle for such developments.  Its gracious invitation to me to participate in its summer institute reflects the interest of its members in enlarging the scope of their endeavor to the communication of decision-relevant science.

But it would be a mistake to think that SENCERizing science communication generally means relying on SENCER, or SENCER alone, to facilitate the advent of evidence-based practices within the relevant science-communication professions.

The remarkable founder of SENCER, Wm. David Burns, made this clear to me, in fact.

I asked him if he himself regarded the program as an “engine for” or a “model of” what needs to be done to make science education and science communication generally more evidence based.

He answered that the only appropriate way to think of SENCER is as an “experiment” of a fractal nature: by enabling those who believe science education must be evidence based to continuously form, refine, and test competing conjectures about how to build on and refine their knowledge of how to effectively impart scientific knowledge, SENCER itself is a test of a hypothesis that the particular mode of organization that it is and will become in such a process is an effective way to achieve its own ends.

SENCER, then, is surely a model (an iterative, self-updating one at that!) of the style of conjecture and refutation that is the engine that drives scientific discovery.

And such a model is necessarily one that cannot be reduced to a particular form or formula. For the very logic on which its own success is founded consists in the continuous engagement of competing models, whose successive remedies for one another's inevitable imperfections are what continuously make us smarter than we were before.


Weekend update: Yale professor does *what*, you say?

Maybe @Paul Mathews has a point after all, but I think the commenter who offered to sell the Brooklyn Bridge to the author of this blog post has the better of the argument (you might have thought the title of my post would have given him a clue as well).

Needless to say, I am a tad anxious about Preet Bharara getting wind of all this...


Motivated system 2 reasoning--experimental evidence & its significance for explaining political polarization

My paper Ideology, Motivated Reasoning, and Cognitive Reflection was published today in the journal Judgment and Decision Making.

I’ve blogged on the study that is the focus of the paper before.  In those posts, I focused on the relationship of the study to the “asymmetry thesis,” the view that ideologically motivated reasoning is distinctive of (or at least disproportionately associated with) conservatism.

The study does, I believe, shed light on (by ripping a fairly decent-sized hole in) the asymmetry thesis. But the actual motivation for and significance of the study lie elsewhere.

The cultural cognition thesis (CCT) holds that individuals can be expected to form risk perceptions that reflect and reinforce their connection to groups whose members subscribe to shared understandings of the best life and the ideal society.

It is opposed to various other accounts of public controversy over societal risks, the most significant of which, in my view, is the bounded rationality thesis (BRT).

Associated most prominently with Kahneman’s account of dual process reasoning, BRT attributes persistent conflict over climate change, nuclear power, gun control, the HPV vaccine, etc. to the public’s over-reliance on rapid, visceral, affect-laden, heuristic reasoning—“System 1” in Kahneman’s terms—as opposed to more deliberate, conscious, analytical reasoning— “System 2,” which is the kind of thinking, BRT theorists assert, that characterizes the risk assessments of scientists and other experts.

BRT is quite plausible—indeed, every bit as plausible, I’m happy to admit—as CCT. Nearly all interesting problems in social life admit of multiple plausible but inconsistent explanations.  Likely that’s what makes them interesting.  It’s also what makes empirical testing—as opposed to story-telling—the only valid way to figure out why such problems exist and how to solve them.

In my view, every Cultural Cognition Project study is a contribution to the testing of CCT and BRT.  Every one of them seeks to generate empirical observations from which valid inferences can be drawn that give us more reason than we otherwise would have had to view either CCT or BRT as more likely to be true.

In one such study, CCP researchers examined the relationship between perceptions of climate change risk, on the one hand, and science literacy and numeracy, on the other. If the reason that the public is confused (that’s one way to characterize polarization) about climate change and other risk issues (we examined nuclear power risk perceptions in this study too) is that it doesn’t know what scientists know or think the way scientists think, then one would expect convergence in risk perceptions among those members of the public who are highest in science literacy and technical reasoning ability.

The study didn’t find that.  On the contrary, it found that members of the public highest in science literacy and numeracy are the most divided on climate change risks (nuclear power ones too).

That’s contrary to what BRT would predict, particularly insofar as numeracy is a very powerful indicator of the disposition to use “slow” System 2 reasoning.

That science literacy and numeracy magnify rather than dissipate polarization is strongly supportive of CCT.  If people are unconsciously motivated to fit their perceptions of risk and comparable facts to their group commitments, then those who enjoy highly developed reasoning capacities and dispositions can be expected to use those abilities to achieve that end.

In effect, by opportunistically engaging in System 2 reasoning, they’ll do an even “better” job at forming culturally congruent perceptions of risk.
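The interaction the study reports can be illustrated with a toy simulation (a sketch under invented assumptions: the groups, effect sizes, and scales below are hypothetical, not the CCP data). If cultural predisposition interacts with reasoning proficiency, the risk-perception gap between groups widens, rather than shrinks, among the most numerate:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Hypothetical setup (not the study's actual data or effect sizes):
# group = -1 (egalitarian-communitarian) or +1 (hierarchical-individualist)
group = rng.choice([-1.0, 1.0], size=n)
numeracy = rng.uniform(0.0, 1.0, size=n)   # 0 = lowest, 1 = highest

# Under the CCT account, better reasoners use their ability to reach
# culturally congenial conclusions, so the group effect on perceived risk
# grows with numeracy: an interaction term, not a main effect of numeracy.
risk = 5.0 + group * (0.5 + 2.0 * numeracy) + rng.normal(0, 1, size=n)

def polarization(lo, hi):
    """Mean risk-perception gap between groups within a numeracy band."""
    band = (numeracy >= lo) & (numeracy < hi)
    return (risk[band & (group > 0)].mean()
            - risk[band & (group < 0)].mean())

gap_low = polarization(0.0, 0.2)    # least numerate quintile
gap_high = polarization(0.8, 1.0)   # most numerate quintile
print(f"gap among least numerate: {gap_low:.2f}")
print(f"gap among most numerate:  {gap_high:.2f}")
```

Under BRT one would expect the opposite pattern: as numeracy rises, both groups should converge on the expert assessment and the gap should close.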

Now enter Ideology, Motivated Reasoning, and Cognitive Reflection. The study featured in that paper was aimed at further probing and testing of that interpretation of the results of the earlier CCP study on science literacy/numeracy and climate change polarization.

The Ideology, Motivated Reasoning, and Cognitive Reflection study was in the nature of an experimental follow-up aimed at testing the hypothesis that individuals of diverse cultural predispositions will use their “System 2” reasoning dispositions opportunistically to form culturally congenial beliefs and avoid forming culturally dissonant ones.

The experiment reported in the paper corroborates that hypothesis.  That is, it shows that individuals who are disposed to use “System 2” reasoning—measured in this study by use of the Cognitive Reflection Test, another performance based measure of the disposition to use deliberate, conscious (“slow”) as opposed to heuristic-driven (“fast”) reasoning—exhibit greater motivated reasoning with respect to evidence that either affirms or challenges their ideological predispositions.

The evidence on which subjects demonstrated motivated reasoning concerned how “closed-minded” and “unreflective” individuals of opposing ideologies are.

“Closed-mindedness” is a very undesirable trait generally.

It’s also what those on each side of politically polarized debates like the one over climate change identify as the explanation for the other’s refusal to accept what each side sees as the clear empirical evidence in favor of its own position.

One might thus expect individuals who have a stake in forming perceptions of facts congenial to their cultural commitments to react in a defensive way to evidence that those who share their commitments are less “open-minded” and “reflective” than those who harbor opposing commitments.

So I tested that.  I advised subjects that psychological evidence suggests that the Cognitive Reflection Test measures “open-mindedness” (some psychologists take that position; I actually think they are wrong—as I’ll explain in a moment!).  Members of a control group were told no more than this.  But subjects in two other groups were told either that climate change “skeptics” score higher than climate change “believers” or vice versa.

I found that subjects displayed motivated reasoning with respect to the evidence of the “validity” of the Cognitive Reflection Test as a measure of “open mindedness.” That is, they credited the evidence that the CRT is a “valid” test of “open-mindedness” and “reflection” much more readily if they were advised that individuals who hold the climate-change position consistent with the subjects’ ideologies scored higher, but rejected that evidence when they were informed that those same individuals score lower, than individuals with the opposing position on climate change.

Moreover, this tendency was highest among individuals with the highest Cognitive Reflection Test scores.

That finding is highly inconsistent with BRT, which assumes that a deficit in System 2 reasoning capacities explains the failure of the members of the public to converge on conclusions supported by the best available decision-relevant science.

But it is very much consistent with CCT, which predicts that individuals will use their System 2 reasoning capacities strategically and opportunistically to reinforce beliefs that their cultural group’s positions on such issues reflect the best available evidence and that opposing groups’ positions do not.

It's consistent, too, with a growing collection of findings in political psychology.  This research shows not only that ideologically motivated reasoning drives political polarization (generating perverse effects, e.g., like hardening of commitment to mistaken beliefs when "fact checkers" try to correct false claims), but also that this effect intensifies as individuals become more sophisticated about politics.

Some could have attributed this effect to a convergence between political knowledge and intensity of partisanship.  But the result in my study makes it more plausible to see the magnification of polarization associated with political knowledge as reflecting the tendency of people who simply have a better comprehension of matters political to use their knowledge in an opportunistic way so as to maintain congruence between their beliefs and their ideological identities. (I've addressed before how "cultural cognition" relates to the concept of ideologically motivated reasoning generally, and will even say a bit more on that below.)

As for the asymmetry thesis, the study also found, as predicted, that this tendency was symmetric with respect to right-left ideology.  That’s not what scholars who rely on the “neo-authoritarian personality” literature—which rests on correlations between conservativism and various self-report measures of “open-mindedness”—would likely have expected to see here.

Interestingly, I also found that there is no meaningful correlation between cognitive reflection and conservativism.

The Cognitive Reflection Test is considered a “performance”- or “behavioral”-based “corroborator” of the self-report tests (like “Need for Cognition,” which involves agreement or disagreement with statements like “I usually end up deliberating about issues even when they do not affect me personally” and "thinking is not my idea of fun") that are the basis of the neo-authoritarian-personality literature on which the “asymmetry thesis” rests.

It has also been featured in numerous studies that show that religiosity, which is indeed negatively correlated with cognitive reflection, predicts greater resistance to engaging evidence that challenges pre-existing beliefs.

Accordingly, one might have expected, if the “asymmetry thesis” is correct, that Cognitive Reflection Test scores would be negatively correlated with conservativism.  Studies based on nonrepresentative samples—ones consisting of M Turk workers or of individuals who visited a web site dedicated to disseminating research findings on moral reasoning style—have reported such a finding.

But in my large, nationally representative sample, scores on the Cognitive Reflection Test were not meaningfully correlated with political outlooks.

Actually, there was a very small positive correlation between cognitive reflection and identification with the Republican Party.  But it was far too small to be of any consequence for a conflict as large as the one over climate change.

Moreover, there was essentially zero correlation between cognitive reflection and a more reliable, composite measure of ideology and political party membership.

Because I think the only valid way to test for motivated reasoning is to do experimental tests that feature that phenomenon, I don’t really care that much about correlations between cognitive style measures and ideology.

But if I were someone who did think that such correlations were important, I’d likely find it pretty interesting that conservativism doesn’t correlate with Cognitive Reflection Test scores.  Because this test is now widely regarded as a better measure of the disposition to engage in critical reasoning than are the variety of self-report measures on which the “asymmetry thesis” literature rests—and, as I said, has been featured prominently in recent studies of the cognitive reasoning style associated with religiosity—the lack of any correlation between it and conservative political outlooks raises some significant questions about exactly what the correlations reported in that literature were truly measuring.

For this reason, I anticipate that “asymmetry thesis” supporters will focus their attention on this particular finding in the study.  Yet it’s actually not the finding that is most damaging to the “asymmetry thesis”; the experimental finding of symmetry in motivated reasoning is!  Indeed, I obviously don’t think the Cognitive Reflection Test—or any other measure of effortful, conscious information processing for that matter—is a valid test of open-mindedness (which isn't to say there might not be one; I'd love to find it!).  But it has been amusing—a kind of illustration of the experiment result itself—to see “asymmetry thesis” proponents, in various responses to the working paper version of the study, attack the Cognitive Reflection Test as “invalid” as a measure of the sort of “closed-mindedness” that their position rests on!

One final note:

The study characterizes differences in individuals’ predispositions with a measure of their right-left political leanings rather than their cultural worldviews. I’ve explained before that “liberal-conservative ideology” and “cultural worldviews” can be viewed as alternative observable “indicators” of the same latent motivating disposition.  I think cultural worldviews are better, but I used political outlooks here in order to maximize engagement with those researchers who study motivated reasoning in political psychology, including those who are interested in the “asymmetry thesis,” the probing of which was, as indicated, a secondary but still important objective of the study. I have also analyzed the study data using cultural worldviews as the predisposition measure and reported the results in a separate blog post.


Weekend update 2: Money talks, bullshit on scientific consensus (including lack thereof) walks

The comment thread following yesterday's "update" on the persistent, and persistently unenlightening, debate over the most recent "97% consensus" study has only renewed my conviction that anyone genuinely interested in helping confused and curious members of the public to assess the significance of the best available evidence on climate change would not be bothering with surveys of scientists but would instead be creating a market index in securities the value of which depends on global warming actually occurring.

I've explained previously how such an index would operate as a beacon of collective wisdom, beaming a signal of considered judgment through a filter of economic self-interest that removes the distorting influence of cultural cognition & like forms of bias.
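The logic can be sketched with a toy prediction market (a minimal sketch, assuming a logarithmic market scoring rule, one standard market-maker design; the trader and probabilities below are hypothetical, not from this post). A self-interested trader profits, in expectation, only by moving the price toward her true probability, and loses money betting against her own belief, which is why a market price filters out cheap talk:

```python
import math

b = 100.0  # LMSR liquidity parameter (hypothetical)

def cost(q_yes, q_no):
    """LMSR cost function; a trade is priced by the change in this quantity."""
    return b * math.log(math.exp(q_yes / b) + math.exp(q_no / b))

def price(q_yes, q_no):
    """Current market probability that the event ('warming occurs') happens."""
    e_yes = math.exp(q_yes / b)
    return e_yes / (e_yes + math.exp(q_no / b))

def expected_profit(belief, q_yes, q_no, q_yes_new):
    """A trader's expected profit, under her own belief, from buying
    (q_yes_new - q_yes) 'yes' shares at market prices."""
    shares = q_yes_new - q_yes
    paid = cost(q_yes_new, q_no) - cost(q_yes, q_no)
    return belief * shares - paid

# Market starts at 50/50; consider a trader who believes warming is 80% likely.
to_belief = b * math.log(0.80 / 0.20)   # position moving the price to 0.80
overshoot = b * math.log(0.95 / 0.05)   # position moving the price to 0.95
against   = b * math.log(0.30 / 0.70)   # betting against her own belief

print(f"move price to own belief (0.80): {expected_profit(0.8, 0.0, 0.0, to_belief):+.2f}")
print(f"overshoot to 0.95:               {expected_profit(0.8, 0.0, 0.0, overshoot):+.2f}")
print(f"bet against own belief (0.30):   {expected_profit(0.8, 0.0, 0.0, against):+.2f}")
```

Because each trader's expected profit is maximized by moving the price exactly to her belief, the standing price reflects the considered judgment of those actually willing to stake money on it.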

I just instructed my broker to place an order for $153,252 worth of stocks in firms engaged in arctic shipping. I wonder how many of the people arguing against the validity of the Cook et al. study are shorting those same securities?





Weekend update: The distracting, counterproductive "97% consensus" debate grinds on

I don’t want to go back there, but since tens of millions of people get all their news exclusively from this blog (oh, btw, there was a royal baby, everyone, in case any of you care), I felt I ought to note that controversy continues to attend the Cook et al. study finding that “97%” of climate scientists agree that human activity is contributing to climate change.

Studies making materially identical findings have been appearing at regular intervals for the better part of a decade. Every time, they are widely heralded; indeed, the media have been saturated with claims that there is “scientific consensus” on climate change since at least 2006, when Al Gore made that message the centerpiece of a $300-million effort to build public support for policies to reduce carbon emissions in the U.S.

But it is demonstrably the case (I'm talking real-world evidence here) that the regular issuance of these studies, and the steady drum beat of “climate skeptics are ignoring scientific consensus!” that accompanies them, have had no—zero, zilch—net effect on professions of public “belief” in human-caused climate change in the U.S.

On the contrary, there’s good reason to believe that the self-righteous and contemptuous tone with which the “scientific consensus” point is typically advanced (“assault on reason,” “the debate is over” etc.) deepens polarization.  That's because "scientific consensus," when used as a rhetorical bludgeon, predictably excites reciprocally contemptuous and recriminatory responses by those who are being beaten about the head and neck with it.

Such a mode of discourse doesn't help the public to figure out what scientists believe. But it makes it as clear as day to them that climate change is an "us-vs.-them" cultural conflict, in which those who stray from the position that dominates in their group will be stigmatized as traitors within their communities.  

This is not a condition conducive to enlightened self-government.

Nevertheless, the authors of the most recent study announced (in a press release issued by the lead author’s university) that “when people understand that scientists agree on global warming, they’re more likely to support policies that take action on it,” a conclusion from which the authors inferred that “making the results of our paper widely-known is an important step toward closing the consensus gap and increasing public support for meaningful climate action.”

Unsurprisingly, the study has in the months since its publication supplied a focal target for climate skeptics, who have challenged the methods the authors employ.

It’s silly to imagine that ordinary members of the public can be made familiar with results of particular studies like this.  

But it’s very predictable that they will get wind of continuing controversy over “what scientists believe” so long as advocates keep engaging in impassioned, bitter, acrimonious debates about the validity of studies like this one.

That’s too bad because, again, the best evidence on why the public remains divided on climate change is the surfeit of cues that the issue is one that culturally divides people.  Those cues motivate members of the public to reject any evidence of “scientific consensus” that suggests it is contrary to the position that predominates in their group. Under these circumstances, one can keep telling people that there is scientific consensus on issues of undeniable practical significance, and a substantial proportion of them just won’t believe what one is saying.

The debate over the latest “97%” paper multiplies the stock of cues that climate change is an issue that defines people as members of opposing cultural groups. It thus deepens the wellsprings of motivation that they have to engage evidence in a way that reinforces what they already believe. The recklessness that the authors displayed in fanning the flames of unreason that fuels this dynamic is what motivated me to express dismay over the new study.

But look: Matters like these are admittedly complex and open to reasonable disagreement. I could be wrong, and I welcome evidence & reasoned argument that would give me reason to revise my views. In the best spirit of scholarly conversation, the lead author of the latest "97%" study, John Cook, penned a very perceptive, engaging, and gracious response--and I urge people to take a look at it & decide for themselves if my reaction was well-founded.

So what’s the new development?

Mike Hulme, a climate scientist who is famous for his own conjectures about public conflict over climate change, has apparently added his voice to the chorus of critics.

I say apparently because the comments attributed to Hulme appear in a short on-line comment on a blog post that described an interview of the UK Secretary of State for Energy and Climate Change. I assume Hulme must be the actual author of the comment because no one seems to be challenging that and he hasn’t disavowed it. 

Anyway, in the comment, Hulme (assuming it’s him!) acidly states:

Needless to say, the comment—because it comes from a figure of significant stature among proponents of aggressive policy engagement with the risks posed by climate change—has lifted the frenzy surrounding the latest “97%” study to new heights (most noticeably in dueling twitter posts, a form of exchange more suited for playground-style taunting than serious discussion).

What to say?

First, what a sad spectacle.  Honestly, it’s hard for me to conceive of an issue that could be further removed from the important questions here—ones involving what the best empirical evidence reveals about climate change and about the pathologies that make public debate impervious to the same—than whether the latest “97%” study is “sound.”

Second, I think Hulme’s frustration, while probably well-founded, is not as well articulated as it should be.  What exactly does he mean, e.g., when he says “public understanding of the climate issue has moved on”?  The statement admits of myriad interpretations, many of which would be clearly false (such as that polarization in the U.S., e.g., has abated). 

Of course, it's not reasonable to expect perfect clarity or cogency in a 5-sentence blog comment. Hulme has written a very thoughtful essay in which he presents an admirably clear and engaging case against trying to buy public consensus in the currency of appeals to the authority of "scientific consensus." His argument is founded on the manifestly true point that science's way of knowing consists neither in nose counting nor appeals to authority--and to proceed as if that weren't so demeans science and makes the source of the argument look like a fool.

My position is slightly different from his, I think.

I'd say it makes perfect sense for the public to try to give weight to what they perceive to be the dominant view on decision-relevant science. Indeed, it's a form of charming but silly romanticism to think that ordinary members of the public should "take no one's word for it" (nullius in verba) but rather try to figure out for themselves who is right when there are (as is inevitably so) debates over decision-relevant science.

Members of the public are not experts on scientific matters. Rather they are experts in figuring out who the experts are, and in discerning what the practical importance of expert opinion is for the decisions they have to make as individuals and citizens.  

Ordinary citizens are amazingly good at this.  Their use of this ability, moreover, is not a substitute for rational thought; it is an exercise of rational thought of the most impressive sort.

But in a science communication environment polluted with toxic partisan meanings, the faculties they use to discern what most scientists believe are impaired.

The problem with the suggestion of the authors of the latest "97%" study that the key is to "mak[e] the results of [their] paper widely-known" is that it diverts serious, well-intentioned people from efforts to clear the air of the toxic meanings that impede the processes that usually result in public convergence on the best available (and of course always revisable!) scientific conclusions about how people can protect themselves from serious risks.

Indeed, as I indicated, the particular manner in which the "scientific consensus" trope is used by partisan advocates tends only to deepen the toxic fog of cultural conflict that makes it impossible for ordinary citizens to figure out what the best scientific evidence is. 

Meanwhile, time is “running out.”  On what? Maybe on the opportunity to engage in constructive policies on climate change.

But more immediately, time is running out on the opportunity to formulate a set of genuinely evidence-based strategies for promoting constructive engagement with the IPCC’s 5th Assessment, which will be issued in installments beginning this fall. It will offer an authoritative statement of the best current evidence on climate change.

Much of what it has to say, moreover, will consist in important revisions and reformulations of conclusions contained in the 4th Assessment.

That’s inevitable; it is in the nature of science for all conclusions to be provisional, and subject to revision with new evidence.

In the case of climate change, moreover, revised assessments and forecasts can be expected to occur with a high degree of frequency because the science involved consists in iterative modeling of complex, dynamic systems—a strategy for advancing knowledge that (as I’ve discussed before) self-consciously contemplates calibration through a process of prediction & error-correction carried out over time.

My perspective is limited, of course. But from what I see, it is becoming clearer and clearer that those who have dedicated themselves to promoting public engagement with the best available scientific evidence on climate change are not dealing with the admittedly sensitive and challenging task of explaining why it is normal, in this sort of process, to encounter discrepancies between forecasting models and subsequent observations and to adjust the models based on them.  And why such adjustment in the context of climate change is cause for concluding neither that “the science was flawed” nor that “there is in fact nothing for anyone to be concerned about.”

Part of the evidence, to me, that they aren’t preparing to do this is how much time they are wasting instead debating irrelevant things like whether “97%” of scientists believe a particular thing.

p.s. Please don’t waste your & readers’ time by posting comments saying (a) that I am arguing there isn’t scientific consensus on issues of practical significance on climate change (I believe there is); (b) that I think it is “unimportant” for the public to know that (it’s critical that it be able to discern this); or (c) that I am offering up no “alternative” to continuing to rely on a strategy that I say doesn’t work (not true; but if it were-- then what? I should nod approvingly if you propose that we all resort to prayer, too?).  Not only are none of these things either stated or implied in what I’ve written. They are mistakes that I’ve corrected multiple times (e.g., here, here, here . . .).




Dual process reasoning, Liberalism, & disgust

Interesting discussion ongoing in connection with Yoel Inbar's guest post Is Disgust a Uniquely "Conservative" Moral Emotion? I think the contributions made to it so far are more interesting than anything I have to say today, and I am loath to preempt additional contributions to that discussion. So today is an official "more discussion" day.

But just to give a sense of the nature of the matters being discussed, among the interesting questions that came up (in an exchange w/ Inbar initiated by Jon Baron)  is the relationship between the "disgust is conservative" thesis (DIC) and dual-process reasoning theories (DRT) in moral psychology.  Consider two possibilities:

A. The two could be combined. E.g., one could take the view (1) that moral reasoning is reliable & valid only when it is guided either exclusively by conscious reflection or by intuitive sensibilities (including emotions) whose content would be validated by reflection; (2) that disgust is unreliable because it is either unreflective or, on reflection, not susceptible to validation by a normatively defensible moral theory; and (3) that disgust is characteristically "conservative" either b/c conservatism is associated with a cognitive style hostile to cognitive reflection or b/c disgust involves moral appraisals that on reflection are "conservative"--or, more interestingly, illiberal in the sense of being antagonistic to key premises of Liberalism understood in the political-philosophical sense.

B. Alternatively, one could separate DIC from DRT.  The validity of moral reasoning, on this account, doesn't depend on its involving or being validated by reflection. Indeed, one might believe that emotions and other "automatic," "intuitive," "unconscious," "perceptive" etc. forms of cognition play some indispensable role in moral reasoning-- a role that can't be reproduced by conscious reflection, etc. On this view, then, diverse moral styles would be distinguished not by the degree of reflection they involve, necessarily, but by the nature of the appraisals that are embodied in the emotions that those who subscribe to them use to size up goods and states of affairs.  "Disgust" would be "conservative," this account would say, insofar as "disgust" reliably guides appraisals to the ones that fit the "conservative" moral style. But "liberals" would then be understood to be relying on some alternative emotion or set of emotions calibrated to generating "liberal" perceptions and related affective stances toward those same goods and states of affairs.

Baron, as I understood him, was taking issue with Inbar on the assumption that Inbar subscribed to something like position A.  Inbar replied that he was somewhere closer to B -- or at least that he thought "liberals" as well as "conservatives" were relying on emotion to the same extent in their reasoning; he expressed uncertainty as to whether emotion is simply a heuristic substitution for reflection in moral reasoning or a unique and indispensable ingredient of it.

I had tried to identify scholars who clearly are committed to either A or B.  I proposed Martha Nussbaum for B.  For A, I suggested maybe John Jost, although in fact he has not (as far as I know) written about disgust. I suggested that I saw Haidt as sometimes A & sometimes B, although Inbar offered that he viewed Haidt as pretty clearly in camp B.

As it turns out, I happened to read an excellent article yesterday that is pure, unadulterated A:

We review evidence that disgust, in the context of bodily moral violations, differs from other emotions of moral condemnation, particularly anger, in three different senses of the word unreasoning. First, bodily moral disgust is weakly associated with situational appraisals, such as whether a behavior is harmful or justified. Instead, it tends to be based on associations with a category of object or act; certain objects are just disgusting. Second, bodily moral disgust is relatively insensitive to context, both in thoughts and behaviors, and therefore disgust is less likely to change from varying contexts. Third, bodily moral disgust is less likely to be justified with external reasons; instead, persons often use their feelings of disgust as a tautological justification. These unreasoning traits can make disgust a problematic sociomoral emotion for a liberal society because it ignores factors that are important to judgments of fairness, such as intentionality, harm, and justifiability.

Very much worth reading! And further evidence, as Inbar emphasized in his excellent post, that debate in this area remains vibrant and ongoing.

There were other interesting issues under debate too, including regular commentator Larry's surmise that disgust is a kind of feigned strategic posturing on the part of "liberals."

I propose that additional comments -- I hope there will be some! -- be added to the existing trail originating in Yoel's post.


"Integrated & reciprocal": Dual process reasoning and science communication part 2

This is the second in what was to be a two-part series on dual process reasoning and science communication.  Now I’ve decided it must be three!

In the first, I described a conception of dual process reasoning that I don’t find compelling. In this one, I’ll describe another that I find more useful, at least for trying to make sense of and dispel the science communication problem. What I am planning to do in the 3rd is something you’ll find out if you make it to the end of this post.

A brief recap (skip down to the red type below if you have a vivid recollection of part 1):

Dual process theories (DPT) have been around a long time and come in a variety of flavors. All the various conceptions, though, posit a basic distinction between information processing that is largely unconscious, automatic, and more or less instantaneous, on the one hand, and information processing that is conscious, effortful, and deliberate, on the other. The theories differ, essentially, over how these two relate to one another.

In the first post I criticized one conception of DPT, which I designated the “orthodox” view to denote its current prominence in popular commentary and synthetic academic work relating to risk perception and science communication.

The orthodox conception, which reflects the popularity and popularization of Kahneman’s appropriately influential work, sees the “fast,” unconscious, automatic type of processing—which it refers to as “System 1”—as the default mode of processing. This conception, which you can find all over the place, goes like this:

System 1 is tremendously useful, to be sure. Try to work out the optimal path of evasion by resort to a methodical algorithm and you’ll be consumed by the saber-tooth tiger long before you complete your computations (etc).

But System 1 is also prone to error, particularly when used for assessing risks that differ from the ones (like being eaten by saber-tooth tigers) that were omnipresent at the moment of our evolutionary development during which our cognitive faculties assumed their current form.

Our prospects for giving proper effect to information about myriad modern risks—including less vivid and conspicuous but nevertheless highly consequential ones, like climate change; or more dramatic and sensational but actuarially less significant ones like those arising from terrorism or from technologies like nuclear power and genetically modified foods the benefits of which might be insufficiently vivid to get System 1’s attention—depends on our capacity, time, and inclination to resort to the more effortful, deliberate, “slow” kind of reasoning, which the orthodox account labels “System 2.”

This is the DPT conception I don’t like.

I don’t like it because it doesn’t make sense.

The orthodox position’s picture of “reliable” System 2 “monitoring” and “correcting” “error-prone” System 1 commits what I called the “System 2 ex nihilo fallacy”—the idea that System 2 creates itself “out of nothing” in some miraculous act of spontaneous generation.

Nothing makes its way onto the screen of consciousness that wasn’t instants earlier floating happily along in the busy stream of unconscious impressions.  Moreover, what yanked it from that stream and projected it had to be some unconscious mental operation too, else we face a problem of infinite regress: if it was “consciously” extracted from the stream of unconsciousness, something unconscious had to tell consciousness to perform that extraction.

I accept that the sort of conscious reflection on and re-assessment of intuition associated with System 2 truly & usefully occurs.  But those things can happen only if something in System 1 itself—or at least something in the nature of a rapid, automatic, unconscious mental operation—occurs first to get System 2's attention.

So the Orthodox DPT conception is defective. What’s better?

I will call the conception of DPT that I find more compelling “IRM,” which stands for the “integrated, reciprocal model.”

The orthodox conception sees “System 1” and “System 2” as discrete and hierarchical.  That is, the two are separate, and System 2 is “higher” in the sense of more reliably connected to sound information processing.

“Discrete and hierarchical” is in fact clearly how Kahneman describes the relationship between the two modes of information processing in his Nobel lecture.

For him, System 1 and 2 are "sequential": System 1 operations automatically happen first; System 2 ones occur next, but only sometimes. So the two are necessarily separate. 

Moreover, what System 2 does when it occurs is check to see if System 1 has gotten it right. If it hasn’t, it “corrects” System 1’s mistake. So System 2 “knows better,” and thus sits atop the hierarchy of reasoning processes within an ordering that ranks their contribution to rational thought.

IRM sees things differently. It says that “rational thought” occurs as a result of System 1 and System 2 working together, each supplying a necessary contribution to reasoning. That’s the integrated part.

Moreover, IRM posits that the ability of each to make its necessary contribution is dependent on the other’s contribution. 

As the “System 2 ex nihilo” fallacy helps us to see, conscious reflection can make its distinctive contribution only if summoned into action by unconscious, automatic System 1 processes, which single out particular unconscious judgments as fit for the sort of interrogation that System 2 is able uniquely to perform.

But System 1 must be selective: there are far too many unconscious operations going on for all of them to be monitored, much less forced onto the screen of conscious thought, which would be overwhelmed by such indiscriminate summoning! In being selective, though, System 1 has to pick out the "right" impressions for attention, and not ignore the ones unreflective reliance on which would defeat an agent's ends.

How does System 1 learn to perform this selection function reliably? From System 2, of course.

The ability to perform the kind of conscious reasoning that consists in drawing valid inferences from observation, and the experience of doing so regularly, are what calibrate unconscious processes, training them to select certain impressions for the attention of System 2, which is then summoned to attend to them.

When it is summoned, moreover, System 2 does exactly what the orthodox view imagines: it checks and corrects, and on the basis of mental operations that are indeed more likely to get the “right” answer than those associated with System 1.  That event of correction will itself conduce to the calibration and training of System 1.

That’s the reciprocal part of IRM: System 2 acts on the basis of signals from System 1, the capacity of which to signal reliably is trained by System 2.
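To fix the idea, here is a deliberately crude computational caricature of the reciprocal loop (my own illustration, purely hypothetical; the function names, threshold, and numbers are invented for this sketch and come from no published model). A fast, cheap "System 1" filter decides which stimuli get escalated to a slow but reliable "System 2" check, and every false alarm System 2 catches recalibrates the filter:

```python
# Toy sketch of the IRM loop (illustrative only; nothing here is drawn
# from Margolis, Kahneman, or any other decision scientist's actual model).

def system2_check(x):
    """Slow, effortful, reliable judgment: is x genuinely anomalous (> 10)?"""
    return x > 10

def run_irm(stream, threshold=0.0, step=1.0):
    """System 1 flags stimuli above an adaptive threshold; System 2 is
    summoned only for flagged stimuli, and each false alarm it catches
    raises the threshold, training System 1 to summon it more selectively."""
    escalations = 0
    for x in stream:
        if x > threshold:             # System 1: fast, automatic flag
            escalations += 1          # System 2 is summoned...
            if not system2_check(x):  # ...and corrects a false alarm,
                threshold += step     # recalibrating System 1 as it does
    return threshold, escalations

threshold, escalations = run_irm(range(20))
print(threshold, escalations)  # prints "10.0 19"
```

The point of the toy is only the wiring: System 2 never runs except when System 1 summons it, and System 1's reliability at summoning is itself a product of System 2's past corrections.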

I do not by any means claim to have invented IRM!  I am synthesizing it from the work of many brilliant decision scientists.

The one who has contributed most to my view that IRM, and not the Orthodox conception of DPT, is correct is the brilliant social psychologist Howard Margolis.

Margolis presented an IRM account, as I’ve defined it, in his masterful trilogy (see the references below) on the contribution that “pattern recognition” makes to reasoning.

Pattern recognition is a mental operation in which a phenomenon apprehended via some mode of sensory perception is classified on the basis of a rapid, unconscious process that assimilates the phenomenon to a large inventory of “prototypes” (“dog”; “table”; “Hi, Jim!”; “losing chess position”; “holy shit—those are nuclear missile launchers in this aerial U-2 reconnaissance photo! Call President Kennedy right away!” etc).

For Margolis, every form of reasoning involves pattern recognition.  Even when we think we are performing conscious, deductive or algorithmic mental operations, we are really just manipulating phenomena in a manner that enables us to see the pattern in the manner requisite to an accurate and reliable form of unconscious prototypical classification. Indeed, Margolis ruthlessly shreds theories that identify critical thinking with conscious, algorithmic or logical assessment by showing that they reflect the incoherence I've described as the "System 2 ex nihilo fallacy."

Nevertheless, how well we perform pattern recognition, for Margolis, will reflect the contribution of conscious, algorithmic types of reasoning.  The use of such reasoning (particularly in collaboration with experienced others, who can vouch through the use of their trained pattern-recognition sensibilities that we are arriving at the “right” result when we reason this way) stocks the inventory of prototypes and calibrates the unconscious mental processes that are used to survey and match them to the phenomena we are trying to understand.

As I have explained in a previous post (one comparing science communication and “legal neutrality communication”), this position is integral to Margolis’s account of conflicts between expert and lay judgments of risk. Experts, through a process that involves the conscious articulation and sharing of reasons, acquire a set of specialized prototypes, and an ability reliably to survey them, suited to their distinctive task. 

The public necessarily uses a different set of prototypes—and sees different things—when it views the same phenomena.  There are bridging forms of pattern recognition that enable nonexperts to recognize who the “experts” are—in which case, the public will assent to the experts’ views (their “pictures,” really).  But sometimes the bridges collapse; and there is discord.

Margolis’s account is largely (and brilliantly) synthetic—an interpretive extrapolation from a wide range of sources in psychology and related disciplines.  I don’t buy it in its entirety, and in particular would take issue with him on certain points about the sources of public conflict on risk perception.

But the IRM structure of his account seems right to me.  It is certainly more coherent—because it avoids the ex nihilo fallacy—than the Orthodox view.  But it is also in better keeping with the evidence. 

That evidence, for me, consists not only in the materials surveyed by Margolis.  It includes, too, the work of contemporary decision scientists.

The work of some of those decision scientists—and in particular that of Ellen Peters—will be featured in Part 3.

I will also take up there what is in fact the most important thing, and likely what I should have started with: why any of this matters.

Any “dual process theory” of reasoning will necessarily be a simplification of how reasoning “really” works.

But so will any alternative theory of reasoning or any theory whatsoever that has any prospect of being useful.

Better than “simplifications,” we should say that such theories are, like all theories in science, models of the phenomena of interest.

The success of theories as models doesn’t depend on how well they “correspond to reality.”  Indeed, the idea that that is how to assess them reflects a fundamental confusion: the whole point of “modeling” is to make tractable and comprehensible phenomena that otherwise would be too complex and/or too remote from straightforward ways of seeing to be made sense of otherwise.

The criteria for judging the success of competing models of that sort are pragmatic: How good is this model relative to that one in allowing us to explain, predict, and formulate satisfying prescriptions for improving our situation?

In Part 3, then, I will also be clear about the practical criteria that make the IRM conception so much more satisfying than the Orthodox conception of dual process reasoning.

Those criteria, of course, are ones that reflect my interest (and yours; it is inconceivable you have gotten this far otherwise) in advancing the scientific study of science communication--& thus perfecting the Constitution of the Liberal Republic of Science.



Is Disgust a Uniquely "Conservative" Moral Emotion?

As the 14 billion regular readers of this blog know, I went through a period where I was obsessed with disgusting things. Not incest or coprophagia, or any of that mundane stuff, but rather things like the "Crickett," the miniaturized but fully functional .22 rifle that is marketed under the slogan "My first rifle!" and is intended to be purchased by parents for preadolescent children (it comes in a variety of styles featuring child-attractive motifs, like pink laminated stocks meant to appeal to young girls) in order to introduce them to the wonders of a cultural style in which guns are symbols of shared commitments and also instruments or tools that enable various sorts of role-specific behavior that transmit and propagate commitment to that style.... People who harbor an opposing style say they are disgusted by the Crickett--and I see (feel) where they are coming from.  That place, moreover, is very remote from "conservative" political ideology or a "conservative" moral style, which Jonathan Haidt and others have identified in extremely important and appropriately influential work as uniquely (or at least disproportionately) associated with the use of "disgust" as a moral sensibility. Rather, the people disgusted by the Crickett seem like the ones who subscribe to the "liberal" moral style that, in the work of Haidt and others, makes no use, or at least less use, of disgust as a form of moral appraisal and instead relies on perceptions of harm. 
The reaction to the Crickett--that it and the way of life in which it figures are disgusting, a reaction widely expressed in the aftermath of the widely covered tragic accidental shooting of a two-year-old Kentucky girl by her Crickett-toting five-year-old brother--seemed like evidence to me for a different position, one I associate with Mary Douglas and William Miller. They view disgust as a universal moral sensibility that adherents to diverse cultural systems across place and time use to focus their perception of the objects and behavior characteristic of opposing styles, and to motivate their denunciation of them, in terms that are strikingly illiberal in the sense of being disconnected from harm, which is imputed to behavior that offends the cultural norms of those experiencing this reaction...

Readers also know that one of my favorite strategies for advancing my own knowledge and that of others is to recklessly offer my own conjectures on matters such as this as a way of luring/provoking those who know more to respond & correct the myriad mistakes they see in my ruminations!  Well, I've succeeded once again!  

Below is an amazingly thoughtful & penetrating response from Yoel Inbar. Inbar is a social psychologist whose work on disgust, which is broadly in alignment with the account I attributed to Haidt, is of tremendous quality and importance and central to ongoing scholarly discussion of the role of disgust in informing moral and related sensibilities.  He takes issue with me, of course! I am much smarter as a result of reading and thinking about his essay & offer it to my loyal readers so that they can enjoy the same benefit!

Is Disgust a Uniquely "Conservative" Moral Emotion?

Yoel Inbar

Among politically liberal academics, the emotion of disgust has an unsavory reputation. The philosopher Martha Nussbaum has argued that disgust is wielded by privileged social groups to marginalize and dehumanize those of lower status, and indeed research has found that the disgust-prone are more negative towards immigrants, foreigners, and "social deviants." Furthermore, disgust seems to have a relationship with political conservatism: self-described political conservatives are more easily disgusted, and states where people are on average more disgust sensitive were (all else equal) more likely to go for McCain over Obama in the 2008 U.S. presidential election. A tempting conclusion for liberals might be that disgust is an irrational, immoral, and politically suspect emotion, at least when it is applied to morality. 

Yet the view that disgust as a moral emotion is only important to political conservatives has a problem: on its face, it seems obviously wrong. As Dan Kahan pointed out on this blog, political liberals often use the word "disgust" when talking about things they find immoral: liberals say they are disgusted by multi-million-dollar Wall Street bonuses, gun manufacturers who make weapons for 10-year-olds, racism, and lots of other things. Doesn't this mean that liberals are just as likely as conservatives to base their moral judgments on disgust? Perhaps (liberal) researchers are simply more likely to label moral positions that they disagree with as disgust-based (and therefore, by implication, irrational) while giving positions they agree with a free pass.

Although political bias in social psychology is a real problem, this objection misses a crucial difference between liberals and conservatives, namely what they find morally objectionable. There are some behaviors that are at least in theory harmless, but (for lack of a better word) gross. For example, consider a man who, every Saturday, buys a whole chicken at the supermarket, masturbates into it, cooks it, and eats it for dinner (this wonderful and by now famous story was invented by Jon Haidt). Almost everyone finds this disgusting. However, most liberals will concede that despite being disgusting, having sex with a chicken and consuming it is not morally wrong, because no one is harmed (after all, the chicken is already dead). Many conservatives (although by no means all) will say that despite being harmless, this behavior is wrong--because it is disgusting. In fact, conservatives are more likely than liberals to say that many different kinds of disgusting-but-harmless behaviors are morally wrong. Unusual habits regarding food, hygiene, and (especially) sex are often seen by conservatives as immoral regardless of whether they directly harm anyone. And the emotion that people feel when contemplating these kinds of behaviors (which Haidt and his colleagues have called purity violations) is disgust. Certainly Western liberals may also feel disgusted when considering these behaviors, but they are often reluctant to call them immoral unless they can point to a victim--to someone who is directly harmed.

Of course, many people who morally object to (for example) certain kinds of sex between consenting adults claim that their objection is motivated by the putative harm caused by the behavior, not by the observer's queasy feelings. In such a case, how are we to know whether beliefs about harm caused the moral conviction, or whether they are merely post-hoc rationalizations of a (disgust-based) moral intuition? This is a difficult question, but there are several good reasons to think the latter answer is right: 1) When Jon Haidt and his collaborator, Matthew Hersh, asked liberals and conservatives to defend their views about the moral permissibility of anal sex between two men, conservatives but not liberals were likely to defend their beliefs even when they admitted they could not give (harm-based) justifications for them (a phenomenon Haidt has called moral dumbfounding); 2) in the same study, judgments of moral permissibility were statistically predicted by subjects' self-reported emotional reactions to imagining the acts in question, and not by their judgments of their harmfulness; 3) when people are asked directly about how much different considerations are relevant to deciding whether something is right or wrong, conservatives rate "whether someone violated standards of purity and decency" and "whether or not someone did something disgusting" as more morally relevant than do liberals.

What, then, of liberals who say they're disgusted by gun manufacturers or Goldman Sachs? Well, it turns out that "disgust" is a tricky term, at least in English--many laypeople use "disgusted" in a metaphorical sense, to mean "angry." As David Pizarro and I recently argued, with one or two exceptions there's very little evidence that people are physically disgusted by immoral behavior that doesn't involve food, cleanliness, or sex. In fact, recent research by Roberto Gutierrez, Roger Giner-Sorolla, and Milica Vasiljevic suggests that people use the word "disgust" to mean physically disgusted when judging unusual sexual or dietary practices, but use the same word to mean something much closer to "angry" when judging instances of deceit or exploitation.  Of course, this is an area that's actively being researched at the moment, and this may change, but the balance of evidence so far suggests that when people use "disgust" to refer to their reactions to unfairness, exploitation, or violations of someone's rights, they are doing so metaphorically, not literally.

This is not to say that disgust qua disgust plays no role in liberals' moral judgments. For example, consider another story invented by Jon Haidt: Mark and Julie are siblings who are vacationing together in the south of France. One night, they decide that it would be fun and interesting if they tried making love. Julie is on birth control, but just to be safe Mark also uses a condom. They both enjoy the experience, but they decide not to do it again and to keep it a special secret between the two of them. Was this morally wrong? Here, liberals and conservatives seem equally likely to say "yes"--and equally unable to back up those judgments with harm-based justifications. When Jon Haidt and Matthew Hersh asked their undergraduate subjects about the moral permissibility of incest, they found that liberals were just as likely as conservatives to reject it, and just as likely to become morally dumbfounded when attempting to defend their judgments. For both liberals and conservatives, visceral disgust sometimes leads to moral revulsion, but this seems to be more common for conservatives. This is likely to be for two reasons: 1) conservatives are more readily disgusted in general; and 2) conservatives seem to be more comfortable pointing to feelings of disgust as a justification for moral beliefs (for example, conservative bioethicist Leon Kass's well-known argument for the "wisdom of repugnance").

Does this mean that liberals are better moral decision-makers than conservatives? After all, if conservatives base more of their moral judgments on disgust, an unreasoned emotion, and liberals base more of their moral judgments on whether someone was harmed or treated unfairly, doesn't this mean that liberals are more careful, thoughtful, and reasoned in their moral judgments? The answer is unambiguously no. There is no evidence that liberals are any less likely to base their moral judgments on (unreasoned) intuitions than conservatives, although liberals and conservatives do often rely on different moral intuitions. But what moral intuitions underlie the moral judgments of political liberals, and why these intuitions can be just as fallible as those of conservatives, are questions big enough to leave for a separate post.


"System 1" and "System 2" are intuitively appealing but don't make sense on reflection: Dual process reasoning & science communication part 1

“Dual process” theories of cognition (DPT) have been around a long time but have become dominant in accounts of risk perception and science communication only recently, and in a form that reflects the particular conception of DPT popularized by Daniel Kahneman, the Nobel Prize-winning behavioral economist.

In this post--the first in a 2-part series-- I want to say something about why I find this conception of DPT unsatisfying.  In the next, I'll identify another that I think is better.

Let me say at the outset, though, that I don't necessarily see my argument as a critique of Kahneman so much as an objection to how his work has been used by scholars who study public risk perceptions and science communication.  Indeed, it's possible Kahneman would agree with what I'm saying, or qualify it in ways that are broadly consistent with it and that I agree improve it.

So what I describe as "Kahneman’s conception," while grounded in his own exposition of his views, should be seen as how his position is understood and used by scholars diagnosing and offering prescriptions for the pathologies that afflict public risk perceptions in the U.S. and other liberal democratic societies.

This conception of DPT posits a sharp distinction between two forms of information processing: “System 1,” which is “fast, automatic, effortless, associative and often emotionally charged,” and thus “difficult to control or modify”; and “System 2,” which is “slower, serial, effortful, and deliberately controlled,” and thus “relatively flexible and potentially rule-governed.” (Kahneman did not actually invent the “system 1/system 2” terminology; he adapted it from Keith Stanovich and Richard West, psychologists whose masterful synthesis of dual process theories is subject to even more misunderstanding and oversimplification than Kahneman’s own.)

While Kahneman is clear that both systems are useful, essential, “adaptive,” etc., System 2 is more reliably connected to sound thinking.  

In Kahneman’s scheme, System 1 and 2 are serial: the assessment of a situation suggested by System 1 always comes first, and is then—time, disposition, and capacity permitting—interrogated more systematically by System 2 and consciously revised if in error.

All manner of “bias,” for Kahneman, can in fact be understood as manifestations of people’s tendency to make uncorrected use of intuition-driven System 1 “heuristics” in circumstances in which the assessments that style of reasoning generates are wrong.

Human rationality is “bounded” (an idea that Kahneman and those who elaborate his framework take from the pioneer decision scientist Herbert Simon) but how perfectly individuals manifest rationality in their decisionmaking, on Kahneman’s account, reflects how adroitly they make use of the “monitoring and corrective functions of System 2” to avoid the “mistakes they commit” as a result of over-reliance on System 1 heuristics.

This account has attained something akin to the status of an orthodoxy in writings on public risk perception and science communication (particularly in synthetic works in the nature of normative and prescriptive “commentaries,” as opposed to original empirical studies).  Popular writers and even many scholars use the framework as a sort of template for explaining myriad public risk perceptions—from those posed by climate change and terrorism to those posed by nuclear power and genetically modified foods—that, in these writers’ views, the public is over- or underestimating as a result of its reliance on “rapid, intuitive, and error-prone” System 1 thinking, and that experts are “getting right” by relying on methods (such as cost-benefit analysis) that faithfully embody the “deliberative, calculative, slower, and more likely to be error-free” assessments of System 2.

This is the account I don’t buy.

It has considerable intuitive appeal, I agree.  But when you actually slow down a bit and reflect on it, it just doesn’t make sense.

The very idea that "conscious" thought "monitors" and "corrects" unconscious mental operations is psychologically incoherent.

There is no thought that registers in human consciousness that wasn’t, an instant earlier, residing (in some form, but unlikely one that could usefully be described as a “thought” or at least anything with a concrete, articulable propositional content) in some element of a person’s “unconsciousness.”

Moreover, whatever yanked it out of the stream of unconscious “thought” and projected it onto the screen of consciousness also had to be an unconscious mental operation.  Even if we imagine (cartoonishly) that there was a critical moment in which a person consciously “noticed” a useful unconscious “thought” floating along and “chose” to fish it out, some unconscious cognitive operation had to occur prior to that for the person to “notice” that thought, as opposed to the literally infinite variety of other alternative stimuli, inside the mind and out, that the person could have been focusing his or her conscious attention on instead.

Accordingly, whenever someone successfully makes use of the “slower, serial, effortful, and deliberately controlled” type of information processing associated with System 2 to “correct” the “fast, automatic, effortless, associative and often emotionally charged” type of information processing associated with System 1, she must be doing so in response to some unconscious process that has reliably identified the perception at hand as one in genuine need of conscious attention.

Whatever power “deliberative, calculative, slower,” modes of conscious thinking have to "override" the mistakes associated with the application of “rapid, intuitive, and error-prone” intuitions about risk, then, necessarily signify the reliable use of some other form of unconscious or pre-conscious mental operations that in effect “summon” the faculties associated with effortful System 2 information processing to make the contribution that they are suited to making to information processing.

Thus, System 2 can’t reliably “monitor” and “correct” System 1 (Kahneman’s formulation) unless System 1 (in the form of some pre-conscious, intuitive, affective, automatic, habitual, uncontrolled, etc., mental operation) is reliably monitoring itself.

The use of System 1 cognitive processes might be integral to the “boundedness” of human rationality.  But how close anyone can come to perfecting rationality necessarily depends on the quality of those very same processes.

The problem with the orthodox picture of deliberate, reliable, conscious “System 2” checking impetuous, impulsive “System 1” can be called the “System 2 ex nihilo fallacy”: the idea that the form of conscious, deliberate thinking one can use to “monitor” and “correct” automatic, intuitive assessments just spontaneously appears—magically, “out of nothing,” and in particular without the prompting of unconscious mental processes—whenever heuristic reasoning is guiding one off the path of sound reasoning.

The “System 2 ex nihilo fallacy” doesn’t, in my view, mean that dual process reasoning theories are “wrong” or “incoherent” per se.

It means only that the truth that such theories contain can’t be captured by a scheme that posits the sort of discrete, sequential operation of “unconscious” and “conscious” thinking that is associated with the view I’ve been describing—a conception of DPT that is, as I’ve said, pretty much an orthodoxy in popular writing on public risk perception and science communication.

In part 2 of this series, I’ll suggest a different conception of DPT that avoids the “System 2 ex nihilo fallacy.”

It is an account that is in fact strongly rooted in focused study of risk perception and science communication in particular.  And it furnishes a much more reliable guide for the systematic refinement and extension of the study of those phenomena than the particular conception of DPT that I have challenged in this post.

Kahneman, D. Maps of Bounded Rationality: Psychology for Behavioral Economics. Am Econ Rev 93, 1449-1475 (2003).

Simon, H.A. Models of bounded rationality (MIT Press, Cambridge, Mass.; 1982).

Stanovich, K.E. & West, R.F. Individual differences in reasoning: Implications for the rationality debate? Behavioral and Brain Sciences 23, 645-665 (2000).

Sunstein, C.R. Laws of Fear: Beyond the Precautionary Principle (Cambridge University Press, Cambridge, UK; New York; 2005).



A measured view of what can be validly measured with M Turk samples

As the nation continues to be convulsed by polarized debate and street demonstrations following last week's publication of Chandler, Mueller & Paolacci, Nonnaïveté among Amazon Mechanical Turk workers: Consequences and solutions for behavioral researchers, Behavior Research Methods (advance on-line 2013), the CCP Blog is proud to present ... an EXCLUSIVE guest post from Jesse Chandler, lead author of this important article! Jesse offers his views on the critique I posted on the validity of MTurk samples for studying the interaction of culture, cognition, and perceptions of risk and other policy-relevant facts.

I wanted to elaborate a bit on some of the issues that Dan raised, amplifying some of his concerns about generalizability, but also mounting something of a defense of Mechanical Turk workers, and their use in research. In brief, I wanted to reinforce the point that he made about understanding the sample that is used and consciously deciding what inferences to make from it. However, I also wanted to push back a bit on his claim that MTurk is demonstrably worse than other methods of collecting data. Along the way, I also have to dispute his characterization of MTurk workers as liars and frauds.

MTurk is not representative, but it is more representative than many samples researchers currently use 

As Dan notes, Mechanical Turk is, in principle, fine as a sample for any research study for which college students are currently deemed “representative enough” (which is a larger proportion of the social sciences than the site’s readers may appreciate).  If anything, MTurk samples are more representative than other convenience samples, and discovering that a finding observed among college students is robust in a different and more heterogeneous population is useful. 

Moreover, in the social sciences it should generally be assumed that any observed process generalizes unless there is a reason to think that it does not (nothing insidious here, just Occam’s razor). If a researcher believes that a finding would not replicate on another population, then they should try to replicate it across both samples and compare results. Ideally, they have a reason why they expect the populations to differ that they can articulate, operationalize, and use in mediational analysis. In other words, concerns about the validity of findings on MTurk represent an opportunity to advance theory, not a reason to dismiss findings out of hand.  

Know thy sample 

Perhaps more importantly, I think Dan is spot on in emphasizing the importance of understanding the sample one is using and the question being asked. “Representative enough” is clearly not suitable for some research questions, and some inferences do not logically follow from non-representative samples. Likewise, for researchers interested in specific populations, MTurk results may vary. Some populations (like conservatives) may be missing or underrepresented in this sample, which is bad for Dan. Other populations, like the unemployed, underemployed, and socially anxious, may be over-represented, which is great for someone else. For researchers with limited budgets who work at homogeneous colleges, some populations, like people from other cultures or who speak other languages, may only be available on MTurk. 

Another closely related point Dan alludes to that I also want to reemphasize is that the composition of a particular MTurk sample cannot be taken for granted. Workers are not randomly selected from the pool of available workers and assigned to studies; they choose what they want to participate in. While there are ways to convert selection bias based on study content into attrition (e.g., by placing the consent form after workers accept the HIT), other procedural factors may influence who completes a HIT. We show, for example, that if a HIT makes it onto Reddit, the sample can end up much younger and disproportionately male. Sample characteristics likely also depend on other variables, including the requester’s reputation, the sample size, payment, and the minimum reputation of the recruited workers (none of which has been thoroughly studied).

It is important to collect relevant variables from participants directly, rather than only appealing to the demographic characteristics collected by other researchers. Very simple demographic differences can fundamentally change point estimates on survey responses. As Dan notes, MTurk is overwhelmingly pro-Obama. There might be a complicated reason for this, but it may also simply reflect the fact that American MTurk workers are more likely to be young, lower income, and female, and all of these demographic characteristics predict more support for Obama.
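The arithmetic behind that point is worth making concrete. In this toy illustration (all numbers invented), a group that is both over-represented in the sample and more supportive of a candidate inflates the raw sample estimate; reweighting the group means to population shares pulls it back down.

```python
# Toy illustration (all numbers invented): if young respondents are both
# over-represented in a sample and more supportive of a candidate, the raw
# sample mean overstates population-wide support; reweighting the group
# means to population shares pulls the estimate back.

def weighted_support(support_by_group, sample_shares, population_shares):
    raw = sum(support_by_group[g] * sample_shares[g] for g in support_by_group)
    weighted = sum(support_by_group[g] * population_shares[g] for g in support_by_group)
    return raw, weighted

support = {"young": 0.70, "old": 0.45}     # hypothetical support rates
sample_mix = {"young": 0.80, "old": 0.20}  # MTurk-like age skew
pop_mix = {"young": 0.30, "old": 0.70}     # population shares
raw, weighted = weighted_support(support, sample_mix, pop_mix)
print(raw, weighted)  # raw (about 0.65) overstates the reweighted estimate (about 0.53)
```

Reweighting only works, of course, if the relevant demographic variables were collected in the first place, which is the point of the paragraph above.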

Dan thinks the Internet is full of weirdos and frauds.

Despite agreeing with the spirit of Dan's comments, I have to take issue with his argument that Mechanical Turk workers are more likely to engage in immoral behavior than other samples, and thus MTurk samples are inferior to other kinds of panel data. 

I take particular issue with these claims because their take-home implication is that data provided by MTurk workers are less credible, not because the workers are a non-representative population, but because the data are more likely to be fabricated than data obtained from other sources. If this were true, this issue of internal validity would be a far more serious threat to the usefulness of findings on MTurk and would call into question all data collected on it. However, there is little evidence to suggest these concerns are warranted. These are comparative arguments for which comparative data do not exist, and often even the data for MTurk itself are missing or misleading.

Yes, adult content exists on MTurk, but workers must opt in to view HITs that may contain adult content (including flagging it for removal from non-adult sites). Around 80,000 workers have opted to do so. We don’t know how many workers actually view this content, let alone how this proportion compares to the population of Internet users who watch adult content.

Yes, some workers probably intend to engage in fraudulent behavior on MTurk. Again, we don’t know how many workers do this. Dan notes that a large proportion of posted HITs commit fraud, in the sense that they ask workers to “like” social media posts contrary to Amazon’s ToS. Taking this as evidence for worker fraud relies on the assumptions that i.) these HITs are actually completed, ii.) by workers in general and not just a subsample of super-productive fraudsters (analogous to our research “superturkers”), iii.) there is overlap between the sample that completes spam HITs and the one that completes research HITs, and iv.) workers even understand that this is a fraudulent activity (Dan read Amazon’s terms of service, but hey, he is a lawyer).

Another variation of the argument that workers are somehow fundamentally strange comes from the question “who would work for $1.50 an hour?” If I had to guess who works for these low wages, I would say that it is the large number of long-term unemployed and other people living at the margin in a country muddling through an economic catastrophe. Although MTurk pays little, the money it does pay matters at the margin. Moreover, there may be good reasons why workers accept low wages: MTurk work is flexible enough to be completed in slack time and to accommodate other life commitments (for a discussion see here). Also, we live in a country where people pay to click mice repeatedly. Knowing this, it is not so surprising that people will do the same to earn money. I would not be surprised, though, if different workers had different reservation wages, and if sample characteristics changed as a function of wages, or in response to external economic conditions.

Workers are people. Don’t be surprised if they act like… people

Problems with worker data quality do not need to be explained by pathologizing workers. Many of the issues that vex researchers could arise from workers acting basically like ordinary people.

Workers will lie or distort the truth if incentivized to do so. Indeed, research shows that MTurk workers lie for money (see here), but a close reading of the paper will show that they may lie less than “real world” participants who took part in similar studies on participant honesty. This may explain why workers misreport US residency: US workers are paid in cash, while those in many other countries are paid in Amazon credit.

Workers, like other people, are forgetful, and workers who “refuse” to opt out of studies they have already completed should surprise nobody. Large proportions of people forget things like spending the night in a hospital or being the victim of a violent crime (see here), all of which are more important to their lives than Study 3 of your dissertation. Researchers who want to avoid duplicate workers (and they should) can make life easy for both workers and themselves by preventing duplicates automatically.
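A minimal sketch of what “preventing duplicates automatically” can mean (the file name and worker IDs below are hypothetical): keep a record of every worker ID that has entered any study in a research line, and screen new arrivals against it before they see the materials.

```python
# Minimal sketch (hypothetical file name and worker IDs): keep a local
# record of worker IDs that have completed any study in a research line,
# and screen new participants against it before the study begins.
import json
import os

SEEN_FILE = "seen_workers.json"  # hypothetical path

def load_seen(path=SEEN_FILE):
    """Load the set of previously seen worker IDs (empty if no record yet)."""
    if os.path.exists(path):
        with open(path) as f:
            return set(json.load(f))
    return set()

def admit(worker_id, seen, path=SEEN_FILE):
    """Admit and record the worker if new; reject repeat participants."""
    if worker_id in seen:
        return False
    seen.add(worker_id)
    with open(path, "w") as f:
        json.dump(sorted(seen), f)
    return True
```

In practice the same check can be wired into the survey’s entry page or expressed as an MTurk qualification; the design point is simply that exclusion happens in software rather than relying on workers’ memories.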

It is true that you cannot know what work workers have completed for other researchers, but these concerns can be greatly reduced if researchers take the time to create their own stimuli. I am sometimes surprised at the laziness of researchers. Gabriele Paolacci and I used a simple attention check (“have you ever had a fatal heart attack”) once, three years ago. We mentioned this in a paper and it now shows up verbatim all the time on MTurk. The Oppenheimer “Instructional Manipulation Check” is also frequently copied verbatim. Seriously. Stop. It. Now.

If there is one thing that workers hate, it is negative feedback. This means they will generally bend over backwards to accommodate requesters. They generally understand that researchers do not like people talking about the contents of their HITs and try to avoid this. When they do communicate information, they seem to assume that the details they reveal will not matter, and methodologically problematic slips (e.g., discussing information in one condition but not another) are inadvertent. However, they also hate it when requesters reject their work because they failed an “attention check.” From a worker’s perspective, this probably feels unfair in only the way that an elite private school refusing to give out nickels can. Oh, and this problem is not unique to MTurk; sharing information for mutual benefit happens in college samples too.

Are Panels any Good?

All of these concerns about the quality of data collected on MTurk assume that workers are somehow different from respondents in other sample pools, and that these issues will simply go away if only data were collected somewhere else. This may be true, but how much do we really know about panel respondents and panel data quality? It is unfair to compare observed data on MTurk against a Platonic ideal of a survey panel. If MTurk workers lie to be eligible for studies (like our malingerers), why wouldn’t panel members lie for yet larger incentives? Likewise, if we are going to worry that MTurk samples are not representative because workers look at naked people on the internet, then perhaps we should worry about whether panels built using random-digit dialing are representative, given that almost every normal person screens their calls.

Researchers who use other paid panels should be as critical toward those samples as Dan would like us all to be toward Mechanical Turk. Paid sources vary a lot in methodology, and it is likely that beyond differences in how they are supposed to be constructed, there are yet larger differences in how the panel design is executed. Research always seems cleaner when you don’t know how the sausage gets made. Dig deep. Get worried. While data quality, representativeness and honesty may be issues that are particularly salient for MTurk samples, we (as in social scientists who are not survey research methodologists) may simply know more about MTurk’s issues because the sample is relatively transparent and somebody happened to look.

The Take Home Message

In sum, Dan notes issues with Mechanical Turk that I agree are potential problems. However, I think the most important lessons to draw from this discussion concern what questions to ask about our hypotheses and our samples, and how to collect data from them, rather than whom to collect data from. Further, the solutions to the problems he identifies ultimately lie in better research design, with or without a better sample population.




Proof of ideologically motivated reasoning--strong vs. weak

A couple of weeks ago I posted the abstract & link to Nam, Jost & Van Bavel’s “Not for All the Tea in China!” Political Ideology and the Avoidance of Dissonance, and asked readers to comment on whether they thought the article made a good case for the “asymmetry thesis.”

The "asymmetry thesis"—a matter I’ve actually commented on about a billion times on this blog (e.g., here, here, here, here, here . . .)—is the claim that individuals who subscribe to a conservative or “right-wing” political orientation are uniquely or disproportionately vulnerable to closed-minded resistance to evidence that challenges their existing beliefs.

The readers' responses were great.

Well, I thought I’d offer my own view at this point.  

I like the study. It's really interesting.  

Nevertheless, I don't think it supplies much, if any, additional evidence for the asymmetry thesis beyond what one had before the study. Consequently, if one didn't find the thesis convincing before (I didn't), then NJV-B doesn't furnish much basis for reconsidering.

One reason the study isn't very strong is that NJV-B relied on a Mechanical Turk sample.  I just posted a two-part set of blog entries explaining why I think MT samples do not support valid inferences relating to cultural cognition and like forms of motivated reasoning.

But even leaving that aside, the NJV-B study, in my view, rests on a weak design, one that defeats confident inferences that any ideological “asymmetries” observed in the study correspond to how citizens engage real-world evidence on climate change, gun control, the death penalty, health care, or other policies that turn on contested empirical claims.

NJV-B purported to examine whether “conservatives” are more averse to “cognitive dissonance” than “liberals” with respect to their respective political positions—a characteristic that would, if true, suggest that the former are less likely to expose themselves to or credit challenging evidence.

They tested this proposition by asking subjects to write “counterattitudinal essays”—ones that conflicted with the positions associated with subjects’ self-reported ideologies—on the relative effectiveness of Democratic and Republican Presidents.  Democrats were requested to write essays comparing Bush favorably to Obama, and Reagan favorably to Clinton; Republicans to write ones comparing Obama favorably to Bush, and Clinton favorably to Reagan.

They found that a greater proportion of Democrats complied with these requests. On that basis, they concluded that Republicans have a lower tolerance for actively engaging evidence that disappoints their political predispositions.

Well, sure, I guess.  If the two groups had demonstrated an equal likelihood to resist writing such essays, then I suppose that would count as evidence of “symmetry,” so the Republicans’ greater unwillingness to do so is, by the same token, evidence the other way.

The problem is that it’s not clear that the intensity of the threat that the respective tasks posed to Republicans’ and Democrats’ predispositions was genuinely equal.  As a result, it’s not clear whether the “asymmetry” NJV-B observed in the willingness of the subjects to perform the requested tasks connotes a comparable differential in the disposition of Democrats and Republicans to engage open-mindedly with evidence that challenges their views in real-world political conflicts.

By analogy, imagine I hypothesized that Southerners were lazier than Northerners. To test this proposition, I asked Southerners to run 5 miles and Northerners to do 50 sit-ups. Observing that a greater proportion of Northerners agreed to my request, I conclude that indeed Southerners are lazier—more averse to physical and likely all other manner of exertion—than Northerners are.

This is obviously bogus.  One could reasonably suspect that doing 50 sit-ups is less taxing than running 5 miles. If so, then we’d expect agreement from fewer members of a group of people asked to do the latter than from members of a group asked to do the former—even if the two groups’ members are equally disposed to exert themselves.

Well, is it as “dissonant” for a Democrat to compare Bush favorably to Obama, and Reagan favorably to Clinton, as it is for a Republican to compare Obama favorably to Bush and Clinton favorably to Reagan? 

I think we could come up with lots of stories—but the truth is, who the hell knows? We don’t have any obvious metric by which to compare how threatening or dissonant or “ideologically noncongruent” such tasks are for the respective groups, and hence no clear way to assess the probative significance of differences in the willingness of each to engage in the respective tasks they were requested to perform.

So, sure, we have evidence consistent with “asymmetry” in NJV-B—but since we have no idea what weight or strength to assign it, only someone motivated to credit the “asymmetry” thesis could expect a person who started out unconvinced of it to view this study as supplying much reason to change his or her mind, given all the evidence out there that is contrary to the asymmetry thesis.

The evidence contrary to the asymmetry thesis rests on study designs that don’t have the sort of deficiency that NJV-B displays.  Specifically, the studies I have in mind use designs that measure how individuals of diverse ideologies assess one and the same item of evidence, and show that they are uniformly disposed to credit or discredit it selectively, depending on whether the researcher has induced the study subjects to believe that the piece of evidence in question supports or challenges, affirms or threatens, a position congenial to their respective group commitments.

One example involved the CCP study featured in the paper They Saw a Protest. There, subjects, acting as jurors in a hypothetical trial, were instructed to view a videotape of a political protest and determine whether the demonstrators physically threatened bystanders. Half the subjects were told that the demonstrators were anti-abortion activists protesting outside of an abortion clinic, and half that they were pro-gay/lesbian activists protesting “don’t ask, don’t tell” outside of a military recruitment center.

We found that what “Republicans” and “Democrats” alike reported seeing—protestors “blocking” and “screaming” in the face of “fearful” bystanders or instead noncoercive advocacy inducing shame, embarrassment, and resentment among those seeking to enter the facility—flipped depending on which type of protest they believed they were watching.

Are Republicans and Democrats (actually, we used cultural worldview measures, but reported the results using partisan self-identification too) “equally” invested in their respective positions on abortion and gay rights?

I don’t know.  But I don't need to in order to draw inferences from this design.  For however strongly each feels, they both were equally prone to conform their assessment of evidence to the position that was most congenial to their ideologies.

That’s evidence of symmetry in motivated reasoning. And I think it is pretty darn strong.

I’ve addressed this point more generally in previous posts that describe what counts as a “valid” design for an ideologically motivated reasoning experiment. In those posts, I’ve shown how motivated reasoning relates to a Bayesian process of information processing.

Bayesianism describes the logical operations necessary for assimilating new information or evidence with one’s existing views (which themselves reflect an aggregation of all the other evidence at one’s disposal).  Basically, one revises (updates) one’s existing view of the probability of a proposition (or hypothesis) in proportion to how much more consistent the new evidence is with that proposition as opposed to some other, alternative hypothesis—a property of the information known as the “likelihood ratio” (the ratio of how likely one would be to observe the evidence if the proposition were true to how likely one would be to observe it if the proposition were false).
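In odds form, the update rule just described is simply: posterior odds = prior odds × likelihood ratio. A minimal sketch, with illustrative numbers:

```python
def update_odds(prior_prob, likelihood_ratio):
    """Bayesian updating in odds form: posterior odds = prior odds * LR,
    where LR = P(evidence | hypothesis) / P(evidence | alternative)."""
    prior_odds = prior_prob / (1.0 - prior_prob)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1.0 + posterior_odds)

print(update_odds(0.5, 4.0))  # evidence with LR = 4 moves a 50% prior to 0.8
print(update_odds(0.5, 1.0))  # LR = 1 leaves the belief unchanged at 0.5
```

An LR above 1 pushes belief toward the proposition, an LR below 1 pushes it away, and an LR of 1 is uninformative, which is why the assignment of the likelihood ratio is where motivated reasoning does its work.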

In Bayesian terms, the reasoning deficiency associated with motivated reasoning consists in the opportunistic adjustment of the likelihood ratio.  When they display ideologically or culturally motivated reasoning, individuals treat the new information or evidence as “more consistent” or “less consistent” with the proposition in question (the film shows the protestor “blocked entry to the building” or instead “made an impassioned verbal appeal”) depending on whether the proposition is one that gratifies or disappoints their motivating ideological or cultural commitments.

When people's reasoning reflects motivated cognition, their ideological commitments shape both their prior beliefs and the likelihood ratio they attach to new evidence.  As a result, they won't update their “prior beliefs” based on “new evidence,” but rather assign to new evidence whatever weight best "fits" their ideologically determined priors.  

Under these conditions, ideologically diverse people won’t converge in their assessments of a disputed fact (like whether the earth is heating up as a result of human CO2 emissions), even when they are basing their assessments on the very same evidence.
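That failure to converge can be simulated directly. In the sketch below (all parameters are invented for illustration), an unbiased reasoner applies each piece of evidence’s true likelihood ratio, while a motivated reasoner substitutes an LR that favors whichever side of the question their current belief already favors:

```python
# Sketch of the pathology (all parameters invented): an unbiased reasoner
# applies each piece of evidence's true likelihood ratio; a motivated
# reasoner substitutes an LR that fits whichever side of the question
# their current belief already favors.

def update(prob, lr):
    """One Bayesian update in odds form."""
    odds = prob / (1.0 - prob) * lr
    return odds / (1.0 + odds)

def simulate(prior, motivated, evidence_lrs):
    p = prior
    for lr in evidence_lrs:
        if motivated:
            lr = 2.0 if p > 0.5 else 0.5  # opportunistic LR adjustment
        p = update(p, lr)
    return p

evidence = [3.0, 3.0, 3.0]  # three pieces of genuinely pro-hypothesis evidence
print(simulate(0.2, False, evidence))  # unbiased skeptic moves toward the evidence
print(simulate(0.2, True, evidence))   # motivated skeptic moves away from it
```

Fed exactly the same evidence, the two reasoners end up further apart than they started, which is the signature of opportunistic likelihood-ratio adjustment.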

The study in They Saw a Protest involved a design aimed at testing whether individuals do this.  The information that the subjects received--the images displayed in the video--was held constant, while the ideological stake the subjects had in giving that information effect with respect to whether the protestors resorted to physical intimidation was manipulated.

The study found that subjects gave selective effect to the evidence--opportunistically adjusted the likelihood ratio in Bayesian terms--in a manner that gratified their ideologies.  Moreover, they did that whether their outlooks were "liberal" or "conservative."

So again, I believe that’s convincing evidence of “symmetry” in the vulnerability of ideologically diverse citizens to motivated reasoning--evidence that is a lot more probative (has a much higher likelihood ratio, in Bayesian terms!) than what NJV-B observed in their study given the relative strength of the respective study designs.

Nor is our They Saw a Protest study the only one that used this kind of design to look at ideologically motivated reasoning. In a companion follow-up post, I’ll identify a variety of others, some by CCP researchers and some by others, that use the same design and reach the same conclusion.

All the studies I am aware of that use this design for testing motivated reasoning (one, again, that manipulates the ideological motivation that subjects have to credit or discredit evidence, or opportunistically adjust the "likelihood ratio" they assign to one and the same piece of information) reach the conclusion that ideologically motivated reasoning is symmetric.

The only studies that support the asymmetry thesis are ones that use designs that either are not valid or that suffer from a design limitation that defeats reliable comparison of the reasoning styles of subjects of opposing predispositions.

NJV-B is in the latter category. As a result, I give it a likelihood ratio of, oh, 1.001 in support of the asymmetry thesis.

Some references 



Let's keep discussing M Turk sample validity

Lots of great discussion on “Fooled twice, shame on who?,” part 2 of the 2-part set on validity of M Turk samples for study of individual differences in cognition rooted in ideological, cultural & like dispositions.  Indeed, some of the discussion appears over at Stats Legend Andrew Gelman’s Legendary Statistical Modeling, Causal Inference, and Social Science blog.

The comments make for more interesting reading than anything I would have to say today, and maybe others will want to add to them.

But here are some of the interesting points that have come up & that have furnished me w/ reason to reflect on whether & how what I had to say warrants refinement, qualification, revision etc:

1. Contextualization

I wanted to be clear that the sort of “sample validity” issue I was raising about M Turk was specific to study of a particular class of psychological dynamics—the ones that I myself am most interested in—involving the interaction of critical thinking dispositions and the sort of group commitments that typically are assessed with measures of ideology, cultural worldviews & the like. That was why I broke my discussion into two posts, the first of which stressed that “sample validity” shouldn’t be identified with some checklist of abstract properties like “representativeness” but instead addressed in a fine-grained manner aimed at determining whether subjects selected in a particular fashion support reliable and valid inferences about the psychological dynamics being investigated.

But I’m convinced I didn’t do a good enough job on this. 

Part of what made me realize that was a comment by Hal Pashler in the discussion at Statistical Modeling, Causal Inference. Pashler argued convincingly that researchers had through careful testing confirmed the validity of M Turk samples for a range of fundamental cognitive dynamics (primarily ones involving rapid, automatic processing of visual stimuli).

I fully accept this and agree with the overall thrust of Pashler's comment! But the need for him to make it (in part in response to the course of the discussion at the SMCI blog) was proof to me that I had failed—in part by having neglected to identify dynamics that differ in relevant respects from the one I was focusing on (again, the influence of group values in assessment of evidence on societal risks & related policy-relevant facts) & that as a result might well admit of valid study w/ M Turk samples.

So: avoid generalization; determine “sample validity” by focusing on the particular issues relevant to determining whether reliable, valid inferences can be drawn from any given sample about the psychological dynamic under investigation; and recognize, then, that M Turk samples might be “valid” for some purposes and not others.  Check!

2. Validation of “partisan typicality”

One of the main reasons I don’t regard M Turk samples as valid for studying individual differences in cognition related to ideology is that I think there is reason to believe the self-described “conservatives” who are participating in M Turk samples are not typical of self-described conservatives in the general population.

Solomon Messing convincingly pointed out that the way to address this is to compare how MT subjects respond to questions with how subjects in familiar samples, such as those in American National Election Studies surveys, respond—and he cited studies that do exactly that (here & here).

He’s right; I’m eager to read those papers.

Jarret Crawford amplified this point, referring to studies he’s done (here & here; I have read those; they are excellent & reflect ingenious designs; I’ve been meaning to run a blog post on them!) that furnish evidence of the “symmetry” of motivated reasoning in conservatives & liberals, a convergence with non-MT sample studies that ought to give us more confidence in MT samples (provided, of course, the designs of the studies are valid).

I have a hunch that the Messing & Crawford responses demonstrate that even in assessing the validity of M Turk for studying public opinion & political partisanship, one needs to be very precise about the fit between MT samples and the kinds of hypotheses being tested.  But in any case, they show I need to think more.  Good.

3. “Fixing” M Turk

Messing also discusses the possibility that the defects in M Turk samples might be “fixed” with some appropriate types of protocols, a matter that Chandler, Mueller & Paolacci address in their new study.

This is indeed a point that merits further discussion.  As I suggested in some of my own responses, I think what CMP suggest needs to be done actually can’t be feasibly expected to happen. 

In effect, to avoid the “repeat exposure” of MT subjects to cognitive-performance measures, there would have to be a “central registry” that would keep track of all the ID numbers of MT “workers” who have participated in social science studies and the measures that have been administered to them.

Who is going to set up this registry? Who will administer it? How will compliance of researchers with the registry be monitored and enforced?

Don't look at Amazon! It’s not a survey firm & couldn’t care less about whether MT workers furnish a valid source of subjects for social science research or, if they do at t1, about making sure they continue to at t2, t3, . . . tn.

Even if we started the registry today, moreover, we still wouldn't know whether the "newly registered" M Turk subjects hadn't participated already in studies featuring CRT and other common measures.

And what do we do now, as we wait for such a registry to be created? Should researchers be continuing to use M Turk for studies featuring measures the validity of which is compromised by prior exposure? And should journals be continuing to accept such studies?

* * * *

So still plenty more to discuss! Add your own thoughts (in the discussion thread following the “Fooled Twice” post)!


Fooled twice, shame on who? Problems with Mechanical Turk study samples, part 2

From Mueller, Chandler, & Paolacci, Soc'y for P&SP, 1/28/12

This is the second post in a two-part series on what I see as the invalidity of studies that use samples of Mechanical Turk workers to test hypotheses about cognition and political conflict over societal risks and other policy-relevant facts.

In the first, I discussed the concept of a “valid sample” generally.  Basically, I argued that it’s a mistake to equate sample “validity” with any uniform standard or any single, invariant set of recruitment or stratification procedures.

Rather, the validity of the sample depends on one thing only: whether it supports valid and reliable inferences about the nature of the psychological processes under investigation.

College student samples are fine, e.g., if the dynamic being studied is reasonably understood to be uniform for all people.

A nonstratified general population sample will be perfectly okay for studying processes that vary among people of different characteristics so long as (1) there are enough individuals from subpopulations whose members differ in the relevant respect and (2) the recruitment procedure didn’t involve methods that might have either discouraged participation by typical members of those groups or unduly encouraged participation by atypical ones.

Indeed, a sample constructed by methods of recruitment and stratification designed to assure “national representativeness” might not be valid (or at least not support valid inferences) if the dynamic being studied varies across subgroups whose members aren’t represented in sufficient number to enable testing of hypotheses relating specifically to them.


Now I will explain why, on the basis of this pragmatic understanding of what sample validity consists in, MT samples aren’t valid for the study of culturally or ideologically grounded forms of “motivated reasoning” and like dynamics that it is reasonable to believe account for polarization over climate change, gun control, nuclear power, and other facts that admit of empirical study.

I don’t want to keep anybody in suspense (or make it necessary for busy people to deal with more background than they think they need or might already know), so I’ll just start by listing what I see as the three decisive “sample validity” problems here. I’ll then supply a bit more background—including a discussion of what Mechanical Turk is all about, and a review of how this service has been used by social scientists—before returning to the three validity issues, which I’ll then spell out in greater detail.

Ready? Here are the three problems:

1.  Selection bias.  Given the types of tasks performed by MT workers, there’s good reason to suspect subjects recruited via MT differ in material ways from the people in the world whose dispositions we are interested in measuring, particularly conservative males.

2.  Prior, repeated exposure to study measures.  Many MT workers have participated multiple times in studies that use performance-based measures of cognition and have discussed among themselves what the answers are. Their scores are thus not valid.

3.  MT subjects misrepresent their nationality.  Some fraction of the MT work force participating in studies that are limited to “U.S. residents only” aren't in fact U.S. residents, thereby defeating inferences about how psychological dynamics distinctive of U.S. citizens of diverse ideologies operate. 

That’s the short answer. Now some more detail.

A. What is MT? To start, let’s briefly review what Mechanical Turk is—and thus who the subjects in studies that use MT samples are.

Operated by Amazon, MT is essentially an on-line labor market.  Employers, known as “requesters,” post solicitations for paid work, which can be accepted by “workers” using their own computers.

Pay is very modest: it is estimated that MT workers make about $1.50/hr.

The tasks they perform are varied: transcription, data entry, research, etc.

But MT is also a well-known instrument for engaging in on-line fraud.

MT workers get paid for writing fake product or service reviews—sometimes positive, sometimes negative, as the requester directs.

They can also garner a tiny wage for simply “clicking” on specified links in order to generate bogus web traffic at the behest of “requesters” who themselves have contracted to direct visitors to legitimate websites, who are in this case the victims of the scam.

These kinds of activities are contrary to the “terms of use” for MT, but that doesn’t restrain either “requesters” from soliciting “workers” or “workers” from agreeing to engage in them.

Another common MT labor assignment—one not contrary to MT rules—is the indexing of sex acts performed in internet pornography.

MT Requester solicitation for porn indexing, July 10, 2013

B. The advent of MT “study samples.” A lot of MT workers take part in social science studies.  Indeed, many workers take part in many, many such studies.

The appeal of using MT workers in one’s study is pretty obvious. They offer a researcher a cheap, abundant supply of eager subjects.  In addition, for studies that examine dynamics that are likely to vary across different subpopulations, the workers offer the prospect of the sort of diversity of characteristics one won’t find, say, in a sample of college students.

A while back, researchers from a variety of social science disciplines published studies aimed at “validating” MT samples for research that requires use of diverse subjects drawn from the general population of the U.S. Encouragingly, these studies reported that MT samples appeared reasonably “representative” of the general population and performed comparably to how one would expect members of the general public to perform.

On this basis, the floodgates opened, and journals of all types—including elite ones—began to publish studies based on MT samples.

To be honest, I find the rapidity of the decision of these journals to embrace MT samples mystifying.  

Even taking the initial studies purporting to find MT samples “representative” at face value, the fact remains that Amazon is not in the business of supplying valid social science research samples.  It is in the business (in this setting) of brokering on-line labor contracts. To satisfy the booming demand for such services, it is constantly enrolling new “workers.”  As it enlarges its MT workforce, Amazon does nothing—zip—to assure that the characteristics of its “workers” won’t change in ways that make them unsuited for social science research.

In any case, the original papers—which reflect data that are now several years old—certainly can’t be viewed as conferring a “lifetime” certification of validity on MT samples.  If journals care about sample validity, they need to insist on up-to-date evidence that MT samples support valid inferences relating to the matters under investigation.

The most recently collected evidence—in particular Chandler, Mueller, Paolacci (in press) [actually, now published!] & Shapiro, Chandler & Mueller (2013)—doesn’t justify that conclusion.  On the contrary, it shows very convincingly that MT samples are invalid, at least for studies of individual differences in cognition and their effect on political conflict in the U.S.

C.  Three major defects of MT samples for the study of culturally/ideologically motivated reasoning

1.  Selection bias

Whatever might have been true in 2010,  it is clear that the MT workforce today is not a picture of America.

MT workers are “diverse,” but are variously over- and under-representative of lots of groups.

Like men: researchers can end up with a sample that is 62% female.

African Americans are also substantially under-represented: 5% rather than the 12% they make up in the general population.

There are other differences too, but the one that is of most concern to me—because the question I’m trying to answer is whether MT samples are valid for the study of cultural cognition and like forms of ideologically motivated reasoning—is that MT grossly underrepresents individuals who identify themselves as “conservatives.”

This is clear in the frequencies that researchers relying on MT samples report. In Pennycook et al. (2012),  e.g., 53% of the subjects in their sample self-identified as liberal and 25% identified as conservative.  Stratified national surveys (from the same time as this study) suggest that approximately 20% of the general population self-identifies as liberal and 40% as conservative.

In addition to how they “identify” themselves, MT worker samples don’t behave like ones consisting of ordinary U.S. conservatives (a point that will take on more significance when I return to their falsification of their nationality).  In a 2012 Election Day survey, Richey & Taylor (2012) report that “73% of these MTurk workers voted for Obama, 15% for Romney, and 12% for ‘Other’ ” (this assumes we can believe they were eligible to vote in the U.S. & did; I’ll get to this).

But the reason to worry about the underrepresentation of conservatives in MT samples is not simply that the samples are ideologically “unrepresentative” of the general population.  If that were the only issue, one could simply oversample conservatives when doing MT studies (as I’ve seen at least some authors do).
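Reweighting works the same way as oversampling at the level of marginal proportions: each respondent gets a weight equal to his or her group’s population share divided by its sample share. A minimal sketch, using the rough percentages quoted above as purely illustrative inputs (the variable names are mine):

```python
# Post-stratification: weight each respondent by (population share / sample share)
# for his or her ideological group. The shares below are the rough figures
# quoted in this post and are illustrative only.
population_share = {"liberal": 0.20, "moderate": 0.40, "conservative": 0.40}
sample_share     = {"liberal": 0.53, "moderate": 0.22, "conservative": 0.25}

weights = {g: population_share[g] / sample_share[g] for g in population_share}

# A conservative respondent counts for more than one person, a liberal for less.
assert weights["conservative"] > 1 > weights["liberal"]
```

But weights, like oversampling, fix only the marginal proportions; they do nothing about *which* conservatives selected into the sample in the first place.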

The problem is what the underrepresentation of conservatives implies about the selection of individuals into the MT worker “sample.” There’s  something about being part of the MT workforce, obviously, that is making it less appealing to conservatives.

Maybe conservatives are more affluent and don’t want to work for $1.50/hr.

Or maybe they are more likely to have qualms about writing fake product reviews or watching hours of porn and indexing various sex acts. After all,  Jonathan Haidt & others have found that conservatives have more acute  disgust sensibilities than liberals.

But in any case, since we know that conservatives by and large are reluctant to join the MT workforce, we also can infer there is something different about the conservatives who do sign up from the ones who don’t.

What's different about them, moreover, might well be causing them to respond differently in studies from how ordinary conservatives in the U.S. population would.  Something must be, if we consider how many of them claim to have voted for Obama or a third-party candidate in the 2012 election!

If they are less partisan, then, they might not demonstrate as strong a motivated reasoning effect as ordinary conservatives would.

Alternatively, their decision to join the MT workforce might mean they are less reflective than ordinary conservatives and are thus failing to ponder the incongruity between indexing porn, say, and their political values.

For all these reasons, if one is interested in learning about how dispositions to engage in systematic information  processing are affected by ideology, one just can’t be sure that what we see in “MT conservatives” will generalize to the real-world population of conservatives.

I’ve seen one study based on an MT sample that reports a negative correlation between “conservatism” and scores on the Cognitive Reflection Test, the premier measure of the disposition to engage in conscious, effortful assessment of evidence—slow, “System 2” in Kahneman’s terms—as opposed to the rapid, heuristic-driven, error-prone, evidence-neglectful sort (“System 1”).

That was the study based on the particular MT sample I mentioned as grossly overrepresenting liberals and underrepresenting conservatives.

I’ve collected data on CRT and ideology in multiple general population surveys—ones that were designed to and did generate nationally representative panels by using recruitment and stratification methods validated by the accuracy of surveys using them to predict national election results. I consistently find no correlation between ideology and CRT.

In short, the nature of the MT workforce—what it does, how it is assembled, and what it ends up generating—makes me worry that the underrepresentation of conservatives reflects a form of selection bias relative to the sort of individual differences in cognition that I’m trying to measure.

That risk is too big for me to accept in my own research, and even if it weren't, I'd expect it to be too big for many consumers of my work to accept were they made aware of the problem I'm identifying. 

BTW, the only other study I’ve ever seen that reports a negative correlation between conservatism and CRT also had serious selection bias issues.  That study used subjects enticed to participate in an experiment at an internet site targeted to members of the public interested in moral psychology. As an incentive to participate, researchers promised to tell the subjects what their study results indicated about their cognitive style.

One might think that such a site, and such an incentive, would appeal only to highly reflective people, and indeed the mean CRT scores reported for study participants (liberals, conservatives, and libertarians) rivaled or exceeded the ones attained by students at elite universities and were (for all ideological groups) much higher than those typically attained by members of the general public.   As a colleague put it, purporting to infer how different subgroups will score on the CRT from such a sample is the equivalent of a researcher reporting that “women like football as much as men” based on a sample of visitors to!

2. Pre- & multiple-exposure to cognitive performance measures

Again, Amazon isn’t in the business of furnishing valid study samples.  One of the things that firms that are in that business do is keep track of what studies the subjects they recruit have participated in, so that researchers won’t be testing people repeatedly with measures that don’t generate reliable results in subjects who’ve already been exposed to them.

The Cognitive Reflection Test fits that description.  It involves three questions, each of which seems to have an obvious answer that is in fact wrong; people disposed to search for and reflect on evidence that contradicts their intuitions are more likely to get those answers right.
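For concreteness, the three items are Frederick’s bat-and-ball, widget, and lily-pad questions, each pairing a correct answer with an intuitive “lure.” A scoring sketch (the dictionary layout and function name are my own conventions):

```python
# The three CRT items (Frederick 2005): each pairs a reflective correct answer
# with an intuitive "lure" that feels obviously right but is wrong.
CRT = {
    "bat_and_ball_cents": {"correct": 5,  "lure": 10},   # ball costs 5 cents, not 10
    "widget_minutes":     {"correct": 5,  "lure": 100},  # 100 machines still take 5 minutes
    "lily_pad_days":      {"correct": 47, "lure": 24},   # half the lake on day 47, not 24
}

def crt_score(responses):
    """responses: dict mapping item name -> numeric answer. Returns 0-3."""
    return sum(responses[item] == CRT[item]["correct"] for item in CRT)

# A purely intuitive responder, answering every lure, scores 0.
intuitive = {item: v["lure"] for item, v in CRT.items()}
assert crt_score(intuitive) == 0
```

The whole point of the measure is that the lures are magnetic on a first encounter, which is exactly what repeated exposure destroys.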

But even the most unreflective, visceral thinker is likely to figure out the answers eventually, if he or she sees the questions over & over. 

That’s what happens on MT.  Subjects are repeatedly recruited to participate in studies on cognition that use the CRT and similar tests of cognitive style.

What’s more, they talk about the answers to such tests with each other.  MT workers have on-line “hangouts” where they share tips and experiences.  One of the things they like to talk about is the answers to the CRT.  Another is why researchers keep administering an “intelligence test” (that’s how they interpret the CRT, not unreasonably) whose answers the workers plainly already know.

These facts have been documented by Chandler, Mueller, and Paolacci in an article in press [now out--hurry & get yours before news stand sells out!] in Behavior Research Methods.

Not surprisingly, MT workers achieve highly unrealistic scores on the CRT, ones comparable to those recorded among students at elite universities and far above those typically reported for general population samples.

Other standard measures relating to moral reasoning style--like the famous "trolley problem"--also get administered to and answered by the same MT subjects over & over, and discussed by them in chat forums.  I'm guessing that's none too good for the reliability/validity of responses to those measures either.

As Chandler, Mueller, Paolacci note, 

There exists a sub-population of extremely productive workers which is disproportionately likely to appear in research studies. As a result, knowledge of some popular experimental designs has saturated the population of those who quickly respond to research HITs; further, workers who read discussion blogs pay attention to requester reputation and follow the HITs of favored requesters, leading individual researchers to collect fans who will undoubtedly become familiar with their specific research topics.

There’s nothing that an individual researcher can effectively do to counteract this problem.  He or she can’t ask Amazon for help: again, it isn’t a survey firm and doesn’t give a shit whether its workforce is fit for participation in social science studies.

The researcher can, of course, ask prospective MT “subjects” to certify that they haven’t seen the CRT questions previously.  But there is a high probability that the workers—who know that their eligibility to participate as a paid study subject requires such certification—will lie.

MT workers have unique id numbers.  Researchers have told me that they have seen plenty of MT workers who say they haven’t taken the CRT before but who in fact have—in those researchers’ own studies.  In such cases, they simply remove the untruthful subject from their dataset.

But these and other researchers have no way to know how many of the workers they’ve never themselves tested before are lying too when they claim to be one of the shrinking number of MT workers who have never been exposed to the CRT. 
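The within-lab half of that screen is at least automatable: worker IDs are stable, so prior participants in one’s own studies can be dropped mechanically. A minimal sketch (the field names are hypothetical):

```python
# Drop respondents whose (stable) MT worker ID appears in our own prior
# studies -- the only form of repeat exposure an individual lab can detect.
def screen_repeat_workers(new_data, prior_worker_ids):
    """new_data: list of dicts, each with a 'worker_id' key (hypothetical field)."""
    prior = set(prior_worker_ids)
    kept    = [r for r in new_data if r["worker_id"] not in prior]
    dropped = [r for r in new_data if r["worker_id"] in prior]
    return kept, dropped

new_data = [{"worker_id": "A1", "crt": 3}, {"worker_id": "B2", "crt": 1}]
kept, dropped = screen_repeat_workers(new_data, ["A1"])
assert [r["worker_id"] for r in kept] == ["B2"]
```

The obvious limit: this catches only repeats within a single lab’s own data, not exposure anywhere else on MT.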

So researchers who collect data on performance-based cognition measures from MT workers really have no way to be sure  that these very high-scoring subjects are genuinely super reflective or just super dishonest.

I sure wouldn’t take a risk like this in my own research.  And I’m also not inclined to take the risk of being misled by relying on studies by researchers who have disregarded it in reporting how scores on CRT or other cognitive performance measures relate to ideology (or religion or any other individual difference of interest). 

3. Misrepresentation of nationality (I know who these guys are; but who are MT workers? I mean—really?)

Last but by no means least: Studies based on MT samples don’t support valid inferences about the interaction of ideology and cognition in polarizing U.S. policy debates because it’s clear that some fraction of the MT subjects who claim to be from the U.S. when they contract to participate in a study aren’t really from the United States.

This is a finding from Shapiro, Chandler and Mueller (2013), who in a survey determined that a “substantial” proportion of the MT workers who are “hired” for studies with “US only” eligibility are in fact participating in them via foreign internet-service providers.  

I also know of cases in which researchers have detected MT subjects using Indian IP addresses participating in their "US only" studies. 

Amazon requires MT workers to register their nationality when joining the MT labor force. But because MT workers recognize that some “requesters” attach “US worker only” eligibility criteria to their labor requests, MT workers from other countries—primarily India, the second largest source of MT labor outside the U.S.—have an incentive to misrepresent their nationality. 

I'm not sure how easy this is to pull off since Amazon now requires US citizens to supply Social Security numbers and non-US citizens who reside in the US to supply comparable information relevant to tax collection.

But it clearly isn't impossible for determined, internet-savvy and less-than-honest people to do. 

Part of pulling off the impersonation of a US resident involves signing up for MT through an account at a firm that uses a VPN to issue US IP addresses to internet users outside the U.S.  Indeed, aspiring non-US MT workers have an even bigger incentive to do that now because Amazon, in response to fraudulent use of its services, no longer enrolls new non-US workers into the MT labor force.

Shapiro, Chandler & Mueller recommend checking the IP addresses of subjects in “US only” studies and removing from the sample those whose IP addresses show they participated from India or another country.

But this is not a very satisfying suggestion.  Just as MT workers can use a VPN to misrepresent themselves as U.S.-residents when they initially enroll in MT, so they can use a VPN to disguise the location from which they are participating in U.S.-only studies. 
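For what the IP screen is worth, it is trivial to implement. A sketch, in which `ip_country` stands in for a real IP-geolocation lookup (stubbed here with a table, since any real service is an external dependency):

```python
# Flag "US only" participants whose connection geolocates outside the US.
# ip_country() is a stand-in for a real geolocation lookup; note that any
# such check is defeated by a VPN that issues US IP addresses.
def ip_country(ip, lookup_table):
    return lookup_table.get(ip, "unknown")

def flag_non_us(subjects, lookup_table):
    """subjects: list of dicts with an 'ip' key (hypothetical field name)."""
    return [s for s in subjects
            if ip_country(s["ip"], lookup_table) not in ("US", "unknown")]

lookup = {"203.0.113.9": "IN", "198.51.100.7": "US"}  # documentation-range IPs
subjects = [{"id": 1, "ip": "203.0.113.9"}, {"id": 2, "ip": "198.51.100.7"}]
assert [s["id"] for s in flag_non_us(subjects, lookup)] == [1]
```

The screen catches only the careless cases, which is why it is not a very satisfying suggestion.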

Why wouldn’t they? If they didn’t lie, they might not be eligible to “work” as study subjects--or to work at all, if they signed up after Amazon stopped enrolling non-US workers. 

True, lying is dishonest.  But so are a great many of the things that MT workers routinely do for paying MT requesters.

Charmingly, Shapiro, Chandler and Mueller (2013) also found that MT subjects, who are notorious for performing MT tasks at the office when they are supposed to be working, score high on a standard measure of the disposition to engage in “malingering.”

That’s a finding I have complete confidence in. Remember, samples that are not “valid” for studying certain types of dynamics can still be perfectly valid for studying others.

* * * *

The name for Amazon’s “Mechanical Turk” service comes from a historical episode in the late 18th century in which a con artist duped amazed members of the public into paying him a small fee for the chance to play chess against “the Turk”—a large, turban-wearing, pipe-smoking manikin who appeared to be spontaneously moving his own pieces with his mechanized arm and hand.

The profitable ruse went on for decades, until finally, in the 1820s, it was discovered that the “Turk” was being operated by a human chess player hidden underneath its boxy chassis.

Today social scientists are lining up to pay a small fee—precisely because it is so much smaller than what it costs to recruit a valid general population sample—to collect data on Amazon’s “Mechanical Turk.”

But if the prying open of the box reveals that the subjects performing the truly astonishing feats of cognition being observed in these researchers’ studies are “malingering” college students in Mumbai posing as  “U.S. Democrats” and “Republicans” in between jobs writing bogus product reviews and cataloging sex acts in on-line porn clips, I suspect these researchers will feel more foolish than anyone who paid to play chess with the original “Turk.”

Some references

Berinsky, A. J., Huber, G. A., & Lenz, G. S. (2011). Using Mechanical Turk as a subject recruitment tool for experimental research. Political Analysis, 20(3), 351-368. 

Chandler, J., Mueller, P., & Paolacci, G. (in press). Methodological Concerns and Advanced Uses of Crowdsourcing in Psychological Research. Behavior Research Methods.

Experimental Turk: a blog on social science experiments on Amazon Mechanical Turk

Mueller, Chandler & Paolacci, Advanced uses of Mechanical Turk in psychological research, presentation at Society for Personality & Social Psychology, Jan. 28, 2012.

Pennycook, G., Cheyne, J. A., Seli, P., Koehler, D. J., & Fugelsang, J. A. (2012). Analytic cognitive style predicts religious and paranormal belief. [doi: 10.1016/j.cognition.2012.03.003]. Cognition, 123(3), 335-346.

Richey, S., & Taylor, B. (2012). How Representative Are Amazon Mechanical Turk Workers? The Monkey Cage.

Shapiro, D. N., Chandler, J., & Mueller, P. A. (2013). Using Mechanical Turk to Study Clinical Populations. Clinical Psychological Science. doi: 10.1177/2167702612469015



What's a "valid" sample? Problems with Mechanical Turk study samples, part 1

It’s commonplace nowadays to see published psychology studies based on samples consisting of “workers” hired to participate in them via Amazon’s “Mechanical Turk,” a proprietary system that enables Amazon to collect a fee for brokering on-line employment relationships.

I’ve been trying to figure out for a while now what I think about this practice.

After considerable reading and thinking, I’ve concluded that “MT” samples are in fact a horribly defective basis for the study of the dynamics I myself am primarily interested in—namely, ones relating to how differences in group commitments interact with the cognitive processes that generate cultural or political polarization over societal risks and other facts that admit of scientific study.

I’m going to explain why, in two posts.  To lay the groundwork for my assessment of the flaws in MT samples, this post will set out a very basic account of how to think about the “validity” of psychology samples generally.

Sometimes people hold forth on this as if sample validity were some disembodied essence that could be identified and assessed independently of the purpose of conducting a study. They say things like, “That study isn’t any good—it’s based on college students!” or make complex mathematics-pervaded arguments about “probability based stratification” of general population samples and so forth.

The reason to make empirical observations is to generate evidence that gives us more or less reason than we otherwise would have had to believe some proposition or set of propositions (the ones featured in the study hypotheses) about how the world works.

The validity of a study sample, then, depends entirely on whether it can support inferences of that sort. 

Imagine someone is studying some mental operation that he or she has reason to think is common to all people everywhere—say, “perceptual continuity,” which involves the sort of virtual, expectation-based processing of sensory stimuli that makes people shockingly oblivious to what seem like shockingly obvious but unexpected phenomena, like the sudden appearance of a gorilla among a group of basketball players or the sudden substitution of one person for another during a conversation between strangers.

Again, on the researcher's best understanding of the mechanisms involved, everyone everywhere is subject to this sort of effect, which reflects processes that are in effect “hard wired” and invariant.  If that’s so, then pretty much any group of people—so long as they haven’t suffered some sort of trauma that might change the operation of the relevant mental processes—will do.

So if a researcher wants to test whether a particular intervention—like telling people about this phenomenon—will help to counteract it, he or she can go ahead and test it on any group of normal people that researcher happens to have ready access to—like college undergraduates.

But now imagine that one is studying a phenomenon that one has good reason to believe will generate systematic differences among individuals identified with reference to certain specific characteristics. 

That’s true of “cultural cognition” and like forms of motivated reasoning that figure in the tendency of people to fit their assessments of information—from scientific “data” to expository arguments to the positions of putative experts to (again!) their own sense impressions—to positions on risk and like facts that dominate among members of their group.

Because the phenomenon involves individual differences, a sample that doesn’t contain the sorts of individuals who differ in the relevant respects won’t support reliable inferences.

E.g., there’s a decent amount of evidence that white males with hierarchic and individualistic values (or with “conservative” political orientations; cultural values and measures of political ideology or party affiliation are merely alternative indicators of the same latent disposition, although I happen to think cultural worldviews tend to work better) are motivated to be highly skeptical of environmental and technological risks. Such risk claims, this work suggests, are psychically threatening to such individuals, because their status and authority in society tends to be bound up with commercial and industrial activities that are being identified as dangerous, and worthy of regulation.

If one wants to investigate how a particular way of “framing” information might dissipate dismissiveness and promote open-minded engagement with evidence on climate change, then it makes no sense to test such a hypothesis on, say, predominantly female undergraduates attending a liberal east-coast university.  How they respond to the messages in question won’t generate reliable inferences about how white, hierarchical individualistic males will—and they are the group in the world that we have reason to believe is reacting in the most dismissive way to scientific evidence on climate change.

Obviously, this account of “sample validity” depends on one being right when one thinks one has “good reason to know” that the dynamics of interest are uniform across people or vary in specific ways across subpopulations of them.

But there’s no getting around that! If one uses a “representative” general population sample to study a phenomenon that in fact varies systematically across subpopulations, then the inferences one draws will also be faulty, unless one both tests for such individual differences and assures that the sample contains a sufficiently large number of the subpopulation members to enable detection of such effects. Indeed, the way to assure that there are enough members of the subpopulations--particularly if one of them is small, like, say, a racial minority--is to oversample, generating a nonrepresentative sample!
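The arithmetic behind that last point: to expect, say, 200 members of a subgroup that makes up 12% of the population, a simple random sample has to be quite large, which is why recruiting the subgroup directly (oversampling) is so much cheaper. A back-of-the-envelope sketch (the numbers are illustrative):

```python
import math

# Total simple-random-sample size needed to expect n_sub members of a
# subgroup that comprises `share` of the population.
def required_total_n(n_sub, share):
    return math.ceil(n_sub / share)

# To expect 200 respondents from a ~12% subgroup (e.g., African Americans),
# a simple random sample must be drawn with N in the thousands:
assert required_total_n(200, 0.12) == 1667
```

Oversampling sidesteps that cost, at the price of a sample that is no longer "representative" in the naive sense.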

The point is that the validity of a sample depends on its suitability for the inferences to be drawn about the dynamics in question.  That feature of a sample can’t be determined in the abstract, according to any set of mechanical criteria.  Rather it has to be assessed in a case-specific way, with the exercise of judgment. 

And like anything of that sort—or just anything that one investigates empirically—the conclusions one reaches will need to be treated as provisional insofar as later on someone might come along and show that the dynamics in question involved some feature that evaded detection with one’s sample, and thus undermines the inferences one drew.  Hey, that's just the way science works!

Maybe on this account Mechanical Turk samples are “valid” for studying some things.   

But I’m convinced they are not valid for the study of how cultural or ideological commitments influence motivated cognition: because of several problematic features of such samples, one cannot reliably infer from studies based on them how this dynamic will operate in the real world.

I’ll identify those problematic features of MT samples in part two of this series.