We need a CRT 2.0! And IRT should be used to develop it

I really really really like the Cognitive Reflection Test--or "CRT" (Frederick 2005).

The CRT is a compact three-item assessment of the disposition to rely on conscious, effortful, "System 2" reasoning as opposed to rapid, heuristic-driven "System 1" reasoning. An objective or performance-based measure, CRT has been shown to be vastly superior to self-report measures like "need for cognition" ("agree or disagree-- 'thinking is not my idea of fun'; 'The notion of thinking abstractly is appealing to me' . . .") in predicting vulnerability to the various biases that reflect over-reliance on System 1 information processing (Toplak, West & Stanovich 2011).

As far as I’m concerned, Shane Frederick deserves a Nobel Prize in economics for inventing this measure every bit as much as Daniel Kahneman deserved his for systematizing knowledge of the sorts of reasoning deficits that CRT predicts.

Nevertheless, CRT is just not as useful for the study of cognition as it ought to be. 

The problem is not that the correct answers to its three items are too likely to be known at this point by M Turk workers—whose scores exceed those of MIT undergraduates (Chandler, Mueller & Paolacci 2014).

Rather, the problem is that CRT is just too darn hard when used to study legitimate study subjects.

[Figure: what the CRT score distribution looks like when the test is administered to normal people (i.e., not M Turk workers, Ivy League college students, people who fill out surveys at on-line sites that solicit study subjects who want to learn their CRT scores, etc.)]

The mean score when it is administered to a general population sample is about 0.65 correct responses (Kahan 2013; Weller, Dieckmann, Tusler, Mertz, Burns & Peters 2012; Campitelli & Labollita, 2010).

The median score is 0.

Accordingly, if we want to study how individual differences in System 1 vs. System 2 reasoning styles  interact with other dynamics—like motivated reasoning—or respond to interventions designed to improve engagement with technical information, then for half the population CRT necessarily gives us zero information.

Unless one makes the exceedingly implausible assumption that there's no variance to measure among this huge swath of people, this is a severe limitation on the value of the measure.
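To make the floor problem concrete, here is a minimal sketch that assumes a two-parameter logistic (2PL) response model and a standard-normal latent disposition. The three item parameters are hypothetical values chosen only to mimic a short, hard test; they are not estimates from any CRT dataset. The sketch computes the population distribution of the sum score by numerical integration.

```python
import math

def p_correct(theta, a, b):
    """2PL probability of a correct answer at latent level theta."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def score_distribution(items, n_grid=2001, lo=-6.0, hi=6.0):
    """Population distribution of the sum score, integrating over theta ~ N(0, 1)."""
    dist = [0.0] * (len(items) + 1)
    step = (hi - lo) / (n_grid - 1)
    for k in range(n_grid):
        theta = lo + k * step
        weight = math.exp(-0.5 * theta * theta) / math.sqrt(2 * math.pi) * step
        # Sum-score distribution at this theta, built up one item at a time.
        probs = [1.0]
        for a, b in items:
            p = p_correct(theta, a, b)
            nxt = [0.0] * (len(probs) + 1)
            for s, q in enumerate(probs):
                nxt[s] += q * (1.0 - p)
                nxt[s + 1] += q * p
            probs = nxt
        for s, q in enumerate(probs):
            dist[s] += weight * q
    total = sum(dist)
    return [d / total for d in dist]

# Hypothetical (discrimination a, difficulty b) pairs: three hard items.
hard_items = [(1.5, 1.0), (1.5, 1.3), (1.5, 1.6)]

dist = score_distribution(hard_items)
mean_score = sum(s * p for s, p in enumerate(dist))
share_at_zero = dist[0]

print("P(score = s):", [round(p, 3) for p in dist])
print("mean score:", round(mean_score, 2))
print("share scoring 0:", round(share_at_zero, 2))
```

Under these assumed parameters more than half of the simulated population lands on a score of 0, so the median is 0 and everyone in that group is indistinguishable on the measure, whatever their actual differences in the underlying disposition.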

I've addressed this previously on this blog but I had occasion to underscore and elaborate on this point recently in correspondence with a friend who does outstanding work in the study of cognition and who (with good reason) is a big fan of CRT.

Here are some of the points I made:

I don’t doubt that CRT measures the disposition to use System 2 information processing more faithfully than, say, Numeracy [a scale that assesses proficiency in quantitative reasoning]. 

But the fact remains that Numeracy outperforms CRT in predicting exactly what CRT is supposed to predict—namely vulnerability to heuristic biases (Weller et al. 2012; Liberali 2012). Numeracy is getting a bigger piece of the latent disposition that CRT measures—and that's strong evidence of the need for a better CRT.

Or consider the Ordinary Science Intelligence assessment, “OSI_2.0,” the most recent version of a scale I've been working on to measure a disposition to recognize and give appropriate effect to scientific information relevant to ordinary, everyday decisions (Kahan 2014).  

Cognitive reflection is among the combination of reasoning proficiencies that this (unidimensional) disposition comprises.

But for sure, I didn't construct OSI_2.0 to be "CRT_2.0."  I created it to help me & others do a better job in investigating how to assess the relationship between science comprehension and dynamics that constrain the effectiveness of public science communication.

With Item Response Theory, one can assess scale reliability continuously along the range of the underlying latent disposition (DeMars 2010).  Doing so for OSI_2.0, it can be seen that what CRT contributes to OSI_2.0’s measurement precision is concentrated at the very upper end of the range of the "ordinary science intelligence" aptitude:


This feature of CRT can be shown to make CRT less effective at what it is supposed to do—viz., predict individual differences in the disposition to resist over-reliance on heuristic processing.

The covariance problem is considered diagnostic of that sort of disposition (Stanovich 2009, 2011). Those vulnerable to over-reliance on heuristic processing tend to make snap judgments based on the relative magnitudes of the numbers in “cell A” and either “cell B” or “cell C” in a 2x2 contingency table or equivalent. Because they don't go to the trouble of comparing the ratio of A to B with the ratio of C to D, people draw faulty inferences about the significance of the information presented (Arkes & Harkness 1983).
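A small sketch of the two strategies, using made-up cell counts. The table layout is an assumption for illustration: A = treated and improved, B = treated and worsened, C = untreated and improved, D = untreated and worsened. The "heuristic" function is a stylized stand-in for the snap judgment described above, not a validated model of it.

```python
def heuristic_says_effective(a, b, c, d):
    """Stylized snap judgment: cell A is simply bigger than cells B and C."""
    return a > b and a > c

def ratio_says_effective(a, b, c, d):
    """Correct comparison: is A:B larger than C:D?

    Cross-multiplying (a * d > c * b) avoids dividing by zero
    and is equivalent to a / b > c / d for positive counts.
    """
    return a * d > c * b

# Hypothetical counts: many treated patients improved (A = 200),
# but the improved:worsened ratio is better among the untreated.
a, b, c, d = 200, 75, 100, 20

print("A:B =", round(a / b, 2), " C:D =", round(c / d, 2))
print("heuristic:", heuristic_says_effective(a, b, c, d))
print("ratio comparison:", ratio_says_effective(a, b, c, d))
```

Here the large number in cell A invites the conclusion that the treatment works, while the ratio comparison shows the untreated group did better.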

As it should, CRT predicts resistance to this bias (Toplak, West & Stanovich 2011).

But not as well as OSI_2.0.


These are scatter plots of performance on the covariance problem (N = 400 or so) in relation to OSI_2.0 & CRT, respectively, w/ lowess regression plots superimposed.

The crook in the profile of the OSI_2.0 plot compared to the flat, boring profile of CRT shows that the former has superior discrimination (that is, identifies in a more fine-grained way how differences in reasoning ability affect the probability of getting the right answer).

Relatedly, the interspersing of the color-coded observations on the OSI_2.0 scatter plot shows how CRT is dividing people into groups that are both under- & over-inclusive w/r/t proficiencies that OSI_2.0 is sorting out more reliably.

Or more concretely still, if I had only CRT, then I'd predict that  there is only a 40% probability that someone who is +1 on OSI_2.0-- just short of "1" on CRT -- would get the covariance problem correct, when in fact the probability such a person will get the right answer is about  60%. 

Similarly, if I used CRT to predict how someone at +1.5 on OSI_2.0 is likely to do on the problem, I'd predict about a 50% probability of him or her selecting the correct response -- when in fact the probability of a correct response for that person is closer to 75%.

Essentially, I'm going to be as satisfied with CRT as I am with OSI_2.0 only if my interest is to predict performance of those who score either 2 or 3 on CRT -- the 90th percentile or above in a general population sample.

But as can be seen from the OSI_2.0 scatter plot, it’s simply not the case that there’s no variance in people’s vulnerability to this particular heuristic bias in the rest of the population.  A measure that can't enable examination of how so substantial a fraction of the population thinks should really disappoint cognitive psychologists, assuming their goal is to study critical reasoning in human beings generally.

Now, it's absolutely no surprise that OSI_2.0 dominates CRT in this regard: the CRT items are all members of the OSI_2.0 scale, which comprises 18 items the covariance structure of which is consistent with measurement of a unidimensional latent disposition.  So of course it is going to be a more discerning measure of whatever it is CRT is itself measuring -- even if OSI_2.0 isn't faithfully measuring only that, as CRT presumably is.

But that’s the point: we need a “better” CRT—one that is as tightly focused as the current version on the construct the scale is supposed to measure but that gets at least as big a piece of the underlying disposition as OSI_2.0, Numeracy or other scales that outperform CRT in predicting resistance to heuristic biases.

For that, "CRT 2.0" is going to need not only more items but items that add information to the scale in the middle and lower levels of the disposition that CRT is assessing.  IRT is much more suited for identifying such items than are the methods that those working on CRT scale development now seem to be employing.

I could certainly understand why a researcher might not want a scale with as many as 18 items. 

But again IRT can help here: use it to develop a longer, comprehensive battery of such items, ones that cover a large portion of the range of the relevant disposition.  Then administer an "adaptive testing" battery that uses strategically selected subsets of items to zero in on any individual test-taker’s location on the range of the measured “cognitive reflection” disposition (DeMars 2010).  Presumably, no one would need to answer more than a half dozen in order to enable a very precise measure of his or her proficiency -- assuming one has a good set of items in the adaptive testing battery.

[Figure: from Mueller, Chandler, & Paolacci, Soc'y for P&SP, 1/28/12]
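The adaptive-testing idea can be sketched in a few lines. This is an illustration under stated assumptions, not a production procedure: it assumes a 2PL model, a hypothetical 17-item bank with evenly spaced difficulties, maximum-information item selection, and an expected a posteriori (EAP) estimate on a grid with a standard-normal prior. The respondent is a deterministic stand-in so the example is reproducible.

```python
import math

def p_correct(theta, a, b):
    """2PL probability of a correct answer at latent level theta."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta, a, b):
    """Fisher information of one 2PL item at theta."""
    p = p_correct(theta, a, b)
    return a * a * p * (1.0 - p)

GRID = [-4.0 + 0.05 * k for k in range(161)]  # theta values from -4 to 4

def eap_estimate(responses):
    """Posterior mean of theta given (a, b, correct) triples and a N(0, 1) prior."""
    post = []
    for theta in GRID:
        w = math.exp(-0.5 * theta * theta)
        for a, b, correct in responses:
            p = p_correct(theta, a, b)
            w *= p if correct else (1.0 - p)
        post.append(w)
    total = sum(post)
    return sum(t * w for t, w in zip(GRID, post)) / total

def run_adaptive_test(bank, answer, max_items=6):
    """Give up to max_items, each time picking the most informative unused item."""
    remaining = list(bank)
    responses = []
    theta_hat = 0.0  # start at the prior mean
    for _ in range(min(max_items, len(bank))):
        item = max(remaining, key=lambda ab: item_information(theta_hat, *ab))
        remaining.remove(item)
        responses.append((item[0], item[1], answer(item)))
        theta_hat = eap_estimate(responses)
    return theta_hat, [(a, b) for a, b, _ in responses]

# Hypothetical item bank: equal discrimination, difficulties from -2 to 2.
bank = [(1.5, -2.0 + 0.25 * k) for k in range(17)]

def low_scorer(item):
    """Stand-in respondent: answers correctly only items easier than b = -1."""
    return item[1] < -1.0

theta_hat, administered = run_adaptive_test(bank, low_scorer)
print("items given (a, b):", administered)
print("estimated theta:", round(theta_hat, 2))
```

The point of the design is that the six items administered differ from person to person: a low scorer is quickly routed to easier items, which is where a fixed set of three hard items provides no information.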

Anyway, I just think it is obvious that researchers here can and should do better--and not just b/c MTurk workers have all learned at this point that the ball costs 5 cents!


Arkes, H.R. & Harkness, A.R. Estimates of Contingency Between Two Dichotomous Variables. J. Experimental Psychol. 112, 117-135 (1983).

Campitelli, G. & Gerrans, P. Does the cognitive reflection test measure cognitive reflection? A mathematical modeling approach. Memory & Cognition, 1-14 (2013).

Chandler, J., Mueller, P. & Paolacci, G. Nonnaïveté among Amazon Mechanical Turk workers: Consequences and solutions for behavioral researchers. Behavior research methods 46, 112-130 (2014).

DeMars, C. Item response theory (Oxford University Press, Oxford ; New York, 2010).

Frederick, S. Cognitive Reflection and Decision Making. Journal of Economic Perspectives 19, 25-42 (2005).

Kahan, D.M. Ideology, Motivated Reasoning, and Cognitive Reflection. Judgment and Decision Making 8, 407-424 (2013). 

Kahan, D.M. "Ordinary Science Intelligence": A Science Comprehension Measure for Use in the Study of Science Communication, with Notes on "Belief in" Evolution and Climate Change. CCP Working Paper No. 112 (2014).

Liberali, J.M., Reyna, V.F., Furlan, S., Stein, L.M. & Pardo, S.T. Individual Differences in Numeracy and Cognitive Reflection, with Implications for Biases and Fallacies in Probability Judgment. Journal of Behavioral Decision Making (2011).

Stanovich, K.E. Rationality and the reflective mind (Oxford University Press, New York, 2011).

Stanovich, K.E. What intelligence tests miss: the psychology of rational thought (Yale University Press, New Haven, 2009).

Toplak, M., West, R. & Stanovich, K. The Cognitive Reflection Test as a predictor of performance on heuristics-and-biases tasks. Memory & Cognition 39, 1275-1289 (2011).

Weller, J.A., Dieckmann, N.F., Tusler, M., Mertz, C., Burns, W.J. & Peters, E. Development and testing of an abbreviated numeracy scale: A Rasch analysis approach. 

Reader Comments (10)

Before reinventing the wheel, those interested in a CRT that is more reliable and valid as a result of being LONGER and including other types of items might take a look at this paper (which now may be available on line, unfortunately behind a paywall):

Baron, J., Scott, S., Fincher, K., & Metz, S. E. (in press). Why does the Cognitive Reflection Test (sometimes) predict utilitarian moral judgment (and other things)? Journal of Applied Research in Memory and Cognition, special issue on Modeling and Aiding Intuitions in Organizational Decision Making, edited by J. Marewski & U. Hoffrage.

December 17, 2014 | Unregistered Commenter Jon Baron


Baron, J., Scott, S., Fincher, K. & Emlen Metz, S. Why does the Cognitive Reflection Test (sometimes) predict utilitarian moral judgment (and other things)? Journal of Applied Research in Memory and Cognition.


if this is the "CRT 2.0" wheel, then at least we are at the point in history where we are entitled to be impressed w/ novelty! (& unlike the inventor of the literal wheel, you should be sure to file a patent application!)

I see you did Rasch analysis; does the test information function show that the enlarged scale has good measurement precision/information at lower levels of the cognitive reflection disposition? (Sorry if that info is in article-- I haven't had chance yet to read as carefully as I should.)

Am I misreading or is it the case that (w/ exception of bat & ball 38% in study 1), samples seemed to be scoring well over 50% correct on each of the 3 CRT items?


December 17, 2014 | Registered Commenter Dan Kahan

Hi Dan,

I agree with you completely and I believe Keith Stanovich would as well (in fact, if I'm not mistaken, he is currently working on Templeton funded project working on this very issue).

Here's the problem: If a) the CRT assesses the propensity to question our intuitions via resource demanding analytic reasoning processes (as the evidence indicates) and b) humans are miserly information processors / are not overly disposed to high levels of analytic reasoning except for in special cases (as supported by decades of work, going back to even before Kahneman and Tversky), then c) theoretically, you would not expect any 'CRT' item to be easy.

Having said that, there simply has to be a way to capture variation in the propensity for analytic thought within the 'levels' of CRT performance (particularly in the '0' group) without sacrificing the logic of the measure. One solution would be to devise a series of items that have incorrect intuitive responses that move from just barely intuitive (making them easier) to very intuitive (difficult, e.g., the bat & ball problem). This seems rather straightforward until you try to imagine what this might look like. Perhaps I just don't know enough about intuition.

The easier option would be to add problems that involve more rudimentary cognitive skills. This would successfully increase accuracy, but the new variation picked up by the task would be related to cognitive ability and not thinking disposition. Such a task would probably emerge as a stronger predictor than the 3-item CRT because a) it has better psychometric properties and b) most/many DV's of interest are related to both cognitive ability and style (thinking disposition). This would be problematic because, depending on one's tactic with the new task, it would result in either an overemphasis on thinking disposition (if the 'easier CRT' was incorrectly considered a measure of, primarily, cognitive style) or an overemphasis on cognitive ability and underemphasis on thinking disposition (if, on account of the easier items, task was used primarily as a cognitive ability measure).

Anyway, just thought I would share my thoughts on the matter.

- Gord

December 17, 2014 | Unregistered Commenter Gord Pennycook


Actually, I should have mentioned Toplak, M.E., West, R.F. & Stanovich, K.E. Assessing miserly information processing: An expansion of the Cognitive Reflection Test. Thinking & Reasoning, 1-22 (2013). I have tested some of the additional items in their expanded CRT & found that they in fact don't extend range of information along the latent disposition -- they only increase discrimination at upper end. Maybe some of the other items they added do; but they didn't use IRT to assess scale performance so it is not possible to say (I should write & ask them.)

You must be right that there is some sort of information/discrimination cliff as critical reflection drops off. But for sure it isn't at 85th or 90th percentile in population!

Actually, the way to figure this out would be to look at the general population performance on the range of problems thought to manifest over-reliance on heuristic reasoning -- like conjunction fallacy, gambler's fallacy, baserate neglect, denominator neglect, covariance etc.--all the ones performance on which is used to validate CRT. One could in fact just use *those* problems to form a scale & see how much information there is in *it* across the range of the latent disposition that they are indicators of. A "better CRT" should have reliability/discrimination over that range.

Of course, if you did that, why not just use the scale formed by aggregating those problems to construct a measure of the disposition in question?

December 17, 2014 | Registered Commenter Dan Kahan

A question: even with perfect tools to measure people's ability or willingness to apply more careful, purposeful, 'conscious' (System 2) reasoning, does that necessarily mean they are any better at overcoming the vast forces that motivate them to reason the way they do? As Damasio described with his subject Elliot in Descartes' Error, even high System 2 function does not necessarily equate to more evidence-based reasoning. Quick summary: Elliot aced all the cognitive tests - don't know if he took the CRT - but his behavior was dysfunctional and 'irrational' by common standards. Turns out he couldn't make any choices, because surgery had severed connections between his pre-frontal cortex and limbic areas. Essentially, he had all the facts, and the willingness and capacity to fully apply System 2 to evidence, but without their emotional valence he couldn't use the facts to prefer one option over another in any choice he faced.
So if the search is for science communication that can develop "interventions designed to improve engagement with technical information," is it enough to help people overcome "vulnerability to heuristic biases"? There is way more than System 1 or System 2 variance in the Affect Heuristic that, as Elliot shows, inescapably shapes the way people FEEL about the facts.

December 17, 2014 | Unregistered Commenter David Ropeik

@Fearless Dave:

1. Damasio's work suggests that it is wrong to think that one can make rational decisions without well-functioning affective perception.

I think the sort of work he has done supplies good reason to doubt a "discrete, hierarchical" conception of System 1/2.

It really makes no sense to think that people who are "better" at System 2 are better independently of being "better" at System 1 too.

2. It is pretty clear too that people better at system 2 are in fact more likely to display certain kinds of risk-perception pathologies -- like identity-protective cognition.

So on that account, too, it is wrong to think that system 1 is source of bias & system 2 the source of correction.

The popular view of System 1/System 2 -- the sort of thing you'll find in Cass Sunstein's work on risk, e.g., -- is clearly just plain wrong.

December 17, 2014 | Registered Commenter Dan Kahan

It feels like there's a lot to untangle here. In no particular order:

1) Any given cognitive task can move from System 2 to System 1 (the numeracy point).

2) The majesty of Kahneman's project derives significantly from its self-contradiction (or its "aporia", if you'd prefer) : he says over and over again that studying System 1 does not allow the student to bypass it, but at the same time he has dedicated his life's System 2 resources to explicating its conjoined twin's dominance, and a decade or so to *recruiting a popular audience to engage in the same Sisyphean work by reading Thinking Fast and Slow*. Which leads directly to:

3) Is "cognitive reflection" a mode of neural activity, or a skill? A process, or a result? Kahneman stresses the *effortfulness* of System 2, but these tests measure *accuracy*. Kahneman describes a kind of meticulous dual cognitive mirror he and Tversky constructed: "if we both made the same mistake" and all that (my copy of TFaS is not to hand). Stipulating the validity of his approach, can there even in principle exist a population of cognitive tasks whose *output* reliably maps to the System 1 / System 2 *behavioral* divide in a society-sized population?

4) Do we really want to laud *Kahneman's* System 2 -- behavior which is defined significantly by, e.g., how many calories the brain is consuming -- or do we want to laud something more like "participation in Habermas' public sphere" (or your Liberal Republic of Science)? It feels like you sometimes want to assert a cognitive continuum, and sometimes (more persuasively to me), you want to separate neural/materialist/micro-behavioral findings from citizenship. Kahneman suggests bridging the tragic/materialist and self-help/intellectual aspects of his work with a managerial checklist approach: "what are the top three ways that System 1 will lead me to screw up evaluating this job candidate?" and all that. Your heuristics for avoiding identity-anchored cognition have a similar feel to them, but is there really an underlying cognitive connection? To what extent does the shared cognition in a business meeting of 10 people map to the shared cognition of 100,000 Republicans considering sea-level-rise policy? In that case, might there be a *negative* correlation as I believe you've said? How does it all add up?

Thanks for the enormously stimulating posts and papers.

December 28, 2014 | Unregistered Commenter Sam Penrose

@Sam.... There is now a problem for me (or anyone else) here. I make my System 1 pass over the argument & see that at step 4 you want to de-emphasize system 2. But I can see that it will require a large investment of System 2 -- that likely I will be able to treat myself to a nice triple-scoop ben & jerry's for all the brainwork involved -- in fully taking in the argument here... Yet you are, as I said, questioning the value of doing so ...

I'm sure if I used some system 2, the paradox would disappear-- or suspect it would. And I like ice cream. Problem solved.

More presently...

December 28, 2014 | Registered Commenter Dan Kahan

Dan --

Apparently I ate too much ice cream and the sugar rush caused me to keyboard-frenzy all over a simple point:

Kahneman: System 2 is a behavior.
CRT 2.0 / Kahan: System 2 is an output.


January 4, 2015 | Unregistered Commenter Sam Penrose


It is impossible to eat too much ice cream; both systems crave it

September 18, 2017 | Registered Commenter Dan Kahan
