Reflections on "System 2 bias"--part 2 of 2

Part 2 of 2 of reflections on Miller & Sanjurjo. Part 1 here.

So “yesterday”™ I presented some reflections on what I proposed calling “System 2 bias” (S2b).

As I explained, good System 2 reasoning in fact depends on intuitions calibrated to perceive a likely System 1 error and to summon the species of conscious, effortful information processing necessary to avoid such a mistake.

S2b occurs when one of those well-trained intuitions misfires. Under its influence, a normally strong reasoner will too quickly identify and correct a judgment he or she mistakenly attributes to over-reliance on System 1, heuristic reasoning.

As such, S2b will have two distinctive features.  One is that it will be made, paradoxically, much more readily by proficient reasoners, who possess a well stocked inventory of System 2-enabling intuitions, than by nonproficient ones, who don’t. 

The other is that reasoners who display this distinctive form of biased information processing will strongly resist the correction of it. The source of their mistake is a normally reliable intuition essential to seeing that a particular species of judgment is wrong or fallacious.  It is in the nature of all reasoning intuitions that they provoke a high degree of confidence that one’s perception of a problem and one’s solution to it are correct. It is the absence or presence of that feeling that tells a reasoner when to turn on his or her capacity for conscious, effortful information processing, and when to turn it off and move on.

I suggested that S2b was at the heart of the Miller-Sanjurjo affair. Under the influence of S2b, GVT (Gilovich, Vallone & Tversky) and others too quickly endorsed, and too stubbornly continue to defend, an intuitively pleasing but flawed analytical method for remedying patterns of thought that they believe reflect the misidentification of independent events (successes in basketball shots) as interdependent ones.

But this account is a product of informed conjecture only.  We should try to test it, if we can, by experiments that attempt to lure strong reasoners into the signature errors of S2b.

This is where the “Margolis” problem (1996, pp. 53f; a problem identified, helpfully, by Josh Miller as an adaptation of “Bertrand’s Box paradox”) comes in.

The right answers to “a,” “b,” and “c” are in fact “67%-67%-67%.” (If you are scratching your head on this, then realize that there are twice as many ways to get red if one selects the red-red chip than if one selects the blue-red one; accordingly, if one is picking from a vessel with red-red and red-blue, “red side up” will come up twice as often for the red-red chip as it will for the red-blue one. Or realize that if you answered “67%” for “c,” then logically it must be 67% for “a” and “b” as well, for it surely doesn’t matter for purposes of “c” which color the selected chip displays.)
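
For anyone who prefers to count rather than argue, here is a minimal sketch that enumerates the six equally likely "side facing up" outcomes. It assumes the setup implied by the post and the comments (three chips in a cup: red/red, blue/blue, red/blue) and assumes that "a" asks about the hidden side given red up, "b" about the hidden side given blue up, and "c" about drawing a chip with the same color on both sides. The wording is a reconstruction, not a quotation from Margolis, and the function names are made up for illustration.

```python
from fractions import Fraction

# Assumed setup (reconstructed, not quoted from Margolis 1996):
# three chips in a cup, each listed as (side 1, side 2).
CHIPS = [("red", "red"), ("blue", "blue"), ("red", "blue")]

# Every one of the six sides is equally likely to end up facing up.
# Each outcome is recorded as (color facing up, color facing down).
OUTCOMES = [(chip[i], chip[1 - i]) for chip in CHIPS for i in (0, 1)]


def prob_other_side_matches(color):
    """P(hidden side is `color` | visible side is `color`)."""
    shown = [o for o in OUTCOMES if o[0] == color]
    matches = [o for o in shown if o[1] == color]
    return Fraction(len(matches), len(shown))


def prob_same_color_chip():
    """P(the drawn chip has the same color on both sides)."""
    same = [o for o in OUTCOMES if o[0] == o[1]]
    return Fraction(len(same), len(OUTCOMES))


answer_a = prob_other_side_matches("red")
answer_b = prob_other_side_matches("blue")
answer_c = prob_same_color_chip()
print(answer_a, answer_b, answer_c)  # 2/3 2/3 2/3
```

The "50%" lure comes from counting the two chips that have a red side instead of the three red sides that could be facing up.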

But “50%-50%-67%” is an extremely seductive “lure.”  We might predict then, that as reasoning proficiency increases, study subjects will become progressively more and more likely to pick “67%-67%-67%” rather than “50%-50%-67%.”

But that’s not what we see!

In fact, the likelihood of “50%-50%-67%” increases steadily as one’s Cognitive Reflection Test score increases. In other words, one has to be pretty smart even to take the bait in the “Margolis”/“Bertrand Box Paradox” problem. Those who score low on CRT are in fact all over the map: “33%-33%-33%,” “50%-50%-50%,” etc. are all more common guesses for subjects with low CRT scores than is “67%-67%-67%.”

Hence, we have an experimental model here of how “System 2 bias” works, one that demonstrates that certain types of error are more likely, not less, as cognitive proficiency increases. (For more of the same, see Peters et al. 2006, 2018.)

This is a finding, btw, that has important implications for using the Margolis/Bertrand question as part of a standardized cognitive-proficiency assessment.  In short, either one shouldn’t use the item, b/c it has a negative correlation with performance of the remaining assessment items, or one should use the “wrong answer” as the right one for measuring the target reasoning disposition, since in fact getting the wrong answer is a better indicator of that disposition than is getting the right one.
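
One conventional way to check whether an item "has a negative correlation with performance of the remaining assessment items" is an item-rest correlation: correlate the score on that item with the total score on all the other items. The sketch below is only an illustration of that check. The helper names are hypothetical, and the response matrix is invented toy data, not anything from the study described here.

```python
from math import sqrt


def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def item_rest_correlation(responses, item):
    """Correlate one item's score with the summed score on all other items."""
    item_scores = [row[item] for row in responses]
    rest_scores = [sum(row) - row[item] for row in responses]
    return pearson(item_scores, rest_scores)


# Invented toy responses (1 = keyed "correct", 0 = not); NOT study data.
# Rows are respondents, columns are items; the last column plays the
# role of the chip item.
toy_responses = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 0, 1],
    [0, 0, 1, 1],
]

chip_item_r = item_rest_correlation(toy_responses, item=3)
print(chip_item_r < 0)  # True for this toy matrix
```

A negative value for an item is the usual signal either to drop it or to reverse its scoring key, which are the two options described above.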

As I said, the other signature attribute of this bias is how stubbornly those who display System 2 bias cling to the wrong answers it begets. There is anecdotal evidence for this in Margolis (1996, pp. 53-56), which corresponds nicely to my own experience in trying to help those high in cognitive proficiency to see the “right” answer to this problem. Also, consider how many smart people tried to dismiss M&S when Gelman first featured M&S on his blog.

But it would be pretty cool to have an experimental proof of this aspect to the problem, too.  Any ideas anyone?

In any event, here you go: an example of an “S2b” problem where being smart correlates negatively with the right answer.

It’s not a knock down proof that S2b explains the opposition to the Miller-Sanjurjo proof.  But it’s at least a “brick’s worth” of evidence to that effect.


Margolis, H. Dealing with Risk: Why the Public and the Experts Disagree on Environmental Issues (University of Chicago Press, Chicago, IL, 1996).

Miller, J.B. & Sanjurjo, A. Surprised by the Gambler's and Hot Hand Fallacies? A Truth in the Law of Small Numbers. Econometrica (2018).

Peters, E., et al. The loss-bet paradox: Actuaries, accountants, and other numerate people rate numerically inferior gambles as superior. Journal of Behavioral Decision Making (2018).

Peters, E., et al. Numeracy and Decision Making. Psychol Sci 17, 407-413 (2006).

Reader Comments (29)

Why I got 50%-50%-67% initially, I think (a-posteriori confabulation caveats!) was because the math part of the problem (in A and B) attracted my S2 attention too much for me to accurately attend to the "word" part of the problem (alignment of events with linguistic description, caring for semantics properly). I wasn't sequentially processing the problem - perhaps because my S1 alarm trigger for S2 said something like "Whoa - math skill needed over there - let's devote more energy to that part!".

However, when Joshua mentioned 67% for A and B, I had (or think I had - more a-posteriori caveats) an immediate intuition that somehow 67% is preferable to 50% for A and B, but that intuition wasn't accessible on my own, even with suspicion of a trick.

What's even weirder is that I happen to have very recently read a paper about controversy in the probability word-problem setup of multiverse/fine-tuning arguments, and so should have had probability word-problem setup priming galore. But, noooo. Darn math alarm rings too loud.

So - here's a hypothesis - what if one splits high CRT folks into two camps - the purely STEM types vs. those with a liberal arts background. Will those with a more liberal arts background attend to the word problem setup more, and so be more likely to get this problem and the hot-hand-fallacy-fallacy issue correct?

June 26, 2018 | Unregistered CommenterJonathan

Dan - your problem is well known, probably since Poe's The Purloined Letter and maybe as far back as Zeno's paradox, but, at least among economists, certainly since Keynes's "beauty contest" (will post complete article further down).

For brevity: as long as the problem is purely quantitative, there is no known solution, because you run into infinite regress.

To avoid that trap, you must introduce a cultural variable involving some emotional context. Actual pictures of people work - as used by Keynes. I am disregarding here rigging via hacking - see note in longer post. Or, as Poe notes in his introduction, Nil sapientiae odiosius acumine nimio. - Seneca.

June 26, 2018 | Unregistered CommenterEcoute Sauvage

Fair use of copyrighted material

Keynes’s ‘beauty contest’
Richard Thaler on the economist as the true father of behavioural finance and ‘third level’ thinking

Richard Thaler JULY 10, 2015

In 1978 the financial economist Michael Jensen wrote: “I believe there is no other proposition in economics which has more solid empirical evidence supporting it than the efficient market hypothesis.” If it is possible to “jinx” a scientific hypothesis, Professor Jensen may have done it. Consider the history since that time.

First, there was the crash in stock prices in October 1987. The late 1990s saw a spectacular rise and fall in technology stocks. The irrational exuberance shifted to real estate, leading up to the peak in August 2006, followed by a crash that helped cause the global financial crisis. Even former chairman of the Federal Reserve Alan Greenspan apologised: “Those of us who have looked to the self-interest of lending institutions to protect shareholders’ equity — myself especially — are in a state of shocked disbelief.”

Many other economists who were ardent supporters of the efficient market hypothesis (EMH) have also been surprised by recent history but there is one man who would not have been “shocked”: John Maynard Keynes.
Keynes is remembered for his view that governments should spend money in recessions to regain full employment, an argument made famous in The General Theory of Employment, Interest, and Money (1936). Few, however, realise that Keynes was a true forerunner of behavioural finance. Had more people, including Greenspan, studied the chapter of The General Theory on financial markets, the crisis might have been avoided.

Keynes thought markets had been more “efficient” at the beginning of the 20th century, when managers owned most of the shares in a company and knew what it was worth. As shares became more widely dispersed, “the element of real knowledge in the valuation of investments by those who own them or contemplate purchasing them . . . seriously declined”.
By the time of The General Theory, Keynes had concluded that markets had gone crazy. “Day-to-day fluctuations in the profits of existing investments, which are obviously of an ephemeral and non-significant character, tend to have an altogether excessive, and even an absurd, influence on the market.”

To buttress his point, he noted the fact that shares of ice companies were higher in summer months when sales are higher. This fact is surprising because in an efficient market, stock prices reflect the long-run value of a company, and do not rise in good seasons. Recent academic studies show this pattern is still true.

Keynes was also sceptical that professional money managers would perform the role of the “smart money” that EMH defenders rely upon to keep markets efficient. Rather, he thought they were more likely to ride a wave of irrational exuberance than to fight it. One reason is that it is risky to be a contrarian. “Worldly wisdom teaches that it is better for reputation to fail conventionally than to succeed unconventionally.”

Instead, Keynes thought that professional money managers were playing an intricate guessing game. He likened it to a common newspaper game “in which the competitors have to pick out the six prettiest faces from 100 photographs, the prize being awarded to the competitor whose choice most nearly corresponds to the average preferences of the competitors as a whole: so that each competitor has to pick, not those faces that he himself finds prettiest, but those that he thinks likeliest to catch the fancy of the other competitors, all of whom are looking at the problem from the same point of view . . . We have reached the third degree where we devote our intelligences to anticipating what average opinion expects the average opinion to be. And there are some, I believe, who practise the fourth, fifth, and higher degrees.”

I believe Keynes’s beauty-contest analogy remains an apt description of how financial markets work, as well as of the key role played by behavioural factors. To understand his analogy, try out this puzzle that Tim Harford recently posed on my behalf to FT readers:
Guess a number from zero to 100, with the goal of making your guess as close as possible to two-thirds of the average guess of all those participating in the contest. To help you think about this puzzle, suppose there are three players who guessed 20, 30 and 40 respectively. The average guess would be 30, two-thirds of which is 20, so the person who guessed 20 would win.
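
The scoring rule in this puzzle is simple enough to sketch in a few lines. The function name and the `factor` parameter are illustrative assumptions, and ties are resolved by taking the first closest guess, which the article does not specify.

```python
def winning_guess(guesses, factor=2 / 3):
    """Return the guess closest to `factor` times the average guess."""
    target = factor * sum(guesses) / len(guesses)
    # min() keeps the first guess among equally close ones (an assumption).
    return min(guesses, key=lambda g: abs(g - target))


example_winner = winning_guess([20, 30, 40])
print(example_winner)  # 20
```

With guesses of 20, 30 and 40 the target is two-thirds of 30, so the player who guessed 20 wins, matching the worked example.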

If you did not enter the contest, you might consider what your guess might have been.
Now that you have thought, consider what I will call a zero-level thinker. He says: “I don’t know. This seems like a maths problem. I will just pick a number at random.” Lots of people guessing a number between zero and 100 at random will produce an average guess of 50.

How about a first-level thinker? She says: “The rest of these players don’t like to think much, they will probably pick a number at random, averaging 50, so I should guess 33, two-thirds of 50.”
A second-level thinker will say: “Most players will be first-level thinkers and think that other players are a bit dim, so they will guess 33. Therefore I will guess 22.”
A third-level thinker: “Most players will discern how the game works and will figure that most people will guess 33. As a result they will guess 22, so I will guess 15.”
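
The ladder of zero-, first-, second- and third-level thinkers can be written as a one-line formula: each level multiplies the previous level's guess by two-thirds. A minimal sketch, assuming the zero-level average of 50 used in the article (the function name and parameters are illustrative):

```python
def level_k_guess(level, start=50.0, factor=2 / 3):
    """Guess of a level-k thinker who assumes everyone else is level k-1."""
    return start * factor ** level


# Rounded to whole numbers, as in the article's narrative.
guesses_by_level = [round(level_k_guess(k)) for k in range(4)]
print(guesses_by_level)  # [50, 33, 22, 15]
```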

Of course, there is no convenient place to get off this train of thinking. Do you want to change your guess?
Here is another question: what is the Nash equilibrium for this scenario? Named for John Nash, the mathematician and subject of the film A Beautiful Mind who sadly was recently killed in a car crash, the Nash equilibrium in this game is a number that if everyone guessed it, no one would want to change their guess. The only Nash equilibrium in this game is zero.

To see why, suppose everyone guessed three. Then the average guess would be three and you would want to guess two-thirds of that, or two. But if everyone guessed two you would want to guess 1.33, and so forth. If, and only if, all participants guessed zero would no one want to change his or her guess.
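
The same argument can be run mechanically. The sketch below follows the article's simplified reasoning, in which everyone makes the same guess and each player then wants two-thirds of it, and it ignores the small effect a single player has on the average. The names are illustrative assumptions.

```python
def best_response(common_guess, factor=2 / 3):
    """What one player wants to guess if everyone else guesses `common_guess`."""
    return factor * common_guess


def iterate_best_responses(start, rounds):
    """Repeatedly replace the common guess with the best response to it."""
    path = [start]
    for _ in range(rounds):
        path.append(best_response(path[-1]))
    return path


path = iterate_best_responses(3.0, 50)
print(path[:3])              # starts 3, then about 2, then about 1.33
print(path[-1] < 1e-6)       # True: the guesses shrink toward zero
print(best_response(0.0))    # 0.0: at zero nobody wants to move
```

Zero is the only starting value that the best-response step leaves unchanged, which is what makes it the equilibrium.
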
Formally, this game is identical to Keynes’s beauty contest: you have to guess what other people are thinking that other people are thinking. In economics, the “number guessing game” is commonly referred to as the “beauty contest”.

Thanks to the FT, this is the second time I have run this experiment on a large scale [see panel]. In 1997 we offered two business-class tickets to North America. Now, in these days of austerity, entrants were offered what I have been assured is a posh travel bag. Personally, I am also throwing in an autographed copy of my recent book Misbehaving, on which this essay is based.

How have things changed? Well, one finding will comfort tradition-bound economists. When the prize was two business-class tickets we had 1,382 contestants. With only a travel bag on offer, entrants dropped to 583. Economic theory is redeemed!

Even with the smaller number of entrants, the results were nearly identical. In 1997 the average guess was 18.9, meaning the winning guess was 13. This time the average guess was 17.3, leading to a winning guess of 12. The distribution of guesses also looks like the one from 1997.

Many contestants were able to figure out the Nash equilibrium and guessed zero or one, thinking everyone else would be as clever as they were. A large number also guessed 22, showing second-level thinking. Just as last time, there was an assortment of pranksters who guessed 99 or 100, trying to skew the results.
Keynes’s beauty-contest analogy remains an apt description of what money managers do. Many investors call themselves “value managers”, meaning they try to buy stocks that are cheap. Others call themselves “growth managers”, meaning they try to buy stocks that will grow quickly. But of course no one is seeking to buy stocks that are expensive or stocks of companies that will shrink. So what these managers are really trying to do is buy stocks that will go up in value — or, in other words, stocks that they think other investors will later decide should be worth more.
Buying a stock that the market does not fully appreciate today is fine, as long as the rest of the market comes around to your point of view sooner rather than later. Remember another of Keynes’s famous lines: “In the long run we are all dead.” The typical long run for a portfolio manager is no more than a few years; often just a few months! So to beat the market a money manager has to have a theory about how other investors will change their minds. In other words, their approach has to be behavioural.

Richard Thaler is Professor of Behavioural Science and Economics at the Booth School of Business, University of Chicago.


Several people identified 12, the winning number, but Richard Thaler picked Anatoly Lebedev, executive director, commodities electronic trading, at Goldman Sachs for his logic. Lebedev added this excellent warning: “If the competition was checked by a computer, there would be a ‘hacker’ solution of submitting a billion times one same number from fake accounts and then calling two-thirds of the number from a real account.” Saboteurs, watch out!

June 26, 2018 | Unregistered CommenterEcoute Sauvage


That sounds similar to Doug Hofstadter's superrationality. It doesn't sound at all relevant to Dan vs. the hot-hand, because there's no recursion there. At least no more relevant than this does to your ft article.

June 26, 2018 | Unregistered CommenterJonathan

No, Jonathan. Recursion is not the same as infinite regress - read carefully.

June 26, 2018 | Unregistered CommenterEcoute Sauvage


I consider recursion to include all finite or infinite cycles that repeat important decision aspects with possibly different parameterization at each cycle. Blame too many years spent coding.

June 26, 2018 | Unregistered CommenterJonathan

To see the difference, try explaining to an algorithm why calling George Bush a chimp is OK, but calling Michelle Obama an ape is not. The algorithm has to learn by frequency analysis - many more people will click on an item mentioning the second than the first, and therefore the outrage merchants will end up dominating the news media. Raving lunatics like Charles Blow of the NYT are effectively promoting white nationalist memes - but they're too stupid to realize this

I personally hope they will continue, maybe even escalate to the level of the Obama official portrait painter, habitually joking about "kill whitey". The Hitler memes about illegal aliens wonderfully aid the process. Recursion not involved, but as Thaler notes in the FT article, the Nash equilibrium is at zero. Infinite regress. There is only one end to continuing culture war.

June 26, 2018 | Unregistered CommenterEcoute Sauvage

Ha ha, Jonathan, you can't see it because of too many years coding, but none hacking!

Try the distinguished law professor here - before he deletes this pearl:

June 26, 2018 | Unregistered CommenterEcoute Sauvage

@Ecoute-- I think the problem has 1 unimpeachable solution--no regress of any sort in 67-67-67.

June 26, 2018 | Unregistered CommenterDan Kahan

The regress consists in having the 50-50-67 types get annoyed with being beaten about the head and shoulders with the 67-67-67 solution!

Unless the problem specification provides for some way to distinguish between the 2 identical sides - which I don't see in the original specs - the first solution is as valid as the second. That's my story and I'm sticking to it!

June 26, 2018 | Unregistered CommenterEcoute Sauvage


Perhaps you are familiar with the paradox of tolerance, also known as Popper's paradox. This is resolvable within a Russell-like type theory - as meta-level tolerance and object-level tolerance could then be distinct predicates - and having maximal object-level tolerance would then be allowed to consistently require meta-level intolerance to object-level intolerance.

Unfortunately for myself, I haven't developed sufficient meta-level intolerance. Apparently, a case where my S1 math alarm isn't sufficient to drown out my S1 egalitarian-communitarian heuristics. But, some more of these awkward digressions, and I might have to develop the necessary skills.

June 26, 2018 | Unregistered CommenterJonathan

Hardly digressions - I was addressing Dan's original question on what would constitute an effective test. Keynes's test is best: ask respondents to rank pictures of faces according to what they think the consensus ranking would be.

I'll bet the ranking would reflect the idea of Christoph Meiners (historian of the Goettingen School) who divided humans into 2 races, the beautiful and the ugly. Popper (and Rawls) would disagree - I guess.

Should be an easy enough test to run.

June 26, 2018 | Unregistered CommenterEcoute Sauvage

Speaking of bad math, the first identity politics column in 538 has some illustrations:

Am quite embarrassed that dems thought 44% of pubz make >= $250K.

June 26, 2018 | Unregistered CommenterJonathan

@Ecoute-- well, apparently, right or wrong, the more convinced you are that it is 50-50-67 the higher your score on a critical reasoning test

June 27, 2018 | Registered CommenterDan Kahan

Difference-in-difference estimates of opposition to GE food before and after mandatory labeling show that the labeling policy led to a 19% reduction in opposition to GE food. The findings help provide insights into the psychology of consumers’ risk perceptions that can be used in communicating the benefits and risks of genetic engineering technology to the public.

June 27, 2018 | Unregistered CommenterJonathan

Thank you!

June 27, 2018 | Unregistered CommenterEcoute Sauvage

Per Rasmussen poll, a third of the nation believes we are heading into a new civil war

June 27, 2018 | Unregistered CommenterEcoute Sauvage

An early offering in the burgeoning field of Jordan Peterson studies:

Twelve Perspectives on Jordan Peterson
An Antidote to Allergies and Infatuations

June 27, 2018 | Unregistered CommenterJonathan

Jonathan -

I think I posted this earlier - have you seen it?

June 28, 2018 | Unregistered CommenterJoshua

How can anybody trust scale factors calculated by extremists? Ranking of USSC sitting judges and potential nominees makes no sense (and yes I read the linked ranking system). The skew is grotesque.

June 28, 2018 | Unregistered CommenterEcoute Sauvage


No - missed that one. Or, maybe I stopped listening right after the "if you see a cat in the road, pet it" line, and subsequently repressed any memory of it.

June 28, 2018 | Unregistered CommenterJonathan

Been thinking more about the poker chip problem and how high CRT people are prone to getting parts A and B wrong. I think the issue is that the wording of the problem, specifically "Imagine there are three poker chips in a cup.... Without looking, you take one chip out of the cup...", nudges high CRT people to focus on counting events about whole chips instead of individual sides, and that the subtle semantic difference doesn't subsequently trigger any reappraisal of the problem when it turns out to be about sides in parts A and B, but not C. In fact, since part C is about whole chips while still mentioning sides, C could contribute to the failure to reappraise the focus in A and B.

What I want to know is - what lead to the creation of and experimentation on this problem? Was it initially hypothesized to have this effect on high CRT folks?

June 28, 2018 | Unregistered CommenterJonathan

Oops - "led" not "lead" in that penultimate question. Must be compensating for that Page/Plant/Bonham/Jones misspelling. Don't know why - always liked the original Yardbirds better anyway.

June 28, 2018 | Unregistered CommenterJonathan

Jonathan -

If you read that thread I read above.... I think it is something of a language problem. Read Dypoon's suggestion for an alternative wording that, I think, would net quite different results.

June 28, 2018 | Unregistered CommenterJoshua

Thread I linked above.....

June 28, 2018 | Unregistered CommenterJoshua


Got it from downstairs:

OK - sorry I'm late to the party....

June 28, 2018 | Unregistered CommenterJonathan

Jonathan - re your Peterson article, I never got far enough to read about a cat, but stopped reading the abstract after the statement that Peterson has support from the alt-right. He doesn't.

However, while both Peterson and Kahneman are demonstrably frauds, I think only Kahneman is fully aware of the fact. The Nobel Economics committee will not make such a mistake again - maybe they should have checked with their Medicine colleagues before awarding that prize.

June 29, 2018 | Unregistered CommenterEcoute Sauvage

Dan - instead of trying to explain "perverse" results on risks of climate change, why not ask about space exploration? It requires at least as much scientific knowledge. And I'll bet you will come up with even more "perverse" results. Here is what an alt-right website has to say about it:

June 29, 2018 | Unregistered CommenterEcoute Sauvage

What if you changed the wording of (c) to read as follows:
If without looking you reach into the cup and take one chip out, what is the chance that it will have BLUE on both sides?
What is the chance it will have RED on both sides?

Now, what is your answer to (a) and (b)?

The correct answer to the original a, b, and c questions are 33-33-67!!

It's amazing how innumerate one becomes when the WORDING of a question is corrected.

July 16, 2018 | Unregistered CommenterKing John's return
