Tuesday, March 13, 2018

Do Type S and Type M errors reflect confirmation bias?!

 0. What's this about?

I’ve written a number of posts on the value of Bayesian likelihood ratios (a heuristic cousin of the “Bayes Factor”) as an “evidentiary weight” statistic generally, and on their value in particular as a remedy for the inferential barrenness of p-values and related statistics used to implement “null hypothesis testing.”

In this post, I want to call attention to another virtue of using likelihood ratios: the contribution they can make to protecting against the type 1 error risk associated with underpowered studies. Indeed, I’m going to try to make the case for using LRs for this purpose instead of a method proposed by stats legend & former Freud expert Andrew Gelman (Gelman & Carlin 2014).

As admittedly elegant as they are, and as admittedly valuable as they have been in making people aware of a serious problem, G&C’s statistical indexes inject a form of confirmation bias into the practical assessment of the weight to be afforded empirical studies. Using LRs to expose the “type 1” error risk associated with underpowered studies avoids that.
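
To give a concrete sense of what I mean, here is a minimal sketch (hypothetical numbers, not from any actual study) of how a likelihood ratio can temper an impressive-looking p-value from a small, noisy design:

```python
# A minimal sketch (hypothetical numbers): how a likelihood ratio can temper an
# impressive-looking p-value from a small, noisy (i.e., underpowered) study.
from scipy.stats import norm

se = 8.0     # standard error of the effect estimate in the underpowered design
obs = 17.0   # observed effect estimate; "significant," since obs/se > 1.96

# Conventional NHT: two-sided p-value for the null of zero effect
p_value = 2 * norm.sf(abs(obs) / se)          # ~0.03 -- reject the null

# Evidentiary weight: likelihood of the data under "the true effect is as big as
# the estimate" relative to "the true effect is modest (say, 3)"
lr = norm.pdf(obs, loc=obs, scale=se) / norm.pdf(obs, loc=3.0, scale=se)

print(f"p = {p_value:.3f}, LR(large vs. modest) = {lr:.1f}")   # p = 0.034, LR ~ 4.6
```

An LR of 4 or 5 is respectable but hardly decisive; the same data that “reject the null” barely discriminate between a large effect and a modest one, which is exactly the kind of thing a bare p-value conceals.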

Or at least that’s what I think. I must be crazy, huh?

CONTINUE--IF YOU DARE!


Reader Comments (14)

Dan:

Yes, type M and type S errors are more of a demonstration than anything else. Or, to put it another way, they're a way to evaluate other people's published inferences, but I don't see them as directly useful in making inferences for a new study. For that, I'd prefer a Bayesian approach. One reason that I've generally presented type M and S errors non-Bayesianly is in order to communicate with all the researchers out there using non-Bayesian methods.

If I say, I don't trust your results because they're purely data-based and my prior dominates your likelihood, I'm afraid they'll just say: Who cares? But if I say, I don't trust your results because you're following a procedure which has poor frequency properties and is hugely biased, then . . . OK, they'll still just say: Who cares? At least that economist who did that Jamaica study never ever ever ever responded to any of my emails on the topic. But maybe someone, somewhere, might listen. Maybe a law professor somewhere? Who knows.

Anyway, I'd rather do my inferences Bayesianly and be explicit about my prior distribution. I find Type M and Type S errors a useful way of understanding the problems that arise when people don't do that.
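
A minimal simulation sketch of the Type M / Type S idea (hypothetical numbers, not from any particular study): assume a small true effect studied with a noisy design, and look only at the replications that happen to cross the significance threshold.

```python
# A minimal simulation sketch (hypothetical numbers) of a design analysis:
# a small true effect studied with a noisy design, conditioning on the
# replications that happen to reach statistical significance.
import numpy as np

rng = np.random.default_rng(0)
true_effect, se, n_sims = 2.0, 8.0, 200_000

est = rng.normal(true_effect, se, n_sims)       # sampling distribution of the estimate
sig = np.abs(est) > 1.96 * se                   # the "statistically significant" replications

power = sig.mean()                              # ~0.06
type_s = (np.sign(est[sig]) != np.sign(true_effect)).mean()   # ~0.24: significant, yet wrong sign
type_m = np.abs(est[sig]).mean() / true_effect  # ~9: exaggeration ratio among significant results

print(f"power ~ {power:.2f}, Type S ~ {type_s:.2f}, Type M ~ {type_m:.1f}")
```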

It's similar to my paper (with Hill and Yajima) on why we don't care about multiple comparisons. The quick answer: if you do an appropriate multilevel analysis, there's no reason to care about multiple comparisons. But if you do an analysis full of selection, then yeah, you need to care.

March 13, 2018 | Unregistered CommenterAndrew Gelman

"The occasion for G&C’s very important paper, and Gelman’s campaign generally to alert people to the Type 1 error risk associated with small sample size, is the inferential deficiency of NHT."

Well I don't know. I agree null hypothesis testing has its problems, but I'm not sure this is one of them. I think the problem here is that people really don't understand what NHT was originally intended to achieve.

The two parts of this are:

"So if we conclude that there’s no meaningful relationship between x and y based on a small sample study, we are at risk of making a mistake."

and

"Only those who confuse “significant at p< 0.05” for either “this is the ‘true’ effect size” or “x therefore is the cause of y” would ever make the mistake..."

The first is the classic error of thinking that failure to reject the null means acceptance of the null. That's completely wrong. The point is that the null is the hypothesis you are trying to disprove. If the null is not rejected, you conclude nothing. The experiment tells you nothing at all about whether the null is true. That would be the fallacy of confirming the consequent.

Conversely, rejecting the null does not imply the alternative hypothesis is true. Firstly, this would again be a case of confirming the consequent, and secondly, the 95% significance test is only supposed to say "this is enough evidence to be worth paying attention to", not "this is sufficient for belief".

I think the confirming the consequent fallacy is one of the most common issues with science. This is the reasoning that goes: A implies B, B is true, therefore A is true.

Thus, you can say: if the null hypothesis is true then the observation will probably be lower than X, the observation is seen to be lower than X, therefore the null is probably true.

You can also say: If the alternative hypothesis is true then the observation will probably be above X, the observation is seen to be above X, therefore the alternative is probably true.

You don't need Gelman's sophisticated examples to see these are both wrong.
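
One way to make the fallacy concrete (a minimal numeric sketch, hypothetical numbers):

```python
# A minimal numeric sketch (hypothetical numbers) of why "H0 predicts O; we saw O;
# therefore H0 is probably true" fails: what the alternative predicts matters too.
p_h0 = 0.01              # prior probability of the null
p_o_given_h0 = 0.95      # "if the null is true, the observation will probably be below X"
p_o_given_h1 = 0.90      # but the alternative predicts much the same observation

posterior_h0 = (p_h0 * p_o_given_h0) / (
    p_h0 * p_o_given_h0 + (1 - p_h0) * p_o_given_h1)
print(f"P(H0 | O) = {posterior_h0:.3f}")   # ~0.011 -- barely moved from the prior of 0.01
```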

I think Gelman's point is not specific to NHT, but to any sort of probabilistic evidence. Any probabilistic test with non-zero failure rates can give the wrong answer by chance. The less resolution the test has, the bigger those errors will be. If you say it's probably within an inch and get unlucky, it will be more than an inch away. If you can only say it's probably within a mile and get unlucky, it will be more than a mile away. Less sensitive measurements have bigger errors. It's true enough, but shouldn't be news to anyone.

The null hypothesis test is really a sort of approximated Bayesian test for when you don't have a complete model of the likelihoods. You're trying to disprove a specific hypothesis H that makes specific predictions P(O|H), but you have no idea how to calculate P(O|~H) because ~H could include anything. Fit a polynomial to your experimental output, and you have a putative hypothesis that would predict the results exactly.

You don't have a complete and exhaustive set of hypotheses and likelihood functions with which to calculate likelihood ratios. That's precisely what you're trying to develop. What's the probability of seeing outcome O if your only hypothesis is wrong? What does the Unknown contain?

So given that you have no idea how to calculate the probability of observations under the most general alternative, you just shrug and pick a fixed value for all possible outcomes. It's the 'Principle of Indifference', just applied to the likelihood function rather than the prior.

At which point, the P(O|~H) factors out of the likelihood ratio, and the LR is simply proportional to 1/P(O|H). The evidence for rejecting the null hypothesis P(O|~H)/P(O|H) is proportional to how unlikely the observation would be under the null hypothesis. NHT is a Bayesian method.
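
A minimal sketch of that equivalence (hypothetical numbers; the constant c is whatever flat density you choose to assume for P(O|~H)):

```python
# A minimal sketch (hypothetical numbers) of the "indifference" move: treat P(O|~H)
# as a constant c, so the LR against the null tracks 1/P(O|H0) -- i.e., it orders
# observations exactly as their improbability under the null does.
from scipy.stats import norm

c = 1 / 20.0                   # assumed flat density for P(O|~H) over a bounded range of z
for z in (1.0, 2.0, 3.0):      # observed test statistics
    lr = c / norm.pdf(z)       # P(O|~H) / P(O|H0)
    p = 2 * norm.sf(z)         # two-sided p-value
    print(f"z = {z}: LR = {lr:6.2f}, p = {p:.4f}")
# z = 1: LR ~ 0.21, p ~ 0.317;  z = 2: LR ~ 0.93, p ~ 0.046;  z = 3: LR ~ 11.28, p ~ 0.003
```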

The problems with NHT are basically the same as the problems with the Indifference Principle. It goes badly wrong when the alternative is actually already known to be very non-uniform, and when there are no unambiguously-defined uniform distributions, as in Bertrand's Paradox. And of course you still get all the same problems inherited from Bayesian methods, like the problem of picking priors, and the reliance on correct likelihood functions.


And this is where we come to the problem with:

"To avoid confirmation bias, we need to determine the LR or weight to assign evidence on the basis of criteria independent of our priors."

The problem is that we calculate LRs with models of the world that are inevitably derived from prior experience.

In a world where sources of evidence can be unreliable, we would not be making best use of the information if we didn't account for the source's credibility. Even the evidence of our own eyes can be suspect, as stage magicians demonstrate. When watching the show, you have to take account of your prior knowledge of magicians in evaluating the evidence you see.

In the Bayesian framework, your hypothesis has two parts: prior credibility of the claim, and prior credibility of the source of evidence. When you see the evidence, you have to consider four alternatives: claim true, source reliable; claim true, source unreliable; claim false, source reliable; and claim false, source unreliable. You have to update your belief in all four, and extraordinary claims may lead you to discount the source's reliability far more than it increases your belief in the claim.

My prior belief is that it's impossible to saw a woman in half and put her back together again, and that stage magicians are unreliable. If I see an especially convincing performance of the illusion, I upgrade my belief in his unreliability - he's evidently a *very* good illusionist!
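
To put hypothetical numbers on that four-way update (a minimal sketch, not a claim about anyone's actual priors):

```python
# A minimal sketch of the four-way update (hypothetical numbers): joint hypotheses
# over (claim true?, source reliable?) after a very convincing performance.
import itertools

prior_claim = 1e-6       # prior that a woman really can be bisected and reassembled
prior_reliable = 0.2     # prior that the performer shows you what actually happened

# P(convincing performance | claim, reliable): a skilled illusionist can produce a
# convincing show whether or not the claim is true; an honest non-magician cannot.
likelihood = {
    (True,  True):  0.9,
    (True,  False): 0.9,
    (False, True):  0.01,
    (False, False): 0.8,
}

joint = {}
for claim, reliable in itertools.product([True, False], repeat=2):
    prior = ((prior_claim if claim else 1 - prior_claim) *
             (prior_reliable if reliable else 1 - prior_reliable))
    joint[(claim, reliable)] = prior * likelihood[(claim, reliable)]

total = sum(joint.values())
posterior = {k: v / total for k, v in joint.items()}

p_claim = sum(v for (claim, _), v in posterior.items() if claim)
p_reliable = sum(v for (_, reliable), v in posterior.items() if reliable)
print(f"P(claim true | show) ~ {p_claim:.2e}")          # ~1.4e-06: barely budges
print(f"P(source reliable | show) ~ {p_reliable:.3f}")  # ~0.003: he's a *very* good illusionist
```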

So the question is, how in the Bayesian framework should a stage magician persuade people that he really *can* magically bisect and re-assemble a woman? Are people not "locked in" to their opinions, once he's been identified as an illusionist? Is it right that it is easier for him to convince people who don't know that a priori? That different people with different prior knowledge weigh the evidence of his performance differently?

March 16, 2018 | Unregistered CommenterNiV

How much do academic publishers matter? Spoiler alert - not enough to disconfirm the null hypothesis:

https://www.techdirt.com/articles/20180308/03225939387/research-shows-that-published-versions-papers-costly-academic-titles-add-almost-nothing-to-freely-available-preprints-they-are.shtml

March 16, 2018 | Unregistered CommenterJonathan

So reinforcing to see someone else connects Piaget to the climate wars. That much the better that he's a really smart dude - which goes to show with me that...blind squirrels and stopped clocks have some probationer value:


https://www.perspectiva-insideout.com/home/the-unrecognised-genius-of-jean-piaget

March 17, 2018 | Unregistered CommenterJoshua

Geeze!!! Have some PROBATIVE value....

March 17, 2018 | Unregistered CommenterJoshua

Joshua,

I'm afraid I skimmed over a lot of the middle part of that essay, where he was talking about "life fields" and eastern philosophy and our panorganic conversation with the energy field, so I may be missing something - but what was his actual point? Could you give me the TL;DR summary of what he's trying to say?

The only part relevant to the climate communication issue I noticed was this bit:

"When we learn in climate communication studies that people on the right of the political spectrum are more likely to deny or express scepticism towards anthropogenic climate change, in Piagetian terms that’s often because they can’t readily assimilate it within their existing schemas, and are not sufficiently motivated to accommodate the information by creating new ones."

We can of course reword this to an equivalent statement:

"When we learn in climate communication studies that people on the left of the political spectrum are more likely to deny or express scepticism towards anthropogenic climate change scepticism, in Piagetian terms that’s often because they can’t readily assimilate it within their existing schemas, and are not sufficiently motivated to accommodate the information by creating new ones."

But that seems a minor insight to have required such a long and convoluted essay to explain. People tend to disagree with claims that conflict with their existing belief system, and can't be bothered to change their belief system to accommodate it.

Besides noting that the inability to see that the two paragraphs above are logically equivalent is an example of a failure to achieve Piaget's final stage of child development (abstraction, self-reflection, thinking about thinking, etc.), I don't see the connection. Or what he's proposing to do about it.

There's also this:
"Those subject to a particular view or worldview cannot always grasp why other views or worldviews need to be accommodated (!)."

But that seems to be saying the same thing. Do ACC believers not grasp why climate sceptic views need to be accommodated?! Is he even thinking of it that way round? And if he is, what is he proposing to do about it?

Would you care to enlighten me?

March 18, 2018 | Unregistered CommenterNiV

NiV -

Truth told, I excitedly posted the link before I had read much of the article. I have to say upon reading the whole thing, I did find it rather rambling and somewhat incoherent. I hope that the promised subsequent piece will be more focused.

I don't disagree with your flip of the script (I mentally conducted a similar exercise myself when I read the piece).

For me, the bottom line is the crossover between Piaget's insight into how people learn and the climate wars - one part of which is the basic construct of accommodation and assimilation - which I think have very broad implications... and so I disagree with your assessment of the "only" part that has to do with climate communication - although that may well be the only segment where that phrase is used.

I certainly don't consider that to be a "minor" insight.

One of the criticisms that I have with most of what I read related to climate communication or "the science of science communication," based in my background as an educator, is that it isn't well-grounded in the fundamentals of epistemology, and developmental and educational psychology.

I came to the Rowson piece by way of this discussion he had with Peterson:

https://youtu.be/lSmVdGmrQ6U

The subject of climate change comes up directly a couple of times in that discussion.

At one point Rowson asks a question related to addressing the problem of climate change, and Peterson’s first response is to ask whether Rowson drove to the interview. Rowson says that he took public transportation, at which Peterson is silenced just a bit (very unusual, indeed, for him to be silenced, as indeed it is unusual that he takes time away from self-congratulatory just-so storifying to actually ask a question of someone else), giggles, and looks to the heavens with great glee as he gathers his response… but Rowson takes the opportunity of Peterson's retooling to move the discussion in a more productive direction.

Looking beyond the aborted Tu Quoque/ad hom aspects of Peterson’s gambit, it’s instructive that initially, Peterson has to reformulate a new line of attack. Despite the irrelevancy (IMO) of the question Peterson asked, Rowson being able to say that he took public transportation is effective within the small frame rhetorical arena. Irrespective of whether being able to justify his own behavior reflects a positive dynamic that’s generalizable to the larger framework of climate change dialog, it does seem that being able to do so creates a rhetorical open space for a moment in time. I would question the generalizability, of course. But there’s little doubt that Jordan’s gambit is a “common sense” one, that is and will be repeated endlessly – whether just as a product of “common sense” logic or as a product of rhetorical gamesmanship and identity aggressive behaviors. Thus, it may not matter that Peterson's gambit reflects fallacious thinking – because fallacious thinking can nonetheless have a rhetorical practicality.

It is also interesting that later in the discussion, Jordan plays the teh modulz gambit. (To his credit, once again, Rowson does not get distracted). It seems that Jordan is pretty well centered in the “skeptic” camp – which, of course, relies on a criticism of teh modulz, even though he plays the “I’m not an expert” gambit along with the “no one understands climate change” gambit.

After watching the video, I wondered if Peterson might, afterwards, re-watch it himself and reflect on the feedback he was given from someone who, for all appearances (at least as I could see), was acting in good faith as an interlocutor. If Peterson does value humility (as he preaches for others), and indeed is motivated by good faith, I would think that he might benefit from doing so.

At any rate, Peterson also has praise for Piaget in that interview. An interesting connection. And so, for me, the part of the Rowson piece that most connected was the following:

This is deep. Kegan takes Piaget’s intellectual background as a biologist and naturalist seriously. He is interpreting Piaget as saying something like: It’s not that the world is comprised of things and contexts and they change each other; it’s that the fundamental thing is not a thing at all — it’s a process defined by the relationship between thing and context, and that relationship is primary and the relational process is always in motion. Such a viewpoint is of course axiomatic to much of eastern philosophy and systems theory.

There is a lot of crossover there with the parts of Peterson's schtick that I agree with... and, IMO, with important implications for "climate communication." It relates, directly, to some of the stuff I've rambled on here about - related to the tension and balance between archetypal opposites.

March 18, 2018 | Unregistered CommenterJoshua

@NiV--

thanks for the reflective comments.

You say--

the problem is that we calculate LRs with models of the world that are inevitably derived from prior experience.

True, but I think you are using "derived from prior experience" in a manner that elides the distinction between past experience reflected in Bayesian prior and past experience reflected in the derivation of the likelihood ratio.

Consider:

1. A sound-reasoning law enforcement officer views your erratic driving and concludes, on the basis of experience watching drunk drivers drive, that you might be intoxicated. When she pulls you over, she administers a "breathalyzer," which by experience she views as reliable, that indicates your blood alcohol level is above the legal limit. She updates her belief that you are drunk by a factor that represents the accuracy of the test result in detecting intoxication (the true-positive rate divided by the false-positive rate).

No problem w/ that: her likelihood ratio for the test and her prior that you were intoxicated are both "derived from past experience" -- but the "past experience" of how drunk drivers drive is not the same "past experience" that informs evidentiary weight (likelihood ratio) of the breathalyzer.

2. A police officer's "drug sniffing" dog alerts when it gets a sniff of your suitcase. The officer updates her belief that your bag contains drugs by a factor that represents the likelihood ratio for the dog alerting. She then removes a sample of powder from the bags of white powder in your luggage & administers a field test that has a known accuracy rate (i.e., a likelihood ratio) & based on the result (positive or negative) appropriately updates her belief that you are carrying drugs.

Again, no problem. The validity & reliability of the drug-sniffing dog, on one hand, and of the substance analyzer, on other, are based on "past experiences" -- but not the same ones.
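
To put hypothetical numbers on example 2 (a minimal sketch, assuming the dog and the field test err independently of one another given whether drugs are really present):

```python
# A minimal sketch of example 2 (hypothetical numbers), assuming the dog and the
# field test err independently of one another given whether drugs are present.
def update_odds(prior_odds, likelihood_ratio):
    """Bayes in odds form: posterior odds = prior odds x likelihood ratio."""
    return prior_odds * likelihood_ratio

prior_prob = 0.05                  # officer's prior that the bag contains drugs
odds = prior_prob / (1 - prior_prob)

lr_dog = 0.90 / 0.10               # dog's true-alert rate / false-alert rate
lr_test = 0.98 / 0.02              # field test's true-positive rate / false-positive rate

odds = update_odds(odds, lr_dog)   # after the dog alerts
odds = update_odds(odds, lr_test)  # after the field test comes back positive

posterior_prob = odds / (1 + odds)
print(f"posterior probability of drugs ~ {posterior_prob:.3f}")   # ~0.959
```

The two likelihood ratios multiply onto the prior odds one after the other precisely because the experiences that validate each piece of evidence are distinct from one another and from the experience behind the prior.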

March 18, 2018 | Registered CommenterDan Kahan

Jonathan -

Perhaps some hard evidence (indirectly) in support of a genetically-based asymmetry?

https://www.theguardian.com/news/2018/mar/17/data-war-whistleblower-christopher-wylie-faceook-nix-bannon-trump

March 18, 2018 | Unregistered CommenterJoshua

Joshua,

Thanks for that. I think I understand. "it’s that the fundamental thing is not a thing at all — it’s a process defined by the relationship between thing and context, and that relationship is primary and the relational process is always in motion."
It sounds like he's using some sort of object-relationship duality to seek an alternative viewpoint on the familiar. I'm not sure I have the patience to pick through the rest of the essay, but I agree there's a deeper insight there than I had previously seen.

--

Dan,

Thanks for the response, but I don't understand the distinction you're making.

"but the "past experience" of how drunk drivers drive is not the same "past experience" that informs evidentiary weight (likelihood ratio) of the breathalyzer."

So far as I can see, both are likelihood ratios derived from past experience in the same way.

When the car is first seen, you have a prior expectation based on how many drunk drivers are around generally. The observation of erratic driving adds evidence based on the probability of erratic driving when drunk, versus the probability of erratic driving under any other circumstance. Maybe they're having a brain hemorrhage. Maybe they're having an emotional breakdown, following some family trauma. Maybe there's an angry wasp buzzing around the car, and they have a phobia/allergy. There are all sorts of alternatives. But they all have probabilities which we estimate based on our prior experience.

If one police officer has past experience of numerous erratic driving cases that turned out to be due to some alternative cause, they may weight the evidence differently to someone who only knows of (or is only looking for) the one possible cause. A police officer with experience of breathalyzers giving false readings due to faulty manufacturing, damage due to misuse by other officers, miscalibration, mouth alcohol, or similar effects, may give a different assessment to one who believes the manufacturer's advertising and who has no experience to the contrary. Experience counts.

Our model of the world is based on our prior experience (or that of the people who tell us things). Our model of the potential causes of erratic driving depend on our experiences of the world - of how people usually behave on the roads, and the sort of medical and psychological issues that can arise in people. Our model of potential causes of high breathalyzer readings depends on our experiences using them, or the knowledge we've picked up from other people using them, of how they work, and of how they're used in practice. And people with different experiences of the world, or different social networks, can legitimately make different assessments of those probabilities.

Likelihoods are basically just conditional probabilities, which are themselves just ratios of probabilities. (e.g. P(erratic|sober) = P(erratic and sober)/P(sober).) And all our pre-existing estimates of probability are priors based on learnt experience. We're born with none of them.

So I'm not sure what distinction you're making. Maybe you mean intuitive and subconscious estimates versus quantified experiment? Personal experience versus communicated by others? Controlled experimentation versus real-life randomness? Or do you really just mean that they're applying the same LR method but based on different sets of experiences?

March 18, 2018 | Unregistered CommenterNiV

An interesting alternative framework for considering the phenomenon of "motivated reasoning": "cognitive empathy."

https://theintercept.com/2018/03/17/new-york-times-iran-israel-washington-think-tanks/

March 18, 2018 | Unregistered CommenterJoshua

@NiV-- then maybe I misunderstood the language I quoted from your comment. I understood you to be saying that there is no escaping the endogeneity of prior & likelihood ratio given that "we calculate LRs with models of the world that are inevitably derived from prior experience." As long as the "prior experience" we use to calculate LR is not the same experience that informs our prior, we can use Bayesian updating w/o the endogeneity. That was the point of my examples. But if you didn't mean what I thought, then the examples may well be unresponsive to your point.

March 19, 2018 | Registered CommenterDan Kahan

Joshua,

Longitudinal study alert:

https://www.psychologicalscience.org/news/releases/adults-political-leanings-linked-with-early-personality-traits.html

non-paywall version:

https://pure.royalholloway.ac.uk/portal/files/28838166/Accepted_version.pdf

March 19, 2018 | Unregistered CommenterJonathan

"As long as the "prior experience" we use to calculate LR is not the same experience that informs our prior, we can use Bayesian updating w/o the endogeneity."

Ah! I see what you mean now. Agreed. Priors must be by definition derived prior to the current observation. There can be no inferential circularity!

But I'd add an important caveat to that - it's possible to take a single data set and apply a chain of several Bayesian inferences derived from it, each posterior being the prior to the next step in the chain, each asking different questions of the same data. Thus I could take a long record of coin tosses and first derive evidence from it that the coin is fair, then use that assumption to derive likelihood ratios to assess the evidence for more complicated claims based on the same dataset. The claims being assessed cannot be circular, but there's nothing to say the data/experience used for each step has to be separate. (Although it obviously helps to avoid the possibility of more subtle dependence problems if it is.)

Bayesian Belief Networks are required to be directed acyclic graphs, and each inference node must be conditionally independent of its non-descendants given its parents (i.e. locally Markov). The acyclic property prevents circularity. Conditional independence ensures information is not double-counted on parallel paths.
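
A minimal sketch of the conditional-independence point (hypothetical numbers): in a tiny network A -> B, A -> C, the two descendants are informative about each other only through A, and add nothing to each other once A is known.

```python
# A minimal sketch (hypothetical numbers) of the "locally Markov" property in a tiny
# network A -> B, A -> C: B and C are marginally dependent (both reflect A), but
# conditionally independent given their shared parent, so evidence is not double-counted.
import itertools

p_a = {True: 0.3, False: 0.7}
p_b_given_a = {True: 0.8, False: 0.1}     # P(B=True | A)
p_c_given_a = {True: 0.6, False: 0.2}     # P(C=True | A)

def joint(a, b, c):                       # factorization along the DAG
    pb = p_b_given_a[a] if b else 1 - p_b_given_a[a]
    pc = p_c_given_a[a] if c else 1 - p_c_given_a[a]
    return p_a[a] * pb * pc

def prob(**fixed):                        # marginalize the joint over unfixed variables
    total = 0.0
    for a, b, c in itertools.product([True, False], repeat=3):
        vals = {"a": a, "b": b, "c": c}
        if all(vals[k] == v for k, v in fixed.items()):
            total += joint(a, b, c)
    return total

print("P(C)     =", round(prob(c=True), 3))                          # ~0.32
print("P(C|B)   =", round(prob(b=True, c=True) / prob(b=True), 3))   # ~0.51: B is evidence about A
print("P(C|A)   =", round(prob(a=True, c=True) / prob(a=True), 3))   # 0.6
print("P(C|A,B) =", round(prob(a=True, b=True, c=True) /
                          prob(a=True, b=True), 3))                  # still 0.6: B adds nothing given A
```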

March 19, 2018 | Unregistered CommenterNiV
