Deja voodoo: the puzzling reemergence of invalid neuroscience methods in the study of "Democrat" & "Republican Brains"
I promised to answer someone who asked me what I think of Schreiber, D., Fonzo, G., Simmons, A.N., Dawes, C.T., Flagan, T., Fowler, J.H. & Paulus, M.P. Red Brain, Blue Brain: Evaluative Processes Differ in Democrats and Republicans, PLoS ONE 8, e52970 (2013).
The paper reports the results of an fMRI (“functional magnetic resonance imaging”) study that the authors describe as showing that “liberals and conservatives use different regions of the brain when they think about risk.”
They claim this finding is interesting, first, because it “supports recent evidence that conservatives show greater sensitivity to threatening stimuli,” and, second, because it furnishes a predictive model of partisan self-identification that “significantly out-performs the longstanding parental model” (i.e., use of the partisan identification of individuals’ parents).
So what do I think? Not much, frankly.
Actually, I think less than that: the paper supplies zero reason to adjust any view I have—or anyone else does, in my opinion—on any matter relating to individual differences in cognition & ideology.
To explain why, some background is necessary.
About 4 years ago the burgeoning field of neuroimaging experienced a major crisis. Put bluntly, scores of researchers employing fMRI for psychological research were using patently invalid methods. The defects had nothing to do with the technology of fMRI; they were really simple, basic errors relating to causal inference.
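These errors were exposed in two papers:
1. Vul, E., Harris, C., Winkielman, P. & Pashler, H. Puzzlingly High Correlations in fMRI Studies of Emotion, Personality, and Social Cognition, Perspectives on Psychological Science 4, 274-290 (2009); and
2. Kriegeskorte, N., Simmons, W.K., Bellgowan, P.S.F. & Baker, C.I. Circular analysis in systems neuroscience: the dangers of double dipping. Nature Neuroscience 12, 535-540 (2009).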
The invalidity of the studies that used the offending procedures (ones identified by these authors through painstaking detective work, actually; the errors were hidden by the uninformative and opaque language then typically used to describe fMRI research methods) is at this point beyond any dispute.
Not all fMRI studies produced up to that time displayed these errors. For great ones, see any done (before and after the crisis) by Joshua Greene and his collaborators.
Today, moreover, authors of “neuroimaging” papers typically take pains to explain, very clearly, how the procedures they’ve used avoid the problems that were exposed by the Vul et al. and Kriegeskorte et al. critiques.
And again, to be super clear about this: these problems are not intrinsic to the use of fMRI as a technique for testing hypotheses about mechanisms of cognition. They are a consequence of basic mistakes about when valid inferences can be drawn from empirical observation.
So it’s really downright weird to see these flaws in a manifestly uncorrected form in Schreiber et al.
I’ll go through the problems that Vul et al. & Kriegeskorte et al. (Vul & Kriegeskorte team up here) describe, each of which is present in Schreiber et al.
1. Opportunistic observation. In an fMRI, brain activation (in the form of blood flow) is measured within brain regions identified by little three-dimensional cubes known as “voxels.” There are literally hundreds of thousands of voxels in a fully imaged brain.
That means there are literally hundreds of thousands of potential “observations” in the brain of each study subject. Because activation levels vary constantly throughout the brain, one can always find “statistically significant” correlations between stimuli and brain activation by chance.
This was amusingly illustrated by one researcher who, using then-existing fMRI methodological protocols, found the region that a salmon cleverly uses for interpreting human emotions. The salmon was dead. And the region it was using wasn’t even in its brain.
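To see how easily this happens, here is a minimal simulation sketch. Everything in it is an assumption made for illustration: the voxel count, the number of scans, the function names, and the use of pure Gaussian noise in place of real imaging data. It is not the procedure used in any of the papers discussed here. It simply correlates a random "stimulus" with thousands of signal-free "voxels" and counts how many clear the conventional p < .05 cutoff.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def count_chance_hits(n_voxels=20_000, n_scans=30, r_cutoff=0.361, seed=0):
    """Count noise-only voxels that 'significantly' track a random stimulus.

    r_cutoff=0.361 is the two-tailed p < .05 critical value of r for
    30 observations (28 degrees of freedom). All sizes are hypothetical.
    """
    rng = random.Random(seed)
    stimulus = [rng.gauss(0, 1) for _ in range(n_scans)]
    hits = 0
    for _ in range(n_voxels):
        voxel = [rng.gauss(0, 1) for _ in range(n_scans)]  # no real signal
        if abs(pearson_r(stimulus, voxel)) > r_cutoff:
            hits += 1
    return hits


if __name__ == "__main__":
    print(count_chance_hits(), "of 20000 noise voxels pass p < .05")
```

Roughly 5% of the voxels should pass, even though by construction none of them has anything to do with the stimulus.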
Accordingly, if one is going to use an fMRI to test hypotheses about the “region” of the brain involved in some cognitive function, one has to specify in advance the “region of interest” (ROI) in the brain that is relevant to the study hypotheses. What’s more, one has to carefully constrain one’s collection of observations even from within that region (brain regions like the “amygdala” and “anterior cingulate cortex” themselves contain lots of voxels that will vary in activation level) and refrain from “fishing around” within ROIs for “significant effects.”
Schreiber et al. didn’t discipline their evidence-gathering in this way.
They did initially offer hypotheses based on four precisely defined brain ROIs in "the right amygdala, left insula, right entorhinal cortex, and anterior cingulate."
They picked these, they said, based on a 2011 paper (Kanai, R., Feilden, T., Firth, C. & Rees, G. Political Orientations Are Correlated with Brain Structure in Young Adults. Current Biology 21, 677-680 (2011)) that reported structural differences, basically ones in size and shape as opposed to activation, in these regions of the brains of Republicans and Democrats.
Schreiber et al. predicted that when Democrats and Republicans were exposed to risky stimuli, these regions of the brain would display varying functional levels of activation consistent with the inference that Republicans respond with greater emotional resistance, Democrats with greater reflection. Such differences, moreover, could also then be used, Schreiber et al. wrote, to "dependably differentiate liberals and conservatives" with fMRI scans.
But contrary to their hypotheses, Schreiber et al. didn’t find any significant differences in the activation levels within the portions of either the amygdala or the anterior cingulate cortex singled out in the 2011 Kanai et al. paper. Nor did Schreiber et al. find any such differences in a host of other precisely defined areas (the "entorhinal cortex," "left insula," or "Right Entorhinal") that Kanai et al. identified as differing structurally among Democrats and Republicans in ways that could suggest the hypothesized differences in cognition.
In response, Schreiber et al. simply widened the lens, as it were, of their observational camera to take in a wider expanse of the brain. “The analysis of the specific spheres [from Kanai et al.] did not appear statistically significant,” they explain, “so larger ROIs based on the anatomy were used next.”
Using this technique (which involves creating an “anatomical mask” of larger regions of the brain) to compensate for not finding significant results within more constrained ROI regions specified in advance amounts to a straightforward “fishing” expedition for “activated” voxels.
This is clearly, indisputably, undeniably not valid. Commenting on the inappropriateness of this technique, one commentator recently wrote that “this sounds like a remedial lesson in basic statistics but unfortunately it seems to be regularly forgotten by researchers in the field.”
Even after resorting to this device, Schreiber et al. found “no significant differences . . . in the anterior cingulate cortex,” but they did manage to find some "significant" differences among Democrats' and Republicans' brain activation levels in portions of the “right amygdala” and "insula."
2. “Double dipping.” Compounding the error of opportunistic observation, fMRI researchers (prior to 2009 at least) routinely engaged in a practice known as “double dipping.” After searching for & zeroing in on a set of “activated” voxels, the researchers would then use those voxels and only those to perform statistical tests reported in their analyses.
This is obviously, manifestly unsound. It is akin to running an experiment, identifying the subjects who respond most intensely to the manipulation, and then reporting the effect of the manipulation only for them—ignoring subjects who didn’t respond or didn’t respond intensely.
Obviously, this approach grossly overstates the observed effect.
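The experiment analogy is easy to sketch in a few lines. This is an illustration under stated assumptions only: the subject count, the choice of a true effect of exactly zero, and the function name are all hypothetical. The sketch compares the average response across everyone with the average among just the strongest responders, using the same data for both selection and estimation.

```python
import random


def double_dip_demo(n_subjects=200, true_effect=0.0, keep=20, seed=1):
    """Compare an honest effect estimate with a 'double-dipped' one.

    Each simulated subject's response is true_effect plus unit Gaussian
    noise. All parameter values are hypothetical.
    """
    rng = random.Random(seed)
    responses = [true_effect + rng.gauss(0, 1) for _ in range(n_subjects)]

    # Honest estimate: average over every subject in the study.
    honest = sum(responses) / n_subjects

    # Double dip: pick the strongest responders, then report the
    # effect computed from those same responses.
    strongest = sorted(responses, reverse=True)[:keep]
    dipped = sum(strongest) / keep
    return honest, dipped


if __name__ == "__main__":
    honest, dipped = double_dip_demo()
    print(f"all subjects: {honest:.2f}   selected subjects: {dipped:.2f}")
```

With no effect at all built into the data, the estimate over all subjects should sit near zero, while the estimate over the selected subjects comes out large simply because of how they were chosen.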
Despite this being understood since at least 2009 as unacceptable (actually, I have no idea why something this patently invalid appeared okay to fMRI researchers before then), Schreiber et al. did it. The “[o]nly activations within the areas of interest”—i.e., the expanded brain regions selected precisely because they contained voxel activations differing among Democrats and Republicans—that were “extracted and used for further analysis,” Schreiber et al. write, were the ones that “also satisfied the volume and voxel connection criteria” used to confirm the significance of those differences.
Vul called this technique “voodoo correlations” in a working paper version of his paper that got (deservedly) huge play in the press. He changed the title, but none of the analysis or conclusions, in the final published version, which, as I said, is now understood to be 100% correct.
3. Retrodictive “predictive” models. Another abuse of statistics—one that clearly results in invalid inferences—is to deliberately fit a regression model to voxels selected for observation because they display the hypothesized relationship to some stimulus and then describe the model as a “predictive” one without in fact validating the model by using it to predict results on a different set of observations.
Vul et al. furnish a really great hypothetical illustration of this point, in which a stock market analyst correlates changes in the daily reported morning temperature of a specified weather station with daily changes in value for all the stocks listed on the NYSE, identifies the set of stocks whose daily price changes are highly correlated with the station's daily temperature changes, and then sells this “predictive model” to investors.
This is, of course, bogus: there will be some set of stocks from the vast number listed on the exchange that highly (and "significantly," of course) correlate with temperature changes through sheer chance. There’s no reason to expect the correlations to hold going forward—unless (at a minimum!) the analyst, after deriving the correlations in this completely ad hoc way, validates the model by showing that it continued to successfully predict stock performance thereafter.
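The stock-picking hypothetical can be simulated directly. What follows is a rough sketch, not anything taken from Vul et al.: the number of stocks, the number of days, the function names, and the use of independent Gaussian noise for both temperature and price changes are all assumptions made for illustration. It picks the stocks most correlated with temperature in a "fitting" period and then checks the same stocks in a fresh period.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def temperature_stock_demo(n_stocks=2000, n_days=60, keep=20, seed=2):
    """Select 'temperature-sensitive' stocks, then test them on new days."""
    rng = random.Random(seed)
    temp_fit = [rng.gauss(0, 1) for _ in range(n_days)]  # fitting period
    temp_new = [rng.gauss(0, 1) for _ in range(n_days)]  # later period

    results = []
    for _ in range(n_stocks):
        price_fit = [rng.gauss(0, 1) for _ in range(n_days)]  # pure noise
        price_new = [rng.gauss(0, 1) for _ in range(n_days)]
        r_fit = pearson_r(temp_fit, price_fit)
        r_new = pearson_r(temp_new, price_new)
        results.append((abs(r_fit), r_fit, r_new))

    # Keep the stocks that looked best in the fitting period.
    best = sorted(results, reverse=True)[:keep]
    in_sample = sum(a for a, _, _ in best) / keep
    # Out of sample, credit a stock only if it moves in the direction
    # that the fitted "model" said it would.
    out_of_sample = sum(math.copysign(1, r) * r2 for _, r, r2 in best) / keep
    return in_sample, out_of_sample


if __name__ == "__main__":
    fit, new = temperature_stock_demo()
    print(f"mean r in fitting period: {fit:.2f}   in new period: {new:.2f}")
```

The selected stocks should show a healthy-looking average correlation in the period used to pick them and essentially none in the later period, which is the whole reason validation on different observations is required.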
Before 2009, many fMRI researchers engaged in analyses equivalent to what Vul describes. That is, they searched around within unconstrained regions of the brain for correlations with their outcome measures, formed tight “fitting” regressions to the observations, and then sold the results as proof of the mind-blowingly high “predictive” power of their models—without ever testing the models to see if they could in fact predict anything.
Schreiber et al. did this, too. As explained, they selected observations of activating “voxels” in the amygdala of Republican subjects precisely because those voxels—as opposed to others that Schreiber et al. then ignored in “further analysis”—were “activating” in the manner that they were searching for in a large expanse of the brain. They then reported the resulting high correlation between these observed voxel activations and Republican party self-identification as a test for “predicting” subjects’ party affiliations—one that “significantly out-performs the longstanding parental model, correctly predicting 82.9% of the observed choices of party.”
This is bogus. Unless one “use[s] an independent dataset” to validate the predictive power of “the selected . . . voxels” detected in this way, Kriegeskorte et al. explain in their Nature Neuroscience paper, no valid inferences can be drawn. None.
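The same logic applies to classification accuracy. Below is a hedged sketch of what an independent-dataset check looks like. It is not a reconstruction of Schreiber et al.'s model: the sample sizes, voxel count, simple sign-based classifier, and every function name are assumptions chosen for illustration, and the "activations" are pure noise unrelated to the party labels. Voxels are selected on one group of subjects, and the resulting "predictor" is scored both on those subjects and on subjects it has never seen.

```python
import random


def make_subjects(n_subjects, n_voxels, rng):
    """Noise 'activations' plus party labels (+1/-1) unrelated to them."""
    data = [[rng.gauss(0, 1) for _ in range(n_voxels)]
            for _ in range(n_subjects)]
    labels = [1 if i % 2 == 0 else -1 for i in range(n_subjects)]
    return data, labels


def select_voxels(data, labels, keep):
    """Pick the voxels whose group means differ most in THIS sample."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    diffs = []
    for v in range(len(data[0])):
        pos = sum(row[v] for row, y in zip(data, labels) if y == 1) / n_pos
        neg = sum(row[v] for row, y in zip(data, labels) if y == -1) / n_neg
        diffs.append((abs(pos - neg), v, 1 if pos > neg else -1))
    diffs.sort(reverse=True)
    return [(v, sign) for _, v, sign in diffs[:keep]]


def accuracy(data, labels, selected):
    """Share of subjects whose label matches the sign of a summed score."""
    correct = 0
    for row, y in zip(data, labels):
        score = sum(sign * row[v] for v, sign in selected)
        correct += (1 if score > 0 else -1) == y
    return correct / len(labels)


def validation_demo(n_fit=40, n_new=300, n_voxels=2000, keep=20, seed=3):
    rng = random.Random(seed)
    fit_data, fit_labels = make_subjects(n_fit, n_voxels, rng)
    new_data, new_labels = make_subjects(n_new, n_voxels, rng)
    selected = select_voxels(fit_data, fit_labels, keep)
    return (accuracy(fit_data, fit_labels, selected),
            accuracy(new_data, new_labels, selected))


if __name__ == "__main__":
    same, fresh = validation_demo()
    print(f"same subjects: {same:.0%}   independent subjects: {fresh:.0%}")
```

On the subjects used to choose the voxels, accuracy should look impressive; on the independent subjects it should hover around a coin flip, because there was never anything to predict.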
BTW, this isn’t a simple “multiple comparisons problem,” as some fMRI researchers seem to think. Pushing a button in one’s computer program to tighten one’s “alpha” (the p-value threshold, essentially, used to avoid “type 1” errors) only means one has to search a bit harder; it still doesn’t make it any more valid to base inferences on “significant correlations” found only after deliberately searching for them within a collection of hundreds of thousands of observations.
The 2011 Kanai et al. structural imaging paper that Schreiber et al. claim to be furnishing “support” for didn’t make this elementary error. I’d say “to their credit,” except that such a comment would imply that researchers who use valid methods deserve “special” recognition. Of course, using valid methods isn’t something that makes a paper worthy of some special commendation—it’s normal, and indeed essential.
* * *
I did happen to notice that the Schreiber et al. paper seems pretty similar to a 2009 working paper they put out. The only difference appears to be an increase in the sample size from 54 to 82 subjects.
There are also some differences in the reported findings: in their 2009 working paper, Schreiber et al. report greater “bilateral amygdala” activation in Republicans, not “right amygdala” only. The 2011 Kanai paper that Schreiber et al. describe their study as “supporting,” which of course was published after Schreiber et al. collected the data reported in their 2009 working paper, found no significant anatomical differences in the “left amygdala” of Democrats and Republicans.
So, like I said, I really don’t think much of the paper.
What do others think?