van der Linden, Leiserowitz, Feinberg & Maibach (2015) posted the data from their study purporting to show that subjects exposed to a scientific-consensus message “increased” their “key beliefs about climate change” and “in turn” their “support for public action” to mitigate it.
Christening this dynamic the "gateway belief" model, VLFM touted their results as “the strongest evidence to date” that “consensus messaging”— social-marketing campaigns that communicate scientific consensus on human-caused global warming—“is consequential.”
At the time they published the paper, I was critical because of the opacity of the paper’s discussion of its methods and the sparseness of the reporting of its results, which in any case seemed underwhelming—not nearly strong enough to support the strength of the inferences the authors were drawing.
But it turns out the paper has problems much more fundamental than that.
As I describe in my reanalysis, VLFM fail to report key study data necessary to evaluate their study hypotheses and announced conclusions.
Their experiment involved measuring the "before-and-after" responses of subjects who received a “consensus message”—one that advised them that “97% of climate scientists have concluded that human-caused climate change is happening”—and those who read only “distractor” news stories on things like a forthcoming animated Star Wars cartoon series.
In such a design, one compares the “before-after” response of the “treated” group to that of the “control” to determine whether the "treatment" (here, the consensus message) had an effect that differed significantly from the control placebo. Indeed, VLFM explicitly state that their analyses “compared” the responses of the consensus-message and control-group subjects.
But it turns out that the only comparison VLFM made was between the groups' respective estimates of the percentage of climate scientists who subscribe to the consensus position. Subjects who read a statement that "97% of climate scientists have concluded that human-caused climate change is happening" increased their estimates more than did subjects who viewed only a distractor news story.
But remarkably, VLFM nowhere report comparisons of the two groups' post-message responses to items measuring any of the beliefs and attitudes for which they conclude perceived scientific consensus is a critical "gateway."
When I analyzed the VLFM data, I realized that, with the exception of the difference in "estimated scientific consensus," all the "pre-" and "post-test" means in the table had combined the responses of consensus-message and control-group subjects.
There was no comparison of the pre- and post-message responses of the two groups of subjects; no analysis of whether their responses differed--the key information necessary to assess the impact of being exposed to a consensus message.
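The comparison that is missing here is simple to state in code. What follows is a minimal sketch, not VLFM's analysis and not the one reported in the reanalysis: the function names (change_scores, diff_in_change) and the toy numbers are hypothetical, chosen only to show what comparing before-and-after responses across groups amounts to. It computes each subject's after-minus-before change, the difference in mean change between consensus-message and control subjects, and a Welch t statistic for that difference.

```python
import math
import statistics


def change_scores(before, after):
    """Per-subject change on a measure: after minus before."""
    return [a - b for b, a in zip(before, after)]


def diff_in_change(treat_before, treat_after, ctrl_before, ctrl_after):
    """Return (difference in mean change, Welch t statistic).

    The difference is treated-group mean change minus control-group
    mean change, i.e. the quantity a treatment-vs-control comparison
    of before-and-after responses is supposed to report.
    """
    d_t = change_scores(treat_before, treat_after)
    d_c = change_scores(ctrl_before, ctrl_after)
    diff = statistics.mean(d_t) - statistics.mean(d_c)
    se = math.sqrt(statistics.variance(d_t) / len(d_t)
                   + statistics.variance(d_c) / len(d_c))
    return diff, diff / se


# Hypothetical responses on a 0-100 scale (NOT the VLFM data).
treat_before = [50, 60, 70, 80]
treat_after = [52, 63, 71, 86]
ctrl_before = [50, 60, 70, 80]
ctrl_after = [51, 62, 70, 81]

diff, t = diff_in_change(treat_before, treat_after, ctrl_before, ctrl_after)
print("difference in mean change:", diff)
print("Welch t:", round(t, 3))
```

A difference that is small relative to its standard error is what "no effect that differs from the control" looks like on a given measure. Pooling the two groups' pre- and post-test means, by contrast, cannot reveal it either way.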
Part of what made this even harder to discern is that VLFM presented a complicated “path diagram” that can be read to imply that exposure to a consensus message initiated a "cascade" (their words) of differences in before-and-after responses, ultimately leading to “increased support for public action”—their announced conclusion.
But this model also doesn't compare the responses of consensus-message and control-group subjects on any study measure except the one soliciting their estimates of the "percentage of scientists [who] have concluded that human-caused climate change is happening."
That variable is the only one connected by an arrow to the "treatment"--exposure to a consensus message.
As I explain in the paper, none of the other paths in the model distinguishes between the responses of subjects “treated” with a consensus message and those who got the "placebo" distractor news story. Accordingly, the "significant" coefficients in the path diagram reflect nothing more than correlations between variables one would expect to be highly correlated given the coherence of people’s beliefs and attitudes on climate change generally.
In the paper, I report the data necessary to genuinely compare the responses of the consensus-message and control-group subjects.
It turns out that subjects exposed to a consensus message didn’t change their “belief in climate change” or their “support for public action to mitigate it” to an extent that significantly differed, statistically or practically, from the extent to which control subjects changed theirs.
Indeed, the modal and median effects of being exposed to the consensus message on the 101-point scales used by VLFM to measure "belief in climate change" and "support for action" to mitigate it were both zero--i.e., no difference between "before" and "after" responses to these study measures.
No one could have discerned that from the paper either, because VLFM didn't furnish any information on what the raw data looked like. In fact, both the consensus-message and placebo news-story subjects' "before-message" responses were highly skewed in the direction of belief in climate change and support for action, suggesting something was seriously amiss with the sample, the measures, or both--all the more reason to give little weight to the study results.
But if we do take the results at face value, the VLFM data turn out to be highly inconsistent with their announced conclusion that "belief in the scientific consensus functions as an initial ‘gateway’ to changes in key beliefs about climate change, which in turn, influence support for public action.”
The authors “experimentally manipulated” the expressed estimates of the percentage of scientists who subscribe to the consensus position on climate change.
Yet the subjects whose perceptions of scientific consensus were increased in this way did not change their level of "belief" in climate change, or their support for public action to mitigate it, to an extent that differed significantly, in practical or statistical terms, from subjects who read a "placebo" story about a Star Wars cartoon series.
That information, critical to weighing the strength of the evidence in the data, was simply not reported.
VLFM have since conducted an N = 6000 "replication." As I point out in the paper, "increasing sample" to "generate more statistically significant results" is recognized to be a bad research practice born of a bad convention--namely, null-hypothesis testing; when researchers resort to massive samples to invest minute effect sizes with statistical significance, "P values are not and should not be used to define moderators and mediators of treatment" (Kraemer, Wilson, & Fairburn 2002, p. 881). Bayes Factors or comparable statistics that measure the inferential weight of the data in relation to competing study hypotheses should be used instead (Kim & Je 2015; Raftery 1995). Reviewers will hopefully appreciate that.
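To make the idea of weighing data against competing hypotheses concrete, here is a minimal sketch of one widely used shortcut: the BIC approximation to the Bayes factor associated with Raftery (1995), BF01 ≈ exp((BIC1 − BIC0) / 2). It is an illustration under stated assumptions, not the computation reported in the reanalysis. The assumptions: the outcome is each subject's before-to-after change, errors are Gaussian, H0 is "one common mean change" and H1 is "separate mean changes for treated and control subjects." The function names and toy numbers are hypothetical.

```python
import math


def bic_gaussian(rss, n, k):
    """BIC of a Gaussian mean model, up to a constant shared by all
    models fit to the same n observations (rss must be > 0)."""
    return n * math.log(rss / n) + k * math.log(n)


def bf01_from_change_scores(treated, control):
    """Approximate Bayes factor for H0 (no treatment effect on the
    change score) over H1 (some effect): exp((BIC1 - BIC0) / 2).

    Values above 1 favor H0; values below 1 favor H1.
    """
    pooled = list(treated) + list(control)
    n = len(pooled)

    # H0: a single mean for everyone (1 mean parameter).
    grand = sum(pooled) / n
    rss0 = sum((x - grand) ** 2 for x in pooled)

    # H1: one mean per experimental group (2 mean parameters).
    m_t = sum(treated) / len(treated)
    m_c = sum(control) / len(control)
    rss1 = (sum((x - m_t) ** 2 for x in treated)
            + sum((x - m_c) ** 2 for x in control))

    bic0 = bic_gaussian(rss0, n, 1)
    bic1 = bic_gaussian(rss1, n, 2)
    return math.exp((bic1 - bic0) / 2)


# Hypothetical change scores (NOT the VLFM data).
print(bf01_from_change_scores([0, 1, 2, 3], [0, 1, 2, 3]))
print(bf01_from_change_scores([10, 11, 12, 13], [0, 1, 2, 3]))
```

Because the penalty term grows with log(n), a minute difference that clears p < .05 in a very large sample can still yield a Bayes factor that favors the null, which is the point of preferring this kind of statistic to a bare P value.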
But needless to say, doing another study to try to address lack of statistical power doesn't justify claiming to have found significant results in data in which they don't exist. VLFM claim that their data show that being exposed to a consensus message generated “a significant increase” in “key beliefs about climate change” and in "support for public action" when “experimental consensus-message interventions were collapsed into a single ‘treatment’ category and subsequently compared to [a] ‘control’ group” (VLFM p. 4). The data--which anyone can now inspect--say otherwise.
Hopefully reviewers will pay more attention too to how a misspecified SEM model can conceal the absence of an experimental effect in a study design like the one reflected here (and in other "gateway belief" papers, it turns out...).
As any textbook will tell you, “it is the random assignment of the independent variable that validates the causal inferences such that X causes Y, not the simple drawing of an arrow going from X towards Y in the path diagram” (Wu & Zumbo 2007, p. 373). In order to infer that an experimental treatment affects an outcome variable, “there must be an overall treatment effect on the outcome variable”; likewise, in order to infer that an experimental treatment affects an outcome variable through its effect on a “mediator” variable, “there must be a treatment effect on the mediator” (Muller, Judd & Yzerbyt 2005, p. 853). Typically, such effects are modeled with predictors that reflect the “main effect of treatment, main effect of M [the mediator], [and] the interactive effect of M and treatment” on the outcome variable (Kraemer, Wilson, & Fairburn 2002, p. 878).
Because the VLFM structural equation model lacks such variables, there is nothing in it that measures the impact of being “treated” with a consensus message on any of the study’s key climate change belief and attitude measures. The model is thus misspecified, pure and simple.
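A toy simulation makes the problem easy to see. This is only a minimal sketch with made-up parameters, not VLFM's data and not the simulation described in this post; every name and number in it is hypothetical. It builds a "failed" experiment: the message raises subjects' consensus estimates but has no effect at all on belief in climate change, while both measures reflect a shared underlying outlook on climate change.

```python
import random
import statistics


def pearson(x, y):
    """Pearson correlation, written out to avoid version-specific helpers."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def simulate_failed_experiment(n=2000, shift=10.0, seed=1):
    """Simulate post-message responses in which treatment moves the
    consensus estimate by `shift` points but does not touch belief."""
    rng = random.Random(seed)
    treat, consensus, belief = [], [], []
    for _ in range(n):
        t = rng.randint(0, 1)          # random assignment: 1 = message
        outlook = rng.gauss(0, 1)      # shared climate-change outlook
        treat.append(t)
        consensus.append(70 + 10 * outlook + shift * t + rng.gauss(0, 5))
        belief.append(70 + 10 * outlook + rng.gauss(0, 5))  # no t term
    return treat, consensus, belief


def group_gap(treat, values):
    """Mean of `values` among treated subjects minus mean among controls."""
    treated = [v for t, v in zip(treat, values) if t == 1]
    control = [v for t, v in zip(treat, values) if t == 0]
    return statistics.fmean(treated) - statistics.fmean(control)


treat, consensus, belief = simulate_failed_experiment()
print("treatment effect on consensus estimate:",
      round(group_gap(treat, consensus), 2))
print("treatment effect on belief:",
      round(group_gap(treat, belief), 2))
print("correlation of consensus estimate with belief:",
      round(pearson(consensus, belief), 2))
```

In data like these, a path from treatment to the consensus estimate and a path from the consensus estimate to belief would both come out "significant," even though the treated and control groups do not differ on belief. Only a term comparing the two groups on the outcome itself can tell a failed experiment from a successful one.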
To illustrate this point and underscore the reporting defects in this aspect of VLFM, I'll post "tomorrow" the results of a fun statistical simulation that helps to show how the misspecified VLFM model--despite its fusillade of triple-asterisk-tipped arrows--is simply not capable of distinguishing the results of a failed experiment from those of one that actually does support something like the “gateway model” they proposed.
BTW, I initially brought all of these points to the attention of the PLOS ONE editorial office. On their advice, I posted a link to my analyses in the comment section, after first soliciting a response from VLFM.
A lot of people are critical of PLOS ONE.
I think they are being unduly critical, frankly.
The mission of the journal--to create an outlet for all valid work-- is a valuable and admirable one.
Does PLOS ONE publish bad studies? Sure. But all journals do! If they want to make a convincing case, the PLOS ONE critics should present some genuine evidence on the relative incidence of invalid studies in PLOS ONE and other journals. I at least have no idea what such evidence would show.
But in any case, everyone knows that bad studies get published all the time-- including in the "premier" journals.
What happens next-- after a study that isn't good is published --actually matters a lot more.
In this regard, PLOS ONE is doing more than most social science journals, premier ones included, to assure the quality of the stock of knowledge that researchers draw on.
The journal's "open data" policy and its online fora for scholarly criticism and discussion supply scholars with extremely valuable resources for figuring out that a bad study is bad and for helping other scholars see that too.
If what's "bad" about a study is that the inferences its data support are just much weaker than the author or authors claim, other scholars will know to give the article less weight.
If the study suffers from a serious flaw (like unreported material data or demonstrably incorrect forms of analysis), then the study is much more likely to get corrected or retracted than it would be if it managed to worm its way into a "premier" journal that lacked an open-data policy and a forum for online comments and criticism.
Peer review doesn't end when a paper is published. If anything, that's when it starts. PLOS ONE gets that.
I do have the impression that in the social sciences, at least, a lot of authors think they can dump low quality studies on PLOS ONE. But that's a reason to be mad at them, not the journal, which if treated appropriately by scholars can for sure help enlarge what we know about how the world works.
So don't complain about PLOS ONE. Use the procedures it has set up for post-publication peer review to make authors think twice before denigrating the journal's mission by polluting its pages with bullshit studies.
Kraemer, H.C., Wilson, G.T., Fairburn, C.G. & Agras, W.S. Mediators and moderators of treatment effects in randomized clinical trials. Archives of General Psychiatry 59, 877-883 (2002).
Muller, D., Judd, C.M. & Yzerbyt, V.Y. When moderation is mediated and mediation is moderated. Journal of Personality and Social Psychology 89, 852 (2005).
van der Linden SL, Leiserowitz A.A., Feinberg G.D., Maibach E.W. The Scientific Consensus on Climate Change as a Gateway Belief: Experimental Evidence. PLoS ONE (2015), 10(2): e0118489. doi:10.1371/journal.pone.0118489.
Wu, A.D. & Zumbo, B.D. Understanding and Using Mediators and Moderators. Social Indicators Research 87, 367-392 (2007).