This post collects a bunch of graphic presentations of data from the latest CCP paper “Ideology” or “Situation Sense”? An Experimental Investigation of Motivated Reasoning and Professional Judgment.
Graphic presentation of data is the common carrier of reason in empirical studies. It’s what makes it possible for any curious, reflective person to critically engage the study findings independently of their facility with statistics.
In my opinion, scholars who rely on statistical analyses that are not accessible to all curious, reflective people are engaged in a species of intimidation, not communication. There’s a very high likelihood, too, that they themselves don’t really get what they are doing.
1. Showing the data. Pretty much always the first step in competent data reporting is to show the reader the “raw data.”
If someone has done a valid experiment to test some hypothesis, then he or she should be able to show readers—just by holding the data out in front of their eyes—that the experiment either supports or undermines that hypothesis.
The point of applying a statistical model is to discipline and extend the inference one is drawing from results one can actually see; it isn’t to magically cause to appear out of a mass of tangled observations a result that can’t otherwise be seen.
So how to “show the raw data”? This is not as straightforward as it sounds!
If one does an experiment, e.g., in which one posits that there will be an interaction between predictors (say, “cognitive reflection” and “political outlooks,” or “religiosity” & “science comprehension”) that varies in relation to experimental treatments, one has to figure out a way to display the observations that makes that pattern (or the lack thereof) visible to the naked eye. That can be darn tricky! Maybe some sort of appropriately color-coded scatter plot will work, or, if there are too many observations or too many points of contrast to make that feasible, lowess regression lines will help.
“Showing the data” wasn’t very hard, though, in “Ideology” or “Situation Sense.”
The rival hypotheses in that paper had to do with the relative responsiveness of different subject types—judges, lawyers, law students, and members of the public—to experimental manipulations designed to trigger identity-protective reasoning in two statutory interpretation problems. As discussed previously, the manipulations altered the identities in a manner expected to generate this form of bias among egalitarian communitarians and hierarchical individualists in one case, and among egalitarian individualists and hierarchical communitarians in the other.
So the simple thing to do was just to show for each subject type (judge, lawyer, student, and member of the public) the impact of the experimental assignment on the proportion of subjects with the relevant worldview (determined by their score on the two worldview scales) who construed the statute to have been violated:
These results make it apparent that the experimental assignment affected the interpretations of members of the public of opposing worldviews, particularly in the “Littering” problem (which involved whether leaving reusable water containers in a desert constituted “discarding . . . debris” in a wildlife preserve).
Judges and lawyers, in contrast, were not affected to a meaningful degree (or in the patterns suggestive of identity-protective cognition) in either the “Littering” problem or the “Disclosure” problem (the latter of which involved a statutory ambiguity relating to release of law enforcement investigatory information to a member of the public).
Law students were somewhere in between.
These results were consistent with the hypothesis that the sort of “professional judgment” lawyers and judges acquire through training and experience (and which law students possess in an incipient form) protects them from the impact of identity-protective cognition in cases that predictably polarize culturally diverse members of the public.
2. Simulating the statistical model. The apparent corroboration of that hypothesis was probed more systematically with a multivariate regression model designed to assess the respective impacts of subject-type, cultural worldview, and experimental assignment on the subjects’ responses.
One way in which the model enhances our insight relative to inspection of the raw data is by measuring the impact of cultural worldviews as continuous variables. So in addition to helping us overcome the anxiety that what looks like signal is just noise, the model measures the impact of the cultural worldviews in a manner more sensitive to the varying intensity of individuals’ commitments than does simply assigning individuals to “groups” based on their scores in relation to the means on the two scales.
Someone who understands multivariate regression analysis can, with patience and persistence, make sense of the coefficient and standard error for each predictor.
But even that person will not be able to assess from the face of the regression output what all this information signifies in relation to the study hypotheses.
Accordingly, a researcher who proclaims that his or her hypothesis is “confirmed” (or worse, “proven” etc.) by the signs and “statistical significance” of the regression model coefficients (even in a model much simpler than this one) is engaged in an embarrassing display of handwaving. Someone who does that after reporting a pile of fit statistics of the sort associated with an ANOVA, ones that don’t convey anything about effect size, is not even getting that close to relating something of value.
The necessary information has to be extracted from the model by using it to generate outcomes that reflect those combinations of predictor values relevant to the study hypotheses.
One way to do this is by monte carlo simulation. In effect, a monte carlo simulation uses the specified predictor values to estimate the outcome a zillion times (1000 times actually is sufficient), adding to each estimate a random increment calibrated to the measurement error of the relevant predictors.
In the end, one gets a bell-curved distribution of values that indicates the relative probability of outcomes associated with the specified combination of predictors. The most likely outcome is the mean one, at the peak of the curve; values progressively larger or smaller are progressively less likely. One can, if one wants, figure out the 0.95 CI by identifying the values at the 2.5th and 97.5th percentiles.
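For readers who like to see the mechanics, here is a minimal sketch of that procedure in Python. It is an illustration only, not the paper's model: the coefficient estimates and standard errors are invented, the predictor names are hypothetical, and the coefficients are drawn independently of one another, whereas a fuller implementation in the spirit of King, Tomz & Wittenberg (2000) would draw them jointly using the model's estimated covariance matrix.

```python
import math
import random

# Hypothetical logit coefficients as (estimate, standard error); NOT the paper's values.
COEFS = {"intercept": (-0.4, 0.15), "treatment": (1.2, 0.25)}

def simulate_probabilities(predictors, n_draws=1000, seed=1):
    """Simulate predicted probabilities for one combination of predictor values."""
    rng = random.Random(seed)
    sims = []
    for _ in range(n_draws):
        # Draw each coefficient from a normal centered on its estimate
        # (independence across coefficients is a simplifying assumption).
        xb = sum(rng.gauss(b, se) * predictors[name] for name, (b, se) in COEFS.items())
        sims.append(1.0 / (1.0 + math.exp(-xb)))  # inverse logit
    return sorted(sims)

def percentile(sorted_vals, p):
    """Pick the sorted value whose rank is closest to the requested percentile."""
    idx = min(len(sorted_vals) - 1, max(0, round(p / 100 * (len(sorted_vals) - 1))))
    return sorted_vals[idx]

sims = simulate_probabilities({"intercept": 1, "treatment": 1})
mean = sum(sims) / len(sims)
low, high = percentile(sims, 2.5), percentile(sims, 97.5)
print(f"mean={mean:.3f}  0.95 CI=({low:.3f}, {high:.3f})")
```

Plotting a histogram or density of `sims` for each combination of predictors of interest is what produces the sort of side-by-side curves that a reader can inspect directly.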
But the best thing about using monte carlo simulations (particularly for logistic regression, which estimates the probability of one outcome of a dichotomous variable) is that the resulting probability distributions can be graphically displayed in a manner that enables any reflective, curious person to see exactly what the model has to say about the inference one is using it to assess (King, Tomz & Wittenberg 2000).
Here, e.g., it can be seen, from how spaced out the probability distributions are, how unlikely an egalitarian communitarian member of the public is to agree with a hierarchical individualist one in a particular version of the “Littering” problem, or with a member of the public who shares his or her values but who was assigned to the other version.
Likewise, it can be seen from how bunched together the probability distributions are just how unlikely judges of opposing worldviews are to disagree. Same for lawyers.
Again, students are in the middle.
One can also use the model to estimate the size of the differences in the impact of the experimental manipulation on various types of subjects, or the average impact on one or another subject across the two problems.
This is not only 10^9 times more informative for any curious, reasoning being (one who actually would like to think for him or herself rather than be told what to think by someone who probably doesn’t really know what he or she is doing) than being shown a regression output with a bunch of asterisks; it’s 10^6 times more informative than being told “the effect is x%, p < 0.05,” and 10^3 times more than being told “p%, ± q% at 0.95” (Gelman, Pasarica & Dodhia 2002).
3. Likelihood ratios. But in my view, the very best thing we did in “Ideology” or “Situation Sense” was graphically display the likelihood ratios for opposing hypotheses relating to the effect of identity-protective cognition on particular subject types.
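The comparison of effect sizes across subject types mentioned above can be sketched the same way. The snippet is a rough illustration under loud assumptions: the three coefficients are made up, judges and members of the public are given the same baseline for simplicity, and the coefficients are again drawn independently rather than from a full covariance matrix. The point is just that the difference is computed within each simulated draw, so the uncertainty carries through to the estimate of the difference.

```python
import math
import random

def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical (estimate, standard error) pairs; NOT taken from the paper.
B0 = (-0.2, 0.10)               # intercept
B_TREAT = (1.0, 0.20)           # effect of experimental assignment (public)
B_JUDGE_X_TREAT = (-0.9, 0.25)  # how much smaller that effect is for judges

def simulated_effects(n_draws=1000, seed=2):
    """Return per-draw treatment effects (in probability) for public and judges."""
    rng = random.Random(seed)
    public, judges = [], []
    for _ in range(n_draws):
        b0, bt, bjt = (rng.gauss(m, se) for m, se in (B0, B_TREAT, B_JUDGE_X_TREAT))
        public.append(inv_logit(b0 + bt) - inv_logit(b0))
        judges.append(inv_logit(b0 + bt + bjt) - inv_logit(b0))
    return public, judges

public, judges = simulated_effects()
# Difference in treatment effect, computed draw by draw, then sorted.
gap = sorted(p - j for p, j in zip(public, judges))
lo_i, hi_i = int(0.025 * len(gap)), int(0.975 * len(gap)) - 1
print(f"mean effect, public: {sum(public) / len(public):+.3f}")
print(f"mean effect, judges: {sum(judges) / len(judges):+.3f}")
print(f"difference, approx. 0.95 interval: {gap[lo_i]:+.3f} to {gap[hi_i]:+.3f}")
```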
I’ve already posted an excerpt from the paper that addresses what we were doing here.
But in sum, a likelihood ratio specifies how much more consistent a piece of evidence is with one hypothesis than another and is the factor in proportion to which one revises one’s assessment of the probability of that hypothesis under Bayes’s Theorem.
As such, it characterizes the weight of a piece of evidence, something that a p-value, contrary to an obscenely prevalent misconception, does not do, no matter how friggin’ small it is (Good 1985).
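The updating rule just described is simple enough to write down in a couple of lines. Strictly speaking it is one's odds, not one's probability, that get multiplied by the likelihood ratio. The numbers in this sketch are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def update_odds(prior_odds, likelihood_ratio):
    """Posterior odds = prior odds x likelihood ratio (odds form of Bayes's Theorem)."""
    return prior_odds * likelihood_ratio

def odds_to_probability(odds):
    """Convert odds in favor of a hypothesis into a probability."""
    return odds / (1.0 + odds)

prior_odds = 1.0         # hypothetical: the two hypotheses start out equally likely
likelihood_ratio = 5.0   # hypothetical: evidence 5x more consistent with H1 than H2
posterior_odds = update_odds(prior_odds, likelihood_ratio)
print(posterior_odds, round(odds_to_probability(posterior_odds), 3))
```

Note that the same likelihood ratio moves someone who starts at different prior odds to a different posterior; the evidence has the same weight either way.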
Where one is doing an experiment or otherwise making an empirical estimate subject to measurement error, the likelihood ratio just is the relative probabilities of observing the experimental result under the relevant hypotheses (Goodman 1999a, 1999b, 2005).
One can visualize that by juxtaposing the probability distributions associated with the relevant hypotheses and comparing how likely the observed experimental result is under the respective distributions.
If we assume the distributions have the same standard error (which determines the slope, basically, of the bell curve) as the experimental result, the ratio of the heights of the observed result on the two distributions is the likelihood ratio associated with the experiment for the rival hypotheses (Morey 2014).
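Here is a minimal sketch of that “ratio of heights” calculation, assuming both hypotheses are represented by normal curves with the same standard error. The hypothesized effects, the observed effect, and the standard error are all invented for illustration; none of them comes from the paper.

```python
import math

def normal_pdf(x, mean, se):
    """Height of a normal curve with the given mean and standard error at x."""
    z = (x - mean) / se
    return math.exp(-0.5 * z * z) / (se * math.sqrt(2.0 * math.pi))

def likelihood_ratio(observed, mean_h1, mean_h2, se):
    """Ratio of the heights of the observed result on the H1 and H2 curves."""
    return normal_pdf(observed, mean_h1, se) / normal_pdf(observed, mean_h2, se)

# Hypothetical numbers: H1 predicts a 0-point effect, H2 a 20-point effect,
# and the experiment observed a 4-point effect with a standard error of 5.
lr = likelihood_ratio(observed=4.0, mean_h1=0.0, mean_h2=20.0, se=5.0)
print(f"The result is about {lr:.0f}x more consistent with H1 than with H2")
```

A result exactly halfway between the two hypothesized values would sit at the same height on both curves, giving a likelihood ratio of 1: evidence with no weight at all for choosing between them.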
In my view, researchers ought to convey the likelihood ratio or its conceptual equivalent. By doing that, they make it plain for the reader exactly what an empirical finding (if based on valid methods) truly is: not conclusive “proof” of any particular proposition, but evidence of some degree of probative force to be added, along with all the other evidence one has and ever will get one's hands on, to the scale one is using to weigh the relative strength of competing hypotheses.
The menagerie of fit statistics (p-values, chi-squares, omnibus F-statistics, etc.) associated with conventional null hypothesis testing obscures that; indeed, it necessarily fails to convey the information one would need to treat empirical data that way.
But even if a researcher is considerate and reflective enough to use a form of statistical analysis that yields the weight of the evidence, there is still the task of making that information comprehensible to the curious, reflective reader who is not trained in statistics.
Graphic display is the way to do that.
So, do you get it?
If not, and you are carrying through on your end of the bargain to apply your reason here (if, understandably, the discussion of the monte carlo simulations is too compact for you to fully grasp here, then go to the relevant discussion of them in the paper; for more on the logic of likelihood ratios and the graphic presentation of them, go back to the previous post; read all this closely, & think things through; you can’t learn anything if you don’t make the effort to teach yourself), then the inaccessibility of my statistics is my problem, not yours.
Tell me and I’ll try even harder.
Gelman, A., Pasarica, C. & Dodhia, R. Let's Practice What We Preach: Turning Tables into Graphs. Am Stat 56, 121-130 (2002).
Good, I.J. Weight of evidence: A brief survey. in Bayesian statistics 2: Proceedings of the Second Valencia International Meeting (ed. J.M. Bernardo, M.H. DeGroot, D.V. Lindley & A.F.M. Smith) 249-270 (Elsevier, North-Holland, 1985).
Goodman, S.N. Introduction to Bayesian methods I: measuring the strength of evidence. Clin Trials 2, 282-290 (2005).
Goodman, S.N. Toward Evidence-Based Medical Statistics. 1: The P Value Fallacy. Ann Int Med 130, 995-1004 (1999a).
Goodman, S.N. Toward Evidence-Based Medical Statistics. 2: The Bayes Factor. Ann Int Med 130, 1005-1013 (1999b).
Kahan, D., Hoffman, D., Evans, D., Lucci, E., Devins, N., Cheng, K. “Ideology” or “Situation Sense”? An Experimental Investigation of Motivated Reasoning and Professional Judgment. Univ. Pa. L. Rev. (in press).
King, G., Tomz, M. & Wittenberg, J. Making the Most of Statistical Analyses: Improving Interpretation and Presentation. Am. J. Pol. Sci. 44, 347-361 (2000).