Thursday, Oct 4, 2012

Graphing interactions so that curious people can actually *understand* them

A friend & collaborator asked me,

So...could you send me a quick tip/reference on how to best graph interactions in regression? I'm just thinking of simple line-charts, comparing divergent slopes for two or three different groups after controlling for the other vars in the equation. I'm *sure* this is easily done, but I'm blanking on how. I mean, it's easy enough to draw the slope based on the unstandardized coefficient. And the Y-intercept to start that line from is...what? the B of the constant?

My response:

I'm sure you are asking b/c you are unsatisfied, understandably in my view, w/ the graphing recommendations that appear in references like Aiken, L.S., West, S.G. & Reno, R.R. Multiple Regression: Testing and Interpreting Interactions. (Sage Publications, Newbury Park, Calif.; 1991) & Jaccard, J. & Turrisi, R. Interaction Effects in Multiple Regression, 2nd edn. (Sage Publications, Thousand Oaks, Calif.; 2003) -- even though those are definitely the best references for understanding the statistical logic of interactions & making intelligent modeling choices.

There are excellent papers that reflect general dissatisfaction w/ how social scientists tend to graphically report (or not) the results of multivariate regression models. They include:
  • Gelman, A., Pasarica, C. & Dodhia, R. Let's Practice What We Preach: Turning Tables into Graphs. Am Stat 56, 121-130 (2002).
  • King, G., Tomz, M. & Wittenberg, J. Making the Most of Statistical Analyses: Improving Interpretation and Presentation. Am. J. Pol. Sci. 44, 347-361 (2000).
  • Kastellec, J.P. & Leoni, E.L. Using Graphs Instead of Tables in Political Science. Perspectives on Politics 5, 755-771 (2007).
They don't deal w/ interactions per se, but b/c they address the objective of how to make regression model results intelligible in general, you can easily derive from them ideas about strategies that work w/ models that include cross-product interaction terms.

I'll show you some examples below but here are some general tips I'd give: 

a. *don't* graph data after splitting sample (e.g., into "high," "medium" & "low" in political sophistication)... Graph the results of the model that includes all the relevant predictors & cross-product interaction terms as applied to the entire sample; those are the results you are trying to display & splitting sample will change/bias the parameter estimates.

b. consider z-score normalization for the outcome variable: you won't have to worry about the intercept (it will be zero so long as the predictors are centered too), & you'll avoid lots of meaningless "white space" when the values of interest, which fall within +/-1 or +/-2 SDs (the end points for the y-axis), are concentrated within a middling portion of the outcome measure. Also, for most readers, reporting the impact in terms of SDs of the outcome variable will be more intelligible than differences in raw units of some arbitrary scale (the sort you'd get by summing Likert items to form a composite Likert scale, e.g.).

c. rather than graphing *slopes*, consider plotting regression estimates based on sensibly contrasting values for the predictors (and corresponding values for the cross-product interaction term); the "practical effect" of the interaction is likely to be easier to grasp that way than comparison of visual differences in slopes

d. if you are using OLS to model responses to a Likert item, consider using ordered logit instead -- maybe you should be doing this anyway, but in any case, probabilities of responding at a particular level (or maybe range of levels; say "agree either slightly, moderately or strongly" vs. "disagree slightly, moderately or strongly") conditional on levels of predictor & moderator are graphically more intelligible than estimated values on an arbitrary continuous scale.

e. consider graphing estimated *differences* (& corresponding CIs) in the outcome variable at different levels of the moderator; e.g., if the difference between subjects who are from different groups (or who vary +/- 1 SD on some continuous predictor) increases conditional on the value of some continuous moderator, then use a bar graph w/ CIs or some such to show how much greater the estimated difference between the two groups is at one of two contrasting levels of the moderator than at the other

f. consider Monte Carlo simulation of the estimated impact of contrasting sets of predictors & moderators (& associated interactions); do kernel-density plots for 1,000 or 2,000 simulated values of each -- it's a *really* good way to show both the contrast in the estimates & the precision of the estimates (much better than standard CIs). See King et al. above.

g. usually prefer connected lines to bar graphs to display contrasts; former are more comprehensible

h. in general, don't use standardized regression coefficients but do center continuous predictors (or convert them to z-scores) so that people who are reading the table can more readily interpret them
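
A minimal Python sketch of how tips a, b, c, e, f & h might fit together, using only numpy. Everything in it is an assumption made for illustration: the data are simulated, the variable names (`group`, `soph`) are hypothetical, & the simulation step is a simplified version of the approach King et al. describe (it draws coefficient vectors from a multivariate normal centered on the estimates & ignores uncertainty in the error variance). It is not code from any CCP study.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data, purely illustrative (hypothetical variables).
n = 2000
group = rng.integers(0, 2, n)      # 0/1 group indicator
soph = rng.normal(50, 10, n)       # continuous moderator on an arbitrary scale
y_raw = (3 + 0.5 * group + 0.02 * soph
         + 0.08 * group * (soph - 50) + rng.normal(0, 1, n))


def zscore(v):
    """Convert a variable to z-scores (mean 0, SD 1)."""
    return (v - v.mean()) / v.std(ddof=1)


# Tips b & h: z-score the outcome & the continuous predictor.
y = zscore(y_raw)
z = zscore(soph)

# Tip a: one model, whole sample, cross-product term included.
X = np.column_stack([np.ones(n), group, z, group * z])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# Estimated covariance matrix of the OLS coefficients.
resid = y - X @ beta
sigma2 = resid @ resid / (n - X.shape[1])
V = sigma2 * np.linalg.inv(X.T @ X)


def row(g, zval):
    """Design-matrix row for a given group & moderator value (in SDs)."""
    return np.array([1.0, g, zval, g * zval])


# Tip c: estimates at sensibly contrasting values (each group at -1 & +1 SD),
# with the matching value of the cross-product term built into each row.
scenarios = {(g, s): row(g, s) for g in (0, 1) for s in (-1.0, 1.0)}
point = {k: float(r @ beta) for k, r in scenarios.items()}

# Tip f (simplified): simulate 2,000 coefficient vectors & the estimate each
# implies for every scenario; these arrays are what a density plot would show.
draws = rng.multivariate_normal(beta, V, size=2000)
sims = {k: draws @ r for k, r in scenarios.items()}

# Tip e: group difference at each moderator level, w/ simulated 95% intervals.
diff = {s: sims[(1, s)] - sims[(0, s)] for s in (-1.0, 1.0)}
ci = {s: np.percentile(d, [2.5, 97.5]) for s, d in diff.items()}

for s in (-1.0, 1.0):
    print(f"moderator at {s:+.0f} SD: difference = {diff[s].mean():.2f} SD, "
          f"95% interval = [{ci[s][0]:.2f}, {ci[s][1]:.2f}]")
```

Plotting the four `point` values as two connected lines (one per group, across the two moderator levels) would follow tip g, & the arrays in `sims` & `diff` are what you would hand to a kernel-density routine for tip f. In this setup the constant is just the estimated outcome (in SDs) for group 0 at the mean of the moderator, which is one way to answer the "where does the line start" question.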

Have attached [reproduced below] a bunch of CCP study examples that reflect one or another of these strategies or related ones. BTW, of course, all of these reflect things that I learned to do from collaborating w/ Don [Braman], who like all great teachers teaches people how to teach themselves.

