I had some correspondence off-line with loyal listener @Steve (aka @sjgenco) about the classic "what does a valid measure of climate-change risk-perceptions look like?" graph. Inspired by loyal listener @FrankL (now that they've finally discovered "missing Malaysia Airlines Flight MH370"--or at least a piece of it--maybe someone will find @FrankL, or at least a piece of him, too), the WDVMCCRLLG graphic has of course achieved iconic status and is pretty much ubiquitous in popular culture.
But it is pretty darn old. Isn't it time for something new? Can't we do better?
But everything, no matter how wonderful, admits of incremental improvement as human knowledge continues to expand as a result of science and improved sports drink formulas.
In response to @Steve's inquiry, I revealed the secret formula for generating the graphic. When Steve said he wasn't enamored of "jitters" as a way to handle overplotting & preferred "bubbles" scaled to reflect observation densities, I directed @Steve to a CCP dataset he could use (one posted with "codebook" the last time the CCP blog was the site for a furious display of graphic genius on the part of @thompn4) to perfect his own improvements.
Here's what he wrote back:
Hi Dan,
I've been playing around with jitters in R. I like your Gervais jitters. Keeping the clouds more separate helps. That's harder to do when your x-var is continuous, like your libcon variable in your "challenge" dataset.
Your dataset was like catnip so I've squandered a couple of days trying to brush up on my R to see if I could implement my bubble plot idea with your data. For what it's worth, I seem to have succeeded so I thought I'd forward my results. (I use RStudio, btw, I highly recommend it.)
First, I was able to replicate your colored jitter charts in R (seems to require less code than in Stata). Here's gwrisk by libcon (making the points 50% transparent also helps highlight the clustering imho):
When I figured out how to put bubbles representing the frequency of responses around each datapoint on the same plot, it looked like this:
It does show the densities nicely, I think. For comparison, here's the bubble plot for scicomp by gwrisk:
You can really see that scicomp clusters in the middle vs. libcon, and how those densities are going to generate a flat regression.
You can also combine the two plots, which is kind of interesting:
Note how the jittering on libcon stretches out the values along the x-axis. There actually aren't any "real" values above 2 or below -2.
I've attached a PPT with all my results, a commented R script for running the plots, and the Rdata image I created for inputting the data.
It was a good excuse for digging into R again.
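The stretching effect @Steve describes is easy to reproduce. Here is a minimal sketch in Python (the R script attached to the email is not reproduced here). It assumes uniform jitter and made-up scores bounded at -2 and 2; the `jitter` helper, the 0.3 width, and the data are all illustrative assumptions, not the CCP dataset or the settings behind the plots.

```python
import random


def jitter(values, width, seed=0):
    """Return a copy of values with uniform noise in [-width, width] added."""
    rng = random.Random(seed)  # fixed seed so the example is repeatable
    return [v + rng.uniform(-width, width) for v in values]


# Hypothetical scores whose true range is exactly -2 to 2 (not the CCP data).
scores = [-2.0, -1.0, 0.0, 1.0, 2.0] * 200

# Assumed jitter width of 0.3; the real plots may use a different amount.
jittered = jitter(scores, width=0.3)

print("true range:    ", min(scores), max(scores))
print("plotted range: ", round(min(jittered), 2), round(max(jittered), 2))
```

The plotted range spills past both ends of the true range by up to the jitter width, which is the apparent extension beyond -2 and 2 that the email points out.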
So what do people think? Time to retire WDVMCCRLLG? Time to adopt one of @Steve's alternatives as the new symbol of the Un-United States of Risk Perception?
Voice your opinion -- as with everything else relating to this blog, matters will be decided by a democratic vote of the site's 14 billion regular readers -- and by all means try your own hand at devising a graphic that conveys the information in WDVMCCRLLG in an even more compelling, cool way!
And if you want, you can go back to @thompn4's project to create the perfect 3D graphic presentation that incorporates in addition the impact of science comprehension in magnifying polarization over climate change risk.
I'd offer one of our standard CCP prizes, but obviously the fame of being the originator of the successor of WDVMCCRLLG is incentive enough!
@Steve has formulated some comments & additional cool graphics in response to the conversation. Here they are:
Using the square-root of the circle sizes, as @Paul Matthews suggests, does make the range of sizes less extreme. I think this is a good adjustment.
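To see why the square root tames the extremes, here is a rough sketch in Python (not the R code used for the graphs). It counts responses per (x, y) cell and compares raw counts with their square roots. The `bubble_sizes` helper and the toy responses are hypothetical, and whether the resulting value is drawn as a radius or an area is left to the plotting function.

```python
import math
from collections import Counter


def bubble_sizes(points, transform=None):
    """Count observations per (x, y) cell, optionally transforming each count."""
    counts = Counter(points)
    if transform is None:
        return dict(counts)
    return {cell: transform(n) for cell, n in counts.items()}


# Toy responses (not the CCP data): one crowded cell and one sparse cell.
points = [(0, 3)] * 100 + [(1, 3)] * 4

raw = bubble_sizes(points)
rooted = bubble_sizes(points, transform=math.sqrt)

print("raw ratio: ", raw[(0, 3)] / raw[(1, 3)])
print("sqrt ratio:", rooted[(0, 3)] / rooted[(1, 3)])
```

With counts of 100 and 4, the largest bubble is 25 times the smallest on the raw scale but only 5 times on the square-root scale.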
I agree that the rainbow color set is a bit "muppets", but I was trying to work in the established idiom. :)
There is a nice color palette package for R called RColorBrewer that provides a bunch of palettes to choose from, including sequential (light to dark), diverging (light in the middle, contrasting darks at the extremes), and qualitative (no sequencing implied, just sets of related colors). The graphs below use the qualitative palette "Set3".
On Paul's point about the arbitrary binning of the continuous libcon variable, that's definitely a trade-off. I think it is ameliorated if the raw data is displayed underneath the circles. I could also imagine the circles being extended into a confidence-range kind of overlay, creating a more continuous representation of the densities. As a general point, I think the arbitrary binning of the circles is less distorting of the underlying data than the jittering effect of extending the apparent range of the data points.
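The binning trade-off can be made concrete with a small sketch in Python (again, not the R code behind the graphs). It snaps a continuous x-value to the nearest bin center and counts responses per cell at two bin widths. The `bin_center` and `bin_counts` helpers, the widths, and the data are assumptions for illustration only.

```python
import math
from collections import Counter


def bin_center(x, width):
    """Snap x to the nearest multiple of width (halves round up)."""
    return math.floor(x / width + 0.5) * width


def bin_counts(xs, ys, width):
    """Count observations per (binned x, y) cell."""
    return dict(Counter((bin_center(x, width), y) for x, y in zip(xs, ys)))


# Made-up continuous x-values with a single y response (not the CCP data).
xs = [-0.6, -0.4, 0.1, 0.4, 0.6, 1.2]
ys = [5] * len(xs)

coarse = bin_counts(xs, ys, width=1.0)
fine = bin_counts(xs, ys, width=0.5)

print("width 1.0:", coarse)
print("width 0.5:", fine)
```

The same six observations give three bubbles at width 1.0 and four at width 0.5, with different relative sizes. This is the arbitrariness Paul raises, and one reason for showing the raw points underneath.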
The basic idea behind the bubble graph is to emphasize the pattern of densities across the tables. Visually, it does this very well for my eye (even better with the sqrt transform), better than the jitter. It also has the virtue of continuing to tell its story when the image gets very small, and that is useful for eye-ball comparisons, such as seeing immediately how and where gwrisk is highly skewed across the libcon divide, while nukerisk is not:
Also, other things "jump out" in this depiction of the data. For example, you can easily see that people are much more willing to give gwrisk a "zero" than they are nukerisk. And you can also spot things like that little cluster at gwrisk=3 among the lower science comprehension folks. Perhaps "3" is a good compromise when you don't really have a good basis for an opinion?
One final note. I tried the sequential palettes, thinking they would make a good fit with the ordinal nature of the risk variables, but I found that the light colors toward the bottom tended to obscure the densities down there, compared to darker colors toward the top. This is especially true when the image is small:
Although I do like the "armageddon-like" quality of the "Reds" palette!