After a productive holiday weekend, I've whittled my "to be done ... IMMEDIATELY" list down to 4.3x10^6 items.
One of them (it's smack in the middle of the list) is to construct a "CCP data playground."
The idea would be to have a section of the site where people could readily access CCP data files & share their own analyses of them.
I've had this notion in mind for a while but one of the things that increased my motivation to actually get it done was the cool stuff that @thompn4 (aka "Nicholas Thompson"; aka "Nucky Thompson"; aka "Nicky Scarface"; aka "'Let 'em eat yellowcake' Nicky" etc.) has been doing with graphics that try to squeeze three dimensions of individual difference -- either political outlooks vs. risk perception vs. science comprehension; or risk perception 1 vs. risk perception 2 vs. science comprehension -- into one figure.
I typically just rely on two figures to do this-- one (usually a scatterplot) that relates risk perceptions to political outlooks & another that relates risk perception to science comprehension separately for subjects to the "right" and "left" of the mean on a political outlook scale:
@thompn4 said: why not one figure w/ 3 dimensions?
That inspired me to produce this universally panned prototype of a 3d-scatter plot:
Since then he has come up with some more cool graphics:
This one effectively maps mean perceived level of risk across the two-dimensional space created by political outlooks and science comprehension. It's a 2d graph, obviously, but it conveys the third dimension very vividly by color coding the risk perceptions, and in a very intuitive way (from blue for "low/none" to red for "high").
It's pretty mesmerizing!
But does it convey information in an accessible and accurate way?
I think it comes pretty close. My main objection to it is that by saturating the entire surface of the 2-dimensional plane, the graphic creates the impression that one can draw inferences with equal confidence across the entire space.
In fact, science comprehension is normally distributed, and political outlooks, while not perfectly normal, are definitely not uniformly distributed across the right-left spectrum. As a result, the corners -- and certain other patches -- are thinly populated with actual observations. One could easily be lulled into drawing inferences from noise in places where the graph's colors reflect the responses of only a handful of respondents.
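To get a rough feel for how thin the corners get, here is a minimal simulation sketch. It is not CCP data: the sample size of 2,000 and the assumption that both scales are independent standard normals are mine, made up purely for illustration (as noted above, political outlooks aren't perfectly normal), and the names `simulate` and `count_regions` are hypothetical.

```python
import random

def simulate(n, seed=0):
    """Draw n hypothetical respondents as (political_outlook, science_comprehension).

    ASSUMPTION: both scores are independent standard normals. Real survey
    data will not look exactly like this.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]

def count_regions(points):
    """Count respondents in the middle of the plane and in its four corners."""
    # "Central": within 1 SD of the mean on both dimensions.
    central = sum(1 for x, y in points if abs(x) < 1 and abs(y) < 1)
    # "Corners": more than 2 SD from the mean on both dimensions.
    corners = sum(1 for x, y in points if abs(x) > 2 and abs(y) > 2)
    return central, corners

if __name__ == "__main__":
    points = simulate(2000)  # hypothetical sample size
    central, corners = count_regions(points)
    print(f"within 1 SD on both scales: {central}")
    print(f"beyond 2 SD on both scales: {corners}")
```

Under these assumptions, a bit under half of the sample should land in the central square, while the four corners combined should hold only a handful of people, which is the reason a fully saturated color surface can mislead there.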
To illustrate this, I constructed scatterplot equivalents of these two @thompn4 graphics. Here's the one for nuclear:
Actually, I'm not sure why @thompn4's lower right corner is so darkly blue, or why the coordinates at/around -1.0, -2.0 are so red. But I am sure that the eye-grabbing feature of those parts of his figure will understandably provoke reflection on the part of viewers about what's going on that could "explain" those regions. The answer has to be "nothing": the observations there -- basically people who are either extreme right or moderate left but utterly devoid of science comprehension -- are too few in number to support any reliable inferences.
Here's global warming:
I don't see as much "risk" (as it were) of mistaken inferences here. Plus I really do think the bipolar red & blue, which get more pronounced as one moves up the science literacy axis, are extremely effective in conveying both that climate change risk perceptions are polarized and that they become dramatically more so as individuals become more science comprehending. (Kind of unfortunate that "red = high"/"blue = low" risk perception coding conflicts with the conventional "blue = Democrat" & "red = Republican" scheme; but the latter is lame -- we all know the Democrats are Reds!)
That's what the "2 graphic strategy" above shows, of course, but in 2 graphs; be great if this could be done with just one.
But I still think that it is essential for a graphic like this to convey the relative density of observations across the dimensions that are being compared.
The point of this exercise, in my view, is to see if there is a way to make it possible for a reflective, curious person to see meaningful contrasts of interest in the "raw data" (that is, in the actual observations, arrayed in relation to values of interest, as opposed to statistically derived summaries or estimates of the relationships in the data; those should be part of the analysis too, to discipline & refine inference, but being able to see the data should come first, so that consumers know that "findings" aren't being fabricated by statistical artifice!).
A picture of the raw data should make the density of the observations at the coordinates of the 3 dimensions visible -- and certainly has to avoid inviting foreseeable, mistaken inferences that neglect to take the non-uniform distribution of people across those dimensions into account.
I made a suggestion -- to try substituting a "transparency" rendering of the scatter plot for the fully saturated rendering of the information in @thompn4's... Maybe he or someone else will try this or some variant thereof.
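For anyone who wants to tinker, here is one way the "transparency" idea might be sketched. This is only a guess at an approach, not @thompn4's or @NiV's code: the function name `cell_summaries`, the 0.5-unit cell width, and the "fully opaque at 30 observations" cutoff are all hypothetical choices of mine. The idea is to compute the mean risk perception in each cell of the outlook-by-comprehension grid, and to attach an opacity that fades toward zero where few people actually sit.

```python
import math
from collections import defaultdict

def cell_summaries(observations, width=0.5, full_at=30):
    """Summarize (outlook, comprehension, risk) observations per grid cell.

    ASSUMPTIONS (illustrative only): cells are width x width squares, and a
    cell is drawn fully opaque once it holds `full_at` or more observations.
    """
    totals = defaultdict(lambda: [0.0, 0])  # cell -> [sum of risk, count]
    for outlook, comprehension, risk in observations:
        key = (math.floor(outlook / width), math.floor(comprehension / width))
        totals[key][0] += risk
        totals[key][1] += 1
    return {
        key: {
            "mean_risk": total / n,          # would drive the blue-to-red color
            "n": n,                          # number of observations in the cell
            "alpha": min(1.0, n / full_at),  # opacity: faint where data are sparse
        }
        for key, (total, n) in totals.items()
    }

if __name__ == "__main__":
    # Made-up observations: a crowded central cell and a lone corner respondent.
    made_up = [(0.1, 0.1, 4.0)] * 30 + [(2.3, -2.2, 7.0)]
    for key, summary in sorted(cell_summaries(made_up).items()):
        print(key, summary)
```

Each cell's `mean_risk` and `alpha` could then be handed to whatever plotting tool one likes. The lone corner respondent still shows up, but only faintly, so the eye isn't drawn to a patch of noise.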
Loyal listener @NiV makes some suggestions, too, in the comment thread for the last post, and very generously supplies the R code he constructed, so that others can try their hand at refining it.
The bigger point-- or the one I started with at the beginning of this post -- is that this sort of interactive engagement with CCP data is really really cool & something that I'd love to try to make a regular part of this site.
The ideas blog readers have about how to analyze and report CCP data benefit me, that's for sure. The risk perception vs. ideology color-coded scatterplot, which I use a lot & know people really find (validly) informative, is (I've acknowledged, but not as often as I should!) derived from a suggestion that "loyal listener" @FrankL actually proposed, and if Nucky's 3d (or 3 differences in 2 dimensions) graphic generates something that I think is even better, for sure I'll want to make use of it.
I think a "data playground" feature -- one the whole point of which is to let users do what @thompn4 has been up to-- would predictably increase that benefit, both for me & for others who can learn something from the data that I & my collaborators have a hand in collecting.
So I'm moving the creation of this sort of feature for the site up 7,000 places on my "to do ... IMMEDIATELY" list! Be sure to keep tuning in every day so you don't miss the exciting news when the "playground" goes "on line" (of course it will be nuclear powered, in honor of @thompn4!).