Let’s keep discussing M Turk sample validity

Lots of great discussion on “Fooled twice, shame on who?,” part 2 of the 2-part set on validity of M Turk samples for study of individual differences in cognition rooted in ideological, cultural & like dispositions.  Indeed, some of the discussion appears over at Stats Legend Andrew Gelman’s Legendary Statistical Modeling, Causal Inference, and Social Science blog.

The comments make for more interesting reading than anything I would have to say today, and maybe others will want to add to them.

But here are some of the interesting points that have come up & that have furnished me w/ reason to reflect on whether & how what I had to say warrants refinement, qualification, revision etc:

1. Contextualization

I wanted to be clear that the sort of “sample validity” issue I was raising about M Turk was specific to study of a particular class of psychological dynamics—the ones that I myself am most interested in—involving the interaction of critical thinking dispositions and the sort of group commitments that typically are assessed with measures of ideology, cultural worldviews & the like. That was why I broke my discussion into two posts, the first of which stressed that “sample validity” shouldn’t be identified with some checklist of abstract properties like “representativeness” but instead addressed in a fine-grained manner aimed at determining whether subjects selected in a particular fashion support reliable and valid inferences about the psychological dynamics being investigated.

But I’m convinced I didn’t do a good enough job on this.

Part of what made me realize that was a comment by Hal Pashler in the discussion at Statistical Modeling, Causal Inference. Pashler argued convincingly that researchers had through careful testing confirmed the validity of M Turk samples for a range of fundamental cognitive dynamics (primarily ones involving rapid, automatic processing of visual stimuli).

I fully accept this and agree with the overall thrust of Pashler’s comment! But the need for him to make it (in part in response to the course of the discussion at the SMCI blog) was proof to me that I had failed—in part by having neglected to identify dynamics that differ in relevant respects from the one I was focusing on (again, the influence of group values in assessment of evidence on societal risks & related policy-relevant facts) & that as a result might well admit of valid study w/ M Turk samples.

So: avoid generalization; determine “sample validity” by focusing on the particular issues relevant to determining whether reliable, valid inferences can be drawn from any given sample about the psychological dynamic under investigation; and recognize, then, that M Turk samples might be “valid” for some purposes and not others.  Check!

2. Validation of “partisan typicality”

One of the main reasons I don’t regard M Turk samples as valid for studying individual differences in cognition related to ideology is that I think there is reason to believe the self-described “conservatives” who are participating in M Turk samples are not typical of self-described conservatives in the general population.

Solomon Messing convincingly pointed out that the way to address this is to look at studies that compare how MT subjects respond to questions with how subjects in familiar samples, such as those in American National Election Studies surveys, respond to them, and he cited studies that do exactly that (here & here).

He’s right; I’m eager to read those papers.

Jarret Crawford amplified this point, referring to studies he’s done (here & here; I have read those; they are excellent & reflect ingenious designs; I’ve been meaning to run a blog post on them!) that furnish evidence of the “symmetry” of motivated reasoning in conservatives & liberals, a convergence with non-MT sample studies that ought to give us more confidence in MT samples (provided, of course, the designs of the studies are valid).

I have a hunch that the Messing & Crawford responses demonstrate that even in assessing the validity of M Turk for studying public opinion & political partisanship, one needs to be very precise about the fit between MT samples and the kinds of hypotheses being tested.  But in any case, they show I need to think more.  Good.

3. “Fixing” M Turk

Messing also discusses the possibility that the defects in M Turk samples might be “fixed” with some appropriate types of protocols, a matter that Chandler, Mueller & Paolacci address in their new study.

This is indeed a point that merits further discussion. As I suggested in some of my own responses, I don’t think what CMP say needs to be done can feasibly be expected to happen.

In effect, to avoid the “repeat exposure” of MT subjects to cognitive-performance measures, there would have to be a “central registry” that would keep track of all the ID numbers of MT “workers” who have participated in social science studies and the measures that have been administered to them.

Who is going to set up this registry? Who will administer it? How will compliance of researchers with the registry be monitored and enforced?

Don’t look at Amazon! It’s not a survey firm & couldn’t care less about whether MT workers furnish a valid source of subjects for social science research or, if they do at t1, about making sure they continue to at t2, t3, . . . tn.

Even if we started the registry today, moreover, we still wouldn’t know whether the “newly registered” M Turk subjects had already participated in studies featuring CRT and other common measures.

And what do we do now, as we wait for such a registry to be created? Should researchers be continuing to use M Turk for studies featuring measures the validity of which is compromised by prior exposure? And should journals be continuing to accept such studies?

* * * *

So still plenty more to discuss! Add your own thoughts (in the discussion thread following the “Fooled Twice” post)!
