This is the second post in a two-part series on what I see as the invalidity of studies that use samples of Mechanical Turk workers to test hypotheses about cognition and political conflict over societal risks and other policy-relevant facts.

In the first, I discussed the concept of a “valid sample” generally. Basically, I argued that it’s a mistake to equate sample “validity” with any uniform standard or any single, invariant set of recruitment or stratification procedures.

Rather, the validity of the sample depends on one thing only: whether it supports valid and reliable inferences about the nature of the psychological processes under investigation.

College student samples are fine, e.g., if the dynamic being studied is reasonably understood to be uniform for all people.

A nonstratified general population sample will be perfectly okay for studying processes that vary among people of different characteristics so long as (1) there are enough individuals from subpopulations whose members differ in the relevant respect and (2) the recruitment procedure didn’t involve methods that might have either discouraged participation by typical members of those groups or unduly encouraged participation by atypical ones.
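Condition (1) can be checked directly before any hypothesis testing. Here is a minimal sketch, using entirely hypothetical data and an illustrative cell-size threshold (a real threshold would come from a power analysis for the planned test), of tallying how many respondents fall into each subgroup of interest:

```python
# Minimal sketch with hypothetical data: before testing hypotheses about
# processes that vary across subgroups, verify each relevant subgroup
# appears in the sample in sufficient numbers.
from collections import Counter

# Hypothetical respondent records: (ideology, gender) per participant.
sample = (
    [("conservative", "male")] * 40
    + [("conservative", "female")] * 55
    + [("liberal", "male")] * 150
    + [("liberal", "female")] * 155
)

# Illustrative threshold only; derive the real one from a power analysis.
MIN_PER_CELL = 100

counts = Counter(sample)
for cell, n in sorted(counts.items()):
    status = "ok" if n >= MIN_PER_CELL else "TOO SMALL"
    print(f"{cell}: n={n} ({status})")
```

In this made-up sample the conservative cells fall short, so hypotheses specific to those subgroups could not be tested reliably, whatever the overall sample size.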

Indeed, a sample constructed by methods of recruitment and stratification designed to assure “national representativeness” might not be valid (or at least not support valid inferences) if the dynamic being studied varies across subgroups whose members aren’t represented in sufficient number to enable testing of hypotheses relating specifically to them.

Now I will explain why, on the basis of this pragmatic understanding of what sample validity consists in, MT samples aren’t valid for the study of culturally or ideologically grounded forms of “motivated reasoning” and like dynamics that it is reasonable to believe account for polarization over climate change, gun control, nuclear power, and other facts that admit of empirical study.

I don’t want to keep anybody in suspense (or make it necessary for busy people to deal with more background than they think they need or might already know), so I’ll just start by listing what I see as the three decisive “sample validity” problems here. I’ll then supply a bit more background—including a discussion of what Mechanical Turk is all about, and a review of how this service has been used by social scientists—before returning to the three validity issues, which I’ll then spell out in greater detail.

Ready? Here are the three problems:

1.  Selection bias. Given the types of tasks performed by MT workers, there’s good reason to suspect that subjects recruited via MT differ in material ways from the people in the world whose dispositions we are interested in measuring, particularly conservative males.

2.  Prior, repeated exposure to study measures. Many MT workers have participated multiple times in studies that use performance-based measures of cognition and have discussed the answers among themselves. Their scores on those measures are thus not valid.

3.  MT subjects misrepresent their nationality. Some fraction of the MT work force participating in studies that are limited to “U.S. residents only” aren’t in fact U.S. residents, thereby defeating inferences about how psychological dynamics distinctive of U.S. citizens of diverse ideologies operate.

That’s the short answer. Now some more detail.

A. What is MT? To start, let’s briefly review what Mechanical Turk is—and thus who the subjects in studies that use MT samples are.

Operated by Amazon.com, MT is essentially an on-line labor market. Employers, known as “requesters,” post solicitations for paid work, which “workers” can accept and complete using their own computers.

Pay is very modest: it is estimated that MT workers make about $1.50/hr.

The tasks they perform are varied: transcription, data entry, research, etc.

But MT is also a well-known instrument for engaging in on-line fraud.

MT workers get paid for writing fake product or service reviews—sometimes positive, sometimes negative, as the requester directs.

They can also garner a tiny wage simply for “clicking” on specified links in order to generate bogus web traffic at the behest of “requesters” who have themselves contracted to direct visitors to legitimate websites—websites that are, in this case, the victims of the scam.