Monday, Jul 8, 2013

What's a "valid" sample? Problems with Mechanical Turk study samples, part 1

It’s commonplace nowadays to see published psychology studies based on samples consisting of “workers” hired to participate in them via Amazon’s “Mechanical Turk,” a proprietary system that enables Amazon to collect a fee for brokering on-line employment relationships.

I’ve been trying to figure out for a while now what I think about this practice.

After considerable reading and thinking, I’ve concluded that “MT” samples are in fact a horribly defective basis for the study of the dynamics I myself am primarily interested in—namely, ones relating to how differences in group commitments interact with the cognitive processes that generate cultural or political polarization over societal risks and other facts that admit of scientific study.

I’m going to explain why, in two posts.  To lay the groundwork for my assessment of the flaws in MT samples, this post will set out a very basic account of how to think about the “validity” of psychology samples generally.

Sometimes people hold forth on this as if sample validity were some disembodied essence that could be identified and assessed independently of the purpose of conducting a study. They say things like, “That study isn’t any good—it’s based on college students!” or make complex mathematics-pervaded arguments about “probability-based stratification” of general population samples and so forth.

The reason to make empirical observations is to generate evidence that gives us more reason or less than we otherwise would have had to believe some proposition or set of propositions (the ones featured in the study hypotheses) about how the world works.

The validity of a study sample, then, depends entirely on whether it can support inferences of that sort. 

Imagine someone is studying some mental operation that he or she has reason to think is common to all people everywhere—say, “perceptual continuity,” which involves the sort of virtual, expectation-based processing of sensory stimuli that makes people shockingly oblivious to what seem like shockingly obvious but unexpected phenomena, like the sudden appearance of a gorilla among a group of basketball players or the sudden substitution of one person for another during a conversation between strangers.

Again, on the researcher's best understanding of the mechanisms involved, everyone everywhere is subject to this sort of effect, which reflects processes that are in effect “hard wired” and invariant.  If that’s so, then pretty much any group of people—so long as they haven’t suffered some sort of trauma that might change the operation of the relevant mental processes—will do.

So if a researcher wants to test whether a particular intervention—like telling people about this phenomenon—will help to counteract it, he or she can go ahead and test it on any group of normal people that researcher happens to have ready access to—like college undergraduates.

But now imagine that one is studying a phenomenon that one has good reason to believe will generate systematic differences among individuals identified with reference to certain specific characteristics. 

That’s true of “cultural cognition” and like forms of motivated reasoning that figure in the tendency of people to fit their assessments of information—from scientific “data” to expository arguments to the positions of putative experts to (again!) their own sense impressions—to positions on risk and like facts that dominate among members of their group.

Because the phenomenon involves individual differences, a sample that doesn’t contain the sorts of individuals who differ in the relevant respects won’t support reliable inferences.

E.g., there’s a decent amount of evidence that white males with hierarchic and individualistic values (or with “conservative” political orientations; cultural values and measures of political ideology or party affiliation are merely alternative indicators of the same latent disposition, although I happen to think cultural worldviews tend to work better) are motivated to be highly skeptical of environmental and technological risks. Such risk claims, this work suggests, are psychically threatening to such individuals, because their status and authority in society tends to be bound up with commercial and industrial activities that are being identified as dangerous, and worthy of regulation.

If one wants to investigate how a particular way of “framing” information might dissipate dismissiveness and promote open-minded engagement with evidence on climate change, then it makes no sense to test such a hypothesis on, say, predominantly female undergraduates attending a liberal east-coast university.  How they respond to the messages in question won’t generate reliable inferences about how white, hierarchical individualistic males will—and they are the group in the world that we have reason to believe is reacting in the most dismissive way to scientific evidence on climate change.

Obviously, this account of “sample validity” depends on one being right when one thinks one has “good reason to know” that the dynamics of interest are uniform across people or vary in specific ways across subpopulations of them.

But there’s no getting around that! If one uses a “representative” general population sample to study a phenomenon that in fact varies systematically across subpopulations, then the inferences one draws will also be faulty, unless one both tests for such individual differences and assures that the sample contains a sufficiently large number of the subpopulation members to enable detection of such effects. Indeed, to assure that there are enough members of the subpopulations—particularly if one of them is small, like, say, a racial minority—is to oversample, generating a nonrepresentative sample!
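The arithmetic behind this oversampling point is easy to make concrete. Here is a minimal back-of-the-envelope sketch in Python; the effect size, power target, and 15% subgroup share are purely illustrative assumptions, not figures from any study discussed here:

```python
from math import ceil
from statistics import NormalDist

def n_per_group(d, alpha=0.05, power=0.80):
    # Normal-approximation sample size per condition for detecting a
    # standardized mean difference d in a two-group comparison:
    # n = 2 * (z_{1-alpha/2} + z_{power})^2 / d^2
    z = NormalDist().inv_cdf
    return ceil(2 * (z(1 - alpha / 2) + z(power)) ** 2 / d ** 2)

# Illustrative assumptions: a modest framing effect of d = 0.4 within
# the subgroup of interest, and a subgroup that is 15% of the population.
per_condition = n_per_group(0.4)        # subgroup members needed per condition
subgroup_needed = 2 * per_condition     # two experimental conditions
total_representative = ceil(subgroup_needed / 0.15)  # if sampling "representatively"

print(per_condition, subgroup_needed, total_representative)  # → 99 198 1320
```

Under these (made-up) numbers, detecting the effect requires roughly 200 members of the subgroup, which a purely representative sample would deliver only by recruiting over 1,300 respondents overall—hence the practical case for deliberately oversampling the subgroup instead.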

The point is that the validity of a sample depends on its suitability for the inferences to be drawn about the dynamics in question.  That feature of a sample can’t be determined in the abstract, according to any set of mechanical criteria.  Rather it has to be assessed in a case-specific way, with the exercise of judgment. 

And like anything of that sort—or just anything that one investigates empirically—the conclusions one reaches will need to be treated as provisional insofar as later on someone might come along and show that the dynamics in question involved some feature that evaded detection with one’s sample, and thus undermines the inferences one drew.  Hey, that's just the way science works!

Maybe on this account Mechanical Turk samples are “valid” for studying some things.   

But I’m convinced they are not valid for the study of how cultural or ideological commitments influence motivated cognition: because of several problematic features of such samples, one cannot reliably infer from studies based on them how this dynamic will operate in the real world.

I’ll identify those problematic features of MT samples in part two of this series.



Reader Comments (4)

That Gorilla video is my second most favorite video for demonstrating motivated reasoning.

Here's my all-time favorite:

http://www.youtube.com/watch?v=G-lN8vWm3m0

And along similar lines:

http://www.bostonmagazine.com/news/article/2013/06/25/emotions-facial-expressions-not-related/

July 8, 2013 | Unregistered CommenterJoshua

@Joshua:
It is pretty cool. But I'm not sure I'd call it motivated reasoning. There's not some objective or goal independent of making sense of information that is causing the perceptual glitch.

July 8, 2013 | Unregistered Commenterdmk38

Dan - to me, it is like motivated reasoning in that it reflects our drive to make sense out of what we see by finding patterns and categorizing the information in ways that conform to preconceptions. I believe that is one of the main mechanisms of motivated reasoning (in the sense that "motivation" does not necessarily mean conscious intent).

I think of how we use "see" in multiple ways - to describe visual perception but also cognitive perception as in "oh, I see what you mean now." Our preconceptions about how information fits into patterns "motivates" what we "see" in both meanings. I think of that as an objective or goal.

I thought of the article about the researcher at Northeastern (my brother, who is in signal processing there, has collaborated with her) as connected - in that she also talks about recognizing emotions through pattern recognition - based on our experiences. Our experiences create patterns that "motivate" what we see when we interpret intrinsically ambiguous information.

July 8, 2013 | Unregistered CommenterJoshua

@Joshua:

I agree w/ all you say. But it is useful to be able to distinguish between (1) mistakes people who are motivated to get the right answer make b/c of their vulnerability to faulty inferences from patterns & (2) beliefs people form b/c they are motivated to fit evidence to a conclusion that advances some goal or interest independent of getting the right answer. I suppose we could call both "motivated reasoning" & refer to the first as "type 1" & the second as "type 2" etc. But I think it is simpler just to keep "motivated reasoning" confined to (2) & call (1) what those who study it call it -- actually what Simons is observing isn't always called the same thing, but he doesn't call it motivated reasoning & I don't think anyone else does.

But whatever we call them, it's interesting to note that they can actually co-occur! Consider our study of the impact of cultural cognition on people's perceptions of the behavior of political protestors. I think it is plausible to surmise that the subjects' perception of "blocking" & "shouting in the face" vs. "chanting" & "persuading" etc. reflected the sort of "filling in" that occurs as a result of the "virtual processing" mechanism that Simons studies. But clearly what people's virtual processing was filling the blanks on the perceptual screen *with* was shaped by the unconscious motivation of the subjects to observe in the film behavior that affirmed their cultural predispositions.

Of course, I'm just conjecturing here. I don't know what the "mechanism" underlying the "mechanism" of cultural cognition was in this case. An alternative to what I just described would be a kind of post-observation reconstruction as subjects recall what they observed. This sort of dynamic has been studied too. One could conduct a follow-up study & make appropriate observations designed to test competing hypotheses about which of these processes -- & others too -- was at work.

July 10, 2013 | Unregistered Commenterdmk38
