Undertheorized and unvalidated: Stocklmayer & Bryant vs. NSF Indicators “Science literacy” scale part I
The paper isn’t exactly hot off the press, but someone recently lowered my entropy by sending me a copy of Stocklmayer, S. M., & Bryant, C. Science and the Public—What should people know?, International Journal of Science Education, Part B, 2(1), 81-101 (2012).
The piece critiques the NSF’s Science Indicators “factual knowledge” questions.
As is well known to the 9.8 billion readers of this blog (we’re down another couple billion this month; the usual summer-holiday lull, I’m sure), the Indicators battery is pretty much the standard measure for public “science literacy.”
The NSF items figure prominently in the scholarly risk perception/science communication literature.
With modest additions and variations, they also furnish a benchmark for various governmental and other official and semi-official assessments of “science literacy” across nations and within particular ones over time.
I myself don’t think the Indicators battery is invalid or worthless or anything like that.
What exactly a public science comprehension scale should measure is itself a difficult and interesting question. But whatever answer one chooses, there is little reason to think the Indicators battery is getting at it.
The Indicators battery seems to reduce “science literacy” to a sort of catechistic assimilation of propositions and principles: “The earth goes around the sun, not the other way ’round” [check]; “electrons are smaller than atoms” [check]; “antibiotics don’t kill viruses—they kill bacteria!” [check!].
We might expect that an individual equipped to reliably engage scientific knowledge in making personal life decisions, in carrying out responsibilities inside a business or as part of a profession, in participating in democratic deliberations, or in enjoying contemplation of the astonishing discoveries human beings have made about the workings of nature will have become familiar with all or most of these propositions. But mere familiarity with these propositions isn’t what makes someone science literate.
What does is a capacity—one consisting of the combination of knowledge, analytical skills, and intellectual dispositions necessary to acquire, recognize, and use pertinent scientific or empirical information in specified contexts. It’s hardly obvious that a high score on the NSF’s “science literacy” test (the mean number of correct responses in a general population sample is about 6 of 9) reliably measures any such capacity—and indeed no one to my knowledge has ever compiled evidence suggesting that it does.
This—with a lot more texture, nuance, and reflection blended in—is the basic thrust of the S&B paper.
The first part of S&B consists of a very detailed and engaging account of the pedigree and career of the Indicators’ factual-knowledge items (along with various closely related ones used to supplement them in large-scale recurring public data collections like the Eurobarometer).
What’s evident is how painfully innocent of psychometric and basic test theory this process has been.
The items, at least on S&B’s telling, seem to have been selected casually, more or less on the basis of the gut feelings and discussions of small groups of scientists and science authorities.
Aside from anodyne pronouncements on the importance of “public understanding of science” to “national prosperity,” “the quality of public and private decision-making,” and “enriching the life of the individual,” they made no real effort to articulate the ends served by public “science literacy.” As a result, they offered no cogent account of the sorts of knowledge, skills, dispositions, and the like that securing the same would entail.
Necessarily, too, they failed to identify the constructs—conceptual representations of particular skills and dispositions—an appropriately designed public science comprehension scale should measure.
Early developers of the scale reported Cronbach’s alpha and like descriptive statistics, and even performed factor analysis that lent support to the inference that the NSF “science literacy” scale was indeed measuring something.
But without any theoretical referent for what the scale was supposed to measure and why, there was necessarily no assurance that what was being measured was connected to even the thinly specified objectives its proponents had in mind.
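For readers unfamiliar with the statistic: Cronbach’s alpha gauges only internal consistency—whether a set of items covary enough to be measuring *something* in common—not whether that something is the construct you care about. A minimal sketch of the computation (using made-up 0/1 response data, not the actual Indicators results, and a hypothetical item count):

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for an (n_respondents, n_items) score matrix."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)      # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)  # variance of summed scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Simulated right/wrong (1/0) answers: 200 test takers, 9 items,
# answered independently at random -- so alpha should sit near 0.
rng = np.random.default_rng(0)
scores = rng.integers(0, 2, size=(200, 9)).astype(float)
print(round(cronbach_alpha(scores), 3))
```

The point the sketch illustrates is the one in the text: a respectable alpha tells you the items hang together, but it is silent on *what* they jointly measure—that question can only be answered by a theory of the construct.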
So that’s the basic story of the first part of the S&B article; the last part consists in some related prescriptions.
Sensibly, S&B call for putting first things first: before developing a measure, one must thoughtfully (not breezily, superficially) address what the public needs to know and why: what elements of science comprehension are genuinely important in one or another of the contexts, to one or another of the roles and capacities, in which ordinary (nonexpert) members of the public make use of scientific information?
S&B suggest, again sensibly, that defensible answers to these questions will likely support what the Programme for International Student Assessment characterizes as an “assets-based model of knowledge” that emphasizes “the skills people bring to bear on scientific issues that they deal with in their daily lives.” (Actually, the disconnect between the study of public science comprehension and the vast research that informs standardized testing, which reflects an awe-inspiring level of psychometric sophistication, is really odd!)
Because no simple inventory of “factual knowledge” questions is likely to vouch for test takers’ possession of such a capacity, S&B propose scrapping the NSF Indicators battery altogether rather than simply supplementing it (as has been proposed) with additional “factual knowledge” items on “topics of flight, pH, fish gills, lightning and thunder and so on.”
Frankly, I doubt that the Indicators battery will ever be scrapped. By virtue of sheer path dependence, the Indicators battery confers value as a common standard that could not easily, and certainly not quickly, be replaced.
In addition, there is a collective action problem: the cost of generating a superior, “assets-based” science comprehension measure—including not only the toil involved in the unglamorous work of item development, but also the need to forgo participating instead in exchanges more central to the interest and attention of most scholars—would be borne entirely by those who create such a scale, while the benefits of a better measure would be enjoyed disproportionately by other scholars who’d then be able to use it.
As they investigate particular issues (e.g., the relationship between science comprehension and climate change polarization), scholars will likely find it useful to enrich the NSF Indicators battery through progressive additions and refinements, particularly with items that are known to reliably measure the reasoning skills and dispositions necessary to recognize and make use of valid empirical information in everyday decisionmaking contexts.
That, anyway, is the sort of process I see myself as trying to contribute to by tooling around with and sharing information on an “Ordinary science intelligence” instrument for use in risk perception and science communication studies.
Even that process, though, won’t happen unless scholars and others interested in public science comprehension candidly acknowledge the sorts of criticisms S&B are making of Indicators battery; unless they have the sort of meaningful discussion S&B propose about who needs to know what about science and why; and unless scholars who use the Indicators battery in public science comprehension research explicitly address whether the battery can reasonably be understood to be measuring the forms of knowledge and types of reasoning dispositions on which their own analyses depend.
So I am really glad S&B wrote this article!
Nevertheless, “tomorrow,” I’ll talk about another part of the S&B piece—a survey they conducted of 500 scientists to whom they administered the Indicators’ “factual knowledge” items—that I think is very very cool but actually out of keeping with the central message of their paper!