Civic Scientific Literacy: Criticisms and Questions

Note: I have joined the “virtual class” component of Dan Kahan’s Science of Science Communication course at Yale University. As part of this I am endeavoring to write a response paper in reaction to each week’s set of readings. I will post these responses here on my blog – the second such response is below. The first is here.

I will also be participating in the discussion on Kahan’s own blog. (The discussion on session 2 is here.) Comments are welcome at either site.

By several accounts it seems we have Jon Miller and his colleagues to thank for bringing a certain amount of scientific rigor to the study of science literacy. According to Miller, his work was the first attempt to use item-response theory (IRT) technology to design reliable cross-national estimates of public scientific understanding. Pardo and Calvo write that Miller’s work, and that led by John Durant in Britain, built an empirical foundation for the field through use of “clearly specified dimensions and comparable questionnaires.” Indeed, Miller’s methods were adopted by the National Science Foundation for its science literacy surveys; and likewise his colleague Durant laid the groundwork for the European equivalent, the Eurobarometer.

Miller’s civic scientific literacy (CSL) measure of 1998, however, shows room for improvement in several areas – as does the Eurobarometer.

Pardo and Calvo’s criticisms

Pardo and Calvo in particular focus on the Eurobarometer’s sub-optimal reliability, demonstrated by a Cronbach’s alpha coefficient of 0.66, below the standard cut-off point of 0.70. True-false questions bear much of the responsibility for the test’s low reliability.
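
For readers who, like me, are new to these statistics, here is a minimal sketch of how Cronbach’s alpha is computed from a table of item scores. It is not Pardo and Calvo’s analysis: the function name cronbach_alpha and the tiny example_scores table are hypothetical, made up purely for illustration, and real survey data would have far more respondents and items.

```python
from statistics import pvariance

def cronbach_alpha(scores):
    """Cronbach's alpha for a table of item scores.

    scores: list of respondents, each a list of item scores
    (here 1 = correct, 0 = incorrect).
    """
    k = len(scores[0])                       # number of items
    items = list(zip(*scores))               # one tuple of scores per item
    item_variances = [pvariance(col) for col in items]
    total_variance = pvariance([sum(row) for row in scores])
    # alpha = k/(k-1) * (1 - sum of item variances / variance of total score)
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)

# Hypothetical data: 5 respondents answering 3 true-false items.
example_scores = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
    [1, 1, 1],
]

alpha = cronbach_alpha(example_scores)
print(round(alpha, 2))
```

Alpha rises when items vary together, so that the variance of the total score is large relative to the sum of the individual item variances. A value of 0.66 falls just short of the conventional 0.70 benchmark.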

Pardo and Calvo argue that the overall difficulty of the test – and therefore its discriminatory power – is low. “If the test were applied in an educational context, almost all of the items in Q55 and Q56 could be answered correctly by most individuals in the population,” they write. It appears the test’s discriminatory power was strong enough to allow comparisons between countries, but within the more scientifically literate nations, the test is not finely calibrated enough to give insights into sub-populations.
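
To make the link between difficulty and discriminatory power concrete, here is a rough sketch using classical test theory statistics rather than the IRT models Miller describes. The function names and the example_responses table are hypothetical, and treating the discrimination of a zero-variance item as 0.0 is a convention I have chosen for this sketch (strictly speaking, the correlation is undefined).

```python
from math import sqrt
from statistics import mean, pvariance

def item_difficulty(scores, item):
    """Proportion of respondents answering the item correctly.

    Note the classical convention: a HIGHER value means an EASIER item.
    """
    return mean(row[item] for row in scores)

def item_discrimination(scores, item):
    """Correlation between an item and the score on the remaining items."""
    x = [row[item] for row in scores]
    rest = [sum(row) - row[item] for row in scores]
    var_x, var_rest = pvariance(x), pvariance(rest)
    if var_x == 0 or var_rest == 0:
        # Correlation is undefined; report 0.0 by convention in this sketch.
        return 0.0
    mx, mr = mean(x), mean(rest)
    cov = mean((a - mx) * (b - mr) for a, b in zip(x, rest))
    return cov / sqrt(var_x * var_rest)

# Hypothetical data: 5 respondents, 3 items; everyone gets item 0 right.
example_responses = [
    [1, 1, 1],
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [1, 0, 0],
]

for i in range(3):
    print(i, item_difficulty(example_responses, i),
          round(item_discrimination(example_responses, i), 2))
```

An item that everyone answers correctly has no variance, so it cannot separate stronger respondents from weaker ones. This is the sense in which a test that is too easy loses discriminatory power within a well-informed population.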

These authors also posit that the test items are not a very representative sample of what constitutes a firm grounding in elementary science, because:

  • they pose questions about basics of scientific theories, alongside others on more specialized “or even esoteric” matters;
  • a majority of questions call on explicitly taught knowledge (memory recall), while two others require recalling and combining several pieces of knowledge;
  • the survey has a poor balance between different scientific subjects.

My criticisms and questions

Several other aspects of Miller’s (and in some cases, Durant’s) method jumped out at me as ripe for refinement or at least questionable, and I would be curious to hear others’ thoughts on this:

  • The 67 percent threshold (location or difficulty level): beyond the fact that this equals two-thirds, is it just arbitrary? Why is two-thirds a good level at which to peg the threshold? I’m tempted to say that with Miller’s questions being as elementary as they are, it’s very hard to think of someone who only gets two-thirds right as scientifically literate at all!
  • Miller doesn’t detail the model underpinning his confirmatory factor analysis. Now, CFA is a new concept for me, so correct me if I’ve got this wrong. His CFA reveals factor loadings that vary from 0.46 to 0.83 for the United States study, and from 0.34 to 0.70 for the European study. He says this process reveals nine items in each study that constitute “a unidimensional measure of construct vocabulary.” So he seems to be saying that these nine items do load on the factor of construct knowledge, i.e., they are good indicators of such knowledge. But what does this actually mean to the reader, without understanding how the loadings were calculated or what assumptions Miller may have used?
  • The methods questions seem flawed to me. For example, the Eurobarometer asks respondents to rate “how scientific” astrology is, on a scale of 1 to 5. The answer, apparently, is 1. Is this an objective fact? While it’s certainly no 5, you could make a convincing argument that astrology deserves perhaps a 2 – it does have a certain system to it that its practitioners follow, albeit one completely ungrounded in physical reality. There is no commonly understood rubric of what it means for a discipline to be a 2, 3 or 4 on a 5-point scale of scientific-ness – so how can you objectively say the answer is definitely 1?

The cultural criticism

However, I wish to take issue with one family of criticisms against the Miller/Durant style of measurement: that it fails to take account of cultural differences. Pardo and Calvo write of the Eurobarometer, “No allowances is made for the idea that some population segments might be influenced in their appropriation of a scientific proposition by values or beliefs in their society’s culture.” This is true, but I don’t think it’s a damning criticism. It simply points out that the NSF and Eurobarometer surveys measure the “what” rather than the “why.” It is important – but not sufficient – to note what percentage of a population agree “all radioactivity is man-made.” Researchers are right to then explore the cultural cognition that goes into creating agreement or disagreement with that statement.

To put it another way, measuring the public’s science comprehension does not mean we necessarily adhere to the science comprehension thesis. We can use science comprehension measurements as a stepping stone on our way to more complex analyses.

A final note: understanding versus belief

I think Dan Kahan raises a more important criticism in his research on evolution. This work demonstrates that people are capable of two simultaneous mental states that, on the surface, appear conflicting: they can *understand* the theory of evolution, and yet *not believe* in the theory. Further surveys on science literacy should tease apart the prevalence of, and interaction between, these very different dimensions.

