This brought to mind a very interesting article I read last month in the New Yorker, "The Truth Wears Off" by Jonah Lehrer. The author describes a variety of effects that were first estimated to be quite strong but were subsequently shown to be weak or non-existent. He attributes this "decline" to three main forces:
- Publication bias: In order to get published, your paper needs to have strong, significant results.
- Selective reporting: Scientists screen their own work, tossing out results that don't fit with their priors.
- Study-level random effects: Although Lehrer doesn't really nail this point, I think what he is getting at is that the significance levels in studies are calculated as if every observation in the study were independent. But what if the "apparatus" used to conduct the multiple measurements has a problem? Then it will give similar results over and over again. If the flaw is random, it doesn't bias the result up or down, but it does bias the significance levels, so we think we have something when we don't. In the case of the red-shirt study, there seems to have been something about the 2004 Olympics that tilted contest victories toward red shirts, but it was not an enduring effect (i.e. not truly biological or cultural, say), so it didn't show up in the 2008 replication. The small simulation below illustrates the significance-inflation point.
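Here is a minimal sketch of that third point, assuming a one-sample design with a true effect of zero. The function name `simulate_study` and the parameter values (`apparatus_sd`, `noise_sd`, 50 observations) are my own illustrative choices, not anything from the studies Lehrer discusses. Each simulated study shares one "apparatus" offset across all of its observations, and a naive t-test that treats the observations as independent rejects the null far more often than its nominal 5% rate.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def simulate_study(n_obs=50, apparatus_sd=0.5, noise_sd=1.0):
    """One study of a true null effect (mean 0), where every
    observation shares a single study-level 'apparatus' offset."""
    shared_offset = rng.normal(0.0, apparatus_sd)  # study-level random effect
    return shared_offset + rng.normal(0.0, noise_sd, size=n_obs)

n_studies = 10_000
false_positives = 0
for _ in range(n_studies):
    data = simulate_study()
    # Naive one-sample t-test: treats the 50 observations as independent,
    # ignoring the shared offset within the study.
    _, p = stats.ttest_1samp(data, popmean=0.0)
    if p < 0.05:
        false_positives += 1

print("Nominal alpha: 0.05")
print(f"Actual false-positive rate: {false_positives / n_studies:.2f}")
```

With these (hypothetical) settings, the actual false-positive rate comes out many times larger than 0.05: the shared offset makes each study's sample mean wander away from zero even though the true effect is exactly zero, and the within-study standard error is too small to account for it. That is the sense in which a random apparatus flaw biases significance levels rather than the estimate itself.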