make profit, etc.). In addition, psi has no clear grounding in known biological or physical mechanisms.²

Despite the lack of a plausible mechanistic account of precognition, Bem was able to reject the null hypothesis of no precognition in eight out of nine experiments. For instance, in Bem’s first experiment, 100 participants had to guess the future position (left or right) of pictures on a computer screen. And indeed, for erotic pictures, the 53.1% mean hit rate was significantly higher than chance (t(99) = 2.51, p = .01).

Bem takes these findings to support the hypothesis that people “use psi information implicitly and nonconsciously to enhance their performance in a wide variety of everyday tasks”. In further support of psi, Utts (1991, p. 363) concluded in a Statistical Science review article that “(...) the overall evidence indicates that there is an anomalous effect in need of an explanation” (but see Diaconis, 1978; Hyman, 2007). Do these results mean that psi can now be considered real, replicable, and reliable?

We think that the answer to this question is negative, and that the take-home message of Bem’s research is in fact of a completely different nature. One of the discussants of the Utts review paper made the insightful remark that “Parapsychology is worth serious study. (...) if it is wrong [i.e., psi does not exist], it offers a truly alarming massive case study of how statistics can mislead and be misused.” (Diaconis, 1991, p. 386). And this, we suggest, is precisely what Bem’s research really shows. Rather than revising our beliefs regarding psi, Bem’s research should cause us to revise our beliefs on methodology: the field of psychology currently uses methodological and statistical strategies that are too weak, too malleable, and offer far too many opportunities for researchers to befuddle themselves and their peers.

The most important flaws in the Bem experiments, discussed below in detail, are the following: (1) confusion between exploratory and confirmatory studies; (2) insufficient attention to the fact that the probability of the data given the hypothesis does not equal the probability of the hypothesis given the data (i.e., the fallacy of the transposed conditional); (3) application of a test that overstates the evidence against the null hypothesis, an unfortunate tendency that is exacerbated as the number of participants grows large. Indeed, when we apply a Bayesian t-test (Gönen, Johnson, Lu, & Westfall, 2005; Rouder, Speckman, Sun, Morey, & Iverson, 2009) to quantify the evidence that Bem presents in favor of psi, the evidence is sometimes slightly in favor of the null hypothesis, and sometimes slightly in favor of the alternative hypothesis. In almost all cases, the evidence falls in the category “anecdotal”, also known as “worth no more than a bare mention” (Jeffreys, 1961).
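To make the Bayesian t-test concrete, the following is a minimal sketch of how such a Bayes factor might be computed from a reported t statistic and sample size. It assumes the one-sample form of the default Bayes factor described by Rouder et al. (2009), with a unit-scale Cauchy prior on effect size. The function name `jzs_bf01`, the integration limits, and the use of Simpson's rule are illustrative choices, not the code used for the analyses reported here.

```python
import math


def jzs_bf01(t, n, steps=4000):
    """Approximate the one-sample default Bayes factor BF01 (null over alternative).

    Assumes a Cauchy prior with scale 1 on effect size, written as a normal
    prior whose variance g follows an inverse chi-square(1) distribution.
    `steps` must be even (Simpson's rule).
    """
    v = n - 1  # degrees of freedom of the one-sample t-test

    # Likelihood term under the null (constants shared with H1 cancel).
    null_term = (1 + t * t / v) ** (-(v + 1) / 2)

    def integrand(u):
        # Integrate over u = log(g) so the range 0 < g < infinity is manageable.
        g = math.exp(u)
        shrink = 1 + n * g
        like = shrink ** -0.5 * (1 + t * t / (shrink * v)) ** (-(v + 1) / 2)
        prior = (2 * math.pi) ** -0.5 * g ** -1.5 * math.exp(-1 / (2 * g))
        return like * prior * g  # the trailing g is the Jacobian dg/du

    # Composite Simpson's rule on a wide, assumed-sufficient range of log(g).
    lo, hi = -8.0, 40.0
    h = (hi - lo) / steps
    total = integrand(lo) + integrand(hi)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(lo + i * h)
    alt_term = total * h / 3

    return null_term / alt_term


# Experiment 1 as reported above: t(99) = 2.51 with 100 participants.
bf01 = jzs_bf01(2.51, 100)
print(f"BF01 = {bf01:.2f}, BF10 = {1 / bf01:.2f}")
```

A value of BF01 above 1 favors the null hypothesis and a value below 1 favors the alternative. Values between roughly 1/3 and 3 are the ones conventionally labeled “anecdotal”.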

We realize that the above flaws are not unique to the experiments reported by Bem. Indeed, many studies in experimental psychology suffer from the same mistakes. However, this state of affairs does not exonerate the Bem experiments. Instead, these experiments highlight the relative ease with which an inventive researcher can produce significant results even when the null hypothesis is true. This evidently poses a significant problem for the

² Some argue that modern theories of physics are consistent with precognition. We cannot independently verify this claim, but note that work on precognition is seldom published in reputable physics journals (in fact, we failed to find a single such publication). But even if the claim were correct, the fact that an assertion is consistent with modern physics does not make it true. The assertion that the CIA bombed the twin towers is consistent with modern physics, but this fact alone does not make the assertion true. What is needed in the case of precognition is a plausible account of the process that leads future events to have perceptual effects in the past.

