NO EVIDENCE FOR PSI
The Bem experiments were at least partly exploratory. For instance, Bem’s Experiment 1 tested not just erotic pictures, but also neutral pictures, negative pictures, positive pictures, and pictures that were romantic but non-erotic. Only the erotic pictures showed any evidence for precognition. But now suppose that the data had turned out differently and that, instead of the erotic pictures, the positive pictures had been the only ones to result in performance higher than chance. Or suppose the negative pictures had resulted in performance lower than chance. It is possible that a new and different story would then have been constructed around these other results (Bem, 2003; Kerr, 1998). This means that Bem’s Experiment 1 was to some extent a fishing expedition, an expedition that should have been explicitly reported and should have resulted in a correction of the reported p-value.
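To give a sense of why such a correction matters, the following minimal sketch computes the probability of obtaining at least one spurious "significant" result when several picture categories are each tested at the conventional level. It assumes that the five tests are independent and that each uses α = .05; neither assumption comes from Bem's report, and the function name is ours.

```python
def familywise_error_rate(num_tests: int, alpha: float = 0.05) -> float:
    """Probability of at least one false positive across independent tests.

    Assumes every null hypothesis is true and the tests are independent.
    """
    if num_tests < 1:
        raise ValueError("num_tests must be at least 1")
    return 1.0 - (1.0 - alpha) ** num_tests


# Five picture categories (erotic, neutral, negative, positive, romantic),
# each tested at alpha = .05 (an assumed level, for illustration only).
fwer_five = familywise_error_rate(5)
print(f"Family-wise error rate for 5 tests: {fwer_five:.3f}")  # 0.226
```

Under these assumptions, the chance of finding at least one category that appears to "work" is roughly 23%, not 5%.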
Another example of exploration comes from Bem’s Experiment 3, in which response time (RT) data were transformed using either an inverse transformation (i.e., 1/RT) or a logarithmic transformation. These transformations are probably not necessary, because the statistical analyses were conducted on the level of participant mean RT; one then wonders what the results were for the untransformed RTs, results that were not reported.
Furthermore, in Bem’s Experiment 5 the analysis shows that “Women achieved a significant hit rate on the negative pictures, 53.6%, t(62) = 2.25, p = .014, d = .28; but men did not, 52.4%, t(36) = 0.89, p = .19, d = .15.” But why test for gender in the first place? There appears to be no good reason. Indeed, Bem himself states that “the psi literature does not reveal any systematic sex differences in psi ability”.
Bem’s Experiment 6 offers more evidence for exploration, as this experiment again tested for gender differences, but also for the number of exposures: “The hit rate on control trials was at chance for exposure frequencies of 4, 6, and 8. On sessions with 10 exposures, however, it fell to 46.8%, t(39) = −2.12, two-tailed p = .04.” Again, conducting multiple tests requires a correction.
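As an illustration of what a correction could look like, the sketch below applies a Bonferroni adjustment to the p-value quoted for the 10-exposure sessions. The Bonferroni method is our choice for simplicity, not a procedure prescribed by Bem, and the count of four tests (one per exposure frequency) is an assumption that should be read as a lower bound, since the gender comparisons are not included.

```python
def bonferroni_adjusted_p(p_value: float, num_tests: int) -> float:
    """Bonferroni-adjusted p-value: multiply by the number of tests, cap at 1."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must lie in [0, 1]")
    if num_tests < 1:
        raise ValueError("num_tests must be at least 1")
    return min(1.0, p_value * num_tests)


# Reported two-tailed p for the 10-exposure sessions in Experiment 6.
reported_p = 0.04
# Assumed number of tests: exposure frequencies 4, 6, 8, and 10.
adjusted_p = bonferroni_adjusted_p(reported_p, num_tests=4)
print(f"Adjusted p = {adjusted_p:.2f}")  # 0.16
```

With this adjustment the result no longer falls below the conventional .05 threshold.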
These explorative elements are clear from Bem’s discussion of the empirical data. The problem runs deeper, however, because we simply do not know how many other factors were taken into consideration only to come up short. We can never know how many other hypotheses were in fact tested and discarded; some indication is given above and in Bem’s section “The File Drawer”. At any rate, the foregoing suggests that strict confirmatory experiments were not conducted. This means that the reported p-values are incorrect and need to be adjusted upwards.
Problem 2: Fallacy of the Transposed Conditional
The interpretation of statistical significance tests is liable to a misconception known as the fallacy of the transposed conditional. In this fallacy, the probability of the data given a hypothesis (e.g., P(D|H), such as the probability of someone being dead given that they were lynched, a probability that is close to 1) is confused with the probability of the hypothesis given the data (e.g., P(H|D), such as the probability that someone was lynched given that they are dead, a probability that is close to zero).
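The two conditional probabilities are linked by Bayes' rule, and a short numerical sketch may make the gap between them concrete. All three input numbers below are hypothetical values chosen only for illustration; they are not estimates of any real-world rate, and the function name is ours.

```python
def posterior_probability(prior_h: float,
                          p_data_given_h: float,
                          p_data_given_not_h: float) -> float:
    """Return P(H|D) for a binary hypothesis H via Bayes' rule."""
    # Total probability of the data, summed over H and not-H.
    evidence = (p_data_given_h * prior_h
                + p_data_given_not_h * (1.0 - prior_h))
    if evidence == 0.0:
        raise ValueError("the data have zero probability under both hypotheses")
    return p_data_given_h * prior_h / evidence


# H = "was lynched", D = "is dead". Hypothetical inputs:
prior_lynched = 1e-6             # assumed P(H): lynching is very rare
p_dead_given_lynched = 0.99      # assumed P(D|H): close to 1
p_dead_given_not_lynched = 0.01  # assumed P(D|not H)

p_lynched_given_dead = posterior_probability(
    prior_lynched, p_dead_given_lynched, p_dead_given_not_lynched
)
print(f"P(D|H) = {p_dead_given_lynched:.2f}")
print(f"P(H|D) = {p_lynched_given_dead:.6f}")
```

Even though P(D|H) is close to 1 in this sketch, P(H|D) stays close to zero because the prior probability of H is so small.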
This distinction provides the mathematical basis for Laplace’s Principle that extraordinary claims require extraordinary evidence. This principle holds that even compelling data may not make a rational agent believe that psi exists (see also Price, 1955). Thus, the