[net.physics] Parapsychology Experiment Now

wasser@viking.DEC (John A. Wasser) (06/25/85)

I think the discussion of paranormal abilities has reached a point where it is
time to generate some solid empirical data!  The design of the experiment is 
going to be tricky... It is difficult to make sure proper controls are carried
out when you work in the nebulous realm of the Networks.  I think we should
concentrate on precognition because it is so much easier to control than
postcognition.  I will make every effort to ensure that the experiment is
carried out according to proper scientific methods.

I would like to invite anyone interested in testing their precognitive
ability to send me a mail message (see the addresses below) that contains
a guess of the following form:

	Ten lines of text, each containing nothing but fifty decimal digits.
	No spaces or other characters between the digits.
	No spaces or other characters before or after the digits.
	No blank lines between the lines of digits.

You may include any comments you wish before or after the block of
digits.  These comments will be ignored for the purpose of scoring.
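
For concreteness, here is a small sketch (the helper name is my own, not
part of the announcement) of a check for the submission format described
above: ten lines of exactly fifty decimal digits each, nothing else.

```python
# Hypothetical format check for a submission: ten lines, fifty decimal
# digits per line, no other characters.  Comments around the block are
# assumed to have been stripped off already.

def valid_submission(lines):
    """Return True if `lines` matches the required 10 x 50 digit format."""
    return len(lines) == 10 and all(
        len(line) == 50 and line.isdigit() for line in lines)

valid_submission(["1" * 50] * 10)   # well-formed: ten lines of fifty digits
valid_submission(["1" * 49] * 10)   # rejected: lines one digit too short
```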

After a reasonable amount of time I will write a program to generate
a random set of digits in the same format as above.  This answer
will be posted to net.physics and that posting is what you have to
guess.  The digits in your answer will be compared to the digits
in the same positions in the answer and your score will be the number
of digits correctly guessed.  
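
The scoring rule above amounts to a position-by-position comparison; a
minimal sketch (modern Python, function name my own) of that comparison:

```python
# Score a guess against the posted answer: count positions where the
# digits match.  Both are given as ten lines of fifty digits each.

def score(guess_lines, answer_lines):
    """Return the number of positions where guess and answer agree."""
    guess = "".join(guess_lines)
    answer = "".join(answer_lines)
    assert len(guess) == len(answer) == 500
    return sum(g == a for g, a in zip(guess, answer))

# With 500 digits and a 1-in-10 chance per position, chance expectation
# is 50 hits.
answer = ["0123456789" * 5] * 10
score(answer, answer)   # every position matches
```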

The highest score over the 95% confidence level will be awarded a
tasteful brass plaque.

		-John A. Wasser

Work address:
ARPAnet:	WASSER%VIKING.DEC@decwrl.ARPA
Usenet:		{allegra,Shasta,decvax}!decwrl!dec-rhea!dec-viking!wasser

P.S.  Please forward this announcement to anyone you know who might be
interested (including other news groups).

mikes@AMES-NAS.ARPA (06/28/85)

From:  mikes@AMES-NAS.ARPA (Peter Mikes)

  I do not see how the experiment with 10*50 random digits tests
  precognition - or anything else.  Perhaps you should submit the design
  to discussion before jumping.

cooper@pbsvax.DEC (Topher Cooper HLO2-3/M08 DTN225-5819) (07/02/85)

>I think the discussion of paranormal abilities has reached a point where it is
>time to generate some solid empirical data!  
>
> <<Format of data>>
>
>After a reasonable amount of time I will write a program to generate
>a random set of digits in the same format as above.  This answer
>will be posted to net.physics and that posting is what you have to
>guess.  The digits in your answer will be compared to the digits
>in the same positions in the answer and your score will be the number
>of digits correctly guessed.  
>
>The highest score over the 95% confidence level will be awarded a
>tasteful brass plaque.
>
>		-John A. Wasser

You have made at least two errors, perhaps three in the design of your
precognition test.  The errors are common ones among people unfamiliar with
parapsychological experiment design.  You are in good company: e.g., Randi's
"Testing Your ESP" (or whatever the title is) is abysmal from an experimental
viewpoint.

The first flaw, and the most serious, is caused by what in the parapsychological
literature is called the "stacking effect."  (Note: this is a real
psychological and statistical effect which is significant in parapsychological
experiments: it is not a parapsychological effect).  If your test were designed
only to allow people to test their own precognitive ability, there would be no
problem with this design.  The problem comes in trying to conclude anything
from the entire set of submitted tests.

Standard statistical tests assume that "trials" are independent.  Since people
respond in patterns rather than randomly, all the trials referring to a
particular target are correlated.  This does not affect the expected number of
hits, either per subject or overall.  It does affect the variance.  This in
turn affects the significance of any deviation from the mean.  As a matter of
fact, it exaggerates it.

To see this, imagine an experiment where a group of people try to guess one
digit instead of 500.  In our culture, if you ask a group of people to guess
a single digit, depending on circumstances, roughly 1/3 will pick seven.  If
that happened to be the target digit I would EXPECT to get a "highly
significant" result from a dozen or more people (using a chi-square test of
significance).  Similarly, I would tend to get a very low result if the target
happened to be "6", which is rarely selected by people in this culture under
"neutral" circumstances.

This error was made several times in the early days of parapsychology before it
was recognized.  It is frequently made today by people without training in the
field (not that parapsychologists are the only ones to recognize it, of course).
And even experienced parapsychologists occasionally fall prey to much more
subtle forms of the same problem.  It is also a common pitfall in psychology,
e.g., in the analysis of questionnaires.

Your only course of action with this experimental design is to assume a
correlation of 1 and proceed from there.  The simplest way to do this would be
to simply pick a single person's submission (I'll call this a "run" from now on)
AT RANDOM and do a single test.  A better idea is to assume that each run is
a noisy version of a single underlying run which represents both the "average"
response bias and psi hits (if any).  You can then "recreate" this prototype
run by using the "i'th" trial of each run as a "vote", majority take all, for
the "i'th" trial of the prototype.  Ties should be resolved randomly.  The
single resulting run can be used in a standard test of significance.  More
sophisticated tests exist which should be at least marginally more sensitive
but these are non-standard and their use would almost certainly cause unwanted
controversy.
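
The prototype-run construction described above can be sketched as
follows (the function name is hypothetical; ties are resolved randomly,
as specified):

```python
# Build a single prototype run from many submitted runs: the i'th digit
# is the majority vote at position i across all runs, ties broken at
# random.  The resulting single run can then be scored normally.
import random
from collections import Counter

def prototype_run(runs, rng=random):
    """runs: list of equal-length digit strings.  Returns one string
    whose i'th digit is the majority digit at position i."""
    proto = []
    for position in zip(*runs):
        counts = Counter(position)
        top = max(counts.values())
        winners = [d for d, c in counts.items() if c == top]
        proto.append(rng.choice(winners))  # resolve ties randomly
    return "".join(proto)

runs = ["1234", "1239", "1255"]
proto = prototype_run(runs)
# Positions 0-2 have clear majorities ("1", "2", "3"); position 3 is a
# three-way tie, resolved at random among "4", "9" and "5".
```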

By the way, choosing the maximum score is not valid under these circumstances:
the expected maximum value depends on the variance, which is indeterminate.

Your other course of action is to change your experimental design.  Such a
design would be a lot more work, but also much more meaningful.

The second flaw may only be a flaw in your presentation.  You MUST present
the statistical technique you plan to use in advance of actually doing any
analysis.  Otherwise, people who don't like your results will claim that you
chose an analysis that would produce that result on the particular data.

Some choices if you do not modify your experimental design are discussed above.
These should be used with the standard chi-square test, or the equivalent.
You should also state in advance whether you are going to use a one-tailed
or two-tailed test.  The simple chi-square test is intrinsically two-tailed,
but can be modified easily to be one-tailed.  The one-tailed test is more
sensitive than the two-tailed test to psi-hitting (above chance results) but
is completely blind to psi-missing (below chance results).  I think you would
agree that SOMETHING was going on if you got not a single hit out of, say, 25
responses, but the one-tailed test would not indicate this.  Experience in
parapsychology has shown that psi-missing is very common.  Two-tailed tests are
therefore standard.

If you do modify your design so the runs are independent several more options
are open to you.  The simplest technique is to do a simple hit-vs-miss
chi-square test on the entire mass of data using an expected number of hits
of 50*R and of misses as 450*R where R is the number of runs submitted.
The problem with this technique is that in a mixed population the psi-hitters
and the psi-missers tend to cancel out, at least somewhat.  The two-tailed test
helps only if there is a preponderance of psi-missers.
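
The pooled hit-vs-miss test just described is simple enough to sketch
directly (illustrative numbers, my own function name): with R
independent runs of 500 digits each, chance expectation is 50*R hits
and 450*R misses.

```python
# One-degree-of-freedom chi-square statistic for the pooled data:
# (observed - expected)^2 / expected, summed over hits and misses.

def pooled_chi_square(total_hits, n_runs):
    """Chi-square statistic for total_hits pooled over n_runs runs of
    500 digit guesses each (chance hit rate 1 in 10)."""
    trials = 500 * n_runs
    exp_hits, exp_misses = 50 * n_runs, 450 * n_runs
    misses = trials - total_hits
    return ((total_hits - exp_hits) ** 2 / exp_hits
            + (misses - exp_misses) ** 2 / exp_misses)

# Example: 25 runs with 1330 total hits (chance expectation 1250).
stat = pooled_chi_square(1330, 25)
# For 1 degree of freedom the two-tailed 95% critical value is 3.84.
significant = stat > 3.84
```

Note that, exactly as the text warns, a group of 1330 hits could mask a
mix of strong psi-hitters and psi-missers whose deviations cancel.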

An alternative that avoids the problem is to use a 2-by-R chi-square design.
In this case, each percipient (subject in an ESP experiment) is evaluated
independently for both psi-hitting and psi-missing.  Simple analysis of
variance between runs is a popular technique with much the same purpose.

Reading between the lines of your posting, it looked like you might have
intended to evaluate the maximum score for significance.  This is a valid
one-tailed test, if rather insensitive, as long as the runs are statistically
independent.  The most extreme value (with "most extreme" properly defined)
could be used instead to produce a two-tailed test.  However, I don't know
of any standard techniques or tables for evaluating the significance of the
most extreme (or maximum) value, so you would have to develop them yourself,
unless you know where to find them.  If you decide to go this route, I know
how to do the necessary evaluation and would be glad to help you.

The third, potential flaw has to do with what you mean by "write a program to generate
a random set of digits."  This will have to include a true random element
(for example seeded from a millisecond clock when run).  If you simply choose
an arbitrary seed you will come under the criticism that you chose the seed,
unconsciously, to produce the results you got on the basis of the runs you
had already seen.  Never mind the seeming impossibility of the gigantic
subconscious computation which would be involved: much more extreme assumptions
have been used in the past to explain away experimental results in
parapsychology.
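
In modern terms, the true random element Topher asks for would come
from the operating system's entropy pool rather than a hand-picked
seed; a sketch (obviously anachronistic for 1985, and the function name
is my own):

```python
# Generate the target using a cryptographically strong random source
# (the `secrets` module draws on OS entropy), so no one can claim the
# experimenter chose a seed to fit the submissions already seen.
import secrets

def generate_target():
    """Ten lines of fifty decimal digits each, no other characters."""
    return ["".join(str(secrets.randbelow(10)) for _ in range(50))
            for _ in range(10)]

target = generate_target()
```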

Finally, some general comments.  Psi is notoriously unreliable.  This is as
true on the level of experiment as it is on the level of single trials or single
individuals.  Approximately one out of three experiments conducted by
experienced parapsychologists is significant at the 95% level or higher
(99% is a fairly standard significance level in parapsychological research).
There are no good statistics for inexperienced experimenters but the level is
probably much lower.  Why this is, no one knows.  A negative result from
this experiment cannot be interpreted, therefore, as very much evidence
against the existence of psi.  I know this sounds as though I am making psi a
completely unfalsifiable hypothesis, and that I am apologizing for a failure in
advance, but I am not.  The evidence for psi lies in the large body of
experimental evidence, both positive and negative; it does not lie in any single
experiment.  No single experiment can be strong enough, by itself, to overcome
the very low a priori probability of psi.  This experiment, positive or
negative, will become part of that body of experimental evidence.

I also think that it is fair to warn you that you are opening yourself up for
a lot of criticism if the experiment happens to come out positive.  I suspect
from your previous postings that you don't think this will happen.  It could,
however, if only by a 1-in-20 coincidence.  People may accuse you of cheating,
or of making the stupidest, most obvious errors.  They will feel completely
justified in doing this with absolute confidence, no perhapses, maybes or
mights, because the other alternatives are seen as "impossible".

Good luck.

	    Topher

mann@LaBrea.ARPA (07/06/85)

> The highest score over the 95% confidence level will be awarded a
> tasteful brass plaque.
 
As I'm sure you recognize, if you get 100 responses, the expected number of
scores that are significantly high at the 95% confidence level is 5, even if
no one has any ESP at all.  So I hope you have a good source for a cheap (but
tasteful) brass plaque -- you'll be awarding it.

More seriously, this points up a real pitfall that many ESP researchers have
fallen into.  They know enough about statistics and scientific practice to
get excited about a result that is "significant at the 95% level", and even
more excited about one that's significant at the 99% level.  So they do
hundreds of experiments, and test each one's significance individually.  Lo
and behold, about 1 in 100 experiments is "significant" at the 99% level, so
they report it.  Of course, this is just what would be expected by chance.
"Significant at the 99% level" means there is only 1 chance in 100 of the
given outcome having occured entirely by chance.  If most or all your
experiements are significant at this level, or (far better) if all your
experiments taken together are significant at this level, you are really 
on to something.  If only 1 in 100 is "significant", you have merely
observed the laws of probability in action.
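
Tim's arithmetic can be checked exactly (a modern sketch, not part of
the original exchange): for 500 guesses at 1-in-10 chance, find the
smallest hit count whose upper-tail probability is under 5%, then the
expected number of chance-only "winners" among 100 respondents.

```python
# Exact binomial tail for the 500-digit test, and the multiple-
# comparisons point: with 100 null respondents, close to 5 of them will
# clear the individual 95% significance bar by chance alone.
from math import comb

def upper_tail(k, n=500, p=0.1):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i)
               for i in range(k, n + 1))

# Smallest hit count that is "significant at the 95% level".
critical = next(k for k in range(501) if upper_tail(k) < 0.05)
alpha = upper_tail(critical)      # actual tail probability, just under 5%
expected_winners = 100 * alpha    # expected false "winners" in 100 runs
```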

	--Tim