[comp.lang.misc] Array references cannot be made optimal by an algorithm

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (11/14/90)
Outline of this and my next two articles:

1. A more precise proof of the assertion in the subject line.
2. Responses to Misha's statements.
3. A repost of an article showing how to solve the halting problem in
   practice. Yes, it *can* be done.

Here's a more precise proof (with illustrative examples) of the
assertion in the subject line. I dislike unnecessary formality, but I
will post more detailed explanations of any parts that people don't
accept.

Suppose there exists an algorithm which, given any optimal Pascal
program P, can determine the optimal pointer version of P.

By ``Pascal'' I refer to a general class of computational models, in
which each operation takes constant time, and in which you may *only*
refer to x[a] given a pointer to x[0] and given a. There is *no way* to
refer to x[a] through a pointer to x[b]. There is *no way* to store the
address of x[a] except when a is 0.

By ``pointer version'' I refer to the translation of a Pascal program
into a more general class of computational models, in which you *can*
refer to x[a] through a pointer to x[b], given a - b, and you *can*
store a pointer to any x[a]. (If you want to think about this as array
shifting rather than C or machine-language pointers, go ahead.)

By ``optimal'' I mean using minimum time and memory, with certain
weights for each. We assume that the machine uses bounded memory
(outside the program, which is, of course, finite). We assume that any
storable constant may be precomputed in zero time. We also assume that
the number of intermediate results that can be ``remembered'' at any one
time is fixed (i.e., finitely many registers).

For instance, consider this Pascal computation:

  fix array x
  read a
  read x[0], x[1], x[2] into array x
  calculate x[1] + x[a]

The calculation takes time for each of the following:

  load the address of x[0] (constant)
  calculate the address of x[1] from the address of x[0]
  load x[1]
  calculate the address of x[a] from a and the address of x[0]
  load x[a]
  add x[1] to x[a]

Different pointer versions are possible:

  load the address of x[0] (constant)
  load the address of x[1] (constant)
  calculate a - 1
  calculate the address of x[a] from a - 1 and the address of x[1]
  load x[1]
  load x[a]
  add x[1] to x[a]

Notice that in this second version, the first precomputation (of the
address of x[0]) is unnecessary.

Can everyone extrapolate from these examples to a reasonably
well-defined mental model? If not, my apologies; please read chapter 1
in Knuth. If that doesn't help, ask me through e-mail and I'll try to
explain in more detail. Can everyone accept this as being reasonably
realistic---i.e., optimizing in the pointer model is reasonably close to
a real optimization problem on many real machines?

Okay. Now back to the main proof. We have an optimal Pascal program P.
It increments a counter after each (constant-time) statement. It uses an
array, x, with values read from the input, not used except as described
below.

Before exiting, the program checks the value of the counter. If the
counter is even, it executes the following (optimal) code:

  load the address of x[0] (constant)
  load x[0]

If the counter is odd, it executes the following (optimal) code:

  load the address of x[0] (constant)
  find the address of x[1] from that of x[0]
  load x[1]

It then uses whichever value it has loaded in some optimal way.
  
Now by hypothesis, there exists an algorithm that will produce the
optimal pointer version of P. Let it do so.

Note that the pointer version may precompute the address of x[1]. This
speeds up the code when the counter is odd.

Suppose that the counter will be even with high probability (not
depending on the input). We assume this probability is high enough that
precomputing the address of x[1] is not worthwhile (remember that
minimum space and minimum memory come with whatever weights). Then the
optimal version is the same as the Pascal version.

Suppose that the counter will be odd with high probability (not
depending on the input). Then the optimal version is as above, but with
x[0] and x[1] flipped (as cannot be done in the Pascal version). So the
address of x[1] is precomputed, and the address of x[0] is computed from
it when it is needed.

But remember our initial assumption. Somehow an algorithm has produced
an optimal pointer version of P. We can conclude that an algorithm can
detect the fact that the counter will be even with high probability (if
indeed it is), as well as the fact that it will be odd with high
probability (if indeed it is), again independent of the input, for any
program.

The clincher is that such an algorithm is known not to exist. In fact,
it has been proven impossible for an algorithm to decide certain wide
classes of statistical statements about the output of a given program
(knowing that the program will, in fact, produce an output). I don't
know how to prove this, but it seems intuitive that you shouldn't be
able to feed some numerical code into The Great Positivity Program and
have it tell you that the output will ever be positive (or negative, or
whatever), even if you happen to know that the particular program's
output will *always* be positive (or negative, or whatever).

Philosophical comments in the next article. One request: If you're going
to claim that I've made a mistake, POINT OUT THE MISTAKE! When I decide
to make a mathematical statement, spend some time verifying its truth,
make sure that I can convince a few people in the field, and then post
an outline of the proof, it's absolutely sickening to have someone say
``you're wrong'' without bothering to go to the same level of detail and
show me the error. I make careless mistakes; I'm grateful when someone
criticizes constructively and I learn something. I'm angry only at
myself when I don't explain well enough and someone misunderstands the
point I'm making. But I'm disgusted when a computer scientist doesn't
even bother to think.

---Dan