brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (11/14/90)
Outline of this and my next two articles: 1. A more precise proof of the assertion in the subject line. 2. Responses to Misha's statements. 3. A repost of an article showing how to solve the halting problem in practice. Yes, it *can* be done. Here's a more precise proof (with illustrative examples) of the assertion in the subject line. I dislike unnecessary formality, but I will post more detailed explanations of any parts that people don't accept. Suppose there exists an algorithm which, given any optimal Pascal program P, can determine the optimal pointer version of P. By ``Pascal'' I refer to a general class of computational models, in which each operation takes constant time, and in which you may *only* refer to x[a] given a pointer to x[0] and given a. There is *no way* to refer to x[a] through a pointer to x[b]. There is *no way* to store the address of x[a] except when a is 0. By ``pointer version'' I refer to the translation of a Pascal program into a more general class of computational models, in which you *can* refer to x[a] through a pointer to x[b], given a - b, and you *can* store a pointer to any x[a]. (If you want to think about this as array shifting rather than C or machine-language pointers, go ahead.) By ``optimal'' I mean using minimum time and memory, with certain weights for each. We assume that the machine uses bounded memory (outside the program, which is, of course, finite). We assume that any storable constant may be precomputed in zero time. We also assume that the number of intermediate results that can be ``remembered'' at any one time is fixed (i.e., finitely many registers). For instance, consider this Pascal computation: fix array x read a read x[0], x[1], x[2] into array x calculate x[1] + x[a] The calculation takes time for each of the following: load the address of x[0] (constant) calculate the address of x[1] from the address of x[0] load x[1] calculate the address of x[a] from a and the address of x[0] load x[a] add x[1] to x[a] Different pointer versions are possible: load the address of x[0] (constant) load the address of x[1] (constant) calculate a - 1 calculate the address of x[a] from a - 1 and the address of x[1] load x[1] load x[a] add x[1] to x[a] Notice that in this second version, the first precomputation (of the address of x[0]) is unnecessary. Can everyone extrapolate from these examples to a reasonably well-defined mental model? If not, my apologies; please read chapter 1 in Knuth. If that doesn't help, ask me through e-mail and I'll try to explain in more detail. Can everyone accept this as being reasonably realistic---i.e., optimizing in the pointer model is reasonably close to a real optimization problem on many real machines? Okay. Now back to the main proof. We have an optimal Pascal program P. It increments a counter after each (constant-time) statement. It uses an array, x, with values read from the input, not used except as described below. Before exiting, the program checks the value of the counter. If the counter is even, it executes the following (optimal) code: load the address of x[0] (constant) load x[0] If the counter is odd, it executes the following (optimal) code: load the address of x[0] (constant) find the address of x[1] from that of x[0] load x[1] It then uses whichever value it has loaded in some optimal way. Now by hypothesis, there exists an algorithm that will produce the optimal pointer version of P. Let it do so. Note that the pointer version may precompute the address of x[1]. This speeds up the code when the counter is odd. Suppose that the counter will be even with high probability (not depending on the input). We assume this probability is high enough that precomputing the address of x[1] is not worthwhile (remember that minimum space and minimum memory come with whatever weights). Then the optimal version is the same as the Pascal version. Suppose that the counter will be odd with high probability (not depending on the input). Then the optimal version is as above, but with x[0] and x[1] flipped (as cannot be done in the Pascal version). So the address of x[1] is precomputed, and the address of x[0] is computed from it when it is needed. But remember our initial assumption. Somehow an algorithm has produced an optimal pointer version of P. We can conclude that an algorithm can detect the fact that the counter will be even with high probability (if indeed it is), as well as the fact that it will be odd with high probability (if indeed it is), again independent of the input, for any program. The clincher is that such an algorithm is known not to exist. In fact, it has been proven impossible for an algorithm to decide certain wide classes of statistical statements about the output of a given program (knowing that the program will, in fact, produce an output). I don't know how to prove this, but it seems intuitive that you shouldn't be able to feed some numerical code into The Great Positivity Program and have it tell you that the output will ever be positive (or negative, or whatever), even if you happen to know that the particular program's output will *always* be positive (or negative, or whatever). Philosophical comments in the next article. One request: If you're going to claim that I've made a mistake, POINT OUT THE MISTAKE! When I decide to make a mathematical statement, spend some time verifying its truth, make sure that I can convince a few people in the field, and then post an outline of the proof, it's absolutely sickening to have someone say ``you're wrong'' without bothering to go to the same level of detail and show me the error. I make careless mistakes; I'm grateful when someone criticizes constructively and I learn something. I'm angry only at myself when I don't explain well enough and someone misunderstands the point I'm making. But I'm disgusted when a computer scientist doesn't even bother to think. ---Dan