chris@mimsy.UUCP (Chris Torek) (07/03/88)
Now that I have it in front of me:
Second International Converence on Architectural Support
for Programming Languages and Operating Systems (ASPLOS II)
October 5-8, 1987
Palo Alto, California
Computer Society Order Number 805
Library of Congress Number 87-80438
IEEE Catalog Number 87CH2440-6
ISBN 0-8186-0805-6
SAN 264-620X
ACM Order Number 556870
ISBN 0-89791-238-1
(no, I have no idea why there are two ISBNs)
The paper itself is (pp. 122--126):
Superoptimizer -- A Look at the Smallest Program
Henry Massalin
Department of Computer Science
Columbia University
New York, NY 10027
ABSTRACT
Given an instruction set, the superoptimizer finds the shortest
program to compute a function. Startling programs have been
generated, many of them engaging in convoluted bit-fiddling
bearing little resemblance to the source programs which defined
the functions. The key idea in the superoptimizer is a
probabilistic test that makes exhaustive searches practical for
programs of useful size. The search space is defined by the
processor's instruction set, which may include the whole set,
but is typically restricted to a subset. By constraining the
instructions and observing the effect on the output program,
one can gain insight into the design of instruction sets. In
addition, superoptimized programs may be used by peephole
optimizers to improve the quality of generated code, or by
assembly language programmers to improve manually written
code.
[Here it seems appropriate to attach the ACM copyright:
Permission to copy without fee all or part of this material is
granted provided that the copies are not made or distributed for
direct commercial advantage, the ACM copyright notice and the title
of the publication and its date appear, and notice is given that
copying is by permission of the Association for Computing
Machinery. To copy otherwise, or to republish, requires a fee
and/or specific permission.
(c) 1987 ACM 0-89791-238-1/87/1000-0122 $00.75
`$00.75'? :-) ]
I think it is important to note section 4.1, Current Limitations:
Even with the probabilistic test, the exhaustive search still grows
exponentially with the number of instructions in the generated
program. The current version of superoptimizer has generated
programs 12 instructions long in several hours running time on a
16MHz 68020 computer. Therefore, the superoptimizer has limited
usefulness as a code generator for a compiler.
Here are the 68020 routines for the various functions (I use the MIT/Sun
mnemonics below; I like them better):
signum:
| x in d0:
addl d0,d0
subxl d1,d1
negxl d0
addxl d1,d1
| signum(x) in d1
abs:
| x in d0:
movl d0,d1
addl d1,d1
subxl d1,d1
eorl d1,d0
subl d1,d0
| abs(x) in d0
max:
| d0=X, d1=Y
subl d1,d0
subxl d2,d2
orl d2,d0
addxl d1,d0
| max(X,Y) in d0
min:
| d0=X, d1=Y
subl d1,d0
subxl d2,d2
andl d2,d0
addl d1,d0
| min(X,Y) in d0
minmax:
| d0=X, d1=Y
subl d1,d0
subxl d2,d2
andl d0,d2
eorl d2,d0
addl d1,d0
addl d2,d1
| d0=max(X,Y); d1=min(X,Y)
logicals:
| X in d0
negl d0
subxl d0,d0
| d0 = X==0 ? 0 : -1
negl d0
| d0 = X==0 ? 0 : 1
| X in d0
negl d0
subxl d0,d0
notl d0
| d0 = X==0 ? -1 : 0
| X in d0
negl d0
subxl d0,d0
addql #1,d0
| d0 = X==0 ? 1 : 0
8 digit BCD to binary (done in three steps by Superoptimizer):
| 8 digit BCD number in d0
movl d0,d1
andl #0xf0f0f0f0,d1
lsrl #3,d1
subl d1,d0
subl d1,d0
subl d1,d0
movl d0,d1
andl #0xff00ff00,d1
lsrl #1,d1
subl d1,d0
lsrl #2,d1
subl d1,d0
lsrl #3,d1
addl d1,d0
movl d0,d1
swap d1
mulu #0xd8f0,d1
subl d1,d0
| binary equivalent in d0
Multiplication by constants:
| d0 *= 29
movl d0,d1
lsll #4,d0
subl d1,d0
addl d0,d0
subl d1,d0
| d0 *= 39
movl d0,d1
lsll #2,d0
addl d1,d0
lsll #3,d0
subl d1,d0
| d0 *= 156
movl d0,d1
lsll #2,d1
addl d1,d0
lsll #5,d0
subl d1,d0
| d0 *= 625
movl d0,d1
lsll #2,d0
addl d1,d0
lsll #3,d0
subl d1,d0
lsll #4,d0
addl d1,d0
and an interesting note on division:
A general divide by constant that works for all 32-bit arguments is
too long to realize any time gain over the divide instruction, and
is certainly not shorter. Additionally, there doesn't seem to be
any nifty arithmetic-logical operations [sic] that simplify the
process. The generated programs just multiply by the reciprocal of
the constant. Since we do an exhaustive search, this negative
result can be seen as a confirmation of the inherent high cost of
divisions for the instruction sets [68020 and 80?86] considered.
--
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain: chris@mimsy.umd.edu Path: uunet!mimsy!chris