[comp.arch] What the compiler won't do for you

chased@rbbb.Eng.Sun.COM (David Chase) (02/27/91)

(Even though I'm replying to Dan, I'll be polite.  Truth is stranger
than fiction, eh?)

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
>Observation 2: Whoever writes the optimizer for FUBAR---let's call this
>guy Natas---*could* make every single FUBAR instruction available from
>Z. All he has to do is make sure that there's *some* language construct,
>perhaps ridiculously convoluted, that will compile to each instruction.

>Observation 3: Natas rarely does this. I have yet to see any compiler
>understand any portable expression of Hamming weights, even though this
>wouldn't be very difficult. Even in the occasional case where the
>compiler writer shows some evidence of having used the assembly language
>he's compiling for, he rarely tells programmers what the optimizer can
>and can't do.

>Let's assume that Natas is not such a devil, and manages not only to
>give his optimizer some way of using FUBAR's CXi operation and some way
>of using FUBAR's DREM operation, but also to *document* (gasp) the
>corresponding expressions in Z. Now Joe can write portable Z code that
>will run fast on FUBAR, taking advantage of the chip's instructions. All
>he has to do is follow Natas's instructions.

There's one problem with this: finite resources (if you or Herman
Rubin wants to pony up a dump-truck full of cash, maybe we can talk,
but I'll bet your resources are finite, too).  I think you'll agree
that no compiler writer should spend time optimizing for the
machine-specific case until most of the optimization has been done for
the portable (standard-conforming) case.  So, after we've taken care
of reduction in strength, global value numbering, constant
propagation, redundancy elimination, loop invariant code motion,
instruction selection, register allocation, scheduling, loop
unrolling, loop straightening, peephole improvements, software
pipelining, linear function test replacement, loop fusion,
stripmining, blocking, dead code elimination, tail call elimination,
leaf routine optimization -- oops, better algorithms appeared, so we
have to reimplement a couple of those -- oops, new machine, time to
fool around with the scheduling and instruction selection some more --
oops, time to do some fancy code placement to help out the cache --
oops, time to do interprocedural optimization.  I think you get the
picture.  Always, optimization efforts have to be directed towards
those things that will make the largest number of present and future
customers happy.  When all the work is done for C, Fortran, Pascal,
Modula, and C++, is it better to expose non-portable machine-specific
optimizations to the programmer, or should we look at extending the
optimizer to be useful to Lisp, Ada, Cobol, RPG-III, Prolog, and
Jovial?  Maybe we'd make people happier if the optimizer ran twice as
fast, or in half the memory.  Maybe we should make use of some of the
dataflow algorithms to provide debugging feedback to the user
("there's no exit from this loop; perhaps it won't terminate?")

Another difficulty with the scheme that you describe is that you
really don't want people to be writing their code in strange little
idioms because that will engage some magic widget in the optimizer.
It's portable, but weird.  That doesn't help readability or
debuggability, and it may not be portable into the future.  If someone
rewrites the optimizer, the last thing they want to do is support
weird little hacks like that.  If you must (and some people must),
then use something based on subroutine calls.  That has the advantage
that (1) if no machine support exists, it is easy to plug in portable
code and (2) machine support often exists for assembly-language
implementations (for instance, say "man inline" on a Sun).
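
For concreteness, here's a sketch of what I have in mind (the name and
the portable body are made up for this example, not anything we ship):
publish one entry point with a portable fallback, and on machines that
have the instruction, substitute an assembly-language inline version
under the same name.

    /* Portable fallback for a population-count (Hamming weight)
       routine.  The name bit_count is invented for this sketch; a
       vendor could supply an inline assembly version with the same
       calling sequence, and callers never change. */
    unsigned bit_count(unsigned x)
    {
        unsigned n = 0;

        while (x != 0) {
            n += x & 1;
            x >>= 1;
        }
        return n;
    }

Callers that care about speed get the fast version wherever one exists;
everyone else gets correct, portable code.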

Of course, there are some things that "ought" to be recognized because
they are portable, but it still isn't clear that they are needed, or
worth the cost.  For instance, on the Sparc we *could* spill register
%i7 (return PC) and use that register for other purposes in the main
body of a subroutine.  Fine, except that we'd break all the debuggers,
including adb.  That adds big costs to the final debugging that goes
on before a product (ours, or some software vendor's) is
shipped.  Probably not worth it.

David Chase
Sun

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (02/27/91)

In article <8658@exodus.Eng.Sun.COM> chased@rbbb.Eng.Sun.COM (David Chase) writes:
> There's one problem with this: finite resources

Look, Dave. Say you have a chip with 100 instructions, of which you use
90 in your compiler output. Pick one of the ten left: DREM, perhaps. How
long would it take you to write a DREM function in portable C? So you
write it just *one* way, and get your optimizer to understand that *one*
way, and document that *one* way. How long does this take? An hour,
perhaps, depending on the structure of your compiler. Can you spare 10
hours to make your compiler vastly more useful?
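
To make that concrete (a sketch only; I'm reading DREM as a combined
divide-and-remainder operation, and the exact routine you bless is of
course your choice), the documented form could be as small as this:

    /* One canonical, portable form -- illustrative, not sacred.  An
       optimizer that recognizes exactly this routine can turn any
       call to it into a single divide-with-remainder instruction;
       everywhere else it just runs as ordinary C. */
    void divrem(int a, int b, int *quot, int *rem)
    {
        *quot = a / b;
        *rem  = a % b;
    }

Everyone copies it verbatim out of your manual, and nobody touches
assembly language.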

> Another difficulty with the scheme that you describe is that you
> really don't want people to be writing their code in strange little
> idioms because that will engage some magic widget in the optimizer.
> It's portable, but weird.  That doesn't help readability or
> debuggability, and it may not be portable into the future.

Maybe you didn't understand what I said. By definition, everything done
here is portable. If your compiler understands <shazam!> to be DREM, and
<shazam!> doesn't work the same on another machine, then your compiler
is simply incorrect. Programmers need to write efficient code that
*will* be portable into the future, and my scheme is designed to allow
this.

As for maintainability... If the programmer *needs* Hamming weights,
then he's going to write the code for it anyway. Adding an equivalent
section of code for a different machine won't hurt the maintainability
of the original. In fact, I think a pick-this-or-that construct would
*help* maintainability, as it would let the programmer display his
original, comprehensible code along with each transformed version.

Do you have some religious argument against ``strange little idioms''?
Well, it's the compiler writer's fault---your fault---if the idioms are
strange. What's important to me is that they be portable. If you want to
cure the ``strangeness,'' publish a standard for what you consider to be
the right idiom, as I suggested before.

> If you must (and some people must),
> then use something based on subroutine calls.

That's fine. ``If you write the following routine: int sumones(x) int x;
{ int a; int b; <blah, blah, blah> } then our optimizer will convert any
call to sumones() into a CX instruction. You may only change the names
sumones, x, a, and b; you must use exactly the order of instructions
shown here, or our optimizer will not recognize the routine.''

This is all I'm asking for. Surely your compiler can do fixed
pattern-matching without trouble? Is this such a ``weird little hack''?

---Dan

chased@rbbb.Eng.Sun.COM (David Chase) (02/27/91)

1) I prefer David, not Dave.

2) My time is still better spent doing general-purpose optimizations.

3) For hysterical reasons, an optimizer pre-mutilates the code, so
idioms may get transformed beyond recognition in a context-dependent
way.

4) There is ALREADY a tool that will do most of what you want.  Say
"man inline" on some nearby Sun machine running a recent rev of the
operating system.  "f77 -v -fast ..." might provide some guidance.
Given that an 80% solution exists, that makes solving the problem
again even less profitable.

5) Any "idiom" designed to trigger use of a specific instruction may
be portable for correctness, but it will not be portable for speed in
future releases of the compiler unless care is taken.  Verifying that
it continues to provide the same function originally promised to
customers may well take more time than it took to add it in the first
place.

6) Since we have a multi-language optimizer, we might choose to do it
for some language other than C first.  Fortran comes to mind.

>> If you must (and some people must),
>> then use something based on subroutine calls.
>
>That's fine. ``If you write the following routine: int sumones(x) int x;
>{ int a; int b; <blah, blah, blah> } then our optimizer will convert any
>call to sumones() into a CX instruction.

7) No, it is simpler to use the existing inliner tool to inline
assembly language, and the results are more certain. If there is
tremendous demand, it goes into a library.  Portable into the future,
and not dependent on the behavior of the optimizer.

8) Need is demonstrated with dollars.  Other people back up their
requests (sometimes unreasonable requests) with dollars, so they get
serious attention paid to them.

David Chase
Sun

tom@ssd.csd.harris.com (Tom Horsley) (02/27/91)

>>>>> Regarding Re: What the compiler won't do for you; brnstnd@kramden.acf.nyu.edu (Dan Bernstein) adds:

brnstnd> Look, David, I cannot afford to write any large number of routines in
brnstnd> assembly language. Most of my code must work on several different
brnstnd> platforms without excessive rewriting. I have the time to write an
brnstnd> occasional routine in assembly language for a few machines, but this is
brnstnd> simply not cost-effective most of the time.

Look, Dan. Say you have to port your code to 100 machines, of which 90 don't
have machine instructions for DREM anyway. How long would it take you to
write a DREM function in portable C? So you write it just *one* way, and use
it on those 90 machines. For the 10 machines that do have some sort of DREM
instruction, you hand code an assembly routine. How long does this take? An
hour, perhaps, depending on the machine instruction you need to execute. Can
you spare 10 hours to make your application run on 10 machines?
--
======================================================================
domain: tahorsley@csd.harris.com       USMail: Tom Horsley
  uucp: ...!uunet!hcx1!tahorsley               511 Kingbird Circle
                                               Delray Beach, FL  33444
+==== Censorship is the only form of Obscenity ======================+
|     (Wait, I forgot government tobacco subsidies...)               |
+====================================================================+

chased@rbbb.Eng.Sun.COM (David Chase) (02/28/91)

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
>In the article you responded to, David, I proposed a language construct
>to let the compiler choose one of several sections of code. It is f'ing
>obvious that this makes the program ``portable for speed.'' It is f'ing
>obvious that I proposed the construct *exactly* because I am worried
>about efficiency, and because I am trying to *reduce* the work of the
>compiler writer and programmer in making code efficient. The only way
>now to make portable code efficient is to spend years communicating with
>language designers and chip writers. I want to reduce that time.

Dan, you're missing the forest for the trees, several times over.
Even if the general idea that you propose were worthwhile, your scheme
for accessing special-purpose assembly language instructions from
portable code is too complicated.  It is gratuitously difficult for
both the compiler writer and the programmer.  Compare:

Dan's method:

compiler writer writes recognizer for special code sequences.
compiler writer documents these code sequences.
programmer reads the code sequences.
programmer writes these code sequences.

Simpler method:

compiler writer writes library of inlineable code.
compiler writer documents names of routines that are in this library.
programmer uses those names.
if you insist, compiler writer writes portable versions of these
   subroutines, though programmer would have had to do it anyway in
   your method (working from documentation).

No need to recognize complicated code sequences, and easier to use.
The inliner sucks in the code automagically.  Writing the portable
version is no harder than figuring out what pattern is recognized, but
you don't have to write a pattern recognizer, OR impose restrictions
on what transformations might be performed by the optimizer now or in
the future that might complicate pattern recognition (yes,
straight-line code can get oddly mutilated).  No need to promise that
in the future, a special chunk of code will result in the use of a
special instruction.

Now, no doubt in your next twisted missive you will claim that is
exactly what you proposed, since, of course, a procedure call is a
special case of a code sequence.  This behavior is already supported
by existing compilers -- there's several people who write inlineable
routines for Fortran -- so if this is what you meant, then you are
ignorant of the state of current tools.  Since that cannot possibly be
the case, I conclude that this is not what you meant.

In addition, as I said before, there is ample evidence that time is
better spent doing a good job of optimizing code in a general way.  If
it paid to do the things that you propose, it would have showed up in
compiler tuning.  People do look at the code that is generated, and
try to come up with ways to make it better.  Somehow, nothing like
your idea even made it onto the list, let alone near the top.

Perhaps you have an exaggerated opinion of your intelligence and the
importance of your problems.  Just a guess.  By the way, is
abbreviating "fucking" into "f'ing" consider politeness on the net?
Thanks for being so considerate.

Your pal,

David Chase
Sun

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (02/28/91)

In article <8787@exodus.Eng.Sun.COM> chased@rbbb.Eng.Sun.COM (David Chase) writes:
> Simpler method:
> compiler writer writes library of inlineable code.
> compiler writer documents names of routines that are in this library.
> programmer uses those names.

I give up. The man just refuses to see that this does NOT produce
portable code. Sun writes one library, and DEC writes another one, and
Convex writes another one, and a programmer who uses these libraries
will NOT be able to run his programs on more than one machine.

> No need to promise that
> in the future, a special chunk of code will result in the use of a
> special instruction.

Nor does he see that the compiler writer NEVER has to make such a
promise. Again it is obvious that his viewpoint is twisted by his
Sun-specific environment. Too bad for him that the rest of the world
uses hundreds of different architectures.

If David opens his eyes he will see that what I am proposing will ALWAYS
improve the portability of efficient code. But because this requires a
bit of work on the compiler-writer's part, his eyes will remain closed.

> In addition, as I said before, there is ample evidence that time is
> better spent doing a good job of optimizing code in a general way.  If
> it paid to do the things that you propose, it would have showed up in
> compiler tuning.  People do look at the code that is generated, and
> try to come up with ways to make it better.  Somehow, nothing like
> your idea even made it onto the list, let alone near the top.

Actually, David, I found out from e-mail that essentially the same ideas
have been published before, in many separate papers.

I still find it amazing that David can draw a line between optimizing
*(p+a) into a single instruction on a Convex, optimizing c = a/b and
d = a%b into a single instruction on most CISC chips, and optimizing a
short Hamming weight function into a single instruction on a Cyber.

---Dan

chased@rbbb.Eng.Sun.COM (David Chase) (02/28/91)

 brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
>In article <8787@exodus.Eng.Sun.COM> chased@rbbb.Eng.Sun.COM (David Chase) writes:
>> Simpler method:
>> compiler writer writes library of inlineable code.
>> compiler writer documents names of routines that are in this library.
>> programmer uses those names.
>
>I give up. The man just refuses to see that this does NOT produce
>portable code. Sun writes one library, and DEC writes another one, and
>Convex writes another one, and a programmer who uses these libraries
>will NOT be able to run his programs on more than one machine.

I believe you forgot to quote the next line that said "if you insist,
provide portable implementations of the subroutines in that library".
Ergo, portable code.  Or, the programmer can write the portable code
him- or herself; that's no harder than your atrocity.  No harder for
programmers, simpler for compiler, easier to maintain, means BETTER.
Furthermore, note that your scheme for triggering special instructions
through the use of special code sequences doesn't do anything for all
the code that has already been written.  No win means not better.

>Nor does he see that the compiler writer NEVER has to make such a
>promise.

If the documentation says "do this to get the compiler to emit the
splartzfooie instruction", that's a promise.  It seems to be the case
that a non-trivial number of people with money will interpret ANY
discovered behavior as a promise, so imagine the significance of
documentation.

>Actually, David, I found out from e-mail that essentially the same ideas
>have been published before, in many separate papers.

References, please.  I'd love to be enlightened.

>I still find it amazing that David can draw a line between optimizing
>*(p+a) into a single instruction on a Convex, optimizing c = a/b and
>d = a%b into a single instruction on most CISC chips, and optimizing a
>short Hamming weight function into a single instruction on a Cyber.

It's an easy distinction to make.  I'll bet you'd loan someone a dime,
and not be too upset about not getting it back.  I'll also bet that
you wouldn't feel the same way about $100.  Same difference -- LOTS of
code can benefit from recognizing *(p+a), AND it is easy to recognize
using any number of fast algorithms (it is a simple tree pattern, and
no unification is required).  Recognizing that two divisions can be
compressed into one is not so automatic, but it matters sometimes, and
it isn't too hard to catch most cases, given a basic block that's had
value numbers applied to it.  This last case occurs far less often,
and I think it's harder to recognize (I'd have to go look up what a
Hamming weight function is.  If it involves a loop, it is harder).
Given the low return for high investment, why bother?

So, the line is easy to draw:  profitable, probably profitable,
probably unprofitable.  You and Herman Rubin are not a large enough
market to justify more than a couple of minutes of work, especially
given that there is an alternative.  There are far better wild geese
to chase.

[Lest anyone wonders why I persist, I'm practicing for the day when I
have to deal with a two-year old.]

David Chase
Sun

preston@ariel.rice.edu (Preston Briggs) (03/01/91)

> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
>>I found out from e-mail that essentially the same ideas
>>have been published before, in many separate papers.
>
>References, please.  I'd love to be enlightened.

Good ref's that many people should read include papers describing
the ECS project undertaken at IBM Yorktown in the mid-70's.
ECS was Experimental Compiling System.  Very roughly, it
worked like this:

	They defined an intermediate language (IL).
	This was fairly interesting by itself, and rather more
	extensive and complete than the norm.

	They defined (built?) an extensive IL optimizer

	Front-ends produced IL
	Libraries were kept in IL form

	There was the potential for extensive inlining
	with reoptimization (and more inlining, ...)

	New operations could be added by providing appropriate IL
	definitions.  With adequate inlining, constant propagation,
	type propagation, etc... these new operations could be
	extensively optimized (examples included string concatenation,
	which is trickier than many imagine).

	I have only a few references.

	"A Case Study of a New Code Generation Technique for Compilers"
	J. L. Carter
	CACM, 1977

	"A New Strategy for Code Generation -- 
		The General Purpose Optimizing Compiler"
	W. Harrison
	IEEE Transactions on Software Engineering, 1977

	"The Experimental Compiling System"
	F. Allen, Carter, Fabri, Ferrante, Harrison, Loewner, Trevillyan
	IBM Journal of R & D, November 1980

Unfortunately, the project apparently died after a few years.
Why?  I dunno.  Perhaps the success of the PL.8 work.  Perhaps
it was too complex.  Perhaps it didn't work.  Perhaps funding or politics.

Personally, I was frightened by the complexity of their
intermediate language.  I also wonder how to control
the huge potential inlining.  Can compile-times be controlled?

I think many people suggesting various forms of compiler extensibility
can learn a lot by reading these guys.  In particular, the case study
paper illustrates how operations can interact and how difficult it
can be to optimize the results.

>>I still find it amazing that David can draw a line between optimizing
>>*(p+a) into a single instruction on a Convex, optimizing c = a/b and
>>d = a%b into a single instruction on most CISC chips, and optimizing a
>>short Hamming weight function into a single instruction on a Cyber.

There are significant differences between the three cases.

	The 1st is simply instruction selection based on a single tree.

	The 2nd is much trickier.  The 2 instructions don't share a common
	root, so we don't have any easy place to start looking (for
	efficient search).  They do share operands, which can be used as
	a big hint.  Hence the commonality can often be detected with a
	simple extension to value numbering (like Chase suggested).

	The extension many people are arguing for:

		c, d = a divrem b

	is much more significant (at least to languages like C or Fortran).
	Basically, you've suddenly defined a new language with significantly
	different syntax.  The front-end of the compiler will probably
	have to be chopped up pretty extensively, though the optimizer
	and code generator might survive unscathed.

	Better than a massive hack job would be to look into languages
	that allow various forms of multiple-assignment
	(Mesa, Beta, Lisp, Clu, others?).  Do some reading!
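
	(As an aside, and only for this particular operation: ANSI C
	already dodges the syntax problem.  div() and ldiv() hand back
	both results in a structure, and nothing stops a compiler from
	open-coding such a call as one divide.  A minimal reminder of
	the interface:

	    #include <stdio.h>
	    #include <stdlib.h>

	    int main(void)
	    {
	        div_t qr = div(17, 5);  /* quotient and remainder together */
	        printf("%d %d\n", qr.quot, qr.rem);   /* prints "3 2" */
	        return 0;
	    }

	Whether anything similar generalizes to other instructions is a
	different question.)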

	The 3rd is much harder again.  Recognizing all the ways
	I might code a particularly complex function as being equivalent
	to some bizarre instruction is quite difficult -- probably
	impossible.

	It's also wrong-headed, for many reasons.  Chase pointed out
	that there's only a small payoff.  Perhaps he didn't make his
	argument clearly enough.  Think of all the things that are wrong
	with some particular compiler, and prioritize the list.
	First is usually that it produces incorrect code in some
	important case.  It also compiles too slowly, uses too much memory,
	and produces slow code.  Why is the object code slow?
	Well, we didn't do good enough strength reduction and
	the value numbering has bugs.  Our dependence analysis
	can't handle subscript expressions that complex, and we
	don't know how to block loops for the TLB yet.
	Or whatever.  These matter a lot and occur over and over
	in everyone's code.

	And there you sit, working on that special hamming-weight
	function recognizer so Preston's code will run faster.
	Besides the fact that program equivalence is undecidable in general,
	Preston suddenly decides to change his code a little
	or the architects redefine how the special hamming-instruction
	works in the first place (say, how one of the condition
	code bits is set).

	Besides, is the instruction faster than those cute little
	RISCish instructions anyway?  Especially when the optimizer
	realizes that you've already done half a hamming-thing earlier
	and that it needn't repeat some of the instructions.

>>I give up.

Good idea.  One of the cool things about the net is that it provides a way
we can ask questions and get quick answers.  For example, I can ask
about the MIPS I-cache and usually get a precise answer from someone
who knows.  It isn't helpful to tell John Mashey he's wrong about
how his machine works!  Similarly, you aren't going to win arguments
with David (or Ben) Chase about how compilers work.

khb@chiba.Eng.Sun.COM (Keith Bierman fpgroup) (03/01/91)

In article <24280:Feb2802:55:4991@kramden.acf.nyu.edu> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:

...
   If David opens his eyes he will see that what I am proposing will ALWAYS
   improve the portability of efficient code. But because this requires a
   bit of work on the compiler-writer's part, his eyes will remain
   closed.
...

I question whose eyes are closed here. Exactly how many compiler
writers do you expect to conform to your new ad hoc standard?  In what
timeframe?  If you rely exclusively on the kindness of compiler
writers, you will have zero portability.

Methods which start with class libraries (modules, etc.) are
completely portable.  Performance can be dialed in over time, and if
need be, compilers can be altered.  Portability is maximal, as even old
systems can provide the required functionality.

This is precisely why language features such as modules and operator
overloading were put into ISO DIS 1539 Fortran.
--
----------------------------------------------------------------
Keith H. Bierman    kbierman@Eng.Sun.COM | khb@chiba.Eng.Sun.COM
SMI 2550 Garcia 12-33			 | (415 336 2648)   
    Mountain View, CA 94043

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (03/01/91)

In article <KHB.91Feb28184824@chiba.Eng.Sun.COM> khb@chiba.Eng.Sun.COM (Keith Bierman fpgroup) writes:
> I question whose eyes are closed here. Exactly how many compiler
> writers do you expect to conform to your new ad hoc standard ?

Huh? It's purely a quality-of-implementation issue. Someone whose
compiler *cannot* produce DREM or CX or any other available instruction
has a poor compiler. And someone with a better compiler had better
document the idioms that his optimizer will understand, or programmers
won't be able to take advantage of the feature.

Are you claiming that a compiler that *can't* produce DREM on a VAX is
better than a compiler that *can*, all else being equal?

Sure, it takes a bit of time for the compiler writer to add this
feature. That's the disadvantage of any improvement to software: someone
has to write the code.

> If you rely exclusively on the kindness of compiler
> writers, you will have zero portability.

Huh? I've seen several responses that make the same claim. But it's
entirely wrong. The recognized idioms will typically be *different*
under different compilers, but since they're written in PORTABLE code
this is entirely irrelevant. If I write code for one idiom and use it
under a compiler that doesn't recognize the idiom, the code will still
work, and I'm certainly better off than if I had used assembly language
on the first machine.

> Methods which start with class libraries (modules etc.), are
> completely portable.

And this is completely useless. I've said this before, and I'll say it
again: I cannot wait for language designers to catch up. I need to be
able to write efficient, portable code NOW, not in ten years when a
better language is widely available.

Sure, it would be nice if C or Fortran had Hamming weights. But they
don't. So stop talking about portable libraries to solve these problems:
they don't exist.

---Dan

hrubin@pop.stat.purdue.edu (Herman Rubin) (03/01/91)

In article <5412:Mar107:10:0491@kramden.acf.nyu.edu>, brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:

			.........................

> And this is completely useless. I've said this before, and I'll say it
> again: I cannot wait for language designers to catch up. I need to be
> able to write efficient, portable code NOW, not in ten years when a
> better language is widely available.

I fully sympathize with you, but neither you nor I will be able to write
efficient, portable code NOW.  We both WANT to, but I see no reasonable
prospect of this in the near future.  

What I believe can be done in the near future is to produce a totally
non-optimizing macro translator, and let the user and compiler/assembler
do the optimizing afterwards.  This is within present knowledge.  But if
the operations one wishes to use are not even known by the language, and
cannot be added to it, even reasonable code cannot be written.

Followups to comp.lang.misc, please.  This is not a hardware problem,
although I suspect that hardware would change to take into account
these instructions, now totally disregarded.
--
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907-1399
Phone: (317)494-6054
hrubin@l.cc.purdue.edu (Internet, bitnet)   {purdue,pur-ee}!l.cc!hrubin(UUCP)

thomson@hub.toronto.edu (Brian Thomson) (03/01/91)

In article <1991Feb28.181948.26958@rice.edu> preston@ariel.rice.edu (Preston Briggs) writes:
>	The 3rd is much harder again.  Recognizing all the ways
>	I might code a particularly complex function as being equivalent
>	to some bizzarre instruction is quite difficult -- probably
>	impossible.

All the ways, yes.  But many of the ways, perhaps not.
I seem to remember running across a reference, about 10 years ago
(that's when I saw it, not necessarily when it was published),
that dealt with recognizing common sequences of operations in APL
and compiling them into particularly efficient code.
I think they called it "recognition of programming idioms" or something
of the sort.  Does this ring a bell with anyone?

>	It's also wrong-headed, for many reasons.  Chase pointed out
>	that there's only a small payoff. 

I think that, in the APL case, the payoff was somewhat higher because
the language sometimes encourages the use of powerful yet expensive
operations to accomplish relatively simple tasks.

-- 
		    Brian Thomson,	    CSRI Univ. of Toronto
		    utcsri!uthub!thomson, thomson@hub.toronto.edu

wilson@uicbert.eecs.uic.edu (Paul Wilson) (03/02/91)

thomson@hub.toronto.edu (Brian Thomson) writes:

>I seem to remember running across a reference, about 10 years ago
>(that's when I saw it, not necessarily when it was published),
>that dealt with recognizing common sequences of operations in APL
>and compiling them into particularly efficient code.
>I think they called it "recognition of programming idioms" or something
>of the sort.  Does this ring a bell with anyone?

I'm not sure we're talking about the same thing, but it rings bells with
me.  The Hewlett-Packard APL 3000 compiler did snazzy dynamic compilation,
allowing it to do optimizations of program fragments that actually
get executed, as opposed to working that hard on all of the dynamically
possible sequences.  (See Van Dyke, E.J., "A dynamic incremental compiler
for an interpretive language," Hewlett-Packard Journal, July 1977, pp.17-24.)

This is philosophically similar to Chambers & Ungar's recent work on
the Self compiler, which uses aggressive inlining and interprocedural
optimization.  It comes up with "customized" versions of code for
different kinds of call sites.  This and the inlining allow a lot
more type analysis of code for a (*very*) dynamically-typed
language.  That lets them optimize away most of the dynamic type
checks, and (importantly) optimize across things they otherwise
couldn't optimize across.  Self is very, very fast for a dynamically-typed
language.

(Self is like Smalltalk, only more so.  It doesn't even have classes,
like Smalltalk does; it uses prototypes instead.  The implementation has
an analogue of classes to get back the efficiency advantages.  See 
Chambers & Ungar, "Customization: optimizing compiler technology
for Self, a dynamically-typed object-oriented language," Proc. SIGPLAN
'89, pp. 146-160.)

It seems likely that some of the stuff from the APL 3000 compiler could
also be generalized to do a good job optimizing operations on parameterized
types and homogeneous collection types in a strongly-typed polymorphic
language like Modula-3 or Eiffel. 

  -- Paul
-- 
Paul R. Wilson                         
Software Systems Laboratory               lab ph.: (312) 996-9216
U. of Illin. at C. EECS Dept. (M/C 154)   wilson@bert.eecs.uic.edu
Box 4348   Chicago,IL 60680 

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (03/03/91)

In article <TOM.91Feb27071247@hcx2.ssd.csd.harris.com> tom@ssd.csd.harris.com (Tom Horsley) writes:
> Can
> you spare 10 hours to make your application run on 10 machines?

Oh, great, and when a library routine has a dozen things like this, and
a thousand people are using the library, you can expect that a few
hundred of them are using architectures that I've never seen. You want
each of them to spend dozens of hours just to make a library run fast?
You want the library to get slower and slower as the years go by and as
architectures change? You want to have a dozen ten-way #ifdefs that
every user has to worry about?
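
Just so we're all picturing the same thing, here is the shape every
such routine ends up taking (the machine names are placeholders, and
I've cut it to three arms instead of ten):

    unsigned sumones(unsigned x)
    {
        unsigned n;
    #ifdef VENDOR_A                   /* placeholder, not a real port */
        asm("...");                   /* vendor A's count-ones instruction */
    #elif defined(VENDOR_B)
        asm("...");                   /* vendor B's equivalent */
    #else
        for (n = 0; x != 0; x >>= 1)  /* everyone else gets the slow loop */
            n += x & 1;
    #endif
        return n;
    }

Multiply that by a dozen operations and a few hundred users and you can
see the maintenance problem.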

Contrast this with the situation I'm proposing. The compiler writers on
those 10 machines with DREM make sure to recognize and document *some*
portable idiom for the instruction. I don't have to learn any assembly
languages; I just have to copy the routines straight out of the manuals.
Since the language has a pick-this-or-that construct, I don't need *any*
#ifdefs. Since there are only so many obvious idioms for a given
operation, my code has a good chance of being perfectly efficient on an
architecture I've never seen. The users don't have to worry about a
thing. Even better, as time goes by, the code will get faster: compiler
writers will recognize more idioms, the idioms will slowly become
standardized, and the code will work efficiently without change.

---Dan

dik@cwi.nl (Dik T. Winter) (03/04/91)

Oh dear, again...

In article <23481:Mar311:05:3591@kramden.acf.nyu.edu> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
 > Contrast this with the situation I'm proposing. The compiler writers on
 > those 10 machines with DREM make sure to recognize and document *some*
 > portable idiom for the instruction. I don't have to learn any assembly
 > languages; I just have to copy the routines straight out of the manuals.
 > Since the language has a pick-this-or-that construct, I don't need *any*
 > #ifdefs. Since there are only so many obvious idioms for a given
 > operation, my code has a good chance of being perfectly efficient on an
 > architecture I've never seen. The users don't have to worry about a
 > thing. Even better, as time goes by, the code will get faster: compiler
 > writers will recognize more idioms, the idioms will slowly become
 > standardized, and the code will work efficiently without change.
 > 
Compiler writers are very inventive in the idiom they recognize.  Do not
expect two compiler writers to support the same idiom.  And even then...
Peter Montgomery and Bob Silverman do request two slightly different
routines, in order to do (in essence) the same operation.  We have the
BigNum package from INRIA/DEC, which requires a different operation again,
and next we have Henry Cohen from Bordeaux with again different requirements.
And I am doing some stuff in this field myself, and do something different
again.  If the users apparently do not have a common view, how do you think
the compiler writers will come to a common view?  I think that in view of this
it is not important to standardize on an interface with a DREM instruction
(should it operate on unsigned only, or signed only, or should both options
be present? all occur in existing hardware); it is more important to
standardize on an mp library.  (Yes, I am working on such a thing, but at this
stage my package is far from complete; although it has been ported to some
40 platforms for fixed size integers.)  More about this in my follow-up to
John Mashey.
--
dik t. winter, cwi, amsterdam, nederland
dik@cwi.nl

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (03/06/91)

In article <3061@charon.cwi.nl> dik@cwi.nl (Dik T. Winter) writes:
> Compiler writers are very inventive in the idiom they recognize.  Do not
> expect two compiler writers to support the same idiom.  And even then...
> Peter Montgomery and Bob Silverman do request two slightly different
> routines, in order to do (in essence) the same operation.

True. Since few compiler writers for the VAX have made the effort to
provide documented DREM support, nobody agrees on idioms yet, and we all
have different ideas about what good idioms would be. So what?

There doesn't have to be any agreement on idioms for my plan to work.
That's why it's such a crucial element of the plan that the idioms be
written portably (and why it's helpful to have a pick-this-or-that
control structure).

At first, instead of learning five different assembly languages and
figuring out how to write inlined assembly code under five different
compilers, the programmer will copy five routines straight out of
manuals. Even this reduction in effort would be well worth the compiler
writer's time, but that's not the only advantage. As I explained in my
previous article, as time goes by, idioms for a certain instruction can
only become more standardized, and programs written for those idioms can
only become faster. Everything will be perfectly portable to start.

> If the users apparently do not have a common view, how do you think
> the compiler writers will come to a common view?

I don't, though I imagine that standards will spring up eventually. My
whole point is that we should be able to write portable, efficient code
*now*, even with *no* agreement on standards. All it takes is a bit of
effort on the part of each compiler writer.

> it is more important to
> standardize on a mp library.

Is there anyone else here who understands why suggestions like this are
so pointless? I can't wait for the language designers to catch up and
force everyone to provide an mp library. I think it's fine that you're
writing libraries, and I fully support any efforts to standardize
libraries. But standardization is SLOW. There is a much larger problem
here, namely all the chip instructions that *aren't* supported by
languages or other standards. Why do you refuse to admit that the
programmer should be able to take advantage of those instructions
without sacrificing portability?

---Dan

msp33327@uxa.cso.uiuc.edu (Michael S. Pereckas) (03/06/91)

In <4267:Mar602:32:0591@kramden.acf.nyu.edu> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:

>In article <3061@charon.cwi.nl> dik@cwi.nl (Dik T. Winter) writes:
>> it is more important to
>> standardize on a mp library.

>Is there anyone else here who understands why suggestions like this are
>so pointless? I can't wait for the language designers to catch up and
>force everyone to provide an mp library. I think it's fine that you're
>writing libraries, and I fully support any efforts to standardize
>libraries. But standardization is SLOW. There is a much larger problem
>here, namely all the chip instructions that *aren't* supported by
>languages or other standards. Why do you refuse to admit that the
>programmer should be able to take advantage of those instructions
>without sacrificing portability?

Do you really suppose that your idiom recognition with your
pick-this-or-that and adequate documentation will become at all
widespread sooner than a new library?  Or even that it could be
implemented faster if everyone wanted to?

If the language doesn't do what you want it to, is the hack you're
describing really the way to fix it?  Even if it is workable, and
people keep telling you that it isn't, it sure isn't elegant.  It's
not that we don't think you should be able to access machine
instructions that can speed up your code, it's that we think that the
method you describe for doing so is bad and impractical.  And that it
won't come any faster than a standard library.

\begin{:-)}
Besides, it isn't ``object oriented''.  It is ``turbo,'' but turbo
went out a few years ago.  ``Hyper'' still has a bit of life left.  If
you can find a way to call it hyper, maybe you can sell it.  Name it
``fuzzy'' and it will be a hit in Japan.  At least this year.
\end{:-)}

--


Michael Pereckas               * InterNet: m-pereckas@uiuc.edu *
just another student...          (CI$: 72311,3246)
Jargon Dept.: Decoupled Architecture---sounds like the aftermath of a tornado

lamaster@pioneer.arc.nasa.gov (Hugh LaMaster) (03/07/91)

In article <8787@exodus.Eng.Sun.COM> chased@rbbb.Eng.Sun.COM (David Chase) writes:
>brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
>>In the article you responded to, David, I proposed a language construct
>>to let the compiler choose one of several sections of code. 
:
:

>Dan, you're missing the forest for the trees, several times over.

>Even if the general idea that you propose were worthwhile, your scheme
>for accessing special-purpose assembly language instructions from
>portable code is too complicated.  It is gratuitously difficult for
>both the compiler writer and the programmer.  Compare:

This may be true, but I question it.  Vectorizing compilers have been around
in usable form for about a decade.  Before that, some compilers recognized
certain loops and variations to emit specific code sequences (e.g. "Stacklib"
type loops).  (In unusable form, they have been around for two decades :-)   )

Note that vectorizing compilers DO recognize specific multiple statement
language constructs and, sometimes, optimize these constructs down to
a simple code sequence, or, in some cases, even a single instruction.


>Dan's method:
:
>compiler writer writes recognizer for special code sequences.
>compiler writer documents these code sequences.
>programmer reads the code sequences.
>programmer writes these code sequences.

This is done every day with vectorizing compilers.  Programmers write
specific code sequences so that the vectorizer will recognize them. 
The compiler documentation, or sometimes a separate document, tells
programmers exactly how to write code so that it is recognized by
the vectorizer.

Sometimes, they even write very, very specific sequences that several
different vectorizers from several different vendors will recognize.
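
To give the flavor of it (a generic sketch, not any particular vendor's
documented form): the guides tell you to write the innermost loop as
plainly as possible -- a single counted loop, unit stride, no branches,
no cross-iteration dependences -- because that is what the vectorizer's
recognizer is guaranteed to catch.  The classic saxpy kernel is the
usual example:

    /* Written the way the vectorization guides ask for it: one counted
       loop, unit stride, no conditionals, no dependence between
       iterations, so it maps onto a vector multiply-add sequence. */
    void saxpy(int n, float a, float *x, float *y)
    {
        int i;

        for (i = 0; i < n; i++)
            y[i] = y[i] + a * x[i];
    }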

Of course, now Parallelizing compilers are the rage.

>Simpler method:
>
>compiler writer writes library of inlineable code.
>compiler writer documents names of routines that are in this library.
>programmer uses those names.
>if you insist, compiler writer writes portable versions of these
>   subroutines, though programmer would have had to do it anyway in
>   your method (working from documentation).


This method is also useful and important.  Why do they have to be
mutually exclusive?

BTW, I do insist that portable versions be available.  Note that Cray
Research provides documentation of equivalent code for many of its
scientific library subroutines.

In other words, both methods have long been done, and are not news.

What Dan is proposing is that scalar-oriented machines could benefit
from the same technique that vector machine users have used all along.  I
think there probably *are* cases which could benefit from his technique,
even on small workstations.  In fact, as I understand it, some compilers
for RISC micros *do* already recognize, for example, special code sequences
that really mean:  integer div, and remainder also.  There are probably
other cases: for example, code sequences that do multiple precision
arithmetic that could take advantage of a double precision multiply result
(anyone remember the CDC 66xx/Cyber/etc multiply?)


:
>try to come up with ways to make it better.  Somehow, nothing like
>your idea even made it onto the list, let alone near the top.

Of course, this is incorrect for vector machines.  

For scalar machines, the performance payoff is usually a lot less.  But not 
always trivial.  And, as I mentioned above, a few compilers have a few such
cases.  I have seen factors of two increase in performance with older
compilers by recognizing special code sequences and doing the optimal
thing.  This sort of thing is a lot more common in numerical simulation
type codes.  Running typical kernel code, or pointer intensive stuff,
the potential for payoff is very small with today's much better 
optimizers.  And, since some readers of this group don't want to talk
about any other kind of applications, it is no wonder that the perceived
benefit is so small   :-)




  Hugh LaMaster, M/S 233-9,  UUCP:                ames!lamaster
  NASA Ames Research Center  Internet:            lamaster@ames.arc.nasa.gov
  Moffett Field, CA 94035    With Good Mailer:    lamaster@george.arc.nasa.gov 
  Phone:  415/604-6117                            #include <std.disclaimer> 

lamaster@pioneer.arc.nasa.gov (Hugh LaMaster) (03/07/91)

In article <1991Feb28.181948.26958@rice.edu> preston@ariel.rice.edu (Preston Briggs) writes:
:
>> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
:
>>>I found out from e-mail that essentially the same ideas
>>>have been published before, in many separate papers.
:
>>References, please.  I'd love to be enlightened.

>	They defined an intermediate language (IL).

>Unfortunately, the project apparently died after a few years.
:
>Personally, I was frightened by the complexity of their
>intermediate language.  I also wonder how to control
>the huge potential inlining.  Can compile-times be controlled?

I would be curious to know if any of the IL's for vectorizing compilers,
commercial or not, have been published.  I note that several companies,
including Cray and CDC, claim to have vectorized IL's with common
code generators for multiple languages (C, Fortran, et al.)
I believe CDC has been doing this for about 4 years on the Cyber 800/900
series running their own O/S.  Cray has done it more recently with
their Fortran and C compilers.  It seems that vectorization is generally
viewed as a front-end, language-dependent process, unlike what you might
expect.

>how his machine works!  Similarly, you aren't going to win arguments
>with David (or Ben) Chase about how compilers work.

Maybe not.  (It wasn't my argument to begin with - pardon my butting my
head in).   But I think that there is more to this than is first apparent.



P.S.

I think there is, indeed, a good case for multiple assignment.
*Some* of the funny optimizations that compilers now have to deal with
are an unnatural artifact of the languages with only a single assignment
operator.

>	The extension many people are arguing for:
>
>		c, d = a divrem b
>
>	is much more significant (at least to languages like C or Fortran).
>	Basically, you've suddenly defined a new language with significantly
>	different syntax.  The front-end of the compiler will probably


  Hugh LaMaster, M/S 233-9,  UUCP:                ames!lamaster
  NASA Ames Research Center  Internet:            lamaster@ames.arc.nasa.gov
  Moffett Field, CA 94035    With Good Mailer:    lamaster@george.arc.nasa.gov 
  Phone:  415/604-6117                            #include <std.disclaimer> 

peter@ficc.ferranti.com (Peter da Silva) (03/07/91)

In article <4267:Mar602:32:0591@kramden.acf.nyu.edu> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
> At first, instead of learning five different assembly languages and
> figuring out how to write inlined assembly code under five different
> compilers, the programmer will copy five routines straight out of
> manuals.

How does this differ from copying 5 inline assembly examples straight out
of 5 manuals, except that under your plan it's less obvious that the code is
optimised for one particular compiler?

> as time goes by, idioms for a certain instruction can
> only become more standardized,

Right. Unless there is a definite conscious effort to do so, things in general
don't move in that direction. Just look at UNIX. Or code formatting in C.

> Is there anyone else here who understands why suggestions like this are
> so pointless? I can't wait for the language designers to catch up and
> force everyone to provide an mp library.

But you think it's OK to wait for them to catch up and standardise a bunch
of poorly-defined idioms?

> Why do you refuse to admit that the
> programmer should be able to take advantage of those instructions
> without sacrificing portability?

#ifdef frobco
	asm("...");
#else
	count_ones(...);
#endif
-- 
Peter da Silva.  `-_-'  peter@ferranti.com
+1 713 274 5180.  'U`  "Have you hugged your wolf today?"

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (03/09/91)

In article <VDY9J-C@xds13.ferranti.com> peter@ficc.ferranti.com (Peter da Silva) writes:
> In article <4267:Mar602:32:0591@kramden.acf.nyu.edu> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
> > At first, instead of learning five different assembly languages and
> > figuring out how to write inlined assembly code under five different
> > compilers, the programmer will copy five routines straight out of
> > manuals.
> How does this differ from copying 5 inline assembly examples straight out
> of 5 manuals, except that under your plan it's less obvious that the code is
> optimised for one particular compiler?

It differs in three important ways. First, the code will always remain
portable. In your world, the vendors promise to have those library
routines always do the same thing; otherwise they will break correct
code. In my world, there is no chance of this.

Second, the extra code will remain useful, even if the vendors go out of
business or the architectures die. In your world, someone might have
written code under #ifdef Z80; now that code will be almost completely
useless. In my world, the idioms are not hooked to one vendor or
architecture, and will make the code faster under any compiler
recognizing the same idiom.

Third, the #ifdefs can be dispensed with entirely. In your world, there
is no way to get rid of the #ifdefs, because each compiler will blow up
on the unportable library routines written for the others. In my world,
if the language specifies a pick-this-or-that control structure, you can
just throw away the #ifdefs and the code will work on any machine. It
will also be more resilient to changes in architecture or vendor
support.

> > as time goes by, idioms for a certain instruction can
> > only become more standardized,
> Right. Unless there is a definite conscious effort to do so, things in general
> don't move in that direction. Just look at UNIX.

Yes, let's look at UNIX. Within several years, most machines will
support (for example) POSIX termio, and code written for that termio
will be extremely portable.

I think you're confusing the standardization of *all* system facilities
with the standardization of *one* system facility. BSD will always be a
very different system from System V, but their common base gradually
grows larger and larger. Similarly, although compilers will always
support many different idioms, more and more particular idioms will
become standardized. A sufficiently standardized idiom can even be added
to the language.

> > Is there anyone else here who understands why suggestions like this are
> > so pointless? I can't wait for the language designers to catch up and
> > force everyone to provide an mp library.
> But you think it's OK to wait for them to catch up and standardise a bunch
> of poorly-defined idioms?

No, I don't. That's why my proposal works with absolutely no
standardization of the idioms. Whether the idioms gradually become more
standard or not, they will have the three advantages I outlined above.

---Dan