[comp.lang.misc] Unusual instructions and constructions

hrubin@pop.stat.purdue.edu (Herman Rubin) (03/09/91)

There clearly is a major disagreement on what "should" be in an architecture
or language.  One of the arguments given against including assembler code is
that compiler optimization is at least made more difficult, and portability is
lost.

What people like Montgomery, Silverman, and I, as well as many others, have
pointed out is that if the operation is not in the language, the compiler
cannot do a good job of getting the architecture to do it.  Furthermore,
while one can always simulate what is wanted by a clumsy sequence of operations
in the language, this is extremely likely to generate inefficient code.
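
One familiar instance of the "clumsy sequence" problem is the upper half of a double-width product, which multiprecision arithmetic needs constantly and which many machines deliver from a single multiply instruction. A minimal sketch in C, assuming the language offers only 32-bit unsigned arithmetic: the first (hypothetical) function builds the high word from four 16-bit partial products, while the second uses the 64-bit `uint64_t` type that C99 later added, which many current compilers translate into one widening multiply.

```c
#include <stdint.h>

/* The clumsy route: high 32 bits of a 32x32-bit product using only
   32-bit operations, via four 16x16-bit partial products. */
static uint32_t mulhi32_halves(uint32_t a, uint32_t b)
{
    uint32_t a_lo = a & 0xFFFFu, a_hi = a >> 16;
    uint32_t b_lo = b & 0xFFFFu, b_hi = b >> 16;

    uint32_t lo_lo = a_lo * b_lo;   /* each partial product fits in 32 bits */
    uint32_t hi_lo = a_hi * b_lo;
    uint32_t lo_hi = a_lo * b_hi;
    uint32_t hi_hi = a_hi * b_hi;

    /* Column for bits 16..31: its carry into bit 32 belongs to the high word. */
    uint32_t mid = (lo_lo >> 16) + (hi_lo & 0xFFFFu) + (lo_hi & 0xFFFFu);

    return hi_hi + (hi_lo >> 16) + (lo_hi >> 16) + (mid >> 16);
}

/* The direct route: say what is meant and let the compiler pick the
   instruction (often a single widening multiply). */
static uint32_t mulhi32_wide(uint32_t a, uint32_t b)
{
    return (uint32_t)(((uint64_t)a * b) >> 32);
}
```

Both agree on every input; for a = b = 0xFFFFFFFF each returns 0xFFFFFFFE. The difference is that only the second states the operation in a form a compiler can map onto the hardware.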

Optimizing compilers "know" about certain ways of translating the language
into organized combinations of hardware operations, and have some ability
to pick efficient ones.  But they cannot use operations of which they have
no knowledge.

The "solution" I suggest is to allow the programmer, etc., to set up idioms 
in whatever syntax is easiest to use for that programmer, and to provide the
translations into some adequate intermediate language.  There may be, and in
fact should be, alternate translations.  Even the addition of two vectors to
produce a result should have different C code on different machines.
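
A sketch of what "one idiom, alternate translations" could look like in C, using vector addition: two translations that do the same thing when all one wants is the sum, selected per target by a configuration macro. The macro name `VADD_PREFER_POINTERS` and the function names are hypothetical, and which translation wins on a given machine is an assumption to be measured, not a rule.

```c
#include <stddef.h>

/* Translation 1: indexed loop; convenient when the index is needed for
   something else, and a natural fit for indexed addressing modes. */
static void vadd_indexed(double *c, const double *a, const double *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        c[i] = a[i] + b[i];
}

/* Translation 2: pointer-walking loop; may suit a machine with
   auto-increment addressing but no cheap indexed mode. */
static void vadd_pointer(double *c, const double *a, const double *b, size_t n)
{
    const double *end = a + n;
    while (a < end)
        *c++ = *a++ + *b++;
}

/* The idiom itself: one name, with the translation chosen per target. */
#ifdef VADD_PREFER_POINTERS
#define VADD(c, a, b, n) vadd_pointer((c), (a), (b), (n))
#else
#define VADD(c, a, b, n) vadd_indexed((c), (a), (b), (n))
#endif
```

Code written against `VADD` stays the same on every machine; only the dictionary entry behind it changes.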

Presumably, those types of expressions which are found useful will eventually
get into the languages.  But the expressions themselves will have considerable
portability, although if the expression is unknown in the target dialect, a
dictionary will have to be provided.

There are two reasons for posting this to comp.arch as well as comp.lang.misc.
For one, much of the discussion has been there, and it seems that many of the
posters there consider language architecture to be architecture.  For another,
by having this type of "portable" representation of what people want computed,
hardware designers may learn something about costs and tradeoffs which they are
not getting now.
--
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907-1399
Phone: (317)494-6054
hrubin@l.cc.purdue.edu (Internet, bitnet)   {purdue,pur-ee}!l.cc!hrubin(UUCP)

gsteckel@vergil.East.Sun.COM (Geoff Steckel - Sun BOS Hardware CONTRACTOR) (03/09/91)

In article <7499@mentor.cc.purdue.edu> hrubin@pop.stat.purdue.edu (Herman Rubin) writes:
>There clearly is a major disagreement on what "should" be in an architecture
>or language.  One of the arguments given against including assembler code is
>that compiler optimization is at least made more difficult, and portability is
>lost.

One thing lost in the recent debate is the possibility of applying `object'
methodology to the question of complex operations.  The C language already has
the ability to return a structure from a function; the `divrem' function
could return   struct div_and_rem { int quotient; int remainder; };
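
Spelled out as a sketch, with `divrem` as a hypothetical function returning the structure above by value:

```c
#include <stdlib.h>   /* div() and div_t, the standard library's own pair */

/* The result type from the post, returned by value. */
struct div_and_rem { int quotient; int remainder; };

/* Both results of one division, returned together.
   Assumes y != 0 and no overflow (x == INT_MIN with y == -1). */
static struct div_and_rem divrem(int x, int y)
{
    struct div_and_rem d;
    d.quotient  = x / y;
    d.remainder = x % y;   /* a compiler may reuse the divide above */
    return d;
}
```

ANSI C's <stdlib.h> already offers the same idea as div(), which returns a div_t with members quot and rem. For example, div(-17, 5) gives quot == -3 and rem == -2, the same pair divrem(-17, 5) returns, since C99 requires / and % to truncate toward zero.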

C++ (admitting its many faults) can implement this sort of operation easily.
The usage (not the formal definition) of the language (and object-like languages
in general) includes the idea of `standard object hierarchies'.  These are
currently distributed for C++ from a number of sources.  This seems to be
a way for the {numerical analysts, crypto specialists, molbiogeneticists, etc.}
to introduce their special operators, distribute them, standardize them, etc.

For efficiency of human time, the users of the extended functionality would
have to organize a bit to coordinate development and distribution of prototypes
and specifications.  Once common sets of these functions stabilized a bit,
it would then be more likely that vendors would be willing to invest in special
efforts to improve versions for particular hardware.

Just a thought...
	geoff steckel (gwes@wjh12.harvard.EDU)
			(...!husc6!wjh12!omnivore!gws)
Disclaimer: I am not affiliated with Sun Microsystems, despite the From: line.
This posting is entirely the author's responsibility.

chip@tct.uucp (Chip Salzenberg) (03/12/91)

According to hrubin@pop.stat.purdue.edu (Herman Rubin):
>The "solution" I suggest is to allow the programmer, etc., to set up idioms 
>in whatever syntax is easiest to use for that programmer, and to provide the
>translations into some adequate intermediate language.

A spec, Herman.  Surely you can afford some few hours out of your busy
academic schedule to write a spec.
-- 
Chip Salzenberg at Teltronics/TCT     <chip@tct.uucp>, <uunet!pdn!tct!chip>
 "Most of my code is written by myself.  That is why so little gets done."
                 -- Herman "HLLs will never fly" Rubin

hrubin@pop.stat.purdue.edu (Herman Rubin) (03/15/91)

In article <1991Mar14.013109.16636@kithrup.COM>, sef@kithrup.COM (Sean Eric Fagan) writes:
> In article <11964@pasteur.Berkeley.EDU> jbuck@galileo.berkeley.edu (Joe Buck) writes:
> >Check out this short program:
> 
> [code deleted]
> >It's not optimal: there are two divides.  Still, if you write kindly
> >I'll bet you could talk RMS into seeing if he could recognize the
> >pattern and produce one divide in gcc 2.0 (or for some higher number).
> 
> 
> foop:
| 	pushl %ebp
| 	movl %esp,%ebp
| 	pushl %ebx
| 	movl 8(%ebp),%eax
| 	cltd
| 	idivl 12(%ebp)
| 	movl %eax,%ecx
| 	movl %edx,%ebx
| 	pushl %ebx
| 	pushl %ecx
| 	pushl $.LC0
| 	call printf
| 	leal -4(%ebp),%esp
| 	popl %ebx
| 	leave
| 	ret
> 
> It was already done, for gcc 1.3[89].  Good work, eh?  Yes, the code could
> be optimal:  gcc could look at the entire function, and not bother moving
> from eax to ecx, just pushing them directly.  But those are small amounts
> (two or three cycles, I believe), and are *much* better than the 15+ cycles
> the extra divide would have taken.

This example has far too many loads and stores.  Possibly this MIGHT not be
too important for a division, but how about something like frexp?  The 
operations may be register-register, in which case all these loads and
stores are inappropriate.  Also, something this simple should be inlined;
if it is a subroutine call, there is the additional save/restore overhead
which has to be done somewhere.
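
The frexp case can be made concrete.  On a machine with IEEE 754 doubles, frexp is little more than integer operations on the bit pattern: extract the exponent field, rebias it, and splice a fixed exponent back in, all of which could stay in registers.  A minimal sketch in C, assuming IEEE 754 binary64 and that `double` and `uint64_t` share byte order; the name `frexp_normal` is hypothetical, and it covers only nonzero normal numbers, reporting failure otherwise so that a caller would use the library frexp() for the rest.

```c
#include <stdint.h>
#include <string.h>

/* Splits x into mant * 2^exp2 with 0.5 <= |mant| < 1, like frexp(),
   using only integer operations on the bit pattern.
   Returns 1 on success, 0 for zero, subnormals, infinities and NaNs. */
static inline int frexp_normal(double x, double *mant, int *exp2)
{
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);            /* well-defined type pun */
    int biased = (int)((bits >> 52) & 0x7FF);  /* 11-bit exponent field */
    if (biased == 0 || biased == 0x7FF)
        return 0;
    *exp2 = biased - 1022;                     /* puts |mant| in [0.5, 1) */
    bits &= ~(UINT64_C(0x7FF) << 52);          /* clear exponent, keep sign and fraction */
    bits |= UINT64_C(1022) << 52;              /* exponent of 2^-1 */
    memcpy(mant, &bits, sizeof bits);
    return 1;
}
```

For example, 6.0 comes back as mant == 0.75 and exp2 == 3.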

The real need is for the languages and compilers to allow the user to 
introduce idioms, with translation into machine primitives.  In the above
example, idivl is such a primitive, and should be considered no differently
than the various types of subtraction.  The relevant idiom in the above 
example would be 

	q,r = x/y,

where the / is overloaded some more.
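
As a sketch of where the q,r = x/y idiom earns its keep: converting an integer to digits in a run-time base needs, at every step, both the quotient and the remainder of the same division.  In C the pair must be written as two operators, / and %, and one has to hope the compiler notices they share a divide, as the gcc output quoted above shows it doing.  The function `to_base` is a hypothetical example, not from the thread.

```c
#include <limits.h>
#include <stddef.h>

/* Writes n in the given base (assumed 2..10), most significant digit
   first, into buf, which must hold sizeof(unsigned) * CHAR_BIT + 1 chars.
   Returns the number of digits written. */
static size_t to_base(unsigned n, unsigned base, char *buf)
{
    char tmp[sizeof n * CHAR_BIT];   /* enough digits for base 2 */
    size_t len = 0;
    do {
        unsigned q = n / base;       /* q,r = n/base, written as two operators */
        unsigned r = n % base;
        tmp[len++] = (char)('0' + r);
        n = q;
    } while (n != 0);
    for (size_t i = 0; i < len; i++) /* digits came out least significant first */
        buf[i] = tmp[len - 1 - i];
    buf[len] = '\0';
    return len;
}
```

With a variable base, a compiler that pairs the / and % can issue one divide per digit; a constant divisor such as 10 is usually turned into a multiplication instead.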

If instead r, x, and y were floating point, and q an integer, the code would be
quite different.

Mathematical notation has been developing over centuries, and we still see
many new idioms and overloadings.  It is not necessary to have a committee
to decide what notation will be allowed and what will not.

hrubin@pop.stat.purdue.edu (Herman Rubin) (03/15/91)

In article <2378@tuvie.UUCP>, alex@vmars.tuwien.ac.at (Alexander Vrchoticky) writes:
> hrubin@pop.stat.purdue.edu (Herman Rubin) writes:
> 
> > Even the addition of two vectors to
> > produce a result should have different C code on different machines.
> 
> For the purposes of design diversity? Are you sure you did not want to
> type `machine' instead of `C' there?

My statement is correct as it stands.  The optimal C code to add two vectors
is different on different machines, strange as it may seem.  The different
codes do exactly the same thing, IF all one wants to do is to add the vectors.
If the index has to be used, or in some other cases, one code may be better
than another because of other considerations.

> > But the expressions themselves will have considerable
> > portability, although if the expression is unknown in the target dialect, a
> > dictionary will have to be provided.
> 
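> Was hat man im Zusammenhang mit Compilertechnologie unter einem `Dictionary'
> zu verstehen? Unter `Portabilitaet von Ausdruecken'?
> [What is one to understand by a `dictionary' in the context of compiler
> technology? Or by `portability of expressions'?]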
> 
> [sorry, i could not resist :-)]

Ich verstehe Deutsch, aber nicht sehr gut.  (I understand German, but not
very well.)

A human has little difficulty in translating between two computer languages,
and not too much problem between "natural" languages.  Computer programs seem
to have much more of a problem.  I have relatively little difficulty in
translating between mathematical constructions and HLL or machine constructions,
but the current communication channels lack the flexibility for even fairly
efficient compilers to take over.  The compiler writer has provided the 
translation between x = y-z and machine code in such a way that the compiler
can take into account types, locations, etc., in producing good code.  The
same type of translation should be available for other constructs.

If the programmer must make the detailed translation in each case, or does so
for reasons of efficiency, the language designers and architects will not see
the uses of the constructs.  If the programmer can instead use the dictionary,
what is apparent is the construct rather than its expansion, which is likely
to mean little.  How many would recognize the code for B'A^(-1)C, with A
positive definite, without already expecting that computation?
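
To see the point, here is one hypothetical expansion of B'A^(-1)C, with A symmetric positive definite, in C.  It uses the square-root-free LDL' factorization A = LDL' (L unit lower triangular, D diagonal), so that B'A^(-1)C = (L^(-1)B)' D^(-1) (L^(-1)C); nothing in the resulting loops announces what is being computed.  A sketch only: no pivoting, no check that A really is positive definite, row-major storage, and the inputs are overwritten.

```c
#include <stddef.h>

/* Hypothetical btainvc: R = B' A^(-1) C, with A (n x n) symmetric
   positive definite, B (n x p), C (n x m), R (p x m), all row-major.
   Overwrites A with its LDL' factors (L below the diagonal, D on it)
   and B, C with L^(-1)B, L^(-1)C; B and C must not overlap. */
static void btainvc(size_t n, size_t p, size_t m,
                    double *A, double *B, double *C, double *R)
{
    /* 1. Factor A = L D L' without square roots. */
    for (size_t j = 0; j < n; j++) {
        double d = A[j*n + j];
        for (size_t k = 0; k < j; k++)
            d -= A[j*n + k] * A[j*n + k] * A[k*n + k];
        A[j*n + j] = d;
        for (size_t i = j + 1; i < n; i++) {
            double s = A[i*n + j];
            for (size_t k = 0; k < j; k++)
                s -= A[i*n + k] * A[j*n + k] * A[k*n + k];
            A[i*n + j] = s / d;
        }
    }

    /* 2. Forward substitution with unit lower L: B <- L^(-1)B, C <- L^(-1)C. */
    for (size_t i = 0; i < n; i++)
        for (size_t k = 0; k < i; k++) {
            double l = A[i*n + k];
            for (size_t c = 0; c < p; c++) B[i*p + c] -= l * B[k*p + c];
            for (size_t c = 0; c < m; c++) C[i*m + c] -= l * C[k*m + c];
        }

    /* 3. R = (L^(-1)B)' D^(-1) (L^(-1)C). */
    for (size_t a = 0; a < p; a++)
        for (size_t b = 0; b < m; b++) {
            double s = 0.0;
            for (size_t i = 0; i < n; i++)
                s += B[i*p + a] * C[i*m + b] / A[i*n + i];
            R[a*m + b] = s;
        }
}
```

With A = [[4,2],[2,3]], B the 2x2 identity and C = (1,2)', the result is A^(-1)C = (-0.125, 0.75)'.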

hrubin@pop.stat.purdue.edu (Herman Rubin) (03/16/91)

In article <1991Mar14.195853.27398@kithrup.COM>, sef@kithrup.COM (Sean Eric Fagan) writes:
> In article <7850@mentor.cc.purdue.edu> hrubin@pop.stat.purdue.edu (Herman Rubin) writes:

			......................

> >This example has far too many loads and stores.  
> 
> 9 memory references.  4 necessary for the calling sequence gcc conforms to.
> Three necessary to call another function, since the 'bcs' does not specify
> calling routines with values in registers.   Two more because arguments
> passed in are not in registers.  Leaving a total of 0 unnecessary loads and
> stores.  Could this be improved?  Certainly.  But not by much.

> >Possibly this MIGHT not be
> >too important for a division, but how about something like frexp?  
> 
> I think it was frexp() that I wrote for berkeley using gcc with inline
> assembly.  Uhm... I think it had 7 loads and stores, all but two or three of
> which would disappear if the function got inlined and optimized.
> 
> >The 
> >operations may be register-register, in which case all these loads and
> >stores are inappropriate.  
> 
> Herman:  where are you supposed to get the values from?  Magic?
> 
Computing q,r = a/b should not involve a subroutine call at all.  The arguments,
or at least most of them, are likely to be the results of previous operations,
and hence already in registers.  The results are likely to be used in proximal
instructions, and hence kept in registers rather than being stored.  This IS
what decent compilers do for the "standard" operations of + - * / ^ | &.  Other
operations should be treated in the same way, and not as subroutine calls.

> >Also, something this simple should be inlined;
> >if a subroutine call, there is the additional save/restore overhead which
> >has to be done somewhere.
> 
> Jesus.  Guess what, herman:  the routine *was* inlined.  Take a look at the
> original source code again.

It is inlined, but it is still in the nature of a subroutine call.  These
"unusual" constructs should be treated as having general arguments, usually
not in specified locations.  The example given loaded the arguments and
stored the results, even though it was inlined.  Somewhat better would be
to have the arguments in registers specific to the inlining procedure, and
the results in other specified registers.  This is not what a decent compiler
now does for the operations it understands.  The expansion should allow adding
to THAT set of operations.

To summarize, what should be provided is a way for the compiler to accept the
idiom producer's insight into the various ways the job can be done using the
machine instructions or previous idioms, and to optimize using this information.
As I understand an inlined subroutine call, it could not merely issue the
instruction 
		idivl a,b,q,r

or for some machines something similar to

		idivl a,b,q
		movl  q',r

where q' is the register adjacent to q, assuming that things were in the 
appropriate registers, and only load/store as needed.
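
Something close to this can be had with GCC-style extended asm, a compiler extension rather than standard C.  The operand constraints play the role of the specified registers: they say that the x86 idivl takes its dividend in %eax (sign-extended into %edx by cltd) and leaves the quotient in %eax and the remainder in %edx, while the divisor may sit in any register or in memory.  The compiler then allocates registers around the instruction and loads or stores only as needed.  The function name `divrem_x86` is hypothetical; on other compilers or targets the sketch falls back to plain C.

```c
/* Assumes y != 0 and no overflow (x == INT_MIN with y == -1). */
static inline void divrem_x86(int x, int y, int *q, int *r)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    int quot = x, rem;
    __asm__("cltd\n\t"       /* sign-extend %eax into %edx:%eax */
            "idivl %2"       /* %eax = quotient, %edx = remainder */
            : "+a"(quot), "=&d"(rem)   /* early clobber keeps y out of %edx */
            : "rm"(y)
            : "cc");
    *q = quot;
    *r = rem;
#else
    *q = x / y;
    *r = x % y;
#endif
}
```

For example, divrem_x86(-17, 5, &q, &r) sets q = -3 and r = -2, since idivl truncates toward zero like C's / and %.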