hrubin@pop.stat.purdue.edu (Herman Rubin) (03/09/91)
There clearly is a major disagreement on what "should" be in an
architecture or language. One of the arguments given against including
assembler code is that compiler optimization is at least made more
difficult, and portability is lost.

What people like Montgomery, Silverman, and I, as well as many others,
have pointed out is that if the operation is not in the language, the
compiler cannot do a good job of getting the architecture to do it.
Furthermore, while one can always simulate what is wanted by a clumsy
sequence of operations in the language, this is at least extremely
likely to generate inefficient code. Optimizing compilers "know" about
certain means of translating the language into organized combinations
of hardware operations, and have some abilities to pick efficient ones.
But they cannot include things about which they do not have any
cognizance.

The "solution" I suggest is to allow the programmer, etc., to set up
idioms in whatever syntax is easiest to use for that programmer, and to
provide the translations into some adequate intermediate language.
There may be, and in fact should be, alternate translations. Even the
addition of two vectors to produce a result should have different C
code on different machines.

Presumably, those types of expressions which are found useful will
eventually get into the languages. But the expressions themselves will
have considerable portability, although if the expression is unknown in
the target dialect, a dictionary will have to be provided.

There are two reasons for posting this to comp.arch as well as
comp.lang.misc. For one, much of the discussion has been there, and it
seems that many of the posters there consider language architecture to
be architecture. For another, by having this type of "portable"
representation of what people want computed, hardware designers may
learn something about costs and tradeoffs which they are not getting
now.
--
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907-1399
Phone: (317)494-6054
hrubin@l.cc.purdue.edu (Internet, bitnet) {purdue,pur-ee}!l.cc!hrubin(UUCP)
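
The remark above about vector addition having different C code on
different machines can be illustrated with a minimal sketch. The two
function names are hypothetical and not from the post; which variant a
given compiler and machine handles better is an assumption to be
measured, not something the post states.

```c
#include <stddef.h>

/* Variant 1: indexed loop. The index i stays available for other uses,
   which may suit machines with cheap indexed addressing. */
void vadd_indexed(double *dst, const double *a, const double *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = a[i] + b[i];
}

/* Variant 2: pointer walking with a down-counter. There is no index to
   reuse, which may suit machines with auto-increment addressing. */
void vadd_pointer(double *dst, const double *a, const double *b, size_t n)
{
    while (n-- > 0)
        *dst++ = *a++ + *b++;
}
```

Both variants compute the same result; they differ only in how the
traversal is expressed to the compiler.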
gsteckel@vergil.East.Sun.COM (Geoff Steckel - Sun BOS Hardware CONTRACTOR) (03/09/91)
In article <7499@mentor.cc.purdue.edu> hrubin@pop.stat.purdue.edu (Herman Rubin) writes:
>There clearly is a major disagreement on what "should" be in an architecture
>or language. One of the arguments given against including assembler code is
>that compiler optimization is at least made more difficult, and portability is
>lost.

One thing lost in the recent debate is the possibility of applying
`object' methodology to the complex operation debate. The C language
already has the ability to return a structure from a function; the
`divrem' function could return

    struct div_and_rem { int quotient; int remainder; };

C++ (admitting its many faults) can implement this sort of operation
easily. The usage (not the formal definition) of the language (and
object-like languages in general) includes the idea of `standard object
hierarchies'. These are currently distributed for C++ from a number of
sources.

This seems to be a way for the {numerical analysts, crypto specialists,
molbiogeneticists, etc.} to introduce their special operators,
distribute them, standardize them, etc.

For efficiency of human time, the users of the extended functionality
would have to organize a bit to coordinate development and distribution
of prototypes and specifications. Once common sets of these functions
stabilized a bit, it would then be more likely that vendors would be
willing to invest in special efforts to improve versions for particular
hardware.

Just a thought...

geoff steckel (gwes@wjh12.harvard.EDU)
(...!husc6!wjh12!omnivore!gws)
Disclaimer: I am not affiliated with Sun Microsystems, despite the From: line.
This posting is entirely the author's responsibility.
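
A minimal sketch of the structure-returning `divrem' described above.
The function body is an assumption for illustration; only the struct
layout comes from the post. The standard C library already offers
something similar in div(), which returns a div_t with members quot and
rem.

```c
#include <stdlib.h>   /* div, div_t: the standard library counterpart */

struct div_and_rem {
    int quotient;
    int remainder;
};

/* Hypothetical divrem: returns both results of one integer division.
   The caller must ensure y != 0. */
static struct div_and_rem divrem(int x, int y)
{
    struct div_and_rem result;
    result.quotient  = x / y;
    result.remainder = x % y;
    return result;
}
```

A caller would write `struct div_and_rem d = divrem(17, 5);` and read
d.quotient and d.remainder, much as it would read the quot and rem
members after calling div(17, 5).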
chip@tct.uucp (Chip Salzenberg) (03/12/91)
According to hrubin@pop.stat.purdue.edu (Herman Rubin):
>The "solution" I suggest is to allow the programmer, etc., to set up idioms
>in whatever syntax is easiest to use for that programmer, and to provide the
>translations into some adequate intermediate language.

A spec, Herman. Surely you can afford some few hours out of your busy
academic schedule to write a spec.
--
Chip Salzenberg at Teltronics/TCT <chip@tct.uucp>, <uunet!pdn!tct!chip>
"Most of my code is written by myself. That is why so little gets done."
    -- Herman "HLLs will never fly" Rubin
hrubin@pop.stat.purdue.edu (Herman Rubin) (03/15/91)
In article <1991Mar14.013109.16636@kithrup.COM>, sef@kithrup.COM (Sean Eric Fagan) writes:
> In article <11964@pasteur.Berkeley.EDU> jbuck@galileo.berkeley.edu (Joe Buck) writes:
> >Check out this short program:
> > [code deleted]
> >It's not optimal: there are two divides. Still, if you write kindly
> >I'll bet you could talk RMS into seeing if he could recognize the
> >pattern and produce one divide in gcc 2.0 (or for some higher number).
>
> foop:
>         pushl %ebp
>         movl %esp,%ebp
>         pushl %ebx
>         movl 8(%ebp),%eax
>         cltd
>         idivl 12(%ebp)
>         movl %eax,%ecx
>         movl %edx,%ebx
>         pushl %ebx
>         pushl %ecx
>         pushl $.LC0
>         call printf
>         leal -4(%ebp),%esp
>         popl %ebx
>         leave
>         ret
>
> It was already done, for gcc 1.3[89]. Good work, eh? Yes, the code could
> be optimal: gcc could look at the entire function, and not bother moving
> from eax to ecx, just pushing them directly. But those are small amounts
> (two or three cycles, I believe), and are *much* better than the 15+ cycles
> the extra divide would have taken.

This example has far too many loads and stores. Possibly this MIGHT not
be too important for a division, but how about something like frexp?
The operations may be register-register, in which case all these loads
and stores are inappropriate. Also, something this simple should be
inlined; if it is a subroutine call, there is the additional
save/restore overhead which has to be done somewhere.

The real need is for the languages and compilers to allow the user to
introduce idioms, with translation into machine primitives. In the
above example, idivl is such a primitive, and should be considered no
differently than the various types of subtraction. The relevant idiom
in the above example would be

    q,r = x/y,

where the / is overloaded some more. If instead r, x, and y were
floating point, and q integer, the code would be quite different.

Mathematical notation has been developing over centuries, and we still
see many new idioms and overloadings. It is not necessary to have a
committee to decide what notation will be allowed and what will not.
--
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907-1399
Phone: (317)494-6054
hrubin@l.cc.purdue.edu (Internet, bitnet) {purdue,pur-ee}!l.cc!hrubin(UUCP)
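
Standard C has no two-result operator, so the idiom q,r = x/y can only
be approximated. The sketch below uses two hypothetical helper names
(not from the thread) to show how the expansion differs between the
all-integer case and the case where x, y, and r are floating point and
q is an integer. In the integer case, a compiler that recognizes the
adjacent / and % can issue a single divide, as the quoted gcc output
does.

```c
/* Integer case: one hardware divide can supply both results.
   The caller must ensure y != 0. */
static inline void divrem_int(int x, int y, int *q, int *r)
{
    *q = x / y;
    *r = x % y;
}

/* Mixed case: x, y, r floating point, q integer. Assumes the truncated
   quotient fits in a long; r is subject to floating-point rounding. */
static inline void divrem_double(double x, double y, long *q, double *r)
{
    *q = (long)(x / y);          /* truncate toward zero */
    *r = x - (double)*q * y;     /* remainder relative to that quotient */
}
```

For example, divrem_int(17, 5, &q, &r) leaves q = 3 and r = 2, while
divrem_double(7.5, 2.0, &q, &r) leaves q = 3 and r = 1.5.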
hrubin@pop.stat.purdue.edu (Herman Rubin) (03/15/91)
In article <2378@tuvie.UUCP>, alex@vmars.tuwien.ac.at (Alexander Vrchoticky) writes:
> hrubin@pop.stat.purdue.edu (Herman Rubin) writes:
>
> > Even the addition of two vectors to
> > produce a result should have different C code on different machines.
>
> For the purposes of design diversity? Are you sure you did not want to
> type `machine' instead of `C' there?

My statement is correct as it stands. The optimal C code to add two
vectors is different on different machines, strange as it may seem. The
different codes do exactly the same thing, IF all one wants to do is to
add the vectors. If the index has to be used, or in some other cases,
one code may be better than another because of other considerations.

> > But the expressions themselves will have considerable
> > portability, although if the expression is unknown in the target dialect, a
> > dictionary will have to be provided.
>
> [Translated from German] What is one to understand by a `dictionary' in
> the context of compiler technology? By `portability of expressions'?
>
> [sorry, i could not resist :-)]

I understand German, but not very well.

A human has little difficulty in translating between two computer
languages, and not too much problem between "natural" languages.
Computer programs seem to have much more of a problem. I have
relatively little difficulty in translating between mathematical
constructions and HLL or machine constructions, but the current
communication channels lack the flexibility for even fairly efficient
compilers to take over.

The compiler writer has provided the translation between x = y-z and
machine code in such a way that the compiler can take into account
types, locations, etc., in producing good code. The same type of
translation should be available for other constructs. If a programmer
must make the detailed translation in each case, or even does this for
reasons of efficiency, the language designers and architects will not
see the uses of the constructs. If the programmer instead can use the
dictionary, the construct is apparent, rather than its expansion, which
is likely to mean little. How many would recognize the code for
B'A^(-1)C, A positive definite, without expecting that to be done?
--
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907-1399
Phone: (317)494-6054
hrubin@l.cc.purdue.edu (Internet, bitnet) {purdue,pur-ee}!l.cc!hrubin(UUCP)
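
To make the closing question concrete, here is one possible expansion
of b'A^(-1)c for vectors b and c and a symmetric positive definite A.
The function name and the choice of a square-root-free LDL'
factorization are assumptions for illustration; the post names no
method. The point of the sketch is that nothing in the loops announces
the construct being computed.

```c
#include <stddef.h>

/* Computes b' * inverse(A) * c for a symmetric positive definite n x n
   matrix A stored row-major. Only the lower triangle of A is read.
   A is overwritten with its LDL' factors, b and c with solved vectors. */
double quad_form_inv(size_t n, double *A, double *b, double *c)
{
    double result = 0.0;

    /* Factor A = L D L', L unit lower triangular, D diagonal. */
    for (size_t j = 0; j < n; j++) {
        double d = A[j*n + j];
        for (size_t k = 0; k < j; k++)
            d -= A[j*n + k] * A[j*n + k] * A[k*n + k];
        A[j*n + j] = d;                      /* d_j */
        for (size_t i = j + 1; i < n; i++) {
            double s = A[i*n + j];
            for (size_t k = 0; k < j; k++)
                s -= A[i*n + k] * A[j*n + k] * A[k*n + k];
            A[i*n + j] = s / d;              /* L[i][j] */
        }
    }

    /* Solve L u = b and L v = c in place, accumulating u' D^(-1) v. */
    for (size_t i = 0; i < n; i++) {
        for (size_t k = 0; k < i; k++) {
            b[i] -= A[i*n + k] * b[k];
            c[i] -= A[i*n + k] * c[k];
        }
        result += b[i] * c[i] / A[i*n + i];
    }
    return result;
}
```

With A = [4 2; 2 3], b = (1, 2), and c = (3, 4), the function returns
21/8 = 2.625.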
hrubin@pop.stat.purdue.edu (Herman Rubin) (03/16/91)
In article <1991Mar14.195853.27398@kithrup.COM>, sef@kithrup.COM (Sean Eric Fagan) writes:
> In article <7850@mentor.cc.purdue.edu> hrubin@pop.stat.purdue.edu (Herman Rubin) writes:

			......................

> >This example has far too many loads and stores.
>
> 9 memory references. 4 necessary for the calling sequence gcc conforms to.
> Three necessary to call another function, since the 'bcs' does not specify
> calling routines with values in registers. Two more because arguments
> passed in are not in registers. Leaving a total of 0 unnecessary loads and
> stores. Could this be improved? Certainly. But not by much.

> >Possibly this MIGHT not be
> >too important for a division, but how about something like frexp?
>
> I think it was frexp() that I wrote for berkeley using gcc with inline
> assembly. Uhm... I think it had 7 loads and stores, all but two or three of
> which would disappear if the function got inlined and optimized.
>
> >The
> >operations may be register-register, in which case all these loads and
> >stores are inappropriate.
>
> Herman: where are you supposed to get the values from? Magic?
>

Computing q,r = a/b should not even consider a subroutine call. The
arguments, or at least most of them, are likely to be the results of
previous operations, and hence already in registers. The results are
likely to be used in proximal instructions, and hence kept in registers
rather than being stored. This IS what decent compilers do for the
"standard" operations of + - * / ^ | &. Other operations should be
treated in the same way, and not as subroutine calls.

> >Also, something this simple should be inlined;
> >if a subroutine call, there is the additional save/restore overhead which
> >has to be done somewhere.
>
> Jesus. Guess what, herman: the routine *was* inlined. Take a look at the
> original source code again.

It is inlined, but it is still in the nature of a subroutine call.
These "unusual" constructs should be treated as having general
arguments, usually not in specified locations. The example given loaded
the arguments and stored the results, even if that were inlined.
Somewhat better would be to have the arguments in registers specific to
the inlining procedure, and the results in other specified registers.
This is not what a decent compiler now does for the operations it
understands. The expansion should allow adding to THAT set of
operations.

To summarize, what should be provided is to allow the compiler to
accept the idiom producer's insight into the various ways the job can
be done using the machine instructions or previous idioms, and optimize
using this information. As I understand an inlined subroutine call, it
could not merely issue the instruction

	idivl a,b,q,r

or for some machines something similar to

	idivl a,b,q
	movl q',r

where q' is the register adjacent to q, assuming that things were in
the appropriate registers, and only load/store as needed.
--
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907-1399
Phone: (317)494-6054
hrubin@l.cc.purdue.edu (Internet, bitnet) {purdue,pur-ee}!l.cc!hrubin(UUCP)
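
One way to approximate "operands wherever they already are" is GNU C
extended asm, where operand constraints tell the compiler which
registers an instruction needs and leave the rest of the allocation to
it. The sketch below is an illustration under stated assumptions: the
function name is hypothetical, the asm path applies only to x86 targets
with a GNU-compatible compiler using AT&T syntax, and other targets
fall back to plain / and %.

```c
/* Hypothetical idiom for q,r = x/y. The caller must ensure y != 0 and
   that the quotient does not overflow int. */
static inline void divrem_idiom(int x, int y, int *q, int *r)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    int quot, rem;
    /* cltd sign-extends eax into edx; idivl divides edx:eax by the
       operand, leaving the quotient in eax and the remainder in edx.
       The divisor may sit in any free register or in memory ("rm"). */
    __asm__("cltd\n\t"
            "idivl %[den]"
            : "=a"(quot), "=&d"(rem)    /* outputs pinned to eax, edx */
            : "0"(x), [den] "rm"(y)     /* x must start in eax */
            : "cc");
    *q = quot;
    *r = rem;
#else
    /* Portable fallback: let the compiler pair the two operations. */
    *q = x / y;
    *r = x % y;
#endif
}
```

When the function is inlined, the compiler only has to arrange for x to
be in eax; the loads and stores of a fixed calling sequence are not
part of the expansion itself.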