[comp.sys.m88k] Info about the 88open Consortium and standards

bruceco@jove.cs.pdx.edu (Bruce Coorpender) (11/14/89)

The 88open Consortium was mentioned in a recent article. It is
a group of companies and individuals with a common interest in
the use of the MC88000. One of the primary efforts is in
developing standards so that software can be "shrinkwrapped" and
portable across systems from different manufacturers.

The first set of standards is complete. The separate documents are
listed below. These standards apply to UNIX System V3.2. The
consortium is currently working on the ABI for System V4.0.

Binary Compatibility Standard (BCS)
Object Compatibility Standard (OCS)
Networking and X Window supplements

Each is available from 88open at the address below, at a
member price of $20 or a non-member price of $40.

88open Consortium, Ltd
8560 SW Salish Lane Suite 500
Wilsonville, OR 97070

503 682 5703 Voice
503 682 5836 FAX

If you have questions or would like more information about 88open,
send email to uunet!88open!bruceco

Bruce Coorpender - Technical Director - 88open

rfg@ics.uci.edu (Ron Guilmette) (11/15/89)

In article <1948@psueea.UUCP> bruceco@jove.cs.pdx.edu (Bruce Coorpender) writes:
>The 88open Consortium was mentioned in a recent article. It is
>
>Binary Compatibility Standard (BCS)
>Object Compatibility Standard (OCS)

I have a technical question about the BCS and the OCS.

I was unhappy to see that these two standards have mandated (for those who
choose to adhere to them) what I believe is a somewhat sub-optimal function
calling sequence.  While it is true that the abundance of registers on the
88k does significantly improve the efficiency of parameter passing in the
great majority of cases, I have been distressed to see that (a) you
are limited to passing at most 8 words worth of actual parameters in
registers (which seems an unnecessary and artificial limit to me) and
(b) more significantly, there seems to have been a major departure from
the common practice in the good ol' CISC days when callers could assume
that almost all registers (except for the one which got the return value)
were *preserved* across a "standard" call.

It seems to me that (b) effectively ties the hands of many otherwise very
sophisticated modern optimizers (with nice graph-coloring register allocators
of course).

Am I right that this is an artificial "man-made" performance barrier?  If
so, what was the motivation?

Inquiring minds want to know! :-)

// rfg

andrew@frip.WV.TEK.COM (Andrew Klossner) (11/16/89)

[]

	"(a) you are limited to passing at most 8 words worth of actual
	parameters in registers (which seems an unnecessary and
	artificial limit to me)"

Yep.  This covers 98%+ of procedure calls in real-world code.
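
The eight-word limit can be pictured with a small sketch. This is only an
illustration: the register names r2 through r9 and the `place_args` helper
are assumptions made for the example, not text taken from the OCS, and the
sketch ignores any alignment rules for multi-word arguments.

```python
# Assumed parameter registers (r2..r9); the thread only states that eight
# words' worth of parameters travel in registers.
PARAM_REGS = [f"r{n}" for n in range(2, 10)]


def place_args(nwords):
    """Map each argument word to a register or, past the limit, a stack slot."""
    placement = []
    for i in range(nwords):
        if i < len(PARAM_REGS):
            placement.append(PARAM_REGS[i])
        else:
            # Hypothetical stack slot numbering, counted from the first
            # word that no longer fits in a register.
            placement.append(f"stack[{i - len(PARAM_REGS)}]")
    return placement


if __name__ == "__main__":
    for words in (3, 8, 10):
        print(words, place_args(words))
```

With ten one-word arguments, the first eight land in registers and the last
two fall through to the stack; calls with eight or fewer words never touch
the stack for parameters.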

	"(b) more significantly, there seems to have been a major
	departure from the common practice in the good ol' CISC days
	when callers could assume that almost all registers (except for
	the one which got the return value) were *preserved* across a
	"standard" call."

Registers r14 and up are preserved.  That's more registers than those
good ol' CISC machines preserved.

	"It seems to me that (b) effectively ties the hands of many
	otherwise very sophisticated modern optimizers (with nice
	graph-coloring register allocators of course)."

Not at all.  If you have the global program view necessary to do
graph-coloring (you are generating the code for all calls to the
callee), then it's fairly obvious that you can generate any code you
want, and need not conform to the standard interface.  The OCS serves
only to guide compilers/linkers in making procedure interfaces for
callers that they do not control.

  -=- Andrew Klossner   (uunet!tektronix!frip.WV.TEK!andrew)    [UUCP]
                        (andrew%frip.wv.tek.com@relay.cs.net)   [ARPA]

terryl@tekcrl.LABS.TEK.COM (11/16/89)

In article <1989Nov14.175806.23483@paris.ics.uci.edu> Ron Guilmette <rfg@ics.uci.edu> writes:
+In article <1948@psueea.UUCP> bruceco@jove.cs.pdx.edu (Bruce Coorpender) writes:
+>The 88open Consortium was mentioned in a recent article. It is
+>
+>Binary Compatibility Standard (BCS)
+>Object Compatibility Standard (OCS)
+
+I have a technical question about the BCS and the OCS.
+
+I was unhappy to see that these two standards have mandated (for those who
+choose to adhere to them) what I believe is a somewhat sub-optimal function
+calling sequence.  While it is true that the abundance of registers on the
+88k does significantly improve the efficiency of parameter passing in the
+great majority of cases, I have been distressed to see that (a) you
+are limited to passing at most 8 words worth of actual parameters in
+registers (which seems an unnecessary and artificial limit to me) and
+(b) more significantly, there seems to have been a major departure from
+the common practice in the good ol' CISC days when callers could assume
+that almost all registers (except for the one which got the return value)
+were *preserved* across a "standard" call.
+
+It seems to me that (b) effectively ties the hands of many otherwise very
+sophisticated modern optimizers (with nice graph-coloring register allocators
+of course).

     Actually, there is a very good reason it was done this way: in many
functions, the caller knows exactly which of its registers are "alive" and
which are "dead". "Alive" registers must be preserved across procedure
calls; "dead" registers need not be. The called function may or may not
know as much about the needs of the caller's register set.

+Am I right that this is an artificial "man-made" performance barrier?  If
+so, what was the motivation?

      Well, IMHO, you're wrong. It is NOT an artificial "man-made" performance
barrier; as with almost all decisions, it came down to the traditional
time/space performance trade-offs. What has actually happened is that the
register set was divided into FOUR sets. The first is the parameter-passing
set (eight words' worth of parameters, as Mr. Guilmette has noted, though
additional registers are used for structure return, etc.). The second is a
temporary register set (which is NOT preserved across procedure calls). The
third is a permanent register set (which is preserved across procedure
calls). The fourth is reserved for linker use; those registers are used to
access global data more efficiently (read the BCS for further info on this).

     The traditional time/space performance trade-offs come into play when a
compiler decides to put a variable into either the temporary register set or
the permanent register set. I'm no compiler person, but it seems reasonable
to me that a compiler, while compiling a procedure, can judge whether a given
register needs to be preserved across a call better than a scheme that
preserves ALL of the registers (or just the registers the called procedure
uses, as in most current <hardly optimizing> C compilers). The rationale
behind this is that if the called procedure were to save all registers (or
even just the ones it used), it might be doing some unneeded saving and
restoring of registers.

     But the opposite (where NO registers are saved across procedure calls; it
is the responsibility of the calling procedure to save whichever registers it
will need later) is probably too restrictive; it puts too much burden on the
compiler writer (not to mention all the assembler writers, but it's not a
pretty sight to program this beasty in assembler!!! (-:). So, a compromise
was struck: a temporary register set, and a permanent register set. Now, one
can argue that one set should be larger than the other, but that's where one's
personal religious beliefs set in, and we all know what that means!!! (-:
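
The trade-off described above can be sketched as a count of register saves
for a single call. This is a toy model under stated assumptions, not
anything taken from the OCS: the register numbers, the r14-and-up preserved
range (mentioned elsewhere in the thread), and the `saves_for_call` helper
are all illustrative. It assumes separate compilation, so the caller must
save every live value held in a non-preserved register, and the callee must
save every preserved register it writes.

```python
# Hypothetical register numbering, for illustration only.
ALL_REGS = frozenset(range(2, 26))
PRESERVED_SPLIT = frozenset(range(14, 26))  # "r14 and up", per the thread


def saves_for_call(caller_live, callee_written, preserved):
    """Count save/restore pairs needed for one call under a convention.

    caller_live:    registers holding values the caller needs afterwards
    callee_written: registers the callee overwrites
    preserved:      registers the convention makes the callee protect
    """
    # The caller cannot see the callee, so it saves every live value that
    # sits in a register the convention does not protect.
    caller_saves = set(caller_live) - preserved
    # The callee cannot see the caller, so it saves every protected
    # register it is about to overwrite.
    callee_saves = set(callee_written) & preserved
    return len(caller_saves) + len(callee_saves)


caller_live = {14, 15, 16}     # caller keeps long-lived values here
callee_written = {10, 11, 12}  # callee works in scratch registers

callee_saves_all = saves_for_call(caller_live, callee_written, ALL_REGS)
caller_saves_all = saves_for_call(caller_live, callee_written, frozenset())
split = saves_for_call(caller_live, callee_written, PRESERVED_SPLIT)

if __name__ == "__main__":
    print(callee_saves_all, caller_saves_all, split)
```

In this particular example each of the two pure conventions costs three
saves, while the split costs none, because the caller parked its live values
in the permanent set and the callee stayed in the temporary set. A different
register assignment would give different numbers; the sketch only shows why
a split can avoid saves that either extreme forces.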



		Just one person's religious beliefs

			Terry Laskodi
			     of
			Tektronix

pfeiffer@nmsu.edu (Joe Pfeiffer) (11/16/89)

terryl@tekcrl.LABS.TEK.COM writes in <5063@tekcrl.LABS.TEK.COM>:

|     But the opposite (where NO registers are saved across procedure calls; it
|is the responsibility of the calling procedure to save whichever registers it
|will need later) is probably too restrictive; it puts too much burden on the
|compiler writer (not to mention all the assembler writers, but it's not a

This is probably getting a bit far afield, but I'm very puzzled by
this statement.  It has always been my impression that it doesn't much
matter whether the registers are preserved by the caller or the
callee, just so somebody does it.  Why is there a greater burden on
the compiler writer if it's the callee?

-Joe Pfeiffer.

pfeiffer@nmsu.edu (Joe Pfeiffer) (11/16/89)

----fumbling fingers, in that last post I meant to ask about the
calleR, not the calleE.

-Joe.

marvin@oakhill.UUCP (Marvin Denman) (11/17/89)

In article <PFEIFFER.89Nov15213136@puye.nmsu.edu> pfeiffer@nmsu.edu (Joe Pfeiffer) writes:
>terryl@tekcrl.LABS.TEK.COM writes in <5063@tekcrl.LABS.TEK.COM>:
>
>|     But the opposite (where NO registers are saved across procedure calls; it
>|is the responsibility of the calling procedure to save whichever registers it
>|will need later) is probably too restrictive; it puts too much burden on the
>|compiler writer (not to mention all the assembler writers, but it's not a
>
>This is probably getting a bit far afield, but I'm very puzzled by
>this statement.  It has always been my impression that it doesn't much
>matter whether the registers are preserved by the caller or the
>callee, just so somebody does it.  Why is there a greater burden on
>the compiler writer if it's the callee?
>
>-Joe Pfeiffer.


The answer probably lies in the fact that many leaf routines need so few
registers that they never have to save anything, provided they know which
registers are guaranteed to be "dead".  Such calls make up a large
percentage of all calls by frequency, so they should not be penalized.
Using the current standard, a leaf routine can use all of the temporary set
of registers plus all of the parameter passing registers without any saving.
Why should the caller save "all" of his live registers when only a small
fraction will be used?  The converse argument is: why should the callee be
required to save dead registers?  The current scheme is a fairly efficient
compromise between the two that will do in the absence of global register
allocation.
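
A minimal sketch of the leaf-routine argument, under stated assumptions: the
eight parameter registers come from this thread, while the size of the
temporary set (four here) and both helper functions are hypothetical values
chosen for illustration.

```python
PARAM_REGS = 8  # from the thread: eight words of parameters in registers
TEMP_REGS = 4   # assumed size of the temporary set, for illustration
FREE_IN_LEAF = PARAM_REGS + TEMP_REGS


def leaf_saves_split(regs_needed):
    """Saves a leaf routine performs when scratch registers are free to use."""
    # Only registers beyond the free scratch pool must come from the
    # permanent set, and only those need a save and restore.
    return max(0, regs_needed - FREE_IN_LEAF)


def leaf_saves_callee_all(regs_needed):
    """Saves a leaf routine performs if every register it uses is preserved."""
    return regs_needed


if __name__ == "__main__":
    for n in (3, 12, 15):
        print(n, leaf_saves_split(n), leaf_saves_callee_all(n))
```

Under these assumptions a leaf routine needing up to twelve registers saves
nothing with the split convention, whereas a convention that made the callee
preserve everything would charge it one save per register used.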


-- 

Marvin Denman
Motorola 88000 Design

rfg@ics.uci.edu (Ron Guilmette) (11/17/89)

In article <5352@orca.WV.TEK.COM> andrew@frip.wv.tek.com writes:
>
>	"(b) more significantly, there seems to have been a major
>	departure from the common practice in the good ol' CISC days
>	when callers could assume that almost all registers (except for
>	the one which got the return value) were *preserved* across a
>	"standard" call."
>
>Registers r14 and up are preserved.  That's more registers than those
>good ol' CISC machines preserved.

But of course, that's not the point, Andrew.  If we can do better still with
this hardware, why not try?

>
>	"It seems to me that (b) effectively ties the hands of many
>	otherwise very sophisticated modern optimizers (with nice
>	graph-coloring register allocators of course)."
>
>Not at all.  If you have the global program view necessary to do
>graph-coloring (you are generating the code for all calls to the
>callee), then it's fairly obvious that you can generate any code you
>want, and need not conform to the standard interface.  The OCS serves
>only to guide compilers/linkers in making procedure interfaces for
>callers that they do not control.

But in practice, you can still do graph-coloring register allocation
*without* perfect (ultra-global) information about standard library
routines, and if you try to do this, you will be better off
(performance-wise) if you can allocate more registers because you know
that they will not be clobbered across calls to library routines.

Right now people can't take advantage of this possibility because the
OCS is treated as gospel for the building of standard libraries, meaning
that users of those libraries, even if they have otherwise perfect global
data-flow information for *their* routines, are still forced to do unnecessary
stores and reloads of registers across calls to the standard library routines.
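
A rough way to picture the cost being described, as a sketch only: the
`spill_traffic` helper and the register counts below are hypothetical, and
the model deliberately ignores the other side of the ledger (a library
routine that must preserve more registers may have to save more of them
itself).

```python
def spill_traffic(live_across_calls, preserved_regs, library_calls):
    """Estimate stores plus reloads forced around library calls.

    live_across_calls: values the caller keeps live across each call
    preserved_regs:    registers the convention guarantees are not clobbered
    library_calls:     number of calls executed
    """
    # Values that do not fit in preserved registers must be stored before
    # each call and reloaded after it: two memory operations per value.
    spilled = max(0, live_across_calls - preserved_regs)
    return 2 * spilled * library_calls


if __name__ == "__main__":
    # Assumed numbers for illustration: 16 live values, 1000 calls.
    print(spill_traffic(16, 12, 1000))
    print(spill_traffic(16, 20, 1000))
```

With sixteen live values and twelve preserved registers, the four values
that do not fit cost a store and a reload on every call; if the convention
preserved enough registers to hold all sixteen, that traffic would vanish in
this model.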

// rfg