[comp.sys.apollo] 68k compiler target defaults

rehrauer@apollo.HP.COM (Steve Rehrauer) (07/16/90)

In article <1990Jul13.195714.6078@terminator.cc.umich.edu> rees@citi.umich.edu (Jim Rees) writes:
>In article <9007131550.AA25981@pan.ssec.honeywell.com>,
>thompson%pan@UMIX.CC.UMICH.EDU (John Thompson) writes:
>  
>  >  e) do gcc and X11R4 compile with no more than the usual mucking around?
>  BeatsMe, but from comments on the net, I'd say no.
>
>I've got both these running here, and didn't have to do anything special
>other than add "-A cpu,3000" to the CFLAGS (I wish Apollo would make this
>the default).

Interesting.  You're doing this for performance reasons, I assume?  If
not, then ignore me as I talk through my hat below...

The so-called "CR1.0" 68k compilers (we've decoupled our OS and language
releases, so I can't say "SR10.3" or some such -- "CR1.0" means revision
10.8 FORTRAN, 8.8 Pascal, 6.8 C), which are now in Beta test, will "sort
of" have this.  This is going to be a potentially confusing issue for our
customers who use our compilers, so let me see if I can do a little advance
hand-waving.

We in R&D believe that all but a handful of our installed 68k customer
base is running on "-cpu 3000" machines by now.  What we hear from the
field is that there are relatively few 100/300/320/400/420/460/660s still
out there.  So it's been, if not an actual thorn, at least a niggling
splinter in our sides that the default machine target for the compilers
has been "-cpu any".  "-cpu any" code runs on every 68k node, but hardly
well on any of them.

We also realize that there's been an ever-proliferating, somewhat schizoid
set of "-cpu" options to support new hardware.  The now-announced 68040-based
hardware waggled the temptation to add Yet Another "-cpu" option, but what
to call it?  "-cpu m040"?  "-cpu some_HP_model_number"?

We hope that what we've done for CR1.0 makes a bit more sense.  First, we
preserved all existing "-cpu" options, so nearly everyone can ignore the
whole issue if they wish.  Second, we added three new "-cpu" options:
mathchip, mathlib and mathlib_sr10.  ("THREE" new options?  Are we insane??
Well, let me explain the rationale and then you decide.)  Third, we made
"mathlib_sr10" the default; more in a moment.

"Mathchip" is merely a synonym for any of the following options (I'm not
even sure all of these are documented): 3000, m881, 68881, 560, 330, 570,
580, 5700.  Code will run optimally on a DN5xx (without FPX board), or
any of DN330/550/2500/3x00/4x00 (without FPA board), when compiled "-cpu
mathchip".  "Mathchip", as the name implies, assumes that you have Motorola
floating-point hardware (i.e.: a 68881/68882 chip) in your node.
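
To make the synonym list concrete, here is a minimal Python sketch that folds the aliases named above into "mathchip". The function name canonical_cpu is hypothetical and is not part of any Apollo tool; only the alias list itself comes from the post.

```python
# Options the post lists as synonyms for "-cpu mathchip".
MATHCHIP_ALIASES = {"3000", "m881", "68881", "560", "330", "570", "580", "5700"}


def canonical_cpu(option):
    """Return "mathchip" for any listed alias, otherwise the option itself.

    Hypothetical helper for illustration only; the real compilers do this
    mapping internally.
    """
    name = option.strip().lower()
    return "mathchip" if name in MATHCHIP_ALIASES else name


if __name__ == "__main__":
    for opt in ("3000", "m881", "any", "mathlib_sr10"):
        print(opt, "->", canonical_cpu(opt))
```

Options that are not in the alias list (such as "any" or "mathlib_sr10") pass through unchanged.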

The 68040 hardware was the impetus behind "mathlib" and "mathlib_sr10".
As some of you may know, the 68040 processor has floating-point support
on-chip.  Motorola chose not (actually, I suspect limits on physical
die-size "chose not" :-) to implement the full instruction-set of their
'881/'882 coprocessors.  Unimplemented instructions can of course be
emulated in kernel code (actually MUST be, so that existing code will
run on an '040 node without recompilation), but that is SLOW.  It's far
better for the compilers to avoid generating such instructions.

Hence, "mathlib" and "mathlib_sr10" both tell the compiler to assume that
it can use floating-point instructions that an '040 can execute.  "Mathlib"
tells the compiler that f.p. ops that an '040 can't execute should be
implemented via a new user-space f.p. library, supported in SR10.3 and
above.  "Mathlib_sr10" tells the compiler to use the existing f.p. library,
which has been present as far back as anyone reading this cares about --
certainly at SR9.7 and above -- for those ops.

The advantage of "mathlib" is that the interface to the new library has been
crafted for speed.  Code compiled "-cpu mathlib" will run nearly optimally
on any node for which "-cpu mathchip" will generate optimal code.  The
advantage of "-cpu mathlib_sr10" is that the compiled code will be backwards-
compatible with a much wider range of OS revisions, at some sacrifice of
speed versus "mathlib".  When you hear "mathlib" and "mathlib_sr10", think
"new floating-point library" and "old floating-point library", respectively.

The upshot is that "mathlib_sr10" is the default machine target for the
68k compilers at CR1.0.  If you:

    -   Have a PEB machine (e.g.: 320/420), or any of the "old" nodes that
        I mentioned at the outset of this, then to use the CR1.0 compilers
        you'd need to explicitly ask for "-cpu peb" or "-cpu any".  Note
        that pre-CR1.0, you automatically got this; at CR1.0 and beyond, you
        won't.

    -   Want to use the CR1.0 compilers to generate code that MAY need to
        run on a PEB or similarly "old" hardware, you should explicitly ask
        for "-cpu any" to ensure that your code runs everywhere.  Note that
        pre-CR1.0, you automatically got this; at CR1.0 and beyond, you
        won't.

    -   Have a "-cpu 3000"-capable machine and have just been using the
        default compiler settings, and especially if you compile floating-
        point programs, you can continue to do nothing special.  You should
        be pleased with the runtime performance boost of your code, though.

    -   Have a "-cpu 3000"-capable machine and have been asking for "-cpu
        3000" because your applications are performance-critical, you should
        probably continue to ask for "-cpu 3000" (or "mathchip" if you want
        to take the trouble to learn the new name).

    -   Have one of the new 68040-based nodes (we're talking of the future
        now, of course), then the default will run nearly optimally, "mathlib"
        will run optimally, and "-cpu 3000/mathchip" may run optimally or
        nearly so (depending upon the mix of floating-point operations in
        your code).
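
The cases above can be read as a small decision table. The following Python sketch encodes them under stated assumptions: the node labels "old", "3000" and "040" and the function name suggest_cpu_option are invented for illustration, and the sketch ignores the OS-revision requirement of "mathlib" (SR10.3 and above).

```python
def suggest_cpu_option(node, portable=False, speed_critical=False):
    """Suggest a -cpu option for the CR1.0 68k compilers.

    node is an illustrative label, not compiler syntax:
      "old"  - PEB machines and the other pre-"-cpu 3000" nodes
      "3000" - any "-cpu 3000"-capable node
      "040"  - a 68040-based node
    portable means the code may also have to run on "old" hardware.
    """
    if node not in ("old", "3000", "040"):
        raise ValueError(f"unknown node class: {node!r}")
    # Old hardware, or code that must run everywhere, needs the pre-CR1.0
    # default. On a PEB machine "-cpu peb" is the other choice the post names.
    if portable or node == "old":
        return "any"
    if node == "3000":
        # The new default already beats "-cpu any"; mathchip is the optimum.
        return "mathchip" if speed_critical else "mathlib_sr10"
    # 68040: the default runs nearly optimally, mathlib runs optimally.
    return "mathlib" if speed_critical else "mathlib_sr10"


if __name__ == "__main__":
    print(suggest_cpu_option("3000"))
    print(suggest_cpu_option("040", speed_critical=True))
```

Returning "mathlib_sr10" for the non-critical cases simply mirrors the new default, so a caller could equally well pass no -cpu option at all.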

All of this stuff will be explained in more (and probably more lucid) detail
in the release notes & man pages.  I'm just a code-gen weenie, not a tech-
writer.  Hope I didn't bore everyone to tears; apologies for spending so much
bandwidth on the issue.

P.S. - We kicked around a number of names, and while we didn't exactly fall
       madly in love with "mathchip/mathlib/mathlib_sr10", we couldn't arrive
       at a better set that intuitively manages to convey what the compiler
       does with them.  It's too late for the names to change, but if anyone
       has a better trio of names, I'd be happy to hear them anyway. :-)
--
   >>"Aaiiyeeee!  Death from above!"<<     | (Steve) rehrauer@apollo.hp.com
"Spontaneous human combustion - what luck!"| Apollo Computer (Hewlett-Packard)

Jacques_Gelinas@CMR001.BITNET (07/16/90)

             >>           add "-A cpu,3000" to the CFLAGS
             >>          (I wish Apollo would make this the default)
>    Interesting.  Are you doing this for performance reasons?

I have also done this "customer" optimisation dozens of times.
Since I am lucky to be able to use a DN4000, I want to take full
advantage of it (even if it is like a DN3000). Otherwise, why pay
extra for advanced machines when the software ignores their features?

>               This is going to be a potentially confusing issue for our
>               customers who use our compilers

Indeed. But if they survived through the GPR routines changes (&status)
they should accept this.

>               we preserved all existing "-cpu" options,

Thanks! Very good (see my remark about GPR above).

>               Second, we added three new "-cpu" options
>               mathchip, mathlib and mathlib_sr10.

No, thanks. This is not enough. As a simple user -who complains a lot-
I want an internal flag to be turned on during the installation
procedure telling the compiler what machines are on my network
(many DN3500, 3 DN4000 ). The compiler should adjust its complicated
flags automagically based on this information. Of course:
- The individual users can override this on the command line
- The hpollo supplied scripts would have to be rewritten to supply
  all the flags instead of relying on these varying defaults.
- Third party software developers would also have to modify their
  scripts if they want the generic compiler (to avoid -O bugs, say).

DO NOT ASSUME THAT ORDINARY USERS READ SOFTWARE RELEASE NOTES PLEASE.
(There is too much to be done after installation to spend 3 hours
 on this, according to me..., and once it works, why bother?)

Hope I have been direct enough to avoid wasting bandwidth.

rehrauer@apollo.HP.COM (Steve Rehrauer) (07/16/90)

In article <900715.20155735.017211@CMR.CP6> Jacques_Gelinas@CMR001.BITNET writes:
>>               Second, we added three new "-cpu" options
>>               mathchip, mathlib and mathlib_sr10.
>
>No, thanks. This is not enough. As a simple user -who complains a lot-
>I want an internal flag to be turned on during the installation
>procedure telling the compiler what machines are on my network
>(many DN3500, 3 DN4000 ). The compiler should adjust its complicated
>flags automagically based on this information.

Hmmm, you're asking for the compiler to do a network poll to determine
the mix of nodes, and pick the most appropriate least-common-denominator
-cpu settings?  Or just a poll-and-set-it-once-when-the-compiler-is-installed
sort of environment flag?  The former seems like an unreasonable burden on
the average user -- I can imagine that large networks (of which we have one)
might take more effort to poll than the compilation would.
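
Whichever way the node mix is gathered, the least-common-denominator choice itself is simple. A minimal Python sketch, assuming the site's node models are already known as strings and using the older-model list from earlier in this thread; site_cpu_default is a hypothetical name, not something the compilers or the install provide.

```python
# Older models named earlier in the thread as needing "-cpu any" (or "peb").
OLD_MODELS = {"100", "300", "320", "400", "420", "460", "660"}


def site_cpu_default(models):
    """Pick one -cpu setting whose output runs on every listed node.

    models: iterable of model-number strings, e.g. ["3500", "4000"].
    Hypothetical helper; a real install would need its own node inventory.
    """
    if any(model in OLD_MODELS for model in models):
        return "any"            # at least one node needs the old default
    return "mathlib_sr10"       # the CR1.0 default is safe for the rest


if __name__ == "__main__":
    print(site_cpu_default(["3500", "3500", "4000"]))
```

A single old node anywhere in the list forces "any" for the whole site, which is exactly the cost of a least-common-denominator policy.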

I could live with the latter; something like the ISP environ var?  Perhaps
M68K_COMPILER_CPU or some such?  Seems like a possibility for Real Confusion
if the install doesn't go smoothly, though -- are we talking about a question
you answer, or something the install is supposed to find out for itself?  (I
suppose the best solution would be for the install to try to figure it out,
and then ask for you to verify what it has found...)  Anyway, it's too late
in the release cycle for me to do anything like that for CR1.0; my poor boss
would be a wreck. :-)  I'm not even sure that I have any say over the install
script for the compilers.  I'll take the suggestion into consideration for the
next release, though; thanks.
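
As a sketch of how such an environment variable might be consulted, the Python below resolves the target in three steps: an explicit -cpu on the command line wins, then the variable, then the built-in default. M68K_COMPILER_CPU is only the name floated above and does not exist in any release; effective_cpu is likewise a hypothetical function.

```python
import os

BUILTIN_DEFAULT = "mathlib_sr10"   # the CR1.0 default described in this thread
ENV_VAR = "M68K_COMPILER_CPU"      # hypothetical name, not a real variable


def effective_cpu(argv, environ=None):
    """Return the -cpu target a compiler driver would use.

    Precedence: explicit "-cpu X" in argv, then the environment variable,
    then the built-in default. Illustration only.
    """
    environ = os.environ if environ is None else environ
    for i, arg in enumerate(argv):
        if arg == "-cpu" and i + 1 < len(argv):
            return argv[i + 1]
    return environ.get(ENV_VAR, BUILTIN_DEFAULT)


if __name__ == "__main__":
    print(effective_cpu(["foo.c"], {ENV_VAR: "mathchip"}))
    print(effective_cpu(["-cpu", "any", "foo.c"], {ENV_VAR: "mathchip"}))
```

Keeping the command line ahead of the environment means build scripts that already spell out -cpu behave the same no matter how a site is configured.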

If DN3x00's and 4x00's are all you have on your network, you really should
see a *BIG* improvement at CR1.0 compiling with the default -cpu setting
(mathlib_sr10).  At least now your compilers can assume that you have f.p.
regs and better-than-68010 addressing modes / instructions at your disposal...

>DO NOT ASSUME THAT ORDINARY USERS READ SOFTWARE RELEASE NOTES PLEASE.

I know, I know -- I don't.  That's one of the reasons I mentioned the matter
here; I believe in spreading information as widely, in as many ways, as early
as possible.  Isn't that what many of the recent complaints here have been
about: not getting needed information?  If I knew the answers to questions
about SCSI devices or SR10.2p, believe me, I'd help if I could.  All I can
do is to watch here & hope someone asks something about our compilers...
--
   >>"Aaiiyeeee!  Death from above!"<<     | (Steve) rehrauer@apollo.hp.com
"Spontaneous human combustion - what luck!"| Apollo Computer (Hewlett-Packard)

chen@digital.sps.mot.com (Jinfu Chen) (07/17/90)

In article <4ba03a18.20b6d@apollo.HP.COM> rehrauer@apollo.HP.COM (Steve Rehrauer) writes:
>Hmmm, you're asking for the compiler to do a network poll to determine
>the mix of nodes, and pick the most appropriate least-common-denominator
>-cpu settings?  Or just a poll-and-set-it-once-when-the-compiler-is-installed
>sort of environment flag?  The former seems like an unreasonable burden on
>the average user -- I can imagine that large networks (of which we have one)
>might take more effort to poll than the compilation would.

Although I'm not sure I'm in favor of a customizable compiler option, there's
a way to do that for HP/Apollo. Consider imake, which comes with X11: one
can keep all of one's favorite compiler options in a site.def file.

>If DN3x00's and 4x00's are all you have on your network, you really should
>see a *BIG* improvement at CR1.0 compiling with the default -cpu setting
>(mathlib_sr10).  At least now your compilers can assume that you have f.p.
>regs and better-than-68010 addressing modes / instructions at your disposal...

On a separate issue, when are we going to see a version of Domain/OS
compiled with the '020/'030/'040 option? Perhaps at least all the compilers
should be made with the option? I seriously doubt there are many pre-020
cpu nodes around.

-- 
Jinfu Chen                  (602)898-5338      |
Motorola, Inc.  SPS  Mesa, AZ                  |
 ...uunet!motsps!digital!chen                  |
chen@digital.sps.mot.com                       |
CMS: RXFR30 at MESAVM                          |
----------

rehrauer@apollo.HP.COM (Steve Rehrauer) (07/17/90)

In article <4ba189ba.12c9a@digital.sps.mot.com> chen@digital.sps.mot.com (Jinfu Chen) writes:
>On a separate issue, when are we going to see a version of Domain/OS
>compiled with the '020/'030/'040 option?

It was my impression that the relevant SAU's for '020/'030/'040 machines
_were_ being built with the appropriate -cpu settings.  I believe Domain
was built -cpu 3000 for my DN4000 here.  Perhaps I'm mistaken.  Or do I
misunderstand you?

> Perhaps at least all the compilers
>should be made with the option? I seriously doubt there are many pre-020
>cpu nodes around.

No, we didn't want to outright orphan all the older nodes.  There aren't
many around, but enough to deserve some minimum level of support.  If we
built the compilers -cpu mathlib_sr10, the older nodes couldn't run them
at all.  It's one thing to sacrifice convenience of the few for sake of
performance of the many; it's another to obsolete the few entirely.  We
aren't quite there yet for the compilers.  (Ignoring for a moment the
question of whether a DN320 would be able to do any meaningful work with
recent compilers, given their present 'heft'...)

--
   >>"Aaiiyeeee!  Death from above!"<<     | (Steve) rehrauer@apollo.hp.com
"Spontaneous human combustion - what luck!"| Apollo Computer (Hewlett-Packard)

chen@digital.sps.mot.com (Jinfu Chen) (07/18/90)

In article <4ba5b9c8.20b6d@apollo.HP.COM> rehrauer@apollo.HP.COM (Steve Rehrauer) writes:
>In article <4ba189ba.12c9a@digital.sps.mot.com> chen@digital.sps.mot.com (Jinfu Chen) writes:
>>On a separate issue, when are we going to see a version of Domain/OS
>>compiled with the '020/'030/'040 option?
>
>It was my impression that the relevant SAU's for '020/'030/'040 machines
>_were_ being built with the appropriate -cpu settings.  I believe Domain
>was built -cpu 3000 for my DN4000 here.  Perhaps I'm mistaken.  Or do I
>misunderstand you?

I wasn't clear on this. What I meant is the utilities, such as the ones in
/com, /usr/apollo/bin, etc. Yes, you're correct about /sau*.

>> Perhaps at least all the compilers
>>should be made with the option? I seriously doubt there are many pre-020
>>cpu nodes around.
>
>No, we didn't want to outright orphan all the older nodes.  There aren't
>many around, but enough to deserve some minimum level of support.  If we
>built the compilers -cpu mathlib_sr10, the older nodes couldn't run them
>at all.  It's one thing to sacrifice convenience of the few for sake of
>performance of the many; it's another to obsolete the few entirely.  We
>aren't quite there yet for the compilers.  (Ignoring for a moment the
>question of whether a DN320 would be able to do any meaningful work with
>recent compilers, given their present 'heft'...)

There is an alternative, although it may add some extra work for HP/Apollo.
You can distribute the -cpu mathlib_sr10 compilers as the default in
/usr/apollo/lib, and also make the following available:

cc.m68k.any ftn.m68k.any pas.m68k.any

Better yet, use symbolic links:

cc.m68k		-> cc.m68k.020
ftn.m68k	-> ftn.m68k.020
pas.m68k	-> pas.m68k.020

and let the user use /install/tools/config to decide which one (020 or any)
should be installed as the default. In this way, the majority will get a
performance boost, and yet older hardware can still use them
(if they want :-).
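
The link-switching step could look roughly like the Python sketch below. It is only an illustration of the idea: select_variant is a hypothetical function, the ".020"/".any" file names follow the naming proposed above, and nothing here reflects how /install/tools/config actually works.

```python
import os

TOOLS = ("cc.m68k", "ftn.m68k", "pas.m68k")
VARIANTS = ("020", "any")


def select_variant(lib_dir, variant):
    """Point each compiler name in lib_dir at its ".020" or ".any" build."""
    if variant not in VARIANTS:
        raise ValueError(f"variant must be one of {VARIANTS}")
    # Check that every target exists before touching any link.
    for tool in TOOLS:
        target = os.path.join(lib_dir, f"{tool}.{variant}")
        if not os.path.exists(target):
            raise FileNotFoundError(target)
    for tool in TOOLS:
        link = os.path.join(lib_dir, tool)
        if os.path.lexists(link):
            os.remove(link)                      # drop the previous link
        os.symlink(f"{tool}.{variant}", link)    # relative link inside lib_dir


if __name__ == "__main__":
    import tempfile

    demo = tempfile.mkdtemp()
    for tool in TOOLS:
        for variant in VARIANTS:
            open(os.path.join(demo, f"{tool}.{variant}"), "w").close()
    select_variant(demo, "020")
    print(os.readlink(os.path.join(demo, "cc.m68k")))
```

Using relative link targets keeps the directory relocatable, which matters if the same tree is seen under different paths from different nodes.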


-- 
Jinfu Chen                  (602)898-5338      |
Motorola, Inc.  SPS  Mesa, AZ                  |
 ...uunet!motsps!digital!chen                  |
chen@digital.sps.mot.com                       |
CMS: RXFR30 at MESAVM                          |
----------

krowitz@RICHTER.MIT.EDU (David Krowitz) (07/19/90)

I think the idea was that all of the /com, /lib, /usr, /bin, /etc directories
and their brethren are currently compiled with the -cpu any switch; and
that it might be nice to have a Domain/OS SR10.2.020 in addition to the
currently available SR10.2 (i.e. -cpu any) and the SR10.2.p (i.e. -cpu a88k).

Although the system utilities, in general, do not have a lot of floating
point code in them, the 020/030/040 have faster memory addressing modes
than those generated by the -cpu any switch. Using -cpu 3000 when compiling
something like /etc/tcpd just might result in a nice performance boost
for network functions.


 -- David Krowitz

krowitz@richter.mit.edu   (18.83.0.109)
krowitz%richter.mit.edu@eddie.mit.edu
krowitz%richter.mit.edu@mitvma.bitnet
(in order of decreasing preference)

rees@dabo.ifs.umich.edu (Jim Rees) (07/19/90)

In article <4ba5b9c8.20b6d@apollo.HP.COM>, rehrauer@apollo.HP.COM (Steve
Rehrauer) writes:
  It was my impression that the relevant SAU's for '020/'030/'040 machines
  _were_ being built with the appropriate -cpu settings.  I believe Domain
  was built -cpu 3000 for my DN4000 here.  Perhaps I'm mistaken.  Or do I
  misunderstand you?

The only saus that aren't necessarily built with the correct target machine
type are sau2 and sau3, which are both shared by 010 and 020 machines
(dn300/330 and dsp80/90).  These are built for 010 target.

I once built a sau2/aegis compiled for 020.  I couldn't notice any
difference in size or performance, but I didn't actually measure it.

Since there is no floating point in aegis I wouldn't expect it to make a lot
of difference.

You might see some improvement by compiling everything else with the correct
-cpu, but then Apollo would have to ship two sets of software, you would
have to worry about booting 010 nodes diskless off of 020 nodes, and it
would be a big headache.