[comp.arch] SUN procedure inlining

schmitz@fas.ri.cmu.edu (Donald Schmitz) (03/31/89)

In article david@sun.com  writes:
>In article sclafani@jumbo.dec.com (Michael Sclafani) writes:
>>Results for the Sun-3/60 are not reported because the data in
>>[Presentation on Benchmarks given at Sun User Group Conference, Dec 5-7,
>>1988 by Sun Microsystems, Inc.] uses compiler optimization level 4 which
>>employs procedure inlining.
>
>FYI, this is incorrect.  Current Sun C compilers do not perform procedure
>inlining at any optimization level.
>
>David DiGiacomo, Sun Microsystems, Mt. View, CA  sun!david david@sun.com

Maybe you should look again -  our SUN3 C compiler has a limited, but
occasionally useful procedure inline facility - and the standard inline
files have 1986 SUN copyright notices in them.  It's actually pretty crude:
you compile the source with a .il file (assembly code with a few special
directives), and all of the original jsr's are macro replaced with the
assembler for the routine.  It doesn't inline argument passing, but it does
eliminate the jsr/rts overhead.

Don Schmitz
-- 

khb%chiba@Sun.COM (chiba) (04/01/89)

In article <4614@pt.cs.cmu.edu> schmitz@fas.ri.cmu.edu (Donald Schmitz) writes:
>In article david@sun.com  writes:
>>In article sclafani@jumbo.dec.com (Michael Sclafani) writes:
>>>Results for the Sun-3/60 are not reported because the data in
>>>[Presentation on Benchmarks given at Sun User Group Conference, Dec 5-7,
>>>1988 by Sun Microsystems, Inc.] uses compiler optimization level 4 which
>>>employs procedure inlining.
>>
>>FYI, this is incorrect.  Current Sun C compilers do not perform procedure
>>inlining at any optimization level.
>>
>>David DiGiacomo, Sun Microsystems, Mt. View, CA  sun!david david@sun.com
>

>Maybe you should look again -  our SUN3 C ....
>occasionally useful procedure inline facility - and the standard inline
>files have 1986 SUN copyright notices in them.  It's actually pretty crude:
>you compile the source with a .il file (assembly code with a few special
>directives), and all of the original jsr's are macro replaced with the
>assembler for the routine.  It doesn't inline argument passing, but it does
>eliminate the jsr/rts overhead.
>

For the record......

-O4 has nothing to do with inlining (in any product currently released).

/usr/lib/f77/libm.il 
/usr/lib/libm.il

exist for cc and f77 on ALL sun platforms. The names vary a bit to
account for hardware (e.g. /usr/lib/f77/f68881 means use the f77
inline library for 68881 instructions).

Cheers.




Keith H. Bierman
It's Not My Fault ---- I Voted for Bill & Opus

guy@auspex.auspex.com (Guy Harris) (04/01/89)

>>>Results for the Sun-3/60 are not reported because the data in
>>>[Presentation on Benchmarks given at Sun User Group Conference, Dec 5-7,
>>>1988 by Sun Microsystems, Inc.] uses compiler optimization level 4 which
>>>employs procedure inlining.
>>
>>FYI, this is incorrect.  Current Sun C compilers do not perform procedure
>>inlining at any optimization level.
>
>Maybe you should look again

No, there's not much point in looking again; David is talking about
*general* procedure inlining, which the current Sun compilers do not do,
at least not to the best of my knowledge - a second look will almost
certainly confirm that.  This is quite different from the very
specialized inlining that the "inline" program performs (as I remember,
it performs inlining on the assembly-language output from the compiler),
so if the claim is that the 3/60 results used general procedure
inlining, the claim seems suspicious to me.  The only ".il" files I
could find on any of the 4.0 machines around here (both 68K and SPARC)
are 1) files for "libm" - in several flavors for the 68K machines - and
2) some for doing loads from possibly-misaligned locations. 

As far as I know, Dhrystone (the benchmark to which the comment about
the 3/60 refers) doesn't make heavy use of floating point (it's supposed
to be a synthetic systems programming benchmark, and systems programming
tends not to make heavy use of floating point) or the math library (if
it makes any use of it at all); furthermore, the 3/60 doesn't make use
of SPARC templates for handling misaligned data (it can handle it on its
own).  As such, "inline" is completely irrelevant to the discussion.

jamesa@arabian.Sun.COM (James D. Allen) (04/01/89)

In article <4614@pt.cs.cmu.edu>, schmitz@fas.ri.cmu.edu (Donald Schmitz) writes:
|From: schmitz@fas.ri.cmu.edu (Donald Schmitz)
|Newsgroups: comp.arch
|Subject: SUN procedure inlining(was i860 Dhrystones)
|
|In article david@sun.com  writes:
|>In article sclafani@jumbo.dec.com (Michael Sclafani) writes:
|>>Results for the Sun-3/60 are not reported because the data in
|>>[Presentation on Benchmarks given at Sun User Group Conference, Dec 5-7,
|>>1988 by Sun Microsystems, Inc.] uses compiler optimization level 4 which
|>>employs procedure inlining.
|>
|
|	... our SUN3 C compiler has a limited, but
|occasionally useful procedure inline facility - and the standard inline
|files have 1986 SUN copyright notices in them.  It's actually pretty crude,

	pretty crude?  it got rave reviews in our real-time controller
	application, virtually eliminating any issue of asm-interface
	speed deficit.  You don't even have to put on your assembly-language
	hat if you don't want to since you can run "cc -S" on the functions to
	be inlined and touch up the output.

|you compile the source with a .il file (assembly code with a few special
|directives), and all of the original jsr's are macro replaced with the
|assembler for the routine.  It doesn't inline argument passing, but it does
|eliminate the jsr/rts overhead.

	Perhaps you neglected to use "cc -O" ( "-O1" is all you need if you
	want to talk about optimization levels).  Even before SunOS 4.0, the
	peephole optimizer "knew" about inline code's argument passing and
	typically optimizes the argument-passing down to *nothing*.
|
|Don Schmitz
|-- 

James Allen

Disclaimer: I am not authorized to speak for Sun, but hopefully they won't
object if it's complimentary material.	:-)

mash@mips.COM (John Mashey) (04/02/89)

In article <97048@sun.Eng.Sun.COM> khb@sun.UUCP (chiba) writes:
....
>For the record......
>
>-O4 has nothing to do with inlining (in any product currently released).

People may have been confusing this with MIPS -O4, which DOES do inlining.
Note that the MIPS Performance Briefs have consistently observed that
we weren't sure what the higher Sun optimizations did, but we knew they
didn't do inlining, and so cited those numbers.
-- 
-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: 	{ames,decwrl,prls,pyramid}!mips!mash  OR  mash@mips.com
DDD:  	408-991-0253 or 408-720-1700, x253
USPS: 	MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086

schmitz@fas.ri.cmu.edu (Donald Schmitz) (04/04/89)

(Lots of people writing)
>>>FYI, this is incorrect.  Current Sun C compilers do not perform procedure
>>>inlining at any optimization level.              ^^^^^^^^^^^^^^^^^^^^^^^^
>> ^^^^^^^^
>>(I wrote) Maybe you should look again
>
>No, there's not much point in looking again; David is talking about
>*general* procedure inlining, which the current Sun compilers do not do,
 ^^^^^^^^
>at least not to the best of my knowledge - a second look will almost
>certainly confirm that.  This is quite different from the very
>specialized inlining that the "inline" program performs (as I remember,
>it performs inlining on the assembly-language output from the compiler),
>so if the claim is that the 3/60 results used general procedure
>inlining, the claim seems suspicious to me.  The only ".il" files I
>could find on any of the 4.0 machines around here (both 68K and SPARC)
>are 1) files for "libm" - in several flavors for the 68K machines - and
>2) some for doing loads from possibly-misaligned locations. 

To put an end to this, I was responding to the statement that SUN compilers
*do no procedure inlining*; looking at the man entry will tell you this
isn't true.  The original message regarding the -O4 switch was very likely
an error, as far as I know there is only one -O switch for the SUN3 family.
However, even the crude inlining capability available on the SUN can be used
to speed up Dhrystone.  Those magic .il files are just assembler with some
special directives thrown in; I could easily write a strcmp and strcpy .il
file and compile the source with them.  Assuming I didn't further massage
the assembly code, this would still speed up every subroutine call (it
eliminates jsr, link, unlk, and rts, all multi-cycle instructions).  It
would also improve code locality, improving I cache performance (on SUNs with
I caches).  Since the original post seemed to indicate someone had played
loose with the Dhrystone rules, this seemed like a very possible way it
could have been done.
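
As a rough C-level analogy for the call overhead being discussed (the .il
mechanism itself works on assembler, not on C source), the sketch below
counts string matches two ways: once through an out-of-line comparison
routine, and once with the same loop expanded at the call site.  The names
str_equal, count_matches_call and count_matches_inlined are made up for
illustration and are not taken from Dhrystone or from any Sun library.

```c
#include <stddef.h>

/* Out-of-line comparison: every use pays for a call and a return
   (jsr/rts on a 68K, plus any frame setup the compiler emits). */
static int str_equal(const char *a, const char *b)
{
    while (*a != '\0' && *a == *b) {
        a++;
        b++;
    }
    return *a == *b;   /* equal only if both stopped at the terminator */
}

/* Hypothetical caller that goes through the routine above. */
int count_matches_call(const char *const *words, int n, const char *key)
{
    int i, hits = 0;

    for (i = 0; i < n; i++)
        if (str_equal(words[i], key))
            hits++;
    return hits;
}

/* Same logic with the comparison loop expanded in place, which is
   roughly what substituting the routine body for the call achieves. */
int count_matches_inlined(const char *const *words, int n, const char *key)
{
    int i, hits = 0;

    for (i = 0; i < n; i++) {
        const char *a = words[i];
        const char *b = key;

        while (*a != '\0' && *a == *b) {
            a++;
            b++;
        }
        if (*a == *b)
            hits++;
    }
    return hits;
}
```

Both functions return the same count; only the call overhead differs.  A
compiler that does general inlining may produce the second form from the
first on its own, so any speed difference depends on the compiler and the
options used.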

Don Schmitz
-- 

grunwald@flute.cs.uiuc.edu (04/05/89)

The inlining done by the Greenhills compiler could be done by a
peep-hole phase, but it's unlikely.  When you say

		strcpy(foo, "this is a string")

Greenhills turns this into a block move, because it already knows the
length of the string. This beats the pants off a ``move-until-null-byte''
loop.

I noticed this when comparing Greenhills C to GNU C for the '386.
For Dhrystone (and a few other programs) this makes a big performance
difference. For other programs, GNU C was better than Greenhills
(and, if I might add, more stable than the Greenhills version we have).
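
The strcpy-to-block-move transformation described above can be sketched by
hand in C.  This is only an illustration of the idea, assuming the source
is a literal whose length is known at compile time; it is not the code
Greenhills actually generates, and the function names are hypothetical.

```c
#include <string.h>

/* What the programmer writes: copy until the terminating null byte. */
void copy_with_strcpy(char *dst)
{
    strcpy(dst, "this is a string");
}

/* What a compiler that knows the literal's length could do instead:
   one fixed-size block move.  sizeof counts the terminating null,
   so 17 bytes are copied here. */
void copy_with_block_move(char *dst)
{
    static const char literal[] = "this is a string";

    memcpy(dst, literal, sizeof literal);
}
```

Given a destination of at least 17 bytes, both calls leave the same
null-terminated string in it.  The block-move form never has to test each
byte for the terminator, which is where the saving comes from.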
--
Dirk Grunwald
Univ. of Illinois
grunwald@flute.cs.uiuc.edu

guy@auspex.auspex.com (Guy Harris) (04/05/89)

>The original message regarding the -O4 switch was very likely
>an error, as far as I know there is only one -O switch for the SUN3
>family.

There's only one -O switch for the Sun-3 family, just as there's only
one for the Sun-4 family and for the Sun-2 family; however, in SunOS
4.0, the "-O" switch takes an optional optimization level on all three
of those families:

     -O[level] Optimize the object code.  Ignored when either -g,
               -go, or -a is used.  On Sun-2 and Sun-3 systems,
               -O with the level omitted is equivalent to -O1; on
               Sun-4 systems, it is equivalent to -O2.  On
               Sun386i systems, all levels are the same as 1.
               level is one of:

                    1    Do postpass assembly-level optimization
                         only.

                    2    Do global optimization prior to code
                         generation, including loop
                         optimizations, common subexpression
                         elimination, copy propagation, and
                         automatic register allocation.  -O2 does
                         not optimize references to or
                         definitions of external or indirect
                         variables.

                    3    Same as -O2, but optimize uses and
                         definitions of external variables.  -O3
                         does not trace the effects of pointer
                         assignments.  Neither -O3 nor -O4 should
                         be used when compiling either device
                         drivers, or programs that modify
                         external variables from within signal
                         handlers.

                    4    Same as -O3, but trace the effects of
                         pointer assignments.
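
Two of the caveats in that excerpt can be illustrated in C.  This is a
sketch under stated assumptions, not a description of what the Sun
optimizer does internally: the first half shows an external variable
changed from a signal handler (the case the -O3/-O4 warning is about),
the second a store through a pointer that may or may not alias an external
variable (the "pointer assignments" the levels differ on).  All names here
are hypothetical, and the volatile qualifier is the ANSI C way of telling
an optimizer not to cache the variable.

```c
#include <signal.h>

/* External variable modified from a signal handler.  volatile forces a
   fresh load on every read, so the value cannot be kept in a register
   across the point where the handler runs. */
static volatile sig_atomic_t got_signal = 0;

static void on_signal(int signo)
{
    (void)signo;
    got_signal = 1;
}

/* Hypothetical helper: installs the handler, raises the signal
   synchronously, and returns the flag as seen afterwards
   (or -1 if the handler could not be installed). */
int flag_seen_after_signal(void)
{
    got_signal = 0;
    if (signal(SIGINT, on_signal) == SIG_ERR)
        return -1;
    raise(SIGINT);
    return got_signal;
}

/* External variable that a pointer may or may not refer to. */
int total = 0;

/* Hypothetical helper: the store through p might change total, so the
   value returned depends on what p points at.  An optimizer may only
   assume total is still 10 if it can show that p does not alias it. */
int store_then_read(int *p)
{
    total = 10;
    *p = 42;
    return total;
}
```

Calling store_then_read with the address of a local returns 10, while
calling it with &total returns 42; that difference is why an optimizer
has to be careful about externals in the presence of pointer stores.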

>Since the original post seemed to indicate someone had played
>loose with the Dhrystone rules, this seemed like a very possible way it
>could have been done.

The original post seemed to indicate that somebody was confused about
what the Sun compilers did and didn't do.

jesup@cbmvax.UUCP (Randell Jesup) (04/06/89)

In article <GRUNWALD.89Apr4185706@flute.cs.uiuc.edu> grunwald@flute.cs.uiuc.edu writes:
>
>The inlining done by the Greenhills compiler could be done by a
>peep-hole phase, but it's unlikely.  When you say
>
>		strcpy(foo, "this is a string")
>
>Greenhills turns this into a block move, because it already knows the
>length of the string. This beats the pants off a ``move-until-null-byte''
>loop.

	Only with the "dhrystone" switch enabled, and the code does NOT
work if the pointer "foo" is odd (on a 68000).  (This is when generating
code for a 68000 - on an '020, it would work, but slowly)

-- 
Randell Jesup, Commodore Engineering {uunet|rutgers|allegra}!cbmvax!jesup

khb@fatcity.Sun.COM (Keith Bierman Sun Tactical Engineering) (04/06/89)

In article <4641@pt.cs.cmu.edu> schmitz@fas.ri.cmu.edu (Donald Schmitz) writes:


>an error, as far as I know there is only one -O switch for the SUN3 family.

No. -O1, -O2, -O3, and -O4 all exist. -O means -O3. -O4 works, but is
not supported for f77 (meaning try it on your most expensive
procedures, but don't be surprised if something breaks). cc and f77 do
slightly different things with the optimizer... so it is NOT the case
that cc -O4 is really more optimized than f77 at -O3.

>However, even the crude inlining capability available on the SUN can be used
>to speed up Dhrystone.  Those magic .il files are just assembler with some

The question wasn't COULD it be done; the original posting said that
the optimizer was not turned all the way on because -O4 would have
resulted in strcpy being inlined. This is, with current compilers,
simply not true. No way. No how. If you want strcpy inlined, you'd
have to code your own .il (nothing up the sleeve, etc.).

>loose with the Dhrystone rules, this seemed like a very possible way it
>could have been done.

Greenhills does this silently. This is not currently done by any sun
compiler. Considering how totally stupid the Dhrystone benchmark is,
but how important it is in the minds of so many customers, it could be
that MIPS and SUN are missing the boat here. It would appear that
several vendors have been publishing "aggressive" Dhry. figures, and
our insistence on "playing by the rules" allows certain vendors to
claim their chips are unnaturally hot...

cheers all,

Keith H. Bierman
It's Not My Fault ---- I Voted for Bill & Opus