schmitz@fas.ri.cmu.edu (Donald Schmitz) (03/31/89)
In article david@sun.com writes:
>In article sclafani@jumbo.dec.com (Michael Sclafani) writes:
>>Results for the Sun-3/60 are not reported because the data in
>>[Presentation on Benchmarks given at Sun User Group Conference, Dec 5-7,
>>1988 by Sun Microsystems, Inc.] uses compiler optimization level 4 which
>>employs procedure inlining.
>
>FYI, this is incorrect. Current Sun C compilers do not perform procedure
>inlining at any optimization level.
>
>David DiGiacomo, Sun Microsystems, Mt. View, CA sun!david david@sun.com

Maybe you should look again - our SUN3 C compiler has a limited, but occasionally useful, procedure inline facility, and the standard inline files have 1986 SUN copyright notices in them. It's actually pretty crude: you compile the source with a .il file (assembly code with a few special directives), and all of the original jsr's are macro-replaced with the assembler for the routine. It doesn't inline argument passing, but it does eliminate the jsr/rts overhead.

Don Schmitz
khb%chiba@Sun.COM (chiba) (04/01/89)
In article <4614@pt.cs.cmu.edu> schmitz@fas.ri.cmu.edu (Donald Schmitz) writes:
>In article david@sun.com writes:
>>In article sclafani@jumbo.dec.com (Michael Sclafani) writes:
>>>Results for the Sun-3/60 are not reported because the data in
>>>[Presentation on Benchmarks given at Sun User Group Conference, Dec 5-7,
>>>1988 by Sun Microsystems, Inc.] uses compiler optimization level 4 which
>>>employs procedure inlining.
>>
>>FYI, this is incorrect. Current Sun C compilers do not perform procedure
>>inlining at any optimization level.
>>
>>David DiGiacomo, Sun Microsystems, Mt. View, CA sun!david david@sun.com
>
>Maybe you should look again - our SUN3 C ....
>occasionally useful procedure inline facility - and the standard inline
>files have 1986 SUN copywrite notices in them. Its actually pretty crude,
>you compile the source with a .il file (assembly code with a few special
>directives), and all of the original jsr's are macro replaced with the
>assembler for the routine. It doesn't inline argument passing, but it does
>eliminate the jsr/rts overhead.
>

For the record......

-O4 has nothing to do with inlining (in any product currently released).

    /usr/lib/f77/libm.il
    /usr/lib/libm.il

exist for cc and f77 on ALL Sun platforms. The names vary a bit to account for hardware (e.g. /usr/lib/f77/f68881 means use the f77 inline library for 68881 instructions).

Cheers.

Keith H. Bierman
It's Not My Fault ---- I Voted for Bill & Opus
guy@auspex.auspex.com (Guy Harris) (04/01/89)
>>>Results for the Sun-3/60 are not reported because the data in
>>>[Presentation on Benchmarks given at Sun User Group Conference, Dec 5-7,
>>>1988 by Sun Microsystems, Inc.] uses compiler optimization level 4 which
>>>employs procedure inlining.
>>
>>FYI, this is incorrect. Current Sun C compilers do not perform procedure
>>inlining at any optimization level.
>
>Maybe you should look again

No, there's not much point in looking again; David is talking about *general* procedure inlining, which the current Sun compilers do not do, at least not to the best of my knowledge - a second look will almost certainly confirm that. This is quite different from the very specialized inlining that the "inline" program performs (as I remember, it performs inlining on the assembly-language output from the compiler), so if the claim is that the 3/60 results used general procedure inlining, the claim seems suspicious to me.

The only ".il" files I could find on any of the 4.0 machines around here (both 68K and SPARC) are 1) files for "libm" - in several flavors for the 68K machines - and 2) some for doing loads from possibly-misaligned locations.

As far as I know, Dhrystone (the benchmark to which the comment about the 3/60 refers) doesn't make heavy use of floating point (it's supposed to be a synthetic systems programming benchmark, and systems programming tends not to make heavy use of floating point) or the math library (if it makes any use of it at all); furthermore, the 3/60 doesn't make use of SPARC templates for handling misaligned data (it can handle it on its own). As such, "inline" is completely irrelevant to the discussion.
jamesa@arabian.Sun.COM (James D. Allen) (04/01/89)
In article <4614@pt.cs.cmu.edu>, schmitz@fas.ri.cmu.edu (Donald Schmitz) writes:
|From: schmitz@fas.ri.cmu.edu (Donald Schmitz)
|Newsgroups: comp.arch
|Subject: SUN procedure inlining(was i860 Dhrystones)
|
|In article david@sun.com writes:
|>In article sclafani@jumbo.dec.com (Michael Sclafani) writes:
|>>Results for the Sun-3/60 are not reported because the data in
|>>[Presentation on Benchmarks given at Sun User Group Conference, Dec 5-7,
|>>1988 by Sun Microsystems, Inc.] uses compiler optimization level 4 which
|>>employs procedure inlining.
|>
|
| ... our SUN3 C compiler has a limited, but
|occasionally useful procedure inline facility - and the standard inline
|files have 1986 SUN copywrite notices in them. Its actually pretty crude,

Pretty crude? It got rave reviews in our real-time controller application, virtually eliminating any issue of asm-interface speed deficit. You don't even have to put on your assembly-language hat if you don't want to, since you can run "cc -S" on the functions to be inlined and touch up the output.

|you compile the source with a .il file (assembly code with a few special
|directives), and all of the original jsr's are macro replaced with the
|assembler for the routine. It doesn't inline argument passing, but it does
|eliminate the jsr/rts overhead.

Perhaps you neglected to use "cc -O" ("-O1" is all you need if you want to talk about optimization levels). Even before SunOS 4.0, the peephole optimizer "knew" about inline code's argument passing and typically optimized the argument passing down to *nothing*.

|
|Don Schmitz
|--

James Allen
Disclaimer: I am not authorized to speak for Sun, but hopefully they won't object if it's complimentary material. :-)
mash@mips.COM (John Mashey) (04/02/89)
In article <97048@sun.Eng.Sun.COM> khb@sun.UUCP (chiba) writes:
....
>For the record......
>
>-O4 has nothing to do with inlining (in any product currently released).

People may have been confusing this with MIPS -O4, which DOES do inlining. Note that the MIPS Performance Briefs have consistently observed that we weren't sure what the higher Sun optimizations did, but we knew they didn't do inlining, and so cited those numbers.

--
-john mashey DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: {ames,decwrl,prls,pyramid}!mips!mash OR mash@mips.com
DDD: 408-991-0253 or 408-720-1700, x253
USPS: MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086
schmitz@fas.ri.cmu.edu (Donald Schmitz) (04/04/89)
(Lots of people writing)
>>>FYI, this is incorrect. Current Sun C compilers do not perform procedure
>>>inlining at any optimization level.             ^^^^^^^^^^^^^^^^^^^^^^^^
>> ^^^^^^^^
>>(I wrote) Maybe you should look again
>
>No, there's not much point in looking again; David is talking about
>*general* procedure inlining, which the current Sun compilers do not do,
 ^^^^^^^^
>at least not to the best of my knowledge - a second look will almost
>certainly confirm that. This is quite different from the very
>specialized inlining that the "inline" program performs (as I remember,
>it performs inlining on the assembly-language output from the compiler),
>so if the claim is that the 3/60 results used general procedure
>inlining, the claim seems suspicious to me. The only ".il" files I
>could find on any of the 4.0 machines around here (both 68K and SPARC)
>are 1) files for "libm" - in several flavors for the 68K machines - and
>2) some for doing loads from possibly-misaligned locations.

To put an end to this: I was responding to the statement that SUN compilers *do no procedure inlining*. Looking at the man entry will tell you this isn't true. The original message regarding the -O4 switch was very likely an error; as far as I know there is only one -O switch for the SUN3 family.

However, even the crude inlining capability available on the SUN can be used to speed up Dhrystone. Those magic .il files are just assembler with some special directives thrown in; I could easily write strcmp and strcpy .il files and compile the source with them. Assuming I didn't further massage the assembly code, this would still speed up every subroutine call (it eliminates jsr, link, unlk, and rts, all multi-cycle instructions). It would also improve code locality, improving I-cache performance (on SUNs with I-caches). Since the original post seemed to indicate someone had played loose with the Dhrystone rules, this seemed like a very possible way it could have been done.

Don Schmitz
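To make the call-overhead point above concrete, here is a minimal C sketch. It is not Sun's .il mechanism, which substitutes assembly templates after code generation; it uses C99 `static inline` as a stand-in, and the function names `copy_called`, `copy_inlined`, and `copies_agree` are made up for illustration. Whether a given compiler actually expands the inline version is up to that compiler.

```c
#include <assert.h>
#include <string.h>

/* Out-of-line version: every use is a real subroutine call, so on a
 * 68K-style machine each one pays for jsr/rts plus any frame setup. */
char *copy_called(char *dst, const char *src)
{
    char *d = dst;
    while ((*d++ = *src++) != '\0')
        ;
    return dst;
}

/* The same loop offered to the compiler for expansion at the call site.
 * "static inline" is only a hint; it stands in here (as an assumption)
 * for the assembly-level substitution a .il file performs. */
static inline char *copy_inlined(char *dst, const char *src)
{
    char *d = dst;
    while ((*d++ = *src++) != '\0')
        ;
    return dst;
}

/* Both forms must give identical results; only the call overhead
 * differs. Returns 1 when the two copies match. */
int copies_agree(const char *text)
{
    char a[64], b[64];
    assert(strlen(text) < sizeof a);
    copy_called(a, text);
    copy_inlined(b, text);
    return strcmp(a, b) == 0;
}
```

Calling `copies_agree("DHRYSTONE PROGRAM")` exercises both paths. The saving from inlining shows up only in timing or in the generated assembly, never in the result.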
grunwald@flute.cs.uiuc.edu (04/05/89)
The inlining done by the Greenhills compiler could be done by a peep-hole phase, but it's unlikely. When you say

        strcpy(foo, "this is a string")

Greenhills turns this into a block move, because it already knows the length of the string. This beats the pants off a ``move-until-null-byte'' loop.

I noticed this when comparing Greenhills C to Gnu C for the '386. For Dhrystone (and a few other programs) this makes a big performance difference. For other programs, Gnu C was better than Greenhills (and, if I might add, more stable than the Greenhills version we have).

--
Dirk Grunwald
Univ. of Illinois
grunwald@flute.cs.uiuc.edu
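The transformation described above can be sketched in portable C. This is an illustration of the idea only, not Greenhills' actual output; the function names `fill_with_strcpy` and `fill_with_block_move` are hypothetical, and `memcpy` stands in for whatever block-move sequence the compiler emits.

```c
#include <string.h>

/* What the source says: copy bytes until the terminating NUL is seen. */
void fill_with_strcpy(char *foo)
{
    strcpy(foo, "this is a string");
}

/* What a compiler that knows the literal's length can do instead:
 * a fixed-size block move with no per-byte test for NUL. */
void fill_with_block_move(char *foo)
{
    static const char text[] = "this is a string";
    memcpy(foo, text, sizeof text);   /* 16 characters plus the NUL */
}
```

Both functions leave the same 17 bytes in `foo`; the second simply never has to look for the end of the string at run time.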
guy@auspex.auspex.com (Guy Harris) (04/05/89)
>The original message regarding the -O4 switch was very likely
>an error, as far as I know there is only one -O switch for the SUN3
>family.

There's only one -O switch for the Sun-3 family, just as there's only one for the Sun-4 family and for the Sun-2 family; however, in SunOS 4.0, the "-O" switch takes an optional optimization level on all three of those families:

     -O[level]
          Optimize the object code. Ignored when either -g,
          -go, or -a is used. On Sun-2 and Sun-3 systems, -O
          with the level omitted is equivalent to -O1; on Sun-4
          systems, it is equivalent to -O2. On Sun386i systems,
          all levels are the same as 1. level is one of:

          1    Do postpass assembly-level optimization only.

          2    Do global optimization prior to code generation,
               including loop optimizations, common subexpression
               elimination, copy propagation, and automatic
               register allocation. -O2 does not optimize
               references to or definitions of external or
               indirect variables.

          3    Same as -O2, but optimize uses and definitions of
               external variables. -O3 does not trace the effects
               of pointer assignments. Neither -O3 nor -O4 should
               be used when compiling either device drivers, or
               programs that modify external variables from
               within signal handlers.

          4    Same as -O3, but trace the effects of pointer
               assignments.

>Since the original post seemed to indicate someone had played
>loose with the Dhrystone rules, this seemed like a very possible way it
>could have been done.

The original post seemed to indicate that somebody was confused about what the Sun compilers did and didn't do.
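The man page's warning about signal handlers can be illustrated with a short C sketch. Once an optimizer is allowed to keep an external variable in a register, a loop that polls the variable may never see a store made by a handler. The example below uses ANSI C `volatile` and `sig_atomic_t` to mark such a variable; whether the Sun compilers discussed here honor `volatile` is an assumption, and the names `got_signal`, `on_signal`, and `wait_for_flag` are invented for illustration.

```c
#include <signal.h>

/* An external variable modified from within a signal handler. Without
 * "volatile", an optimizer that tracks external variables could cache
 * it in a register and miss the handler's store. */
static volatile sig_atomic_t got_signal = 0;

static void on_signal(int signo)
{
    (void)signo;
    got_signal = 1;
}

/* Installs the handler, raises the signal, and polls the flag.
 * Returns 1 once the handler's store is observed, -1 on setup failure. */
int wait_for_flag(void)
{
    got_signal = 0;
    if (signal(SIGINT, on_signal) == SIG_ERR)
        return -1;
    if (raise(SIGINT) != 0)
        return -1;
    while (!got_signal)
        ;               /* must re-read got_signal on every iteration */
    return 1;
}
```

On a compiler without `volatile` support, the conservative choice is the one the man page gives: build such code at -O2 or lower.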
jesup@cbmvax.UUCP (Randell Jesup) (04/06/89)
In article <GRUNWALD.89Apr4185706@flute.cs.uiuc.edu> grunwald@flute.cs.uiuc.edu writes:
>
>The inlining done by the Greenhills compiler could be done by a
>peep-hole phase, but it's unlikely. When you say
>
> strcpy(foo, "this is a string")
>
>Greenhills turns this into a block move, because it already knows the
>length of the string. This beats the pants of a ``move-until-null-byte''
>loop.

Only with the "dhrystone" switch enabled, and the code does NOT work if the pointer "foo" is odd (on a 68000). (This is when generating code for a 68000 - on an '020, it would work, but slowly)

--
Randell Jesup, Commodore Engineering {uunet|rutgers|allegra}!cbmvax!jesup
khb@fatcity.Sun.COM (Keith Bierman Sun Tactical Engineering) (04/06/89)
In article <4641@pt.cs.cmu.edu> schmitz@fas.ri.cmu.edu (Donald Schmitz) writes:
>an error, as far as I know there is only one -O switch for the SUN3 family.

No. -O1, -O2, -O3, and -O4 all exist. -O means -O3. -O4 works, but is not supported for f77 (meaning try it on your most expensive procedures, but don't be surprised if something breaks). cc and f77 do slightly different things with the optimizer... so it is NOT the case that cc at -O4 is really more optimized than f77 at -O3.

>However, even the crude inlining capability available on the SUN can be used
>to speed up Dhrystone. Those magic .il files are just assembler with some

The question wasn't COULD it be done; the original posting said that the optimizer was not turned all the way on because -O4 would have resulted in strcpy being inlined. This is, with current compilers, simply not true. No way. No how. If you want strcpy inlined, you'd have to code your own .il (nothing up the sleeve, etc.).

>loose with the Dhrystone rules, this seemed like a very possible way it
>could have been done.

Greenhills does this silently. This is not currently done by any Sun compiler.

Considering how totally stupid the Dhrystone benchmark is, but how important it is in the minds of so many customers, it could be that MIPS and SUN are missing the boat here. It would appear that several vendors have been publishing "aggressive" Dhry. figures, and our insistence on "playing by the rules" allows certain vendors to claim their chips are unnaturally hot...

cheers all,

Keith H. Bierman
It's Not My Fault ---- I Voted for Bill & Opus