wood@gen-rtx.rtp.dg.com (Tom Wood) (05/25/91)
I was finally able to fathom SPEC. Here's a comparison of GNU C 1.93.4,
1.37.31, and DIAB C 2.35. These results are pretty impressive. I ran
these on a 20 MHz AViiON server. The options were:
1.93.4: -O2 -funroll-loops -U__CLASSIFY_TYPE__ (version 1 varargs)
1.37.31: -O
Diab 2.35: -O (with short addressing)
Benchmark comparison of 1.93.4-0522 and 1.37.31

Benchmark  Time  1.93.4-0522    1.37.31   Change            Significance
gcc        real     114.3280   120.8500    5.70% +/- .98%   p < .20
           user      85.6100    94.0075    9.81% +/- .30%   p < .001
           prof     101.6180   109.3180    7.58% +/- .20%   p < .001
espresso   real     164.9980   187.6300   13.72% +/- .35%   p < .001
           user     160.5000   183.1250   14.10% +/- .16%   p < .001
           prof     162.8000   185.4100   13.89% +/- .18%   p < .001
xlisp      real     443.5280   473.0080    6.65% +/- .27%   p < .001
           user     441.9550   471.4330    6.67% +/- .24%   p < .001
           prof     442.4300   472.1330    6.71% +/- .23%   p < .001
eqntott    real      80.9825   117.0500   44.54% +/- .33%   p < .001 *
           user      77.9250   113.8830   46.14% +/- .28%   p < .001 *
           prof      80.3300   116.4080   44.91% +/- .31%   p < .001

Benchmark comparison of 1.93.4-0522 and Diab 2.35

Benchmark  Time  1.93.4-0522  Diab 2.35   Change            Significance
espresso   real     164.9980   164.6480    -.21% +/- .28%   p < .90
           user     160.5000   160.8030     .19% +/- .10%   p < .70 *
           prof     162.8000   162.8080     .00% +/- .10%
xlisp      real     443.5280   443.8030     .06% +/- .28%
           user     441.9550   442.4630     .11% +/- .22%   p < .90
           prof     442.4300   442.9230     .11% +/- .23%
eqntott    real      80.9825    90.6950   11.99% +/- .48%   p < .001
           user      77.9250    87.3750   12.13% +/- .35%   p < .001
           prof      80.3300    89.6900   11.65% +/- .26%   p < .001 *

Times are the sample mean reported in seconds (prof is user+system time).
The value P is the probability that there is no difference in the sample
means derived by the T-test (* indicates the separate T-test was used).
The P values reported are .90, .80, ..., .10, .05, and .001.
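
The Change column appears to be the difference between the two sample
means, expressed as a percentage of the 1.93.4-0522 mean. That formula is
inferred from the numbers in the tables, not stated in the post, and the
function name percent_change is made up for illustration. A minimal C
sketch under that assumption:

```c
#include <assert.h>

/* Percent change of the comparison compiler's mean time relative to the
   baseline mean (the 1.93.4-0522 column in the tables).
   ASSUMPTION: this formula is inferred from the table values. */
double percent_change(double baseline_mean, double other_mean)
{
    assert(baseline_mean > 0.0);    /* times are positive seconds */
    return 100.0 * (other_mean - baseline_mean) / baseline_mean;
}
```

For the gcc "real" row, percent_change(114.3280, 120.8500) comes out at
roughly 5.70, which agrees with the table. The +/- range and the P value
come from the T-test and are not reproduced here.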
Here's a blurb on the version 2 optimization options:
`-O2'
Highly optimize. All supported optimizations are performed. As
compared to `-O', this option will increase both compilation time
and the performance of the generated code.
All `-fFLAG' options that control optimization are turned on
when `-O2' is specified.
The following options control specific optimizations. The `-O2'
option turns on all of these optimizations. The `-O' option usually
turns on the `-fthread-jumps' and `-fdelayed-branch' options,
but specific machines may change the default optimizations.
You can use the following flags in the rare cases when "fine-tuning"
of optimizations to be performed is desired.
`-fstrength-reduce'
Perform the optimizations of loop strength reduction and
elimination of iteration variables.
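
As a rough source-level picture of what strength reduction and iteration
variable elimination do, the pair of functions below shows a loop as
written and a hand-transformed equivalent. This is only an illustration:
the compiler works on its internal representation, not on C source, and
the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* As written: every iteration multiplies i by stride to form the index. */
long sum_strided_plain(const long *a, size_t n, size_t stride)
{
    long sum = 0;
    size_t i;

    for (i = 0; i < n; i++)
        sum += a[i * stride];
    return sum;
}

/* Hand-transformed: the multiply becomes a running addition (strength
   reduction), and the counter i disappears because the loop now tests
   the offset itself (iteration variable elimination). */
long sum_strided_reduced(const long *a, size_t n, size_t stride)
{
    long sum = 0;
    size_t offset;
    size_t limit = n * stride;

    assert(stride > 0);             /* otherwise offset never advances */
    for (offset = 0; offset != limit; offset += stride)
        sum += a[offset];
    return sum;
}
```

Both functions return the same sum for any stride greater than zero; the
second simply does no multiplication inside the loop.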
`-fthread-jumps'
Perform optimizations where we check to see if a jump branches to a
location where another comparison subsumed by the first is found. If
so, the first branch is redirected to either the destination of the
second branch or a point immediately following it, depending on whether
the condition is known to be true or false.
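
A hand-written C picture of the same idea, with hypothetical function
names: in the first version the test x > 5 is subsumed by the earlier
test x > 10, so when the first test succeeds control can jump straight
past the second. The real transformation is applied to jumps in the
compiler's intermediate code; the gotos here only mimic it.

```c
/* As written: when x > 10 holds, the second comparison is redundant. */
int classify_plain(int x)
{
    int r = 0;

    if (x > 10)
        r += 1;
    if (x > 5)
        r += 2;
    return r;
}

/* Hand-threaded: after x > 10 succeeds, jump directly to the code the
   second test would have selected, skipping that comparison. */
int classify_threaded(int x)
{
    int r = 0;

    if (x > 10) {
        r += 1;
        goto add_two;       /* x > 10 implies x > 5 */
    }
    if (x <= 5)
        goto done;
add_two:
    r += 2;
done:
    return r;
}
```

The two functions return the same value for every x; the threaded one
evaluates one comparison fewer on the x > 10 path.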
`-funroll-loops'
Perform the optimization of loop unrolling. This is only done for loops
whose number of iterations can be determined at compile time or run time.
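
To make the restriction concrete: in the loop below the trip count n is
not a compile-time constant, but it is known on entry to the loop, so the
loop can still be unrolled. The second function is a hand-unrolled
version by a factor of four; the factor and the function names are
assumptions chosen for illustration, not what the compiler necessarily
picks.

```c
#include <stddef.h>

/* As written: one add and one loop test per element. */
long sum_plain(const long *a, size_t n)
{
    long sum = 0;
    size_t i;

    for (i = 0; i < n; i++)
        sum += a[i];
    return sum;
}

/* Hand-unrolled by four: one loop test per four elements, followed by
   a cleanup loop for the 0 to 3 elements left over. */
long sum_unrolled4(const long *a, size_t n)
{
    long sum = 0;
    size_t i = 0;
    size_t main_end = n - (n % 4);

    for (; i < main_end; i += 4) {
        sum += a[i];
        sum += a[i + 1];
        sum += a[i + 2];
        sum += a[i + 3];
    }
    for (; i < n; i++)
        sum += a[i];
    return sum;
}
```

Both functions give the same result for any n, including n smaller than
four, where only the cleanup loop runs.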
`-fcse-follow-jumps'
In common subexpression elimination, scan through jump instructions in
certain cases. This is not as powerful as completely global CSE, but
not as slow either.
`-frerun-cse-after-loop'
Re-run common subexpression elimination after loop optimizations have been
performed.
`-fexpensive-optimizations'
Perform a number of minor optimizations that are relatively expensive.
`-fdelayed-branch'
If supported for the target machine, attempt to reorder instructions
to exploit instruction slots available after delayed branch
instructions.
`-fschedule-insns'
If supported for the target machine, attempt to reorder instructions to
eliminate execution stalls due to required data being unavailable. This
helps machines that have slow floating point or memory load instructions
by allowing other instructions to be issued until the result of the load
or floating point instruction is required.
`-fschedule-insns2'
Similar to `-fschedule-insns', but requests an additional pass of
instruction scheduling after register allocation has been done. This is
especially useful on machines with a relatively small number of
registers and where memory load instructions take more than one cycle.
Version 2 is not yet released by FSF, but it's working quite well.
---
Tom Wood (919) 248-6067
Data General, Research Triangle Park, NC
wood@dg-rtp.dg.com

wood@gen-rtx.rtp.dg.com (Tom Wood) (05/25/91)
In article <1991May24.183902.14528@dg-rtp.dg.com> I wrote:
> Benchmark  Time  1.93.4-0522    1.37.31   Change            Significance
>
> gcc        real     114.3280   120.8500    5.70% +/- .98%   p < .20
>            user      85.6100    94.0075    9.81% +/- .30%   p < .001
>            prof     101.6180   109.3180    7.58% +/- .20%   p < .001

I made a mistake in computing the change (A% +/- B%). The value B is
supposed to be the expected range of percent change for a significance
of .50 (p == .50). Instead, the program was using a significance value
of .80. The same data as above should be:

gcc        real     114.3280   120.8500    5.70% +/- 2.66%  p < .20
           user      85.6100    94.0075    9.81% +/- .82%   p < .001
           prof     101.6180   109.3180    7.58% +/- .55%   p < .001
---
Tom Wood (919) 248-6067
Data General, Research Triangle Park, NC
wood@dg-rtp.dg.com