wood@gen-rtx.rtp.dg.com (Tom Wood) (05/25/91)
I was finally able to fathom SPEC.  Here's a comparison of GNU C
1.93.4, 1.37.31, and DIAB C 2.35.  These results are pretty impressive.
I ran these on a 20MHz AViiON server.  The options were:

	1.93.4:    -O2 -funroll-loops -U__CLASSIFY_TYPE__ (version 1 varargs)
	1.37.31:   -O
	Diab 2.35: -O (with short addressing)

Benchmark comparison of 1.93.4-0522 and 1.37.31

Benchmark  Time  1.93.4-0522    1.37.31  Change             Significance

gcc        real     114.3280   120.8500   5.70% +/- .98%    p < .20
           user      85.6100    94.0075   9.81% +/- .30%    p < .001
           prof     101.6180   109.3180   7.58% +/- .20%    p < .001
espresso   real     164.9980   187.6300  13.72% +/- .35%    p < .001
           user     160.5000   183.1250  14.10% +/- .16%    p < .001
           prof     162.8000   185.4100  13.89% +/- .18%    p < .001
xlisp      real     443.5280   473.0080   6.65% +/- .27%    p < .001
           user     441.9550   471.4330   6.67% +/- .24%    p < .001
           prof     442.4300   472.1330   6.71% +/- .23%    p < .001
eqntott    real      80.9825   117.0500  44.54% +/- .33%    p < .001 *
           user      77.9250   113.8830  46.14% +/- .28%    p < .001 *
           prof      80.3300   116.4080  44.91% +/- .31%    p < .001

Benchmark comparison of 1.93.4-0522 and Diab 2.35

Benchmark  Time  1.93.4-0522  Diab 2.35  Change             Significance

espresso   real     164.9980   164.6480   -.21% +/- .28%    p < .90
           user     160.5000   160.8030    .19% +/- .10%    p < .70 *
           prof     162.8000   162.8080    .00% +/- .10%
xlisp      real     443.5280   443.8030    .06% +/- .28%
           user     441.9550   442.4630    .11% +/- .22%    p < .90
           prof     442.4300   442.9230    .11% +/- .23%
eqntott    real      80.9825    90.6950  11.99% +/- .48%    p < .001
           user      77.9250    87.3750  12.13% +/- .35%    p < .001
           prof      80.3300    89.6900  11.65% +/- .26%    p < .001 *

Times are the sample mean reported in seconds (prof is user+system
time).  The value P is the probability that there is no difference in
the sample means derived by the T-test (* indicates the separate T-test
was used).  The P values reported are .90, .80, ..., .10, .05, and .001.

Here's a blurb on the version 2 optimization options:

`-O2'
     Highly optimize.  All supported optimizations are performed.
     As compared to `-O', this option will increase both compilation
     time and the performance of the generated code.  All `-fFLAG'
     options that control optimization are turned on when `-O2' is
     specified.

The following options control specific optimizations.  The `-O2'
option turns on all of these optimizations.  The `-O' option usually
turns on the `-fthread-jumps' and `-fdelayed-branch' options, but
specific machines may change the default optimizations.

You can use the following flags in the rare cases when "fine-tuning"
of optimizations to be performed is desired.

`-fstrength-reduce'
     Perform the optimizations of loop strength reduction and
     elimination of iteration variables.

`-fthread-jumps'
     Perform optimizations where we check to see if a jump branches to
     a location where another comparison subsumed by the first is
     found.  If so, the first branch is redirected to either the
     destination of the second branch or a point immediately following
     it, depending on whether the condition is known to be true or
     false.

`-funroll-loops'
     Perform the optimization of loop unrolling.  This is only done for
     loops whose number of iterations can be determined at compile time
     or run time.

`-fcse-follow-jumps'
     In common subexpression elimination, scan through jump
     instructions in certain cases.  This is not as powerful as
     completely global CSE, but not as slow either.

`-frerun-cse-after-loop'
     Re-run common subexpression elimination after loop optimizations
     have been performed.

`-fexpensive-optimizations'
     Perform a number of minor optimizations that are relatively
     expensive.

`-fdelayed-branch'
     If supported for the target machine, attempt to reorder
     instructions to exploit instruction slots available after delayed
     branch instructions.

`-fschedule-insns'
     If supported for the target machine, attempt to reorder
     instructions to eliminate execution stalls due to required data
     being unavailable.
     This helps machines that have slow floating point or memory load
     instructions by allowing other instructions to be issued until the
     result of the load or floating point instruction is required.

`-fschedule-insns2'
     Similar to `-fschedule-insns', but requests an additional pass of
     instruction scheduling after register allocation has been done.
     This is especially useful on machines with a relatively small
     number of registers and where memory load instructions take more
     than one cycle.

Version 2 is not yet released by FSF, but it's working quite well.
---
Tom Wood (919) 248-6067
Data General, Research Triangle Park, NC
wood@dg-rtp.dg.com
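
The Significance column in the tables above comes from a T-test on the
two sets of run times.  A minimal Python sketch of the two variants
follows.  It reads "separate T-test" as the separate-variance (Welch)
form, which is an assumption; the post does not spell this out.  The
run times are hypothetical, since only the sample means were posted,
and the function names are illustrative.

```python
import math
from statistics import mean, variance


def pooled_t(a, b):
    """Student's t statistic, assuming both samples share one variance."""
    na, nb = len(a), len(b)
    # Pooled sample variance, weighted by each sample's degrees of freedom.
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(b) - mean(a)) / math.sqrt(sp2 * (1 / na + 1 / nb))
    return t, na + nb - 2


def separate_t(a, b):
    """Welch's t statistic: each sample keeps its own variance."""
    na, nb = len(a), len(b)
    va, vb = variance(a) / na, variance(b) / nb
    t = (mean(b) - mean(a)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df


# Hypothetical run times in seconds (not taken from the post).
new_runs = [85.4, 85.7, 85.6, 85.8]
old_runs = [93.9, 94.1, 94.0, 94.2]

t_pooled, df_pooled = pooled_t(new_runs, old_runs)
t_sep, df_sep = separate_t(new_runs, old_runs)
print(f"pooled:   t = {t_pooled:.2f}, df = {df_pooled}")
print(f"separate: t = {t_sep:.2f}, df = {df_sep:.2f}")
```

With equal sample counts the two statistics coincide and only the
degrees of freedom differ.  The resulting t and df would then be looked
up in a T table to get one of the reported P levels.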
wood@gen-rtx.rtp.dg.com (Tom Wood) (05/25/91)
In article <1991May24.183902.14528@dg-rtp.dg.com> I wrote:
> Benchmark  Time  1.93.4-0522    1.37.31  Change             Significance
>
> gcc        real     114.3280   120.8500   5.70% +/- .98%    p < .20
>            user      85.6100    94.0075   9.81% +/- .30%    p < .001
>            prof     101.6180   109.3180   7.58% +/- .20%    p < .001

I made a mistake in computing the change (A% +/- B%).  The value B is
supposed to be the expected range of percent change for a significance
of .50 (p == .50).  Instead, the program was using a significance value
of .80.  The same data as above should be:

gcc        real     114.3280   120.8500   5.70% +/- 2.66%   p < .20
           user      85.6100    94.0075   9.81% +/- .82%    p < .001
           prof     101.6180   109.3180   7.58% +/- .55%    p < .001
---
Tom Wood (919) 248-6067
Data General, Research Triangle Park, NC
wood@dg-rtp.dg.com
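
The A% figures in the gcc rows can be reproduced from the two mean
times as (1.37.31 time - 1.93.4 time) / 1.93.4 time.  The Python sketch
below checks that, and also shows one plausible way the B% range could
scale with the chosen significance level.  The half-width formula, the
standard error value, and the 6 degrees of freedom are assumptions made
for illustration; the post gives neither the sample counts nor the
exact computation.  Function names are illustrative.

```python
def percent_change(base, other):
    """Percent by which `other` exceeds `base`, relative to `base`."""
    return 100.0 * (other - base) / base


def percent_half_width(t_crit, std_err_diff, base):
    """B in 'A% +/- B%': t_crit standard errors of the difference in
    means, expressed as a percentage of `base` (assumed formula)."""
    return 100.0 * t_crit * std_err_diff / base


# gcc rows from the post: (mean time with 1.93.4-0522, with 1.37.31).
gcc_rows = {
    "real": (114.3280, 120.8500),
    "user": (85.6100, 94.0075),
    "prof": (101.6180, 109.3180),
}
changes = {k: round(percent_change(a, b), 2) for k, (a, b) in gcc_rows.items()}
for name, pct in changes.items():
    print(f"{name}: {pct:.2f}%")  # real 5.70%, user 9.81%, prof 7.58%

# Two-sided critical values from a standard T table, 6 degrees of
# freedom (the degrees of freedom are an assumption).
T_CRIT_P50 = 0.718
T_CRIT_P80 = 0.265
widening = T_CRIT_P50 / T_CRIT_P80
print(f"p = .50 range is about {widening:.2f}x the p = .80 range")

# Hypothetical standard error of the difference, in seconds.
print(f"{percent_half_width(T_CRIT_P50, 1.0, 114.3280):.2f}%")
```

Under these assumptions, switching from the .80 to the .50 critical
value widens the range by a factor of roughly 2.7, which is about the
factor between the originally posted and the corrected gcc ranges.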