[comp.sys.nsc.32k] Can compilers make a difference?

arielf@BBN.COM (Ariel Faigon) (04/11/88)
Hi guys, have you ever noticed how much of a difference compilers can make?
True, contemporary optimizing compilers for imperative languages like 'C',
Fortran or Pascal cannot deliver an order-of-magnitude improvement over
simple older compilers, because of the inherently serial execution of the
imperative model of computation, but there seems to be an encouraging trend.

Optimizing compilers improve in two main directions:

    1. Better machine-independent optimization techniques are used:
       extensive data-flow analysis plus global optimizations. Lately,
       interprocedural analysis has started to show up as well.

    2. Better tuning for specific new hardware, in order to make the
       most of it: e.g. register allocation to avoid memory references,
       code reordering to avoid contention or pipeline stalls in
       pipelined architectures, and even techniques to make the best
       use of on-chip caches of a given size.

---- The Present state:

Because of significant progress in both directions, today's compilers can
claim substantial improvements over traditional older compilers. When you
cannot find a better algorithm for a really impressive speed-up, and cannot
change the architecture because of a big past investment, I suggest that you
try switching compilers; you may be surprised by the results.

Well, at least I was surprised when I benchmarked the National Semiconductor
GNX/CTP compiler against other compilers running on OPUS and Sequent Balance
machines (see below). All benchmarks are very well-known programs, which were
picked because of their availability and with no bias whatsoever.

Note: All figures refer to optimized programs (-O option in effect). All
sources are 'C' programs. (I expect Pascal results usually to be even more
impressive, because address-taken variables in 'C' inhibit certain
optimizations.) The CTP compiler supports Pascal, Modula-2 and Fortran77 in
addition to C.

VAX 785 BSD4.3 Berkeley 'C' standard (pcc) compiler times are given here
just for reference, since this is a very well-known machine/OS/compiler
combination. Remember, what is being compared within each set below is
compilers only.


VAX 785 BSD4.3 reference times   Exact benchmark parameters
-----------------------------------------------------------
Ackermann:   33116 millisec.     Ackermann(3,6) = 509
Puzzle:      94233 millisec.     10 loop iterations, size=511
Quicksort:    9133 millisec.     10 iterations, array of 1000 long integers
Sieve:       16916 millisec.     100 iterations, SIZE=8192
C Whetstone:  1783 millisec.     10 iterations (1 million Whetstones)

-------------------- Compiler Comparison results ----------------------

All times are in milliseconds.

Set 1:
	Machine - OPUS 32332 (15 MHz)
	O.S.    - UNIX SYS-5.3

The OPUS machine is an add-on board for IBM XT/AT compatibles, manufactured
by OPUS Systems.

The Optimizing compiler from Green Hills denoted 'GH' below is:
C-32000 1.8.1(C) Copyright (c)1985,1986 Green Hills Software, Inc.

pcc1 is the standard compiler supplied by OPUS with the machine.

----------------+-----------------------------+-----------------------+
                |           Compiler          |   Runtime Ratios      |
Benchmark name  |    GH   |   pcc1  | GNX/CTP |  GH/CTP  |  pcc1/CTP  |
----------------+---------+---------+---------+----------+------------+
Ackermann       |  16800  |  16166  |  14700  |   1.14   |   1.10     |
Puzzle          |  38000  |  85233  |  35000  |   1.09   |   2.44 (!) |
Quicksort       |  15766  |  10866  |   7966  |   1.98   |   1.36     |
Sieve           |  10400  |  20483  |   8400  |   1.24   |   2.45 (!) |
C Whetstone     |   1950  |   1966  |   1750  |   1.11   |   1.12     |
----------------+---------+---------+---------+----------+------------+

--- Notes:

In the Whetstone benchmark the CTP gain comes only from the user code: the
same mathematical library was used with all the compilers. I would expect a
greater improvement if each compiler were also used to compile the math
library sources.

With the Green Hills compiler, the '-O2' option (full optimization) was used
instead of '-O'.

The Green Hills compiler is a good optimizing compiler. Comparing the
generated assembly code shows that CTP makes better machine-specific
optimizations, e.g. multiplying by a constant using 'addr' instructions.

Set 2:
	Machine - Sequent Balance 32032
	O.S.    - Dynix (Sequent's BSD4.2 variant)

pcc2 is the standard compiler supplied by Sequent with the machine.
All times are in milliseconds.

----------------+---------------------+-----------------+
                |     Compiler        |                 |
Benchmark name  |   pcc2  |  GNX/CTP  | pcc2/CTP ratio  |
----------------+---------+-----------+-----------------+
Ackermann       |  39000  |    30850  |     1.26        |
Puzzle          | 177333  |    64883  |     2.73 (!)    |
Quicksort       |  17783  |    12316  |     1.44        |
Sieve           |  44233  |    16633  |     2.65 (!)    |
C Whetstone     |   3783  |     3183  |     1.19        |
----------------+---------+-----------+-----------------+
--- Notes:

Again, in the Whetstone program a common math library was used, so only the
user code was actually benchmarked.

As a product, CTP supports only COFF (Common Object File Format), which is
the AT&T UNIX standard, and not the Berkeley a.out format. The version
tested here is an unofficial in-house version without debugging support.

---- The future is still more promising...

Run times on the new NS32532 at 30 MHz are typically 5-6 times shorter than
on the NS32332, comparing CTP times on both processors (e.g. Sieve 1316
millisec. [6.38x], Ackermann 2854 millisec. [5.15x], relative to the Set 1
CTP times). Traditional compilers produce the same code for the 32532 as for
the other members of the NS32000 family, whereas CTP can generate code that
is specifically tuned for each CPU/FPU/bus-width combination. Thus I expect
the performance ratios between older compilers and CTP to grow as newer
NS32000 hardware appears on the horizon.

VAX, VAX-VMS are trademarks of Digital Equipment Corp.
Sequent, Balance, Dynix are trademarks of Sequent Computer Systems
OPUS is a trademark of OPUS Systems
IBM XT and AT are trademarks of IBM
UNIX is a trademark of AT&T
-- 
Ariel Faigon, CTP group
National Semiconductor (Israel)
6 Maskit st. P.O.B. 3007, Hertzlia 46104, Israel. Tel. (972)52-522312
arielf%taux01@nsc.com @{hplabs,pyramid,sun,decwrl}  34 48 E / 32 10 N
[I would be more impressed with results on larger programs, since results on
these toy test programs can be heavily skewed by compiler differences that
matter little in larger, more realistic programs. For example, compile troff
with different compilers and compare performance on a 30-page -mm document.
-John]
--
Send compilers articles to ima!compilers or, in a pinch, to Levine@YALE.EDU
Plausible paths are { ihnp4 | decvax | cbosgd | harvard | yale | bbn}!ima
Please send responses to the originator of the message -- I cannot forward
mail accidentally sent back to compilers.  Meta-mail to ima!compilers-request