[comp.lang.eiffel] Mosaic Benchmark on other platforms

murphy@eric.mpr.ca (Gail Murphy) (03/28/90)

Recently, Kim Rochat posted an article about the performance of 
Eiffel vs.  C++ vs.  Smalltalk for the Mosaic program.  Included was
source code for the benchmark.

Being curious, I tried this benchmark on:
 + a VAXStation 3100 running Ultrix 3.0 with Eiffel V2.2 (Level A) 
 + a VAXStation 3100 running Ultrix 3.0 and G++ (Version 1.36.2 based on
   GCC 1.36) 

The Eiffel generated C-package was also moved to the following platforms:
 + an Apollo 3500 running SR10.2
 + a Sun-3/60 running SunOS 4.0.1

Both of these packages were compiled using the respective platforms' CC
compilers.

The results were:

Host      Language   Compiler     Execution Time (secs)   Executable Size 
                                  User   System  Wall     Text    Data    BSS
----      --------   --------     ----   ------  ----     ----    ----    ---

VAX 3100  Eiffel     gcc          154.2   176.4  5:55      64512  6144    5028

VAX 3100  c++        g++           49.1     5.7  1:00      29696  3072    1868

Ap 3500   Eiffel     Apollo CC    137.1     4.5  2:27      68060 11204    2552

Sun-3/50  Eiffel     Sun CC       163.9   273.7  7:21      90112 16384    0

These results are significantly different from the results posted previously, 
where the authors found:

> While bearing in mind that mosaic is a small program exercising a
> limited subset of the languages, two major conclusions can be drawn.
> First, any perception that C++ has superior performance to Eiffel may
> be invalid.  Second, if you're using Eiffel, a different C compiler may
> result in significantly increased performance.

The executions were conducted as described in the previous article
(i.e. Eiffel C-package generation was used with no assertions, the
program was run twice in succession, etc.).  Similar results were found
using the Ultrix CC compiler.  The following differences from the previous
benchmark (besides host) exist:

+ the platforms have 8Mb memory as compared to 12Mb Hosts in the
  previous benchmark 

+ different versions of g++ and Eiffel were used.

Use of prof on the Eiffel C-package outputs (the first 10 lines):

 %time  cumsecs    #call  ms/call  name
  36.3   210.06  1637611     0.13  _sigblock
  12.1   279.99                    mcount
  10.8   342.59   997002     0.06  __0a8005_row_contrastcolorcanappearat
   5.8   375.97                    _c1_item
   5.4   407.15  1637611     0.02  _setjmp
   5.3   437.85     1000    30.69  __0a8000_row_create
   3.6   458.61                    _c1_put
   2.9   475.46   633603     0.03  __0a9001_random_dont_care
   2.7   490.96   633603     0.02  _random
   2.4   504.76                    _c_put

A profile of the Eiffel C-package on the Apollo, however, did not reveal
any time spent in _sigblock.  A profile of the g++ code was not possible
as the -p and -pg options are not yet supported.

The call to sigblock (an operating system routine) seems to come from the
SETJMP2 macro.  The SETJMP2 and OUTJMP2 calls in the 
row_contrastcolorcanappearat routine were commented out (I think
this disables proper exception handling).
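
For readers not familiar with the generated C: if I understand the BSD
libraries correctly, on BSD-derived systems such as Ultrix and SunOS,
setjmp() saves the caller's signal mask on every call (typically by way
of sigblock()), while _setjmp() does not; that would account for the
_sigblock entry here and may explain why the Apollo profile looks
different.  The sketch below only illustrates that general pattern; the
names exvect, extop, EX_SETJMP and EX_OUTJMP are made up and are not the
actual SETJMP2/OUTJMP2 definitions from the C-package.

/* Illustration only -- not the Eiffel C-package's real macros.
 * Each entry into a protected routine body records an exception
 * environment with setjmp; on BSD-derived systems that setjmp also
 * saves the signal mask, which appears to be where the sigblock
 * time comes from.
 */
#include <setjmp.h>

static jmp_buf exvect[256];          /* per-routine exception vectors */
static int     extop = -1;

#define EX_SETJMP()  (setjmp(exvect[++extop]))   /* saves signal mask  */
#define EX_OUTJMP()  (extop--)                   /* pop on normal exit */

int some_routine(void)
{
    if (EX_SETJMP() != 0) {
        /* an exception was raised below: rescue or re-raise here */
        return -1;
    }
    /* ... routine body: every call through here pays one setjmp ... */
    EX_OUTJMP();
    return 0;
}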

Commenting out those two macro calls resulted in the following:

Host      Language   Compiler     Execution Time (secs)   
                                  User   System  Wall     
----      --------   --------     ----   ------  ----     

VAX 3100  Eiffel     gcc         109.8    66.5   2:58

Ap 3500   Eiffel     Apollo CC   111.1     4.4   1:58

Sun-3/50  Eiffel     Sun CC      109.2   106.6   3:36

There is a notable cost, then, to performing the SETJMP2 and OUTJMP2
macros.  Is this a change from Eiffel Version 2.1?  Why else are the
benchmark results so remarkably different?  Is there a more cost-efficient
way to perform these actions?

Has anyone else tried these benchmarks on the same or other platforms?
What results have you obtained?

Gail Murphy                     | murphy@joplin.mpr.ca
Microtel Pacific Research       | joplin.mpr.ca!murphy@uunet.uu.net
8999 Nelson Way, Burnaby, BC    | murphy%joplin.mpr.ca@relay.ubc.ca
Canada, V5A 4B5, (604) 293-5462 | ...!ubc-vision!joplin.mpr.ca!murphy

nosmo@eiffel.UUCP (Vince Kraemer) (03/29/90)

>In article <21110@kiwi.mpr.ca>, Gail Murphy (murphy@eric.mpr.ca) writes:

>[Some intro comments and some really embarrassing statistics deleted]
>
>These results are significantly different from the results posted previously, 
>where the authors found:
>
>[Quote from Kim Rochat's article deleted]
>
>The executions were conducted as described in the previous article
>(i.e. Eiffel C-package generation was used with no assertions, the
>program was run twice in succession, etc.).  Similar results were found
>using the Ultrix CC compiler.  The following differences from the previous
>benchmark (besides host) exist:
>
>+ the platforms have 8Mb memory as compared to 12Mb Hosts in the
>  previous benchmark 
>
>+ different versions of g++ and Eiffel were used.
>
>Use of prof on the Eiffel C-package outputs (the first 10 lines):
>
> %time  cumsecs    #call  ms/call  name
>  36.3   210.06  1637611     0.13  _sigblock
>  12.1   279.99                    mcount
>  10.8   342.59   997002     0.06  __0a8005_row_contrastcolorcanappearat
>   5.8   375.97                    _c1_item
>   5.4   407.15  1637611     0.02  _setjmp
>   5.3   437.85     1000    30.69  __0a8000_row_create
>   3.6   458.61                    _c1_put
>   2.9   475.46   633603     0.03  __0a9001_random_dont_care
>   2.7   490.96   633603     0.02  _random
>   2.4   504.76                    _c_put
>

From the evidence that I see here, I think that assertion checking was
not turned completely off.  The way to do this, which is a tad obtuse
in the documentation, is to set up the SDF as follows:

NO_ASSERTION_CHECK (Y): ALL
PRECONDITIONS (N): ALL
ALL_ASSERTIONS (N): ALL

C_PACKAGE (Y): {some dir name}

This is the only explanation for the presence of the _sigblock and
_setjmp calls.

>A profile of the Eiffel C-package on the Apollo, however, did not reveal
>any time spent in _sigblock.  A profile of the g++ code was not possible
>as the -p and -pg options are not yet supported.
>
>The call to sigblock (an operating system routine) seems to come from the
>SETJMP2 macro.  The SETJMP2 and OUTJMP2 calls in the 
>row_contrastcolorcanappearat routine were commented out (I think
>this disables proper exception handling).

This disables the "default" exception handling behavior: printing out
an exception stack.  Rescues will still be taken care of, if present,
with all assertions off.

(It should be noted here that the use of a rescue inside a loop will also
produce many calls to setjmp.  Therefore, "don't use rescues inside inner
loops" is a good rule to remember when doing Eiffel programming "for
speed".  A rough sketch of the cost follows.)
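
The sketch below shows the shape of that cost in C; it is not our
compiler's actual output, and the routine names and the helper
do_one_step are made up for illustration.  A rescue protecting the
inner-loop body pays one setjmp per iteration, while a rescue around
the whole loop pays one per routine call.

/* Sketch only -- not ISE Eiffel's generated C.  do_one_step() is a
 * hypothetical helper that does one unit of work and may "raise" by
 * calling longjmp() on the environment it is given.
 */
#include <setjmp.h>

static void do_one_step(int i, jmp_buf env)
{
    if (i < 0)                      /* stand-in for a real failure test */
        longjmp(env, 1);
    /* ... real work would go here ... */
}

void rescue_inside_loop(int n)
{
    jmp_buf env;
    volatile int i;                 /* volatile: value survives a longjmp */

    for (i = 0; i < n; i++) {
        if (setjmp(env) == 0)       /* n setjmp calls (and, on BSD   */
            do_one_step(i, env);    /* systems, n sigblock calls)    */
        else {
            /* per-iteration rescue: retry or give up */
        }
    }
}

void rescue_around_loop(int n)
{
    jmp_buf env;
    int i;

    if (setjmp(env) == 0) {         /* one setjmp per call to this routine */
        for (i = 0; i < n; i++)
            do_one_step(i, env);
    } else {
        /* single rescue clause for the whole loop */
    }
}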

>
>This resulted in the following:
>
>Host      Language   Compiler     Execution Time (secs)   
>                                  User   System  Wall     
>----      --------   --------     ----   ------  ----     
>
>VAX 3100  Eiffel     gcc         109.8    66.5   2:58
>
>Ap 3500   Eiffel     Apollo CC   111.1     4.4   1:58
>
>Sun-3/50  Eiffel     Sun CC      109.2   106.6   3:36
>
>There is a notable cost, then, to performing the SETJMP2 and OUTJMP2
>macros.  Is this a change from Eiffel Version 2.1?  Why else are the
>benchmark results so remarkably different?  Is there a more cost-efficient
>way to perform these actions?

The above I'll take in order:

1. There has been very little change in the implementation of assertion
handling from Eiffel 2.1.

2. I'm not too sure why these benchmark results are so different.  One
possible reason: was the SDF OPTIMIZE line set to true for all classes?
What was the performance on the VAX 3100 using the Ultrix C compiler?
I know that the gcc optimizer is a real winner: if the gcc and Ultrix cc
times are comparable, that would indicate that C compiler optimization
was not being done.  I have a gut feeling that the time difference
between the VAX 3100/gcc combination and the Sun-3/50/Sun cc combination
in this second set of trials should be larger if gcc is optimizing.

Another cost due to preconditions being used is that row members are
accessed through a routine call instead of more directly through a macro
expansion.  This is still being done here (see the note above about
turning off all assertions), adding roughly 2,000,000 to the number of
function calls.  A rough illustration follows.
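
Again, this is only a sketch, not the C we actually generate; the names
ROW_ITEM, row_item and check_precondition are made up.  It shows the
difference between an in-line macro access (all assertions off) and a
checking routine call (preconditions kept):

/* Sketch only -- not ISE Eiffel's generated C. */
#include <stdio.h>
#include <stdlib.h>

struct row {
    int  count;       /* number of colors in the row */
    int *colors;      /* the colors themselves       */
};

/* hypothetical stand-in for the run-time's precondition handler */
static void check_precondition(int ok, const char *tag)
{
    if (!ok) {
        fprintf(stderr, "precondition violated: %s\n", tag);
        abort();      /* a real run-time would raise an Eiffel exception */
    }
}

/* assertions off: direct, in-line access, no call overhead */
#define ROW_ITEM(r, i)  ((r)->colors[(i)])

/* preconditions on: a real routine call on every access */
int row_item(struct row *r, int i)
{
    check_precondition(i >= 0 && i < r->count, "index_in_range");
    return r->colors[i];
}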

3. There are ways of implementing exception handling with a better
cost/benefit.  The problem is they aren't very portable.  Our goal was
portability for the systems produced.

Also worth noting, we see assertions as a development tool.  They are to
be stripped out (via recompilation) at delivery time.


I hope this has helped shed some light on the subject and not cloud it.
I too would be interested in finding out about the results of this
benchmark on other platforms -- especially 386 boxes using the
implementations of C++ available for them.

Vince Kraemer
ISE Jack-of-all-Trades
business related reply-to: eiffel@eiffel.com
personal correspondences reply-to: nosmo@eiffel.com

jimad@microsoft.UUCP (Jim ADCOCK) (04/10/90)

In article <KIM.90Apr2202123@helios.enea.se> kim@helios.enea.se (Kim Waldén) writes:
>
>The only reason to turn off all assertion checking when running benchmarks
>against C++ should be to get more comparable figures.
>
>As we all know, there is no such thing as a thoroughly tested software
>system, and the resulting safety is well worth the extra cpu cycles.

I believe there is a contradiction in these two statements.  Timing comparisons
should be representative of final executable code as delivered to customers.
If Eiffel code is to be delivered with assertions in the code, and C++
code is not [typically] delivered with assertions in the code, then the
relative timings between the languages should reflect this fact.

If the cost in Eiffel of the assertions is worthwhile, then you should be
able to argue successfully for those costs, even if they make Eiffel code
bigger and slower.  Speed and size are only two [but important] measures of
a language.  Typically one pays for the features of a language.  Timings
and code sizes let users decide whether they consider the costs of the
features to represent good value for them.

Alternately, show size and timings for code both with and without
assertion checking.  Then users can decide if they agree with your
assessment that runtime checking is worth the cost.

Don't "cook" comparisons to make them unrepresentative of how people
actually use languages.  Comparisons should be representative.

sakkinen@tukki.jyu.fi (Markku Sakkinen) (04/10/90)

In article <54007@microsoft.UUCP> jimad@microsoft.UUCP (Jim ADCOCK) writes:
>In article <KIM.90Apr2202123@helios.enea.se> kim@helios.enea.se (Kim Waldén) writes:
>>
>>The only reason to turn off all assertion checking when running benchmarks
>>against C++ should be to get more comparable figures.
>>
>>As we all know, there is no such thing as a thoroughly tested software
>>system, and the resulting safety is well worth the extra cpu cycles.
>
>I believe there is a contradiction in these two statements.  Timing comparisons
>should be representative of final executable code as delivered to customers.
>If Eiffel code is to be delivered with assertions in the code, and C++
>code is not [typically] delivered with assertions in the code, then the
>relative timings between the languages should reflect this fact.

I disagree, i.e. agree with the original poster. See below.

> ...
>Alternately, show size and timings for code both with and without
>assertion checking.  Then users can decide if they agree with your
>assessment that runtime checking is worth the cost.

Yes, _this_ is a sensible suggestion.

>Don't "cook" comparisons to make them unrepresentative of how people
>actually use languages.  Comparisons should be representative.

No, comparisons should above all be as comparable as possible,
i.e. no apples and oranges. Of course, the less similar languages
one has, the harder it is to design a meaningful and fair comparison.
Eiffel vs. C++ is a lot easier than Prolog vs. RPG.

Markku Sakkinen
Department of Computer Science
University of Jyvaskyla (a's with umlauts)
Seminaarinkatu 15
SF-40100 Jyvaskyla (umlauts again)
Finland
          SAKKINEN@FINJYU.bitnet (alternative network address)

jimad@microsoft.UUCP (Jim ADCOCK) (04/13/90)

In article <4083@tukki.jyu.fi> sakkinen@jytko.jyu.fi (Markku Sakkinen) writes:
>No, comparisons should above all be as comparable as possible,
>i.e. no apples and oranges. Of course, the less similar languages
>one has, the harder it is to design a meaningful and fair comparison.
>Eiffel vs. C++ is a lot easier than Prolog vs. RPG.

I don't consider it apples and oranges to compare two languages as
they are actually used.  To compare "apples to apples", is one to
gin up a hashed dispatcher in C++ to slow it down enough to compare to
Smalltalk dispatching?  This makes C++ more like Smalltalk -- does
that make the comparison more fair?  Should one add bounds checking on
C++ arrays to make it more comparable to some Pascal compilers?  When
comparing Pascal code to C code, does one write the C code in a Pascal-like
coding style?

-- If you keep playing these kinds of games to try to make languages
look more and more similar, then you end up with two identical sets
of features -- only the syntax is different.  Then you are not comparing
the languages or the compilers, but only the two back-end code generators.
Which seems pretty silly -- especially if you're attempting to compare
two languages/compilers both using the same C compiler as the back-end code
generator!  Surprise: given two half-reasonable front ends both
implementing the exact same set of features, you get exactly the same
code out of the C back-end compiler.

The choices made in languages -- safety vs. speed vs. flexibility, etc. --
are reflected in the size and speed of the resulting code.  If one
believes one's language has made the right choices, then one should be
willing to live by the results.  I think users of Smalltalk [rightly
or wrongly] would be willing to claim their language is worth the speed
hit.

bruce@menkar.gsfc.nasa.gov (Bruce Mount) (04/13/90)

In article <54060@microsoft.UUCP> jimad@microsoft.UUCP (Jim ADCOCK) writes:
>[Stuff deleted]

>I don't consider it apples and oranges to compare two languages as
>they are actually used.  To compare "apples to apples" is one to
>gin up a hashed dispatcher in C++ to slow it down enough to compare to
>Smalltalk dispatching?  

Yes, it does, if hashed dispatching is part of your application.  The
point of a benchmark is to compare different platforms (languages or
hardware), not different implementations.  It is up to the reader to
determine whether the benchmark applies to their application.

The Mosaic Benchmark compares a few specific features---this was freely
admitted by the author---not the whole range of OO activity.

I agree with the poster who stated that a more comprehensive benchmark
would be invaluable (testing dynamic binding, etc.); however, this does
not diminish the value of the Mosaic benchmark.  Rather than griping
about the Mosaic benchmark, why don't we all think up more comprehensive
ones?

--Bruce
=================================================
| Bruce Mount                 "Brevity is best" |
| bruce@atria.gsfc.nasa.gov                     |
=================================================

bruce@menkar.gsfc.nasa.gov (Bruce Mount) (04/13/90)

Please forgive the multiple spelling mistakes in my last posting.  I was
typing *WAY* too fast.

=================================================
| Bruce Mount                 "Brevity is best" |
| bruce@atria.gsfc.nasa.gov                     |
=================================================