[comp.benchmarks] unbc - A New!, Improved! bc benchmark

eugene@eos.arc.nasa.gov (Eugene Miya) (12/19/90)

In article <115440001@hpcuhc.cup.hp.com> spuhler@hpcuhc.cup.hp.com
(Tom Spuhler) writes:
>Concerned that your 'bc' benchmark results may be skewed by vendor
>optimization of the trivial case?  Looking for a longer running version
>for your faster CPU's?  Does management want a richer instruction mix to
>be tested?

Er, sorry, I must be dense, but where does the "richer instruction mix"(tm)
come in (sounds like coffee, thank god I drink tea).  Seems like more of
the same.  Do you perchance work in a marketing department?  Longer running?
Longer is not necessarily better (no sex jokes please).  Seems this could
be optimized as well.  Fortunately (?) I didn't see the beginnings of the 
2^n thread.  

>	It is better to have some data, no matter how limited, as long
>	as you understand it, then no data at all.

Nope.  Beg to disagree.  It can be more damaging.  I think someone is suing
someone else over performance claims; it's getting nasty.
Note: in a first post, I cited the APL benchmark (Gaussian sum) where
the adds were all replaced by the simple (n+1)n/2 formula (n was 256).

It's hard to understand the behavior of some benchmark results, even for
some of the programmers who wrote a given benchmark or compiler.

--e.n. miya, NASA Ames Research Center, eugene@eos.arc.nasa.gov
  {uunet,mailrus,most gateways}!ames!eugene
  AMERICA: CHANGE IT OR LOSE IT.

spuhler@hpcuhc.cup.hp.com (Tom Spuhler) (12/20/90)

# >for your faster CPU's?  Does management want a richer instruction mix to
# 
# Er, sorry, I must be dense, but where does the "richer instruction mix"(tm)
# come in (sounds like coffee, thank god I drink tea).  Seems like more of

Come on, Eugene, you're tripping over the easy ones :-)  A richer
instruction mix means a more varied mix, one using a larger subset of the
machine instructions.  Not particularly interesting in itself, as the
important criterion is how the tested instruction mix matches your expected
workloads (for richer or poorer :-), but I get more warm fuzzies from tests
that exercise the 'richer' mixes than the 'poorer' ones, as real-life usage
tends to be on the richer side (for the kinds of computers I'm interested
in).  It was common terminology around here.  I didn't invent it (now, as
to the concept of "creamier" code, I'll take some of the blame for that).

# the same.  Do you work per change in a marketing department?  Longer running?
# Longer is not necessarily better (no sex jokes please).  Seems this could

Sorry, no, to the marketing question.  Longer is better in that it tends
to minimize the imprecision of the reporting mechanism (in this case
/bin/time), and the impact of startup effects (something of concern in
the 'bc' benchmark) will be minimized.  When the run times drop below
a couple of seconds, I personally start to worry about the precision of
/bin/time.  I like them to run at least 10 seconds.  Unfortunately, I
didn't achieve that goal with 2^9999/3^6308.  On some systems, I expect
it can run in less than a second, but I was limited by the 'bc' program
and my interest in simplicity.  Longer is not 'necessarily' better, but I
find it usually is for accuracy of results, although 'longer' may reduce
the number of times it's run, or its usefulness, which may be more
important.
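The rule of thumb here can be made concrete: with a timer that reports in
steps of r seconds, a run of length t carries a worst-case relative error of
roughly r/t.  A small illustrative sketch in Python (the 0.1-second
resolution is a hypothetical figure for the sake of the arithmetic, not a
measured property of any particular /bin/time):

```python
# Illustrative only: assume a timer that reports in 0.1-second steps.
resolution = 0.1

for elapsed in (0.5, 2.0, 10.0):
    # A reading of `elapsed` seconds may be off by up to `resolution`,
    # so the relative error shrinks as the run gets longer.
    rel_err = resolution / elapsed
    print(f"{elapsed:4.1f} s run -> worst-case relative error ~{rel_err:.0%}")
```

which is why a 10-second run (about 1% error under this assumption) is far
more trustworthy than a half-second one (about 20%).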

# Longer is not necessarily better (no sex jokes please).  Seems this could
# be optimized as well.  Fortunately (?) I didn't see the beginnings of the 

Optimizable?  Oh sure.  This is always true.  Vendors could hard-code in
the answer.  It's a question of ease, likelihood, and dependence.
How hard is it to optimize for this case?  2^9999/3^6308 is
harder to optimize for than 2^5000/2^5000, assuming anything more than just
the hard-coded case (easy to detect) and somewhat consistent with
the intent of 'bc'.  How likely is someone to do something like
that?  Depends on how hung up the world gets on a single benchmark.
How likely is someone to optimize for Dhrystone?  (Seems to have
happened.)  It's all a matter of context.
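The contrast is easy to see with ordinary big integers (Python here, standing
in for bc's arbitrary-precision arithmetic): the old case can be recognized
without doing any arithmetic at all, while the nbc expression forces a
genuine multi-thousand-digit computation.

```python
# The old trivial case: identical operands.  A "smart" implementation
# could notice the x/x pattern and emit 1 without ever computing 2^5000.
assert 2**5000 // 2**5000 == 1

# The nbc case: 2^9999 is about 3010 decimal digits long, and the
# integer quotient (the ratio is roughly 2.08) has to be computed.
q = 2**9999 // 3**6308
print(q)                    # 2
print(len(str(2**9999)))    # 3010 decimal digits
```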
  
# >	It is better to have some data, no matter how limited, as long
# >	as you understand it, then no data at all.
# 
# Nope.  Beg to disagree.  It can be more damaging.  I think some one is suing
# someone else over performance claims, getting nasty.  
# Note: in a first post, I cited the APL benchmark (Gaussian sum) where
# the adds were all replaced by the simple (n+1)n/2 formula (n was = 256).
# 
We always have to live with imperfect information.  True, the result of
a benchmark running your application(s) on a variety of vendor machines
with a variety of configurations is ideal, but it can be a little expensive
to achieve.   Something like the bc or nbc benchmarks may not be
very good, but they are cheap to run, and results from a good number of
machines are available.  Note that the results of both efforts may be no
more useful (or less useful) to someone else in determining the relative
performance of the tested boxes.  And guess which one costs less.
Using bc, or better nbc, can help classify systems and direct other
investigative efforts.  The combination of bc and nbc results is
considerably more useful than either one alone.  Keep adding in more
benchmarks and you can develop a performance profile of a system.  Does
SPEC alone allow one to characterize the performance of a system?
Definitely not.  Does it help?  Sure.  How about TPC-A?  For any single
characterization, one can cite exceptions.  Only the complete universe
of information is universally useful.  Performance information is
damaging only if it is misused (happens a lot).
["there is no enlightenment until there is total enlightenment"].

# It's hard to understand the behavior of some benchmark results, even by
# some of the programmer who wrote a given benchmark or compiler.

and it's even harder to come up with a single all-singing, all-dancing
benchmark which will allow anyone to evaluate the performance of a
variety of boxes running whatever applications they choose.

-Tom Spuhler,  Spuhler@cup.hp.com

spuhler@hpcuhc.cup.hp.com (Tom Spuhler) (12/22/90)

# >Most importantly, You didn't get the correct output.  Any benchmark
# >which doesn't return the expected output is invalid (or at least VERY
# >suspect).  Work on it until you get '2'.
# 
# perhaps you should work on UN*X then.  elementary considerations
# show that the numerator must end in "8", and the denominator must
# end in "1".  How can the answer possibly be "2"?

Easy.  The program 'bc' assumes no (0) decimal places by default, and
as you can see below, '2' is quite reasonable (and correct) given the
calculation and the documented behavior of 'bc'.   I suppose your 'bc's
may vary.  When I asked for a more exact answer I got 2.079945....
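bc's behavior here can be emulated with plain big integers: with the default
scale of 0 the quotient is truncated toward zero, and a larger scale keeps
that many digits after the point.  A sketch in Python (not bc itself, just
the same arithmetic):

```python
num, den = 2**9999, 3**6308

# bc's default scale=0: the quotient is truncated to an integer.
print(num // den)       # 2

# "scale=6" in bc keeps six digits after the point; emulate it by
# scaling the numerator before dividing, then placing the point.
scale = 6
q = (num * 10**scale) // den
print(f"{q // 10**scale}.{q % 10**scale:06d}")   # 2.079945
```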

# My LISP machine gives an answer of:
# 
# 9975..(omit ~ 6300 digits)..4688 / 4795..(omit ~ 6300 digits)..2561
# 
# i.e. no integral answer.  it took 17.4 seconds by the way.

Looks correct; the check is an exercise left to the reader.  Note that
9975/4795 comes out close to 2.08.  Interesting that my workstation (HP9000/360,
diskless) was able to generate the (precise) answer in 17.66 seconds using
Mathematica (but it does take longer using bc).  What hardware is your LISP
machine based on? :-)
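The "exercise" from the earlier post (the numerator must end in 8, the
denominator in 1) is quick to confirm with modular arithmetic:

```python
# Last decimal digit of 2^9999: powers of 2 end in 2, 4, 8, 6 with
# period 4, and 9999 = 4*2499 + 3, so 2^9999 ends like 2^3 = 8.
assert pow(2, 9999, 10) == 8

# Last decimal digit of 3^6308: powers of 3 end in 3, 9, 7, 1 with
# period 4, and 6308 is a multiple of 4, so 3^6308 ends like 3^4 = 1.
assert pow(3, 6308, 10) == 1

print("numerator ends in 8, denominator ends in 1")
```

which matches the last digits (...4688 and ...2561) of the LISP machine's
output above.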

I might add that this illustrates some of the value of 2^9999/3^6308, as it
requires significantly more work from the more intelligent packages.
Mathematica took only .04 seconds to solve 2^5000/2^5000.
-Tom

tac@cs.brown.edu (Ted A. Camus) (12/23/90)

># My LISP machine gives an answer of:
># 
># 9975..(omit ~ 6300 digits)..4688 / 4795..(omit ~ 6300 digits)..2561
># 
># i.e. no integral answer.  it took 17.4 seconds by the way.

>Note that 9975/4795 comes out close to 2.08.  Interesting that my workstation
>(HP9000/360 diskless) was able to generate the (precise) answer in 17.66 
>seconds using Mathematica (but does take longer using bc).  

Why is this interesting?
Here's my reason why bc is not a good benchmark:

> (time (* 1.0 (/ (expt 2 9999) (expt 3 6308))))
Elapsed Real Time = 0.14 seconds
. . .
2.079945102751959

using Lucid CL on a SS1.  
Given this, I find it hard to take bc seriously.  
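Ted's point carries over to any system with native bignums; a rough Python
equivalent of the Lisp one-liner (timings will of course vary by machine,
and are far faster on anything modern):

```python
import time

t0 = time.perf_counter()
x = (2**9999) / (3**6308)   # big-integer true division, rounded to a float
elapsed = time.perf_counter() - t0

print(x)                    # ~2.079945102751959
print(f"{elapsed:.4f} s")
```

The exact rational arithmetic dominates the cost; converting the result to a
machine float at the end is essentially free.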


- Ted

==========================================================
  Ted Camus                          Box 1910 CS Dept 
  tac@cs.brown.edu                   Brown University
  tac@browncs.BITNET                 Providence, RI 02912