[comp.benchmarks] BYTE and benchmarks

torbenm@diku.dk (Torben Ægidius Mogensen) (01/08/91)

Sometime in '88, BYTE published a special issue on benchmarks. I don't
have the issue handy, so I can't tell you which month it was.

One of the articles was an overview and discussion of popular
benchmarks that were in use or had been in use previously.

During this overview, the author (I can't recall his name) came to a
then popular Pascal benchmark suite, containing a recursive Fibonacci
number program to test the speed of recursive function calls. In a
moment of inspiration the author had the marvelous idea that he would
write Fibonacci in a non-recursive style, to compare the speeds. To
his amazement, the recursive program used several thousand times as
much time as the non-recursive program to evaluate Fibonacci(20). His
conclusion was that recursive function calls must be very expensive
indeed :-)

What really happened must have been this: the recursive program was
written in the "naive" fashion:

	function Fibonacci(n : integer) : integer;
	var result : integer;
	begin
	  if n<2 then
	    result := n
	  else
	    result := Fibonacci(n-1) + Fibonacci(n-2);
	  Fibonacci := result
	end;

which is known to use exponential time. The non-recursive program was
probably written using a for-loop:

	a := 0;
	b := 1;
	for i := 1 to n do begin
	  t := a+b;
	  a := b;
	  b := t
	end;
	Fibonacci := a;

which uses linear time. Using this comparison, you can conclude that
recursive calls are arbitrarily more expensive than a for loop :-)

	Torben Mogensen (torbenm@diku.dk)

eachus@linus.mitre.org (Robert I. Eachus) (01/09/91)

     Exercise for the reader: Write a function to compute the Nth
Fibonacci number in O(log n) time.... (Constant time if you limit
yourself to 32-bit integers.)

     I actually wrote a Fibonacci program that worked this way (in
APL, what else) and it is a very good example of how vectorizing or
partitioning code well is not something a compiler can do for you.
Computing the entire vector of Fibonacci numbers (in an APL one-liner
of course) is much faster than even the non-recursive formula, because
there is only one line to interpret (once).

     The same thing occurs if you have a vector machine handy.
Computing powers seems expensive until you realize that ANY straight
line code which uses the vector pipes will win over an iterative
(scalar) loop.  (The square root of 5 is a constant {hint}, and the
second term need not be computed if you truncate to an integer.)
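
     To make the hint concrete, here is a rough sketch of what such
straight-line, vectorizable code might look like (Fortran 90 array
syntax; the program name and the spelled-out constants are mine, and I
round rather than truncate so the small cases come out right):

      program fibvec
      ! Closed-form Fibonacci as one straight-line array operation,
      ! which a vectorizing compiler can feed to the vector pipes.
      implicit none
      integer, parameter :: n = 46
      double precision, parameter :: sqrt5 = 2.2360679774997896d0
      double precision, parameter :: phi   = 1.6180339887498949d0
      integer :: i
      integer :: fib(0:n)

      ! Round rather than truncate: the neglected second term of the
      ! closed form alternates in sign.
      fib = nint( (/ (phi**i, i = 0, n) /) / sqrt5 )

      write (*,'(5i12)') fib
      end program fibvec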

     So here is a challenge to you vector machine freaks out there.
My handy dandy pocket calculator tells me that F(50) = 12586269025
(hmmm, a little big for most people) and F(46) = 1836311903 (that's
better--as in less than 2**31), so how fast can you calculate the
first 46 (to put the integer and floating people on the same footing)
Fibonacci numbers?  Or better yet, how long does your function to
compute a given Fibonacci number take to run, when called an equal
number of times with arguments 0 to 46?

     This is not so much a benchmark as a challenge to find out how
your favorite machine works, and to show how ferocious application of
analysis of algorithms can speed up a program.

     I'll post a benchmark program in a couple of days, but I want to
challenge some of you first...

--

					Robert I. Eachus

     "When the dictators are ready to make war upon us, they will not
wait for an act of war on our part." - Franklin D. Roosevelt

patrick@convex.COM (Patrick F. McGehearty) (01/10/91)

In article <EACHUS.91Jan8114647@aries.linus.mitre.org> eachus@linus.mitre.org (Robert I. Eachus) writes:
>
>     Exercise for the reader: Write a function to compute the Nth
>Fibonacci number in O(log n) time.... (Constant time if you limit
>yourself to 32-bit integers.)
>
I'm not sure how vectors figure in, but with a constraint of
0 <= N <= 46, it can be quite easy:
	precompute the Fibonacci numbers F(0) through F(46) and assign them in
	a DATA statement to FIBON.  Then the function is:
	INTEGER FUNCTION FIBONACCI(N)
	COMMON /FIB/FIBON(47)
	INTEGER FIBON
	FIBONACCI = FIBON(N+1)    ! compensate for one-based arrays
	RETURN
	END

Of course, good software engineering practice would check N for being
within bounds, and do something appropriate, but this is comp.benchmarks. :-)

Adding the constraint of computing the number on the fly with at most k
prestored values, k << max(N), instead of precomputing the whole table
will add more of a challenge to the exercise.
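
One hedged sketch of that constrained version, in Fortran again (the
function name and the choice k = 16 are mine): keep a small table of
the first k values and iterate forward from its top two entries when
N is larger.

	integer function fibk(n)
	! Bounded table: only the first k values are stored; anything
	! beyond that is computed on the fly from the last two entries.
	implicit none
	integer, intent(in) :: n
	integer, parameter :: k = 16          ! k << max(N) = 46
	integer, save :: tab(0:k-1)
	logical, save :: filled = .false.
	integer :: i, a, b, t

	if (.not. filled) then                ! fill the small table once
	  tab(0) = 0
	  tab(1) = 1
	  do i = 2, k-1
	    tab(i) = tab(i-1) + tab(i-2)
	  end do
	  filled = .true.
	end if

	if (n < k) then
	  fibk = tab(n)
	else                                  ! iterate forward from the table
	  a = tab(k-2)
	  b = tab(k-1)
	  do i = k, n
	    t = a + b
	    a = b
	    b = t
	  end do
	  fibk = b
	end if
	end function fibk

For N = 46 this still loops 31 times; the table only buys back the
first k cases, which is the point of the constraint.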

>     I'll post a benchmark program in a couple of days, but I want to
>challenge some of you first...
>
>--
>
>					Robert I. Eachus
>
>     "When the dictators are ready to make war upon us, they will not
>wait for an act of war on our part." - Franklin D. Roosevelt

aburto@marlin.NOSC.MIL (Alfred A. Aburto) (01/10/91)

In article <1991Jan8.101912.15157@odin.diku.dk> torbenm@diku.dk (Torben Ægidius Mogensen) writes:
>
>Sometime in '88, BYTE published a special issue on benchmarks. I don't
>have the issue handy, so I can't tell you which month it was.
>
>One of the articles was an overview and discussion of popular
>benchmarks that was in use or had been in use previously.
>
>During this overview, the author (I can't recall his name) came to a
>then popular Pascal benchmark suite, containing a recursive Fibonacci
>number program to test the speed of recursive function calls. In a
>moment of inspiration the author had the marvelous idea that he would
>write Fibonacci in a non-recursive style, to compare the speeds. To
>his amazement, the recursive program used several thousand times as
>much time as the non-recursive program to evaluate Fibonacci(20). His
>conclusion was that recursive function calls must be very expensive
>indeed :-)
>

Yes, this fellow sure did fall on his sword.  I seem to remember a number
of 'letters to the editor' on this subject.  In fact this fellow won
the Byte Magazine 'BOMB' award for receiving the most letters that
month. Yes, one wonders what tree this fellow fell out of :-)

A little thought (or a counter in the Fibonacci routine) would have
shown that the recursive Fibonacci requires 92735 calls to itself in order
to calculate Fibonacci(24), whereas the non-recursive 'for' loop takes only
24 iterations!  The overhead involved in the procedure (function)
call is, in this case, not very significant at all.  In other situations
it might be, but not in this one.

Actually, the number of recursive function calls is itself a
Fibonacci-like number.  The number of calls is equal to the sum of the
two previous counts plus one.  That is, the number of function calls for
Fibonacci(n), n = 0, 1, 2, 3, ..., is given by:

1, 1, 3, 5, 9, 15, 25, 41, 67, 109, 177, ..., s(n), ....

s(0) = 1
s(1) = 1
s(n) = s(n-1) + s(n-2) + 1, n > 1 .
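
For scale, here is a little program that tabulates s(n) next to
Fibonacci(n), using the recurrence above (the closed form 2*F(n+1) - 1
in the comment is equivalent; the program name is mine).  With this
zero-based indexing s(24) works out to 150049; the 92735 figure quoted
earlier equals s(23), which is what you get if the benchmark numbers
its Fibonacci sequence starting from 1.

      program callcount
      ! Tabulate F(n) and s(n), the number of calls the naive recursive
      ! routine makes, using s(n) = s(n-1) + s(n-2) + 1.
      implicit none
      integer, parameter :: nmax = 24
      integer :: n
      integer :: f(0:nmax), s(0:nmax)

      f(0) = 0
      f(1) = 1
      s(0) = 1
      s(1) = 1
      do n = 2, nmax
        f(n) = f(n-1) + f(n-2)
        s(n) = s(n-1) + s(n-2) + 1      ! equivalently, 2*F(n+1) - 1
      end do

      do n = 0, nmax
        write (*,'(i4,2i10)') n, f(n), s(n)
      end do
      end program callcount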

Al Aburto
aburto@marlin.nosc.mil

hooft@ruunsa.fys.ruu.nl (Rob Hooft) (01/10/91)

In <EACHUS.91Jan8114647@aries.linus.mitre.org> eachus@linus.mitre.org (Robert I. Eachus) writes:


>     Exercise for the reader: Write a function to compute the Nth
>Fibonacci number in O(log n) time.... (Constant time if you limit
>yourself to 32-bit integers.)

Do not read on if you want to try it yourself. I tried to mail the result but
it bounced (user eachus unknown).

      program test
      do i=1,46
        write(*,*) ifib(i)
      enddo
      end

      function ifib(i)
c
c The actual function is:
c      fib=(((1.+5.**.5)/2.)**i-((1.-5.**.5)/2.)**i)/5.**.5
c
      ifib=int(aint(((1.d0+5.d0**.5d0)/2.d0)**i)/5.d0**.5d0)+1
      end
-- 
Rob Hooft, Chemistry department University of Utrecht.
hooft@hutruu54.bitnet hooft@chem.ruu.nl hooft@fys.ruu.nl

augustss@cs.chalmers.se (Lennart Augustsson) (01/10/91)

In article <EACHUS.91Jan8114647@aries.linus.mitre.org> eachus@linus.mitre.org (Robert I. Eachus) writes:
>     Exercise for the reader: Write a function to compute the Nth
>Fibonacci number in O(log n) time.... (Constant time if you limit
>yourself to 32-bit integers.)
>

To do this we consult the classic MIT HAKMEM, item 12.  This explains
exactly how to do it.  Define multiplication on pairs by
(a,b)(c,d)=(ac+ad+bc,ac+bd)
Note that (a,b)(1,0) = (a+b,a) and thus (1,0)**n = (fib(n),fib(n-1)).
Then use the usual trick with repeated squaring to do the exponentiation
in log n operations (which is log n time if you have constant time
arithmetic).

This is a nice method since it uses only integer arithmetic (as
opposed to solving the difference equation and computing
exponentiation of something with sqrt(5)).

Here is a Haskell program to do it:
------
instance (Num a) => Num (a,a) where
	(a,b) * (c,d) = (a*c+a*d+b*c, a*c+b*d)
	fromInteger k = (0, fromInteger k)   -- the identity (0,1) when k = 1; makes fib 0 work
fib n = fst ((1,0)^n)
------
Some sample results:
fib 10 = 55
fib 46 = 1836311903
fib 100 = 354224848179261915075
fib 1000 = 43466557686937456435688527675040625802564660517371780402481729
08953655541794905189040387984007925516929592259308032263477520968962323987332
2471161642996440906533187938298969649928516003704476137795166849228875
All of which take a very short time to compute (of course).
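
For anyone who wants the repeated squaring spelled out in plain
integer arithmetic rather than left to (^), here is a rough Fortran
sketch of the same pair multiplication (the function name and the
64-bit integer kind are my choices; 64 bits because the squared
powers overflow 32-bit integers even though fib(46) itself fits):

      program hakmem12
      ! Integer-only O(log n) Fibonacci: pairs multiply as
      ! (a,b)(c,d) = (ac+ad+bc, ac+bd), (1,0)**n = (fib(n), fib(n-1)),
      ! and the power is taken by repeated squaring.
      implicit none
      integer, parameter :: long = selected_int_kind(18)
      integer :: n

      do n = 0, 46
        write (*,'(i4,i14)') n, fibp(n)
      end do

      contains

      function fibp(n) result(f)
        integer, intent(in) :: n
        integer(long) :: f
        integer(long) :: ra, rb, pa, pb, ta, tb
        integer :: m
        ra = 0_long                   ! running result, starts at the
        rb = 1_long                   ! identity (0,1)
        pa = 1_long                   ! current power of (1,0)
        pb = 0_long
        m = n
        do while (m > 0)
          if (mod(m, 2) == 1) then    ! multiply result by current power
            ta = ra*pa + ra*pb + rb*pa
            tb = ra*pa + rb*pb
            ra = ta
            rb = tb
          end if
          ta = pa*pa + 2_long*pa*pb   ! square the current power
          tb = pa*pa + pb*pb
          pa = ta
          pb = tb
          m = m / 2
        end do
        f = ra                        ! (ra, rb) = (fib(n), fib(n-1))
      end function fibp
      end program hakmem12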

	-- Lennart Augustsson
[This signature is intentionally left blank.]

eachus@linus.mitre.org (Robert I. Eachus) (01/11/91)

In article <1872@ruunsa.fys.ruu.nl> hooft@ruunsa.fys.ruu.nl (Rob Hooft) writes:

	 program test
	 do i=1,46
	   write(*,*) ifib(i)
	 enddo
	 end

	 function ifib(i)
   c
   c The actual function is:
   c      fib=(((1.+5.**.5)/2.)**i-((1.-5.**.5)/2.)**i)/5.**.5
   c
	 ifib=int(aint(((1.d0+5.d0**.5d0)/2.d0)**i)/5.d0**.5d0)+1
	 end

     Not bad, but did you check your output?  A good compiler will
figure out that the square root of 5 is a common subexpression, and
good compilers will even evaluate it at compile time.  But I'd have to
run your code through a FORTRAN compiler to see if you got the
truncation right--it looks wrong.  (The second term gets small rapidly
but it alternates in sign.) Also I don't think your function works
correctly for the magic numbers of computer science -- zero and one.

     My program (in Ada for readability, it should easily translate to
C or Pascal) is appended below, together with output.  Some explanations
are in order.  The 68020 is (relatively) slow on floating point
operations compared to chips which have floating point on chip (I say
the 68020, not the 68881, since most of the overhead is moving stuff
on and off chip).  Also the math package I used calls the Unix exp
function, which is fairly slow.  Doing the exp by calling the 68881
directly would probably speed up the result using Fib4 below
immensely.  

    If you are wondering about the complexity of the code for the
"simple" solution, I had to make sure that it would not overflow for
N=46... Solutions which happily return junk are not acceptable! These
functions all raise CONSTRAINT_ERROR for values out of range, and only
for values out of range.  (If you run the Ada on your machine, check
that the accuracy of float is acceptable.  I only used FLOAT here to
allow me to use the Verdix Math package.)  I also have another math
package to try out. If you are wondering about the comparison to 42,
that was just to make sure that the compiler didn't optimize stuff
away...

					Robert I. Eachus



     package Fibonacci is
       function Fib1(N: in Natural) return Natural;
       function Fib2(N: in Natural) return Natural;
       function Fib3(N: in Natural) return Natural;
       function Fib4(N: in Natural) return Natural;
       function Fib5(N: in Natural) return Natural;
       function Fib6(N: in Natural) return Natural;

       pragma INLINE(Fib1,Fib2,Fib3,Fib4,Fib5,Fib6);
       -- Procedure call overhead will be larger than most of these
       -- functions.

     end Fibonacci;

     with Math;
     package body Fibonacci is

  SQRT5 	: constant :=  2.23606_79774_99789_69640_91736_68731_27624;
  GOLDEN_RATIO	: constant :=  1.61803_39887_49894_84820_45868_34365_63812;
  NAT_LOG_PHI	: constant :=  0.48121_18250_59603_44749_77589_13424_36842;
  -- borrowed from the Verdix supplied Math package, copied to make the
  -- code more portable.
  
       InvSqrt5: constant := 1.0/Sqrt5;
       Phi: constant Float := Golden_Ratio;
       Rho: constant Float := Sqrt5/2.0 - 0.5; -- also = 1.0/Phi.
       -- Phi is a common designator for the golden ratio,
       -- Rho is just my choice of name for the inverse.
       -- Phi and Rho are declared Float to determine the
       -- accuracy used in evaluating the expressions below.
 
       function Fib1(N: in Natural) return Natural is
         N1: Natural := 0;
         N2: Natural := 1;
       begin
         if N in 0..1
         then return N;
	 else 
           if N mod 2 = 0 
	   then
             for I in 1..N/2-1 loop
               N1 := N1 + N2;
               N2 := N2 + N1;
             end loop;
             return N1+N2;
           else
             for I in 1..N/2 loop
               N1 := N1 + N2;
               N2 := N2 + N1;
             end loop;
	     return N2;
          end if;
        end if;
      end Fib1;

      function Fib2(N: in Natural) return Natural is
      begin
        return Natural((Phi**N-(-Rho)**N) / Sqrt5 );
      end Fib2;

      -- Now to cheat.  For large N, Rho**N < 0.5, so all we have to
      -- do is round..., How large must N be? Let's see, 1/Sqrt5
      -- rounds ok, and so does Phi/Sqrt5, and Rho**2 < 0.5 so no
      -- check need be made...

      function Fib3(N: in Natural) return Natural is
      begin 
        return Natural(Phi**N/Sqrt5);
      end Fib3;

      -- Now, since a float to an integer power is supposed to be
      -- calculated iteratively in Ada (or iteratively followed by an
      -- inverse), let's see what happens when we explicitly force the power
      -- operation to be done via the exponential function.
      
      function Fib4(N: in Natural) return Natural is
      begin
        return Natural(Math.Exp(Nat_Log_Phi*Float(N))/Sqrt5);
      end Fib4;
      
      -- See if explicit looping for exponentiation is faster or slower...

      function Fib5(N: in Natural) return Natural is
        Temp: Float := InvSqrt5;
      begin
        for I in 1..N loop
          Temp := Temp * Phi;
        end loop;
        return Natural(Temp);
      end Fib5;

      -- Last but not least, see if multiplication is faster...

      function Fib6(N: in Natural) return Natural is
      begin
        return Natural(InvSqrt5*Phi**N);
      end Fib6;

   end Fibonacci;

   with Text_IO;
   with Fibonacci; use Fibonacci;
   with Calendar;
   procedure Main is
     Start, Stop: Calendar.Time;

     function "-" (L,R: Calendar.Time) return Duration
                                       renames Calendar."-";
     package Int_IO is new Text_IO.Integer_IO(Integer);
     package Duration_IO is new Text_IO.Fixed_IO(Duration);
     use Int_IO, Duration_IO;
     -- If you don't know Ada, ignore these lines. They are only here
     -- to make right aligned columnar output of integers and output
     -- of timestamps easier.
   begin
     -- First check that everything works:
     for I in 0..46 loop
       Put(I,5);
       Put(Fib1(I),12);
       Put(Fib2(I),12);
       Put(Fib3(I),12);
       Put(Fib4(I),12);
       Put(Fib5(I),12);
       Put(Fib6(I),12);
       Text_IO.New_Line;
     end loop;

     delay 0.5; -- wait for quiescence
     Start := Calendar.Clock;
     for I in 1..1_000_000 loop
       if Fib1(I mod 47) = 42 then Text_IO.Put_Line("Ooops. :-)"); end if;
     end loop;
     Stop := Calendar.Clock;
     Text_IO.Put(" Fib1 took, on average, ");
     Put(Stop-Start,4,2);
     Text_IO.Put_Line(" microseconds.");

     delay 0.5; -- wait for quiescence
     Start := Calendar.Clock;
     for I in 1..1_000_000 loop
       if Fib2(I mod 47) = 42 then Text_IO.Put_Line("Ooops. :-)"); end if;
     end loop;
     Stop := Calendar.Clock;
     Text_IO.Put(" Fib2 took, on average, ");
     Put(Stop-Start,4,2);
     Text_IO.Put_Line(" microseconds.");

     delay 0.5; -- wait for quiescence
     Start := Calendar.Clock;
     for I in 1..1_000_000 loop
       if Fib3(I mod 47) = 42 then Text_IO.Put_Line("Ooops. :-)"); end if;
     end loop;
     Stop := Calendar.Clock;
     Text_IO.Put(" Fib3 took, on average, ");
     Put(Stop-Start,4,2);
     Text_IO.Put_Line(" microseconds.");

     delay 0.5; -- wait for quiescence
     Start := Calendar.Clock;
     for I in 1..1_000_000 loop
       if Fib4(I mod 47) = 42 then Text_IO.Put_Line("Ooops. :-)"); end if;
     end loop;
     Stop := Calendar.Clock;
     Text_IO.Put(" Fib4 took, on average, ");
     Put(Stop-Start,4,2);
     Text_IO.Put_Line(" microseconds.");

     delay 0.5; -- wait for quiescence
     Start := Calendar.Clock;
     for I in 1..1_000_000 loop
       if Fib5(I mod 47) = 42 then Text_IO.Put_Line("Ooops. :-)"); end if;
     end loop;
     Stop := Calendar.Clock;
     Text_IO.Put(" Fib5 took, on average, ");
     Put(Stop-Start,4,2);
     Text_IO.Put_Line(" microseconds.");

     delay 0.5; -- wait for quiescence
     Start := Calendar.Clock;
     for I in 1..1_000_000 loop
       if Fib6(I mod 47) = 42 then Text_IO.Put_Line("Ooops. :-)"); end if;
     end loop;
     Stop := Calendar.Clock;
     Text_IO.Put(" Fib6 took, on average, ");
     Put(Stop-Start,4,2);
     Text_IO.Put_Line(" microseconds.");

   end Main;

   Output on a Sun 3/260, using VADS Ada 6.03:

aries% fib
    0           0           0           0           0           0           0
    1           1           1           1           1           1           1
    2           1           1           1           1           1           1
    3           2           2           2           2           2           2
    4           3           3           3           3           3           3
    5           5           5           5           5           5           5
    6           8           8           8           8           8           8
    7          13          13          13          13          13          13
    8          21          21          21          21          21          21
    9          34          34          34          34          34          34
   10          55          55          55          55          55          55
   11          89          89          89          89          89          89
   12         144         144         144         144         144         144
   13         233         233         233         233         233         233
   14         377         377         377         377         377         377
   15         610         610         610         610         610         610
   16         987         987         987         987         987         987
   17        1597        1597        1597        1597        1597        1597
   18        2584        2584        2584        2584        2584        2584
   19        4181        4181        4181        4181        4181        4181
   20        6765        6765        6765        6765        6765        6765
   21       10946       10946       10946       10946       10946       10946
   22       17711       17711       17711       17711       17711       17711
   23       28657       28657       28657       28657       28657       28657
   24       46368       46368       46368       46368       46368       46368
   25       75025       75025       75025       75025       75025       75025
   26      121393      121393      121393      121393      121393      121393
   27      196418      196418      196418      196418      196418      196418
   28      317811      317811      317811      317811      317811      317811
   29      514229      514229      514229      514229      514229      514229
   30      832040      832040      832040      832040      832040      832040
   31     1346269     1346269     1346269     1346269     1346269     1346269
   32     2178309     2178309     2178309     2178309     2178309     2178309
   33     3524578     3524578     3524578     3524578     3524578     3524578
   34     5702887     5702887     5702887     5702887     5702887     5702887
   35     9227465     9227465     9227465     9227465     9227465     9227465
   36    14930352    14930352    14930352    14930352    14930352    14930352
   37    24157817    24157817    24157817    24157817    24157817    24157817
   38    39088169    39088169    39088169    39088169    39088169    39088169
   39    63245986    63245986    63245986    63245986    63245986    63245986
   40   102334155   102334155   102334155   102334155   102334155   102334155
   41   165580141   165580141   165580141   165580141   165580141   165580141
   42   267914296   267914296   267914296   267914296   267914296   267914296
   43   433494437   433494437   433494437   433494437   433494437   433494437
   44   701408733   701408733   701408733   701408733   701408733   701408733
   45  1134903170  1134903170  1134903170  1134903170  1134903170  1134903170
   46  1836311903  1836311903  1836311903  1836311903  1836311903  1836311903
 Fib1 took, on average,   47.52 microseconds.
 Fib2 took, on average,  141.74 microseconds.
 Fib3 took, on average,   83.50 microseconds.
 Fib4 took, on average,  275.32 microseconds.
 Fib5 took, on average,  317.32 microseconds.
 Fib6 took, on average,   82.98 microseconds.
aries% 
--

					Robert I. Eachus

     "When the dictators are ready to make war upon us, they will not
wait for an act of war on our part." - Franklin D. Roosevelt

gl8f@astsun.astro.Virginia.EDU (Greg Lindahl) (01/15/91)

In article <josef.663497368@ugum01> josef@nixpbe.nixdorf.de (josef Moellers) writes:

>losing out of sight the purpose of a
>benchmark: not to produce the numbers in the least amount of time, but
>rather to keep Your machine busy for some time.

Gee, and here I thought the purpose of a benchmark was to predict the
relative performance of various machines on the applications that I
use them for.

Silly me. I propose we use the "infinite loop" benchmark instead of
SPEC.

burley@geech.ai.mit.edu (Craig Burley) (01/15/91)

In article <1991Jan14.202646.5677@murdoch.acc.Virginia.EDU> gl8f@astsun.astro.Virginia.EDU (Greg Lindahl) writes:

   Silly me. I propose we use the "infinite loop" benchmark instead of
   SPEC.

At my last job, I did some work on the ROM-based microcode, primarily
optimizing for space and, sometimes, speed, so new features could be
added to the ROM.

One thing I made smaller and, therefore, faster, was the idle loop.
It had half the instructions of the previous version yet worked as
well.

Since the idle loop accounted for 99.99% of the execution profile of
these machines, I reported to management that I had singlehandedly
improved system performance by a factor of two!

They were very impressed!  (-:

Seriously, it does seem difficult to produce benchmarks that don't show,
to an unspecifiable extent, a combination of hardware speed and compiler-
produced optimizations.  Globally optimizing compilers, as they become
available, might well turn many popular benchmarks into, Fortranically,
STOP statements, or WRITEs of constants.

One way to help avoid this is to have a two-phase benchmark.  The first
phase uses system information like date and time to seed a random number
generator, then writes a data file of values derived from that seed.
The second phase, the one actually measured, then reads this file and
does the measurable calculations on it.

Of course, this might appear to corrupt timings wanted for just CPU
activity with disk activity.  Even trying to isolate, in the code, one
mode from another by reading the whole file into memory has its
problems (a global optimizer might decide, if it doesn't see the
call to begin the timer between the reads and the computes, to
reduce memory requirements and perhaps increase speed by folding the
compute loop into the I/O loop).
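
As a minimal sketch of the two-phase idea (Fortran 90, using the
DATE_AND_TIME, RANDOM_NUMBER and SYSTEM_CLOCK intrinsics; the file
name, problem size and dummy kernel are placeholders, and the two
program units would be built as separate executables), with the whole
file read into memory before the clock starts:

      program phase1
      ! Seed a generator from the date and time, then write a data
      ! file of values derived from that seed.  Nothing here is timed.
      implicit none
      integer, parameter :: n = 1000
      integer :: values(8), seed_size, i
      integer, allocatable :: seed(:)
      real :: x(n)

      call date_and_time(values=values)
      call random_seed(size=seed_size)
      allocate(seed(seed_size))
      seed = values(8) + 37 * (/ (i, i = 1, seed_size) /)  ! milliseconds field
      call random_seed(put=seed)
      call random_number(x)

      open (unit=10, file='bench.dat', form='unformatted')
      write (10) x
      close (10)
      end program phase1

      program phase2
      ! Read the whole file into memory first, then time only the
      ! computation.  The kernel is a stand-in for the real work.
      implicit none
      integer, parameter :: n = 1000
      integer :: t0, t1, rate, i, j
      real :: x(n), s

      open (unit=10, file='bench.dat', form='unformatted')
      read (10) x
      close (10)

      call system_clock(t0, rate)
      s = 0.0
      do j = 1, 10000                 ! lots of compute on little data
        do i = 1, n
          s = s + sin(x(i)) * cos(x(i))
        end do
      end do
      call system_clock(t1)

      write (*,*) 'checksum =', s     ! keep the optimizer honest
      write (*,*) 'seconds  =', real(t1 - t0) / real(rate)
      end program phase2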

If the data set were small compared to the compute size of the problem --
for example, if an NP-complete problem like traveling salesman were
computed using whatever favorite (to-be-benchmarked) techniques based
on distance data in a file -- this I/O problem could be reduced to the
vanishing point.  Also, by using an NP-complete problem, even a
globally optimizing compiler is less likely to find a better algorithm
that significantly affects the benchmark (if it does, that's big news,
right?!).

The other important solution, I'd like to stand on my soapbox and say,
is the proliferation of free software such that any machine you want
to benchmark is likely to already have compiler technology equivalent
to that used for the machine with which it is being compared.  For
example, doing 80486 vs. 68040 benchmarks for C code is likely to be
much more helpful in isolating the hardware characteristics (when this
is desired -- it isn't always, of course!) if GNU CC is used to do
the compilations on both machines.  Of course, one machine's description
might have more complete coverage of the various optimizations, and differences
in library routines would remain, but at least you wouldn't be comparing
code compiled using cse against code compiled without it, for example.
--

James Craig Burley, Software Craftsperson    burley@ai.mit.edu