torbenm@diku.dk (Torben Ægidius Mogensen) (01/08/91)
Sometime in '88, BYTE published a special issue on benchmarks. I don't
have the issue handy, so I can't tell you which month it was.

One of the articles was an overview and discussion of popular
benchmarks that were in use or had been in use previously. During this
overview, the author (I can't recall his name) came to a then popular
Pascal benchmark suite, containing a recursive Fibonacci number
program to test the speed of recursive function calls. In a moment of
inspiration the author had the marvelous idea that he would write
Fibonacci in a non-recursive style, to compare the speeds. To his
amazement, the recursive program used several thousand times as much
time as the non-recursive program to evaluate Fibonacci(20). His
conclusion was that recursive function calls must be very expensive
indeed :-)

What really happened must have been this: the recursive program was
written in the "naive" fashion:

	function Fibonacci(n : integer) : integer;
	var result : integer;
	begin
	  if n<2 then result := n
	  else result := Fibonacci(n-1) + Fibonacci(n-2);
	  Fibonacci := result
	end;

which is known to use exponential time. The non-recursive program was
probably written using a for-loop:

	a := 0; b := 1;
	for i := 1 to n do
	  begin t := a+b; a := b; b := t end;
	Fibonacci := a;

which uses linear time. Using this comparison, you can conclude that
recursive calls are arbitrarily more expensive than a for loop :-)

	Torben Mogensen (torbenm@diku.dk)
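For readers who want to reproduce the gap, here is a sketch of both
versions in Python (my translation, not anything from the BYTE
article): the recursive one takes exponential time in n, the loop
linear time.

```python
import time

def fib_recursive(n):
    # The "naive" doubly-recursive version: exponential time.
    if n < 2:
        return n
    return fib_recursive(n - 1) + fib_recursive(n - 2)

def fib_iterative(n):
    # The for-loop version: linear time.
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

# Both compute F(25) = 75025; only the timings differ.
for f in (fib_iterative, fib_recursive):
    start = time.perf_counter()
    value = f(25)
    print(f.__name__, value, f"{time.perf_counter() - start:.6f}s")
```

The gap grows without bound as n does, which is exactly why comparing
the two says nothing about call overhead.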
eachus@linus.mitre.org (Robert I. Eachus) (01/09/91)
     Exercise for the reader: Write a function to compute the Nth
Fibonacci number in O(log n) time.... (Constant time if you limit
yourself to 32-bit integers.)

     I actually wrote a Fibonacci program that worked this way (in
APL, what else) and it is a very good example of how vectorizing or
partitioning code well is not something a compiler can do for you.
Computing the entire vector of Fibonacci numbers (in an APL one-liner
of course) is much faster than even the non-recursive formula, because
there is only one line to interpret (once). The same thing occurs if
you have a vector machine handy. Computing powers seems expensive
until you realize that ANY straight-line code which uses the vector
pipes will win over an iterative (scalar) loop. (The square root of 5
is a constant {hint}, and the second term need not be computed if you
truncate to an integer.)

     So here is a challenge to you vector machine freaks out there.
My handy dandy pocket calculator tells me that F(50) = 12586269025
(hmmm, a little big for most people) and F(46) = 1836311903 (that's
better--as in less than 2**31), so how fast can you calculate the
first 46 (to put the integer and floating people on the same footing)
Fibonacci numbers? Or better yet, how long does your function to
compute a given Fibonacci number take to run, when called an equal
number of times with arguments 0 to 46? This is not so much a
benchmark as a challenge to find out how your favorite machine works,
and to show how ferocious application of analysis of algorithms can
speed up a program.

     I'll post a benchmark program in a couple of days, but I want to
challenge some of you first...

--
					Robert I. Eachus

     "When the dictators are ready to make war upon us, they will not
wait for an act of war on our part." - Franklin D. Roosevelt
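Eachus's hint (sqrt(5) is a constant; the second term can be dropped
if you truncate) amounts to Binet's closed form. A Python sketch of
that reading, under the assumption of IEEE doubles (names are mine):

```python
import math

SQRT5 = math.sqrt(5.0)
PHI = (1.0 + SQRT5) / 2.0   # golden ratio

def fib_closed(n):
    # Binet's formula with the second term dropped: since
    # |(-1/PHI)**n / sqrt(5)| < 0.5 for all n >= 0, rounding
    # PHI**n / sqrt(5) recovers F(n) exactly -- until PHI**n
    # outgrows the 53-bit double mantissa. n = 46 is still safe,
    # which matches the 32-bit limit in the challenge.
    return round(PHI ** n / SQRT5)
```

Note this is constant time only in the sense Eachus concedes: it
leans on the hardware's `pow`, and on the range limit that keeps
floating-point error below half a unit.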
patrick@convex.COM (Patrick F. McGehearty) (01/10/91)
In article <EACHUS.91Jan8114647@aries.linus.mitre.org> eachus@linus.mitre.org (Robert I. Eachus) writes:
>
>     Exercise for the reader: Write a function to compute the Nth
>Fibonacci number in O(log n) time.... (Constant time if you limit
>yourself to 32-bit integers.)
>

I'm not sure how vectors figure in, but with a constraint of
0 <= N <= 46, it can be quite easy: precompute the Fibonacci numbers
F(0) through F(46) and assign them in a DATA statement to FIBON. Then
the function is:

	INTEGER FUNCTION FIBONACCI(N)
	COMMON /FIB/FIBON(47)
	INTEGER FIBON
	FIBONACCI = FIBON(N+1)	! compensate for one-based arrays
	RETURN
	END

Of course, good software engineering practice would check N for being
within bounds, and do something appropriate, but this is
comp.benchmarks. :-)

Adding the constraint of computing the number on the fly with a
maximum of k prestored values instead of precomputing the table will
add more of a challenge to the exercise.  k << max(N)

>     I'll post a benchmark program in a couple of days, but I want to
>challenge some of you first...
>
>--
>					Robert I. Eachus
>
>     "When the dictators are ready to make war upon us, they will not
>wait for an act of war on our part." - Franklin D. Roosevelt
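The DATA-statement trick translates directly; here is a Python sketch
of the same table lookup (names mine, not from the post), with the
bounds check McGehearty waves off actually included:

```python
# Precompute F(0) .. F(46) once, as the Fortran DATA statement would.
FIB_TABLE = [0, 1]
while len(FIB_TABLE) < 47:
    FIB_TABLE.append(FIB_TABLE[-1] + FIB_TABLE[-2])

def fibonacci(n):
    # O(1): just index the precomputed table.
    if not 0 <= n <= 46:
        raise ValueError("N out of range")
    return FIB_TABLE[n]
```

The k-prestored-values variant he suggests would keep, say, every
eighth entry and iterate forward from the nearest one, trading table
space for a bounded number of additions.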
aburto@marlin.NOSC.MIL (Alfred A. Aburto) (01/10/91)
In article <1991Jan8.101912.15157@odin.diku.dk> torbenm@diku.dk (Torben Ægidius Mogensen) writes:
>
>Sometime in '88, BYTE published a special issue on benchmarks. I don't
>have the issue handy, so I can't tell you which month it was.
>
>One of the articles was an overview and discussion of popular
>benchmarks that was in use or had been in use previously.
>
>During this overview, the author (I can't recall his name) came to a
>then popular Pascal benchmark suite, containing a recursive Fibonacci
>number program to test the speed of recursive function calls. In a
>moment of inspiration the author had the marvelous idea that he would
>write Fibonacci in a non-recursive style, to compare the speeds. To
>his amazement, the recursive program used several thousand times as
>much time as the non-recursive program to evaluate Fibonacci(20). His
>conclusion was that recursive function calls must be very expensive
>indeed :-)
>

Yes, this fellow sure did fall on his sword. I seem to remember a
number of 'letters to the editor' on this subject. In fact this fellow
won the Byte Magazine 'BOMB' award for receiving the most letters that
month. Yes, one wonders what tree this fellow fell out of :-)

A little thought (or a counter in the Fibonacci routine) would have
shown that the recursive Fibonacci requires 150049 calls to itself in
order to calculate Fibonacci(24), whereas the non-recursive 'for' loop
takes only 24 iterations! The overhead involved in the procedure
(function) call is, in this case, not very significant at all. In
other situations it might be, but not in this one.

Actually, the number of recursive function calls is itself a
Fibonacci-like number. The number of calls is equal to the sum of the
two previous counts plus one. That is, the number of function calls
for Fibonacci(n), n = 0, 1, 2, 3, ..., is given by:

	1, 1, 3, 5, 9, 15, 25, 41, 67, 109, 177, ..., s(n), ...

	s(0) = 1
	s(1) = 1
	s(n) = s(n-1) + s(n-2) + 1,  n > 1 .

Al Aburto
aburto@marlin.nosc.mil
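Aburto's counter is easy to reproduce. A Python sketch (mine, not
his) that instruments the naive recursion and prints the s(n)
sequence:

```python
calls = 0

def fib_naive(n):
    # Count every invocation, including the outermost one.
    global calls
    calls += 1
    if n < 2:
        return n
    return fib_naive(n - 1) + fib_naive(n - 2)

def count_calls(n):
    global calls
    calls = 0
    fib_naive(n)
    return calls

# s(n) = s(n-1) + s(n-2) + 1, which closes to 2*F(n+1) - 1.
print([count_calls(n) for n in range(11)])
# -> [1, 1, 3, 5, 9, 15, 25, 41, 67, 109, 177]
```

The closed form 2*F(n+1) - 1 follows by induction from the recurrence,
and makes the exponential blow-up explicit: the call count grows at
the same rate as the Fibonacci numbers themselves.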
hooft@ruunsa.fys.ruu.nl (Rob Hooft) (01/10/91)
In <EACHUS.91Jan8114647@aries.linus.mitre.org> eachus@linus.mitre.org (Robert I. Eachus) writes:

>     Exercise for the reader: Write a function to compute the Nth
>Fibonacci number in O(log n) time.... (Constant time if you limit
>yourself to 32-bit integers.)

Do not read on if you want to try yourself. I tried to mail the result
but it bounced (user eachus unknown).

	program test
	do i=1,46
	   write(*,*) ifib(i)
	enddo
	end

	function ifib(i)
c
c The actual function is:
c	fib=(((1.+5.**.5)/2.)**i-((1.-5.**.5)/2.)**i)/5.**.5
c
	ifib=int(aint(((1.d0+5.d0**.5d0)/2.d0)**i)/5.d0**.5d0)+1
	end
--
Rob Hooft, Chemistry department University of Utrecht.
hooft@hutruu54.bitnet hooft@chem.ruu.nl hooft@fys.ruu.nl
augustss@cs.chalmers.se (Lennart Augustsson) (01/10/91)
In article <EACHUS.91Jan8114647@aries.linus.mitre.org> eachus@linus.mitre.org (Robert I. Eachus) writes:

>     Exercise for the reader: Write a function to compute the Nth
>Fibonacci number in O(log n) time.... (Constant time if you limit
>yourself to 32-bit integers.)

To do this we consult the classic MIT HAKMEM, item 12, which explains
exactly how to do it. Define multiplication on pairs by

	(a,b)(c,d) = (ac+ad+bc, ac+bd)

Note that (a,b)(1,0) = (a+b,a) and thus (1,0)**n = (fib(n),fib(n-1)).
Then use the usual trick with repeated squaring to do the
exponentiation in log n operations (which is log n time if you have
constant time arithmetic). This is a nice method since it uses only
integer arithmetic (as opposed to solving the difference equation and
computing powers of something involving sqrt(5)).

Here is a Haskell program to do it:

------
instance (Num a) => Num (a,a) where
	(a,b) * (c,d) = (a*c+a*d+b*c, a*c+b*d)

fib n = fst ((1,0)^n)
------

Some sample results:

fib 10 = 55
fib 46 = 1836311903
fib 100 = 354224848179261915075
fib 1000 = 43466557686937456435688527675040625802564660517371780402481729
08953655541794905189040387984007925516929592259308032263477520968962323987332
2471161642996440906533187938298969649928516003704476137795166849228875

All of which take a very short time to compute (of course).
--
	Lennart Augustsson
[This signature is intentionally left blank.]
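For those without a Haskell compiler handy, the same HAKMEM pair
arithmetic with repeated squaring can be sketched in Python (my
translation; arbitrary-precision integers make fib 1000 work
unchanged):

```python
def pair_mul(p, q):
    # HAKMEM item 12 multiplication: (a,b)(c,d) = (ac+ad+bc, ac+bd).
    a, b = p
    c, d = q
    return (a * c + a * d + b * c, a * c + b * d)

def fib(n):
    # (1,0)**n = (fib(n), fib(n-1)), computed by repeated squaring,
    # so only O(log n) pair multiplications are needed.
    result = (0, 1)          # identity: (0,1)(c,d) = (c,d)
    base = (1, 0)
    while n > 0:
        if n & 1:
            result = pair_mul(result, base)
        base = pair_mul(base, base)
        n >>= 1
    return result[0]

print(fib(10))    # -> 55
print(fib(46))    # -> 1836311903
print(fib(100))   # -> 354224848179261915075
```

As Augustsson notes, the appeal is that everything stays in integer
arithmetic; no sqrt(5), so no rounding concerns at any n.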
eachus@linus.mitre.org (Robert I. Eachus) (01/11/91)
In article <1872@ruunsa.fys.ruu.nl> hooft@ruunsa.fys.ruu.nl (Rob Hooft) writes:
program test
do i=1,46
write(*,*) ifib(i)
enddo
end
function ifib(i)
c
c The actual function is:
c fib=(((1.+5.**.5)/2.)**i-((1.-5.**.5)/2.)**i)/5.**.5
c
ifib=int(aint(((1.d0+5.d0**.5d0)/2.d0)**i)/5.d0**.5d0)+1
end
Not bad, but did you check your output? A good compiler will
figure out that the square root of 5 is a common subexpression, and
good compilers will even evaluate it at compile time. But I'd have to
run your code through a FORTRAN compiler to see if you got the
truncation right--it looks wrong. (The second term gets small rapidly
but it alternates in sign.) Also I don't think your function works
correctly for the magic numbers of computer science -- zero and one.
My program (in Ada for readability, it should easily translate to
C or Pascal) is appended below, together with output. Some explanations
are in order. The 68020 is (relatively) slow on floating point
operations compared to chips which have floating point on chip (I say
the 68020, not the 68881, since most of the overhead is moving stuff
on and off chip). Also the math package I used calls the Unix exp
function, which is fairly slow. Doing the exp by calling the 68881
directly would probably speed up the result using Fib4 below
immensely.
If you are wondering about the complexity of the code for the
"simple" solution, I had to make sure that it would not overflow for
N=46... Solutions which happily return junk are not acceptable! These
functions all raise CONSTRAINT_ERROR for values out of range, and only
for values out of range. (If you run the Ada on your machine, check
that the accuracy of float is acceptable. I only used FLOAT here to
allow me to use the Verdix Math package.) I also have another math
package to try out. If you are wondering about the comparison to 42,
that was just to make sure that the compiler didn't optimize stuff
away...
Robert I. Eachus
package Fibonacci is
function Fib1(N: in Natural) return Natural;
function Fib2(N: in Natural) return Natural;
function Fib3(N: in Natural) return Natural;
function Fib4(N: in Natural) return Natural;
function Fib5(N: in Natural) return Natural;
function Fib6(N: in Natural) return Natural;
pragma INLINE(Fib1,Fib2,Fib3,Fib4,Fib5,Fib6);
-- Procedure call overhead will be larger than most of these
-- functions.
end Fibonacci;
with Math;
package body Fibonacci is
SQRT5 : constant := 2.23606_79774_99789_69640_91736_68731_27624;
GOLDEN_RATIO : constant := 1.61803_39887_49894_84820_45868_34365_63812;
NAT_LOG_PHI : constant := 0.48121_18250_59603_44749_77589_13424_36842;
-- borrowed from the Verdix supplied Math package, copied to make the
-- code more portable.
InvSqrt5: constant := 1.0/Sqrt5;
Phi: constant Float := Golden_Ratio;
Rho: constant Float := Sqrt5/2.0 - 0.5; -- also = 1.0/Phi.
-- Phi is a common designator for the golden ratio,
-- Rho is just my choice of name for the inverse.
-- Phi and Rho are declared Float to determine the
-- accuracy used in evaluating the expressions below.
function Fib1(N: in Natural) return Natural is
N1: Natural := 0;
N2: Natural := 1;
begin
if N in 0..1
then return N;
else
if N mod 2 = 0
then
for I in 1..N/2-1 loop
N1 := N1 + N2;
N2 := N2 + N1;
end loop;
return N1+N2;
else
for I in 1..N/2 loop
N1 := N1 + N2;
N2 := N2 + N1;
end loop;
return N2;
end if;
end if;
end Fib1;
function Fib2(N: in Natural) return Natural is
begin
return Natural((Phi**N-(-Rho)**N) / Sqrt5 );
end Fib2;
-- Now to cheat. For large N, Rho**N < 0.5, so all we have to
-- do is round..., How large must N be? Let's see, 1/Sqrt5
-- rounds ok, and so does Phi/Sqrt5, and Rho**2 < 0.5 so no
-- check need be made...
function Fib3(N: in Natural) return Natural is
begin
return Natural(Phi**N/Sqrt5);
end Fib3;
-- Now, since a float to an integer power is supposed to be
-- calculated iteratively in Ada (or iteratively followed by an
-- inverse), let's see what happens when we explicitly force the power
-- operation to be done by exponentiation.
function Fib4(N: in Natural) return Natural is
begin
return Natural(Math.Exp(Nat_Log_Phi*Float(N))/Sqrt5);
end Fib4;
-- See if explicit looping for exponentiation is faster or slower...
function Fib5(N: in Natural) return Natural is
Temp: Float := InvSqrt5;
begin
for I in 1..N loop
Temp := Temp * Phi;
end loop;
return Natural(Temp);
end Fib5;
-- Last but not least, see if multiplication is faster...
function Fib6(N: in Natural) return Natural is
begin
return Natural(InvSqrt5*Phi**N);
end Fib6;
end Fibonacci;
with Text_IO;
with Fibonacci; use Fibonacci;
with Calendar;
procedure Main is
Start, Stop: Calendar.Time;
function "-" (L,R: Calendar.Time) return Duration
renames Calendar."-";
package Int_IO is new Text_IO.Integer_IO(Integer);
package Duration_IO is new Text_IO.Fixed_IO(Duration);
use Int_IO, Duration_IO;
-- If you don't know Ada, ignore these lines. They are only here
-- to make right aligned columnar output of integers and output
-- of timestamps easier.
begin
-- First check that everything works:
for I in 0..46 loop
Put(I,5);
Put(Fib1(I),12);
Put(Fib2(I),12);
Put(Fib3(I),12);
Put(Fib4(I),12);
Put(Fib5(I),12);
Put(Fib6(I),12);
Text_IO.New_Line;
end loop;
delay 0.5; -- wait for quiescence
Start := Calendar.Clock;
for I in 1..1_000_000 loop
if Fib1(I mod 47) = 42 then Text_IO.Put_Line("Ooops. :-)"); end if;
end loop;
Stop := Calendar.Clock;
Text_IO.Put(" Fib1 took, on average, ");
Put(Stop-Start,4,2);
Text_IO.Put_Line(" microseconds.");
delay 0.5; -- wait for quiescence
Start := Calendar.Clock;
for I in 1..1_000_000 loop
if Fib2(I mod 47) = 42 then Text_IO.Put_Line("Ooops. :-)"); end if;
end loop;
Stop := Calendar.Clock;
Text_IO.Put(" Fib2 took, on average, ");
Put(Stop-Start,4,2);
Text_IO.Put_Line(" microseconds.");
delay 0.5; -- wait for quiescence
Start := Calendar.Clock;
for I in 1..1_000_000 loop
if Fib3(I mod 47) = 42 then Text_IO.Put_Line("Ooops. :-)"); end if;
end loop;
Stop := Calendar.Clock;
Text_IO.Put(" Fib3 took, on average, ");
Put(Stop-Start,4,2);
Text_IO.Put_Line(" microseconds.");
delay 0.5; -- wait for quiescence
Start := Calendar.Clock;
for I in 1..1_000_000 loop
if Fib4(I mod 47) = 42 then Text_IO.Put_Line("Ooops. :-)"); end if;
end loop;
Stop := Calendar.Clock;
Text_IO.Put(" Fib4 took, on average, ");
Put(Stop-Start,4,2);
Text_IO.Put_Line(" microseconds.");
delay 0.5; -- wait for quiescence
Start := Calendar.Clock;
for I in 1..1_000_000 loop
if Fib5(I mod 47) = 42 then Text_IO.Put_Line("Ooops. :-)"); end if;
end loop;
Stop := Calendar.Clock;
Text_IO.Put(" Fib5 took, on average, ");
Put(Stop-Start,4,2);
Text_IO.Put_Line(" microseconds.");
delay 0.5; -- wait for quiescence
Start := Calendar.Clock;
for I in 1..1_000_000 loop
if Fib6(I mod 47) = 42 then Text_IO.Put_Line("Ooops. :-)"); end if;
end loop;
Stop := Calendar.Clock;
Text_IO.Put(" Fib6 took, on average, ");
Put(Stop-Start,4,2);
Text_IO.Put_Line(" microseconds.");
end Main;
Output on a Sun 3/260, using VADS Ada 6.03:
aries% fib
0 0 0 0 0 0 0
1 1 1 1 1 1 1
2 1 1 1 1 1 1
3 2 2 2 2 2 2
4 3 3 3 3 3 3
5 5 5 5 5 5 5
6 8 8 8 8 8 8
7 13 13 13 13 13 13
8 21 21 21 21 21 21
9 34 34 34 34 34 34
10 55 55 55 55 55 55
11 89 89 89 89 89 89
12 144 144 144 144 144 144
13 233 233 233 233 233 233
14 377 377 377 377 377 377
15 610 610 610 610 610 610
16 987 987 987 987 987 987
17 1597 1597 1597 1597 1597 1597
18 2584 2584 2584 2584 2584 2584
19 4181 4181 4181 4181 4181 4181
20 6765 6765 6765 6765 6765 6765
21 10946 10946 10946 10946 10946 10946
22 17711 17711 17711 17711 17711 17711
23 28657 28657 28657 28657 28657 28657
24 46368 46368 46368 46368 46368 46368
25 75025 75025 75025 75025 75025 75025
26 121393 121393 121393 121393 121393 121393
27 196418 196418 196418 196418 196418 196418
28 317811 317811 317811 317811 317811 317811
29 514229 514229 514229 514229 514229 514229
30 832040 832040 832040 832040 832040 832040
31 1346269 1346269 1346269 1346269 1346269 1346269
32 2178309 2178309 2178309 2178309 2178309 2178309
33 3524578 3524578 3524578 3524578 3524578 3524578
34 5702887 5702887 5702887 5702887 5702887 5702887
35 9227465 9227465 9227465 9227465 9227465 9227465
36 14930352 14930352 14930352 14930352 14930352 14930352
37 24157817 24157817 24157817 24157817 24157817 24157817
38 39088169 39088169 39088169 39088169 39088169 39088169
39 63245986 63245986 63245986 63245986 63245986 63245986
40 102334155 102334155 102334155 102334155 102334155 102334155
41 165580141 165580141 165580141 165580141 165580141 165580141
42 267914296 267914296 267914296 267914296 267914296 267914296
43 433494437 433494437 433494437 433494437 433494437 433494437
44 701408733 701408733 701408733 701408733 701408733 701408733
45 1134903170 1134903170 1134903170 1134903170 1134903170 1134903170
46 1836311903 1836311903 1836311903 1836311903 1836311903 1836311903
Fib1 took, on average, 47.52 microseconds.
Fib2 took, on average, 141.74 microseconds.
Fib3 took, on average, 83.50 microseconds.
Fib4 took, on average, 275.32 microseconds.
Fib5 took, on average, 317.32 microseconds.
Fib6 took, on average, 82.98 microseconds.
aries%
--
Robert I. Eachus
"When the dictators are ready to make war upon us, they will not
wait for an act of war on our part." - Franklin D. Roosevelt
gl8f@astsun.astro.Virginia.EDU (Greg Lindahl) (01/15/91)
In article <josef.663497368@ugum01> josef@nixpbe.nixdorf.de (josef Moellers) writes:

>losing out of sight the purpose of a
>benchmark: not to produce the numbers in the least amount of time, but
>rather to keep Your machine busy for some time.

Gee, and here I thought the purpose of a benchmark was to predict the
relative performance of various machines on the applications that I
use them for. Silly me. I propose we use the "infinite loop" benchmark
instead of SPEC.
burley@geech.ai.mit.edu (Craig Burley) (01/15/91)
In article <1991Jan14.202646.5677@murdoch.acc.Virginia.EDU> gl8f@astsun.astro.Virginia.EDU (Greg Lindahl) writes:
>Silly me. I propose we use the "infinite loop" benchmark instead of
>SPEC.
At my last job, I did some work on the ROM-based microcode, primarily
optimizing for space and, sometimes, speed, so new features could be
added to the ROM.
One thing I made smaller and, therefore, faster, was the idle loop.
It had half the instructions of the previous version yet worked as
well.
Since the idle loop accounted for 99.99% of the execution profile of
these machines, I reported to management that I had singlehandedly
improved system performance by a factor of two!
They were very impressed! (-:
Seriously, it does seem difficult to produce benchmarks that don't show,
to an unspecifiable extent, a combination of hardware speed and compiler-
produced optimizations. Globally optimizing compilers, as they become
available, might well turn many popular benchmarks into, Fortranically,
STOP statements, or WRITEs of constants.
One way to help avoid this is to have a two-phase benchmark. The first
phase uses system information like date and time to seed a random number,
then writes a data file containing values derived from this number.
The second phase, the one actually measured, then reads this file and
does the measurable calculations on it.
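A minimal sketch of that two-phase shape in Python (the filenames and
the stand-in workload are placeholders of my own, not anything Burley
specifies):

```python
import random
import time

def phase_one(path, count=10000):
    # Phase one: seed from the clock and write values the compiler
    # (or interpreter) cannot know ahead of time.
    rng = random.Random(time.time_ns())
    with open(path, "w") as f:
        for _ in range(count):
            f.write(f"{rng.randint(1, 1_000_000)}\n")

def phase_two(path):
    # Phase two, the measured one: read the whole file into memory
    # first, then time only the computation, keeping I/O out of the
    # measurement.
    with open(path) as f:
        data = [int(line) for line in f]
    start = time.perf_counter()
    total = sum(x * x % 97 for x in data)   # stand-in workload
    elapsed = time.perf_counter() - start
    return total, elapsed
```

Because `total` depends on data unknown until run time, a whole-program
optimizer cannot fold the loop away, which is the point of the scheme.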
Of course, this might appear to corrupt timings wanted for just CPU
activity with disk activity. Even trying to isolate, in the code, one
mode from another by reading the whole file into memory, has its
problems (a global optimizer might decide, if it doesn't see the
call to begin the timer between the reads and the computes, to
reduce memory requirements and perhaps increase speed by folding the
compute loop into the I/O loop).
If the data set were small compared to the compute size of the problem --
for example, if an NP-complete problem like traveling salesman were
computed using whatever favorite (to-be-benchmarked) techniques based
on distance data in a file -- this I/O problem could be reduced to the
vanishing point. Also, by using an NP-complete problem, even a
globally optimizing compiler is less likely to find a better algorithm
that significantly affects the benchmark (if it does, that's big news,
right?!).
The other important solution, I'd like to stand on my soapbox and say,
is the proliferation of free software such that any machine you want
to benchmark is likely to already have compiler technology equivalent
to that used for the machine with which it is being compared. For
example, doing 80486 vs. 68040 benchmarks for C code is likely to be
much more helpful in isolating the hardware characteristics (when this
is desired -- it isn't always, of course!) if GNU CC is used to do
the compilations on both machines. Of course, one machine's description
might have more coverage as far as various optimizations, and differences
in library routines would remain, but at least you wouldn't be comparing
code compiled using cse against code compiled without it, for example.
--
James Craig Burley, Software Craftsperson burley@ai.mit.edu