[net.works] Ackermania

@RUTGERS.ARPA:hoey@nrl-aic (05/02/85)

From: Dan Hoey <hoey@nrl-aic.ARPA>

>From: terak!doug@topaz.arpa (Doug Pardee)
>
>I took that ol' Ackerman[n] function and ran some benchmarks.  I
>compiled it using VAX/UNIX 4.2BSD "cc -O" and ran it on our VAX
>11/750.  To compute "acker(3,6)" ten times required 62.2 seconds.  I
>then rewrote it into Z-80 assembler.  The same computation takes 18.8
>seconds on a 4 MHz Z-80A.
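
For reference, here is the function under discussion, in the obvious
recursive form.  Doug didn't post his code, but it was presumably
something close to this sketch:

    #include <stdio.h>

    /* Ackermann's function, straight from the definition:
     *   A(0,n) = n+1
     *   A(m,0) = A(m-1,1)
     *   A(m,n) = A(m-1, A(m,n-1))
     * Nearly every step is a procedure call, which is the whole point
     * of the benchmark.
     */
    long acker(long m, long n)
    {
        if (m == 0)
            return n + 1;
        if (n == 0)
            return acker(m - 1, 1);
        return acker(m - 1, acker(m, n - 1));
    }

    int main(void)
    {
        printf("acker(3,6) = %ld\n", acker(3, 6));    /* 509 */
        return 0;
    }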

Gee, our 750's are out of commission right now, but are you sure you're
implementing these for speed?

>From: amdahl!mat@topaz.arpa (Mike Taylor)
>
>Why not compare apples to apples?

Silly.  We are comparing computation per dollar.  Out in the real world
we sometimes have to choose between fresh rats and rotten apples.

>From: mordor!jdb@topaz.arpa (John Bruner)
>
>By recoding Ackermann as a non-recursive function (in C), I was able
>to make my 11/750 run faster than a 4MHz Z80 :-).  My non-recursive
>version computes ack(3,6) ten times in 15.4 seconds (user time).
>...my non-recursive Ackermann takes 7.1 seconds (user) on an 11/780 but
>only 6.5 seconds on an 11/70.

Oh come on.  Let us compare apples and photon torpedoes.
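
(What John presumably did, for anyone following along:  manage the
recursion himself with an explicit stack instead of procedure calls.
The sketch below is my guess at the flavor of it, not his code; it
does every bit of the same work, it just avoids all those expensive
procedure calls.)

    #include <stdio.h>

    /* A hand-managed stack of pending m-values replaces the call
     * stack.  A guess at the flavor of John's rewrite, not his code.
     */
    #define MAXDEPTH 10000          /* plenty of room for acker(3,6) */

    long acker(long m, long n)
    {
        static long stack[MAXDEPTH];
        int sp = 0;

        stack[sp++] = m;
        while (sp > 0) {
            m = stack[--sp];
            if (m == 0)
                n = n + 1;               /* A(0,n) = n+1 */
            else if (n == 0) {
                stack[sp++] = m - 1;     /* A(m,0) = A(m-1,1) */
                n = 1;
            } else {
                stack[sp++] = m - 1;     /* outer call, done later */
                stack[sp++] = m;         /* inner A(m,n-1), done next */
                n = n - 1;
            }
        }
        return n;
    }

    int main(void)
    {
        printf("acker(3,6) = %ld\n", acker(3, 6));    /* 509 */
        return 0;
    }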

I just reimplemented Ackermann's function in C on a 780.  It computes
acker(3,6) in 31 microseconds, as measured by computing it a million
times in 31 seconds.  Call it 20,000 times as fast as John's 780
figure and 100 times as fast as Doug's (assuming he can write
assembler code for the VAX and get a 500X speedup over his Z80).

And it computes acker(3,28)=2147483645 in 31 microseconds, measured the
same way.  I haven't heard any results on this from the net, but I
think I can claim at least another seven orders of magnitude speedup
over the competition.  If you understand Ackermann's function you know
how I did it.  And if you think that's cheating....
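
For the record, there is no wizardry involved:  the bottom rows of
Ackermann's function collapse into textbook closed forms, so nothing
ever has to recurse.  A bare-bones sketch of the idea -- not my actual
program, which comes with its correctness proof -- looks like this:

    #include <stdio.h>

    /* Closed forms for small m:
     *   A(0,n) = n + 1
     *   A(1,n) = n + 2
     *   A(2,n) = 2n + 3
     *   A(3,n) = 2^(n+3) - 3
     * This sketch punts on m >= 4, and everything must fit in 32 bits,
     * which is why acker(3,28) = 2^31 - 3 is about the end of the road
     * without multiple precision.
     */
    unsigned long acker(unsigned long m, unsigned long n)
    {
        switch (m) {
        case 0:  return n + 1;
        case 1:  return n + 2;
        case 2:  return 2 * n + 3;
        case 3:  return (1UL << (n + 3)) - 3;
        default: return 0;     /* m >= 4 overflows almost immediately */
        }
    }

    int main(void)
    {
        printf("acker(3,6)  = %lu\n", acker(3, 6));     /* 509 */
        printf("acker(3,28) = %lu\n", acker(3, 28));    /* 2147483645 */
        return 0;
    }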

Out here in the real world, we are interested in results.  No one cares
how you compute acker(3,6), they just want the answer (if that).  If a
780 costs $500K and lasts 10 years, my computation of acker(3,6) cost
about five millionths of a cent.  Doug's twenty seconds on a $500 Z-80
that lasts 5 years costs over a thousand times as much!  The bottom
line is that we came out even within a hundredth of a cent.
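
(The arithmetic, roughly:  $500K over ten years is about 0.16 cents
per second of 780 time, so 31 microseconds comes to about five
millionths of a cent.  $500 over five years is about 0.0003 cents per
second of Z-80 time, so twenty seconds comes to about six thousandths
of a cent.)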

Well, it took me an hour--say $50 of programmer time--to design, write,
and debug my program.  I'd bet it took Doug rather longer to write
the Z80 program.  But hell, what's a couple hundred bucks in the high-
paying world of computer programming?

But now let us suppose that Ackermann's function gets really popular.
It gets evaluated 1000 times a day on ten different brands of CPU.
Coding ten assembler versions will cost $1K, and the thousand
evaluations might add up to a penny a day.  Compare that with
straightforward C code that costs only $100 to write once but runs a
thousand times slower:  $10/day.  The assembler coder starts out $900
behind and gains about ten dollars a day, so he breaks even in about
three months.

But better algorithms are developed all the time.  Let's imagine I come
on the scene after one month with a C program that runs as fast as the
assembler version.  Well, the assembler coder has saved $300 in that
month, but his development costs were $800 more, and he isn't narrowing
the gap any more.  He could invest another $1K to implement my
algorithm in assembler and save almost all of that penny a day, but he
would be better off using the shotgun.

If you don't believe me, where are Doug's figures for VAX assembler?

Dan Hoey
Hoey@NRL-AIC.ARPA

P.S.  Now that I've written the program, anyone who's interested can
drop me a line -- hoey@NRL-AIC.ARPA -- and I'll try to send a copy.  If
there's a lot of response, I suppose it could go to net.sources.  It
does have a nice proof that it computes Ackermann's function correctly.

But it really needs a manual page, and I'm pretty lazy.  I was so lazy
I didn't feel like hacking multiple-precision arithmetic, so it can't
correctly compute anything much larger than acker(3,28).  If anyone is
interested in such things, go for it.  In fact, I would be interested
in buying a paper hardcopy of the correct value of acker(4,3) printed
as a decimal integer, if anyone can produce one inexpensively.  :-) DH

doug@terak.UUCP (Doug Pardee) (05/13/85)

I probably should've let this rest, but I can't resist...

When I selected the Ackermann function to "benchmark" HLLs vs. assembler
I did so knowing that because it was written recursively, the calls
would eat C alive.  I even recall saying so in my original posting.

But now I see news postings and get mail insisting that it was "unfair"
to multiply this effect by comparing a VAX against a Z-80A, because
the VAX is crippled by having an excruciatingly slow "call" instruction.
Now, I personally think that save-everything-in-the-machine-because-we-
don't-know-what-the-subroutine's-gonna-do type "call" instructions
are a direct result of HLL-mania and thus are fair game, but I'll give
the VAX the benefit of the doubt.

Dan Hoey was kind enough to pass along the non-recursive version of
Ackermann that he'd developed, so I benchmarked that version.  The
Z-80A didn't do nearly as well this time:  it still beat the VAX 750,
but not the 780.

The time for 1 million computations of acker(3,6), in seconds
("u" = user time):
  VAX 750     optimized C -- 59.2u
  4MHz Z-80A  assembler   -- 47.1
  VAX 780     optimized C -- 34.4u

So I guess that for code which doesn't do a lot of "call" instructions,
you *can* make up the CPU time overhead of C code by trading in your
Timex/Sinclair 1000 for a VAX, as long as you get at least a 780.
-- 
Doug Pardee -- Terak Corp. -- !{ihnp4,seismo,decvax}!noao!terak!doug
               ^^^^^--- soon to be CalComp