[comp.arch] "Life" benchmarks

rick@pcrat.UUCP (Rick Richardson) (12/29/87)

In article <1295@sugar.UUCP> peter@sugar.UUCP (Peter da Silva) writes:
>Ah, benchmarks.
>
>The only non-graphics algorithm that does anywhere near this that I have
>heard of is a 20-generation-per-second 320-by-200 LIFE program.

	[ 1.2 million life pixels/second ]

This reminds me of an interesting machine I worked on circa 1977.  It
was (is?) called the Cytocomputer - a cellular image processor.  The
machine was a long pipeline of identical MSI processors.  I think
we had around 87 processors in the prototype.  The image (1024x1024x8)
was pumped through the pipeline at 1 Mhz.  The prototype required
two processors to perform one generation of life (one for deaths,
one for births).  Thus, we could do 43 generations/second, neglecting
pipeline fill time.  Thats about 45 million life pixels / second.
Of course, you couldn't see the intermediate results.  A later
design was supposed to clock at 10Mhz and would only require
one processor per life generation.

I wonder what current processors are capable of on the "life" benchmark?

-- 
	Rick Richardson, President, PC Research, Inc.
(201) 542-3734 (voice, nights)   OR   (201) 834-1378 (voice, days)
		seismo!uunet!pcrat!rick

brucek@hpsrla.HP.COM (Bruce Kleinman) (01/01/88)

+--------
| I wonder what current processors are capable of on the "life" benchmark?
+--------

BEFORE I POST ANY NUMBERS, I wish to make it clear that the performance
achieved with BKLife is mostly due to the algorithm, not the processor.

I assume that the algorithms used with the Blitter and the Cytocomputer used
a brute-force approach (processing all cells in the universe, or perhaps a
bounded area of cells).  Even fairly dense life patterns rarely exceed 10
percent density; that is, only one in ten cells are occupied at any time.
A brief analysis shows that a brute-force algorithm will spend most of its
time on these empty cells.

Enter the BKLife algorithm, a fairly efficient life routine which achieves
very high performance on very large universes.  The routine tracks all living
cells, and processes only the appropriate regions of the universe.  Empty cell
areas have no effect on performance.


AS FOR WHICH NUMBERS TO POST, the concept of "life pixels/second" isn't
valid for an adaptive algorithm, as I can simply place a blinker (or any other
very small pattern) in the middle of a huge virtual universe (say, 8192 by
1024), and produce numbers in the vicinity of 10 billion life pixels/second.
Makes an impressive marketing benchmark, huh?  Anyway, it has no real value,
as only 15 of the 8M cells are being recalculated each generation.

After long and careful consideration, my associates and I arrived on the
puffer train as the 'standard' life pattern for our benchmarks.  The puffer
produces an active tail as it progresses through the universe.  The tail does
not stabilize until generation 5533, so the pattern is fairly interesting to
boot.  Benchmarks run on workstations do not include screen updates, as I/O
has a nasty habit of slowing things down ;-).

Benchmark number one is the first 1000 generations of the puffer train, which
must be run on at least a 600 by 600 universe to avoid boundary problems.  This
configuration requires about 0.7MB with BKLife, which makes it manageable on
smallish machines.

Benchmark number two is the first 5600 generations of the puffer train, which
requires at least 6000 by 600 universe.  This requires about 6MB with BKLife.


AT LAST THE NUMBERS, which are for a Hewlett Packard 9000 series 350 (25 MHz
68020 box) with 8 MBytes DRAM, running version 5.5 of HP-UX.

     Benchmark 1 ... 19.6 s = 50.9 puffer generations per second (pGPS)
     Benchmark 2 ... 1118 s =  5.0 puffer generations per second (pGPS)

As the numbers show that the puffer train grows rapidly, which slows down the
algorithm.  Further, BKLife accesses memory with a rather odd sense of
locality, and at a certain point the d-cache becomes useless.

Although the user interface does not yet support it, BKLife was designed to
be used in two- or three-dimensional universes.  The central algorithm is
flexible enough to handle an arbitrary number of dimensions with absolutely
no modifications.  Additionally the life rule (Conway's life rule is 2233) can
be specified, allowing more investigation.


INTERESTED PARTIES can contact me via e-mail for a copy of the code.  It was
initially written entirely in C, using the curses library for I/O (a version
using X Windows is in the works).  The critical paths were later coded in 68020
assembly, but I still have C versions of these routines.  I also have a
selection of interesting life patterns, including the ever-popular glider gun
and space rake.  Furthermore, the package includes both a user guide and a
reasonable discussion of the code itself (documentation - hard to believe).


                              Bruce  Kleinman
              Hewlett Packard - Network Measurements Division

                            brucek@hpnmd.hp.com
                       ...ucbvax!hplabs!hpnmd!brucek

brucek@hpsrla.HP.COM (Bruce Kleinman) (01/05/88)

+--------
|                             Bruce  Kleinman
|             Hewlett Packard - Network Measurements Division
|
|                           brucek@hpnmd.hp.com
|                      ...ucbvax!hplabs!hpnmd!brucek
+--------

Those having trouble reaching me with the above addresses might try the
following ...

        brucek%hpnmd@hplabs.hp.com
     ...ucbvax!hpcea!hpnmd!brucek


Those who trust the net as little as I do might try the following ...

        Hewlett-Packard Company
        Bruce Kleinman - MS 4USQ
        1400 Fountaingrove Parkway
        Santa Rosa, California 95403

brucek@hpsrla.HP.COM (Bruce Kleinman) (01/07/88)

It seems that plenty of people are, in fact, getting through to me with
requests for the BKLife package.  I am, in fact, filling requests within
24 hours of their arrival.  One observation - PLEASE GIVE ME A PATH TO USE!

Trying to decifer a uucp/bitnet/arpanet return path is as bad as ... uh ...
well ... as bad as trying to decifer a uucp/bitnet/arpanet return path.  I
prefer full domain addressing ( i.e. brucek%hpnmd@hplabs.hp.com ), but uucp is
fine ( i.e. ucbvax!hpcea!hpnmd!brucek ).

If you are sending me a new request - include a nice solid path.  If you have
already sent in a request - if you do not get a reply within 3 days, drop me
another line, with a nice solid path.

A final note - in return for the package I would like benchmark numbers for
the machine(s) you are running.  I am not attempting anything comprehensive,
but I am very interested in performance numbers for a variety of machines.
The benchmark methodology is discuessed in the FIRST.README file.  Thanks.


Bruce "don't trust that return path" Kleinman.

+--------
| +--------
| |                             Bruce  Kleinman
| |             Hewlett Packard - Network Measurements Division
| |
| |                           brucek@hpnmd.hp.com
| |                      ...ucbvax!hplabs!hpnmd!brucek
| +--------
| 
| Those having trouble reaching me with the above addresses might try the
| following ...
| 
| brucek%hpnmd@hplabs.hp.com
| ...ucbvax!hpcea!hpnmd!brucek
+--------

peter@sugar.UUCP (Peter da Silva) (01/08/88)

Smiley.

Sorta.

LPPS: Life pixels per second.

Machine			Wraparound		No wraparound.

Amiga			1.28M			1.47M
Cytocomputer (a,b)	45M			45M

(a) Whether wraparound at the edge of the universe is implemented is not
known.

(b) No display of intermediate results.
-- 
-- Peter da Silva  `-_-'  ...!hoptoad!academ!uhnix1!sugar!peter
-- Disclaimer: These U aren't mere opinions... these are *values*.

peter@sugar.UUCP (Peter da Silva) (01/08/88)

In article ... brucek@hpsrla.HP.COM (Bruce Kleinman) writes:
> | I wonder what current processors are capable of on the "life" benchmark?
> 
> BEFORE I POST ANY NUMBERS, I wish to make it clear that the performance
> achieved with BKLife is mostly due to the algorithm, not the processor.

You can provide a faster algorithm for calculating primes than the seive of
Eratosthenes, but if you do then you aren't running the Seive benchmark.

So, the Puffer benchmark isn't the same as the LPPS benchmark. The idea of
the LPPS benchmark is that it's a good test of parallel and array processors,
which the seive is notoriously bad at.
-- 
-- Peter da Silva  `-_-'  ...!hoptoad!academ!uhnix1!sugar!peter
-- Disclaimer: These U aren't mere opinions... these are *values*.

ortiz@think.COM (Luis F. Ortiz) (01/08/88)

In article <1380@sugar.UUCP> peter@sugar.UUCP (Peter da Silva) writes:
...
	>LPPS: Life pixels per second.

	>Machine		Wraparound		No wraparound.

	>Amiga			1.28M			1.47M
	>Cytocomputer (a,b)	45M			45M

	Commnection Machine     1,500M                  ???

This was computed by timing the machine as it updated all of a 1024x1024
life grid on a 64K processor Connection Machine.

Rumor has it that someone here has gotten around 3,000 MLPPS around
TMC, but I have yet to see it demonstrated to me.

Anyone out there have a better figure?

-- cisco



----------------------------------------------------------------
Luis F. Ortiz
Thinking Machines Corporation

ARPANET: ortiz@think.com, ortiz@yale.edu
UUCP:    {decvax,harvard,seismo,cmcl2}!yale!ortiz,
	 harvard!think!ortiz
BITNET:  ortiz@yalecs.BITNET

"Whenever people agree with me, I always think I must be wrong."
   --- Oscar Wilde
----------------------------------------------------------------

mjr@osiris.UUCP (Marcus J. Ranum) (01/10/88)

:-)	:-)	:-)	:-)	:-)	:-)

	There's also the amazing "Jerry Pournelle BASIC Benchmark" for you
BYTE readers !!  Is there anyone out there in netland with a Cray and a 
Cray BASIC compiler so I can benchmark their slo-mo-fo against my Turbo XT ??

--mjr();
-- 
------------------------------------------------------------------------------
...ich bin in einem dusenjet ins jahr 53 vor chr...ich lande im antiken Rom...
                     einige gladiatoren spielen scrabble...ich rieche PIZZA...