[comp.benchmarks] VGX benchmark redux

thomson@cs.utah.edu (Rich Thomson) (04/09/91)

In article <1991Mar28.213128.9355@hellgate.utah.edu>, I posted a VGX
benchmark program mailed to me by Brian McClendon of SGI.  I have
recently found the time to take a close look at this program and the
one posted by Kurt Akeley.

Although I have yet to try this particular program out on a VGX
machine, I will postpone that effort; examining the source is telling
enough.  If we look at the code, we find that the polygons it is
attempting to display are created with the following loop:

#define SQRT3_2	(1.7321/2.0)
    /* initialize data arrays */
    for (i=0; i<(1 + NUMTRI/2); i++) {
	tribuf[i*8+0] = size*i;
	tribuf[i*8+1] = 0;
	tribuf[i*8+2] = 0;
	tribuf[i*8+4] = size*i + size/2;
	tribuf[i*8+5] = size*SQRT3_2;
	tribuf[i*8+6] = 0;
    }

    [...]

    bgntmesh();
    for(i=0;i<(1 + NUMTRI/2);i++)
    {
    	n3f(&normbuf[(i%2)*4]);
    	v3f(&tribuf[i*8]);
    	n3f(&normbuf[(i%4)*4]);
    	v3f(&tribuf[i*8 + 4]);
    }
    endtmesh();
    closeobj();

Notice that this creates a big, linear triangle strip that stretches
off the right side of the screen (especially if the triangles are the
50-pixel triangles quoted in the marketing literature).  This results
in most of the triangles being clipped from the view volume.

The program that Kurt Akeley posted in article
<1991Apr1.154902.17858@odin.corp.sgi.com> was much more reasonable: it
created a certain number of triangles per strip, with each strip being
linear, but with all the strips beginning at the same position
relative to the display window:

    /* initialize data arrays */
    for (i=0; i<MAXVERTEX; i+=1) {
	meshbuf[VERTSIZE*i+0] = (i&1) ? 0.0 : 1.0;
	meshbuf[VERTSIZE*i+1] = 0.0;
	meshbuf[VERTSIZE*i+2] = (i&1) ? 1.0 : 0.0;
	meshbuf[VERTSIZE*i+3] = 0;
	meshbuf[VERTSIZE*i+4] = 10.0 + (float)(size*(i>>1)) +
				       (float)(offset*(i&1));
	meshbuf[VERTSIZE*i+5] = 10.0 + (float)(size*(i&1));
	meshbuf[VERTSIZE*i+6] = 0.0;
	meshbuf[VERTSIZE*i+7] = 0;
    }

	[...]

#define LIGHTVERT(i) n3f(fp+(VERTSIZE*(i))); v3f(fp+(VERTSIZE*(i))+4)
    for (i=events; i>0; i--) {
	fp = meshbuf;
	bgntmesh();
	LIGHTVERT(0);
	LIGHTVERT(1);
	LIGHTVERT(2);
	endtmesh();
    }

Now on to some comments on Kurt's article:

> We take our graphics performance claims very seriously here at Silicon
> Graphics.

I'm sure you take them as seriously as MIPS, HP, and IBM take their
SPECmark ratings.  Sadly, the graphics community does not yet have the
equivalent of the SPECmark rating on which to intelligently compare
different platforms.  Just look at the claims made when comparing X
implementations.  The customer gets left in the lurch unless they
undertake to analyze the voluminous output of x11perf to find out the
real story.

I began to be skeptical when I saw the figure posted several times on
comp.graphics, and queries to the poster were answered with "it's from
our marketing literature, I'll ask a ``tech type'' to send you a
program" (I never heard back from him).  Also, at a recent VGX
demonstration at the U, the rep couldn't tell me details about the
figure, nor could he show me a program with a high polygon rate.  He
also didn't have any models with several hundred thousand polygons
(say, 40% of the peak figure, or 300K - 400K polygons), although he's
a sharp enough man that I imagine he WILL have them next time in case
I'm there. ;-}

Hopefully, when the Graphics Performance Committee releases its
Picture Level Benchmark program (& numbers come forth from vendors)
this situation will be alleviated.  For now, we are stuck with
comparing performance numbers from each different vendor and
attempting to infer useful comparisons from widely differing measures.

For instance, you say:
> [quoted performance comes from] tuned programs that use ONLY
> commands that are available in the Graphics Library.

So these numbers are highly tuned for the architecture of the VGX and
are reproducible only with a vendor-specific library.  This is very
understandable, given the position SGI holds in the 3D market, but it
is very difficult to compare different platforms with these kinds of
numbers in your hand.  [Perhaps that is the intention of the marketing
dept? ;-]

> I ran this program on my 5-span VGX with the following results:
>    size=8, offset=4, zbuffer(1), events=500000, lighting=1
>    running on cashew, GL4DVGX-4.0, Fri Mar 29 15:22:58 1991
>    Triangle mesh performance (lighted):
>       1 triangles per mesh: 189393 triangles per second
[stuff deleted]
>      30 triangles per mesh: 675648 triangles per second
>      62 triangles per mesh: 714240 triangles per second
>    Display listed triangle mesh (lighted):
>      62 triangles per mesh: 769181 triangles per second
>    Display listed triangle mesh (colored):
>      62 triangles per mesh: 1020342 triangles per second

I find this interesting.  Apparently, the way to max out the VGX is to
use display lists.  I thought SGI considered display lists "naughty".
Several times on comp.graphics, SGI folks have bashed display-list
oriented techniques and the company's position paper on "PEX & PHIGS"
states over and over the advantages of immediate mode over display-list
techniques.  I find it particularly ironic then that the 1 M p/s
number comes from display-list techniques.

Another poster asked about how things change when lights are turned
on, etc.  I think Kurt's table (along with examining the source)
answers this question.  Naturally, the more lights are turned on, the
slower things get (can't compute everything instantaneously).  Also, I
notice that these polygons aren't depth cued, which would also reduce
the numbers somewhat (naturally, as stated they are PEAK numbers).

> Note that performances of well over 1 million triangles per second are
> achieved for long meshes of single- and multi-colored triangles, with
> the zbuffer enabled.  When lighting and smooth shading are enabled, the
> performance drops to roughly 3/4 of a million triangles per second.

I notice that the zbuffer was enabled, but that the Z test was set to
ZF_ALWAYS.  I can imagine a good microcoder optimizing that case so as
to not perform the read-modify-write cycle to the Z buffer (since the
test will always win anyway).  Is a r-m-w cycle taking place, or is it
just being written through?

Thanks again Kurt for clarifying these mysteries!

						-- Rich
Rich Thomson	thomson@cs.utah.edu  {bellcore,hplabs,uunet}!utah-cs!thomson
    ``Read my MIPs -- no new VAXes!!''  --George Bush after sniffing freon

tohanson@gonzo.lerc.nasa.gov (Jeff Hanson) (04/10/91)

Rich Thomson writes (and makes some good points, too).

[ ... stuff deleted ... ]

> Sadly the graphics community does not yet have the
> equivalent of the specmark rating on which to intelligently compare
> different platforms.  Just look at the claims made when comparing X
> implementations.  The customer gets left in the lurch unless they
> undertake analyzing the voluminous output of x11perf to find out the
> real story.

Anyone interested in x11perf benchmarking and/or information on the PLB
benchmark should get the following publication: HP Apollo 9000 Series
700 - Performance Brief (5091-1137E 3/91).  In it you will find x11perf
organized into 4 groups as proposed by Digital Review.  (I wrote to DR
urging them to make available the programs that organize the data and
draw the Kiviat graph; no reply so far.  Perhaps HP could make this
available.)  You will
also find the preliminary PLB numbers that were published in the January
issue of the Anderson Report.  These numbers were also published in Unix
Today.  I urge anyone involved in graphics and benchmarking to get more
information about PLB because you will be able to create PLB benchmarks and
run them in the very near future (say 6 months, max).  A brief synopsis is
below.

The Picture-Level Benchmark - The Industry's Solution for Measuring Graphics
Display Performance.

What is the PLB? - The PLB is a software package that provides a standard
method of measuring graphics display performance for different hardware
platforms.  It consists of three elements:

The Benchmark Interface Format (BIF), a standardized file structure that
allows users to port application geometry, and the actions to be
performed on that geometry, to the PLB program.

The Benchmark Timing Methodology (BTM), which provides a consistent method
of measuring the time it takes for hardware to display and perform actions
on a user's application geometry.

The Benchmark Reporting Format (BRF), which provides a standardized report
that allows "apples-to-apples" comparisons of graphics display performance
for different hardware platforms.

How do you use the PLB? - The first step is to translate your data sets from
a typical application into the standard BIF.  Once your data set has been
translated, you are ready to run performance tests.  At the vendor's site or
your own, you can view your data set as it runs on the vendor's system.
The viewing is important, since the PLB does not measure image quality --
it is up to you to make these visual comparisons among the different
systems you test.

For more information contact:

NCGA Technical Services and Standards
2722 Merrilee Drive, Suite 200
Fairfax, VA 22031
Phone: 703-698-9600, ext. 318
Fax: 703-560-2752

[ ... stuff deleted ... ]

> Also, at a recent VGX demonstration at
> the U, the rep couldn't tell me details about the figure, nor could he
> show me a program with a high polygon rate.  He also didn't have any
> models with several hundred thousand (say, 40% of the peak figure,
> or 300K - 400K polygons) polygons, although he's a sharp enough man
> that I imagine he WILL have them next time in case I'm there. ;-}

The powerflip program accepts several models so you can load up a few thousand
polygons.  It also gives the polygons/second.

> Hopefully, when the Graphics Performance Committee releases its
> Picture Level Benchmark program (& numbers come forth from vendors)
> this situation will be alleviated.  For now, we are stuck with
> comparing performance numbers from each different vendor and
> attempting to infer useful comparisons from widely differing measures.

Beat on your vendor of choice for PLB numbers.  User demands shall be heard!

[ ... stuff deleted ... ]
--
*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*
 \ / \ / \ / \ / \ / \ /        Jeff Hanson            \ / \ / \ / \ / \ / \ / 
  *   ViSC: Better    *  tohanson@gonzo.lerc.nasa.gov   *   *   *   *   *   *  
 / \ / \ Science / \ / \  NASA Lewis Research Center   / \ / \ Through / \ / \ 
*   *   *   *   *   *   *   Cleveland, Ohio 44135     *   *   *  Pictures *   *
 \ / \ / \ / \  Telephone - (216) 433-2284  Fax - (216) 433-2182   \ / \ / \ / 
*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*

kurt@cashew.asd.sgi.com (Kurt Akeley) (04/11/91)

In article <1991Apr9.154616.1976@hellgate.utah.edu>, thomson@cs.utah.edu (Rich Thomson) writes:

[stuff deleted]

|> 
|> >    Display listed triangle mesh (colored):
|> >      62 triangles per mesh: 1020342 triangles per second
|> 
|> I find this interesting.  Apparently, the way to max out the VGX is to
|> use display lists.  I thought SGI considered display lists "naughty".

While we may have implied this, it is not our technical position.  The
Graphics Library has included graphical objects from its creation, and will
continue to do so.  Graphical objects are the right choice for network
graphics, for example, and may also yield the best performance in simplistic
example codes (such as my benchmark).  What *is* naughty is to force
programmers to use graphical objects, or to force them to use immediate mode.
We do neither.

[stuff deleted]

|> I notice that the zbuffer was enabled, but that the Z test was set to
|> ZF_ALWAYS.  I can imagine a good microcoder optimizing that case so as
|> to not perform the read-modify-write cycle to the Z buffer (since the
|> test will always win anyway).  Is a r-m-w cycle taking place, or is it
|> just being written through?

The r-m-w cycle is taking place.  Because ZF_ALWAYS does not eliminate the
need for the write cycle, it simply isn't worth it to us to optimize this
case.

-- kurt