thomson@cs.utah.edu (Rich Thomson) (04/09/91)
In article <1991Mar28.213128.9355@hellgate.utah.edu>, I posted a VGX benchmark program mailed to me by Brian McClendon of SGI. I have recently found the time to take a close look at this program and the one posted by Kurt Akeley. I have yet to try this particular program out on a VGX machine, so I will postpone that effort for now.

If we examine the code of the program, we find that the polygons it is attempting to display are created with the following loop:

    #define SQRT3_2 (1.7321/2.0)

    /* initialize data arrays */
    for (i = 0; i < (1 + NUMTRI/2); i++) {
        tribuf[i*8+0] = size*i;
        tribuf[i*8+1] = 0;
        tribuf[i*8+2] = 0;
        tribuf[i*8+4] = size*i + size/2;
        tribuf[i*8+5] = size*SQRT3_2;
        tribuf[i*8+6] = 0;
    }

    [...]

    bgntmesh();
    for (i = 0; i < (1 + NUMTRI/2); i++) {
        n3f(&normbuf[(i%2)*4]);
        v3f(&tribuf[i*8]);
        n3f(&normbuf[(i%4)*4]);
        v3f(&tribuf[i*8 + 4]);
    }
    endtmesh();
    closeobj();

Notice that this creates a big, linear triangle strip that stretches off the right side of the screen (especially if the triangles are the 50-pixel triangles quoted in the marketing literature). This results in most of the triangles being clipped from the view volume.

The program that Kurt Akeley posted in article <1991Apr1.154902.17858@odin.corp.sgi.com> was much more reasonable: it created a certain number of triangles per strip, with each strip being linear, but with all the strips beginning at the same position relative to the display window:

    /* initialize data arrays */
    for (i = 0; i < MAXVERTEX; i += 1) {
        meshbuf[VERTSIZE*i+0] = (i&1) ? 0.0 : 1.0;
        meshbuf[VERTSIZE*i+1] = 0.0;
        meshbuf[VERTSIZE*i+2] = (i&1) ? 1.0 : 0.0;
        meshbuf[VERTSIZE*i+3] = 0;
        meshbuf[VERTSIZE*i+4] = 10.0 + (float)(size*(i>>1)) + (float)(offset*(i&1));
        meshbuf[VERTSIZE*i+5] = 10.0 + (float)(size*(i&1));
        meshbuf[VERTSIZE*i+6] = 0.0;
        meshbuf[VERTSIZE*i+7] = 0;
    }

    [...]
    #define LIGHTVERT(i) n3f(fp+(VERTSIZE*(i))); v3f(fp+(VERTSIZE*(i))+4)

    for (i = events; i > 0; i--) {
        fp = meshbuf;
        bgntmesh();
        LIGHTVERT(0);
        LIGHTVERT(1);
        LIGHTVERT(2);
        endtmesh();
    }

Now on to some comments on Kurt's article:

> We take our graphics performance claims very seriously here at Silicon
> Graphics.

I'm sure you take them as seriously as MIPS, HP, and IBM take their SPECmark ratings. Sadly, the graphics community does not yet have the equivalent of the SPECmark rating on which to intelligently compare different platforms. Just look at the claims made when comparing X implementations. The customer gets left in the lurch unless they undertake analyzing the voluminous output of x11perf to find out the real story.

I began to be skeptical when I saw the figure posted several times on comp.graphics and queries to the poster were answered with "it's from our marketing literature, I'll ask a ``tech type'' to send you a program" (I never heard back from him). Also, at a recent VGX demonstration at the U, the rep couldn't tell me details about the figure, nor could he show me a program with a high polygon rate. He also didn't have any models with several hundred thousand polygons (say, 40% of the peak figure, or 300K - 400K polygons), although he's a sharp enough man that I imagine he WILL have them next time in case I'm there. ;-}

Hopefully, when the Graphics Performance Committee releases its Picture Level Benchmark program (& numbers come forth from vendors) this situation will be alleviated. For now, we are stuck with comparing performance numbers from each different vendor and attempting to infer useful comparisons from widely differing measures. For instance, you say:

> [quoted performance comes from] tuned programs that use ONLY
> commands that are available in the Graphics Library.

So these numbers are highly tuned for the architecture of the VGX and are reproducible only with a vendor-specific library.
This is very understandable, given the position SGI holds in the 3D market, but it is very difficult to compare different platforms with these kinds of numbers in your hand. [Perhaps that is the intention of the marketing dept? ;-]

> I ran this program on my 5-span VGX with the following results:
> size=8, offset=4, zbuffer(1), events=500000, lighting=1
> running on cashew, GL4DVGX-4.0, Fri Mar 29 15:22:58 1991
> Triangle mesh performance (lighted):
>      1 triangles per mesh:   189393 triangles per second
  [stuff deleted]
>     30 triangles per mesh:   675648 triangles per second
>     62 triangles per mesh:   714240 triangles per second
> Display listed triangle mesh (lighted):
>     62 triangles per mesh:   769181 triangles per second
> Display listed triangle mesh (colored):
>     62 triangles per mesh:  1020342 triangles per second

I find this interesting. Apparently, the way to max out the VGX is to use display lists. I thought SGI considered display lists "naughty". Several times on comp.graphics, SGI folks have bashed display-list-oriented techniques, and the company's position paper on "PEX & PHIGS" states over and over the advantages of immediate mode over display-list techniques. I find it particularly ironic, then, that the 1 M p/s number comes from display-list techniques.

Another poster asked how things change when lights are turned on, etc. I think Kurt's table (along with examining the source) answers this question. Naturally, the more lights are turned on, the slower things get (can't compute everything instantaneously). Also, I notice that these polygons aren't depth cued, which would also reduce the numbers somewhat (naturally, as stated, they are PEAK numbers).

> Note that performances of well over 1 million triangles per second are
> achieved for long meshes of single- and multi-colored triangles, with
> the zbuffer enabled. When lighting and smooth shading are enabled, the
> performance drops to roughly 3/4 of a million triangles per second.
I notice that the zbuffer was enabled, but that the Z test was set to ZF_ALWAYS. I can imagine a good microcoder optimizing that case so as to not perform the read-modify-write cycle to the Z buffer (since the test will always win anyway). Is an r-m-w cycle taking place, or is it just being written through?

Thanks again, Kurt, for clarifying these mysteries!
						-- Rich

Rich Thomson				thomson@cs.utah.edu
					{bellcore,hplabs,uunet}!utah-cs!thomson
``Read my MIPs -- no new VAXes!!''  --George Bush after sniffing freon
tohanson@gonzo.lerc.nasa.gov (Jeff Hanson) (04/10/91)
Rich Thomson writes (and makes some good points, too):

[ ... stuff deleted ... ]

> Sadly the graphics community does not yet have the
> equivalent of the specmark rating on which to intelligently compare
> different platforms. Just look at the claims made when comparing X
> implementations. The customer gets left in the lurch unless they
> undertake analyzing the voluminous output of x11perf to find out the
> real story.

Anyone interested in x11perf benchmarking and/or information on the PLB benchmark should get the following publication: HP Apollo 9000 Series 700 - Performance Brief (5091-1137E 3/91). In it you will find x11perf organized into 4 groups as proposed by Digital Review. (I wrote to DR urging them to make available the programs that organize the data and draw the Kiviat graph; no reply so far. Perhaps HP could make this available.) You will also find the preliminary PLB numbers that were published in the January issue of the Anderson Report. These numbers were also published in Unix Today.

I urge anyone involved in graphics and benchmarking to get more information about the PLB, because you will be able to create PLB benchmarks and run them in the very near future (say 6 months, max). A brief synopsis is below.

The Picture-Level Benchmark - The Industry's Solution for Measuring Graphics Display Performance

What is the PLB? - The PLB is a software package that provides a standard method of measuring graphics display performance for different hardware platforms. It consists of three elements:

  * The Benchmark Interface Format (BIF), a standardized file structure that allows users to port application geometry, and the actions the geometry will perform, to the PLB program.

  * The Benchmark Timing Methodology (BTM), which provides a consistent method of measuring the time it takes for hardware to display and perform actions on a user's application geometry.

  * The Benchmark Reporting Format (BRF), which provides a standardized report that allows "apples-to-apples" comparisons of graphics display performance for different hardware platforms.

How do you use the PLB? - The first step is to translate your data sets from a typical application into the standard BIF. Once your data set has been translated, you are ready to run performance tests. At the vendor's site or your own, you can view your data set as it runs on the vendor's system. The viewing is important, since the PLB does not measure image quality -- it is up to you to make these visual comparisons among the different systems you test.

For more information contact:

    NCGA Technical Services and Standards
    2722 Merrilee Drive, Suite 200
    Fairfax, VA 22031
    Phone: 703-698-9600, ext. 318
    Fax: 703-560-2752

[ ... stuff deleted ... ]

> Also, at a recent VGX demonstration at
> the U, the rep couldn't tell me details about the figure, nor could he
> show me a program with a high polygon rate. He also didn't have any
> models with several hundred thousand (say, 40% of the peak figure,
> or 300K - 400K polygons) polygons, although he's a sharp enough man
> that I imagine he WILL have them next time in case I'm there. ;-}

The powerflip program accepts several models, so you can load up a few thousand polygons. It also reports polygons/second.

> Hopefully, when the Graphics Performance Committee releases its
> Picture Level Benchmark program (& numbers come forth from vendors)
> this situation will be alleviated. For now, we are stuck with
> comparing performance numbers from each different vendor and
> attempting to infer useful comparisons from widely differing measures.

Beat on your vendor of choice for PLB numbers. User demands shall be heard!

[ ... stuff deleted ...
]
-- 
Jeff Hanson				tohanson@gonzo.lerc.nasa.gov
ViSC: Better Science Through Pictures	NASA Lewis Research Center
					Cleveland, Ohio 44135
Telephone - (216) 433-2284		Fax - (216) 433-2182
kurt@cashew.asd.sgi.com (Kurt Akeley) (04/11/91)
In article <1991Apr9.154616.1976@hellgate.utah.edu>, thomson@cs.utah.edu (Rich Thomson) writes:

[stuff deleted]

|> > Display listed triangle mesh (colored):
|> > 62 triangles per mesh: 1020342 triangles per second
|>
|> I find this interesting. Apparently, the way to max out the VGX is to
|> use display lists. I thought SGI considered display lists "naughty".

While we may have implied this, it is not our technical position. The Graphics Library has included graphical objects from its creation, and will continue to do so. Graphical objects are the right choice for network graphics, for example, and may also yield the best performance in simplistic example codes (such as my benchmark). What *is* naughty is to force programmers to use graphical objects, or to force them to use immediate mode. We do neither.

[stuff deleted]

|> I notice that the zbuffer was enabled, but that the Z test was set to
|> ZF_ALWAYS. I can imagine a good microcoder optimizing that case so as
|> to not perform the read-modify-write cycle to the Z buffer (since the
|> test will always win anyway). Is an r-m-w cycle taking place, or is it
|> just being written through?

The r-m-w cycle is taking place. Because ZF_ALWAYS does not eliminate the need for the write cycle, it simply isn't worth it to us to optimize this case.

-- kurt
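For readers following the ZF_ALWAYS exchange, here is a minimal software sketch of the per-pixel logic being discussed (my own illustration in plain C, not SGI's microcode; the enum and function names are made up). It shows why an always-pass compare does not save the z-buffer memory traffic: the compare could be skipped, but the new depth still has to be written back.

```c
#include <stddef.h>

/* Hypothetical z-function selector, modeled loosely on the GL's
   zfunction() modes discussed above. */
typedef enum { ZF_LESS_SKETCH, ZF_ALWAYS_SKETCH } zfunc;

/* Per-pixel depth test.  Returns 1 if the pixel passes.  Under the
   always-pass mode the read/compare is pointless, but the write to
   zbuf[i] still occurs, so the memory cycle is not eliminated. */
int depth_test_write(unsigned *zbuf, size_t i, unsigned newz, zfunc f)
{
    int pass = (f == ZF_ALWAYS_SKETCH) || (newz < zbuf[i]);
    if (pass)
        zbuf[i] = newz;   /* the write still happens under ZF_ALWAYS */
    return pass;
}
```

For example, with zbuf[0] = 5, writing newz = 9 under ZF_ALWAYS_SKETCH passes and updates the buffer to 9, whereas under ZF_LESS_SKETCH it fails and leaves the 5 in place; only the read/compare half of the cycle differs between the two modes.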