XBR2D96D@DDATHD21.BITNET (Knobi der Rechnerschrat) (12/19/88)
Hello Netlanders,

I've a few questions about SGI's new GTX architecture. They are based on the 3.1 release notes and a document called "IRIS GTX: A Technical Report, Rev 2":

- Which type of CPU (16 MHz or 25 MHz), and how many of them, do I need to get the full graphics speed (100,000 Z-buffered, 4-sided, G-shaded, P-lighted, independent polygons)? I ask because one of SGI's competitors (they have a vector/parallel-oriented workstation with up to 4 CPUs, with graphics computations done in the CPU) had to admit (after applying some Spanish Inquisition tools) that they need 4 CPUs to reach their maximum graphics performance, and that there may be situations where graphics can consume all the resources of the system.

- Chapter "8.2 Graphics Notes" in the 4D-3.1 release notes states that some of the graphics routines (c3*, c4*, n3f, v2*, v3*, v4*) should be called with quadword-aligned data to get full GTX performance. Does this mean that all the variables have to be "double" (which I don't believe), or that the first byte of a "float x[3]" vector has to start on a quadword address? In the latter case I only have to rearrange our data structures.

- Does shademodel(FLAT) work again under 3.1?

As a last point, I want to comment on Jim Frost, who wrote a note about

> Subject: SGI's interesting idea of a "speedup"
. . . .
>Interestingly, the 10x factor seems to be correct as one of our
>customers reported that our product "ran ten times slower" on the GT.
>
>We happily followed the SGI guide to speed them up. At one point we
>changed all our readpixel() calls to rectread() calls, a non-trivial
>task because they don't have the same arguments at all. To our great
>surprise, the following was printed when the new call was made:
>
> <rectread> is not implemented.
>
>We were impressed at just how fast their new function didn't work, as
>I'm sure you can guess.
>
>Curious, we investigated. Making use of "strings", we found that
>libgl_s.a contained the string "<%s> is not implemented.". Just how
>many functions might call whatever routine has that string is
>something that scares me.
>
>Jim Frost
>Associative Design Technology
>(508) 366-9166
>madd@bu-it.bu.edu

Did you get your "not implemented" on a G or a GT? If it's on a G (as I suspect), how can you expect routines to be implemented that only make sense on the GT architecture (another example is smoothline())? I think it's a good idea to let you use the calls but tell you that they don't work.

Have a merry Christmas and a happy new year 89

Martin Knoblauch
TH-Darmstadt
Physical Chemistry 1
Petersenstrasse 20
D-6100 Darmstadt
West-Germany
BITNET: <XBR2D96D@DDATHD21>

jmb@patton.SGI.COM (Jim Barton) (12/22/88)
In article <8812200602.aa17057@SMOKE.BRL.MIL>, XBR2D96D@DDATHD21.BITNET (Knobi der Rechnerschrat) writes:
> Hello Netlanders,
>
> I've a few questions about SGI's new GTX architecture. They are based
> on the 3.1 release notes and a document called "IRIS GTX: A Technical
> Report, Rev 2":
>
> - which type of CPU (16 MHZ or 25 MHZ) and how many of them do I need
> to get the full graphics speed (100.000 Z-buffered 4-sided, G-shaded,
> P-lighted, independent polygons). I ask this question, because one of
> SGI's competitors (they have a vector/parallel-oriented Workstation
> with up to 4 CPU's, Graphics computations done in the CPU) had to admit
> (after applying some spanish inqusition tools) that they need 4 CPU's
> to reach their maximum graphics performance and that there may exist
> situations, where graphics can consume all resources of the system.

ALL GTX-class machines can reach full graphics performance with a single CPU driving the graphics. In a 4-popper, this means you get >3 CPUs of compute performance to use as you wish. (Unlike the competition, a GTX has 100 MFlops dedicated to graphics; the CPU performance is yours to use or abuse as you wish.)

Part of this is the result of a custom bus cycle and small-block DMA facility which the processor uses to send geometry to the pipeline. We call this feature the "3-way-transfer". More below ...

> - Chapter "8.2 Graphics Notes" in the 4D-3.1 release notes states that
> some of the graphics routines (c3*, c4*, n3f, v2*, v3*, v4*) should be
> called with quadword-aligned data to get full GTX performance.
> Does this mean all the variables have to be "double" (which I don't
> beleave) or that the first byte of a "float x[3]" vector has to start
> on a quadword-address? In the latter case I only have to rearrange our
> data-structures.

As you surmised, the quadword alignment is just for the first byte of the data structure you are sending. The reason this is needed for full performance is related to the 3-way-transfer and the MP backplane. As in most multiprocessors, memory data is transferred in large blocks for efficiency, and then cached at each CPU. The POWERSeries uses a 4-word (16-byte) cache line, which is also the basic unit of transfer to the graphics pipeline.

The 3-way-transfer is designed to allow the programmer to lay out his data in an arbitrary way without alignment restrictions. Thus, if your vertex crosses a 4-word boundary, two bus cycles will be necessary to send the data (thus the "3-way": the first part of the data may come from cache or memory, and the second part may come from some other cache or memory, or the initiating CPU may own none of the data, in which case other cache(s) or memory will supply the data). [Sorry if this is confusing; remember that the POWERSeries uses write-back caching, so the "real" memory image is distributed between caches and memory.]

Quadword-aligning the vertex ensures that the transfer happens in a single bus cycle, giving you the best performance (but remember, your code will still work, no matter how the data is aligned).

> - does shademodel(FLAT) work again under 3.1?

I hope so.

-- Jim Barton
Silicon Graphics Computing Systems    "UNIX: Live Free Or Die!"
jmb@sgi.sgi.com, sgi!jmb@decwrl.dec.com, ...{decwrl,sun}!sgi!jmb
"I used to be disgusted, now I'm just amused." - Elvis Costello, 'Red Shoes'
--
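
The alignment point in the reply above (a vertex that crosses a 16-byte boundary needs two bus cycles, one that fits inside a single 16-byte line needs one) can be illustrated with a small sketch. This is only an assumption-laden illustration in modern C11: alignas postdates the compilers discussed in this thread, the names LINE_BYTES, lines_spanned, count_split_vertices, and Vertex are hypothetical and not part of the GL, and the code assumes 4-byte floats. It only counts how many 16-byte lines a "float x[3]" vector touches; it does not call any graphics routine.

```c
#include <assert.h>    /* static_assert */
#include <stdalign.h>  /* alignas (C11) */
#include <stddef.h>
#include <stdint.h>

/* 4 words of 4 bytes each: the cache-line size quoted in the post. */
#define LINE_BYTES 16u

/* Hypothetical helper: how many 16-byte lines does [p, p+size) touch?
 * size must be greater than zero. */
static unsigned lines_spanned(const void *p, size_t size)
{
    uintptr_t first = (uintptr_t)p / LINE_BYTES;
    uintptr_t last  = ((uintptr_t)p + size - 1u) / LINE_BYTES;
    return (unsigned)(last - first + 1u);
}

/* Hypothetical helper: count the 3-float vectors in an array that
 * straddle a line boundary. stride is the distance between
 * consecutive vectors, measured in floats (3 = tightly packed,
 * 4 = padded to one line per vector). */
static size_t count_split_vertices(const float *base, size_t n, size_t stride)
{
    size_t split = 0;
    for (size_t i = 0; i < n; i++) {
        if (lines_spanned(base + i * stride, 3 * sizeof(float)) > 1u)
            split++;
    }
    return split;
}

/* Hypothetical vertex record: alignas(16) on the member forces the
 * struct to 16-byte alignment and pads its size to 16 bytes, so every
 * element of a Vertex array starts on a quadword address. */
typedef struct {
    alignas(16) float xyz[3];
} Vertex;

static_assert(sizeof(Vertex) == 16, "Vertex should fill exactly one line");
```

With a 16-byte-aligned base address, four tightly packed float[3] vectors (stride 3) start at byte offsets 0, 12, 24, and 36, so the second and third straddle a boundary and count_split_vertices reports 2. Padding each vector to four floats (stride 4), or using an array of the Vertex type, brings that count to 0, which matches the "only rearrange the data structures" reading of the release note.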