XBR2D96D@DDATHD21.BITNET (Knobi der Rechnerschrat) (10/25/90)
Hello,

after having read the current discussion on the efficient use of the lighting model (Scott Kahn/Kurt Akeley) I have a question. Using a 70/GT (and other models) we know that multi-colored surfaces (every vertex using a different material) are slower than uni-colored surfaces (all vertices using the same material). Our current algorithm for the multi-colored case looks like this:

    bgntmesh;
    for all vertices {
        if(newmat != oldmat) { lmbind(MATERIAL,newmat); oldmat=newmat; }
        /* the oldmat/newmat stuff is just to avoid unnecessary lmbind's */
        n3f(normals);
        v3f(coordinates);
    }
    endtmesh;

The question arises whether it is faster to use just one material and change its properties using lmcolor/cpack like this:

    lmcolor(LMC_DIFFUSE);
    lmbind(MATERIAL,template);
    bgntmesh;
    for all vertices {
        if(newcol != oldcol) { cpack(newcol); oldcol=newcol; }
        /* the oldcol/newcol stuff is just to avoid unnecessary cpack's */
        n3f(normals);
        v3f(coordinates);
    }
    endtmesh;
    lmcolor(LMC_COLOR);

This of course would allow changing only one property (i.e. DIFFUSE) in this loop (two in the case of LMC_AD). If you HAVE TO change more properties at once (and cannot simulate that by changing one property), I believe you have to insert lmcolor commands inside the loop. As I understand Kurt, this is not desirable. I am really interested in an answer to this problem, as it would eventually force a design decision for our software.

We have also observed that the speed differences between uni- and multi-colored surfaces (using the first algorithm) vary dramatically when using different graphics platforms. Especially the VGX seems to have problems with that algorithm. Is this observation true, and is there an answer to this problem that covers all SGI graphics machines (PI, PI/TG, GT, GTX, VGX)?

A second question arises for the LMC_AD mode. How does it work? Does it set AMBIENT and DIFFUSE to the same RGB values?

Next I have a suggestion. Very often you find combinations like:

    n3f(normals);
    v3f(coordinates);

I may overestimate the overhead of the function calls and of the interaction between the CPU and the graphics pipeline, but would a combination of both routines (call it nv3f) not result in some performance gain?

Finally I have a question concerning the memory alignment of normal and coordinate data. From the release notes I know that for the GTX this kind of data has to be quadword aligned to get the best performance. We are currently allocating normals and coordinates in one-dimensional arrays (float) of this form:

    x0,y0,z0,x1,y1,z1,....

and pass the address of the xn element to n3f/v3f. Would it be better to allocate additional (dummy) space for the w elements to get the best performance on the GTX? As this would mean 33% more memory usage for vertices and normals, we would like to avoid it if possible. How large is the performance loss if one does not use quadword-aligned data? What are the effects on other machines (especially the VGX)?

This mail has gotten longer than I thought at the beginning. "Sorry" to everybody who is not interested in the topic.

Regards
Martin Knoblauch

TH-Darmstadt
Physical Chemistry 1
Petersenstrasse 20
D-6100 Darmstadt, FRG

BITNET: <XBR2D96D@DDATHD21>
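In IRIS GL C, the lmcolor/cpack loop from the message above might come out roughly like the sketch below. This is only an illustration: the function name, the vertex count, and the per-vertex color/normal/coordinate arrays are made up here, and the material index template is assumed to have been defined elsewhere (e.g. with lmdef).

    #include <gl/gl.h>

    /* Sketch: one bound material, per-vertex diffuse driven by cpack(). */
    void draw_colored_tmesh(long nverts, unsigned long *colors,
                            float (*normals)[3], float (*coords)[3],
                            short template)
    {
        long i;
        unsigned long oldcol = ~colors[0];  /* force a cpack at the first vertex */

        lmcolor(LMC_DIFFUSE);               /* cpack() now updates DIFFUSE only  */
        lmbind(MATERIAL, template);
        bgntmesh();
        for (i = 0; i < nverts; i++) {
            if (colors[i] != oldcol) {      /* avoid unnecessary cpack's */
                cpack(colors[i]);
                oldcol = colors[i];
            }
            n3f(normals[i]);
            v3f(coords[i]);
        }
        endtmesh();
        lmcolor(LMC_COLOR);                 /* restore normal color behaviour */
    }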
kurt@cashew.asd.sgi.com (Kurt Akeley) (10/30/90)
In article <9010250912.aa15161@VGR.BRL.MIL>, XBR2D96D@DDATHD21.BITNET (Knobi der Rechnerschrat) writes:

|> ...
|> Using a 70/GT (and other models) we know that multi-colored surfaces
|> (every vertex using a different material) are slower than uni-colored
|> surfaces (all vertices using the same material). Our current algorithm
|> for the multi-colored case looks like this:
|>
|>     bgntmesh;
|>     for all vertices {
|>         if(newmat != oldmat) { lmbind(MATERIAL,newmat); oldmat=newmat; }
|>         /* the oldmat/newmat stuff is just to avoid unnecessary lmbind's */
|>         n3f(normals);
|>         v3f(coordinates);
|>     }
|>     endtmesh;
|>
|> The question arises whether it is faster to use just one material and
|> change its properties using lmcolor/cpack ...

The answer is yes: it is, and will continue to be, faster to change a single material property by using lmcolor mode than by changing materials.

|> This of course would allow changing only one property (i.e. DIFFUSE)
|> in this loop (two in the case of LMC_AD). If you HAVE TO change more
|> properties at once (and cannot simulate that by changing one property),
|> I believe you have to insert lmcolor commands inside the loop. As I
|> understand Kurt, this is not desirable. I am really interested in an
|> answer to this problem, as it would eventually force a design decision
|> for our software. We have also observed that the speed differences
|> between uni- and multi-colored surfaces (using the first algorithm)
|> vary dramatically when using different graphics platforms. Especially
|> the VGX seems to have problems with that algorithm. Is this observation
|> true, and is there an answer to this problem that covers all SGI
|> graphics machines (PI, PI/TG, GT, GTX, VGX)?

The intention of lmcolor is to allow the graphics system to expect a LIMITED amount of extra data per vertex to modify the material properties. The point is that we limit the data volume, not the complexity of the requested material change. If your algorithm really requires uncorrelated changes to multiple material properties, then rebinding materials is the way to go, and will probably never be fast. If the changes are related, perhaps a new lmcolor mode should be defined. I'd like to hear back on this, though perhaps not over the net.

|> A second question arises for the LMC_AD mode. How does it work? Does
|> it set AMBIENT and DIFFUSE to the same RGB values?

Yes. And ALPHA is set to the alpha value too.

|> Next I have a suggestion. Very often you find combinations like:
|>
|>     n3f(normals);
|>     v3f(coordinates);
|>
|> I may overestimate the overhead of the function calls and of the
|> interaction between the CPU and the graphics pipeline, but would a
|> combination of both routines (call it nv3f) not result in some
|> performance gain?

In the case where graphics performance is traversal limited, a performance increase could result. On the VGX, for example, lighted vertexes could be sent 10 to 20 percent faster with such a command. We felt that this improvement was not worth the required user recoding effort. Also, much real code is either transform limited (by lmcolor, for example) or fill limited, and would not benefit from such an optimization. We will continue to consider more efficient interface commands, however.

|> Finally I have a question concerning the memory alignment of normal
|> and coordinate data. From the release notes I know that for the GTX
|> this kind of data has to be quadword aligned to get the best performance.
|> We are currently allocating normals and coordinates in one-dimensional
|> arrays (float) of this form:
|>
|>     x0,y0,z0,x1,y1,z1,....
|>
|> and pass the address of the xn element to n3f/v3f. Would it be better
|> to allocate additional (dummy) space for the w elements to get the
|> best performance on the GTX? As this would mean 33% more memory usage
|> for vertices and normals, we would like to avoid it if possible. How
|> large is the performance loss if one does not use quadword-aligned
|> data? What are the effects on other machines (especially the VGX)?

Here are the facts; make of them what you will. Vertex data are transferred to GTX and VGX graphics systems using special 3-way operations (see other SGI publications for an explanation). A 3-way transfer takes 10 bus clocks to complete if its data are quad-word aligned, and 14 bus clocks otherwise. Since the bus clock is always 16 MHz, this translates into 1.6 million aligned transfers per second and 1.15 million unaligned transfers per second. If both the transform and fill limits support a call rate greater than 1.15 million calls per second (counting each c, n, v, and t call), then quad-word alignment will improve performance. This situation is common in well-tuned VGX code, somewhat less common in GTX code.

-- kurt
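For reference, the transfer rates quoted above follow directly from the clock figures; a throwaway calculation (constants taken from the reply above):

    #include <stdio.h>

    int main(void)
    {
        double bus_mhz = 16.0;           /* bus clock, MHz                   */
        double aligned_clocks = 10.0;    /* quad-word aligned 3-way transfer */
        double unaligned_clocks = 14.0;  /* unaligned 3-way transfer         */

        printf("aligned:   %.2f M transfers/s\n", bus_mhz / aligned_clocks);
        printf("unaligned: %.2f M transfers/s\n", bus_mhz / unaligned_clocks);
        /* prints roughly 1.60 and 1.14 (quoted above as 1.6 and 1.15) */
        return 0;
    }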
mace@lum.asd.sgi.com (Rob Mace) (10/30/90)
In article <9010250912.aa15161@VGR.BRL.MIL>, XBR2D96D@DDATHD21.BITNET (Knobi der Rechnerschrat) writes:

> The question arises whether it is faster to use just one material and
> change its properties using lmcolor/cpack like this:
>
>     lmcolor(LMC_DIFFUSE);
>     lmbind(MATERIAL,template);
>     bgntmesh;
>     for all vertices {
>         if(newcol != oldcol) { cpack(newcol); oldcol=newcol; }
>         /* the oldcol/newcol stuff is just to avoid unnecessary cpack's */
>         n3f(normals);
>         v3f(coordinates);
>     }
>     endtmesh;
>     lmcolor(LMC_COLOR);

Using lmcolor as you describe is the preferred way. The VGX lmcolor speed problem will be fixed.

> A second question arises for the LMC_AD mode. How does it work? Does
> it set AMBIENT and DIFFUSE to the same RGB values?

Yes, it sets AMBIENT and DIFFUSE to the same RGB values. To vary the contribution of AMBIENT and DIFFUSE, just adjust the amount of AMBIENT light in your scene.

> Finally I have a question concerning the memory alignment of normal
> and coordinate data. From the release notes I know that for the GTX
> this kind of data has to be quadword aligned to get the best performance.
> We are currently allocating normals and coordinates in one-dimensional
> arrays (float) of this form:
>
>     x0,y0,z0,x1,y1,z1,....
>
> and pass the address of the xn element to n3f/v3f. Would it be better
> to allocate additional (dummy) space for the w elements to get the
> best performance on the GTX? As this would mean 33% more memory usage
> for vertices and normals, we would like to avoid it if possible. How
> large is the performance loss if one does not use quadword-aligned
> data? What are the effects on other machines (especially the VGX)?

It is important to quad-word align data on both the GTX and VGX. Given your example, you would want your data to look like the following:

    x0,y0,z0,dummy,nx0,ny0,nz0,dummy,x1,y1,z1,dummy,nx1,ny1,nz1,dummy,...

The performance loss if you do not do this depends entirely on how complex your current drawing algorithm is (i.e. it will affect things more if you are doing infinite as opposed to local lighting). If you are also coloring your data, the cpacks can fit in place of a dummy and you will not be wasting as much memory.

Rob Mace
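One possible way to set up the interleaved, padded layout described above, sketched in C. Everything here is illustrative rather than an SGI-recommended recipe: the helper name, the stride-8 layout (coordinate triple plus pad, normal triple plus pad), and the manual rounding-up of the malloc result to a 16-byte boundary are assumptions.

    #include <stdlib.h>

    /* Sketch: interleaved x,y,z,pad, nx,ny,nz,pad, ... with each group of
     * four floats starting on a quad-word (16-byte) boundary.  A per-vertex
     * cpack color could be stored in one of the pad slots, as noted above.
     */
    float *alloc_padded_verts(long nverts, char **raw_out)
    {
        char *raw;
        unsigned long addr;

        /* 8 floats per vertex, plus slack so the pointer can be rounded up
         * by hand, since malloc alone need not return quad-word aligned
         * memory. */
        raw = (char *) malloc(nverts * 8 * sizeof(float) + 16);
        if (raw == NULL)
            return NULL;
        *raw_out = raw;                                 /* keep for free() later */
        addr = ((unsigned long) raw + 15) & ~(unsigned long) 15;
        return (float *) addr;
    }

    /* Drawing then walks the buffer with a stride of 8 floats:
     *     n3f(&buf[8*i + 4]);
     *     v3f(&buf[8*i]);
     */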