[comp.sys.sgi] Efficient use of lighting models

XBR2D96D@DDATHD21.BITNET (Knobi der Rechnerschrat) (10/25/90)

Hello,

  after having read the current discussion on the efficient use of
the lighting model (Scott Kahn/Kurt Akeley) I've got a question.

  Using a 70/GT (and other models) we know that multi-colored surfaces
(every vertex using a different material) are slower than uni-colored
surfaces (all vertices using the same material). Our current algorithm
for the multi-colored case looks like this:

  bgntmesh();
  for all vertices {
    if (newmat != oldmat) { lmbind(MATERIAL, newmat); oldmat = newmat; }
    /* the oldmat/newmat test is just to avoid unnecessary lmbind's */
    n3f(normals);
    v3f(coordinates);
  }
  endtmesh();

 The question arises whether it is faster to use just one material and
change its properties using lmcolor/cpack, like this:

  lmcolor(LMC_DIFFUSE);
  lmbind(MATERIAL, template);
  bgntmesh();
  for all vertices {
    if (newcol != oldcol) { cpack(newcol); oldcol = newcol; }
    /* the oldcol/newcol test is just to avoid unnecessary cpack's */
    n3f(normals);
    v3f(coordinates);
  }
  endtmesh();
  lmcolor(LMC_COLOR);

 This of course would allow changing only one property (i.e. DIFFUSE)
in this loop (two in the case of LMC_AD). If you HAVE TO change more
properties at once (and can't simulate that by changing one property),
I believe you have to insert lmcolor commands inside the loop. As I
understand Kurt, this is not desirable. I'm really interested in an
answer to this problem, as it could eventually force a design
decision for our software. We have also observed that the speed
difference between uni- and multi-colored surfaces (using the first
algorithm) varies dramatically across graphics platforms. The VGX in
particular seems to have problems with that algorithm. Is this
observation correct, and is there an answer to this problem that covers
all SGI graphics machines (PI, PI/TG, GT, GTX, VGX)?

 A second question concerns the LMC_AD mode. How does it work? Does
it set AMBIENT and DIFFUSE to the same RGB values?

 Next I have a suggestion. Very often you find combinations like:

   n3f(normals);
   v3f(coordinates);

I may be overestimating the overhead of the function calls and the
interaction between the CPU and the graphics pipeline, but wouldn't
combining both routines into one (call it nv3f) result in some
performance gain?
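
 To make the suggestion concrete, here is a rough sketch of what such a
call could look like from the application side (nv3f and the interleaved
record layout are hypothetical; as a plain user-level wrapper it saves
nothing, the gain would only come if the library accepted it as a single
transfer):

   /* hypothetical: one pointer to an interleaved nx,ny,nz,x,y,z record */
   void nv3f(float rec[6])
   {
     n3f(&rec[0]);   /* nx, ny, nz */
     v3f(&rec[3]);   /* x, y, z    */
   }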

 Finally I've got a question concerning the memory alignment of normal
and coordinate data. From the release notes I know that for the GTX
this kind of data has to be quadword aligned to get the best performance.
We are currently allocating normals and coordinates in one-dimensional
arrays (float) of this form:

   x0,y0,z0,x1,y1,z1,....

and pass the address of the xn element to n3f/v3f. Would it be better to
allocate additional (dummy) space for the w elements to get the best
performance on the GTX? As this would mean 33% more memory usage for
vertices and normals, we would like to avoid it if possible. How large is
the performance loss if one does not use quadword-aligned data, and what
are the effects on other machines (esp. the VGX)?


  This mail has gotten longer than I thought it would at the beginning.
"Sorry" to everybody who is not interested in the topic.

Regards
Martin Knoblauch

TH-Darmstadt
Physical Chemistry 1
Petersenstrasse 20
D-6100 Darmstadt, FRG

BITNET: <XBR2D96D@DDATHD21>

kurt@cashew.asd.sgi.com (Kurt Akeley) (10/30/90)

In article <9010250912.aa15161@VGR.BRL.MIL>,
XBR2D96D@DDATHD21.BITNET (Knobi der Rechnerschrat) writes:
|> ...
|>   Using a 70/GT (and other models) we know that multi-colored surfaces
|> (every vertex using a different material) are slower than uni-colored
|> surfaces (all vertices using the same material). Our current algorithm
|> for the multi-colored case looks like this:
|> 
|>   bgntmesh();
|>   for all vertices {
|>     if (newmat != oldmat) { lmbind(MATERIAL, newmat); oldmat = newmat; }
|>     /* the oldmat/newmat test is just to avoid unnecessary lmbind's */
|>     n3f(normals);
|>     v3f(coordinates);
|>   }
|>   endtmesh();
|> 
|>  The question arises whether it is faster to use just one material and
|> change its properties using lmcolor/cpack ...

The answer is yes: it is, and will continue to be, faster to change a single
material property using lmcolor mode than by rebinding materials.

|>  This of course would allow changing only one property (i.e. DIFFUSE)
|> in this loop (two in the case of LMC_AD). If you HAVE TO change more
|> properties at once (and can't simulate that by changing one property),
|> I believe you have to insert lmcolor commands inside the loop. As I
|> understand Kurt, this is not desirable. I'm really interested in an
|> answer to this problem, as it could eventually force a design
|> decision for our software. We have also observed that the speed
|> difference between uni- and multi-colored surfaces (using the first
|> algorithm) varies dramatically across graphics platforms. The VGX in
|> particular seems to have problems with that algorithm. Is this
|> observation correct, and is there an answer to this problem that covers
|> all SGI graphics machines (PI, PI/TG, GT, GTX, VGX)?

The intention of lmcolor is to allow the graphics system to expect a
LIMITED amount of extra data per vertex to modify the material properties.
The point is that we limit the data volume, not the complexity of the
requested material change.  If your algorithm really requires uncorrelated
changes to multiple material properties, then rebinding materials is the
way to go, and will probably never be fast.  If the changes are related,
perhaps a new lmcolor mode should be defined.  I'd like to hear back on
this, though perhaps not over the net.

|>  A second question concerns the LMC_AD mode. How does it work? Does
|> it set AMBIENT and DIFFUSE to the same RGB values?

Yes.  And ALPHA is set to the alpha value too.
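
For example (the color value here is purely illustrative; cpack packs the
bytes as 0xAABBGGRR):

  lmcolor(LMC_AD);
  cpack(0x80204060);   /* alpha=0x80, blue=0x20, green=0x40, red=0x60:
                          AMBIENT and DIFFUSE take the RGB, ALPHA takes
                          the alpha byte */
  n3f(normal);
  v3f(vertex);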

|>  Next I have a suggestion. Very often you find combinations like:
|> 
|>    n3f(normals);
|>    v3f(coordinates);
|> 
|> I may be overestimating the overhead of the function calls and the
|> interaction between the CPU and the graphics pipeline, but wouldn't
|> combining both routines into one (call it nv3f) result in some
|> performance gain?

In the case where graphics performance is traversal limited, a performance
increase could result.  On the VGX, for example, lighted vertexes could be
sent 10 to 20 percent faster with such a command.  We felt that this
improvement was not worth the required user recoding effort.  Also, much
real code is either transform limited (by lmcolor, for example) or fill
limited, and would not benefit from such an optimization.  We will continue
to consider more efficient interface commands, however.

|>  Finally I've got a question concerning the memory alignment of normal
|> and coordinate data. From the release notes I know that for the GTX
|> this kind of data has to be quadword aligned to get the best performance.
|> We are currently allocating normals and coordinates in one-dimensional
|> arrays (float) of this form:
|> 
|>    x0,y0,z0,x1,y1,z1,....
|> 
|> and pass the address of the xn element to n3f/v3f. Would it be better to
|> allocate additional (dummy) space for the w elements to get the best
|> performance on the GTX? As this would mean 33% more memory usage for
|> vertices and normals, we would like to avoid it if possible. How large is
|> the performance loss if one does not use quadword-aligned data, and what
|> are the effects on other machines (esp. the VGX)?

Here are the facts; make of them what you will.  Vertex data are transferred
to GTX and VGX graphics systems using special 3-way operations (see other
SGI publications for explanation).  A 3-way transfer takes 10 bus clocks to
complete if its data are quad-word aligned, 14 bus clocks otherwise.  Since
the bus clock is always 16 MHz, this translates into 1.6 million aligned
transfers per second, and 1.15 million unaligned transfers per second.
If both transform and fill limits support a call rate that is greater than
1.15 million calls per second (counting each c, n, v, and t call) then
quad-word alignment will improve performance.  This situation is common on
well tuned VGX code, somewhat less common on GTX code.
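
For example, assuming a lighted tmesh that issues only n3f and v3f (2 calls
per vertex), those numbers work out to roughly:

   unaligned:  1.15 Mcalls/sec / 2  =  ~575 K vertices/sec  (bus limit)
   aligned:    1.6  Mcalls/sec / 2  =   800 K vertices/sec  (bus limit)

so alignment can only help if transform and fill could otherwise sustain
more than about 575 K lighted vertices per second.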

-- kurt

mace@lum.asd.sgi.com (Rob Mace) (10/30/90)

In article <9010250912.aa15161@VGR.BRL.MIL>, XBR2D96D@DDATHD21.BITNET (Knobi der Rechnerschrat) writes:
>  The question arises whether it is faster to use just one material and
> change its properties using lmcolor/cpack, like this:
> 
>   lmcolor(LMC_DIFFUSE);
>   lmbind(MATERIAL, template);
>   bgntmesh();
>   for all vertices {
>     if (newcol != oldcol) { cpack(newcol); oldcol = newcol; }
>     /* the oldcol/newcol test is just to avoid unnecessary cpack's */
>     n3f(normals);
>     v3f(coordinates);
>   }
>   endtmesh();
>   lmcolor(LMC_COLOR);

Using lmcolor as you describe is the preferred way.  The VGX lmcolor speed
problem will be fixed.

>  A second question concerns the LMC_AD mode. How does it work? Does
> it set AMBIENT and DIFFUSE to the same RGB values?

Yes, it sets AMBIENT and DIFFUSE to the same RGB values.  To vary the
relative contributions of AMBIENT and DIFFUSE, just adjust the amount of
AMBIENT light in your scene.
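
One way to do that is through the light model's AMBIENT property, roughly
like this (the values are illustrative; see the lmdef man page for the
exact property-list conventions):

   static float lmodel_desc[] = {
       AMBIENT, 0.2, 0.2, 0.2,   /* scene ambient; raise or lower these to
                                    change the AMBIENT contribution */
       LMNULL
   };

   lmdef(DEFLMODEL, 1, 5, lmodel_desc);   /* 5 floats in the property list */
   lmbind(LMODEL, 1);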

>  Finally I've got a question concerning the memory alignment of normal
> and coordinate data. From the release notes I know that for the GTX
> this kind of data has to be quadword aligned to get the best performance.
> We are currently allocating normals and coordinates in one-dimensional
> arrays (float) of this form:
> 
>    x0,y0,z0,x1,y1,z1,....
> 
> and pass the address of the xn element to n3f/v3f. Would it be better to
> allocate additional (dummy) space for the w elements to get the best
> performance on the GTX? As this would mean 33% more memory usage for
> vertices and normals, we would like to avoid it if possible. How large is
> the performance loss if one does not use quadword-aligned data, and what
> are the effects on other machines (esp. the VGX)?

It is important to quad-word align data on both the GTX and VGX.  Given
your example, you would want your data to look like the following:

   x0,y0,z0,dummy,nx0,ny0,nz0,dummy,x1,y1,z1,dummy,nx1,ny1,nz1,dummy,...

The performance loss if you do not do this depends completely on how
complex your current drawing algorithm is (i.e. it will affect things
more if you are doing infinite as opposed to local lighting).

If you are also coloring your data, the cpack values can fit in place of a
dummy and you will not be wasting as much memory.
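
One way to lay that out in C might be the following (the names are made up;
the point is the 4-float stride, which keeps each cpack/n3f/v3f pointer
quad-word aligned as long as the array itself starts on a quad-word
boundary):

   typedef struct {            /* 8 floats = 32 bytes per vertex */
       float x, y, z;
       unsigned long color;    /* cpack value sits in the first "dummy" slot */
       float nx, ny, nz;
       float pad;              /* unused, keeps the 4-float stride */
   } Vert;

   Vert *vp;                   /* quad-word aligned array of nverts entries */
   int i;

   bgntmesh();
   for (i = 0; i < nverts; i++) {
       cpack(vp[i].color);     /* with lmcolor(LMC_DIFFUSE) bound, as above */
       n3f(&vp[i].nx);
       v3f(&vp[i].x);
   }
   endtmesh();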

Rob Mace