smith@nextone.niehs.nih.gov ("Howard C. Smith") (04/04/91)
Speaking of performance...

Does anyone have numbers as to the relative cost of particular GL calls?
(for each machine in the 4D series). Maybe all normalized as a
percentage of gconfig (presumably the most expensive).

What I would like to see is a breakdown of timing:

GL call            cycles
---------------------------------------------------------
gconfig         100000.00
v                   12.00
swapbuffers         32.00
pnt                 32.00
clear               34.00
etc etc

Howard Smith
smith@nextone.niehs.nih.gov
"dwilliam@larry.ATL.GE.COM"@andrew.dnet.ge.com (04/05/91)
"Howard C. Smith" <smith@nextone.niehs.nih.gov> writes:
> Does anyone have numbers as to the relative cost of
> particular GL calls? (for each machine in the 4D series). Maybe all
> normalized as a percentage of gconfig (presumably the most
> expensive).
>
> Howard Smith
> smith@nextone.niehs.nih.gov
>

/*
 * this might be what you are looking for.
 * let me know if you make any interesting enhancements.
 * compile with:
 *   cc -prototypes -acpp -O -s glbench.c -lm -lgl_s -lc_s -o glbench
 *
 * dan (dwilliams@atl.ge.com)
 *
 * GL benchmarking results sorted numerically for a 210GTX:
 *
 * swapbuffers                      :     61 calls per second
 * drawmode(OVER)/drawmode(NORMAL)  :   3600 calls per second
 * winset(win1)/winset(win2)        :   5900 calls per second
 * pushname/popname                 :   6000 calls per second
 * pushattributes/popattributes     :   6100 calls per second
 * getcolor                         :   8100 calls per second
 * setlinestyle(1)/setlinestyle(0)  :  20000 calls per second
 * setpattern(1)/setpattern(0)      :  22000 calls per second
 * rot                              :  31000 calls per second
 * pushviewport/popviewport         :  32000 calls per second
 * getpid                           :  34000 calls per second
 * reshapeviewport                  :  34000 calls per second
 * rotate                           :  41000 calls per second
 * scale                            :  70000 calls per second
 * translate                        :  70000 calls per second
 * linewidth(3)/linewidth(1)        :  82000 calls per second
 * pushmatrix/popmatrix             : 110000 calls per second
 * getgdesc(GD_XPMAX)               : 290000 calls per second
 * color(1)/color(2)                : 580000 calls per second
 * winset(win1)                     : 600000 calls per second
 * winget                           : 940000 calls per second
 * getorigin                        : 980000 calls per second
 * getsize                          : 980000 calls per second
 */

#define __EXTENSIONS__
#include <stdio.h>
#include <stdlib.h>             /* atof, exit */
#include <unistd.h>             /* getpid */
#include <math.h>
#include <sys/types.h>
#include <sys/time.h>
#include <gl/gl.h>

float testtime = 2.0;           /* seconds each test should last */
int windowid1, windowid2;

/*
 * raise <x> to the integer power <power>
 */
double powi (double x, int power)
{
    double y;

    if (power < 0)
        for (y = 1.0; power < 0; power++)
            y /= x;
    else
        for (y = 1.0; power > 0; power--)
            y *= x;
    return (y);
}

/*
 * return <x> rounded to <digits> significant digits (float version)
 */
float fsignif (float x, int digits)
{
    double sign, temp;

    if ((digits <= 0) || (x == 0.0))
        return (0.0);
    sign = copysign (1.0, x);
    x = fabs (x);
    temp = powi (10.0, digits - (((-1.0 < x) && (x < 1.0)) ? 0 : 1)
                 - (int) ftrunc (flog10 (x)));
    return ((float) copysign (rint (x * temp) / temp, sign));
}

/*
 * return time (in seconds) between two timevals
 */
float elapsed (struct timeval *t1, struct timeval *t2)
{
    return ((float) ((t2->tv_sec + t2->tv_usec / 1000000.0) -
                     (t1->tv_sec + t1->tv_usec / 1000000.0)));
}

/*
 * do as little as possible without being optimized away
 */
void nothing (void)
{
    volatile int x = 0;
}

/*
 * return a calls per second value for the input function <func>
 */
int calls (void (*func)(void))
{
    register int i, j, count;
    struct timeval start, stop;
    float nulltime = 0.0;       /* no tare measured yet while calibrating */
    float functime;
    void (*nullfunc)();

    /*
     * determine number of times to call function:
     * double the trial count until a run lasts at least half a second
     */
    for (j = 1; ; j *= 2) {
        gettimeofday (&start, (struct timezone *) NULL);
        for (i = 0; i < j; i++)
            func ();
        gettimeofday (&stop, (struct timezone *) NULL);
        functime = elapsed (&start, &stop) - nulltime;
        if (functime >= 0.5) {
            count = (int) (j * testtime / functime);
            break;
        }
    }

    /*
     * call a function which does nothing to get a tare
     */
    nullfunc = nothing;
    gettimeofday (&start, (struct timezone *) NULL);
    for (i = 0; i < count; i++)
        nullfunc ();
    finish ();
    gettimeofday (&stop, (struct timezone *) NULL);
    nulltime = elapsed (&start, &stop);

    /*
     * time the function
     */
    gettimeofday (&start, (struct timezone *) NULL);
    for (i = 0; i < count; i++)
        func ();
    finish ();
    gettimeofday (&stop, (struct timezone *) NULL);

    /* and subtract the tare */
    functime = elapsed (&start, &stop) - nulltime;
    if (functime <= 0.0) {
        (void) fprintf (stderr, "bad time in calls\n");
        return (-1);
    }

    /*
     * return calls per second rounded to two significant digits
     */
    return ((int) fsignif ((count / functime + 0.5), 2));
}

/*
 * functions to time
 */
void mycolor (void) { color (1); color (2); }
void mydrawmode (void) { drawmode (OVERDRAW); drawmode (NORMALDRAW); }
void mygetcolor (void) { (void) getcolor (); }
void mygetgdesc (void) { (void) getgdesc (GD_XPMAX); }
void mygetorigin (void) { int x, y; getorigin (&x, &y); }
void mygetpid (void) { (void) getpid (); }
void mygetsize (void) { int x, y; getsize (&x, &y); }
void mylinewidth (void) { linewidth (3); linewidth (1); }
void mypushatt (void) { pushattributes (); popattributes (); }
void mypushmat (void) { pushmatrix (); popmatrix (); }
void mypushname (void) { pushname (0); popname (); }
void mypushview (void) { pushviewport (); popviewport (); }
void myreshape (void) { reshapeviewport (); }
void myrot (void) { rot (10.0, 'z'); }
void myrotate (void) { rotate (100, 'z'); }
void myscale (void) { scale (0.9, 0.9, 0.9); }
void mysetlinestyle (void) { setlinestyle (1); setlinestyle (0); }
void mysetpattern (void) { setpattern (1); setpattern (0); }
void myswap (void) { swapbuffers (); }
void mytranslate (void) { translate (10.0, 10.0, 10.0); }
void mywinget (void) { (void) winget (); }
void mywinset1 (void) { winset (windowid1); }
void mywinset2 (void) { winset (windowid1); winset (windowid2); }

/*
 * table of functions to time, with their names
 */
struct {
    char *name;
    void (*func)();
} table[] = {
    {"color(1)/color(2)", mycolor},
    {"drawmode(OVER)/drawmode(NORMAL)", mydrawmode},
    {"getcolor", mygetcolor},
    {"getgdesc(GD_XPMAX)", mygetgdesc},
    {"getorigin", mygetorigin},
    {"getpid", mygetpid},
    {"getsize", mygetsize},
    {"linewidth(3)/linewidth(1)", mylinewidth},
    {"pushattributes/popattributes", mypushatt},
    {"pushmatrix/popmatrix", mypushmat},
    {"pushname/popname", mypushname},
    {"pushviewport/popviewport", mypushview},
    {"reshapeviewport", myreshape},
    {"rot", myrot},
    {"rotate", myrotate},
    {"scale", myscale},
    {"setlinestyle(1)/setlinestyle(0)", mysetlinestyle},
    {"setpattern(1)/setpattern(0)", mysetpattern},
    {"swapbuffers", myswap},
    {"translate", mytranslate},
    {"winget", mywinget},
    {"winset(win1)", mywinset1},
    {"winset(win1)/winset(win2)", mywinset2},
};

int main (int argc, char *argv[])
{
    int i;

    /*
     * possibly override the default number of seconds per test with a
     * command line value
     */
    if (argc == 2)
        testtime = (float) atof (argv[1]);

    /*
     * do graphics setup: two small double-buffered windows
     */
    foreground ();
    prefposition (0, 100, getgdesc (GD_YPMAX) - 101, getgdesc (GD_YPMAX) - 1);
    windowid1 = winopen ("glbench1");
    doublebuffer ();
    gconfig ();
    prefposition (101, 201, getgdesc (GD_YPMAX) - 101, getgdesc (GD_YPMAX) - 1);
    windowid2 = winopen ("glbench2");
    doublebuffer ();
    gconfig ();
    color (WHITE);
    deflinestyle (1, 0xAAAA);
    defpattern (1, 16, (unsigned short *) "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA");

    /*
     * step through the table
     */
    for (i = 0; i < sizeof table / sizeof table[0]; i++)
        (void) fprintf (stdout, "%-35s : %7d calls per second\n",
                        table[i].name, calls (table[i].func));

    /*
     * leave noisily
     */
    ringbell ();
    exit (0);
}

--
Dan Williams, Systems & Scientific Software, consultant to:
GE Advanced Technology Labs  | Internet: dwilliams@atl.ge.com
300 Route 38, Bldg. 145-1    | uucp: ...!mcnc!ge-rtp!atl.ge.com!dwilliams
Moorestown, NJ 08057         | Voice: (609) 866-6220
bam@sgi.com (Brian McClendon) (04/05/91)
In article <9104041941.AA29344@ge-dab.GE.COM> "dwilliam@larry.ATL.GE.COM"@andrew.dnet.ge.com writes:
>"Howard C. Smith" <smith@nextone.niehs.nih.gov> writes:
>> Does anyone have numbers as to the relative cost of
>> particular GL calls? (for each machine in the 4D series). Maybe all
>> normalized as a percentage of gconfig (presumably the most
>> expensive).
>>
>> Howard Smith
>> smith@nextone.niehs.nih.gov
>>
>
>/*
> * this might be what you are looking for.
> * let me know if you make any interesting enhancements.
> * compile with:
> *   cc -prototypes -acpp -O -s glbench.c -lm -lgl_s -lc_s -o glbench
> *
> * dan (dwilliams@atl.ge.com)
> *
> * GL benchmarking results sorted numerically for a 210GTX:
> *
> * swapbuffers : 61 calls per second

It's hard to derive a true cost for a GL routine when it involves the
hardware gfx pipeline. Because the bottleneck can be deep in the pipe,
with lots of FIFO-ing in between, pixie/prof results _can_ be very
misleading.

If you write a benchmark program (like glbench.c) and run the same
primitive over and over, then you _should_ get a reasonable idea of the
cost of a particular primitive (as long as you do a finish() to flush
the pipe or do enough iterations that the depth of the pipe is
insignificant).

Unfortunately there are exceptions to the above. Swapbuffers & gsync
wait for the next vertical retrace, so benchmarking them is difficult.
I do know they each make a system call, but the whole routine shouldn't
take more than 100 usecs itself (leaving you 16.56... msec to draw at a
60 Hz frame rate).

Also, benchmarking mapcolor on some machines is difficult due to the
way mapcolor was microcoded. Here are some real numbers for mapcolor
performance.

    VGX: 31750 slots/sec
    GTX:  7400
    G:    2200
    PI:   4000

The problem with these is that their inverse is _not_ the cost of the
routine on most machines, because when inserted in a stream of
unrelated cmds (that happen not to tickle the same bit of hardware) the
cost may drop down to a usec or less.
On a dumb frame buffer most of this would be very easy because there is
only one processor, but on the VGX there can be 11, some in parallel,
some in series.
--
----------------------------------------------------------------------------
Brian McClendon    bam@rudedog.SGI.COM    ...!uunet!sgi!rudedog!bam
415-335-1110
----------------------------------------------------------------------------