mike@BRL.MIL (Mike Muuss) (01/11/90)
I have been running RT, BRL's parallel-processing ray-tracing code, on our
4D/240 and 4D/280 machines. I have noticed that gr_osview (and regular
osview) record an unusual amount of time in the "system" category. When I am
lucky, about 10% of all processor time is consumed this way; when I am
unlucky, about 60% of all processor time is consumed this way.

Thanks to the superb DBX that SGI provides, I was able to isolate this
activity to the library routine _hsetlock() calling the system call
sginap(0). Very odd.

I fussed around for a while, and eventually determined that the routine
_hsetlock() only tries to acquire the hardware interlock 20 times (in a
*very* tight loop) before giving up and calling sginap(0). This constant
of 20 would seem to come from the <ulocks.h> definition of _USDEFSPIN:

	#define _USDEFSPIN 20 /* default spin for lock */

Suspecting the worst, I wrapped my calls to the library locking routines
with my own spin-lock check first, and got an ENORMOUS speedup -- virtually
all the system time went away.

I would therefore request that in the next IRIX release, either
(a) the built-in constant be chosen so that the system call isn't performed
    until at least 1 microsecond of looping has passed, or
(b) this constant be made user-settable, perhaps via the usconfig() call.

I suppose that this should be sent to the hotline, but I'm working nights
this week, so you get E-mail instead. Somebody at SGI please forward this
to the right folk(s).
	Best,
	 -Mike

-----------

PS: For the curious, here is a chunk of the code I'm using in order to
handle the locks on the SGI:

#ifdef SGI_4D
# include <sys/types.h>
# include <sys/prctl.h>
# include <ulocks.h>

/* An array rather than a pointer to a literal: mktemp() rewrites it in place */
static char	lockfile[] = "/usr/tmp/rtmplockXXXXXX";
static usptr_t	*lockstuff = 0;

void
RES_INIT(p)
register int *p;
{
	register int	i = p - (&rt_g.res_syscall);
	ulock_t		ltp;

	if( !rt_g.rtg_parallel )  return;
	if (lockstuff == 0) {
		(void)mktemp(lockfile);
		if( rt_g.debug & DEBUG_PARALLEL )  {
			if( usconfig( CONF_LOCKTYPE, _USDEBUGPLUS ) == -1 )
				perror("usconfig CONF_LOCKTYPE");
		}
		lockstuff = usinit(lockfile);
		if (lockstuff == 0) {
			fprintf(stderr, "RES_INIT: usinit(%s) failed, unable to allocate lock space\n", lockfile);
			exit(2);
		}
	}
	ltp = usnewlock(lockstuff);
	if (ltp == 0) {
		fprintf(stderr, "RES_INIT: usnewlock() failed, unable to allocate another lock\n");
		exit(2);
	}
	*p = (int) ltp;
	lock_usage[i] = 0;
}

void
RES_ACQUIRE(ptr)
register int *ptr;
{
	register int	i = ptr - (&rt_g.res_syscall);

	if( !rt_g.rtg_parallel )  return;

	/* Attempt to reduce frequency of library calling sginap() */
	if( lock_busy[i] )  {
		lock_spins[i]++;		/* non-interlocked */
		while( lock_busy[i] )
			lock_waitloops[i]++;
	}
	ussetlock((ulock_t) *(ptr));
	lock_busy[i] = 1;
	lock_usage[i]++;			/* interlocked */
}

void
RES_RELEASE( ptr )
register int *ptr;
{
	register int	i = ptr - (&rt_g.res_syscall);

	if( !rt_g.rtg_parallel )  return;
	lock_busy[i] = 0;			/* interlocked */
	usunsetlock((ulock_t) *(ptr));
}
#endif /* SGI 4D */

PPS: The 4D/280 is **fast**!
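
For readers without an IRIX machine, the behaviour described above (probe the lock a fixed number of times, then fall into the kernel) and request (b) (make that number settable) can be sketched in portable C11. This is only an illustration under stated assumptions, not SGI's _hsetlock(): spin_lock_t, spin_lock_init(), spin_lock_acquire(), spin_lock_release() and run_demo() are hypothetical names, sched_yield() stands in for sginap(0), and the spin_limit field plays the role of _USDEFSPIN. As in the wrapper in the PS, the lock word is read with an ordinary load first and the interlocked operation is attempted only when the lock looks free.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

/* Hypothetical lock type for illustration; this is not the IRIX ulock_t. */
typedef struct {
    atomic_int  held;        /* 0 = free, 1 = taken */
    int         spin_limit;  /* failed probes allowed before yielding (cf. _USDEFSPIN) */
    atomic_long yields;      /* how many times a waiter gave up and yielded */
} spin_lock_t;

static void spin_lock_init(spin_lock_t *l, int spin_limit)
{
    atomic_init(&l->held, 0);
    atomic_init(&l->yields, 0);
    l->spin_limit = spin_limit > 0 ? spin_limit : 1;
}

static void spin_lock_acquire(spin_lock_t *l)
{
    int probes = 0;

    for (;;) {
        /* Plain read first; try the interlocked exchange only if the lock looks free. */
        if (atomic_load_explicit(&l->held, memory_order_relaxed) == 0 &&
            atomic_exchange_explicit(&l->held, 1, memory_order_acquire) == 0)
            return;

        /* After spin_limit failed probes, give the processor away. */
        if (++probes >= l->spin_limit) {
            atomic_fetch_add_explicit(&l->yields, 1, memory_order_relaxed);
            sched_yield();   /* assumption: stand-in for sginap(0) */
            probes = 0;
        }
    }
}

static void spin_lock_release(spin_lock_t *l)
{
    atomic_store_explicit(&l->held, 0, memory_order_release);
}

/* Shared state for the demonstration below. */
struct demo {
    spin_lock_t lock;
    long        counter;     /* protected by lock */
    int         iters;
};

static void *worker(void *arg)
{
    struct demo *d = arg;
    int i;

    for (i = 0; i < d->iters; i++) {
        spin_lock_acquire(&d->lock);
        d->counter++;
        spin_lock_release(&d->lock);
    }
    return NULL;
}

/* Run nthreads workers (capped at 16) and return the final counter value. */
long run_demo(int nthreads, int iters, int spin_limit)
{
    enum { MAX_THREADS = 16 };
    pthread_t tid[MAX_THREADS];
    struct demo d;
    int i;

    if (nthreads > MAX_THREADS)
        nthreads = MAX_THREADS;
    spin_lock_init(&d.lock, spin_limit);
    d.counter = 0;
    d.iters = iters;

    for (i = 0; i < nthreads; i++) {
        int rc = pthread_create(&tid[i], NULL, worker, &d);
        assert(rc == 0);
    }
    for (i = 0; i < nthreads; i++)
        pthread_join(tid[i], NULL);
    return d.counter;
}
```

If the lock is doing its job, run_demo(n, iters, limit) returns n * iters whatever the limit is. Raising spin_limit trades more busy-waiting for fewer trips into the kernel, which is the trade-off the fixed value of 20 decides on the caller's behalf.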