gdb@hare.udev.cdc.com (Jerry Branham) (05/30/90)
> Does anyone have a handle on the level of locking that is necessary in
> the kernel in order to support mp hardware? In other words, at what point
> does it stop paying off to go from a single-threaded kernel to one where
> every shared structure is controlled by a spin lock?

Many respondents addressed an MP problem where the number of processors is very large - ~250 processors. I meant it for around 10 or fewer. Respondents seemed to agree that the master/slave approach is not useful for a production system, but the degree of fine-grain locking necessary to avoid contention was not spelled out. In some cases, atomic operations were useful. You don't want to put spin locks on all structures: some structures need more than one lock, and some groups of structures need no more than one lock per group. Good advice was: "Think, measure, read, measure, think, then code."

Many comments were based on intuition, and talked about probable collisions and about interlocks on table entries rather than on the entire table. In general, there were no specific numbers on the performance improvement of specific benchmarks when a kernel was more finely interlocked. Many studies measured wait times etc. but did not relate them to total system throughput. For example, if a number of CPUs have to wait on a structure and the disk is completely saturated anyway, there is no loss in time for the job mix as long as the CPUs remain available to CPU-saturated jobs (monitor-interlocked tasks that can sleep).

> 1) Based on your answer, was such a low level of granularity found to be
> necessary from benchmark experience or from programmer experience and
> knowledge of the code?

Both; lock-contention studies were done to see which large-grain locks needed to be refined.
> 2) In order to demonstrate the advantage of low-level locking, are there
> any benchmarks, or instructions to produce benchmarks, that will make a
> high-level interlocked kernel perform badly while a lower-granularity
> kernel will run faster - in CPU usage and/or real time?

Most answers were parallel makes. Real parallel programs were NOT used. The dining philosophers' problem in Ada (I am not familiar with this one) was recommended, as was terminal I/O if you are running more than one line at high speed. Certain special benchmarks were run to test specific parts of the kernel. One person said that "People from Encore may have published on this subject."

Kernel profiling should use a microsecond clock and cannot be excluded even by spl6() or spl7() calls.

Thanks for the comments, and of course, coarse is spelled coArse (sorry about that).

(I speak for me only, etc.)
Jerry Branham (612) 482-3853
e-mail gdb@kronos.udev.cdc.com
aglew@dwarfs.csg.uiuc.edu (Andy Glew) (05/31/90)
>Kernel profiling should use a microsecond clock and cannot be
>excluded even by spl6() or spl7() calls.

But if the timer sampling tick is excluded by some spl level, you can still get useful information: if you are using flat profiling (instead of per-procedure gprof-style profiling), the blocked timer ticks accumulate at the splx() or spl0() that unblocks the high spl. So, while you don't get too much detail, you can at least tell which (end of a) critical section took up the time. (Of course, if you unblock only on return to the user, you lose.)
--
Andy Glew, aglew@uiuc.edu