[net.bugs.4bsd] Losing time?

gill (12/23/82)

I don't know if this bug has appeared in netnews before, but
here it is anyway:

	When the number of processes running or sleeping under 4.1 approaches
90, the system begins to lose clock ticks, so the system's time falls behind.

The bug is in sys/clock.c:

		for (;;) {
			s = spl7();
			if ((p1 = calltodo.c_next) == 0 || p1->c_time > 0)
				break;

should be:

		for (;;) {
			s = spl7();
			if ((p1 = calltodo.c_next) == 0 || p1->c_time > 0)
			{
				splx(s); 
				break;
			}


In the unfixed case, after hitting the end of the callout list (or an
entry that is not yet due), the system would run the rest of the "soft"
clock procedures at spl7, preventing clock interrupts. In particular, if
the number of processes was large, the process priority recomputation
would increase the time spent at spl7 past a 60th of a second, and bingo:
a clock tick is lost.
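
For anyone who wants to see the effect without a kernel at hand, here is
a rough user-space sketch of the loop. Everything in it is a stand-in:
the spl7() and splx() below only set a variable, struct callout carries
just the two fields the test looks at, and the loop body after the test
(unlink the entry, drop priority) is my assumption, since only the top
of the loop is quoted above. The name run_callouts and its "fixed" flag
are made up for the illustration.

```c
#include <stddef.h>

/* Mock interrupt priority level; the real spl7()/splx() change the CPU's IPL. */
static int ipl = 0;

static int spl7(void) { int old = ipl; ipl = 7; return old; }
static void splx(int s) { ipl = s; }

/* Stand-in for a kernel callout entry: only the fields the loop tests. */
struct callout {
	int c_time;
	struct callout *c_next;
};

/*
 * Walk the list the way the quoted loop does and return the priority
 * level left behind. fixed == 0 models the original code, fixed != 0
 * models the patched code.
 */
int run_callouts(struct callout *head, int fixed)
{
	struct callout *p1;
	int s;

	ipl = 0;
	for (;;) {
		s = spl7();
		if ((p1 = head->c_next) == 0 || p1->c_time > 0) {
			if (fixed)
				splx(s);	/* the added line from the fix */
			break;
		}
		head->c_next = p1->c_next;	/* assumed: unlink the expired entry */
		splx(s);			/* assumed: drop priority before running it */
		/* a real kernel would call the entry's function here */
	}
	return ipl;
}
```

With one expired entry followed by one that is not yet due,
run_callouts(&head, 0) comes back with the mock level still at 7, while
run_callouts(&head, 1) comes back at 0.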

It is useful to add the following code, also in sys/clock.c, to detect
such problems in the future:


	/*
	 * reprime clock
	 */

>	if (mfpr(ICCS) & ICCS_ERR)			<	NEW
>		printf ("\nLost Clock Interrupt\n");	<	CODE

	clkreld();

Before each reprime, this simply checks the hardware error bit in the
interval timer, which indicates that the last interrupt was not serviced
in time.
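
The idea behind the check can be sketched with a toy timer. This assumes
the timer holds a single pending interrupt and raises its error flag when
another overflow arrives before that one has been acknowledged. The
struct and function names are hypothetical and are not the real ICCS
register layout.

```c
/* Toy interval timer: one pending-interrupt latch plus an error flag. */
struct toy_timer {
	int pending;	/* an interrupt was raised and not yet acknowledged */
	int err;	/* another overflow arrived while one was still pending */
};

/* The counter overflowed: raise an interrupt, or flag an error if one is pending. */
void timer_overflow(struct toy_timer *t)
{
	if (t->pending)
		t->err = 1;
	t->pending = 1;
}

/* What the handler does at reprime: note whether a tick was lost, then clear both bits. */
int timer_reprime(struct toy_timer *t)
{
	int lost = t->err;

	t->pending = 0;
	t->err = 0;
	return lost;
}
```

One overflow followed by a reprime returns 0. Two overflows with no
reprime in between make the next reprime return 1, which is the case the
printf above would report.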

Credit goes to alice!ark for fixing this before I did; a check of
the code on alice shows that it was already repaired there.

Sorry if this is a repeat; I've unfortunately not been keeping up with the
traffic on unix-wizards and the BSD bug groups.

	May all your anachronisms be creative,

		Gill Pratt

		...alice!gill OR gill@mc