[net.unix-wizards] bug with setitimer on 4.2 BSD?

mr-frog@sdcsvax.UUCP (03/30/84)

I've been experimenting recently with the system call setitimer,
and I was quite distressed to find that after several successive
calls (around 300 or so) it stopped working!  What the call is supposed
to do is to send the process an alarm (SIGALRM) once the requested
number of microseconds has passed.  It works just fine for a while,
but at some (seemingly random) point it simply fails to send that
SIGALRM, and the program waiting for it hangs.

Am I using setitimer properly?

Here is the code I ran to test and duplicate the problem:
/*************************************************************************/
#include <stdio.h>
#include <sys/time.h>
#include <signal.h>

struct	itimerval timer;
int		wakeup;

main() {
	int	alarmtrap();
	register int i;

	signal(SIGALRM, alarmtrap);
	timer.it_interval.tv_sec = 0;
	timer.it_interval.tv_usec = 0;
	timer.it_value.tv_sec = 0;
	wakeup = 0;
	for (i=0; i<1000; i++) {
		timer.it_value.tv_usec = 30000;
		setitimer(ITIMER_REAL, &timer, NULL);
		sigpause(0);
		printf("alarm %d\n", wakeup);
	}
}

alarmtrap() {
	wakeup++;
}
/*******************************************************************/


Dave Pare
ucbvax!sdcsvax!mr-frog

muller@sdccsu3.UUCP (03/31/84)

There is no problem with setitimer on 4.2; this is in fact a classic
critical section problem. The important point in this example is that the
ITIMER_REAL period is small: 30 milliseconds. The call to setitimer causes a
context switch to the kernel to start the counter. The counter is set and
another context switch is performed back to the program. Once back in the
program, the sigpause is executed by performing another context switch to
the kernel to wait for the signal from the counter. This is fine for time
periods that are large in relation to the time required for the two
context switches (from the kernel to the program after the timer is set,
and from the program back to the kernel to execute the sigpause). However,
if the signal is delivered BEFORE the sigpause is executed, the handler is
invoked, control returns to the main program, the sigpause is executed,
and the program waits forever for a signal that will never arrive! (The
current ITIMER_REAL has already expired.)
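
The lost wakeup described above can be reproduced without hanging. The
following is only a sketch, assuming a POSIX system with sigaction and
getitimer rather than the 4.2 BSD calls; the names alarm_lost_before_wait,
on_alarm and fired are made up for illustration. A busy loop stands in for
the slow return from setitimer, and getitimer then shows that the one-shot
timer is already disarmed, so a wait entered at that point would have
nothing left to wake it.

```c
#define _XOPEN_SOURCE 700   /* assumption: expose POSIX/XSI declarations */
#include <signal.h>
#include <string.h>
#include <sys/time.h>

static volatile sig_atomic_t fired;

static void on_alarm(int signo)
{
	(void)signo;
	fired = 1;
}

/*
 * Arm a one-shot ITIMER_REAL for usec microseconds (must be > 0), then
 * spin until the handler has run.  Returns 1 if the timer is disarmed
 * afterwards, 0 if it is still armed, -1 on error.
 */
int alarm_lost_before_wait(long usec)
{
	struct sigaction sa;
	struct itimerval tv, left;
	sigset_t alrm, saved;

	if (usec <= 0)
		return -1;

	memset(&sa, 0, sizeof sa);
	sa.sa_handler = on_alarm;
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGALRM, &sa, NULL) < 0)
		return -1;

	/* like the original program, run with SIGALRM unblocked */
	sigemptyset(&alrm);
	sigaddset(&alrm, SIGALRM);
	if (sigprocmask(SIG_UNBLOCK, &alrm, &saved) < 0)
		return -1;

	fired = 0;
	memset(&tv, 0, sizeof tv);
	tv.it_value.tv_sec = usec / 1000000;
	tv.it_value.tv_usec = usec % 1000000;
	if (setitimer(ITIMER_REAL, &tv, NULL) < 0)
		return -1;

	while (!fired)
		;	/* the "delay" during which SIGALRM slips in */

	/* a pause entered here would sleep: no alarm is left to arrive */
	if (getitimer(ITIMER_REAL, &left) < 0)
		return -1;
	sigprocmask(SIG_SETMASK, &saved, NULL);
	return left.it_value.tv_sec == 0 && left.it_value.tv_usec == 0;
}
```

Calling alarm_lost_before_wait(30000) corresponds to one pass of the loop
in the original program on an occasion when the alarm wins the race.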

As with every critical section problem, you can never assume any
"rate" at which a program executes. The "real time" that it takes the
system to complete a context switch must be considered an unpredictable
value. So what must be done is to prevent the SIGALRM from being
delivered after the ITIMER_REAL is set and before the sigpause
is executed. One way is to use sigblock to block SIGALRM. Sigblock
returns the old mask of blocked signals. Sigpause takes as its argument
a mask of signals to keep blocked while it waits. What sigpause does nicely
is to provide an ATOMIC operation (in one single context switch) that
installs the mask given as its argument and waits for a signal. When an
unmasked signal arrives, the signal mask that existed before the call to
sigpause is restored. (Note: I would not even attempt to do this with
System V unix).
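
The two properties relied on here (a blocked signal is held pending rather
than lost, and the atomic wait then delivers it) can be checked directly.
This is a minimal sketch, assuming the POSIX calls sigprocmask, sigpending
and sigsuspend as stand-ins for sigblock and sigpause; the names
blocked_alarm_is_held, count_alarm and hits are hypothetical.

```c
#define _XOPEN_SOURCE 700   /* assumption: expose POSIX declarations */
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t hits;

static void count_alarm(int signo)
{
	(void)signo;
	hits++;
}

/*
 * Returns 1 if a SIGALRM raised while blocked stays pending without
 * running the handler, and is then delivered exactly once by the
 * atomic wait; 0 if not; -1 on error.
 */
int blocked_alarm_is_held(void)
{
	struct sigaction sa;
	sigset_t block, old, wait_mask, pending;
	int held;

	memset(&sa, 0, sizeof sa);
	sa.sa_handler = count_alarm;
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGALRM, &sa, NULL) < 0)
		return -1;

	/* counterpart of oldmask = sigblock(...) */
	sigemptyset(&block);
	sigaddset(&block, SIGALRM);
	if (sigprocmask(SIG_BLOCK, &block, &old) < 0)
		return -1;
	wait_mask = old;
	sigdelset(&wait_mask, SIGALRM);	/* the wait must admit SIGALRM */

	hits = 0;
	raise(SIGALRM);			/* an alarm that arrives "early" */
	if (sigpending(&pending) < 0)
		return -1;
	held = sigismember(&pending, SIGALRM) == 1 && hits == 0;

	/* counterpart of sigpause(oldmask): unblock and wait in one step */
	sigsuspend(&wait_mask);

	sigprocmask(SIG_SETMASK, &old, NULL);
	return held && hits == 1;
}
```

Because the early alarm is still pending when the wait begins, the wait
returns at once instead of sleeping forever.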

So to make the program work, you block SIGALRM before the for loop,
saving the signal mask that is returned by sigblock. Now when setitimer
returns, no matter how long the context switches take, the SIGALRM is blocked.
So even if SIGALRM is already pending, the sigpause is guaranteed to execute
before the signal is delivered. The old mask returned by sigblock is used as
the signal mask that sigpause uses to wait for the SIGALRM (that mask does
NOT have SIGALRM masked). This provides protection of the "critical section"
(the code between the setitimer and the sigpause).

The following code is one way to protect the sigpause.

/*******************************************************************/
#include <stdio.h>
#include <sys/time.h>
#include <signal.h>

struct	itimerval timer;
int	wakeup;

main(argc, argv)
int argc;
char **argv;
{
	int	alarmtrap();
	register int i;
	int oldmask;

	signal(SIGALRM, alarmtrap);
	timer.it_interval.tv_sec = 0;
	timer.it_interval.tv_usec = 0;
	timer.it_value.tv_sec = 0;
	wakeup = 0;
	/*
	 * block off that nasty SIGALRM
	 */
	oldmask = sigblock(1 << (SIGALRM-1));
	for (i=0; i<1000; i++) {
		timer.it_value.tv_usec = 30000;
		setitimer(ITIMER_REAL, &timer, 0);
		/*
		 * critical section now protected from SIGALRM
		 */
		sigpause(oldmask);
	}
}

alarmtrap() {
	wakeup++;
	fprintf(stderr,"alarm %d\n", wakeup);
}
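
For readers on a system without sigblock and sigpause, the same pattern
can be sketched with the POSIX calls sigprocmask and sigsuspend. This is
an assumption-laden translation, not the code as posted: the name
wait_for_alarms is made up, the loop is wrapped in a function, and the
wait is repeated until the counter moves so that an unrelated signal
cannot be mistaken for the alarm.

```c
#define _XOPEN_SOURCE 700   /* assumption: expose POSIX/XSI declarations */
#include <signal.h>
#include <string.h>
#include <sys/time.h>

static volatile sig_atomic_t wakeups;

static void alarmtrap(int signo)
{
	(void)signo;
	wakeups++;
}

/*
 * Arm a one-shot timer of usec microseconds (must be > 0) count times,
 * waiting for each alarm.  Returns the number of alarms handled, or -1
 * on a setup error.
 */
int wait_for_alarms(int count, long usec)
{
	struct sigaction sa;
	struct itimerval timer;
	sigset_t block, oldmask, waitmask;
	int i, target;

	if (usec <= 0)
		return -1;

	memset(&sa, 0, sizeof sa);
	sa.sa_handler = alarmtrap;
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGALRM, &sa, NULL) < 0)
		return -1;

	/* block SIGALRM for the whole loop, keeping the previous mask */
	sigemptyset(&block);
	sigaddset(&block, SIGALRM);
	if (sigprocmask(SIG_BLOCK, &block, &oldmask) < 0)
		return -1;
	waitmask = oldmask;
	sigdelset(&waitmask, SIGALRM);

	memset(&timer, 0, sizeof timer);	/* it_interval 0: one shot */
	wakeups = 0;
	for (i = 0; i < count; i++) {
		timer.it_value.tv_sec = usec / 1000000;
		timer.it_value.tv_usec = usec % 1000000;
		if (setitimer(ITIMER_REAL, &timer, NULL) < 0)
			break;
		/* SIGALRM can only be delivered inside sigsuspend */
		target = wakeups + 1;
		while (wakeups < target)
			sigsuspend(&waitmask);
	}

	sigprocmask(SIG_SETMASK, &oldmask, NULL);
	return wakeups;
}
```

Calling wait_for_alarms(1000, 30000) from main corresponds to the loop
above. The handler here only counts; printing from a handler is left out
because stdio is not safe to call from signal context on modern systems.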



			Keith Muller
			UCSD Computer Center
			ucbvax!sdcsvax!sdccsu3!muller

zemon@felix.UUCP (04/09/84)

I had the same problem with the interval timer under 4.2.
Sometimes the ALRM signal just failed to arrive.  Anyone
with solutions to this problem, please send them to me also.

	Art Zemon
	FileNet Corp.
	...!{decvax, ucbvax}!trwrb!felix!zemon
	(714)966-2344