[net.unix-wizards] Problems with alarm

johan@philmds.UUCP (johan) (05/22/84)

Due to a bug in the kernel it may happen that an alarm
signal/interrupt is processed after the execution of alarm(0).
This may cause problems as I will demonstrate.

Several system calls, such as read/write on a character device, may
block for a long time or forever.
The wait can be bounded by using alarm()/signal(), as in

	onalarm()
	{
		signal(SIGALRM, onalarm);
	A:
	}

	{
		...
		signal(SIGALRM, onalarm);
		alarm(2);
	B:
		i = read(fd, buf, sizeof(buf));
		alarm(0);
		if (i == -1 && errno == EINTR) {
			...  /* handle timeout */
		} else {
			...  /* handle data read */
		}
		...
	}

This scheme may fail on busy systems if your process is suspended for
several seconds at label B: the alarm then goes off before read() is
entered, and nothing is left to interrupt the read. This can be cured by
restarting the timer in the routine onalarm() by adding the statement

		alarm(2);

at label A.

None of this is new; it is used in many places in existing code.
The problem is that even this last version does not behave properly.
In all UNIX kernels I have seen, it is possible for a SIGALRM signal
to be processed when alarm(0) returns from system to user mode.
The handler then restarts the timer, which may cause any later
interruptible system call in that process to fail.

A short description of the sequence of events that makes this happen:
 - the kernel is processing the alarm(0) system call
 - a clock interrupt occurs
 - the statement 'if (++lbolt >= HZ)' in clock() happens to be true
 - BASEPRI(ps) is 0, so p_clktim processing is done in clock()
 - for our process p_clktim happens to reach 0, so psignal() is called
 - the appropriate bit for SIGALRM in p_sig is set
 - alarm(0) processing is continued
 - just before returning to user mode p_sig is checked and will cause
   the SIGALRM signal to be processed.
BINGO.
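
The sequence above can be replayed in a small user-space model. This is only a sketch and not kernel source: the field names p_clktim and p_sig come from the post, while struct model_proc, the model_*() functions and the clock_interrupt flag are invented for illustration, and the clock interrupt is simply assumed to arrive between reading the old timer value and storing the new one.

```c
#include <assert.h>
#include <signal.h>

#define ALRM_BIT (1UL << (SIGALRM - 1))

/* Only the two per-process fields that matter here. */
struct model_proc {
	unsigned      p_clktim;	/* seconds until SIGALRM, 0 = no alarm */
	unsigned long p_sig;	/* pending signal bits */
};

/* The once-per-second part of clock(): count down, post at zero. */
static void model_clock(struct model_proc *p)
{
	if (p->p_clktim != 0 && --p->p_clktim == 0)
		p->p_sig |= ALRM_BIT;	/* what psignal() does */
}

/* alarm(deltat); clock_interrupt != 0 simulates a clock interrupt
 * arriving in the middle of the system call. */
static unsigned model_alarm(struct model_proc *p, unsigned deltat,
			    int clock_interrupt)
{
	unsigned old = p->p_clktim;

	if (clock_interrupt)
		model_clock(p);
	p->p_clktim = deltat;	/* timer is cleared, p_sig is not */
	return old;
}

/* Check made just before returning to user mode:
 * 1 if SIGALRM would now be delivered, else 0. */
static int model_return_to_user(struct model_proc *p)
{
	if (p->p_sig & ALRM_BIT) {
		p->p_sig &= ~ALRM_BIT;
		return 1;
	}
	return 0;
}
```

With p_clktim at 1 and an interrupt during model_alarm(&p, 0, 1), the call reports 1 second remaining even though the signal has already been posted, and model_return_to_user() then delivers it.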

Two possible solutions I can think of:

1 - Change the kernel to avoid this strange behaviour.
    The p_sig bit corresponding to SIGALRM must be reset for alarm(0).
    The statement:
	p->p_clktim = uap->deltat;
    must be replaced by
	if ((p->p_clktim = uap->deltat) == 0)
		p->p_sig &= ~(1<<(SIGALRM-1));
    Notice that this code is not surrounded by any spl()'s !!!!
    Adding a spl1()/spl0() pair may result in a more consistent return
    value from alarm().
    Always resetting p_sig is possible if the spl()'s are added.

2 - For the time being the above piece of program can be modified
    slightly. The trick is to maintain a variable indicating when the
    timer must be restarted in onalarm(). Clear this variable
    just before the alarm(0). The modified example looks like:

	int	keepticking;
	
	onalarm()
	{
		signal(SIGALRM, onalarm);
		if (keepticking)
			alarm(2);
	}

	{
		...
		signal(SIGALRM, onalarm);
		keepticking = 1;
		alarm(2);
		i = read(fd, buf, sizeof(buf));
		keepticking = 0;
		alarm(0);
		if (i == -1 && errno == EINTR) {
			...  /* handle timeout */
		} else {
			...  /* handle data read */
		}
		...
	}

    A suggested alternative:
	while (alarm(0) == 0)
		;
    fails because the return value cannot be trusted unless the
    spl()'s are added.
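
The handler half of the keepticking workaround can be sketched for a present-day POSIX system as follows. The volatile sig_atomic_t type and the omission of the signal() call inside the handler are assumptions that go beyond the original post: the first is the type C guarantees can be safely shared with a signal handler, and the second presumes the handler was installed with sigaction(), which keeps it installed.

```c
#include <assert.h>
#include <signal.h>
#include <unistd.h>

/* Set to 1 while a timed call is in progress, 0 otherwise. */
static volatile sig_atomic_t keepticking;

static void onalarm(int sig)
{
	(void)sig;
	/* A stale SIGALRM delivered after alarm(0) finds the flag
	 * cleared and therefore does not restart the timer. */
	if (keepticking)
		alarm(2);
}
```

The caller sets keepticking = 1 before alarm(2) and clears it just before alarm(0), in the same order as in the example above.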



NOTE: We haven't seen this problem in real life, but came across the
possibility during some discussions. It would be appreciated if anyone
struck by this phenomenon could confirm.

				Johan Stevenson,
				Philips S&I, T&M, PMDS,
				Building TQV-5,
				Eindhoven, The Netherlands.
				phone: +31 40 784736
				uucp: mcvax!philmds!johan

bruce@ism780.UUCP (05/25/84)

ism780!bruce    May 24 11:59:00 1984

A much simpler solution is to insert right after the alarm(0) statement
the following statement:

	signal(SIGALRM, SIG_IGN);

Be sure to insert it after the alarm(0), not before it!

jm@wlbr.UUCP (05/26/84)

johan@philmds complains that bracketing a read/write with alarm()s
may fail because the SIGALRM may come in even after an alarm(0) call.

My customary way of preventing this is to ignore the SIGALRM before
calling alarm(0).

	onalarm()
	{
		signal(SIGALRM, onalarm);
		alarm(2);
	A:
	}

	{
		...
		signal(SIGALRM, onalarm);
		alarm(2);
	B:
		i = read(fd, buf, sizeof(buf));
		signal(SIGALRM, SIG_IGN);	/* <<<< ADDED LINE */
		alarm(0);
		if (i == -1 && errno == EINTR) {
			...  /* handle timeout */
		} else {
			...  /* handle data read */
		}
		...
	}

In this manner, even if the SIGALRM is given to the process, it will do
no harm.
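
A small sketch of that ordering, ignore first and then cancel, as it behaves on a present-day POSIX system. The helper name cancel_timeout() and the fired flag are hypothetical and exist only so the effect can be observed; nothing here says how the 1984 kernels under discussion treat a signal that is already pending when it becomes ignored.

```c
#include <assert.h>
#include <signal.h>
#include <unistd.h>

/* Set by the handler so a caller can tell whether it ran. */
static volatile sig_atomic_t fired;

static void onalarm(int sig)
{
	(void)sig;
	fired = 1;
}

static void cancel_timeout(void)
{
	signal(SIGALRM, SIG_IGN);	/* first: a late SIGALRM is discarded */
	alarm(0);			/* then: stop the timer */
}
```

After cancel_timeout() a SIGALRM that still reaches the process never runs onalarm(), so it cannot restart the timer. The handler has to be installed again before the next timed call.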

		Jim Macropol
		{scgvaxd,ihnp4,trwrb,vortex}!wlbr!jm

andyb@dartvax.UUCP (Andy Behrens) (10/23/85)

Does anybody know why...

On XENIX 3.2e, the sleep routine calls signal, passing the routine to
execute on an alarm call, then calls alarm.  The routine to execute
mysteriously executes the following 286 code before returning:

	push di
	push si
	pop  si
	pop  di

Why?  I can't get timer traps to work.  Must I do something mysterious
like this?  I am getting segmentation violations on the Altos 2086.  I
am calling signal( SIGALRM, xxx ), where xxx is set to routine(), and
routine is:

	int routine() {
		flag = TRUE;
	}

P.S.  This works MOST of the time, but sometimes produces a segmentation
violation.

P.P.S. I would dearly love to find someone who can answer such questions.  
Dealing with ALTOS is impossible!

					Andy Behrens

{astrovax,decvax,cornell,ihnp4,linus}!dartvax!andyb.UUCP
andyb@dartmouth.CSNET
andyb%dartmouth@csnet-relay.ARPA
RFD 1 Box 116, Union Village, Vt. 05043