[comp.unix.internals] alarm

kevin@cscnj (Kevin Walsh) (11/15/90)

I am working with an application which runs on Amdahl's UTS 1.2, which is
their port of UNIX System V. In this application there are numerous instances
where a blocking read is initiated on a message queue using "msgrcv" and a
time-out is implemented using the alarm () and signal () functions. The typical
scenario is like this:

msg_timeout ()
{
    timed_out = 1;
}
	.
	.
	.

	timed_out = 0;
	signal (SIGALRM, msg_timeout);
	alarm (5);

	msgrcv (q_id, buffer_addr, q_size, 0, 0);

	alarm(0);
	signal (SIGALRM, SIG_IGN);

	if (timed_out) {
	/* handling for time-outs */
	}
	.
	.
	.

Most of the time, everything works as expected; if the timer expires the
blocking read is interrupted and the alarm clock is cleared, otherwise an
incoming message is read and the blocking read returns and again the alarm 
clock is cleared. In both cases the code checks whether the time-out flag has been
set by the function handling the SIGALRM signal. 

The problem is that sometimes (and not always in the same module), the alarm
clock appears to expire at the instant it is being set. When this happens,
the alarm expires before the blocking read is even initiated. The result is
that the call to "msgrcv" blocks with no time-out -- until a message is received.
We have verified that this is happening by calling time() at various
points during this processing and printing the timestamps when the alarm is
set, when the signal-handling function is entered, and when the "msgrcv" 
function has returned. The timestamps printed are all the same time, to the
second.

I have posted this in the hope that someone else out there may have had or 
heard of this type of problem. Any comments or proposed solutions would be
appreciated. 

Thanks, Kevin

-- 
---------
Kevin Walsh  Computer Sciences Corporation  Piscataway, NJ  08854
..!rutgers.rutgers.edu!cscnj!kevin

kc@cbnewsl.att.com (keith.coulson) (11/15/90)

In article <1990Nov14.163443.4991@cscnj>, kevin@cscnj (Kevin Walsh) writes:
> I am working with an application which runs on Amdahl's UTS 1.2, which is
> their port of UNIX System V. In this application there are numerous instances
> where a blocking read is initiated on a message queue using "msgrcv" and a
> time-out is implemented using the alarm () and signal () functions. The typical
> scenario is like this:
> 
> msg_timeout ()
> {
>     timed_out = 1;
> }
> 	.
> 	.
> 	.
> 
> 	timed_out = 0;
> 	signal (SIGALRM, msg_timeout);
> 	alarm (5);
> 
> 	msgrcv (q_id, buffer_addr, q_size, 0, 0);
> 
> 	alarm(0);
> 	signal (SIGALRM, SIG_IGN);
> 
> 	if (timed_out) {
> 	/* handling for time-outs */
> 	}
> 	.
> 	.
> 	.
> 
> Most of the time, everything works as expected; if the timer expires the
> blocking read is interrupted and the alarm clock is cleared, otherwise an
> incoming message is read and the blocking read returns and again the alarm 
> clock is cleared. In both cases the code checks whether the time-out flag has been
> set by the function handling the SIGALRM signal. 
> 
> The problem is that sometimes (and not always in the same module), the alarm
> clock appears to expire at the instant it is being set. When this happens,
> the alarm expires before the blocking read is even initiated. The result is
> that the call to "msgrcv" blocks with no time-out -- until a message is received.

Looks like you are context switching after "alarm (5)" and your process
does not get back in until after the alarm expires.  This is one of several
race conditions associated with System V signals.

I haven't tried it, but this might help ...

 msg_timeout ()
 {
     msgflg = IPC_NOWAIT;
 }
	.
	.
	.

	msgflg = 0;
	signal (SIGALRM, msg_timeout);
	alarm (5);

	rc = msgrcv (q_id, buffer_addr, q_size, 0, msgflg); /* use msgflg */

	/* errno is only meaningful when msgrcv has failed (returned -1) */
	if (rc == -1 && (errno == EINTR || errno == ENOMSG)) {

		/* handling for time-outs */
	}
	else {
		alarm(0);
		signal (SIGALRM, SIG_IGN);
	}
	.
	.
	.

This will cause msgrcv to use IPC_NOWAIT if the timer expires before msgrcv
is called, and hence it will not block if there is no message.
It won't fix the problem altogether, but it should reduce its occurrence
a lot.

You could also try raising the process priority for the alarm and
msgrcv calls.
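
If neither the flag trick nor a priority change is dependable enough, another
option is to drop SIGALRM entirely and poll the queue with IPC_NOWAIT until a
deadline passes. The sketch below is an assumption-laden illustration, not
something tested on UTS: msgrcv_poll is a hypothetical name, the 50 ms polling
interval is arbitrary, and nanosleep is a POSIX call that an older System V
port may lack (sleep(1) would be a coarser stand-in).

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>
#include <time.h>

/*
 * Hypothetical helper: poll the queue without blocking until a message
 * arrives or roughly `seconds` have elapsed. No signals are involved,
 * so there is no alarm that can fire at the wrong moment.
 * Returns bytes received, or -1 with errno set to ETIMEDOUT on a
 * time-out, or -1 with msgrcv's errno on other failures.
 */
ssize_t msgrcv_poll(int q_id, void *buf, size_t size, long type,
                    unsigned int seconds)
{
    const struct timespec pause_len = { 0, 50 * 1000 * 1000 };  /* 50 ms */
    time_t deadline = time(NULL) + (time_t)seconds;

    for (;;) {
        ssize_t n = msgrcv(q_id, buf, size, type, IPC_NOWAIT);

        if (n != -1 || errno != ENOMSG)
            return n;                   /* a message, or a real error */

        if (time(NULL) >= deadline) {
            errno = ETIMEDOUT;
            return -1;
        }
        nanosleep(&pause_len, NULL);    /* back off before the next poll */
    }
}
```

The cost is some wasted wake-ups and a time-out that is only accurate to
about a second, since time() has one-second resolution. In exchange, the
call can never block indefinitely on an empty queue.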