kevin@cscnj (Kevin Walsh) (11/15/90)
I am working with an application which runs on Amdahl's UTS 1.2, which is
their port of UNIX System V. In this application there are numerous instances
where a blocking read is initiated on a message queue using "msgrcv" and a
time-out is implemented using the alarm () and signal () functions. The typical
scenario is like this:

    msg_timeout ()
    {
        timed_out = 1;
    }
    .
    .
    .

    timed_out = 0;
    signal (SIGALRM, msg_timeout);
    alarm (5);

    msgrcv (q_id, buffer_addr, q_size, 0, 0);

    alarm(0);
    signal (SIGALRM, SIG_IGN);

    if (timed_out) {
        /* handling for time-outs */
    }
    .
    .
    .

Most of the time, everything works as expected: if the timer expires, the
blocking read is interrupted and the alarm clock is cleared; otherwise an
incoming message is read, the blocking read returns, and again the alarm clock
is cleared. In both cases the code checks whether the time-out flag has been
set by the function handling the SIGALRM signal.

The problem is that sometimes (and not always in the same module), the alarm
clock appears to expire at the instant it is being set. When this happens,
the alarm expires before the blocking read is even initiated. The result is
that the call to "msgrcv" blocks with no time-out until a message is received.

We have verified that this is happening by calling time() at various points
during this processing and printing the timestamps when the alarm is set, when
the signal-handling function is entered, and when the "msgrcv" function has
returned. The timestamps printed are all the same time, to the second.

I have posted this in the hope that someone else out there may have had or
heard of this type of problem. Any comments or proposed solutions would be
appreciated.

Thanks,
Kevin
--
---------
Kevin Walsh
Computer Sciences Corporation
Piscataway, NJ 08854
..!rutgers.rutgers.edu!cscnj!kevin
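For readers on a current POSIX system, the pattern above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the poster's code: it assumes Linux-style System V message queues, swaps the post's signal() for sigaction() without SA_RESTART (so the handler stays installed and an interrupted msgrcv() fails with EINTR), and the names recv_with_timeout and struct demo_msg are hypothetical. It shows the path that normally works, with a comment marking the window in which an early alarm leaves msgrcv() blocked.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/ipc.h>
#include <sys/msg.h>
#include <sys/types.h>
#include <unistd.h>

/* Set by the SIGALRM handler; volatile sig_atomic_t is the portable
   type for a flag shared between a handler and the main program. */
static volatile sig_atomic_t timed_out = 0;

static void msg_timeout(int sig)
{
    (void)sig;
    timed_out = 1;
}

/* Hypothetical message layout: msgrcv() expects a leading long mtype. */
struct demo_msg {
    long mtype;
    char mtext[64];
};

/* Hypothetical helper: returns the number of bytes received, or -1 with
   errno == EINTR and timed_out == 1 if the alarm interrupted msgrcv(). */
static ssize_t recv_with_timeout(int q_id, struct demo_msg *buf,
                                 unsigned int secs)
{
    struct sigaction sa;
    ssize_t n;
    int saved;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = msg_timeout;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;              /* no SA_RESTART: msgrcv() fails with EINTR */
    sigaction(SIGALRM, &sa, NULL);

    timed_out = 0;
    alarm(secs);
    /* Race window: if the alarm is delivered here, before msgrcv() enters
       the kernel, the handler runs first and the call below then blocks
       with no time-out, which is the symptom described in the post. */
    n = msgrcv(q_id, buf, sizeof buf->mtext, 0, 0);
    saved = errno;                /* keep msgrcv()'s errno for the caller */
    alarm(0);
    signal(SIGALRM, SIG_IGN);
    errno = saved;
    return n;
}
```

Called on an empty queue created with msgget(IPC_PRIVATE, ...) and secs = 1, the helper should return -1 with errno set to EINTR and timed_out set after about a second; with a message already queued it returns the message size and timed_out stays 0.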
kc@cbnewsl.att.com (keith.coulson) (11/15/90)
In article <1990Nov14.163443.4991@cscnj>, kevin@cscnj (Kevin Walsh) writes:
> The problem is that sometimes (and not always in the same module), the alarm
> clock appears to expire at the instant it is being set. When this happens,
> the alarm expires before the blocking read is even initiated. The result is
> that the call to "msgrcv" blocks with no time-out until a message is received.

Looks like you are context switching after "alarm (5)", and your process does
not get back in until after the alarm has expired. This is one of several race
conditions associated with System V signals.

I haven't tried it, but this might help:

    msg_timeout ()
    {
        msgflg = IPC_NOWAIT;
    }
    .
    .
    .
    msgflg = 0;
    signal (SIGALRM, msg_timeout);
    alarm (5);

    /* use msgflg; check the return value before looking at errno */
    if (msgrcv (q_id, buffer_addr, q_size, 0, msgflg) == -1 &&
        (errno == EINTR || errno == ENOMSG)) {
        /* handling for time-outs */
    } else {
        alarm(0);
        signal (SIGALRM, SIG_IGN);
    }
    .
    .
    .
This will cause msgrcv to use IPC_NOWAIT if the timer expires before msgrcv is
called, and hence it will not block if there is no message. It won't fix the
problem altogether, but it should reduce its occurrence a lot. You could also
try raising the process priority for the alarm and msgrcv calls.
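The suggested IPC_NOWAIT fallback can be sketched the same way. This is a hedged illustration, not the poster's tested code: recv_nowait_on_timeout, struct demo_msg and the simulate_late_start parameter are hypothetical, sigaction() replaces signal(), and raise(SIGALRM) stands in for the process being switched out until after the alarm has expired. msgrcv()'s return value is checked before errno is examined, since errno is only meaningful after a failed call.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/ipc.h>
#include <sys/msg.h>
#include <sys/types.h>
#include <unistd.h>

/* Read just before msgrcv(); the handler switches it to IPC_NOWAIT so a
   msgrcv() that starts after the alarm has fired does not block. */
static volatile sig_atomic_t msgflg = 0;

static void msg_timeout(int sig)
{
    (void)sig;
    msgflg = IPC_NOWAIT;
}

/* Hypothetical message layout: msgrcv() expects a leading long mtype. */
struct demo_msg {
    long mtype;
    char mtext[64];
};

/* Hypothetical helper. If simulate_late_start is non-zero, SIGALRM is
   raised before msgrcv() to mimic the alarm expiring while the process
   was switched out. *timed_out is set when the call failed because of
   the alarm (EINTR) or because IPC_NOWAIT found no message (ENOMSG). */
static ssize_t recv_nowait_on_timeout(int q_id, struct demo_msg *buf,
                                      unsigned int secs,
                                      int simulate_late_start,
                                      int *timed_out)
{
    struct sigaction sa;
    ssize_t n;
    int saved;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = msg_timeout;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;              /* no SA_RESTART */
    sigaction(SIGALRM, &sa, NULL);

    msgflg = 0;
    alarm(secs);
    if (simulate_late_start)
        raise(SIGALRM);           /* handler runs before raise() returns */
    n = msgrcv(q_id, buf, sizeof buf->mtext, 0, (int)msgflg);
    saved = errno;
    alarm(0);
    signal(SIGALRM, SIG_IGN);

    /* Only inspect errno when msgrcv() actually failed. */
    *timed_out = (n == -1 && (saved == EINTR || saved == ENOMSG));
    errno = saved;
    return n;
}
```

With simulate_late_start set on an empty queue, the call should return -1 with errno set to ENOMSG immediately instead of blocking. The remaining window is the one the post concedes: msgflg is read when msgrcv()'s arguments are evaluated, so an alarm delivered after that read but before the system call starts still leaves the call blocked with no time-out.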