jpl@allegra.UUCP (John P. Linderman) (11/24/84)
>In review: the subject is the hung uucico processes that we have here
>at astrovax and godot. This is when running rtiuucp under 4.2 BSD.
>A typical hang point is main()/conn()/Acuopn()/vadopn()/expect()/read().
>allegra has also reported such problems with honey danber under 4.2 BSD.

To be more explicit, Phil Karn and I would occasionally find a hung
honey danber uucico. The processes were not always hung at the same
point. After tweaking pstat to provide a little extra information, we
found that the common theme was that each process had an alarm pending,
but with alarms masked off. Honey danber does not mask alarms off, so
the problem appears to be a race somewhere in the 4.2 signal code that
sometimes fails to restore the signal mask after a longjmp out of an
alarm handler. We replaced the alarm calls with a macro sequence that
explicitly unblocks SIGALRM in the signal mask before calling alarm.
We haven't seen a hung uucico since.

In summary, honey danber seems to be guilty only of exercising alarms
rather more heavily than typical programs, thereby exposing problems in
the underlying 4.2 operating system. We didn't have problems with honey
danber under 4.2; we exposed problems with 4.2 through honey danber.

John P. Linderman
Department of Alarming Errors
allegra!jpl
chris@umcp-cs.UUCP (Chris Torek) (11/28/84)
Actually, there *is* a bug in the 4.2BSD sleep; it's just that the bug
isn't a missing sigsetmask() but two missing calls (sigblock() and
sigsetmask()).

It is *always* *always* *always* a mistake to call sigpause() to await
a signal when the new mask is trying to unblock a signal that has never
been blocked. (Often it works, but it is still a mistake.) The reason is
simple: if the signal is not blocked now, it might arrive between the C
``sigpause()'' call and the actual entry into system code, and sigpause()
will then wait for a signal that has already come and gone. System calls
are not atomic operations (at user level) until kernel code is entered;
that's *why* sigpause is *defined* as ``atomically set signal mask and
await signal''.

(Gee, maybe we should kludge up the kernel to gripe about sigpause()s
that don't release any signals, giving the name of the offending
program... :-) )
-- 
(This line accidentally left nonblank.)

In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (301) 454-7690
UUCP: {seismo,allegra,brl-bmd}!umcp-cs!chris
CSNet: chris@umcp-cs
ARPA: chris@maryland