paul@mecazh.UUCP (Paul Breslaw) (01/17/90)
This problem cropped up in the context of Xlib, but could equally apply to any Unix library. Hence the posting to more than one group.

Our application (a CAM package on HP9000/3xx machines under HP-UX6.5 X11.R2) crashes sometimes when we handle a signal and return from the signal handler in a different context from the one in which the handler was entered. In other words we do a longjmp(3) from inside the handler. We found that this is an elegant way to design certain features into a program.

[ Those of you who might want to argue this assertion read on. Those who are prepared to accept it can skip to the end of this []'ed bit.

Our CAM package is a monolithic application running as a single process. Until Open Look or Motif is declared winner of the current X Look and Feel War, our application remains implemented using no tool kit, ie only pure Xlib calls.

A user of our package can start a computation/display operation that might take a long time to complete. We wanted to allow him to hit a key to stop it, which would take him back to an earlier point in the dialogue. There are a large number of such long operations, so we needed a fairly general mechanism.

We did not want to sprinkle calls to X arbitrarily in the code in the hope that they would provide a frequent enough poll. Neither did we want a signal handler to set a global flag and return normally, because that is simply the same polling problem in a different guise. You then have to sprinkle calls to check the global flag in the hope ... etc etc.

So we had to have a signal handler to implement the required asynchronousness, and it had to exit abnormally to achieve its end. ]

It is all the same, a pretty dangerous thing to do. This is especially so if the signal is allowed to interrupt any old bit of code that might be updating some data structure that is subsequently needed. And this, of course, is what happened when certain Xlib routines were interrupted.
Now good old BSD and friends (like Ultrix and HP-UX) offer a number of means for dealing with the problem.

1. Interrupted system calls can be identified, and restarted when (if) the signal handler returns normally.

2. The application can be defensively programmed so that system calls which can be interrupted or partially completed are correctly handled.

3. Critical regions can be created with sigblock(2) and sigsetmask(2) providing DISABLE and ENABLE capabilities.

Clearly 1 and 2 are fine for system calls, but useless for libraries.

That leaves 3 - but whose responsibility is it to defend the data in the library - the implementor or the user? I suppose someone out there will cry `caveat emptor', but there are literally hundreds of X calls. How do I know which ones are critical and which ones not? If I bracket all the ones I use, I will end up with ugly code that runs slowly (remember it's two system calls per X call).

Clearly this is a general problem, but I do not recall seeing anything about it on the net. Advice welcomed.

Paul Breslaw.
--
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Paul Breslaw, Mecasoft SA,          | telephone : 41 1 362 2040
Guggachstrasse 10, CH-8057 Zurich,  | e-mail    : mcsun!chx400!mecazh!paul
Switzerland.                        |             paul@mecazh.UUCP
gwyn@smoke.BRL.MIL (Doug Gwyn) (01/17/90)
In article <373@node17.mecazh.UUCP> paul@mecazh.UUCP (Paul Breslaw) writes:
>This is especially so if the signal is allowed to interrupt any old bit of
>code that might be updating some data structure that is subsequently needed.
>And this, of course, is what happened when certain Xlib routines were
>interrupted.

>That leaves 3 - but whose responsibility is it to defend the data in the
>library - the implementor or the user?

>Clearly this is a general problem, but I do not recall seeing anything
>about it on the net.

The relevant properties are reentrancy and noninterruptibility. These issues were recognized by the various standardization groups. For example, ANSI C requires that signal() be invokable within any signal handler, and that a signal handler function terminate only via return, abort(), exit(), or longjmp(). IEEE 1003.1 adds a large number of ("system call") functions that are required to be invokable reentrantly or else block signals during their operation (so that reentrance is not possible). The X/Open Portability Guide adds chroot() to this list and imposes these constraints on abort(), exit(), and longjmp() (which are therefore hard to implement!).

Note that stdio functions and other similar library functions were NOT so constrained, in order to avoid paying a run-time penalty on each use of these heavily-used functions. However, some vendors of multiprocessor implementations of UNIX have decided to go ahead and use semaphores to protect critical regions within such library functions, in order to prevent the kind of problem you encountered.

Unless the specification of a library function states that it is safe to abort or reenter it, you the application programmer should take steps to avoid doing so.
barmar@think.com (Barry Margolin) (01/18/90)
In article <373@node17.mecazh.UUCP> paul@mecazh.UUCP (Paul Breslaw) writes:
>That leaves 3 - but whose responsibility is it to defend the data in the
>library - the implementor or the user?

I think it *should* be the implementor's responsibility. However, given that most library implementors don't do so, it is effectively the user's responsibility.

The best situation would be for library implementors to protect their critical regions. Next best would be for them to document which routines have critical regions, so that the caller can bracket calls to those routines with signal masks (unfortunately, this means that signals are masked for longer than they need to be -- the critical region may be a small part of the library routine). Every routine for which such documentation doesn't exist must be assumed to have critical data, and cannot be aborted.

In addition to maintaining consistent data structures, it's also necessary for library routines to clean up after themselves. For instance, if a subroutine opens and closes a file, that file should always be closed when the subroutine is exited. I'm primarily a Lisp programmer, and C and Unix (among others) are missing a really important facility for systems programming: UNWIND-PROTECT. This is a mechanism for insisting that a particular piece of code be run upon exiting a context, no matter how that context is exited (either by returning or by non-local transfer). When I was a Multics programmer we had a similar thing; a handler could be written for the "cleanup" condition, and the handler is run when a frame is exited via non-local transfer. In C it's possible to implement something like this using a setjmp/longjmp protocol, but it only works with cooperating routines; library routines won't obey the protocol.
--
Barry Margolin, Thinking Machines Corp.
barmar@think.com
{uunet,harvard}!think!barmar
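One possible shape for the setjmp/longjmp protocol mentioned above, as a rough sketch only. Every name here (current_catch, throw_code, unwind_protect, run_catching and the demo_* routines) is hypothetical. As the post says, it only helps routines that exit through throw_code(); a bare longjmp() from elsewhere would skip the cleanups, and the sketch makes no attempt to be safe against signals.

```c
#include <setjmp.h>
#include <stdlib.h>

/* Innermost active catch point; NULL when there is none. */
jmp_buf *current_catch = NULL;
/* Code carried by the non-local exit in progress. */
int thrown_code = 0;

/* Non-local exit to the innermost catch point. */
void throw_code(int code)
{
    if (current_catch == NULL)
        abort();                        /* nobody to catch it */
    thrown_code = code;
    longjmp(*current_catch, 1);
}

/* Run body(), then cleanup(), whether body() returns or throws. */
void unwind_protect(void (*body)(void), void (*cleanup)(void))
{
    jmp_buf here;
    jmp_buf *outer = current_catch;

    current_catch = &here;
    if (setjmp(here) == 0) {
        body();
        current_catch = outer;
        cleanup();                      /* normal exit */
        return;
    }
    current_catch = outer;
    cleanup();                          /* exit by non-local transfer */
    throw_code(thrown_code);            /* keep unwinding outward */
}

/* Outermost catch: returns 0 if body() returned, else the thrown code. */
int run_catching(void (*body)(void))
{
    jmp_buf here;
    jmp_buf *outer = current_catch;

    current_catch = &here;
    if (setjmp(here) == 0) {
        body();
        current_catch = outer;
        return 0;
    }
    current_catch = outer;
    return thrown_code;
}

/* Demo: a body that bails out, and a cleanup that counts its runs. */
int cleanups_run = 0;
void demo_cleanup(void)      { cleanups_run++; }
void demo_failing_body(void) { throw_code(7); }
void demo_protected(void)    { unwind_protect(demo_failing_body, demo_cleanup); }
```

Calling run_catching(demo_protected) runs demo_cleanup on the way out even though demo_failing_body never returns normally.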
casey@gauss.llnl.gov (Casey Leedom) (01/19/90)
| From: paul@mecazh.UUCP (Paul Breslaw)
|
| [Mentions three possibilities to deal with interrupted library calls
| leaving corrupted data structures, etc. behind when the signal catching
| routine decides to head off elsewhere's via a longjmp(3).]

You missed a couple of possibilities Paul. Barry Margolin mentions one which is by far the best [in my mind], but would require:

1. The addition of an ``unwind-protect'' code pointer to the C stack frame.

2. Modification of the C function entry and exit code to allocate and initialize the unwind-protect pointer to NULL on entry and execute the pointed-to code if the unwind-protect pointer is non-NULL on exit.

3. A mechanism to manipulate this pointer (probably just a couple of macros.)

4. A change in the semantics of longjmp(3) to indicate that it calls each of the non-NULL unwind-protect code segments as it unwinds the stack.

Perhaps Barry will explain what happens when an interrupt happens while unwind-protect code is being executed ... The issue of what happens with respect to inlined functions is also interesting. Perhaps the presence of unwind-protect pointer manipulation should simply prevent a function from being inlined ...

Another possibility which is perhaps a little more practical, given the inertia of language standards, is to implement the above outside of the regular C stack by providing a separate unwind-protect stack and routines to manipulate it. This would require large amounts of standardization effort and recoding to use it, but wouldn't require changing anything in the ANSI C standard.

On a final note, work is now being done to look into the possibility of moving most of the signal facilities into user space. This would essentially reduce the expense of calling setsigvec, sigblock, etc. to a function call instead of a system call. The penalty would be that *all* signals would be delivered from the kernel to the signal dispatcher in user space even if they were blocked or ignored.
It's expected that overall this should be a major win for most applications. If it is implemented, then the thought of putting signal blocking around critical sections in library routines isn't cause for quite so much queasiness ...

Casey
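The "separate unwind-protect stack and routines to manipulate it" idea might look roughly like this. It is a sketch under assumptions, not a proposed standard interface: the up_* names, the fixed UP_MAX depth and the bump_counter demo cleanup are all hypothetical. A caller would record up_mark() next to its setjmp() and call up_unwind() with that mark just before the matching longjmp().

```c
#include <stddef.h>

#define UP_MAX 32                       /* arbitrary fixed depth */

typedef void (*up_fn)(void *);

struct up_entry {
    up_fn fn;                           /* cleanup routine */
    void *arg;                          /* its argument */
};

static struct up_entry up_stack[UP_MAX];
static size_t up_top = 0;

/* Register a cleanup. Returns 0, or -1 if the stack is full. */
int up_push(up_fn fn, void *arg)
{
    if (up_top == UP_MAX)
        return -1;
    up_stack[up_top].fn = fn;
    up_stack[up_top].arg = arg;
    up_top++;
    return 0;
}

/* Normal exit: discard the newest entry, running it first if run != 0. */
void up_pop(int run)
{
    if (up_top == 0)
        return;
    up_top--;
    if (run)
        up_stack[up_top].fn(up_stack[up_top].arg);
}

/* Current depth, to be remembered alongside a setjmp(). */
size_t up_mark(void)
{
    return up_top;
}

/* Abnormal exit: run and discard every entry above mark, newest first. */
void up_unwind(size_t mark)
{
    while (up_top > mark) {
        up_top--;                       /* drop it before running it */
        up_stack[up_top].fn(up_stack[up_top].arg);
    }
}

/* Demo cleanup: adds 1 to the int that arg points at. */
void bump_counter(void *arg)
{
    ++*(int *)arg;
}
```

The sketch leaves open the question raised in the post of what should happen when a signal arrives while a cleanup is running. Each entry is removed before its routine is called, so a second unwind would not run it twice, but the stack updates themselves are not protected against interruption.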