pardo@june.cs.washington.edu (David Keppel) (04/05/88)
According to both my Springer-Verlag C book and Chris Torek, the dpANS document says that longjmp() out of nested signal handlers is undefined. I would like to know why this is; the reason is far from obvious to me.

This seems like it could be a really common occurrence. For instance, in "vi", ^C is supposed to take you back to the main loop from wherever you are, and it uses a longjmp() to do this (I think). When "autowrite" is set, sending a ^Z causes changes to be written before "vi" is suspended. Sending ^C during ^Z is a well-defined operation and appears to be implemented by longjmp() out of nested signal handlers, and would be broken by the new standard.

This is at minimum a bothersome restriction from the programmer's view. I assume that there is some good reason for this restriction, either from the compiler-writer's view or because of some hardware organizations. Somebody locally suggested that it might have something to do with machines that use a separate stack for signal handlers, but so far neither of us has had any ideas about why this (or anything else, for that matter) would cause such a restriction.

Doug Gwyn suggests that this might have something to do with trying to wrap multiple levels of trampoline code around longjmp(), but doesn't have the gory details available.

So far only Doug Gwyn and Chris Torek have responded to my first posting. If you understand, please write!

;-D on ( Coming soon: ping-pong code ) Pardo
pardo@june.cs.washington.edu ..!ucbvax!uw-beaver!uw-june!pardo
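The single-level pattern described above (a handler that jumps straight back to the main loop) can be sketched as follows. This is only an illustration under stated assumptions, not how "vi" is actually written: it uses the POSIX calls sigaction(), sigsetjmp() and siglongjmp(), the names run_main_loop and on_interrupt are invented for the example, and raise(SIGINT) stands in for the user typing ^C. It shows one handler level only; the nested case is the one the draft leaves undefined.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>

static sigjmp_buf main_loop_env;               /* return point in the main loop */
static volatile sig_atomic_t interrupts_seen;  /* how often the handler ran */

/* Hypothetical ^C handler: abandon the current command, go back to the loop. */
static void on_interrupt(int signo)
{
    (void)signo;
    interrupts_seen++;
    siglongjmp(main_loop_env, 1);   /* also restores the saved signal mask */
}

/* Runs `commands` fake commands, each interrupted once.
 * Returns the number of times the handler jumped back, or -1 on error. */
int run_main_loop(int commands)
{
    /* volatile because it changes between sigsetjmp and siglongjmp */
    volatile int remaining = commands;
    struct sigaction sa, old;

    assert(commands >= 0);
    interrupts_seen = 0;
    sa.sa_handler = on_interrupt;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    if (sigaction(SIGINT, &sa, &old) != 0)
        return -1;

    sigsetjmp(main_loop_env, 1);    /* every interrupt resumes here */
    while (remaining > 0) {
        remaining--;
        raise(SIGINT);              /* stand-in for ^C arriving mid-command */
    }

    sigaction(SIGINT, &old, NULL);  /* put the previous handler back */
    return (int)interrupts_seen;
}
```

Calling run_main_loop(3) takes the jump three times before the loop drains. The second argument of 1 to sigsetjmp() asks for the signal mask to be saved, so SIGINT is deliverable again after each jump.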
greg@csanta.UUCP (Greg Comeau) (04/07/88)
In article <4609@june.cs.washington.edu> pardo@uw-june.UUCP (David Keppel) writes:
>
>According to both my Springer Verlag C book and to Chris Torek, the
>dpANS document says that longjmp() out of nested signal handlers is
>undefined. I would like to know why this is; the reason is very
>non-obvious to me.

The problem that occurs with longjmp() is not within the function itself, but with any side-effects that may occur because of the longjmp(). (This, BTW, only happens to be spelled out in dpANS, but it has been true since the first implementations, even without being said.) For instance, within library functions that you don't have source code to (or even your own routines that you do have source code to), a given routine may be setting an external variable for later use, or let's say as a semaphore. If a longjmp occurs after the variable is set but before it is put to any use, then you're in trouble.

Another quite obtuse reason is that the corresponding setjmp() may be called in a line of code where it was part of a subexpression. Yickie poo for that one!

And of course, there is the always problematic code that longjmp's back to a routine that had made use of some register variable, or of variables that were made into registers by the compiler, or that jumps to a routine that has already returned!

This is all about the dangers of longjmp in general, though, and does not address nested signal handlers specifically, although it does present problems for them, especially when it involves global variables or ensuring that some event has completed before processing the second signal (regardless of whether it is the same signal or not). The main gist, though, is that a signal handler should do what it has to do as quickly as possible.

Also of concern is the way signals are handled on a given machine. Whatever particular things it does to handle the interrupt must be reversible for a normal return from the interrupt. Allowing longjmp's to occur in a nested handler could make the signal cleanup a real mess.
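The "external variable as a semaphore" hazard can be shown in a few lines. This is a minimal sketch with invented names (resource_busy, update_resource, abandon); it assumes nothing about any real library, and the callback simply stands in for whatever code ends up calling longjmp() part-way through the routine.

```c
#include <assert.h>
#include <setjmp.h>

static jmp_buf recover_env;
static int resource_busy;   /* hypothetical external "semaphore" flag */

/* Stand-in for a library routine that marks itself busy while it works. */
static void update_resource(void (*callback)(void))
{
    assert(!resource_busy);  /* the routine is not meant to be re-entered */
    resource_busy = 1;
    callback();              /* may never return if it longjmp's away */
    resource_busy = 0;       /* skipped entirely when the jump is taken */
}

/* Stand-in for code that bails out with longjmp in the middle of the work. */
static void abandon(void)
{
    longjmp(recover_env, 1);
}

/* Returns the flag's value after jumping out of update_resource(). */
int busy_flag_after_jump(void)
{
    resource_busy = 0;
    if (setjmp(recover_env) == 0)
        update_resource(abandon);
    return resource_busy;    /* still 1: the flag was never cleared */
}
```

After the jump the flag is left set, so the next caller of update_resource() would find the resource apparently in use with nobody left to release it.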
weiser.pa@xerox.com (04/07/88)
Well, what does the standard say about the values of register variables after a longjmp is taken?

The 4.3bsd Vax implementation of longjmp carefully unwinds the stack and restores the registers to their values when control left the procedure doing the setjmp, which has the effect of preserving assignments to register variables. This makes register variables in the setjmping procedure behave like other variables in the setjmping procedure.

The 3.X SunOS 68020 longjmp does not restore register values (although the 3.X SunOS Sparc longjmp does). We needed a longjmp that would restore register values for our port of Cedar to the Suns. So we wrote a stack unwinding longjmp. This was not easy, because...

The calling convention used on the 68020 by Sun (and I presume others) does not record anywhere on the stack how many registers were saved, or where. The registers are not always saved at the same offset from the frame pointer, for instance. So, our unwinder has to, for each procedure in the stack:

(1) find the PC value (fortunately this is in a standard place),
(2) figure out the first instruction in the procedure from the current PC,
(3) examine the first few instructions to figure out what registers were saved where,
(4) recover the register values.

The hard step? Number 2. It requires keeping around for each procedure the starting and ending PC values. We considered trying to look at the call instruction to figure out what it was calling, but the 68020 instruction set doesn't enable us to unambiguously distinguish between a call through a register value and a call immediate.

Now, what happens if there are signal handlers on the stack, and one wants to do this unwinding? Well, signal handlers don't look like procedure calls exactly: in SunOS, at least, the kernel throws some junk on the stack, and then proceeds to build a valid procedure frame. One cannot just return through that junk. So our unwinder notices when it is about to unwind through _sigtramp, picks a couple of registers out of the junk and skips the rest, and then keeps going.

It could be complications like this which cause the longjmp standard to punt on returning through nested signal handlers. But I'd prefer to see a standard say that it must be possible to do it, because it is clearly a good thing to be able to do.

-mark
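On the opening question about register variables: the C standard as eventually published makes the value of a non-volatile automatic variable indeterminate after longjmp() if it was changed between the setjmp() and the longjmp(). That covers both behaviours described above, since a variable kept in a register may or may not be rolled back. Portable code therefore declares such variables volatile. A minimal sketch, with the function name attempts_until_success invented for the example:

```c
#include <assert.h>
#include <setjmp.h>

static jmp_buf retry_env;

/* Retries a fake operation that fails `failures` times before succeeding.
 * Returns the total number of attempts made. */
int attempts_until_success(int failures)
{
    /* volatile: modified after setjmp and read after longjmp, so it must
     * live in memory rather than in a register that may be rolled back */
    volatile int attempts = 0;

    assert(failures >= 0);
    setjmp(retry_env);              /* each failed attempt resumes here */
    attempts++;
    if (attempts <= failures)
        longjmp(retry_env, 1);      /* simulate a failure and retry */
    return attempts;
}
```

With the volatile qualifier the count survives every jump, so two failures give three attempts. Without it, an implementation that restores registers to their setjmp-time values could reset the counter on each jump, and the standard would allow that.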
mishkin@apollo.uucp (Nathaniel Mishkin) (04/11/88)
In article <122@csanta.UUCP> greg@csanta.UUCP (Greg Comeau) writes:
>In article <4609@june.cs.washington.edu> pardo@uw-june.UUCP (David Keppel) writes:
>The problem that occurs with longjmp() is not within the function itself,
>but with any side-effects that may occur because of the longjmp().
>(This BTW, only happens to be spelled out in dpANSI, but has been true
>even without that being said since its first implementations). For instance,
>within library functions that you don't have source code to (or even your own
>routines that you do have source code to), a given routine may be setting an
>external variable for later use or maybe let's say as a semaphore. If a
>longjump occurs after the setting of the variable but before it's put to any
>use, then you're in trouble.
>
>Another quite obtuse reason is that the corresponding setjmp() may be
>called in a line of code where it was part of a subexpression. Yickie
>poo for that one!

Apollo defines a package called PFM (Process Fault Manager) to deal with this sort of problem. The two primitives relevant to this discussion are "pfm_$cleanup" and "pfm_$signal". They are analogous to "setjmp" and "longjmp" except they are "stacked". Basically, you use them like:

    boolean SomeImportantStateVariable = false;

    foo()
    {
        pfm_$cleanup_rec crec;

        status = pfm_$cleanup(crec);
        if (status.all != pfm_$cleanup_set) {       /* first time? */
            SomeImportantStateVariable = false;     /* No, restore state */
            pfm_$signal(status);                    /* resignal */
        }
        else {
            SomeImportantStateVariable = true;
            /* ... Do some important stuff ... */
            SomeImportantStateVariable = false;
            pfm_$rls_cleanup(crec);
        }
    }

(Lisp people should recognize this as something like UNWIND-PROTECT.)

"pfm_$cleanup" returns the first time with the constant value "pfm_$cleanup_set" (a la "setjmp" returning 0). It returns the second time with the integer value "thrown" by a "pfm_$signal". "pfm_$signal" causes a long jump to the site of the most recent "pfm_$cleanup". "pfm_$signal" can be called explicitly (like "longjmp"). Also, the various Unix signals are automatically turned into calls to "pfm_$signal" if no signal handler exists.

Cleanup handlers (the term for the "then" clause of the above "if" statement) can either choose to resignal by calling "pfm_$signal" or eat the signal and continue processing at that level. Generally, you're supposed to resignal unless you recognize the signal that was thrown.

I won't make any argument for this being syntactically "pretty", but it is at least conceptually the right thing. (I'd like language support for exception handling, but I don't get to pick these things.)

As part of making Apollo's Network Computing System (NCS) portable, I had to deal with making a portable subset of PFM. NCS supports a remote procedure call (RPC) facility and depends on the above cleanup mechanism. When you make a call to a remote procedure, if the target of the call doesn't respond, NCS raises an exception using PFM. The remote call looks syntactically like a local call (you're calling a local stub), so even if we thought it was the right thing to indicate call failure by returning some "error status" (we don't), we can't, since we don't get to pick the signature of the remote procedure. Checking global status variables (like "errno") for failure indications is also forbidden if you ever want your software to work in an environment where there are multiple threads of control per address space (we do).

It turns out that implementing the necessary parts of PFM on vanilla Unix systems via "setjmp/longjmp" was pretty trivial. (We're talking 300 lines of code here.) Really, all PFM amounts to is a consistent and disciplined use of "setjmp/longjmp". I think I can post the source (to some appropriate group) if anyone expresses interest.

By the way, the vanilla Unix implementation of PFM does in fact depend on the "yicky poo" use of a "setjmp" inside an expression. "pfm_$cleanup" is:

    #define pfm_$cleanup(crec) \
        pfm_$_cleanup(setjmp(crec.buf), &crec)

So far, I've found two compilers/runtimes that don't handle this right. For them, I write "pfm_$cleanup" as:

    #define pfm_$cleanup(crec) ( \
        pfm_$global_setjmp_value = setjmp(crec.buf), \
        pfm_$_cleanup(pfm_$global_setjmp_value, &crec) \
    )

at the cost of introducing a yicky poo global variable. (Hey, I can't fix *all* of Unix's problems at once!)

--
                    -- Nat Mishkin
                       Apollo Computer Inc.
                       Chelmsford, MA
                       {decvax,mit-eddie,umix}!apollo!mishkin