campbell@redsox.bsw.com (Larry Campbell) (10/08/90)
We have implemented a portable (or so we thought) exception handling facility for C. In order to allow exception handlers to have the same scope as the code being guarded, we used setjmp/longjmp instead of ssignal. However, the ambiguous definition of setjmp/longjmp is giving us heartburn. Consider the following code:

----------------------------------------
 1  {
 2      int x;
 3      x = 0;
 4      if (! setjmp(foo))
 5      {
 6          x = 1;
 7          foo();
 8      }
 9      else
10      {
11          printf("x = %d\n", x);
12      }
13  }
----------------------------------------

If foo() calls longjmp, the value of x when line 11 gets executed appears to be undefined (I don't have a copy of the ANSI standard, but I've checked about eight compiler manuals; most say it's undefined, or undefined if x isn't declared volatile). In the three compilers I've tested that claim ANSI compliance, declaring x to be volatile yields the desired result (x = 1). In the non-ANSI compilers, disabling optimization yields the desired result, but enabling optimization usually yields x = 0. I've never seen any value for x other than 0 or 1.

My real question is this: Why not define the behavior of setjmp/longjmp so that the values of ALL local variables are defined, whether or not they've been allocated to registers? Otherwise, setjmp/longjmp are significantly less useful.

For what it's worth, it seems to me that the description of setjmp/longjmp in K&R 2 does imply that x should have the value 1; is this an area of disagreement between K&R and ANSI?
--
Larry Campbell                          The Boston Software Works, Inc.
campbell@redsox.bsw.com                 120 Fulton Street
wjh12!redsox!campbell                   Boston, MA 02109
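A minimal runnable version of Larry's example, assuming a hypothetical raise_exception() stands in for foo() and the jump buffer is renamed env (a jmp_buf and a function cannot both be called foo). Declaring x volatile is what makes the handler see the last value stored; without it, a non-volatile automatic changed between setjmp and longjmp is indeterminate, and an optimizing compiler may print 0.

```c
#include <setjmp.h>
#include <stdio.h>

static jmp_buf env;                 /* renamed: a jmp_buf cannot also be called foo */

/* Hypothetical stand-in for foo(): always "throws". */
static void raise_exception(void)
{
    longjmp(env, 1);
}

/* Returns the value x holds when the handler branch runs. */
int value_seen_by_handler(void)
{
    volatile int x = 0;             /* volatile: keeps its latest value across longjmp */

    if (!setjmp(env)) {
        x = 1;
        raise_exception();          /* does not return */
    } else {
        printf("x = %d\n", x);      /* prints "x = 1" */
    }
    return x;                       /* reached only via the else branch */
}
```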
poser@csli.Stanford.EDU (Bill Poser) (10/08/90)
In article <1597@redsox.bsw.com> campbell@redsox.bsw.com (Larry Campbell) writes:
>Consider the following code:
>  4      if (! setjmp(foo))
>  5      {
>  6          x = 1;
>  7          foo();
>  8      }

I agree that it is unfortunate that setjmp does not save non-register locals, but this code is wrong. The argument to setjmp is a jmp_buf (an array type), not a function.
henry@zoo.toronto.edu (Henry Spencer) (10/08/90)
In article <1597@redsox.bsw.com> campbell@redsox.bsw.com (Larry Campbell) writes:
>My real question is this: Why not define the behavior of setjmp/longjmp so
>that the values of ALL local variables are defined, whether or not they've
>been allocated to registers? ...

Because it is painful to implement in certain situations, and there are many existing compilers that punt said situations as a result.

One would really like longjmp to act much like a multi-level return. This is hard, because there may be saved register values on the stack which would have to be restored. If the format of the stack frame is fixed (pdp11) or self-describing (VAX), this is easy enough... but on modern machines you have neither of those happy situations, and it can be arbitrarily hard to figure out which parts of the stack represent values that should be put back into registers. (You don't want to incur overhead on every function call just because somebody might someday call longjmp.)

ANSI C puts enough constraints on setjmp that a smart compiler can notice a call to it, and bracket other calls from that function with a special save-return sequence so that stack unravelling is not needed. Unfortunately, this really requires a compiler that compiles a whole function at a time, and many simple or fast compilers compile a statement at a time.

Some implementations restore all the registers to the way they were when the *setjmp* was called, but this is often unsatisfactory in general and can be very unsatisfactory when compilers really start playing games with register usage. Said register-usage games also make it impractical to specify behavior that depends on whether the programmer explicitly declared things "register". (Although some of us tried to point out that the set of compilers which *do* play register games but *don't* compile whole functions at a time must be pretty small, so it would not be a disaster to require the register-game compilers to do call bracketing. Alas, our wise words :-) were not heeded.)

There just ain't no graceful way.
--
Imagine life with OS/360 the standard  | Henry Spencer at U of Toronto Zoology
operating system.  Now think about X.  | henry@zoo.toronto.edu   utzoo!henry
tom@ssd.csd.harris.com (Tom Horsley) (10/08/90)
>>>>> Regarding Re: Ambiguity in definition of setjmp/longjmp makes them much less useful; henry@zoo.toronto.edu (Henry Spencer) adds:

henry> Some implementations restore all the registers to the way they were
henry> when the *setjmp* was called, but this is often unsatisfactory in
henry> general and can be very unsatisfactory when compilers really start
henry> playing games with register usage.

Wrong! With compilers that play register games, restoring the registers as they were at the time of the setjmp is the ONLY implementation that works at all (unless setjmp is recognized as a special construct by the compiler, which I agree is the best way). In any optimizing compiler which is likely to do things like keep common sub-expressions in registers, the following simple example shows the requirement for restoring the registers as of the setjmp() call:

    {
        ...
        /* compiler computes a CSE and keeps it in register 47 */
        if (setjmp(...) != 0) {
            /* compiler references the CSE in register 47 */
        }
        /* compiler makes last reference to CSE in register 47 */
        ...
        /* compiler now has something totally different in register 47 */
        longjmp(...);
    }

(In the above example, register 47 is assumed to be a register that is not normally destroyed by a function call.) If you were to unwind the stack and restore the registers as of the longjmp() call, you would get back to the setjmp() with random gibberish in the register the code generator expected to contain a CSE value.

Personally, I believe that compilers should support setjmp() as a special construct - simply making might-goto arcs from every other function call to a point immediately following any setjmp() calls would add enough information to the flow graph for an optimizing compiler to recognize the funny lifetimes that registers might have, and volatile would only be needed for variables that interact with signal handling code (since a signal can happen anywhere in the program, not just at a function call).
Until the day that compilers properly support setjmp(), however, the only implementation of setjmp() that stands a chance of interacting correctly with an optimizing compiler is one that restores the registers as of the setjmp() call. Unfortunately, this also means that the only user code that stands a chance of interacting correctly with an optimizing compiler is code that correctly declares all variables volatile where necessary. Since the phrase "where necessary" is difficult (if not impossible) for an ordinary mortal to figure out, the obviously best solution is to fix compilers to special-case setjmp().
--
======================================================================
domain: tahorsley@csd.harris.com       USMail: Tom Horsley
  uucp: ...!uunet!hcx1!tahorsley               511 Kingbird Circle
                                               Delray Beach, FL  33444
+==== Censorship is the only form of Obscenity ======================+
|     (Wait, I forgot government tobacco subsidies...)                |
+====================================================================+
shankar@hpclscu.HP.COM (Shankar Unni) (10/09/90)
> My real question is this: Why not define the behavior of setjmp/longjmp so
> that the values of ALL local variables are defined, whether or not they've
> been allocated to registers? Otherwise, setjmp/longjmp are significantly
> less useful.

Because you want to be able to keep variables in registers. By your definition, no local variable in a routine that calls setjmp() can ever be kept in a register beyond a statement boundary. Consider:

    #include <setjmp.h>

    jmp_buf xxx;

    void bar(void);

    void foo(void)
    {
        int i = 0;

        if (!setjmp(xxx)) {
            i = 5;
            bar();
        }
    }

    void bar(void)
    {
        longjmp(xxx, 10);
    }

Thus, unless foo() is a leaf routine, you have to assume the worst and keep "i" in memory. Most people consider this an unacceptable penalty to pay for what is, at most, fringe functionality. After all, as you discovered yourself, making "i" volatile makes it work exactly the way you want it to.

I strongly disagree with the "significantly less useful" part of your statement above. Setjmp/longjmp are relatively expensive operations used to recover from extraordinary situations, and the only sort of guarantees envisioned by the designers were to:

  - exit the program gracefully, or
  - re-initialize the program to some known initial state.

If you want to implement a general-purpose exception-handling facility, use "volatile" liberally (or use a C++-like front-end which will do it automatically for you).
-----
Shankar Unni                                   E-Mail:
Hewlett-Packard California Language Lab.     Internet: shankar@hpda.hp.com
Phone : (408) 447-5797                         UUCP: ...!hplabs!hpda!shankar
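Shankar's advice for a general-purpose exception facility (use volatile liberally) might look like the minimal sketch below. It is not Larry's actual facility: the handler stack and the names throw_code(), checked_divide() and guarded_divide() are hypothetical. Each guarded region pushes a jmp_buf, throwing pops the innermost one and longjmps to it, and the one automatic variable written after setjmp and read after longjmp is declared volatile.

```c
#include <setjmp.h>
#include <stddef.h>

/* One record per active guarded region (hypothetical design). */
struct handler {
    jmp_buf env;
    struct handler *prev;
};

static struct handler *current_handler = NULL;

/* Pops the innermost handler and transfers control to it.
   code must be nonzero; assumes a handler is installed. */
static void throw_code(int code)
{
    struct handler *h = current_handler;
    current_handler = h->prev;
    longjmp(h->env, code);
}

static int checked_divide(int a, int b)
{
    if (b == 0)
        throw_code(1);
    return a / b;
}

/* Returns a / b, or -1 if checked_divide threw; *stage reports how far
   the guarded code got before control left it. */
int guarded_divide(int a, int b, int *stage)
{
    struct handler h;
    volatile int progress = 0;      /* written after setjmp, read after longjmp */
    int result;

    h.prev = current_handler;
    current_handler = &h;

    if (setjmp(h.env) == 0) {
        progress = 1;
        result = checked_divide(a, b);
        progress = 2;
        current_handler = h.prev;   /* normal exit: pop our handler */
    } else {
        result = -1;                /* throw_code already popped it */
    }
    *stage = progress;
    return result;
}
```

A real facility would also need to handle a throw with no handler installed; here that would dereference a null pointer.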
brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (10/09/90)
In article <1990Oct8.031745.28651@zoo.toronto.edu> henry@zoo.toronto.edu (Henry Spencer) writes:
[ on the pain of making setjmp() reliable ]
> There just ain't no graceful way.

Actually, there is a reasonably clean way to correctly allocate registers through jumps, without any overhead. All you need is to be able to load and save all the registers on demand from the fixed memory locations that they correspond to. So you keep a map of register allocation along with the code. The fun part is figuring out how to store the map without wasting too much memory; there are different techniques for different register allocation strategies. I don't remember who I heard this from.

Certainly the existing setjmp()/longjmp() is quite useless. I had to give up on a threads library because some machines (notably a Convex, and a Sun under gcc) simply refused to treat register variables correctly across jumps. (I say correctly in the intuitive sense that putting a variable into a register shouldn't change its behavior at all, not in the ANSI sense. Declaring all variables volatile just so your program will work? Gimme a break.)

---Dan
cameron@usage.csd.oz (Cameron Simpson) (10/09/90)
From article <TOM.90Oct8071803@hcx2.ssd.csd.harris.com>, by tom@ssd.csd.harris.com (Tom Horsley):
| Personally, I believe that compilers should support setjmp() as a special
| construct - simply making might-goto arcs from every other function call to
| a point immediately following any setjmp() calls would add enough
| information to the flow graph for an optimizing compiler to recognize the
| funny lifetimes that registers might have and volatile would only be needed
| for variables that interact with signal handling code (since a signal
| can happen anywhere in the program, not just at a function call).

But think about what happens when you write

    sigfn(sig)
    { longjmp(foojmpbuf,1);
      /*NOTREACHED*/
    }

Since, as you say, a signal can happen anywhere, there is now a might-goto arc from _every_ point in the program which can conceivably be called from within any function which uses foojmpbuf as a jump buffer. This could easily include large stretches of the C library. It gets much worse if something as bizarre as the following is done:

    jmp_buf *current_restore_point=NULL;

    sigfn(sig)
    { if (current_restore_point == NULL)
        fprintf(stderr,"ouch! - uncaught signal %d\n",sig);
      else
        longjmp(*current_restore_point,sig);
    }

and current_restore_point is then set and cleared around various bits of code. This puts might-goto arcs from almost every bit of code, unless your compiler is almost precognitive and the programmer is aware of this effect.

My preferred solution is not to use setjmp/longjmp at all. Of course, it isn't always possible. BSD's non-switch-off-able restartable system calls (like a read from a tty) irk me particularly in this regard.

    - Cameron Simpson
      cameron@spectrum.cs.unsw.oz.au
      "If it can't be turned off, it's not a feature."   Karl Huer (I think).
sasrer@unx.sas.com (Rodney Radford) (10/09/90)
In article <1597@redsox.bsw.com> campbell@redsox.bsw.com (Larry Campbell) writes:
>My real question is this: Why not define the behavior of setjmp/longjmp so
>that the values of ALL local variables are defined, whether or not they've
>been allocated to registers? Otherwise, setjmp/longjmp are significantly
>less useful.

The reason for the ambiguity is because ANSI chose not to make setjmp/longjmp functions known by the C compiler so that it just treats them just like any other functions (ie: does not force local automatics values from registers to storage). ANSI chose not to make any of the functions special case so that an application may redefine the functions (although this is frowned upon in some cases). The setjmp/longjmp functions themselves do not have the required information to force the values from the registers, so we are stuck with this oddity.

>
>For what it's worth, it seems to me that the description of setjmp/longjmp in
>K&R 2 does imply that x should have the value 1; is this an area of
>disagreement between K&R and ANSI?

I believe (just guessing really) that the original C used 'builtins' for the setjmp/longjmp functions, ie: they special cased them.

>--
>Larry Campbell                          The Boston Software Works, Inc.
>campbell@redsox.bsw.com                 120 Fulton Street
>wjh12!redsox!campbell                   Boston, MA 02109
--
Rodney Radford        DG/UX AViiON developer
SAS Institute, Inc.   sasrer@unx.sas.com
(919) 677-8000 x7703  Box 8000, Cary, NC 27512
richard@aiai.ed.ac.uk (Richard Tobin) (10/10/90)
In article <1597@redsox.bsw.com> campbell@redsox.bsw.com (Larry Campbell) writes:
>My real question is this: Why not define the behavior of setjmp/longjmp so
>that the values of ALL local variables are defined, whether or not they've
>been allocated to registers? Otherwise, setjmp/longjmp are significantly
>less useful.

The answer is that it's harder and slower. Either you have to store all the local variables in memory (which is why volatile works) or longjmp() has to restore the registers to the right values by "unwinding" the stack, and doing the restores as if each procedure were returning. BSD on the VAX uses the latter approach, but it would be harder for a compiler that wanted to be cleverer about register allocation.

What usually happens is that setjmp() saves the values of the registers, and longjmp() restores them. This means that variables which happened to be in registers get restored to the values they had when setjmp() was called - that is, intermediate assignments are lost.

It might be possible to be clever and just ensure all variables are in memory before calling a procedure that might do a longjmp(), but the compiler would have to be sure that longjmp() couldn't be called asynchronously from a signal handler.

Since ANSI only says that the values of non-volatile variables which have been changed are indeterminate, it's hard to think of an implementation that would not result in either the right value or the setjmp() value after a longjmp().

-- Richard
--
Richard Tobin,                       JANET: R.Tobin@uk.ac.ed
AI Applications Institute,           ARPA:  R.Tobin%uk.ac.ed@nsfnet-relay.ac.uk
Edinburgh University.                UUCP:  ...!ukc!ed.ac.uk!R.Tobin
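Richard's "setjmp saves the registers, longjmp restores them" behavior can be illustrated with a toy model (purely hypothetical, not any real machine): one callee-saved register and one memory word. The "setjmp" step copies only the register, and the "longjmp" step copies it back, so the intermediate assignment x = 1 survives when x lives in memory but is lost when x lives in the register. That matches the x = 0 / x = 1 split Larry saw with and without optimization.

```c
/* Toy machine: one callee-saved register and one memory word. */
struct machine {
    int reg;
    int mem;
};

/* Models "x = 0; setjmp; x = 1; longjmp" and returns what the handler
   would read for x, depending on where the compiler placed x. */
int toy_value_after_longjmp(int x_in_register)
{
    struct machine m = { 0, 0 };
    int *x = x_in_register ? &m.reg : &m.mem;
    int saved_reg;

    *x = 0;                 /* x = 0 */
    saved_reg = m.reg;      /* "setjmp": save the registers */
    *x = 1;                 /* x = 1, the intermediate assignment */
    m.reg = saved_reg;      /* "longjmp": restore the saved registers */
    return *x;
}
```

Here toy_value_after_longjmp(1) yields 0 and toy_value_after_longjmp(0) yields 1.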
peter@ficc.ferranti.com (Peter da Silva) (10/10/90)
In article <891@usage.csd.unsw.oz.au> cameron@spectrum.cs.unsw.oz.au (Cameron Simpson) writes:
> But think about what happens when you write
>     sigfn(sig) { longjmp(foojmpbuf,1); }
> Since, as you say, a signal can happen anywhere then there is now a might-goto
> arc from _every_ point in the program which can conceivably be called from
> within any function which uses foojmpbuf as a jump buffer.

I think it reasonable not to guarantee longjmp behaviour from within signals. In fact, calling longjmp from within signals is evil. The only thing you should do within a signal routine is set a flag... anything else is a bug waiting to happen.

Of course, you need to do this in BSD, but BSD is buggier than a dog pound.
--
Peter da Silva.   `-_-'
+1 713 274 5180.   'U`
peter@ferranti.com
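Peter's rule (a signal handler should only set a flag) can be sketched in standard C. The handler writes a volatile sig_atomic_t, which is the kind of object the standard allows a handler to assign, and the main-line code checks the flag at a point of its own choosing. The names on_interrupt() and flag_after_signal() are hypothetical; raise() is used only so the signal arrives at a predictable moment.

```c
#include <signal.h>

/* The only object the handler touches. */
static volatile sig_atomic_t interrupted = 0;

static void on_interrupt(int sig)
{
    (void)sig;
    interrupted = 1;                /* set a flag and return; nothing else */
}

/* Installs the handler, raises sig synchronously, and returns 1 if the
   main-line code sees the flag afterwards (0 if not, -1 on failure). */
int flag_after_signal(int sig)
{
    interrupted = 0;
    if (signal(sig, on_interrupt) == SIG_ERR)
        return -1;
    if (raise(sig) != 0)
        return -1;
    /* Main-line code polls the flag here, outside the handler. */
    return interrupted ? 1 : 0;
}
```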
brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (10/10/90)
In article <:_A6T46@xds13.ferranti.com> peter@ficc.ferranti.com (Peter da Silva) writes:
> I think it reasonable not to guarantee longjmp behaviour from within
> signals. In fact, calling longjmp from within signals is evil. The only
> thing you should do within a signal routine is set a flag... anything
> else is a bug waiting to happen.

Correct.

> Of course, you need to do this in BSD, but BSD is buggier than a dog pound.

Say what? I've written large BSD applications that don't do anything inside signal handlers other than set flags. Where's this ``need'' you talk about? And if you're going to insist that BSD is buggier than SysV, how about some proof?

---Dan
mcdaniel@adi.com (Tim McDaniel) (10/10/90)
I don't know how Rodney Radford (sasrer@unx.sas.com) managed to get almost everything exactly backwards. Someone must have really misinformed him. He writes:

> ANSI chose not to make any of the functions special case so that an
> application may redefine the functions (although this is frowned
> upon in some cases).

From section 4.1.2.1, "Reserved Identifiers", of the ANSI C standard (page 98):

    All identifiers with external linkage in any of the following
    sections (including the future library directions) are always
    reserved for use as identifiers with external linkage.

So it's not "frowned upon" to redefine ANSI C functions as functions; it's undefined, and it often won't work in practice.

> The reason for the ambiguity is because ANSI chose not to make
> setjmp/longjmp functions known by the C compiler so that it just
> treats them just like any other functions (ie: does not force local
> automatics values from registers to storage).

Section 4.6, page 119:

    It is unspecified whether setjmp is a macro or an identifier
    declared with external linkage. If a macro definition is
    suppressed in order to access an actual function, or a program
    defines an external identifier with the name setjmp, the behavior
    is undefined.
    . . .
    An invocation of the setjmp macro shall appear only in one of the
    following contexts:
      - the entire controlling expression of a selection or iteration
        statement;
      - one operand of a relational or equality operator with the
        other operand an integral constant expression, with the
        resulting expression being the entire controlling expression
        of a selection or iteration statement;
      - the operand of a unary ! operator with the resulting
        expression being the entire controlling expression of a
        selection or iteration statement; or
      - the entire expression of an expression statement (possibly
        cast to void).

So setjmp may indeed be special. Note that "&setjmp" is not permitted, nor is "*fp" where fp points to the underlying setjmp function (if any).
Thus, setjmp can always be "known by the C compiler", if the compiler chooses to look, and a compiler can always determine which functions call setjmp.

> The setjmp/longjmp functions themselves do not have the required
> information to force the values from the registers, so we are stuck
> with this oddity.

Section 4.6.2.1, page 120:

    ... the values of objects of automatic storage duration that are
    local to the function containing the invocation of the
    corresponding setjmp macro that do not have volatile-qualified
    type and have been changed between the setjmp invocation and
    longjmp call are indeterminate.

The functions themselves (if they exist) do not. By the previous section, however, the *compiler itself* has enough information, and it can choose to 'do the right thing'.

> I believe (just guessing really) that the original C used 'builtins'
> for the setjmp/longjmp functions, ie: they special cased them.

To the best of my knowledge, the original several generations of C compilers did not special-case them. I'd be surprised if any "standard" compilers (SUN OS, SYS V for VAXen, et cetera) have ever special-cased any functions.
--
Tim McDaniel                 Applied Dynamics Int'l.; Ann Arbor, Michigan, USA
Work phone: +1 313 973 1300                        Home phone: +1 313 677 4386
Internet: mcdaniel@adi.com                UUCP: {uunet,sharkey}!amara!mcdaniel
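The permitted contexts quoted in Tim's post can be exercised directly. In this minimal sketch (jump_back() and permitted_setjmp_contexts() are hypothetical names), each if statement places setjmp in one allowed form, and a longjmp brings control back so the handler branch runs once per form. The counter is modified across longjmps, so it is volatile for the reason given in section 4.6.2.1.

```c
#include <setjmp.h>

static jmp_buf env;

static void jump_back(int value)      /* value must be nonzero */
{
    longjmp(env, value);
}

/* Exercises three of the permitted contexts; returns how many
   handler branches ran (each context should contribute one). */
int permitted_setjmp_contexts(void)
{
    volatile int handled = 0;         /* modified across longjmp, so volatile */

    if (setjmp(env))                  /* entire controlling expression */
        handled++;
    else
        jump_back(1);

    if (setjmp(env) == 0)             /* equality with an integral constant */
        jump_back(2);
    else
        handled++;

    if (!setjmp(env))                 /* operand of unary ! */
        jump_back(3);
    else
        handled++;

    /* Not permitted: int r = setjmp(env);  (setjmp inside an assignment) */
    return handled;
}
```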
peter@ficc.ferranti.com (Peter da Silva) (10/10/90)
> > Of course, you need to do this in BSD, but BSD is buggier than a dog pound.
> Say what? I've written large BSD applications that don't do anything
> inside signal handlers other than set flags. Where's this ``need'' you
> talk about?

To use an alarm to break a read. In System V you can do that just by setting a flag. In BSD you have to longjmp out.

Now you'll tell me to use sockets. I'm allergic to objects outside the UNIX filesystem name space.

> And if you're going to insist that BSD is buggier than SysV,
> how about some proof?

I didn't say that. I said BSD is buggier than a dog pound. That doesn't imply that System V *isn't*. I just tend to trust System V more because it shows fewer signs of feeping creaturism. Fewer places for bugs to hide.

In retrospect the awful tardiness of AT&T in getting streams into someplace you can do something useful with them might be a blessing.
--
Peter da Silva.   `-_-'
+1 713 274 5180.   'U`
peter@ferranti.com
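The System V style Peter describes (let the alarm interrupt the read and have the handler only set a flag) can be sketched with POSIX sigaction(); this is a swapped-in technique, not either poster's code, and read_with_timeout() with its -2 timeout result is a hypothetical convention. Leaving SA_RESTART out of sa_flags asks that read() fail with EINTR rather than be restarted, which is exactly the behavior at issue with BSD's restartable system calls. The sketch has a known race: if the alarm fires before read() is entered, the read still blocks; select() or poll() with a timeout is the usual remedy.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

static volatile sig_atomic_t alarm_fired = 0;

static void on_alarm(int sig)
{
    (void)sig;
    alarm_fired = 1;                 /* the handler only sets a flag */
}

/* Returns bytes read, -1 on error, or -2 if the alarm interrupted read(). */
ssize_t read_with_timeout(int fd, void *buf, size_t len, unsigned seconds)
{
    struct sigaction sa, old;
    ssize_t n;
    int read_errno;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;                 /* no SA_RESTART: read() fails with EINTR */
    if (sigaction(SIGALRM, &sa, &old) != 0)
        return -1;

    alarm_fired = 0;
    alarm(seconds);                  /* race: an alarm before read() starts is lost */
    n = read(fd, buf, len);
    read_errno = errno;              /* save before other calls can change errno */
    alarm(0);                        /* cancel any pending alarm */
    sigaction(SIGALRM, &old, NULL);  /* restore the previous disposition */

    if (n < 0 && read_errno == EINTR && alarm_fired)
        return -2;
    errno = read_errno;
    return n;
}
```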
henry@zoo.toronto.edu (Henry Spencer) (10/10/90)
In article <1990Oct09.143521.24019@unx.sas.com> sasrer@unx.sas.com (Rodney Radford) writes:
>I believe (just guessing really) that the original C used 'builtins' for the
>setjmp/longjmp functions, ie: they special cased them.

Nope. The original pdp11 C compiler had a predictable stack-frame format and could do stack unravelling right.
--
Imagine life with OS/360 the standard  | Henry Spencer at U of Toronto Zoology
operating system.  Now think about X.  | henry@zoo.toronto.edu   utzoo!henry
peter@ficc.ferranti.com (Peter da Silva) (10/12/90)
In article <1990Oct10.152659.6334@zoo.toronto.edu> henry@zoo.toronto.edu (Henry Spencer) writes:
> In article <1990Oct09.143521.24019@unx.sas.com> sasrer@unx.sas.com (Rodney Radford) writes:
> >I believe (just guessing really) that the original C used 'builtins' for the
> >setjmp/longjmp functions, ie: they special cased them.
> Nope. The original pdp11 C compiler had a predictable stack-frame format
> and could do stack unravelling right.

It also didn't do any optimisations across statement boundaries or past function calls, that I know of. So there wasn't anything sitting in a register waiting to get clobbered when you did the longjmp().
--
Peter da Silva.   `-_-'
+1 713 274 5180.   'U`
peter@ferranti.com
meissner@osf.org (Michael Meissner) (10/12/90)
In article <1990Oct09.143521.24019@unx.sas.com> sasrer@unx.sas.com (Rodney Radford) writes:
| In article <1597@redsox.bsw.com> campbell@redsox.bsw.com (Larry Campbell) writes:
| >My real question is this: Why not define the behavior of setjmp/longjmp so
| >that the values of ALL local variables are defined, whether or not they've
| >been allocated to registers? Otherwise, setjmp/longjmp are significantly
| >less useful.
|
| The reason for the ambiguity is because ANSI chose not to make setjmp/longjmp
| functions known by the C compiler so that it just treats them just like any
| other functions (ie: does not force local automatics values from registers
| to storage). ANSI chose not to make any of the functions special case so
| that an application may redefine the functions (although this is frowned upon
| in some cases). The setjmp/longjmp functions themselves do not have the
| required information to force the values from the registers, so we are stuck
| with this oddity.

In some implementations, setjmp could easily get the information if it so desired. For example, on systems that use MIPS chips (MIPS, SGI, DECstation, etc.), there is a side table that contains the information for each function on what registers in the preserved register set are saved and where they are saved, how to calculate the virtual frame pointer, etc. The table can be made to appear in memory by emitting a specific external. The 88k computers have a similar facility, though the table is always in memory.

Longjmp could unwind each stack frame, and reset the registers until it got back to where it should be, but that is usually too hard to justify doing to management!
--
Michael Meissner    email: meissner@osf.org    phone: 617-621-8861
Open Software Foundation, 11 Cambridge Center, Cambridge, MA, 02142

Do apple growers tell their kids money doesn't grow on bushes?
pgd@bbt.se (10/16/90)
In article <1990Oct10.152659.6334@zoo.toronto.edu> henry@zoo.toronto.edu (Henry Spencer) writes:
>In article <1990Oct09.143521.24019@unx.sas.com> sasrer@unx.sas.com (Rodney Radford) writes:
>>I believe (just guessing really) that the original C used 'builtins' for the
>>setjmp/longjmp functions, ie: they special cased them.
>
>Nope. The original pdp11 C compiler had a predictable stack-frame format
>and could do stack unravelling right.

I am not 100% sure of this, but I think that the original C library just restored some registers without any fancy unravelling. (r5,sp,pc). It also kept all variables, except for register variables, in memory.

The original c-library also had nargs(), but that one was a horrible kludge. (It was looking at the machine instructions to check out how many bytes were popped from the stack after return from the call instruction.)
henry@zoo.toronto.edu (Henry Spencer) (10/16/90)
In article <1990Oct15.174203.21441@bbt.se> pgd@bbt.se writes:
>>... The original pdp11 C compiler had a predictable stack-frame format
>>and could do stack unravelling right.
>
>I am not 100% sure of this, but I think that the original C library
>just restored some registers without any fancy unravelling. (r5,sp,pc).

It restored r2-r4 as well, necessarily, since they were the programmer's register variables. This was done by unravelling the stack, looking for a frame whose address was equal to the saved frame pointer (r5), and then, more or less, restoring the three registers you mention *and* doing a return, which unstacked r2-r4. The stack unravelling was pretty simple, because the format was fixed and every call saved r2-r4. So it really was quite straightforward to get it right.

>It also kept all variables, except for register variables, in memory.

That's what the "register" keyword was for, after all.

>The original c-library also had nargs()...

Well, it depends on how "original" we are talking about. Nargs() had vanished by the time setjmp()/longjmp() appeared in their definitive form, in V7. It was always of somewhat doubtful usefulness, given the presence of datatypes of different sizes.
--
"...the i860 is a wonderful source  | Henry Spencer at U of Toronto Zoology
of thesis topics." --Preston Briggs | henry@zoo.toronto.edu   utzoo!henry
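Henry's description of the pdp11 scheme can be modeled with a toy stack. The structures below are hypothetical, not the V7 source: every frame records its caller's frame pointer plus the caller's r2-r4 saved on entry. Unwinding to the target frame by "returning" through each intervening frame leaves the target function's registers holding their values as of its most recent call, not as of the setjmp, which is why this scheme gave the intuitive answer.

```c
#include <stddef.h>

/* Toy fixed-format frame: every call saves the caller's r2-r4. */
struct frame {
    struct frame *caller;       /* saved frame pointer (r5) */
    int saved_regs[3];          /* caller's r2, r3, r4 at call time */
};

struct cpu {
    int regs[3];                /* r2-r4 */
    struct frame *fp;           /* current frame pointer */
};

/* "Returns" through frames until fp == target, restoring r2-r4 from
   each frame as a normal return would. Returns 1 if target was found. */
int unwind_to(struct cpu *c, const struct frame *target)
{
    while (c->fp != NULL && c->fp != target) {
        struct frame *f = c->fp;
        for (int i = 0; i < 3; i++)
            c->regs[i] = f->saved_regs[i];
        c->fp = f->caller;
    }
    return c->fp == target;
}
```

For example, with frames outer (the function that called setjmp), mid and inner, where mid's frame holds outer's registers {7, 8, 9} from the moment outer called mid, unwinding from inner to outer leaves regs at {7, 8, 9}.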