[comp.lang.c] Ambiguity in definition of setjmp/longjmp makes them much less useful

campbell@redsox.bsw.com (Larry Campbell) (10/08/90)

We have implemented a portable (or so we thought) exception handling
facility for C.  In order to allow exception handlers to have the same scope
as the code being guarded, we used setjmp/longjmp instead of ssignal.
However, the ambiguous definition of setjmp/longjmp is giving us heartburn.

Consider the following code:
----------------------------------------
 1    {
 2    int x;
 3    x = 0;
 4    if (! setjmp(foo))
 5        {
 6        x = 1;
 7        foo();
 8        }
 9    else
10        {
11        printf("x = %d\n", x);
12        }
13    }
----------------------------------------

If foo() calls longjmp, the value of x when line 11 gets executed appears to
be undefined (I don't have a copy of the ANSI standard, but I've checked
about eight compiler manuals; most say it's undefined, or undefined if x
isn't declared volatile).

In the three compilers I've tested that claim ANSI compliance, declaring x to
be volatile yields the desired result (x = 1).  In the non-ANSI compilers,
disabling optimization yields the desired result, but enabling optimization
usually yields x = 0.
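
A minimal sketch of the volatile variant described above, assuming a separate jmp_buf object (here called env) and a helper that performs the jump (here called jump_back); both names are hypothetical and do not appear in the original fragment. With x declared volatile, its value after the longjmp is not left indeterminate:

```c
#include <setjmp.h>

static jmp_buf env;                 /* hypothetical name for the jump buffer */

/* Stands in for the callee that bails out with longjmp. */
static void jump_back(void)
{
    longjmp(env, 1);
}

/* Returns the value of x as seen after control comes back via longjmp. */
int guarded_volatile(void)
{
    volatile int x = 0;             /* volatile: kept in memory across the jump */

    if (!setjmp(env)) {
        x = 1;                      /* changed between setjmp and longjmp */
        jump_back();                /* does not return normally */
    }
    /* Reached only after the longjmp, when setjmp yields nonzero. */
    return x;
}
```

Calling guarded_volatile() should give 1. Dropping the volatile qualifier makes the value read after the jump indeterminate, which matches the 0-or-1 behaviour reported for the optimizing compilers.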

I've never seen any value for x other than 0 or 1.

My real question is this:  Why not define the behavior of setjmp/longjmp so
that the values of ALL local variables are defined, whether or not they've
been allocated to registers?  Otherwise, setjmp/longjmp are significantly
less useful.

For what it's worth, it seems to me that the description of setjmp/longjmp in
K&R 2 does imply that x should have the value 1; is this an area of
disagreement between K&R and ANSI?
-- 
Larry Campbell                          The Boston Software Works, Inc.
campbell@redsox.bsw.com                 120 Fulton Street
wjh12!redsox!campbell                   Boston, MA 02109

poser@csli.Stanford.EDU (Bill Poser) (10/08/90)

In article <1597@redsox.bsw.com> campbell@redsox.bsw.com (Larry Campbell) writes:
>Consider the following code:
> 4    if (! setjmp(foo))
> 5        {
> 6        x = 1;
> 7        foo();
> 8        }

I agree that it is unfortunate that setjmp does not save non-register
locals, but this code is wrong. The argument to setjmp is a jmp_buf,
not a function.

henry@zoo.toronto.edu (Henry Spencer) (10/08/90)

In article <1597@redsox.bsw.com> campbell@redsox.bsw.com (Larry Campbell) writes:
>My real question is this:  Why not define the behavior of setjmp/longjmp so
>that the values of ALL local variables are defined, whether or not they've
>been allocated to registers? ...

Because it is painful to implement in certain situations, and there are
many existing compilers that punt said situations as a result.

One would really like longjmp to act much like a multi-level return.  This
is hard, because there may be saved register values on the stack which would
have to be restored.  If the format of the stack frame is fixed (pdp11) or
self-describing (VAX), this is easy enough... but on modern machines you
have neither of those happy situations, and it can be arbitrarily hard to
figure out which parts of the stack represent values that should be put
back into registers.  (You don't want to incur overhead on every function
call just because somebody might someday call longjmp.)

ANSI C puts enough constraints on setjmp that a smart compiler can notice
a call to it, and bracket other calls from that function with a special
save-return sequence so that stack unravelling is not needed.  Unfortunately,
this really requires a compiler that compiles a whole function at a time,
and many simple or fast compilers compile a statement at a time.
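
As a rough illustration only, the bracketing idea can be imitated by hand in source code: store the live value in memory just before a call that might jump, and reload it on the path that longjmp lands on. This is a sketch under assumptions, not what any particular compiler emits; the names bracket_env, spill_slot and might_jump are hypothetical.

```c
#include <setjmp.h>

static jmp_buf bracket_env;     /* hypothetical jump buffer */
static int spill_slot;          /* static storage keeps its value across longjmp */

/* Stands in for any call that may end in a longjmp. */
static void might_jump(void)
{
    longjmp(bracket_env, 1);
}

int bracketed_by_hand(void)
{
    int x = 0;                  /* ordinary local, free to live in a register */

    if (setjmp(bracket_env) == 0) {
        x = 1;
        spill_slot = x;         /* "save" half of the bracket, before the call */
        might_jump();
    } else {
        /* x is indeterminate here, so it is written before it is read. */
        x = spill_slot;         /* "restore" half of the bracket */
    }
    return x;
}
```

The function should return 1 without x being volatile, because x is never read after the jump until it has been reloaded from memory.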

Some implementations restore all the registers to the way they were when
the *setjmp* was called, but this is often unsatisfactory in general and
can be very unsatisfactory when compilers really start playing games with
register usage.

Said register-usage games also make it impractical to specify behavior
that depends on whether the programmer explicitly declared things "register".
(Although some of us tried to point out that the set of compilers which *do*
play register games but *don't* compile whole functions at a time must be
pretty small, so it would not be a disaster to require the register-game
compilers to do call bracketing.  Alas, our wise words :-) were not heeded.)

There just ain't no graceful way.
-- 
Imagine life with OS/360 the standard  | Henry Spencer at U of Toronto Zoology
operating system.  Now think about X.  |  henry@zoo.toronto.edu   utzoo!henry

tom@ssd.csd.harris.com (Tom Horsley) (10/08/90)

>>>>> Regarding Re: Ambiguity in definition of setjmp/longjmp makes them much less useful; henry@zoo.toronto.edu (Henry Spencer) adds:

henry> Some implementations restore all the registers to the way they were
henry> when the *setjmp* was called, but this is often unsatisfactory in
henry> general and can be very unsatisfactory when compilers really start
henry> playing games with register usage.

Wrong! With compilers that play register games restoring the registers as
they were at the time of the setjmp is the ONLY implementation that works
at all (unless setjmp is recognized as a special construct by the compiler,
which I agree is the best way). In any optimizing compiler which is likely
to do things like keep common sub-expressions in registers, the following
simple example shows the requirement for restoring the registers as of the
setjmp() call:

{
   ...
   /* compiler computes a CSE and keeps it in register 47 */
   if (setjmp(...) != 0) {
      /* compiler references the CSE in register 47 */
   }
   /* compiler makes last reference to CSE in register 47 */
   ...
   /* compiler now has something totally different in register 47 */
   longjmp(...);
}

(In the above example register 47 is assumed to be a register that is not
normally destroyed by a function call).

If you were to unwind the stack and restore the registers as of the
longjmp() call, you would get back to the setjmp() with random gibberish in
the register the code generator expected to contain a CSE value.

Personally, I believe that compilers should support setjmp() as a special
construct - simply making might-goto arcs from every other function call to
a point immediately following any setjmp() calls would add enough
information to the flow graph for an optimizing compiler to recognize the
funny lifetimes that registers might have and volatile would only be needed
for variables that interact with signal handling code (since a signal
can happen anywhere in the program, not just at a function call).

Until the day that compilers properly support setjmp() however, the only
implementation of setjmp() that stands a chance of interacting correctly
with an optimizing compiler is one that restores the registers as of the
setjmp() call. Unfortunately, this also means that the only user code that
stands a chance of interacting correctly with an optimizing compiler is code
that correctly declares all variables volatile where necessary.  Since the
phrase "where necessary" is difficult (if not impossible) for an ordinary
mortal to figure out, the obviously best solution is to fix compilers to
special case setjmp().
--
======================================================================
domain: tahorsley@csd.harris.com       USMail: Tom Horsley
  uucp: ...!uunet!hcx1!tahorsley               511 Kingbird Circle
                                               Delray Beach, FL  33444
+==== Censorship is the only form of Obscenity ======================+
|     (Wait, I forgot government tobacco subsidies...)               |
+====================================================================+

shankar@hpclscu.HP.COM (Shankar Unni) (10/09/90)

> My real question is this:  Why not define the behavior of setjmp/longjmp so
> that the values of ALL local variables are defined, whether or not they've
> been allocated to registers?  Otherwise, setjmp/longjmp are significantly
> less useful.

Because you want to be able to keep variables in registers. By your
definition, no local variable in a routine that calls setjmp() can ever be
kept in a register beyond a statement boundary. Consider:


    jmp_buf xxx;
    
    foo()
    {
       int i = 0;
       
       if (setjmp(xxx) == 0) {
          i = 5;

          bar();
       }
    }

    bar()
    {
       longjmp (xxx, 10);
    }

Thus, unless foo() is a leaf routine, you have to assume the worst and keep
"i" in memory.

Most people consider this an unacceptable penalty to pay for what is, at
most, fringe functionality. After all, as you discovered yourself, making
"i" volatile makes it work exactly the way you want it to.

I strongly disagree with the "significantly less useful" part of your
statement above.  Setjmp/longjmp are relatively expensive operations used
to recover from extraordinary situations, and the only sort of guarantees
envisioned by the designers were to:

   - exit the program gracefully, or
   - re-initialize the program to some known initial state.

If you want to implement a general-purpose exception-handling facility, use
"volatile" liberally (or use a C++-like front-end which will do it
automatically for you).
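
For concreteness, here is one way a small exception facility might be layered on setjmp/longjmp. It is a sketch under assumptions, not the facility discussed in this thread: every name (handler_frame, try_call, raise_exception, and the demo functions) is hypothetical, and exception codes are assumed to be nonzero.

```c
#include <setjmp.h>
#include <stdlib.h>

/* One frame per active guarded region, linked into a stack. */
struct handler_frame {
    jmp_buf env;
    struct handler_frame *prev;
};

static struct handler_frame *handler_top = NULL;
static int pending_code = 0;    /* static storage: unaffected by the jump */

/* Transfers control to the innermost guarded region. */
void raise_exception(int code)
{
    if (handler_top == NULL)
        abort();                /* nothing is guarding this call */
    pending_code = code;
    longjmp(handler_top->env, 1);
}

/* Runs body(); returns 0 if it completed, else the raised code. */
int try_call(void (*body)(void))
{
    struct handler_frame frame;
    int code = 0;

    frame.prev = handler_top;
    handler_top = &frame;

    /* No local is modified between setjmp and longjmp here.  A local
       that were modified in the guarded part and read in the handler
       part would need to be volatile. */
    if (setjmp(frame.env) == 0)
        body();
    else
        code = pending_code;

    handler_top = frame.prev;   /* frame.prev was set before setjmp */
    return code;
}

static void failing_step(void) { raise_exception(42); }
static void quiet_step(void) { }

int demo_failing(void) { return try_call(failing_step); }
int demo_quiet(void)   { return try_call(quiet_step); }
```

demo_failing() should return 42 and demo_quiet() should return 0. Keeping the pending code in a static object sidesteps the question of which automatic variables survive the jump.
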
-----
Shankar Unni                                   E-Mail: 
Hewlett-Packard California Language Lab.     Internet: shankar@hpda.hp.com
Phone : (408) 447-5797                           UUCP: ...!hplabs!hpda!shankar

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (10/09/90)

In article <1990Oct8.031745.28651@zoo.toronto.edu> henry@zoo.toronto.edu (Henry Spencer) writes:
  [ on the pain of making setjmp() reliable ]
> There just ain't no graceful way.

Actually, there is a reasonably clean way to correctly allocate
registers through jumps, without any overhead. All you need is to be
able to load and save all the registers on demand from the fixed memory
locations that they correspond to. So you keep a map of register
allocation along with the code. The fun part is figuring out how to
store the map without wasting too much memory; there are different
techniques for different register allocation strategies. I don't
remember who I heard this from.

Certainly the existing setjmp()/longjmp() is quite useless. I had to
give up on a threads library because some machines (notably a Convex,
and a Sun under gcc) simply refused to treat register variables
correctly across jumps. (I say correctly in the intuitive sense that
putting a variable into a register shouldn't change its behavior at all,
not in the ANSI sense. Declaring all variables volatile just so your
program will work? Gimme a break.)

---Dan

cameron@usage.csd.oz (Cameron Simpson) (10/09/90)

From article <TOM.90Oct8071803@hcx2.ssd.csd.harris.com>, by tom@ssd.csd.harris.com (Tom Horsley):
| Personally, I believe that compilers should support setjmp() as a special
| construct - simply making might-goto arcs from every other function call to
| a point immediately following any setjmp() calls would add enough
| information to the flow graph for an optimizing compiler to recognize the
| funny lifetimes that registers might have and volatile would only be needed
| for variables that interact with signal handling code (since a signal
| can happen anywhere in the program, not just at a function call).

But think about what happens when you write
	sigfn(sig)
	{
		longjmp(foojmpbuf,1);
		/*NOTREACHED*/
	}
Since, as you say, a signal can happen anywhere then there is now a might-goto
arc from _every_ point in the program which can conceivably be called from
within any function which uses foojmpbuf as a jump buffer. This could easily
include large stretches of the C library. It gets much worse if something as
bizarre as the following is done:
	jmp_buf	*current_restore_point=NULL;

	sigfn(sig)
	{
		if (current_restore_point == NULL)
			fprintf(stderr,"ouch! - uncaught signal %d\n",sig);
		else
			longjmp(*current_restore_point,sig);
	}
And then set/clear current_restore_point around various bits of code. This
puts might-goto arcs from almost every bit of code unless your compiler is
almost precognitive, and the programmer aware of this effect.

My preferred solution is not to use setjmp/longjmp at all. Of course, it
isn't always possible. BSD's non-switch-off-able restartable system calls
(like a read from a tty) irk me particularly in this regard.
	- Cameron Simpson
	  cameron@spectrum.cs.unsw.oz.au

"If it can't be turned off, it's not a feature." Karl Huer (I think).

sasrer@unx.sas.com (Rodney Radford) (10/09/90)

In article <1597@redsox.bsw.com> campbell@redsox.bsw.com (Larry Campbell) writes:
>My real question is this:  Why not define the behavior of setjmp/longjmp so
>that the values of ALL local variables are defined, whether or not they've
>been allocated to registers?  Otherwise, setjmp/longjmp are significantly
>less useful.

The reason for the ambiguity is because ANSI chose not to make setjmp/longjmp
functions known by the C compiler so that it just treats them just like any
other functions (ie: does not force local automatics values from registers
to storage). ANSI chose not to make any of the functions special case so
that an application may redefine the functions (although this is frowned upon
in some cases). The setjmp/longjmp functions themselves do not have the
required information to force the values from the registers, so we are stuck
with this oddity.

>
>For what it's worth, it seems to me that the description of setjmp/longjmp in
>K&R 2 does imply that x should have the value 1; is this an area of
>disagreement between K&R and ANSI?

I believe (just guessing really) that the original C used 'builtins' for the
setjmp/longjmp functions, ie: they special cased them.

-- 
Rodney Radford        DG/UX AViiON developer        SAS Institute, Inc.
sasrer@unx.sas.com    (919) 677-8000 x7703          Box 8000, Cary, NC 27512

richard@aiai.ed.ac.uk (Richard Tobin) (10/10/90)

In article <1597@redsox.bsw.com> campbell@redsox.bsw.com (Larry Campbell) writes:
>My real question is this:  Why not define the behavior of setjmp/longjmp so
>that the values of ALL local variables are defined, whether or not they've
>been allocated to registers?  Otherwise, setjmp/longjmp are significantly
>less useful.

The answer is that it's harder and slower.  Either you have to store all
the local variables in memory (which is why volatile works) or longjmp()
has to restore the registers to the right values by "unwinding" the stack,
and doing the restores as if each procedure were returning.  BSD on the
VAX uses the latter approach, but it would be harder for a compiler
that wanted to be cleverer about register allocation.

What usually happens is that setjmp() saves the values of the
registers, and longjmp() restores them.  This means that variables
which happened to be in registers get restored to the values they
had when setjmp() was called - that is, intermediate assignments are
lost.
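
The effect can be modelled without any real setjmp at all. The following toy simulation is only a sketch: toy_cpu, toy_jmp_buf and the functions around them are hypothetical names, and the "registers" are just an array. Saving copies the register file, jumping copies it back, and memory is left alone.

```c
#include <string.h>

enum { TOY_NREGS = 4 };

struct toy_cpu     { int regs[TOY_NREGS]; };    /* pretend register file */
struct toy_jmp_buf { int saved[TOY_NREGS]; };   /* snapshot taken at "setjmp" */

/* Model of setjmp: remember every register. */
void toy_setjmp(struct toy_jmp_buf *jb, const struct toy_cpu *cpu)
{
    memcpy(jb->saved, cpu->regs, sizeof jb->saved);
}

/* Model of longjmp: put every register back as it was. */
void toy_longjmp(struct toy_cpu *cpu, const struct toy_jmp_buf *jb)
{
    memcpy(cpu->regs, jb->saved, sizeof cpu->regs);
}

/* x starts at 0, is set to 1 after the save, then the jump happens.
   Returns the value of x afterwards, depending on where x lives. */
int toy_value_after_jump(int x_in_register)
{
    struct toy_cpu cpu = { {0, 0, 0, 0} };
    struct toy_jmp_buf jb;
    int x_memory = 0;

    toy_setjmp(&jb, &cpu);
    if (x_in_register)
        cpu.regs[0] = 1;        /* assignment to the register copy */
    else
        x_memory = 1;           /* assignment to the memory copy */
    toy_longjmp(&cpu, &jb);

    return x_in_register ? cpu.regs[0] : x_memory;
}
```

In this model toy_value_after_jump(1) gives 0 (the assignment is lost) and toy_value_after_jump(0) gives 1, which mirrors the two outcomes reported earlier in the thread.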

It might be possible to be clever and just ensure all variables are in
memory before calling a procedure that might do a longjmp(), but the
compiler would have to be sure that longjmp() couldn't be called 
asynchronously from a signal handler.

Since ANSI only says that variables which have been changed are undefined,
it's hard to think of an implementation that would not result in either
the right value or the setjmp() value after a longjmp().

-- Richard
-- 
Richard Tobin,                       JANET: R.Tobin@uk.ac.ed             
AI Applications Institute,           ARPA:  R.Tobin%uk.ac.ed@nsfnet-relay.ac.uk
Edinburgh University.                UUCP:  ...!ukc!ed.ac.uk!R.Tobin

peter@ficc.ferranti.com (Peter da Silva) (10/10/90)

In article <891@usage.csd.unsw.oz.au> cameron@spectrum.cs.unsw.oz.au (Cameron Simpson) writes:
> But think about what happens when you write
> 	sigfn(sig) { longjmp(foojmpbuf,1); }

> Since, as you say, a signal can happen anywhere then there is now a might-goto
> arc from _every_ point in the program which can conceivably be called from
> within any function which uses foojmpbuf as a jump buffer.

I think it reasonable not to guarantee longjmp behaviour from within
signals. In fact, calling longjmp from within signals is evil. The only
thing you should do within a signal routine is set a flag... anything
else is a bug waiting to happen.
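
A minimal sketch of the set-a-flag style, using only standard C (signal, raise, sig_atomic_t). The names got_signal, on_signal and poll_until_signalled are hypothetical, and raise() is used here merely to make the example self-contained.

```c
#include <signal.h>

static volatile sig_atomic_t got_signal = 0;

/* The handler does nothing except record that the signal arrived. */
static void on_signal(int sig)
{
    (void)sig;
    got_signal = 1;
}

/* Loops until the flag is seen; returns the number of iterations,
   or -1 if the handler could not be installed. */
int poll_until_signalled(void)
{
    void (*old)(int);
    int iterations = 0;

    got_signal = 0;
    old = signal(SIGINT, on_signal);
    if (old == SIG_ERR)
        return -1;

    while (!got_signal) {
        iterations++;
        if (iterations == 3)
            raise(SIGINT);      /* stand-in for an externally sent signal */
    }

    signal(SIGINT, old);        /* put the previous disposition back */
    return iterations;
}
```

The main loop notices the flag at a point of its own choosing, so no local variable is ever caught half-updated by a jump out of the handler.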

Of course, you need to do this in BSD, but BSD is buggier than a dog pound.
-- 
Peter da Silva.   `-_-'
+1 713 274 5180.   'U`
peter@ferranti.com

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (10/10/90)

In article <:_A6T46@xds13.ferranti.com> peter@ficc.ferranti.com (Peter da Silva) writes:
> I think it reasonable not to guarantee longjmp behaviour from within
> signals. In fact, calling longjmp from within signals is evil. The only
> thing you should do within a signal routine is set a flag... anything
> else is a bug waiting to happen.

Correct.

> Of course, you need to do this in BSD, but BSD is buggier than a dog pound.

Say what? I've written large BSD applications that don't do anything
inside signal handlers other than set flags. Where's this ``need'' you
talk about? And if you're going to insist that BSD is buggier than SysV,
how about some proof?

---Dan

mcdaniel@adi.com (Tim McDaniel) (10/10/90)

I don't know how Rodney Radford (sasrer@unx.sas.com) managed to get
almost everything exactly backwards.  Someone must have really
misinformed him.

He writes:

> ANSI chose not to make any of the functions special case so that an
> application may redefine the functions (although this is frowned
> upon in some cases).

From section 4.1.2.1, "Reserved Identifiers", of the ANSI C standard
(page 98):

   All identifiers with external linkage in any of the following
   sections (including the future library directions) are always
   reserved for use as identifiers with external linkage.

So it's not "frowned upon" to redefine ANSI C functions as functions;
it's undefined, and it often won't work in practice.

> The reason for the ambiguity is because ANSI chose not to make
> setjmp/longjmp functions known by the C compiler so that it just
> treats them just like any other functions (ie: does not force local
> automatics values from registers to storage).

Section 4.6, page 119:

   It is unspecified whether setjmp is a macro or an identifier
   declared with external linkage.  If a macro definition is
   suppressed in order to access an actual function, or a program
   defines an external identifier with the name setjmp, the behavior
   is undefined. . . .

   An invocation of the setjmp macro shall appear only in one of the
   following contexts:

   - the entire controlling expression of a selection or iteration
     statement;

   - one operand of a relational or equality operator with the other
     operand an integral constant expression, with the resulting
     expression being the entire controlling expression of a selection
     or iteration statement;

   - the operand of a unary ! operator with the resulting expression
     being the entire controlling expression of a selection or
     iteration statement; or

   - the entire expression of an expression statement (possibly cast
     to void).

So setjmp may indeed be special.  Note that "&setjmp" is not
permitted, nor is "*fp" where fp points to the underlying setjmp
function (if any).  Thus, setjmp can always be "known by the C
compiler", if the compiler chooses to look.  Thus, a compiler can
always determine which functions call setjmp.
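
The permitted contexts quoted above look like this in practice. This is a sketch with hypothetical names (ctx, jump_with, and the three form_ functions); each function uses one of the listed forms and reports what it observed.

```c
#include <setjmp.h>

static jmp_buf ctx;             /* hypothetical jump buffer */

static void jump_with(int value)
{
    longjmp(ctx, value);
}

/* setjmp as the entire controlling expression of a selection statement. */
int form_switch(void)
{
    switch (setjmp(ctx)) {
    case 0:
        jump_with(7);           /* control reappears at the switch */
        return -1;              /* not reached */
    case 7:
        return 7;
    default:
        return -2;
    }
}

/* setjmp compared against an integral constant expression. */
int form_comparison(void)
{
    if (setjmp(ctx) != 0)
        return 1;
    jump_with(5);
    return 0;                   /* not reached */
}

/* setjmp as the operand of unary !. */
int form_negation(void)
{
    if (!setjmp(ctx))
        jump_with(3);
    return 2;
}

/* Forms such as "r = setjmp(ctx);" or "f(setjmp(ctx));" are not among
   the contexts listed in the quoted text, so they are avoided here. */
```

form_switch() should return 7, form_comparison() 1, and form_negation() 2.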

> The setjmp/longjmp functions themselves do not have the required
> information to force the values from the registers, so we are stuck
> with this oddity.

Section 4.6.2.1, page 120:

   the values of objects of automatic storage duration that are local
   to the function containing the invocation of the corresponding
   setjmp macro that do not have volatile-qualified type and have been
   changed between the setjmp invocation and longjmp call are
   indeterminate.

The functions themselves (if they exist) do not.  By the previous
section, however, the *compiler itself* has enough information, and it
can choose to 'do the right thing'.

> I believe (just guessing really) that the original C used 'builtins'
> for the setjmp/longjmp functions, ie: they special cased them.

To the best of my knowledge, the original several generations of C
compilers did not special-case them.  I'd be surprised if any
"standard" compilers (SUN OS, SYS V for VAXen, et cetera) have ever
special-cased any functions.
--
Tim McDaniel                 Applied Dynamics Int'l.; Ann Arbor, Michigan, USA
Work phone: +1 313 973 1300                        Home phone: +1 313 677 4386
Internet: mcdaniel@adi.com                UUCP: {uunet,sharkey}!amara!mcdaniel

peter@ficc.ferranti.com (Peter da Silva) (10/10/90)

> > Of course, you need to do this in BSD, but BSD is buggier than a dog pound.

> Say what? I've written large BSD applications that don't do anything
> inside signal handlers other than set flags. Where's this ``need'' you
> talk about?

To use an alarm to break a read. In system V you can do that just by setting
a flag. In BSD you have to longjmp out. Now you'll tell me to use sockets. I'm
allergic to objects outside the UNIX filesystem name space.

> And if you're going to insist that BSD is buggier than SysV,
> how about some proof?

I didn't say that. I said BSD is buggier than a dog pound. That doesn't
imply that System V *isn't*. I just tend to trust System V more because
it shows fewer signs of feeping creaturism. Fewer places for bugs to
hide. In retrospect the awful tardiness of AT&T in getting streams into
someplace you can do something useful with them might be a blessing.
-- 
Peter da Silva.   `-_-'
+1 713 274 5180.   'U`
peter@ferranti.com

henry@zoo.toronto.edu (Henry Spencer) (10/10/90)

In article <1990Oct09.143521.24019@unx.sas.com> sasrer@unx.sas.com (Rodney Radford) writes:
>I believe (just guessing really) that the original C used 'builtins' for the
>setjmp/longjmp functions, ie: they special cased them.

Nope.  The original pdp11 C compiler had a predictable stack-frame format
and could do stack unravelling right.
-- 
Imagine life with OS/360 the standard  | Henry Spencer at U of Toronto Zoology
operating system.  Now think about X.  |  henry@zoo.toronto.edu   utzoo!henry

peter@ficc.ferranti.com (Peter da Silva) (10/12/90)

In article <1990Oct10.152659.6334@zoo.toronto.edu> henry@zoo.toronto.edu (Henry Spencer) writes:
> In article <1990Oct09.143521.24019@unx.sas.com> sasrer@unx.sas.com (Rodney Radford) writes:
> >I believe (just guessing really) that the original C used 'builtins' for the
> >setjmp/longjmp functions, ie: they special cased them.

> Nope.  The original pdp11 C compiler had a predictable stack-frame format
> and could do stack unravelling right.

It also didn't do any optimisations across statement boundaries or past
function calls, that I know of. So there wasn't anything sitting in a
register waiting to get clobbered when you did the longjmp().
-- 
Peter da Silva.   `-_-'
+1 713 274 5180.   'U`
peter@ferranti.com

meissner@osf.org (Michael Meissner) (10/12/90)

In article <1990Oct09.143521.24019@unx.sas.com> sasrer@unx.sas.com
(Rodney Radford) writes:

| In article <1597@redsox.bsw.com> campbell@redsox.bsw.com (Larry Campbell) writes:
| >My real question is this:  Why not define the behavior of setjmp/longjmp so
| >that the values of ALL local variables are defined, whether or not they've
| >been allocated to registers?  Otherwise, setjmp/longjmp are significantly
| >less useful.
| 
| The reason for the ambiguity is because ANSI chose not to make setjmp/longjmp
| functions known by the C compiler so that it just treats them just like any
| other functions (ie: does not force local automatics values from registers
| to storage). ANSI chose not to make any of the functions special case so
| that an application may redefine the functions (although this is frowned upon
| in some cases). The setjmp/longjmp functions themselves do not have the
| required information to force the values from the registers, so we are stuck
| with this oddity.

In some implementations, setjmp could easily get the information if it
so desired.  For example on systems that use MIPS chips (MIPS, SGI,
DECstation, etc.), there is a side table that contains the information
for each function on what registers in the preserved register set are
saved and where they are saved, how to calculate the virtual frame
pointer, etc.  The table can be made to appear in memory by emitting a
specific external.  The 88k computers have a similar facility, though
the table is always in memory.  Longjmp could unwind each stack frame,
and reset the registers until it got back to where it should be, but
that is usually too hard to justify doing to management!
--
Michael Meissner	email: meissner@osf.org		phone: 617-621-8861
Open Software Foundation, 11 Cambridge Center, Cambridge, MA, 02142

Do apple growers tell their kids money doesn't grow on bushes?

pgd@bbt.se (10/16/90)

In article <1990Oct10.152659.6334@zoo.toronto.edu> henry@zoo.toronto.edu (Henry Spencer) writes:
>In article <1990Oct09.143521.24019@unx.sas.com> sasrer@unx.sas.com (Rodney Radford) writes:
>>I believe (just guessing really) that the original C used 'builtins' for the
>>setjmp/longjmp functions, ie: they special cased them.
>
>Nope.  The original pdp11 C compiler had a predictable stack-frame format
>and could do stack unravelling right.

I am not 100% sure of this, but I think that the original C library
just restored some registers without any fancy unravelling. (r5,sp,pc).
It also kept all variables, except for register variables, in memory.

The original c-library also had nargs(), but that one was a horrible
kludge. (It was looking at the machine instructions to check out how
many bytes were popped from the stack after return from the call instruction.)

henry@zoo.toronto.edu (Henry Spencer) (10/16/90)

In article <1990Oct15.174203.21441@bbt.se> pgd@bbt.se writes:
>>... The original pdp11 C compiler had a predictable stack-frame format
>>and could do stack unravelling right.
>
>I am not 100% sure of this, but I think that the original C library
>just restored some registers without any fancy unravelling. (r5,sp,pc).

It restored r2-r4 as well, necessarily, since they were the programmer's
register variables.  This was done by unravelling the stack, looking
for a frame whose address was equal to the saved frame pointer (r5),
and then, more or less, restoring the three registers you mention *and*
doing a return, which unstacked r2-r4.  The stack unravelling was pretty
simple, because the format was fixed and every call saved r2-r4.  So
it really was quite straightforward to get it right.
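
The unravelling scheme described above can be modelled in a few lines. This is a toy simulation, not the actual pdp11 library code: toy_machine, toy_call, toy_return and toy_unwind_to are hypothetical names, three array slots stand in for r2-r4, and a depth counter plays the role of the frame pointer.

```c
#include <string.h>

enum { TOY_NREG = 3 };          /* stands in for r2, r3, r4 */
enum { TOY_MAX_DEPTH = 8 };     /* the demo below uses only two frames */

struct toy_frame { int saved[TOY_NREG]; };  /* caller's registers */

struct toy_machine {
    int r[TOY_NREG];
    struct toy_frame stack[TOY_MAX_DEPTH];
    int depth;                  /* plays the role of the frame pointer */
};

/* Every call saves the caller's registers in a fixed-format frame. */
static void toy_call(struct toy_machine *m)
{
    memcpy(m->stack[m->depth].saved, m->r, sizeof m->r);
    m->depth++;
}

/* Every return unstacks them again. */
static void toy_return(struct toy_machine *m)
{
    m->depth--;
    memcpy(m->r, m->stack[m->depth].saved, sizeof m->r);
}

/* Model of longjmp: return frame by frame until the target is reached. */
static void toy_unwind_to(struct toy_machine *m, int target_depth)
{
    while (m->depth > target_depth)
        toy_return(m);
}

int toy_unwind_demo(void)
{
    struct toy_machine m;
    int target;

    memset(&m, 0, sizeof m);
    target = m.depth;           /* "setjmp": remember the frame */
    m.r[0] = 1;                 /* register variable x = 1, after the setjmp */

    toy_call(&m);               /* callee saves the caller's registers... */
    m.r[0] = 99;                /* ...then uses the register itself */
    toy_call(&m);
    m.r[0] = 77;

    toy_unwind_to(&m, target);  /* "longjmp" */
    return m.r[0];
}
```

toy_unwind_demo() should return 1: because each frame holds its caller's registers, unwinding recovers the value x had at the time of the outermost call, not the value it had when the frame was remembered.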

>It also kept all variables, except for register variables, in memory.

That's what the "register" keyword was for, after all.

>The original c-library also had nargs()...

Well, it depends on how "original" we are talking about.  Nargs() had
vanished by the time setjmp()/longjmp() appeared in their definitive form,
in V7.  It was always of somewhat doubtful usefulness, given the presence
of datatypes of different sizes.
-- 
"...the i860 is a wonderful source     | Henry Spencer at U of Toronto Zoology
of thesis topics."    --Preston Briggs |  henry@zoo.toronto.edu   utzoo!henry