[comp.lang.fortran] Exception handling

ok@quintus.uucp (Richard A. O'Keefe) (09/14/88)

Meta-comments:
(1) I have posted a message to comp.arch asking which machines cannot
    support a simple exception-handling system.  If anyone reading this
    newsgroup has this information, let's have it.  Does anyone have
    any hard data on how much an exception-handling system slows things
    down?

(2) This has degenerated into an argument between me and Giles.

    I am proposing that Fortran should have some sort of exception handling
    facility as part of the standard, which facility would not require the
    existence of precise interrupts.  Indeed, the only actual run-time
    support required is a branch to the part of the user's code which says
    what to do if the exception in question cannot be trapped.

    Giles claims that this would be an intolerable burden on people who
    are trying to get maximum performance from their systems, that it
    is inflexible, that "correct" programs don't need any such thing,
    and that an exception-handling facility if it exists at all should
    be "language-independent" (though this does not mean that it should
    work with every current language).

    I really don't think that either of us is likely to change our mind.
    Certainly not if we are the only ones discussing the topic.  Part of
    the problem is that I don't know what it is like to use a Cray or
    any other super-computer.

    What we need, if this discussion is to actually be *useful*, is input
    from other people.  Does anyone other than Giles understand why he
    finds my suggestion objectionable?  For the purpose of discussion,
    let's make my proposal concrete:
	HANDLE ('exception name, ..., exception name') IN
	    statements to be protected
	CANNOT HANDLE
	    statements to execute if any of the exceptions cannot be trapped
	HANDLER
	    statements to execute if any of the exceptions occurs
	END HANDLER
    The compiler is to assume in the HANDLER branch that any variable
    which might be altered by the protected code has been trashed.
    The list of exception names is a CHARACTER*(*) expression.  A compiler
    would be entitled to generate code for *just* the CANNOT HANDLE branch,
    though it would be good manners to print a warning message.  The
    precise location of an exception does not matter, but any pending
    exceptions must be signalled before control leaves the protected code.

    On which machines must this be inefficient (but how _can_ just
    compiling the CANNOT HANDLE branch be inefficient?) and what are the
    reasons for this inefficiency?

    This scheme was designed to be compatible with ADA, or with PL/I, or
    with the only Common Lisp error handling proposal I know anything about,
    or with UNIX C.  Is anyone mixing Fortran with some other language 
    having an exception handling facility that this is not compatible with?
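
    To make the UNIX C case concrete, here is roughly the mapping I have in
    mind.  A sketch only: the signal()/setjmp()/longjmp() pairing and the
    divide-by-zero stand-in are my illustration, not part of the proposal,
    and strictly speaking longjmp() out of a signal handler is
    implementation-dependent (sigsetjmp/siglongjmp are the more careful
    choice where you have them).

        #include <signal.h>
        #include <setjmp.h>
        #include <stdio.h>

        static jmp_buf handler_env;

        static void on_fpe(int sig)
        {
            (void) sig;
            signal(SIGFPE, on_fpe);     /* re-arm (old-style signal semantics) */
            longjmp(handler_env, 1);    /* transfer to the HANDLER branch */
        }

        int main(void)
        {
            volatile int zero = 0, result = 0;

            if (signal(SIGFPE, on_fpe) == SIG_ERR) {
                /* CANNOT HANDLE: this system will not let us trap it */
                fprintf(stderr, "cannot trap SIGFPE here\n");
            } else if (setjmp(handler_env) == 0) {
                /* statements to be protected */
                result = 1 / zero;      /* typically raises SIGFPE */
                printf("no exception, result = %d\n", result);
            } else {
                /* HANDLER: treat anything the protected code touched as trashed */
                printf("caught an arithmetic exception\n");
            }
            signal(SIGFPE, SIG_DFL);
            return 0;
        }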

    Because I don't think the discussion is getting anywhere, I shall only
    post on it again if someone else enters the discussion.

In article <3501@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
>From article <390@quintus.UUCP>, by ok@quintus.uucp (Richard A. O'Keefe):
>> 	- the possibility of the program *substituting* a *different*
>> 	  calculation in place of the faulted one.  (Think "recovery
>> 	  blocks".)
>
>This assumes that the original code (which generated the fault) hasn't
>changed the data while it was executing.

NO IT FLAMING WELL DOESN'T!  It assumes that if you want to try to
substitute a different calculation, you have saved a copy of the things
which might change.  I was going to say in one of my messages that a
compiler would assume that everything which *might* have changed in a
protected region was to be treated as trashed, but I decided that that
was so obvious it didn't need to be said.  Oh well.

>Most real applications don't have that kind of memory.

I might have the data on disk.  Or the data which are changed might
be just a few bytes.  (And yes, I do know what recovery blocks are.
Surely it was obvious that I meant "Think in terms of code which is
_like_ recovery blocks explicitly programmed by the programmer"?)

>Deleting scratch files (and other such destructive habits) are _VERY_
>bad ideas.  The cause of the exception might be contained in that scratch
>file.

I guess Giles has never been on a machine which ran out of disc space
because a buggy compiler didn't clean up its scratch files.  I do not
say that a program should *ALWAYS* delete scratch files (though it is
UNIX practice to regard doing so as elementary good manners).  What I
say is that the programmer should have the CHOICE.

>By deleting it, you have destroyed valuable evidence to the
>user in finding the cause of the fault.

Note the assumption that the >>user<< is responsible for debugging the
program.

>Since this is a C VS. Fortran discussion, I guess that you're saying
>C doesn't have this fault.

No, this is >not< a C -vs- Fortran discussion.  At least not on my side.
Error handling in C is pretty appalling.  Didn't I already mention that
there is _no_ guaranteed way in C of obtaining the error code from a
library function?
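
For instance, the usual errno idiom only tells you anything when you already
know the call failed, and not every failure is required to set errno at all.
A small sketch (strtol is just a convenient example of mine, not one of the
functions under discussion):

    #include <errno.h>
    #include <stdlib.h>
    #include <stdio.h>

    int main(void)
    {
        const char *text = "999999999999999999999999999";
        char *end;
        long value;

        errno = 0;               /* must be cleared: a successful call may leave it alone */
        value = strtol(text, &end, 10);

        if (errno == ERANGE)
            printf("out of range, clamped to %ld\n", value);
        else if (end == text)
            printf("no digits at all\n");   /* a failure, but errno need not be set */
        else
            printf("value = %ld\n", value);
        return 0;
    }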

>> Let's see what assumptions we can find in it.
>> (3) There are non-trivial correct programs.
>I don't know what this means.

Giles said that exception handling should not be required because it
would slow down correct programs.  I claim that there are precious
few correct programs to worry about.

>If you're implying that a code which
>deliberately generates (and recovers from) an exceptional condition
>is 'correct', you have a strange concept of what correctness is.

I did not imply that.  Fortran already has GOTO and alternate RETURN;
it doesn't need another control structure just for jumping around.
Far from saying that such a program was 'correct', I was denying that
there *are* (m)any 'correct' programs.

>Reliable systems are ones which don't produce _wrong_ answers.

Agreed.  (And having apparently created an output file very often
amounts to producing just such a wrong answer.)

>You are the one that is claiming that only one type of person matters:
>those who crave your brand of robustness.

No.  I have proposed an exception handling scheme which has *ZERO*
overhead if it is not used.
>> I also point out once more that my proposal would allow an implementation
>> to say "no, I can't do this", which means that it would have *ZERO*
>> performance impact when a program did not request exception handling.
>
>No, your proposal would allow the language designers to say what _they_
>wanted the implementation to do.  If _they_ say that the option "no,
>I can't do this" is illegal in their language, then that's that!
But my proposal explicitly said that saying "no I can't do this" SHOULD
be legal.

>>>A language-independent library standard could be developed to address these
>>>points.
>> 
>> Let's see it!  Don't forget, there's ADA, APL, BASIC, C++, COBOL, FORTRAN,
>> LISP, ML, MODULA-2, PASCAL, PL/I, SIMULA, Smalltalk-80, ...
>
>I said 'language-independent' not 'universal'.

I didn't say universal either.  I only listed languages which I know can
run on a VAX.  Which languages _should_ a library standard be dependent
on in order to qualify as "language-independent"?

>That doesn't make it a good idea!!  By your logic, language designers
>should never even _try_ to learn from the mistakes of others.

I don't see how he draws this conclusion.  One of the key mistakes is
leaving something out.  I/O was left out of Algol 60, and where is it
today?

>> There are at least two kinds of programs:
>> (s) programs for wizards (yourself, other people in your group, certainly
>>     noone who lacks access to the source code or wouldn't understand it).
>> (o) programs for mortals (especially programs written for sale to people
>>     who do not get the source code).
>> I get the impression that Giles is thinking in terms of (s), while I am
>> thinking in terms of (o).  [...]
>
>No, I'm thinking in terms of _BOTH_.  I know, flexibility is a novel
>idea.  YOU'RE the one who claims that language standards should exclude
>one of these two groups.

But my proposal has ZERO run-time cost if it is not used.
How on earth does that exclude the wizards?
How is having an extra standard feature available which you don't
have to use inflexible?

jlg@lanl.gov (Jim Giles) (09/15/88)

From article <401@quintus.UUCP>, by ok@quintus.uucp (Richard A. O'Keefe):
> [...]
> 	HANDLE ('exception name, ..., exception name') IN
> 	    statements to be protected
> 	CANNOT HANDLE
> 	    statements to execute if any of the exceptions cannot be trapped
> 	HANDLER
> 	    statements to execute if any of the exceptions occurs
> 	END HANDLER

I object to your implication that a proposal to a language standard 
committee will emerge intact.  The CANNOT HANDLE statement would almost
certainly _not_ make it.  Your whole argument hinges on the claim that
CANNOT HANDLE will be a feature of the language.  It's not in any other
language I'm aware of.

>>This assumes that the original code (which generated the fault) hasn't
>>changed the data while it was executing.
> 
> NO IT FLAMING WELL DOESN'T!  [...]

Yes it does!  It assumes that the data was not changed by the original
code.  Your claim that you _implied_ the same thing is not true,
nor does it make my statement false.  Saving data around suspect code
is an expensive way of meeting the assumption.

>>Most real applications don't have that kind of memory.
> 
> I might have the data on disk.  Or the data which are changed might
> be just a few bytes.  (And yes, I do know what recovery blocks are.

Oh, fine!  Now all code which _might_ generate exceptions has to do disk
I/O to preserve the data.  _THAT_ won't have a performance penalty?

> I guess Giles has never been on a machine which ran out of disc space
> because a buggy compiler didn't clean up its scratch files.  I do not
> say that a program should *ALWAYS* delete scratch files (though it is
> UNIX practice to regard doing so as elementary good manners).  What I
> say is that the programmer should have the CHOICE.

Exactly.  The _programmer_ should choose.  Not the _program_, the
_programmer_.  Only the person who does the debugging can decide
which data will be necessary to figuring out the cause of the fault.
And yes, I have been on machines which ran out of disk space.  On
at least one occasion the cause of the run-away disk grabbing was found 
in one of those temporary files that you want to automatically delete.

>>By deleting it, you have destroyed valuable evidence to the
>>user in finding the cause of the fault.
> 
> Note the assumption that the >>user<< is responsible for debugging the
> program.

The user is the one responsible for debugging.  If he doesn't do it
himself, he must find someone who will (or give up).  If the user or 
the program destroys the data necessary for debugging to be successful,
then NOBODY will do the debugging.  

At this point in such discussions, the argument is usually raised that 
one could re-run the code from scratch and debug it interactively - 
thereby eliminating the need for keeping the whole context the first 
time the code failed.  This assumes that re-running from scratch is
a cost effective thing to do.  A production code which fails after
100+ hours of Cray time is _not_ a candidate for this kind of debugging.
I've had this argument about auto-destruct temporary files before - 
usually a few hours in the consulting office watching real user problems
being solved will cure anybody who thinks it's a good idea.

> No, this is >not< a C -vs- Fortran discussion.  At least not on my side.

Gee, the subject line had ... Fortran vs. C ... in it until your
last message.  I guess I made too much of an assumption that the subject
line had something to do with the articles.

>>I said 'language-independent' not 'universal'.
> 
> I didn't say universal either.  I only listed languages which I know can
> run on a VAX.  Which languages _should_ a library standard be dependent
> on in order to qualify as "language-independent"?

It is desirable for all the languages in a given programming environment
to be call-compatible (procedures in each language capable of calling
procedures in some other language).  My proposal would provide for a
_single_ library of exception handling routines for all languages
which are call-compatible.  My proposal would provide for a _single_
library specification for all languages which contain the concept of
procedure calls - whether they are call-compatible or not.

> But my proposal has ZERO run-time cost if it is not used.
> How on earth does that exclude the wizards?
> How is having an extra standard feature available which you don't
> have to use inflexible?

Just because _I_ don't _have_ to use a feature, doesn't make it harmless.
Any feature in the standard _has_ to be implemented in order that the
environment be standard conforming.  As an implementor, I can tell you
that features have effects on each other.  Even a rarely used feature
may have a profound effect on more commonly used ones.  Suppose that
your 'ZERO cost' option doesn't make it into the standard (like ADA).
Suppose instead that some option like ON OVERFLOW SET RESULT TO BLIVET
(where BLIVET is some predefined constant) is added?  I've talked
to hardware designers at Cray and IBM - such a feature is _VERY_ expensive
(at least for vector and/or pipelined hardware).

Your proposed solution has (at least) three problems:

1) it requires the cooperation of the language standards committees
   (at times these people can be _very_ intractable).

2) it _doesn't_ promote any increased compatibility _between_ languages.

3) because of 1) and 2), your proposal would require as many different
   implementations of the _same_ functionality as there are languages.

J. Giles
Los Alamos

ok@quintus.uucp (Richard A. O'Keefe) (09/15/88)

In article <3576@lanl.gov> jlg@lanl.gov (Jim Giles)
continued the great debate.

(1) I owe readers an apology for not changing the subject line.

(2) Mr Giles now reveals that his condemnation of my suggestion is not
    so much for its present half-baked form as for what some hypothetical
    standards committee *might* do to it.  Well, of course, they could
    make it arbitrarily bad.  I can't think of any defence against that
    method of judgement.

(3) Correcting <3576@lanl.gov> wouldn't get us anywhere.  I repeat what
    I said in my previous message:  it is time that we had someone else
    in this discussion, or dropped it entirely. 

jlg@lanl.gov (Jim Giles) (09/16/88)

From article <405@quintus.UUCP>, by ok@quintus.uucp (Richard A. O'Keefe):
> [...]
> (2) Mr Giles now reveals that his condemnation of my suggestion is not
>     so much for its present half-baked form as for what some hypothetical
>     standards committee *might* do to it.  Well, of course, they could
>     make it arbitrarily bad.  I can't think of any defence against that
>     method of judgement.

You have half my objection.  Your proposal requires two things which I
think are completely unnecessary: 1) cooperation from the standards
committees; 2) modification of the compiler for each language to support
a feature which _should_ be common to all of them.

My proposal is that a single standard be devised (so only one committee
is involved - not one for each language), that the standard be for a
set of callable routines (procedure calls are already sufficiently
similar in most procedural languages to make a common interface possible),
and that these routines should be provided in a single common library
for all languages in a given environment which are call-compatible.
This proposal is _very_ different from yours.
> 
> (3) Correcting <3576@lanl.gov> wouldn't get us anywhere.  I repeat what
>     I said in my previous message:  it is time that we had someone else
>     in this discussion, or dropped it entirely.

Fine, go ahead and drop the discussion.  I only responded _this_ time
because you again misrepresented my argument.

J. Giles
Los Alamos

bill@hcx2.SSD.HARRIS.COM (09/16/88)

Only fools rush in...  I will probably regret jumping into this
discussion; for the record, I don't intend to respond further unless
some other interesting point(s) is raised.  I merely want to answer
some questions and misconceptions (I think).  Background: I work
for Harris Computer Systems, maintaining compilers; I helped develop
our Ada compiler and continue to be involved in its maintenance.

> HANDLE ('exception name, ..., exception name') IN
> statements to be protected
> CANNOT HANDLE
> statements to execute if any of the exceptions cannot be trapped
> HANDLER
> statements to execute if any of the exceptions occurs
> END HANDLER
> The compiler is to assume in the HANDLER branch that any variable
> which might be altered by the protected code has been trashed.
> The list of exception names is a CHARACTER*(*) expression.  A compiler
> would be entitled to generate code for *just* the CANNOT HANDLE branch,
> though it would be good manners to print a warning message.  The
> precise location of an exception does not matter, but any pending
> exceptions must be signalled before control leaves the protected code.

Oh, but it does.  Without a precise location, how can the generated
code (or the runtime, or whoever) figure out whether the exception
occurred _inside_ the HANDLE section or _outside_.  This single fact
causes us headaches in Ada, because it is VERY expensive on our
hardware to determine exactly where the exception occurred.  (I believe
this is almost universally true in cases where floating-point
operations are done by an independent co-processor.)

> On which machines must this be inefficient (but how _can_ just
> compiling the CANNOT HANDLE branch be inefficient?) and what are the
> reasons for this inefficiency?

Again, knowing that the CANNOT HANDLE branch should be
executed requires knowing where the exception occurred.

>>If you're implying that a code which
>>deliberately generates (and recovers from) an exceptional condition
>>is 'correct', you have a strange concept of what correctness is.

This is quite common (I am told by Ada users).  The most common
situation is doing some calculation on data obtained from some
real-time measuring device; overflow is possible, but the desired
result is just "the maximum value".  In Ada, this is easy: trap
the overflow exception and substitute the desired value.
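
Roughly, the effect being described is this (a C sketch of my own rather
than the Ada, since standard C has no language-level trap; it *tests* for
the IEEE overflow result instead of catching an exception, and the names
are invented):

    #include <math.h>
    #include <float.h>
    #include <stdio.h>

    /* Hypothetical scaling of a raw instrument reading.  Overflow is
     * possible, but the caller only ever wants "the maximum value". */
    static double scale_reading(double raw, double gain)
    {
        double r = raw * gain;
        if (isinf(r))                             /* IEEE overflow shows up as +/-infinity */
            r = (r > 0.0) ? DBL_MAX : -DBL_MAX;   /* substitute the desired value */
        return r;
    }

    int main(void)
    {
        printf("%g\n", scale_reading(1.0e300, 1.0e10));   /* overflows: clamped */
        printf("%g\n", scale_reading(2.0, 3.0));          /* normal case: 6 */
        return 0;
    }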

Bill Leonard
Harris Computer Systems Division
2101 W. Cypress Creek Road
Fort Lauderdale, FL  33309
bill@ssd.harris.com or bill%ssd.harris.com@eddie.mit.edu

chris@mimsy.UUCP (Chris Torek) (09/17/88)

The original proposal from Richard O'Keefe (ok@quintus.UUCP):

>>HANDLE ('exception name, ..., exception name') IN
>>statements to be protected
>>CANNOT HANDLE
>>statements to execute if any of the exceptions cannot be trapped
>>HANDLER
>>statements to execute if any of the exceptions occurs
>>END HANDLER

He goes on to note that
>>... A compiler would be entitled to generate code for *just* the
>>CANNOT HANDLE branch, though it would be good manners to print a
>>warning message.  The precise location of an exception does not
>>matter, but any pending exceptions must be signalled before control
>>leaves the protected code.

In article <44400022@hcx2> bill@hcx2.SSD.HARRIS.COM writes:
>Oh, but it does.  Without a precise location, how can the generated
>code (or the runtime, or whoever) figure out whether the exception
>occurred _inside_ the HANDLE section or _outside_.

It does not need to `figure it out':

Consider three kinds of machines.

Primus: machines on which trapping some exception is in fact impossible
(the hardware explodes, turning the computer into shrapnel :-) ).  On
such a machine, the compiler must treat

	HANDLE exception-list IN
		primary-block
	CANNOT HANDLE
		cannot-handle-block
	HANDLER
		handler-block
	END

as the simpler statement-list

	cannot-handle-block

This is not particularly useful (all the working code in primary-block
vanishes) but at least the user need not wear battle-armour.

Secundus: machines on which trapping some exception is trivial and
cost-free.  These are not interesting either.

Tertius: machines on which trapping some exception is possible but
non-trivial.  This is most interesting if the machine has `imprecise
faults'.  By Richard O'Keefe's definitions, a compiler writer is
free to pretend this machine is like the first, and merely generate
the cannot-handle-block (along with a civil apology at compilation
time).  This is not interesting, so we will consider the other case.

>This single fact causes us headaches in Ada, because it is VERY
>expensive on our hardware to determine exactly where the exception
>occurred.  (I believe this is almost universally true in cases
>where floating-point operations are done by an independent co-processor.)

In the presence of pipelining, yes.  In its absence, no.  Moreover, one
can simulate the absence of pipelining (or multi-operations) by simply
delaying until the exception is sure to have occurred.  Hence the
compiler could, instead of trying to pinpoint the error after it
occurs, generate code more like the following:

procedure delay past exception:
	for int i in [1..100] do skip; rof;
	# on some machines this might even be a system call.
	# on others it might be a single (rather slow) instruction.
erudecorp

	...
	delay past exception;	# make sure we are not hit by an old one
	enable trap exceptions (exception-list) at HANDLER;
	primary-block;		# do the usual computation
	delay past exception;	# wait for any exceptions we generated
	disable trap exceptions (exception-list);
	goto DONE;		# made it!
HANDLER:
	# where this disable is placed depends on the language definition
	# (I myself would prefer stacked exception handler semantics).
	disable trap exceptions (exception-list);

	handler-block;
DONE:

[ok@quintus:]
>>On which machines must this be inefficient (but how _can_ just
>>compiling the CANNOT HANDLE branch be inefficient?) and what are the
>>reasons for this inefficiency?

[bill@hcx2.SSD.HARRIS.COM:]
>Again, knowing that the CANNOT HANDLE branch should be
>executed requires knowing where the exception occurred.

Not at all.  The CANNOT HANDLE branch turns into no code whatsoever
when catching exceptions.  It is irrelevant when discussing exception
catching efficiency, as it exists only for the case when exception
catching is not done at all.

The important thing is that a compiler that generates only the
cannot-handle statements is likely to be viewed as deficient (unless we
have one of those machines on which trapping exceptions is indeed
impossible).  And, if `delay past exception' is as slow as is the one
shown above, obviously detailed exception catching will be slow (since
at least one delay must be generated for each HANDLE statement).  Well,
so it goes.  If you really *need* detailed exception catching, and you
have a machine on which it is slow, let it be slow.  If you do not
*need* it, you can either leave it out of your code, or compile with a
switch that tells the compiler, `For all HANDLE statements, generate
only the primary-block, as if the HANDLE statement and the CANNOT
HANDLE and HANDLER branches did not appear.'

To me, this seems the best situation possible:  If you *can* catch
exceptions, the language permits you; if you cannot, it will tell you
so; and if you can, but it is inefficient, you have the option of
not bothering to catch exceptions (by leaving out the HANDLE statements).

Thus, the price of this feature when not used is whatever code is
required in the compiler and support libraries to implement it---i.e.,
there is no execution speed penalty, although there may be a compile
and link time penalty---and the price of this feature when it *is* used
is, well, the price of the feature.
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris

ok@quintus.uucp (Richard A. O'Keefe) (09/19/88)

In article <44400022@hcx2> bill@hcx2.SSD.HARRIS.COM writes:
>For the record, I don't intend to respond further unless
>some other interesting point(s) is raised.  I merely want to answer
>some questions and misconceptions (I think).

{I wrote:}
>> HANDLE ('exception name, ..., exception name') IN
>> statements to be protected
>> CANNOT HANDLE
>> statements to execute if any of the exceptions cannot be trapped
>> HANDLER
>> statements to execute if any of the exceptions occurs
>> END HANDLER
>> The precise location of an exception does not matter, but any pending
>> exceptions must be signalled before control leaves the protected code.

{He replies}
>Oh, but it does.  Without a precise location, how can the generated
>code (or the runtime, or whoever) figure out whether the exception
>occurred _inside_ the HANDLE section or _outside_.  This single fact
>causes us headaches in Ada, because it is VERY expensive on our
>hardware to determine exactly where the exception occurred.  (I believe
>this is almost universally true in cases where floating-point
>operations are done by an independent co-processor.)

I didn't spell it out, but my idea was that you would not be allowed to
jump into a HANDLE block, and that jumping out of one or exiting it in any
other way (such as RETURN) would dynamically end it just like falling
off the end.  

"Precise interrupts" are where the hardware goes to a lot of trouble
to *pretend* that your code was executing on a strictly sequential
machine, so that any changes which may have taken place since the
exception occurred (PC advancing, &c) which would not have taken place
in a strictly sequential implementation are backed out.  For example,
if we have
		X = X*Y
		I = I+1
in a program, on many machines the integer operation will be done while
the floating-point one is still unfinished.  By the time an overflow is
noticed, the PC may be anywhere (due to jumps).

My scheme does not require precise interrupts, thus defined.  As for the
location of the exception, all you need to know is whether it occurs
between two *EVENTS*.  Consider the following entirely imaginary code:

	save_mask := exception_mask	# H/W register
	save_addr := exception_addr	# H/W register
	CALL EH$NEW_MASK('....', save_mask, new_mask, ierror)
	IF (ierror .ne. 0) GO TO "CANNOT HANDLE"
	exception_addr := "HANDLER"
	exception_mask := new_mask
	statements to be protected
	*wait*for*pending*exceptions*
	exception_mask := save_mask
	exception_addr := save_addr

The information we need is that the instruction which signalled the
exception started *after* the "exception_mask := new_mask" instruction
and *before* the *wait*for*pending*exceptions* instruction completed.
The wait-for-pending-exceptions code has the job of waiting until any
functional unit which *might* cause an exception has had all the time
it needs to do so.  Note that in this scheme the handler receives
*no* information about where the PC was or which instruction signalled
the exception, nor about the state of the registers when that
instruction commenced.  In fact, should several instructions be started
each of which can signal an exception (not a hard thing to do, when
there is a deep enough pipeline) it is *not* required that the first
of them be signalled; any of them may be.

Note that this scheme doesn't assume that the protected code forms a
contiguous region:  if a compiler notices that some of the code in
the protected region looks like some code elsewhere in the subprogram,
it can generate an internal subroutine.  All that matters is the TIME
of the exception:  after this event and before that?

Now, I know how to do this on the major 32-bit micros.  Generally, any
integer or addressing exceptions are synchronous anyway, so the only
problem with those machines is floating-point exceptions, and there are
things you can do which have the effect of waiting until the floating-
point unit has completed all pending operations.  That *is* a bottleneck,
but it only happens when you ask for it, not after every operation.
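
On an 80386 with an 80387, for example, that "wait" can be a single FWAIT
instruction.  A minimal sketch, assuming GNU-style inline assembly (the
code is mine, not part of the proposal, and other machines would need
whatever their own FPU-synchronisation mechanism is):

    #include <stdio.h>

    /* Hypothetical "drain" step for the end of a protected region:
     * FWAIT makes the coprocessor report any exception it is still
     * holding before execution continues past this point. */
    static void wait_for_pending_fp_exceptions(void)
    {
        __asm__ __volatile__("fwait");
    }

    int main(void)
    {
        volatile double x = 1.0e308, y = 1.0e308;
        double z = x * y;                    /* on such a machine the overflow may still be in flight */
        wait_for_pending_fp_exceptions();    /* any trap must be delivered by here */
        printf("%g\n", z);
        return 0;
    }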

What I >don't< know is which computers can't do something like this, or why.