[net.lang.c] Error recovery

john@uw-nsr.UUCP (John Sambrook) (05/29/86)

[]

Regarding error recovery in C compilers, I like the error recovery
provided by the Data General C compiler.  Here is an example of a 
botched program:

	main() {
		int	a = 0	/* missing ";" */

		printf("a: %s\n",  (a == 1) ? "1" : "?"; /* missing ")" */
	}

When compiled the following is written on stderr:

	Error 502 severity 2 beginning on line 4 (Line 4 of file main.c) 
		printf("a: %s\n",  (a == 1) ? "1" : "?";
		^
	Syntax Error.
	A symbol of type ";" has been inserted before this symbol.
	 
	 
	Error 502 severity 2 beginning on line 4 (Line 4 of file main.c) 
		printf("a: %s\n",  (a == 1) ? "1" : "?";
		                                       ^
	Syntax Error.
	A symbol of type ")" has been inserted before this symbol.

In this example the compiler produced a program that executed correctly.

To be fair, both errors are "errors of omission."  I believe, but do not
assert, that these errors are easier to repair than other types of errors.
In the event of serious errors the compiler will cease code generation and
only check the remaining input.  I don't know the parsing method used in
this compiler; it does not seem to suffer from poor error recovery as do 
many recursive-descent parsers.

While on the subject of compilers, I would like to share two other features
of this compiler that I find useful.  I have not found these features in
other C compilers that I have used, although I have heard that the VAX/VMS
C compiler is very good.

The first feature is the ability to generate a stack trace ("traceback") 
in the event of a serious error.  There are two compiler switches that
control the amount of information in a traceback.  The "-Clineid" switch
causes the offending line number to be included while the "-Cprocid" switch 
causes the procedure name to be included.

The second feature is the ability to declare certain data structures as
"read only." This is done via a compiler switch "-R" and applies to all 
data structures that are initialized to a constant value within the 
compilation unit.

Here is an example program that demonstrates both features:

	int a = 1;					/* "read only" */

	main() {
	    	int	b;				/* "read / write" */

		/* this is legal */
		b = a;

		/* prove it */
		printf("a: %d  b: %d\n", a, b);

		/* this is not */
		a = 2;

		/* lie detector */
		printf("Can't happen\n");
	}

This program was compiled with "cc -R -Clineid -Cprocid main.c -o main".
Executing the program produced the following output on stderr:

	a: 1  b: 1
	
	ERROR         71237.
	from line 13 of main.
	
	Call Traceback:
	
	from fp=16000002722,  pc=16001754200,   line 10 of main
	from fp=          0,  pc=16001762472
	
	Hardware protection violation: Write access denied.

In the traceback the phrase "line 10" is because "-Clineid" was 
specified at compile time.  The phrase "of main" is because "-Cprocid" 
was specified at compile time.  Note though that the two "offending"
line numbers differ.  I suspect that this is because the last line to
execute successfully was at line 10; line 13 did not execute successfully
but rather generated a processor trap.

Some people might say that this compiler is too verbose, or that the
features cost too much in terms of execution overhead.  I have not found
this to be the case.  And, while I hate to see any tracebacks, I find
them to be far better than:

	% mumble foo bar
	Segmentation violation - core dumped.
	%

Please note that I do not intend to participate in any "holy wars."
I do not speak for Data General nor do I receive any compensation 
from them.  I just felt that this might be of interest to the readers
of this group.

-- 
John Sambrook				Work: (206) 545-2018
University of Washington WD-12		Home: (206) 487-0180
Seattle, Washington  98195		UUCP: uw-beaver!uw-nsr!john

bright@dataioDataio.UUCP (Walter Bright) (05/29/86)

In article <312@uw-nsr.UUCP> john@uw-nsr.UUCP (John Sambrook) writes:
>While on the subject of compilers, I would like to share two other features
>of this compiler that I find useful.
>The second feature is the ability to declare certain data structures as
>"read only." This is done via a compiler switch "-R" and applies to all 
>data structures that are initialized to a constant value within the 
>compilation unit.
>
>	int a = 1;					/* "read only" */
>	main() {
>	    	int	b;				/* "read / write" */
>
>		b = a;		/* this is legal */
>		a = 2;		/* this is not */
>	}

The declaration for a can be made 'read only' by declaring it as follows:

	const int a = 1;

Doing an assignment to a will then cause a syntax error when compiling.
This is in the draft ANSI C spec.

guy@sun.uucp (Guy Harris) (06/01/86)

> Regarding error recovery in C compilers, I like the error recovery
> provided by the Data General C compiler.  Here is an example of a 
> botched program:
> 
> 	main() {
> 		int	a = 0	/* missing ";" */
> 
> 		printf("a: %s\n",  (a == 1) ? "1" : "?"; /* missing ")" */
> 	}
> 
> When compiled the following is written on stderr:
> 
> 	Error 502 severity 2 beginning on line 4 (Line 4 of file main.c) 
> 		printf("a: %s\n",  (a == 1) ? "1" : "?";
> 		^
> 	Syntax Error.
> 	A symbol of type ";" has been inserted before this symbol.

I think the Berkeley Pascal compiler does the same sort of thing.  I don't
know whether this would be practical for the C compiler or not.

> I don't know the parsing method used in this compiler; it does not seem
> to suffer from poor error recovery as do many recursive-descent parsers.

For what it's worth, many (if not most) UNIX C compilers don't use recursive
descent parsers; they are based on PCC which uses YACC.

> While on the subject of compilers, I would like to share two other features
> of this compiler that I find useful.  I have not found these features in
> other C compilers that I have used, although I have heard that the VAX/VMS
> C compiler is very good.
> 
> The first feature is the ability to generate a stack trace ("traceback") 
> in the event of a serious error.  There are two compiler switches that
> control the amount of information in a traceback.  The "-Clineid" switch
> causes the offending line number to be included while the "-Cprocid" switch 
> causes the procedure name to be included.

You can certainly get the equivalent of that in many UNIX systems; the only
difference is that you go into a debugger when the program drops core and
ask the debugger for a stack trace.  I'm not convinced that getting a
traceback "for free" is any better than going "dbx mumble" or "sdb mumble"
when you get the "Segmentation violation - core dumped" error you so dislike
and asking for a stack trace.  (If you've compiled the program with the "-g"
flag, it will give you line numbers in the stack trace.  It will also permit
you to examine variables in the program.  Merely being told what the call
stack looked like may not be sufficient to enable you to figure out what
happened, so you may have to go into a debugger anyway, even on the DG
system.)

> The second feature is the ability to declare certain data structures as
> "read only." This is done via a compiler switch "-R" and applies to all 
> data structures that are initialized to a constant value within the 
> compilation unit.

Much too crude.  What you want is the feature which will appear in ANSI C,
where you can specify which objects are to be constants and which are not.
The objects which are to be constants can be put in sharable read-only
regions.  The trouble with the "-R" flag as you describe it is that you have
to make sure that the initialized data structures which are to be read-only
are in separate source files from the ones which are to be read-write.
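
As a rough illustration of per-object control, here is a minimal sketch in
draft-ANSI style.  The names day_names, lookups, day_name and lookup_count
are made up for this example, and whether a given compiler actually places
the const table in a sharable read-only region is implementation-dependent.

```c
#include <assert.h>

/* Hypothetical read-only table.  Both the pointers and the strings they
   point to are const, so an implementation may place the whole table in
   a sharable read-only region. */
static const char *const day_names[7] = {
    "Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"
};

/* Hypothetical read-write object, also initialized to a constant and
   living in the same source file as the table above. */
static int lookups = 0;

/* Return the name for day d (0 = Sunday) and count the call. */
const char *day_name(int d)
{
    assert(d >= 0 && d < 7);
    lookups++;                 /* legal: lookups is not const */
    /* day_names[d] = "???";      would be rejected at compile time */
    return day_names[d];
}

/* Report how many lookups have been made so far. */
int lookup_count(void)
{
    return lookups;
}
```

Both objects are initialized to constants in one compilation unit, which is
the case a file-wide switch cannot distinguish.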
-- 
	Guy Harris
	{ihnp4, decvax, seismo, decwrl, ...}!sun!guy
	guy@sun.arpa

mike@peregrine.UUCP (Mike Wexler) (06/01/86)

Here's how I do it.
>	% mumble foo bar
>	Segmentation violation - core dumped.
	% dbx mumble
	Reading symbolic information..
	Read 98 Symbols
	(dbx) where
	foo(b = 7), line 15 in "mumble.c"
	main(argc = 1, argv = 16776148, 0xfffbdc), line 5 in "mumble.c"
	(dbx) list 15
	   15	a=*q;
	(dbx) print q
	q = (nil)
	(dbx) quit
The advantage of this is that dbx will give me all the information I want
and not the information that I don't want.

-- 
Mike Wexler
Email address:(trwrb|scgvaxd)!felix!peregrine!mike
Tel Co. address: (714)855-3923
;-) Internet address: ucivax@ucbvax.BERKELY.EDU!ucivax%felix!mike@peregrine :-(

mike@peregrine.UUCP (Mike Wexler) (06/03/86)

In article <1009@dataioDataio.UUCP> bright@dataio.UUCP (Walter Bright) writes:
>In article <312@uw-nsr.UUCP> john@uw-nsr.UUCP (John Sambrook) writes:
>>The second feature is the ability to declare certain data structures as
>>"read only." This is done via a compiler switch "-R" and applies to all 
>>data structures that are initialized to a constant value within the 
>>compilation unit.
>The declaration for a can be made 'read only' by declaring it as follows:
>
>	const int a = 1;
>
>Doing an assignment to a will then cause a syntax error when compiling.
>This is in the draft ANSI C spec.
This only works if you have an ANSI C compiler.  Do you?
BTW, the C compiler on the SUN has a -R option that makes initialized variables
shared and read-only.


-- 
Mike Wexler
Email address:(trwrb|scgvaxd)!felix!peregrine!mike
Tel Co. address: (714)855-3923
;-) Internet address: ucivax@ucbvax.BERKELY.EDU!ucivax%felix!mike@peregrine :-(

franka@mmintl.UUCP (Frank Adams) (06/03/86)

In article <312@uw-nsr.UUCP> john@uw-nsr.UUCP writes:
>I don't know the parsing method used in
>this compiler; it does not seem to suffer from poor error recovery as do 
>many recursive-descent parsers.

My impression is that the quality of the error recovery has little to do
with the parsing method used, and a great deal to do with how much effort is
invested in making the error recovery good.

Frank Adams                           ihnp4!philabs!pwa-b!mmintl!franka
Multimate International    52 Oakland Ave North    E. Hartford, CT 06108

john@frog.UUCP (John Woods, Software) (06/04/86)

>>While on the subject of compilers, I would like to share two other features
>>of this compiler that I find useful.
>>
>>The first feature is the ability to generate a stack trace ("traceback")
>>in the event of a serious error.  There are two compiler switches that
>>control the amount of information in a traceback.  The "-Clineid" switch
>>causes the offending line number to be included while the "-Cprocid" switch 
>>causes the procedure name to be included.
> 
I have occasionally used a trick for programs which only seem to crash when
I am not around to poke through entrails, that of writing a C function that
catches signals and prints an admittedly-crude stack backtrace.  While it
is probably nicer to have the compiler assist in doing this (I can remember the
reams of output that a student-oriented ALGOL-60 interpreter once gave), I
still find that if I need much more than the hint that this little backtrace
routine of mine gives, I need all of a debugger, anyway.

By the way, my backtracer is written entirely in C, with the exception of
a call on an assembly routine getfp(), which returns the current frame-pointer
value.  Which, by the way, is another function with a problematical type: it
returns a pointer to a --\
	  ^		 |
	  \______________/ ...  (I could have cheated on the getfp() by
using the address of a parameter, but as long as I had the RIGHT tool...)
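
A minimal sketch of the same idea, for systems whose C library provides
backtrace() and backtrace_symbols_fd() in <execinfo.h> (glibc does; that is
an assumption about the platform, not portable C).  It swaps the hand-written
frame-pointer walk and getfp() for the library's unwinder.  The names
print_backtrace, crash_handler and install_crash_handler are invented for
illustration.

```c
#include <assert.h>
#include <execinfo.h>   /* backtrace(), backtrace_symbols_fd(): glibc extension */
#include <signal.h>
#include <stdlib.h>
#include <unistd.h>

#define MAX_FRAMES 64

/* Write the current call stack to file descriptor fd, one frame per
   line.  Returns the number of frames found. */
int print_backtrace(int fd)
{
    void *frames[MAX_FRAMES];
    int n;

    assert(fd >= 0);
    n = backtrace(frames, MAX_FRAMES);
    backtrace_symbols_fd(frames, n, fd);   /* writes directly, no malloc() */
    return n;
}

/* Signal handler: announce the crash, dump the stack, then exit.
   Returning from a real SIGSEGV would only re-execute the faulting
   instruction, so the handler never returns. */
static void crash_handler(int sig)
{
    static const char msg[] = "fatal signal, call traceback:\n";

    (void)sig;
    if (write(STDERR_FILENO, msg, sizeof msg - 1) < 0) {
        /* nothing useful can be done about a failed write here */
    }
    print_backtrace(STDERR_FILENO);
    _exit(EXIT_FAILURE);
}

/* Install the handler for the usual "program crashed" signals. */
void install_crash_handler(void)
{
    signal(SIGSEGV, crash_handler);
    signal(SIGBUS, crash_handler);
    signal(SIGFPE, crash_handler);
}
```

Calling install_crash_handler() early in main() is enough.  The output is
crude (mostly addresses unless the program exports its symbols), and
backtrace() is not guaranteed to be safe inside a signal handler, so this is
a hint generator rather than a substitute for a debugger.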

--
John Woods, Charles River Data Systems, Framingham MA, (617) 626-1101
...!decvax!frog!john, ...!mit-eddie!jfw, jfw%mit-ccc@MIT-XX.ARPA

"Imagine if every Thursday your shoes exploded if you tied them the usual way.
This happens to us all the time with computers, and nobody thinks of
complaining."
			Jeff Raskin, interviewed in Doctor Dobb's Journal

meissner@dg_rtp.UUCP (Michael Meissner) (06/06/86)

In article <312@uw-nsr.UUCP> john@uw-nsr.UUCP (John Sambrook) writes:
>
>Regarding error recovery in C compilers, I like the error recovery
>provided by the Data General C compiler.  Here is an example of a 
>botched program:
>
>	main() {
>		int	a = 0	/* missing ";" */
>
>		printf("a: %s\n",  (a == 1) ? "1" : "?"; /* missing ")" */
>	}
>
>When compiled the following is written on stderr:
>
>	Error 502 severity 2 beginning on line 4 (Line 4 of file main.c) 
>		printf("a: %s\n",  (a == 1) ? "1" : "?";
>		^
>	Syntax Error.
>	A symbol of type ";" has been inserted before this symbol.
>	 
>	 
>	Error 502 severity 2 beginning on line 4 (Line 4 of file main.c) 
>		printf("a: %s\n",  (a == 1) ? "1" : "?";
>		                                       ^
>	Syntax Error.
>	A symbol of type ")" has been inserted before this symbol.
>
>In this example the compiler produced a program that executed correctly.
>
>To be fair, both errors are "errors of omission."  I believe, but do not
>assert, that these errors are easier to repair than other types of errors.
>In the event of serious errors the compiler will cease code generation and
>only check the remaining input.  I don't know the parsing method used in
>this compiler; it does not seem to suffer from poor error recovery as do 
>many recursive-descent parsers.

    It's a pleasant surprise when somebody says he likes something.  I am
the author of the Data General C compilers.  The parsing method that I use
is a standard LALR parse, based on an internal tool that constructs the tables
from a BNF input grammar.  In comparison to YACC, the tool is not as developer
friendly, ie, it only creates the tables; I have to write the routine that
actually interprets the parse state machine and dispatches on the semantic
actions.  The error recovery routines must be provided as well.  YACC,
on the other hand, encapsulates the parser into the C program it generates.
It also handles error recovery (badly in my opinion), so that in general, the
user doesn't have to mess with it.  It also means that the user does not really
have the control either.

    The algorithm that I use is the first part of Jerry Fisher's (from the
SIGPLAN compiler construction conference).  It first attempts to insert,
delete, or replace the token that is in error, using any of the tokens that
are in the follow set (ie, would be possible, legal input), and then parses
ahead 3 tokens.
The first parse that will succeed for 3 tokens is selected (the tokens are
given a priority, and tried in priority order).  The second part of Jerry
Fisher's algorithm is a complicated secondary recovery, which I initially
attempted, and gave up because adapting his algorithm to my parser kept coming
up with errors in my translation, or areas where I did not really understand
what is going on deep within the LALR tables.  As near as I can understand
from looking at it, the YACC approach is to discard tokens until it can reduce
from an 'error' production.  It's been my experience that this rarely does what
the compiler writer wants.  As far as local replacement goes, I am currently
thinking of adding another pass that would attempt to glue two tokens together
(to make += out of + and = separated by whitespace).  The priorities are the
hardest thing to get a feeling for, and I still play with them every so often.
As far as secondary recovery goes, my feeling still is that if you ever need
to go to more extreme methods, the program is hopelessly damaged, and I
question whether the programmer gets anything useful after the first few error
messages.
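
The local repair step described above can be sketched roughly as follows.
This is a toy under stated assumptions, not the Data General code: a
four-state automaton for statements of the form "ID = NUM ;" stands in for
the LALR tables, the priority order is invented, and trying insertion before
replacement before deletion is a choice made for this example.  All names
(step, trial, find_repair) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Toy language: a list of statements  ID = NUM ;  terminated by END. */
enum tok { ID, EQ, NUM, SEMI, END };

enum repair_kind { NO_REPAIR, INSERT, REPLACE, DELETE };
struct repair { enum repair_kind kind; enum tok tok; };

#define LOOKAHEAD 3   /* tokens that must parse cleanly after a repair */

/* Stand-in for the parse tables: in state s only expected[s] is legal
   (plus END in state 0, i.e. between statements). */
static const enum tok expected[4] = { ID, EQ, NUM, SEMI };

/* Consume one token in state s; return the next state, or -1 on error. */
static int step(int s, enum tok t)
{
    assert(s >= 0 && s < 4);
    if (t == END)
        return s == 0 ? 0 : -1;
    return t == expected[s] ? (s + 1) % 4 : -1;
}

/* Trial parse from state s: optionally consume *lead first, then tokens
   from rest.  Succeeds if `want` tokens parse or END is reached. */
static int trial(int s, const enum tok *lead, const enum tok *rest, int want)
{
    int n;

    for (n = 0; n < want; n++) {
        enum tok t = (lead != NULL && n == 0) ? *lead : *rest++;
        s = step(s, t);
        if (s < 0)
            return 0;
        if (t == END)
            return 1;
    }
    return 1;
}

/* The parser is in state s and *p is the offending token.  Try candidate
   tokens in priority order, as an insertion and then as a replacement;
   fall back to deleting *p.  The first repair whose trial parse succeeds
   is returned. */
struct repair find_repair(int s, const enum tok *p)
{
    static const enum tok priority[] = { SEMI, EQ, NUM, ID };
    struct repair r = { NO_REPAIR, END };
    size_t i;

    for (i = 0; i < sizeof priority / sizeof priority[0]; i++) {
        enum tok t = priority[i];

        if (trial(s, &t, p, LOOKAHEAD + 1)) {
            r.kind = INSERT;
            r.tok = t;
            return r;
        }
        if (*p != END && trial(s, &t, p + 1, LOOKAHEAD + 1)) {
            r.kind = REPLACE;
            r.tok = t;
            return r;
        }
    }
    if (*p != END && trial(s, NULL, p + 1, LOOKAHEAD))
        r.kind = DELETE;
    return r;
}
```

For the token stream ID = NUM ID = NUM ; END, with the error detected at the
second ID in state 3, find_repair proposes inserting ";" before the offending
token, the same kind of repair as the first message in the quoted example.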

>While on the subject of compilers, I would like to share two other features
>of this compiler that I find useful.  I have not found these features in
>other C compilers that I have used, although I have heard that the VAX/VMS
>C compiler is very good.
>
>The first feature is the ability to generate a stack trace ("traceback") 
>in the event of a serious error.  There are two compiler switches that
>control the amount of information in a traceback.  The "-Clineid" switch
>causes the offending line number to be included while the "-Cprocid" switch 
>causes the procedure name to be included.

There have been a few responses saying dbx/adb gives you the information, if
you compile with -g and look at the core file.  The traceback feature (which
is standard on almost all 32-bit DG compilers) produces smallish tables, which
can be kept in the program file, even when it is shipped to users in production
mode.  We also support -g and dbx.
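
To make the size argument concrete, here is one way a line-number table could
be organized.  This is a guess for illustration only, not a description of the
Data General format; struct line_entry, demo_table, lookup_pc and the
addresses are all invented.

```c
#include <assert.h>
#include <stddef.h>

/* One entry per source line that generated code. */
struct line_entry {
    unsigned long start_pc;  /* first code address generated for the line */
    int           line;      /* source line number */
    const char   *proc;      /* enclosing procedure name */
};

/* Hypothetical table, sorted by start_pc. */
static const struct line_entry demo_table[] = {
    { 0x1000,  3, "main" },
    { 0x1010,  5, "main" },
    { 0x1040,  9, "foo"  },
    { 0x1058, 10, "foo"  },
};

/* Binary search for the entry with the greatest start_pc <= pc.
   Returns NULL if pc lies below the first entry or the table is empty. */
const struct line_entry *lookup_pc(const struct line_entry *tab, size_t n,
                                   unsigned long pc)
{
    size_t lo = 0, hi = n;

    assert(tab != NULL || n == 0);
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (tab[mid].start_pc <= pc)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo == 0 ? NULL : &tab[lo - 1];
}
```

A traceback routine would call lookup_pc once per saved pc and print the line
and procedure it finds.  A table of this shape costs a few words per source
line, far less than full debugger symbol information.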

>The second feature is the ability to declare certain data structures as
>"read only." This is done via a compiler switch "-R" and applies to all 
>data structures that are initialized to a constant value within the 
>compilation unit.

This came from Berkeley 4.2 (and 4.3) and was added in an attempt to be as
compatible with both 4.2 and system V.2 as we could.  At some point in the
future, when the ANSI X3J11 draft stabilizes to the point of going for public
review, the `const' feature will also allow this without having to set the
option.  The private Data General keyword $shared allows this in the released
revisions.

>John Sambrook				Work: (206) 545-2018
>University of Washington WD-12		Home: (206) 487-0180
>Seattle, Washington  98195		UUCP: uw-beaver!uw-nsr!john

Michael Meissner
Data General Corporation
...{ decvax, ihnp4, ucbvax, ... }!mcnc!rti-sel!dg_rtp!meissner

bright@dataioDataio.UUCP (Walter Bright) (06/09/86)

In article <397@peregrine.UUCP> mike@peregrine.UUCP (Mike Wexler) writes:
>In article <1009@dataioDataio.UUCP> bright@dataio.UUCP (Walter Bright) writes:
>>The declaration for a can be made 'read only' by declaring it as follows:
>>	const int a = 1;
>>Doing an assignment to a will then cause a syntax error when compiling.
>>This is in the draft ANSI C spec.
>This only works if you have an ANSI C compiler.  Do you?

I use the Datalight C compiler which does support const and volatile
as defined in the draft standard. No other compiler that I'm aware
of does.