[comp.lang.c] Run-time Checks for C

raghavan@umn-cs.CS.UMN.EDU (Vijay Raghavan) (11/19/88)

   I made a casual statement in a local bulletin board to the effect that
the C language definition doesn't really preclude any implementation from
doing certain run-time checks (for array bounds, type checking, referring
contents of uninitialized pointer variables &c), it's just that most
(okay, all!) implementations don't do any such checking because of efficiency
reasons. Now I'm not sure that this statement is really true (I mean I'm not
sure that sufficient information can always be passed to the compiler for it
to generate code for meaningful run-time checks.) 

   Comments? Treat this as a question of academic interest. Ignore 
considerations of efficiency on architectures which don't support run-time
checking mechanisms. Also assume that all support library functions (in
particular, malloc, calloc etc.) have been written in a way as to support
these checks, wherever possible. 

  Vijay Raghavan

steve@umigw.MIAMI.EDU (steve emmerson) (11/19/88)

In article <10113@umn-cs.CS.UMN.EDU> raghavan@umn-cs.cs.umn.edu (Vijay 
Raghavan) writes:
>
>(I mean I'm not sure that sufficient information can always be passed 
> to the compiler for it to generate code for meaningful run-time checks.) 
>
>   Comments?

In general, a pointer to garbage can easily masquerade as valid.
-- 
Steve Emmerson                     Inet: steve@umigw.miami.edu [128.116.10.1]
SPAN: miami::emmerson (host 3074::)      emmerson%miami.span@star.stanford.edu
UUCP: ...!ncar!umigw!steve               emmerson%miami.span@vlsi.jpl.nasa.gov
"Computers are like God in the Old Testament: lots of rules and no mercy"

bill@twwells.uucp (T. William Wells) (11/20/88)

In article <10113@umn-cs.CS.UMN.EDU> raghavan@umn-cs.cs.umn.edu (Vijay Raghavan) writes:
:
:    I made a casual statement in a local bulletin board to the effect that
: the C language definition doesn't really preclude any implementation from
: doing certain run-time checks (for array bounds, type checking, referring
: contents of uninitialized pointer variables &c), it's just that most
: (okay, all!) implementations don't do any such checking because of efficiency
: reasons. Now I'm not sure that this statement is really true (I mean I'm not
: sure that sufficient information can always be passed to the compiler for it
: to generate code for meaningful run-time checks.)
:
:    Comments?

It is entirely possible to do complete run-time checking; I understand
that there are some C interpreters that do this. It is not cheap,
however.

---
Bill
{uunet|novavax}!proxftl!twwells!bill

tj@mks.UUCP (T. J. Thompson) (11/20/88)

In article <10113@umn-cs.CS.UMN.EDU>, raghavan@umn-cs.CS.UMN.EDU (Vijay Raghavan) writes:
> 
>    I made a casual statement in a local bulletin board to the effect that
> the C language definition doesn't really preclude any implementation from
> doing certain run-time checks (for array bounds, type checking, referring
> contents of uninitialized pointer variables &c), it's just that most
> (okay, all!) implementations don't do any such checking because of efficiency
> reasons. Now I'm not sure that this statement is really true (I mean I'm not
> sure that sufficient information can always be passed to the compiler for it
> to generate code for meaningful run-time checks.) 
>
Consider the following semi-realistic code fragment:

typedef struct {
	int type;
	union {
		int ival;
		double dval;
		struct {
			int hval;
			char name[1];
		} n;
	} v;
} nodeT;

nodeT*
namenode(char* name)
{
	nodeT* np;

	if ((np=(nodeT*)malloc(sizeof(nodeT)+strlen(name))) != NULL) {
		np->type = NAME;
		np->v.n.hval = hash(name);
		(void)strcpy(np->v.n.name, name); /* run-time check here? */
	}
	return (np);
}

I suggest that no amount of cleverness on the part of the compiler and malloc
can limit a pointer derived from np->v.n.name to the precisely correct range
(np->v.n.name[0] .. np->v.n.name[strlen(name)]).

The compiler could limit any pointer derived from np->v.n.name to that value
only, based on the length of the array in the type declaration. This would
cause the strcpy to fail (and would probably break 98% of existing programs).

The pointer derived from np->v.n.name could be limited to the object returned
by malloc (i.e. the limits on np->v.n.name could be inherited from np).
This would permit np->v.n.name[-1]=0; clearly wrong.

There could be a special arrangement to allow the relaxation of limits on
a trailing array member of an aggregate; but consider:

union {
	double d[2];
	int i[2];
} coord;

Then one could write coord.i[2]=0; clearly wrong.

I doubt many people would accept the cost of run-time checking if it were
not able to catch the vast majority of errant pointers as soon as possible
(after all, we already get `segmentation violation -- core dumped').
-- 
     ll  // // ,'/~~\'   T. J. Thompson              uunet!watmath!mks!tj
    /ll/// //l' `\\\     Mortice Kern Systems Inc.         (519) 884-2251
   / l //_// ll\___/     35 King St. N., Waterloo, Ont., Can. N2J 2W9
O_/                                long time(); /* know C */

jinli@gpu.utcs.toronto.edu (Jin Li) (11/20/88)

In article <10113@umn-cs.CS.UMN.EDU> raghavan@umn-cs.cs.umn.edu (Vijay Raghavan) writes:
>...
>the C language definition doesn't really preclude any implementation from
>doing certain run-time checks (for array bounds, type checking, referring
>contents of uninitialized pointer variables &c), it's just that most
>(okay, all!) implementations don't do any such checking because of efficiency
>reasons. Now I'm not sure that this statement is really true (I mean I'm not
>sure that sufficient information can always be passed to the compiler for it
>to generate code for meaningful run-time checks.) ...

There are C compilers/interpreters which do run-time checks, but the cost
is too high.  However, do you really want to ride on a $$$$$.$$
tricycle when you can ride on a $ mountain bike?

-- 
		Jin Li			     >>    Gin & Tonic mix well.
University of Toronto Computing Services    << 
jinli@gpu.utcs.utoronto.ca  uunet!utgpu!jinli>>

henry@utzoo.uucp (Henry Spencer) (11/20/88)

In article <10113@umn-cs.CS.UMN.EDU> raghavan@umn-cs.cs.umn.edu (Vijay Raghavan) writes:
>the C language definition doesn't really preclude any implementation from
>doing certain run-time checks (for array bounds, type checking, referring
>contents of uninitialized pointer variables &c), it's just that most
>(okay, all!) implementations don't do any such checking because of efficiency
>reasons. Now I'm not sure that this statement is really true...

It's true; there is at least one debugging-oriented implementation of C
which does full pointer checking (which includes array-bounds checking), for
example.  The efficiency hit is high, unfortunately.
-- 
Sendmail is a bug,             |     Henry Spencer at U of Toronto Zoology
not a feature.                 | uunet!attcan!utzoo!henry henry@zoo.toronto.edu

gwyn@smoke.BRL.MIL (Doug Gwyn ) (11/20/88)

In article <10113@umn-cs.CS.UMN.EDU> raghavan@umn-cs.cs.umn.edu (Vijay Raghavan) writes:
>   I made a casual statement in a local bulletin board to the effect that
>the C language definition doesn't really preclude any implementation from
>doing certain run-time checks (for array bounds, type checking, referring
>contents of uninitialized pointer variables &c), it's just that most
>(okay, all!) implementations don't do any such checking because of efficiency
>reasons. Now I'm not sure that this statement is really true (I mean I'm not
>sure that sufficient information can always be passed to the compiler for it
>to generate code for meaningful run-time checks.) 
>   Comments? Treat this as a question of academic interest. Ignore 
>considerations of efficiency on architectures which don't support run-time
>checking mechanisms. Also assume that all support library functions (in
>particular, malloc, calloc etc.) have been written in a way as to support
>these checks, wherever possible. 

Although few existing implementations of C provide such checks, there are
a few that do.  The one I most remember was (is?) marketed under the name
"Safe C".  The forthcoming C standard is quite specific about what must
be allowed and what is "illegal" (not an official term; officially there
are varieties of behavior called "unspecified", "undefined",
"implementation defined", etc.).  Anything that is not definitely allowed
could be diagnosed at compile time or at run time.  To support run-time
checks, an implementation would usually have to access data objects via
descriptors (known to old-timers as "dope vectors"), with a lot of
overhead.  Some hardware allows efficient implementation of things like
null or uninitialized pointer dereferencing, etc.  Unless every data
object can be allocated in a separate segment, there isn't much that can
be done about efficient bounds checking.
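
As a rough illustration of the descriptor idea, here is a minimal sketch,
assuming an array of int and a descriptor that records only a base address
and an element count.  The names int_desc, desc_make and desc_at are made
up for this example and are not taken from Safe C or any other product.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical descriptor ("dope vector") for an array of int. */
struct int_desc {
	int    *base;	/* address of element 0 */
	size_t  count;	/* number of elements */
};

/* Build a descriptor for an existing array. */
struct int_desc
desc_make(int *base, size_t count)
{
	struct int_desc d;

	assert(base != NULL || count == 0);
	d.base = base;
	d.count = count;
	return d;
}

/* Checked form of d.base[i]: returns the element's address,
   or NULL when i is outside 0 .. count-1. */
int *
desc_at(struct int_desc d, size_t i)
{
	if (i >= d.count)
		return NULL;
	return &d.base[i];
}
```

With int a[4] described by d = desc_make(a, 4), desc_at(d, 3) yields &a[3]
and desc_at(d, 4) yields NULL.  Every access pays for the comparison, which
is the overhead referred to above.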

chris@mimsy.UUCP (Chris Torek) (11/20/88)

>In article <10113@umn-cs.CS.UMN.EDU> raghavan@umn-cs.cs.umn.edu (Vijay 
>Raghavan) writes:
>>(I mean I'm not sure that sufficient information can always be passed 
>> to the compiler for it to generate code for meaningful run-time checks.) 
>>   Comments?

In article <189@umigw.MIAMI.EDU> steve@umigw.MIAMI.EDU (steve emmerson)
suggests:
>In general, a pointer to garbage can easily masquerade as valid.

While this is true, if the compiler is careful, and does not provide a
way to get directly at the machine%, the compiler and runtime system
can ensure that the program cannot generate a pointer to garbage.  In
fact, it is possible to apply to C systems most of the run-time checks
common in, e.g., Pascal systems.  There are several companies selling
such systems.  Look around at a Usenix or /usr/group vendor show, for
instance.
-----
% This means no assembly escapes, and requires checking all pointer/
integer and pointer/pointer conversions and/or all pointer references.
In addition to a fairly hefty efficiency price-tag, this does, of
course, make the implementation virtually useless for writing device
drivers for conventional hardware.
-----
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris

chris@mimsy.UUCP (Chris Torek) (11/21/88)

In article <566@mks.UUCP> tj@mks.UUCP (T. J. Thompson) writes:
>Consider the following semi-realistic code fragment:

(replaced with a simpler version:)

	struct string {
		int hashval;
		char str[1];
	};

	struct string *
	intern(char *s) {
		struct string *sp;

		/* assuming dpANS-conformant malloc() */
		if ((sp = malloc(sizeof(*sp) + strlen(s))) != NULL) {
			sp->hashval = hash(s);
			(void) strcpy(sp->str, s);
		}
		return (sp);
	}

>I suggest that no amount of cleverness on the part of the compiler and malloc
>can limit a pointer derived from np->v.n.name to the precisely correct range
>(np->v.n.name[0] .. np->v.n.name[strlen(name)]).

[here sp->str[0] .. sp->str[strlen(s)]]

The compiler *can* check this, in either of two ways: It can limit you to
sp->str[0]..sp->str[0]:

>The compiler could limit any pointer derived from np->v.n.name to that value
>only, based on the length of the array in the type declaration. This would
>cause the strcpy to fail (and would probably break 98% of existing programs).

I think this estimate is rather high.  It would break some programs.

>The pointer derived from np->v.n.name could be limited to the object returned
>by malloc (i.e. the limits on np->v.n.name could be inherited from np).
>This would permit np->v.n.name[-1]=0; clearly wrong.

This is, I think, what existing checking-compilers do.  It is not perfect,
but it works.

>There could be a special arrangement to allow the relaxation of limits on
>a trailing array member of an aggregate ...

This would work without reservation provided the relaxation applies only
to objects allocated with malloc.
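
A sketch of how such a relaxed limit might be computed, assuming the runtime
has recorded the size passed to malloc (here the caller simply supplies it).
The helpers str_limit and str_index_ok are hypothetical names, not part of
any existing checking compiler.

```c
#include <stddef.h>

struct string {
	int hashval;
	char str[1];
};

/* Given the size of the malloc'ed block holding a struct string,
   return how many elements of the trailing str[] lie inside it. */
size_t
str_limit(size_t alloc_size)
{
	size_t off = offsetof(struct string, str);

	if (alloc_size <= off)
		return 0;
	return alloc_size - off;
}

/* Nonzero if sp->str[i] lies inside a block of alloc_size bytes. */
int
str_index_ok(size_t alloc_size, long i)
{
	return i >= 0 && (size_t)i < str_limit(alloc_size);
}
```

For a block of sizeof(struct string) + strlen(s) bytes this accepts
sp->str[0] through at least sp->str[strlen(s)] and rejects sp->str[-1].
Because sizeof(struct string) may include padding after str, the limit can
be a few bytes more generous than the precisely correct range.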
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris

john@uw-nsr.UUCP (John Sambrook) (11/22/88)

In article <10113@umn-cs.CS.UMN.EDU> raghavan@umn-cs.cs.umn.edu (Vijay Raghavan) writes:
>
>   I made a casual statement in a local bulletin board to the effect that
>the C language definition doesn't really preclude any implementation from
>doing certain run-time checks (for array bounds, type checking, referring
>contents of uninitialized pointer variables &c), it's just that most
>(okay, all!) implementations don't do any such checking because of efficiency
>reasons. Now I'm not sure that this statement is really true (I mean I'm not
>sure that sufficient information can always be passed to the compiler for it
>to generate code for meaningful run-time checks.) 

Please note that I have no relationship with Data General Corporation.
I just happen to think they have done a good job on their language
systems products.

The Data General C compiler is an example of a compiler that provides a 
number of facilities for debugging programs, including several run time
checks.  All such options are invoked with -C<option-name>.

Two useful switches are -Clineid and -Cprocid.  They cause the compiler
to include source file line and procedure name information into the 
generated code.  If the program aborts a (stack) traceback is produced
that includes this information.  Very useful.

-Csubcheck causes subscript range checking to be enabled.  At run time
attempts to reference outside of an array are detected and the program
aborts (with a traceback).  When possible the compiler detects these 
errors at compile time. 

-Cpointercheck is useful for catching uses of improperly typed pointers.
While not a fully general mechanism it does catch the types of errors that
cause programs to abort on MV series machines.  

Finally, -Czeroframe causes the compiler to generate code to zero all
local variables when a new activation record (stack frame) is created.
This has been useful from time to time to track down uses of uninitialized
variables.
-- 
John Sambrook                        Internet: john@nsr.bioeng.washington.edu
University of Washington RC-05           UUCP: uw-nsr!john
Seattle, Washington  98195               Dial: (206) 548-4386

throopw@xyzzy.UUCP (Wayne A. Throop) (11/24/88)

> raghavan@umn-cs.CS.UMN.EDU (Vijay Raghavan)
> the C language definition doesn't really preclude any implementation from
> doing certain run-time checks (for array bounds, type checking, referring
> contents of uninitialized pointer variables &c), it's just that most
> (okay, all!) implementations don't do any such checking because of efficiency
> reasons.

Not all.  Saber-C and (possibly misremembered) Integral-C are examples
of C language systems that do extensive run-time checking.  

> Now I'm not sure that this statement is really true (I mean I'm not
> sure that sufficient information can always be passed to the compiler for it
> to generate code for meaningful run-time checks.)

The compiler can generate checks, but for C (unlike other languages) this
involves having pointers carry around valid range limit information.  Thus,
the performance penalty is pretty severe.  Further, languages such as
Pascal and Modula allow a much better modeling of intent by use of subranges,
so that some errors can be found much sooner than they would be found in
C (if they would be found at all).  But even with C's handicaps in this
regard, a range-checking version of C is a very, VERY valuable tool to use
in conjunction with exhaustive exercise of the software.

Further, if your C language system won't perform range or other runtime
checks optionally, (or for that matter... even if it does) you should
insert sanity-checking assertions into the code to use during unit and
integration testing.  It makes finding the source of errors SO much
easier, and costs so little.  It also nicely documents the assumptions
made by the code.
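
A minimal sketch of such an assertion, using the standard assert macro from
<assert.h>.  The table and the functions table_put and table_get are
invented for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define TABLE_SIZE 16	/* illustrative size */

static int table[TABLE_SIZE];

/* Store v at table[i].  The assertion documents the assumption that
   callers pass an index in range, and aborts with the file name and
   line number if they do not.  Defining NDEBUG removes the check. */
void
table_put(size_t i, int v)
{
	assert(i < TABLE_SIZE);
	table[i] = v;
}

/* Fetch table[i] under the same assumption. */
int
table_get(size_t i)
{
	assert(i < TABLE_SIZE);
	return table[i];
}
```

During testing a call such as table_put(16, 0) stops the program at the
assertion instead of silently overwriting a neighbouring object.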

--
The use of COBOL cripples the mind;  its teaching should, therefore,
be regarded as a criminal offense.
                                        --- Edsger Dijkstra
-- 
Wayne Throop      <the-known-world>!mcnc!rti!xyzzy!throopw

pja@ralph.UUCP (Pete Alleman) (11/27/88)

In article <10113@umn-cs.CS.UMN.EDU> raghavan@umn-cs.cs.umn.edu (Vijay Raghavan) writes:
>the C language definition doesn't really preclude any implementation from
>doing certain run-time checks (for array bounds, type checking, referring
>contents of uninitialized pointer variables &c),

The real problem with bounds checking in C is that the implementation is
difficult (if not impossible).  Most high-level languages allow only very
limited pointer arithmetic (array indexing on arrays with known bounds).
Bounds checking in C might be possible if a pointer contained 3 values:
The memory address, an upper bound, and a lower bound.  Pointer arithmetic
would modify only the current value.  Assignment would copy all 3 values.
Can anyone find a flaw in this type of implementation?
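
The three-value pointer can be sketched roughly as follows.  This is only an
illustration: struct fatptr and the fp_ functions are hypothetical names,
and the code assumes a flat address space in which uintptr_t arithmetic
corresponds to byte addresses.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical three-word pointer to char. */
struct fatptr {
	uintptr_t cur;	/* the memory address */
	uintptr_t lo;	/* lower bound: first valid byte */
	uintptr_t hi;	/* upper bound: one past the last valid byte */
};

/* Create a pointer to the start of a len-byte object. */
struct fatptr
fp_make(char *base, size_t len)
{
	struct fatptr p;

	p.lo = (uintptr_t)(void *)base;
	p.cur = p.lo;
	p.hi = p.lo + len;
	return p;
}

/* Pointer arithmetic modifies only the current value. */
struct fatptr
fp_add(struct fatptr p, ptrdiff_t n)
{
	p.cur += (uintptr_t)n;	/* unsigned wraparound handles negative n */
	return p;
}

/* Checked dereference: NULL if cur is outside lo .. hi-1. */
char *
fp_deref(struct fatptr p)
{
	if (p.cur < p.lo || p.cur >= p.hi)
		return NULL;
	return (char *)(void *)p.cur;
}
```

Ordinary struct assignment (q = p) copies all 3 values.  The sketch leaves
open which bounds to store for a pointer to an array member inside a larger
allocated object.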

>it's just that most
>(okay, all!) implementations don't do any such checking because of efficiency
>reasons. 

I vaguely remember seeing a flyer for a C interpreter that claimed to
check bounds.

-- 
Pete Alleman
	ralph!pja or
	digitran!pja

Drool@cup.portal.com (Paul James Coene) (11/29/88)

Sorry, no included article as I'm not used to my new system for posting...
 
In a mention of bounds checking, a reference was made to a C interpreter
that does boundary checking, etc.  We are using such a beast at work.  It's
called Saber C, and does seem to do this type of checking.  Because of the
difficulty of the task, however, this checking can become tedious.  Not
only are bounds checked, but also usage checks are made based on how a
section of code was allocated.  This makes applying "skeleton" structures
upon data areas very tedious, as all accesses, even when properly cast,
cause run time warnings.
 
In general, Saber does a good job of load time (linty) type checks, and
many run time checks.  If anyone is interested, I'll post their mail
address.