raghavan@umn-cs.CS.UMN.EDU (Vijay Raghavan) (11/19/88)
I made a casual statement in a local bulletin board to the effect that the C language definition doesn't really preclude any implementation from doing certain run-time checks (for array bounds, type checking, referring contents of uninitialized pointer variables &c), it's just that most (okay, all!) implementations don't do any such checking because of efficiency reasons. Now I'm not sure that this statement is really true (I mean I'm not sure that sufficient information can always be passed to the compiler for it to generate code for meaningful run-time checks.) Comments? Treat this as a question of academic interest. Ignore considerations of efficiency on architectures which don't support run-time checking mechanisms. Also assume that all support library functions (in particular, malloc, calloc etc.) have been written in a way as to support these checks, wherever possible. Vijay Raghavan
steve@umigw.MIAMI.EDU (steve emmerson) (11/19/88)
In article <10113@umn-cs.CS.UMN.EDU> raghavan@umn-cs.cs.umn.edu (Vijay Raghavan) writes:
>
>(I mean I'm not sure that sufficient information can always be passed
> to the compiler for it to generate code for meaningful run-time checks.)
>
> Comments?

In general, a pointer to garbage can easily masquerade as valid.
--
Steve Emmerson
Inet: steve@umigw.miami.edu [128.116.10.1]
SPAN: miami::emmerson (host 3074::)  emmerson%miami.span@star.stanford.edu
UUCP: ...!ncar!umigw!steve           emmerson%miami.span@vlsi.jpl.nasa.gov
"Computers are like God in the Old Testament: lots of rules and no mercy"
bill@twwells.uucp (T. William Wells) (11/20/88)
In article <10113@umn-cs.CS.UMN.EDU> raghavan@umn-cs.cs.umn.edu (Vijay Raghavan) writes:
:
: I made a casual statement in a local bulletin board to the effect that
: the C language definition doesn't really preclude any implementation from
: doing certain run-time checks (for array bounds, type checking, referring
: contents of uninitialized pointer variables &c), it's just that most
: (okay, all!) implementations don't do any such checking because of efficiency
: reasons. Now I'm not sure that this statement is really true (I mean I'm not
: sure that sufficient information can always be passed to the compiler for it
: to generate code for meaningful run-time checks.)
:
: Comments?
It is entirely possible to do complete run-time checking; I understand
that there are some C interpreters that do this. It is not cheap,
however.
---
Bill
{uunet|novavax}!proxftl!twwells!bill
tj@mks.UUCP (T. J. Thompson) (11/20/88)
In article <10113@umn-cs.CS.UMN.EDU>, raghavan@umn-cs.CS.UMN.EDU (Vijay Raghavan) writes:
>
> I made a casual statement in a local bulletin board to the effect that
> the C language definition doesn't really preclude any implementation from
> doing certain run-time checks (for array bounds, type checking, referring
> contents of uninitialized pointer variables &c), it's just that most
> (okay, all!) implementations don't do any such checking because of efficiency
> reasons. Now I'm not sure that this statement is really true (I mean I'm not
> sure that sufficient information can always be passed to the compiler for it
> to generate code for meaningful run-time checks.)
>

Consider the following semi-realistic code fragment:

    typedef struct {
        int type;
        union {
            int ival;
            double dval;
            struct {
                int hval;
                char name[1];
            } n;
        } v;
    } nodeT;

    nodeT*
    namenode(char* name)
    {
        nodeT* np;

        if ((np=(nodeT*)malloc(sizeof(nodeT)+strlen(name))) != NULL) {
            np->type = NAME;
            np->v.n.hval = hash(name);
            (void)strcpy(np->v.n.name, name); /* run-time check here? */
        }
        return (np);
    }

I suggest that no amount of cleverness on the part of the compiler and malloc
can limit a pointer derived from np->v.n.name to the precisely correct range
(np->v.n.name[0] .. np->v.n.name[strlen(name)]).

The compiler could limit any pointer derived from np->v.n.name to that value
only, based on the length of the array in the type declaration. This would
cause the strcpy to fail (and would probably break 98% of existing programs).

The pointer derived from np->v.n.name could be limited to the object returned
by malloc (i.e. the limits on np->v.n.name could be inherited from np).
This would permit np->v.n.name[-1]=0; clearly wrong.

There could be a special arrangement to allow the relaxation of limits on
a trailing array member of an aggregate; but consider:

    union {
        double d[2];
        int i[2];
    } coord;

Then one could write coord.i[2]=0; clearly wrong.

I doubt many people would accept the cost of run-time checking if it were not
able to catch the vast majority of errant pointers as soon as possible
(after all, we already get `segmentation violation -- core dumped').
--
T. J. Thompson                          uunet!watmath!mks!tj
Mortice Kern Systems Inc.               (519) 884-2251
35 King St. N., Waterloo, Ont., Can. N2J 2W9
long time(); /* know C */
jinli@gpu.utcs.toronto.edu (Jin Li) (11/20/88)
In article <10113@umn-cs.CS.UMN.EDU> raghavan@umn-cs.cs.umn.edu (Vijay Raghavan) writes:
>...
>the C language definition doesn't really preclude any implementation from
>doing certain run-time checks (for array bounds, type checking, referring
>contents of uninitialized pointer variables &c), it's just that most
>(okay, all!) implementations don't do any such checking because of efficiency
>reasons. Now I'm not sure that this statement is really true (I mean I'm not
>sure that sufficient information can always be passed to the compiler for it
>to generate code for meaningful run-time checks.)
>...

There are C compilers and interpreters that do run-time checks, but the
cost is too high: do you really want to ride on a $$$$$.$$ tricycle when you
can ride on a $ mountain bike?
--
Jin Li                                    >> Gin & Tonic mix well.
University of Toronto Computing Services  <<
jinli@gpu.utcs.utoronto.ca  uunet!utgpu!jinli  >>
henry@utzoo.uucp (Henry Spencer) (11/20/88)
In article <10113@umn-cs.CS.UMN.EDU> raghavan@umn-cs.cs.umn.edu (Vijay Raghavan) writes:
>the C language definition doesn't really preclude any implementation from
>doing certain run-time checks (for array bounds, type checking, referring
>contents of uninitialized pointer variables &c), it's just that most
>(okay, all!) implementations don't do any such checking because of efficiency
>reasons. Now I'm not sure that this statement is really true...

It's true; there is at least one debugging-oriented implementation of C
which does full pointer checking (which includes array-bounds checking),
for example. The efficiency hit is high, unfortunately.
--
Sendmail is a bug,             | Henry Spencer at U of Toronto Zoology
not a feature.                 | uunet!attcan!utzoo!henry henry@zoo.toronto.edu
gwyn@smoke.BRL.MIL (Doug Gwyn ) (11/20/88)
In article <10113@umn-cs.CS.UMN.EDU> raghavan@umn-cs.cs.umn.edu (Vijay Raghavan) writes:
> I made a casual statement in a local bulletin board to the effect that
>the C language definition doesn't really preclude any implementation from
>doing certain run-time checks (for array bounds, type checking, referring
>contents of uninitialized pointer variables &c), it's just that most
>(okay, all!) implementations don't do any such checking because of efficiency
>reasons. Now I'm not sure that this statement is really true (I mean I'm not
>sure that sufficient information can always be passed to the compiler for it
>to generate code for meaningful run-time checks.)
> Comments? Treat this as a question of academic interest. Ignore
>considerations of efficiency on architectures which don't support run-time
>checking mechanisms. Also assume that all support library functions (in
>particular, malloc, calloc etc.) have been written in a way as to support
>these checks, wherever possible.

Although few existing implementations of C provide such checks, there are
a few that do. The one I most remember was (is?) marketed under the name
"Safe C".

The forthcoming C standard is quite specific about what must be allowed
and what is "illegal" (not an official term; officially there are
varieties of behavior called "unspecified", "undefined", "implementation
defined", etc.). Anything that is not definitely allowed could be
diagnosed at compile time or at run time.

To support run-time checks, an implementation would usually have to
access data objects via descriptors (known to old-timers as "dope
vectors"), with a lot of overhead. Some hardware allows efficient
implementation of things like null or uninitialized pointer
dereferencing, etc. Unless every data object can be allocated in a
separate segment, there isn't much that can be done about efficient
bounds checking.
chris@mimsy.UUCP (Chris Torek) (11/20/88)
>In article <10113@umn-cs.CS.UMN.EDU> raghavan@umn-cs.cs.umn.edu (Vijay
>Raghavan) writes:
>>(I mean I'm not sure that sufficient information can always be passed
>> to the compiler for it to generate code for meaningful run-time checks.)
>> Comments?

In article <189@umigw.MIAMI.EDU> steve@umigw.MIAMI.EDU (steve emmerson)
suggests:
>In general, a pointer to garbage can easily masquerade as valid.

While this is true, if the compiler is careful, and does not provide
a way to get directly at the machine%, the compiler and runtime system
can ensure that the program cannot generate a pointer to garbage.

In fact, it is possible to apply to C systems most of the run-time
checks common in, e.g., Pascal systems. There are several companies
selling such systems. Look around at a Usenix or /usr/group vendor
show, for instance.

-----
% This means no assembly escapes, and requires checking all pointer/
  integer and pointer/pointer conversions and/or all pointer references.
  In addition to a fairly hefty efficiency price-tag, this does, of
  course, make the implementation virtually useless for writing device
  drivers for conventional hardware.
-----
--
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain: chris@mimsy.umd.edu    Path: uunet!mimsy!chris
chris@mimsy.UUCP (Chris Torek) (11/21/88)
In article <566@mks.UUCP> tj@mks.UUCP (T. J. Thompson) writes:
>Consider the following semi-realistic code fragment:

(replaced with a simpler version:)

    struct string {
        int hashval;
        char str[1];
    };

    struct string *
    intern(char *s)
    {
        struct string *sp;

        /* assuming dpANS-conformant malloc() */
        if ((sp = malloc(sizeof(*sp) + strlen(s))) != NULL) {
            sp->hashval = hash(s);
            (void) strcpy(sp->str, s);
        }
        return (sp);
    }

>I suggest that no amount of cleverness on the part of the compiler and malloc
>can limit a pointer derived from np->v.n.name to the precisely correct range
>(np->v.n.name[0] .. np->v.n.name[strlen(name)]).

[here sp->str[0] .. sp->str[strlen(s)]]

The compiler *can* check this, in either of two ways: It can limit you
to sp->str[0]..sp->str[0]:

>The compiler could limit any pointer derived from np->v.n.name to that value
>only, based on the length of the array in the type declaration. This would
>cause the strcpy to fail (and would probably break 98% of existing programs).

I think this estimate is rather high. It would break some programs.

>The pointer derived from np->v.n.name could be limited to the object returned
>by malloc (i.e. the limits on np->v.n.name could be inherited from np).
>This would permit np->v.n.name[-1]=0; clearly wrong.

This is, I think, what existing checking-compilers do. It is not
perfect, but it works.

>There could be a special arrangement to allow the relaxation of limits on
>a trailing array member of an aggregate ...

This would work without reservation provided the relaxation applies
only to objects allocated with malloc.
--
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain: chris@mimsy.umd.edu    Path: uunet!mimsy!chris
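
The three bounds policies debated above for the trailing array can be compared side by side. What follows is only a rough sketch under stated assumptions, not how any actual checking compiler works: the names `struct range`, `declared_bounds`, `inherited_bounds`, `relaxed_bounds`, `in_range` and `intern_alloc_size` are all hypothetical, and the run-time system is simply assumed to know the size that was passed to malloc().

```c
#include <stddef.h>
#include <string.h>

struct string {
    int  hashval;
    char str[1];   /* trailing array, over-allocated by the caller */
};

/* Hypothetical: half-open range [lo, hi) of indices allowed for sp->str. */
struct range { long lo, hi; };

/* The size that intern() passes to malloc(). */
size_t intern_alloc_size(const char *s)
{
    return sizeof(struct string) + strlen(s);
}

/* Policy 1: trust the declared type only, so just str[0] is valid. */
struct range declared_bounds(void)
{
    struct range r;
    r.lo = 0;
    r.hi = (long)sizeof(((struct string *)0)->str);
    return r;
}

/* Policy 2: inherit the bounds of the whole malloc'ed object. */
struct range inherited_bounds(size_t alloc_size)
{
    long off = (long)offsetof(struct string, str);
    struct range r;
    r.lo = -off;                    /* reaches back into hashval */
    r.hi = (long)alloc_size - off;
    return r;
}

/* Policy 3: keep the declared lower bound, relax only the upper bound
 * to the end of the malloc'ed object. */
struct range relaxed_bounds(size_t alloc_size)
{
    struct range r;
    r.lo = 0;
    r.hi = (long)alloc_size - (long)offsetof(struct string, str);
    return r;
}

/* The check a compiler would emit before each sp->str[i] access. */
int in_range(struct range r, long i)
{
    return r.lo <= i && i < r.hi;
}
```

Under policy 1 the strcpy() in intern() fails on its second byte; under policy 2 an index of -1 is accepted; under policy 3 it is rejected while the whole copied string stays in range. Even policy 3 is not exact, since the upper bound comes from the allocation size and may include padding bytes past the terminating NUL.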
john@uw-nsr.UUCP (John Sambrook) (11/22/88)
In article <10113@umn-cs.CS.UMN.EDU> raghavan@umn-cs.cs.umn.edu (Vijay Raghavan) writes:
>
> I made a casual statement in a local bulletin board to the effect that
>the C language definition doesn't really preclude any implementation from
>doing certain run-time checks (for array bounds, type checking, referring
>contents of uninitialized pointer variables &c), it's just that most
>(okay, all!) implementations don't do any such checking because of efficiency
>reasons. Now I'm not sure that this statement is really true (I mean I'm not
>sure that sufficient information can always be passed to the compiler for it
>to generate code for meaningful run-time checks.)

Please note that I have no relationship with Data General Corporation.
I just happen to think they have done a good job on their language
systems products.

The Data General C compiler is an example of a compiler that provides
a number of facilities for debugging programs, including several run
time checks. All such options are invoked with -C<option-name>.

Two useful switches are -Clineid and -Cprocid. They cause the compiler
to include source file line and procedure name information into the
generated code. If the program aborts a (stack) traceback is produced
that includes this information. Very useful.

-Csubcheck causes subscript range checking to be enabled. At run time
attempts to reference outside of an array are detected and the program
aborts (with a traceback). When possible the compiler detects these
errors at compile time.

-Cpointercheck is useful for catching uses of improperly typed pointers.
While not a fully general mechanism it does catch the types of errors
that cause programs to abort on MV series machines.

Finally, -Czeroframe causes the compiler to generate code to zero all
local variables when a new activation record (stack frame) is created.
This has been useful from time to time to track down uses of
uninitialized variables.
--
John Sambrook                     Internet: john@nsr.bioeng.washington.edu
University of Washington RC-05        UUCP: uw-nsr!john
Seattle, Washington 98195             Dial: (206) 548-4386
throopw@xyzzy.UUCP (Wayne A. Throop) (11/24/88)
> raghavan@umn-cs.CS.UMN.EDU (Vijay Raghavan)
> the C language definition doesn't really preclude any implementation from
> doing certain run-time checks (for array bounds, type checking, referring
> contents of uninitialized pointer variables &c), it's just that most
> (okay, all!) implementations don't do any such checking because of efficiency
> reasons.

Not all. Saber-C and (possibly misremembered) Integral-C are examples
of C language systems that do extensive run-time checking.

> Now I'm not sure that this statement is really true (I mean I'm not
> sure that sufficient information can always be passed to the compiler for it
> to generate code for meaningful run-time checks.)

The compiler can generate checks, but for C (unlike other languages)
this involves having pointers carry around valid range limit
information. Thus, the performance penalty is pretty severe. Further,
languages such as Pascal and Modula allow a much better modeling of
intent by use of subranges, so that some errors can be found much
sooner than they would be found in C (if they would be found at all).

But even with C's handicaps in this regard, a range-checking version of
C is a very, VERY valuable tool to use in conjunction with exhaustive
exercise of the software.

Further, if your C language system won't perform range or other
runtime checks optionally (or for that matter... even if it does), you
should insert sanity-checking assertions into the code to use during
unit and integration testing. It makes finding the source of errors SO
much easier, and costs so little. It also nicely documents the
assumptions made by the code.

--
The use of COBOL cripples the mind; its teaching should, therefore, be
regarded as a criminal offense.
                                        --- Edsger Dijkstra
--
Wayne Throop      <the-known-world>!mcnc!rti!xyzzy!throopw
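
As one possible illustration of the sanity-checking assertions recommended above, here is a minimal sketch using the standard assert() macro from <assert.h>. The table and the function names (`table_put`, `table_get`, `sum`) are made up for the example; only assert() itself is standard.

```c
#include <assert.h>
#include <stddef.h>

#define TABLE_SIZE 16

static int table[TABLE_SIZE];

/* Store value at index i. The assertion both checks and documents
 * the assumption that callers stay inside the table. */
void table_put(size_t i, int value)
{
    assert(i < TABLE_SIZE);
    table[i] = value;
}

/* Fetch the value at index i, with the same range assumption. */
int table_get(size_t i)
{
    assert(i < TABLE_SIZE);
    return table[i];
}

/* Sum n ints starting at p. A null pointer is only acceptable
 * when there is nothing to read. */
long sum(const int *p, size_t n)
{
    long total = 0;
    size_t i;

    assert(p != NULL || n == 0);
    for (i = 0; i < n; i++)
        total += p[i];
    return total;
}
```

A failed assertion aborts the program and reports the file and line of the violated assumption. Compiling with NDEBUG defined removes the checks, so they cost nothing once testing is over.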
pja@ralph.UUCP (Pete Alleman) (11/27/88)
In article <10113@umn-cs.CS.UMN.EDU> raghavan@umn-cs.cs.umn.edu (Vijay Raghavan) writes:
>the C language definition doesn't really preclude any implementation from
>doing certain run-time checks (for array bounds, type checking, referring
>contents of uninitialized pointer variables &c),

The real problem with bounds checking in C is that the implementation is
difficult (if not impossible). Most high-level languages allow only very
limited pointer arithmetic (array indexing on arrays with known bounds).

Bounds checking in C might be possible if a pointer contained 3 values:
the memory address, an upper bound, and a lower bound. Pointer arithmetic
would modify only the current value. Assignment would copy all 3 values.

Can anyone find a flaw in this type of implementation?

>it's just that most
>(okay, all!) implementations don't do any such checking because of efficiency
>reasons.

I vaguely remember seeing a flyer for a C interpreter that claimed to check
bounds.
--
Pete Alleman
ralph!pja or digitran!pja
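
The three-value pointer proposed above might look roughly like this when written out by hand. It is only a sketch: `struct checked_ptr` and the `cp_*` functions are hypothetical names, a real implementation would live inside the compiler, and the current position is stored here as an offset from the lower bound (an assumption of this sketch) so that out-of-range intermediate values stay well-defined in portable C.

```c
#include <stddef.h>

/* Hypothetical three-value pointer: lower bound, upper bound
 * (base + len), and the current position as an offset from base. */
struct checked_ptr {
    char     *base;
    size_t    len;
    ptrdiff_t off;
};

/* Create a checked pointer to the start of a len-byte object. */
struct checked_ptr cp_make(char *base, size_t len)
{
    struct checked_ptr p;
    p.base = base;
    p.len  = len;
    p.off  = 0;
    return p;
}

/* Pointer arithmetic modifies only the current value; the bounds
 * travel along unchanged. Plain struct assignment copies all three. */
struct checked_ptr cp_add(struct checked_ptr p, ptrdiff_t n)
{
    p.off += n;
    return p;
}

/* Nonzero if dereferencing p would stay inside the object. */
int cp_valid(struct checked_ptr p)
{
    return p.off >= 0 && (size_t)p.off < p.len;
}

/* Checked read: returns 0 and stores the byte in *out, or -1. */
int cp_load(struct checked_ptr p, char *out)
{
    if (!cp_valid(p))
        return -1;
    *out = p.base[p.off];
    return 0;
}

/* Checked write: returns 0 on success, -1 if out of bounds. */
int cp_store(struct checked_ptr p, char c)
{
    if (!cp_valid(p))
        return -1;
    p.base[p.off] = c;
    return 0;
}
```

Because only dereferences are checked, a pointer may wander out of range and come back, as ordinary C pointer arithmetic often does in loops. The sketch does not settle which bounds to give a pointer into a structure member, which is the difficulty raised earlier in the thread.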
Drool@cup.portal.com (Paul James Coene) (11/29/88)
Sorry, no included article as I'm not used to my new system for posting...

In a mention of bounds checking, a reference was made to a C interpreter that
does boundary checking, etc. We are using such a beast at work. It's called
Saber C, and does seem to do this type of checking. Because of the
difficulty of the task, however, this checking can become tedious. Not only
are bounds checked, but also usage checks are made based on how a section
of code was allocated. This makes applying "skeleton" structures upon data
areas very tedious, as all accesses, even when properly cast, cause run time
warnings.

In general, Saber does a good job of load time (linty) type checks, and
many run time checks. If anyone is interested, I'll post their mail address.