kuan@iris.ucdavis.edu (Frank [Who me?] Kuan) (08/25/90)
Why is it that most C compilers don't seem to support this nifty little feature? I'm working on a large project, and two of the worst debugging nightmares I've had were due to memory being overwritten from over indexing an array. I was thinking about writing some kind of preprocessor to check for this. Has anyone already written something like this? Thanks in Advance.
chris@mimsy.umd.edu (Chris Torek) (08/25/90)
In article <7611@ucdavis.ucdavis.edu> kuan@iris.ucdavis.edu (Frank [Who me?] Kuan) writes: >Why is it that most C compilers don't seem to [check array bounds]? Mostly because it is hard. Given `int *p', is `p[-1] = 3' valid? That depends on the value of p.... There is a company called Saber that produces a product called Saber-C that does this and more. It works quite well, although last I had heard it still objected to `&arr[sizeof arr/sizeof *arr]', which is Officially Legal. (Fortunately you can turn off each individual objection.) -- In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 405 2750) Domain: chris@cs.umd.edu Path: uunet!mimsy!chris
vu0310@bingvaxu.cc.binghamton.edu (R. Kym Horsell) (08/26/90)
In article <7611@ucdavis.ucdavis.edu> kuan@iris.ucdavis.edu (Frank [Who me?] Kuan) writes: > > Why is it that most C compilers don't seem to support this > nifty little feature? I guess this isn't usually included because (a) array indexing is subsumed by pointer arithmetic & this is *much* harder (i.e. impossible in general) to check; (b) arrays can be declared with no bounds, i.e. extern long arr[]; which implies either a smart linker and/or runtime support for array descriptions -- the antithesis of C (c) is is easy enough to do it yourself with macros: extern Thingy arr_[]; #define arr(i) arr_[chkbnds(i,0,max_ind_of_arr_)] int chkbnds(ind,lwb,upb) { if(ind>=lwb && ind<=upb) return ind; /* chunder */ exit(-1); } (note that we need a routine here so ``ind'', which may include side-effects, doesn't get evaluated twice). -Kym Horsell
henry@zoo.toronto.edu (Henry Spencer) (08/26/90)
In article <26196@mimsy.umd.edu> chris@mimsy.umd.edu (Chris Torek) writes: >>Why is it that most C compilers don't seem to [check array bounds]? > >Mostly because it is hard. Given `int *p', is `p[-1] = 3' valid? >That depends on the value of p.... Given the all-pervasive nature of pointers in C, about the only way you can do bounds checking is to have each pointer haul along the range of legal subscripts for itself. This gets tricky in spots but is feasible. Unfortunately, it imposes a lot of overhead on every pointer manipulation, so it's badly unsuited to anything but a specialized debugging compiler, and we don't have many of those. Personally, I've long had a theory that doing this sort of checking at compile time rather than run time ought to be feasible -- any competent programmer takes care to avoid overrunning arrays, and given the limits of the human mind, it ought to be possible for the compiler to duplicate this reasoning, possibly with some help -- but it will not be easy. -- Committees do harm merely by existing. | Henry Spencer at U of Toronto Zoology -Freeman Dyson | henry@zoo.toronto.edu utzoo!henry
browns@iccgcc.decnet.ab.com (Stan Brown, Oak Road Systems) (08/26/90)
In article <26196@mimsy.umd.edu>, chris@mimsy.umd.edu (Chris Torek) writes: (in response to aa query about array-bounds checking in C) > There is a company called Saber that produces a product called > Saber-C that does this and more. It works quite well, although last > I had heard it still objected to `&arr[sizeof arr/sizeof *arr]', > which is Officially Legal. Just to stick in my $.02: int arr[4]; sizeof arr/sizeof *arr is 4, so &arr[4] may not be legal. Whether legal or not, the address is certainly outside the array. Stan Brown, Oak Road Systems, Cleveland, Ohio, U.S.A. (216) 371-0043 The opinions expressed are mine. Mine alone! Nobody else is responsible for them or even endorses them--except my cat Dexter, and he signed the power of attorney only under my threat to cut off his Cat Chow!
henry@zoo.toronto.edu (Henry Spencer) (08/26/90)
In article <619.26d6cfb2@iccgcc.decnet.ab.com> browns@iccgcc.decnet.ab.com (Stan Brown, Oak Road Systems) writes: >int arr[4]; > >sizeof arr/sizeof *arr is 4, so &arr[4] may not be legal. Whether legal or >not, the address is certainly outside the array. It is, however, legal, by special dispensation of ANSI C. You can't dereference it, but it is required to behave properly and participate properly in comparisons and arithmetic. Note, this only applies to the address "one after" the end of the array -- both "two after" the end and "one before" the beginning of the array take you into the twilight zone of undefined behavior. -- Committees do harm merely by existing. | Henry Spencer at U of Toronto Zoology -Freeman Dyson | henry@zoo.toronto.edu utzoo!henry
libes@cme.nist.gov (Don Libes) (08/27/90)
In article <26196@mimsy.umd.edu> chris@mimsy.umd.edu (Chris Torek) writes: >In article <7611@ucdavis.ucdavis.edu> kuan@iris.ucdavis.edu >(Frank [Who me?] Kuan) writes: >>Why is it that most C compilers don't seem to [check array bounds]? > >Mostly because it is hard. Given `int *p', is `p[-1] = 3' valid? >That depends on the value of p.... > >There is a company called Saber that produces a product called >Saber-C that does this and more. It works quite well, although last >I had heard it still objected to `&arr[sizeof arr/sizeof *arr]', >which is Officially Legal. (Fortunately you can turn off each >individual objection.) Saber objects to a lot of things that are legal, but then, so does lint. And in most cases, it makes sense to use one of their directives to explicitly disable the objection. Saber complains about some things that I think it shouldn't to begin with, but again so does lint. I highly recommend Saber. I don't use it all the time - we have a limited number of licenses here - but when lint and the debugger fail me, I pull out Saber. It's pretty damn useful. Don Libes libes@cme.nist.gov ...!uunet!cme-durer!libes
spee@qmfl.jrdc.go.jp (Paul SPEE) (08/27/90)
In article <7611@ucdavis.ucdavis.edu> kuan@iris.ucdavis.edu (Frank [Who me?] Kuan) writes: > Why is it that most C compilers don't seem to support this > nifty little feature? To be able to check the array boundaries, the C compiler must now the array size. However, in most important cases the C compiler does not have this information. This can be either be the case when an array is passed as a function parameter or is allocated as a dynamic array. It would have been convenient if ANSI would have allowed 'pointers to variable size arrays'. For example, int array[10]; int * f(n, a) int n; int (*a)[n]; { register i; int (*b)[n] = (void *) malloc(n * sizeof(int)); for (i = 0; i < n; i++) (*b)[i] = (*a)[i]; return (int *) b; } main() { f(sizeof(array), array); } (Note: gcc allows this.) The same problem is true not only for runtime checking but also for parallizing scientific code. See %A Randy Allen %A Steve Johnson %T Compiling C for Vectorization, Parallelization, and Inline Expandsion %J Proceedings of the SIGPLAN '88 Conference on Programming Language Design and Implementation %D June 22-24, 1988 %C Atlanta, Georgia %P 241-249 %L ALLEN88 Paul Spee
chris@mimsy.umd.edu (Chris Torek) (08/27/90)
In article <26196@mimsy.umd.edu> I wrote: >>... it still objected to `&arr[sizeof arr/sizeof *arr]', >>which is Officially Legal. In article <619.26d6cfb2@iccgcc.decnet.ab.co> browns@iccgcc.decnet.ab.com (Stan Brown, Oak Road Systems) writes: >Just to stick in my $.02: > >int arr[4]; > >sizeof arr/sizeof *arr is 4, so &arr[4] may not be legal. Whether legal or >not, the address is certainly outside the array. See ANSI standard X3.159-1989, section 3.3.6, `additive operators'. (I might have included this before but I was at home.) As I said, it is Officially Legal. -- In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 405 2750) Domain: chris@cs.umd.edu Path: uunet!mimsy!chris
henry@zoo.toronto.edu (Henry Spencer) (08/28/90)
In article <6055@muffin.cme.nist.gov> libes@cme.nist.gov (Don Libes) writes: >I highly recommend Saber. I don't use it all the time - we have a >limited number of licenses here - but when lint and the debugger fail >me, I pull out Saber. It's pretty damn useful. I concur. We don't have Saber C here -- limited need and tight budget -- but I had a chance to try it out briefly, and was impressed. My only real unhappiness was that it's obvious that all its authors use the C shell, because it screws up in various small ways when you run it from the standard shell. -- Committees do harm merely by existing. | Henry Spencer at U of Toronto Zoology -Freeman Dyson | henry@zoo.toronto.edu utzoo!henry
steve@taumet.com (Stephen Clamage) (08/29/90)
spee@qmfl.jrdc.go.jp (Paul SPEE) writes: >To be able to check the array boundaries, the C compiler must now the >array size. However, in most important cases the C compiler does not >have this information. This can be either be the case when an array >is passed as a function parameter or is allocated as a dynamic array. >It would have been convenient if ANSI would have allowed 'pointers >to variable size arrays'. For example, There is nothing to prevent the C compiler from carrying around enough information with arrays and pointers to detect those problems at runtime which cannot be found at compile time. Such a compiler could still be ANSI-conforming. When an array was declared, the compiler would allocate extra space, say, just before the beginning of the array to contain size information. Pointers would be larger than a plain address to contain similar information. Every array reference and pointer dereference would then be checked for bounds violation, at compile time if possible, at run time otherwise. This approach is in fact implemented in some compilers. -- Steve Clamage, TauMetric Corp, steve@taumet.com
george@hls0.hls.oz (George Turczynski) (08/30/90)
In article <7611@ucdavis.ucdavis.edu>, kuan@iris.ucdavis.edu (Frank [Who me?] Kuan) writes: > > Why is it that most C compilers don't seem to support this > nifty little feature? Because most C compilers assume you know what you're doing :-) > I'm working on a large project, and two of the worst debugging > nightmares I've had were due to memory being overwritten from > over indexing an array. Array bounds, along with typos, should be one of the first things you check when looking for (discrete) bugs, though I guess you know that now ! > I was thinking about writing some kind of preprocessor to > check for this. Has anyone already written something like > this? What is it that you want ? Something to check your array indexing variables at compile-time ? In most cases, this is _possible_, but in others it's impossible. If you want code generated to check array indexing at run-time, then your code will run slower, which is why C compilers don't do it. We want the fastest possible code we can get ! If you want these sorts of features, use PASCAL :-) -- | George P. J. Turczynski. |---------------------------------------------------- | Computer Systems Engineer. | ACSnet: george@highland.oz | I can't speak for the | | Highland Logic Pty. Ltd. | Phone: +61 48 683490 | company, I can barely | | Suite 1, 348-354 Argyle St | Fax: +61 48 683474 | speak for myself... | | Moss Vale. NSW. Australia. 2577 |----------------------------------------------------
cjr@cs.bham.ac.uk (Chris Ridd <RiddCJ>) (08/30/90)
In article <26196@mimsy.umd.edu> chris@mimsy.umd.edu (Chris Torek) writes: >In article <7611@ucdavis.ucdavis.edu> kuan@iris.ucdavis.edu >(Frank [Who me?] Kuan) writes: >>Why is it that most C compilers don't seem to [check array bounds]? > >Mostly because it is hard. Given `int *p', is `p[-1] = 3' valid? >That depends on the value of p.... > >There is a company called Saber that produces a product called >Saber-C that does this and more. It works quite well, although last >I had heard it still objected to `&arr[sizeof arr/sizeof *arr]', >which is Officially Legal. (Fortunately you can turn off each >individual objection.) Why is this? I never could figure out why accessing the first element *past* the end of an array should be legal. Chris -- Chris Ridd, Computer Science, Birmingham Uni, UK -- RiddCJ@Cs.Bham.Ac.Uk -- "'It's going to look pretty good, then, isn't it,' said War testily, 'the One Horseman and Three Pedestrians of the Apocralypse.'" - Sourcery
njk@diku.dk (Niels J|rgen Kruse) (08/30/90)
steve@taumet.com (Stephen Clamage) writes: >There is nothing to prevent the C compiler from carrying around enough >information with arrays and pointers to detect those problems at runtime >which cannot be found at compile time. (...) >if possible, at run time otherwise. This approach is in fact implemented >in some compilers. >-- Oh. Which ones? Assume the following code: char *a,*c; double *b,d[17/sizeof(double)]; if (a = malloc (17)) { b = (double *)a; c = (char *)b; /* A */ } At location A, a[16] is of course legal and a + 17 is computable but not dereferenceable. Also, it is obvious that b[17/sizeof(double) - 1] is legal and b + 17/sizeof(double) is computable but not dereferenceable. But what about c? Is c[16] legal? Note that c[16] does not constitute part of any double within bounds of b (unless sizeof(double) == 17 or 1). What kind of object is b pointing to? How does it differ from the object pointed to by (d+0)? What does your bounds-checking C compiler have to say? What does the standard say? -- Niels J|rgen Kruse DIKU Graduate njk@diku.dk
karl@haddock.ima.isc.com (Karl Heuer) (08/31/90)
In article <1990Aug30.134537.26326@diku.dk> njk@diku.dk (Niels J|rgen Kruse) writes: >Assume the following code [on a bounds-checking implementation]: > char *a,*c; double *b,d[17/sizeof(double)]; > if (a = malloc (17)) { > b = (double *)a; > c = (char *)b; >Is c[16] legal? I believe it is, and therefore that the cast to (double *) must not actually reduce the known range of the pointer to that which is pointable from a double. Thus, a bounds-checking C implementation must maintain the bounds of a pointer by using a byte count (or byte pointer) rather than an object count (or object pointer). >What kind of object is b pointing to? How does it differ from >the object pointed to by (d+0)? Assume for concreteness that sizeof(double)==8. Then b is <double *, pointer to beginning of 17-byte block>, which is room for 2 doubles plus a spare byte at the end that cannot be referenced without casting b. But d is <double *, pointer to beginning of 16-byte block>, which is room for 2 doubles exactly. >What does your bounds-checking C compiler have to say? >What does the standard say? This is my interpretation of the Standard. I don't have a bounds-checking C compiler at hand, and I wonder if it would get this right. (Particularly on a word-addressible architecture.) Karl W. Z. Heuer (karl@kelp.ima.isc.com or ima!kelp!karl), The Walking Lint
gdtltr@freezer.it.udel.edu (Gary Duzan) (08/31/90)
In article <988@christopher-robin.cs.bham.ac.uk> cjr@christopher-robin.UUCP (Chris Ridd <RiddCJ>) writes: =>In article <26196@mimsy.umd.edu> chris@mimsy.umd.edu (Chris Torek) writes: =>>In article <7611@ucdavis.ucdavis.edu> kuan@iris.ucdavis.edu =>>(Frank [Who me?] Kuan) writes: =>>>Why is it that most C compilers don't seem to [check array bounds]? =>> =>>Mostly because it is hard. Given `int *p', is `p[-1] = 3' valid? =>>That depends on the value of p.... =>> =>>There is a company called Saber that produces a product called =>>Saber-C that does this and more. It works quite well, although last =>>I had heard it still objected to `&arr[sizeof arr/sizeof *arr]', =>>which is Officially Legal. (Fortunately you can turn off each =>>individual objection.) => => Why is this? I never could figure out why accessing the first =>element *past* the end of an array should be legal. => Correct me if I am wrong, but I don't believe accessing the element after is legal, but the pointer is still legal. In other words: int x,foo[foolen],*fooptr; x=foo[foolen]; /* Illegal */ fooptr=foo+foolen; /* Legal, points one int past end of foo */ x=*fooptr; /* Illegal */ --fooptr; /* Legal, points to last element of foo */ Gary Duzan Time Lord Third Regeneration -- gdtltr@freezer.it.udel.edu _o_ -------------------------- _o_ [|o o|] If you can square, round, or cube a number, why not sphere it? [|o o|] |_O_| "Don't listen to me; I never do." -- Doctor Who |_O_|
richard@aiai.ed.ac.uk (Richard Tobin) (08/31/90)
In article <988@christopher-robin.cs.bham.ac.uk> cjr@christopher-robin.UUCP (Chris Ridd <RiddCJ>) writes: >>I had heard it still objected to `&arr[sizeof arr/sizeof *arr]', >>which is Officially Legal. > Why is this? I never could figure out why accessing the first >element *past* the end of an array should be legal. Because it's not *accessing* it. Note the ampersand. It's very useful for loops: for(p = buf; p < &buf[sizeof(buf)]; p++) -- Richard -- Richard Tobin, JANET: R.Tobin@uk.ac.ed AI Applications Institute, ARPA: R.Tobin%uk.ac.ed@nsfnet-relay.ac.uk Edinburgh University. UUCP: ...!ukc!ed.ac.uk!R.Tobin
meissner@osf.org (Michael Meissner) (08/31/90)
In article <988@christopher-robin.cs.bham.ac.uk> cjr@cs.bham.ac.uk (Chris Ridd <RiddCJ>) writes: | Why is this? I never could figure out why accessing the first | element *past* the end of an array should be legal. So that you can do something like: char array[ARRAY_SIZE]; void clear_array(){ char *p; for (p = &array[0]; p < &array[ARRAY_SIZE]; p++) *p = '\0'; } -- Michael Meissner email: meissner@osf.org phone: 617-621-8861 Open Software Foundation, 11 Cambridge Center, Cambridge, MA, 02142 Do apple growers tell their kids money doesn't grow on bushes?
henry@zoo.toronto.edu (Henry Spencer) (08/31/90)
In article <988@christopher-robin.cs.bham.ac.uk> cjr@christopher-robin.UUCP (Chris Ridd <RiddCJ>) writes: > Why is this? I never could figure out why accessing the first >element *past* the end of an array should be legal. *Accessing* it is not legal, but taking its address is. This is arguably regrettable, but vast amounts of C code depend on it, and it is not a serious problem to implement. -- TCP/IP: handling tomorrow's loads today| Henry Spencer at U of Toronto Zoology OSI: handling yesterday's loads someday| henry@zoo.toronto.edu utzoo!henry
daveg@near.cs.caltech.edu (Dave Gillespie) (09/01/90)
>>>>> On 31 Aug 90 14:26:36 GMT, meissner@osf.org (Michael Meissner) said: > In article <988@christopher-robin.cs.bham.ac.uk> cjr@cs.bham.ac.uk > (Chris Ridd <RiddCJ>) writes: > | Why is this? I never could figure out why accessing the first > | element *past* the end of an array should be legal. > So that you can do something like: > ... > for (p = &array[0]; p < &array[ARRAY_SIZE]; p++) > *p = '\0'; Also, a pointer to the place just past the end of an array must legally be allowed to exist, for even more innocuous code like: p = array; /* same as "&array[0]" */ for (i = 0; i < ARRAY_SIZE; i++) *p++ = '\0'; Notice that at the end of this loop, "p" points to an address which would be illegal to access. But ANSI requires that such a pointer must work properly, even though saying "*p" or "p++" at this point is allowed to delete all your files, launch a nuclear strike, or any other kind of undefined result. (Whether I would actually buy a compiler that did this is another story...) Since I can produce this legal pointer by saying "p++", it stands to reason I should also be able to say "p = array + ARRAY_SIZE"; and we all know this is equivalent in C to "p = &array[ARRAY_SIZE]". It would be a shame to let these equivalences break just in this one special case. -- Dave -- Dave Gillespie 256-80 Caltech Pasadena CA USA 91125 daveg@csvax.cs.caltech.edu, ...!cit-vax!daveg
campbell@redsox.bsw.com (Larry Campbell) (09/04/90)
Are there actually any current compilers out there that are so stupid that they generate substantially different code for the following two code fragments? /* Fragment 1 */ for (p = array; p < &array[ARRAY_SIZE]; p++) *p++ = '\0'; /* Fragment 2 */ for (i = 0; i < ARRAY_SIZE; i++) array[i] = '\0'; If the answer to my question is "no", then the only excuse for permitting array indices past the end of an array is for compiling old, crufty code from clay tablets. If the answer to my question is "yes", then I think we should flame the culpable compiler vendors to a crisp. (hmmm, just tried this on SysV/386, and I guess I have to send a flame to AT&T... sigh...) -- Larry Campbell The Boston Software Works, Inc. campbell@redsox.bsw.com 120 Fulton Street wjh12!redsox!campbell Boston, MA 02109
cjr@cs.bham.ac.uk (Chris Ridd <RiddCJ>) (09/04/90)
In article <1990Aug31.163805.11232@zoo.toronto.edu> henry@zoo.toronto.edu (Henry Spencer) writes: >In article <988@christopher-robin.cs.bham.ac.uk> cjr@christopher-robin.UUCP (Chris Ridd <RiddCJ>) writes: >> Why is this? I never could figure out why accessing the first >>element *past* the end of an array should be legal. > >*Accessing* it is not legal, but taking its address is. This is arguably >regrettable, but vast amounts of C code depend on it, and it is not a >serious problem to implement. Thanks for the reminder! I don't tend to use the technique much myself... Chris -- Chris Ridd, Computer Science, Birmingham Uni, UK -- RiddCJ@Cs.Bham.Ac.Uk -- "'It's going to look pretty good, then, isn't it,' said War testily, 'the One Horseman and Three Pedestrians of the Apocralypse.'" - Sourcery
amull@Morgan.COM (Andrew P. Mullhaupt) (09/04/90)
In article <1589@redsox.bsw.com>, campbell@redsox.bsw.com (Larry Campbell) writes: > Are there actually any current compilers out there that are so stupid that > they generate substantially different code for the following two code fragments? > > /* Fragment 1 */ > for (p = array; p < &array[ARRAY_SIZE]; p++) > *p++ = '\0'; > > /* Fragment 2 */ > for (i = 0; i < ARRAY_SIZE; i++) > array[i] = '\0'; I hope that there are lots of compilers which can tell the difference between these. "Off by one" in the number of increments is one of the classic hazards of side effect assignments. (See my still as yet unposted "for statement considered harmful in C"...) The next problem can happed if p is of type (char *) and array is type (double[]). On machines where sizeof(char) != sizeof(double) (and there are one or two of these ill-behaved monstrosities around...) you don't get the same result. But seriously, folks... the exposure to aliasing of the two fragments is different, and if p is not used carefully subsequent to exiting the loop, optimization may be lost - i.e. a mere mortal compiler might not be able to generate the identical code for the kind of loop where the array is accessed via a pointer alias and the loop calls a function with either p or array as an argument. (Suppose the source for the function in the loop is not in the file where the loop is. How is the compiler supposed to know that a side effect doesn't prevent the use of a register for the aliased quanitity?) Later, Andrew Mullhaupt
chip@tct.uucp (Chip Salzenberg) (09/05/90)
Larry Campbell flames compiler writers for generating different code for these fragments: > /* Fragment 1 */ > for (p = array; p < &array[ARRAY_SIZE]; p++) > *p++ = '\0'; > > /* Fragment 2 */ > for (i = 0; i < ARRAY_SIZE; i++) > array[i] = '\0'; Mr. Campbell here makes a good point about compiler optimization. However, Mr. Campbell also writes: >If the answer to my question is "no", then the only excuse for permitting >array indices past the end of an array is for compiling old, crufty code >from clay tablets. Many C programmers use pointers for clarity, not just out of habit. My code, including my brand new code, frequently uses pointer navigation within arrays. This feature is useful, and its use is not confined to "old, crufty code". Other programming languages get along without pointers into arrays. That's one of the reasons I don't use those other languages. -- Chip Salzenberg at Teltronics/TCT <chip@tct.uucp>, <uunet!pdn!tct!chip>
mikey@ontek.com (Interplanetary Krill) (09/06/90)
In comp.lang.c, campbell@redsox.bsw.com (Larry Campbell) writes: | Are there actually any current compilers out there that are so stupid | that they generate substantially different code for the following two | code fragments? | | /* Fragment 1 */ | for (p = array; p < &array[ARRAY_SIZE]; p++) | *p++ = '\0'; | | /* Fragment 2 */ | for (i = 0; i < ARRAY_SIZE; i++) | array[i] = '\0'; Fragment 2 is superior in that it initializes the entire array rather than just elements with even numbered indexes. __ /x \/| \__/\|
jimmy@tybalt.caltech.edu (Jimmy Hu) (09/07/90)
In article <1589@redsox.bsw.com> campbell@redsox.bsw.com (Larry Campbell) writes: >Are there actually any current compilers out there that are so stupid that >they generate substantially different code for the following two code fragments? > > /* Fragment 1 */ > for (p = array; p < &array[ARRAY_SIZE]; p++) > *p++ = '\0'; > > /* Fragment 2 */ > for (i = 0; i < ARRAY_SIZE; i++) > array[i] = '\0'; > >If the answer to my question is "no", then the only excuse for permitting >array indices past the end of an array is for compiling old, crufty code >from clay tablets. > >If the answer to my question is "yes", then I think we should flame the >culpable compiler vendors to a crisp. To answer the first question, "Does any compiler generate different codes for the two fragments?" I would have to say definately yes. I would expect the two fragments to generate different codes. The first one uses a pointer to access the contents of the array, and the second one uses an integer as an index to the array. The final result will still be the same, in that the array is cleared to 0, but I don't think that the compiler will "convert" or "optimize" the second to the first or vice versa. After all, assuming it is trying to convert the second to the first, where is it going to get a pointer from? Create one? How will i become ARRAYSIZE after exiting the loop? There has to be SOME code difference. I don't think that a compiler should have to be sophisticated enough to "guess" at the intent of your code. (Another article stated that a MIPS compiler actually generated the SAME code for the two fragments. I find that hard to believe unless index i was never used anywhere else. Even so, that must have been a sophisticated compiler.) As for the usefulness of "crufty code", I think that the first fragment will run faster on most machines, since the address (array+i) doesn't have to be computed at each step. Whether this savings of time is of any interest to you is another matter. To answer the second question, namely "Why don't compilers check array bounds?", I would say that it is a very difficult task to check array bounds. For simple cases, like: int array[ARRAYSIZE]; for (i=1;i<=ARRAYSIZE;i++) /* Error */ array[i] = '\0'; perhaps a compiler could discover a possible error. But checking more complicated examples would probably be as hard as checking correctness of algorithms. For example: for (i=0;i<ARRAYSIZE;i++) array[func(i,someparm)]=nextval(); How is the compiler going to know if the array bound will be exceeded? It would need to check func and see if the return values fall within the bounds (0..ARRAYSIZE-1). To further complicate things, func is dependent on someparm, which perhaps if given the wrong values, can cause func to return invalid indeces. Thus, the compiler would have to make sure that someparm is in ITS valid range. As you can see, this problem can balloon further, given more variables and dependencies. I think that the best that can be done is to check for simple cases and to give warnings where applicable. Other than that, it is quite unreasonable to expect array bounds checking at compile time. I don't think other languages have this, do they? Run-time checking is more realistic, and I think I have seen this in Pascal, but it does slow things down quite a bit. All I can suggest to you is to find a compiler that has run-time checking that you can turn on or off, so that once you're satisfied that you haven't made any mistakes, you can turn the checking off and run your program at full speed. -- ---------------------------------------------------------------------- Jimmy Hu jimmy@tybalt.caltech.edu ----------------------------------------------------------------------
bright@Data-IO.COM (Walter Bright) (09/08/90)
In comp.lang.c, campbell@redsox.bsw.com (Larry Campbell) writes: | Are there actually any current compilers out there that are so stupid | that they generate substantially different code for the following two | code fragments? | | /* Fragment 1 */ | for (p = array; p < &array[ARRAY_SIZE]; p++) | *p = '\0'; /* changed from *p++ = 0; */ | | /* Fragment 2 */ | for (i = 0; i < ARRAY_SIZE; i++) | array[i] = '\0'; The optimization that converts 2 to 1 is called 'strength reduction' with 'loop induction variable elimination'. It is an advanced capability of global optimizing compilers, and few compilers do it. Zortech C/C++ does this optimization when it is compiling with full optimization.
bright@Data-IO.COM (Walter Bright) (09/08/90)
In article <1990Sep6.203031.29254@laguna.ccsf.caltech.edu> jimmy@tybalt.caltech.edu (Jimmy Hu) writes: <In article <1589@redsox.bsw.com> campbell@redsox.bsw.com (Larry Campbell) writes: <<Are there actually any current compilers out there that are so stupid that <<they generate substantially different code for the following two code fragments? << << /* Fragment 1 */ << for (p = array; p < &array[ARRAY_SIZE]; p++) << *p = '\0'; << << /* Fragment 2 */ << for (i = 0; i < ARRAY_SIZE; i++) << array[i] = '\0'; << <The final result will still be the <same, in that the array is cleared to 0, but I don't think that the compiler <will "convert" or "optimize" the second to the first or vice versa. Zortech's optimizer will do this conversion. <After all, <assuming it is trying to convert the second to the first, where is it going to <get a pointer from? Create one? Yes. <How will i become ARRAYSIZE after exiting the <loop? There has to be SOME code difference. I don't think that a compiler <should have to be sophisticated enough to "guess" at the intent of your code. If i is used after the exit of the loop, the expression i = p - array; is added to each branch that exits the loop. <(Another article stated that a MIPS compiler actually generated the SAME <code for the two fragments. I find that hard to believe unless index i was <never used anywhere else. Even so, that must have been a sophisticated <compiler.) Thanks for the compliment! These kinds of optimizations are fairly well understood. Check out Aho and Ullman's "Compilers" book (the dragon book).
userAKDU@mts.ucs.UAlberta.CA (Al Dunbar) (09/08/90)
In article <1990Sep6.203031.29254@laguna.ccsf.caltech.edu>, jimmy@tybalt.caltech.edu (Jimmy Hu) writes: >In article <1589@redsox.bsw.com> campbell@redsox.bsw.com (Larry Campbell) writes: >>Are there actually any current compilers out there that are so stupid that >>they generate substantially different code for the following two code fragments? >> >> /* Fragment 1 */ >> for (p = array; p < &array[ARRAY_SIZE]; p++) >> *p++ = '\0'; >> >> /* Fragment 2 */ >> for (i = 0; i < ARRAY_SIZE; i++) >> array[i] = '\0'; >> Actually, the first fragment will set every other element of the array to '?0', as the pointer is incremented twice per trip through the loop. -------------------+------------------------------------------- Al Dunbar | Edmonton, Alberta | this space for rent CANADA | -------------------+-------------------------------------------