[comp.lang.c] Array bounds checking with C????

kuan@iris.ucdavis.edu (Frank [Who me?] Kuan) (08/25/90)

	Why is it that most C compilers don't seem to support this
	nifty little feature?

	I'm working on a large project, and two of the worst debugging
	nightmares I've had were due to memory being overwritten from
	over indexing an array.

	I was thinking about writing some kind of preprocessor to
	check for this. Has anyone already written something like
	this?

	Thanks in Advance.

chris@mimsy.umd.edu (Chris Torek) (08/25/90)

In article <7611@ucdavis.ucdavis.edu> kuan@iris.ucdavis.edu
(Frank [Who me?] Kuan) writes:
>Why is it that most C compilers don't seem to [check array bounds]?

Mostly because it is hard.  Given `int *p', is `p[-1] = 3' valid?
That depends on the value of p....

There is a company called Saber that produces a product called
Saber-C that does this and more.  It works quite well, although last
I had heard it still objected to `&arr[sizeof arr/sizeof *arr]',
which is Officially Legal.  (Fortunately you can turn off each
individual objection.)
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 405 2750)
Domain:	chris@cs.umd.edu	Path:	uunet!mimsy!chris

vu0310@bingvaxu.cc.binghamton.edu (R. Kym Horsell) (08/26/90)

In article <7611@ucdavis.ucdavis.edu> kuan@iris.ucdavis.edu (Frank [Who me?] Kuan) writes:
>
>	Why is it that most C compilers don't seem to support this
>	nifty little feature?

I guess this isn't usually included because
(a)	array indexing is subsumed by pointer arithmetic & this
	is *much* harder (i.e. impossible in general) to check;
(b)	arrays can be declared with no bounds, i.e.
		extern long arr[];
	which implies either a smart linker and/or runtime
	support for array descriptions -- the antithesis of C
(c)	is is easy enough to do it yourself with macros:
		extern Thingy arr_[];
		#define	arr(i)	arr_[chkbnds(i,0,max_ind_of_arr_)]
		int chkbnds(ind,lwb,upb) {
			if(ind>=lwb && ind<=upb) return ind;
			/* chunder */
			exit(-1);
			}
	(note that we need a routine here so ``ind'', which may
	include side-effects, doesn't get evaluated twice).

-Kym Horsell

henry@zoo.toronto.edu (Henry Spencer) (08/26/90)

In article <26196@mimsy.umd.edu> chris@mimsy.umd.edu (Chris Torek) writes:
>>Why is it that most C compilers don't seem to [check array bounds]?
>
>Mostly because it is hard.  Given `int *p', is `p[-1] = 3' valid?
>That depends on the value of p....

Given the all-pervasive nature of pointers in C, about the only way you
can do bounds checking is to have each pointer haul along the range of
legal subscripts for itself.  This gets tricky in spots but is feasible.
Unfortunately, it imposes a lot of overhead on every pointer manipulation,
so it's badly unsuited to anything but a specialized debugging compiler,
and we don't have many of those.

Personally, I've long had a theory that doing this sort of checking at
compile time rather than run time ought to be feasible -- any competent
programmer takes care to avoid overrunning arrays, and given the limits
of the human mind, it ought to be possible for the compiler to duplicate
this reasoning, possibly with some help -- but it will not be easy.
-- 
Committees do harm merely by existing. | Henry Spencer at U of Toronto Zoology
                       -Freeman Dyson  |  henry@zoo.toronto.edu   utzoo!henry

browns@iccgcc.decnet.ab.com (Stan Brown, Oak Road Systems) (08/26/90)

In article <26196@mimsy.umd.edu>, chris@mimsy.umd.edu (Chris Torek) writes:
 (in response to aa query about array-bounds checking in C)

> There is a company called Saber that produces a product called
> Saber-C that does this and more.  It works quite well, although last
> I had heard it still objected to `&arr[sizeof arr/sizeof *arr]',
> which is Officially Legal.

Just to stick in my $.02:

int arr[4];

sizeof arr/sizeof *arr is 4, so &arr[4] may not be legal.  Whether legal or
not, the address is certainly outside the array.

Stan Brown, Oak Road Systems, Cleveland, Ohio, U.S.A.         (216) 371-0043
The opinions expressed are mine. Mine alone!  Nobody else is responsible for
them or even endorses them--except my cat Dexter, and he signed the power of
attorney only under my threat to cut off his Cat Chow!

henry@zoo.toronto.edu (Henry Spencer) (08/26/90)

In article <619.26d6cfb2@iccgcc.decnet.ab.com> browns@iccgcc.decnet.ab.com (Stan Brown, Oak Road Systems) writes:
>int arr[4];
>
>sizeof arr/sizeof *arr is 4, so &arr[4] may not be legal.  Whether legal or
>not, the address is certainly outside the array.

It is, however, legal, by special dispensation of ANSI C.  You can't
dereference it, but it is required to behave properly and participate
properly in comparisons and arithmetic.  Note, this only applies to
the address "one after" the end of the array -- both "two after" the
end and "one before" the beginning of the array take you into the
twilight zone of undefined behavior.
-- 
Committees do harm merely by existing. | Henry Spencer at U of Toronto Zoology
                       -Freeman Dyson  |  henry@zoo.toronto.edu   utzoo!henry

libes@cme.nist.gov (Don Libes) (08/27/90)

In article <26196@mimsy.umd.edu> chris@mimsy.umd.edu (Chris Torek) writes:
>In article <7611@ucdavis.ucdavis.edu> kuan@iris.ucdavis.edu
>(Frank [Who me?] Kuan) writes:
>>Why is it that most C compilers don't seem to [check array bounds]?
>
>Mostly because it is hard.  Given `int *p', is `p[-1] = 3' valid?
>That depends on the value of p....
>
>There is a company called Saber that produces a product called
>Saber-C that does this and more.  It works quite well, although last
>I had heard it still objected to `&arr[sizeof arr/sizeof *arr]',
>which is Officially Legal.  (Fortunately you can turn off each
>individual objection.)

Saber objects to a lot of things that are legal, but then, so does
lint.  And in most cases, it makes sense to use one of their
directives to explicitly disable the objection.

Saber complains about some things that I think it shouldn't to begin
with, but again so does lint.

I highly recommend Saber.  I don't use it all the time - we have a
limited number of licenses here - but when lint and the debugger fail
me, I pull out Saber.  It's pretty damn useful.

Don Libes          libes@cme.nist.gov      ...!uunet!cme-durer!libes

spee@qmfl.jrdc.go.jp (Paul SPEE) (08/27/90)

In article <7611@ucdavis.ucdavis.edu> kuan@iris.ucdavis.edu (Frank [Who me?] Kuan) writes:
>	Why is it that most C compilers don't seem to support this
>	nifty little feature?

To be able to check the array boundaries, the C compiler must now the
array size. However, in most important cases the C compiler does not
have this information. This can be either be the case when an array
is passed as a function parameter or is allocated as a dynamic array.
It would have been convenient if ANSI would have allowed 'pointers
to variable size arrays'. For example,

int	array[10];

int *
f(n, a)
int	n;
int	(*a)[n];
{
	register i;
	int	(*b)[n] = (void *) malloc(n * sizeof(int));

	for (i = 0; i < n; i++)
		(*b)[i] = (*a)[i];

	return (int *) b;
}

main()
{
	f(sizeof(array), array);
}

(Note: gcc allows this.)

The same problem is true not only for runtime checking but also for
parallizing scientific code. See

%A Randy Allen
%A Steve Johnson
%T Compiling C for Vectorization, Parallelization, and Inline Expandsion
%J Proceedings of the SIGPLAN '88 Conference on Programming Language Design and Implementation
%D June 22-24, 1988
%C Atlanta, Georgia
%P 241-249
%L ALLEN88

Paul Spee

chris@mimsy.umd.edu (Chris Torek) (08/27/90)

In article <26196@mimsy.umd.edu> I wrote:
>>... it still objected to `&arr[sizeof arr/sizeof *arr]',
>>which is Officially Legal.

In article <619.26d6cfb2@iccgcc.decnet.ab.co> browns@iccgcc.decnet.ab.com
(Stan Brown, Oak Road Systems) writes:
>Just to stick in my $.02:
>
>int arr[4];
>
>sizeof arr/sizeof *arr is 4, so &arr[4] may not be legal.  Whether legal or
>not, the address is certainly outside the array.

See ANSI standard X3.159-1989, section 3.3.6, `additive operators'.
(I might have included this before but I was at home.)  As I said, it
is Officially Legal.
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 405 2750)
Domain:	chris@cs.umd.edu	Path:	uunet!mimsy!chris

henry@zoo.toronto.edu (Henry Spencer) (08/28/90)

In article <6055@muffin.cme.nist.gov> libes@cme.nist.gov (Don Libes) writes:
>I highly recommend Saber.  I don't use it all the time - we have a
>limited number of licenses here - but when lint and the debugger fail
>me, I pull out Saber.  It's pretty damn useful.

I concur.  We don't have Saber C here -- limited need and tight budget --
but I had a chance to try it out briefly, and was impressed.  My only real
unhappiness was that it's obvious that all its authors use the C shell,
because it screws up in various small ways when you run it from the
standard shell.
-- 
Committees do harm merely by existing. | Henry Spencer at U of Toronto Zoology
                       -Freeman Dyson  |  henry@zoo.toronto.edu   utzoo!henry

steve@taumet.com (Stephen Clamage) (08/29/90)

spee@qmfl.jrdc.go.jp (Paul SPEE) writes:

>To be able to check the array boundaries, the C compiler must now the
>array size. However, in most important cases the C compiler does not
>have this information. This can be either be the case when an array
>is passed as a function parameter or is allocated as a dynamic array.
>It would have been convenient if ANSI would have allowed 'pointers
>to variable size arrays'. For example,

There is nothing to prevent the C compiler from carrying around enough
information with arrays and pointers to detect those problems at runtime
which cannot be found at compile time.  Such a compiler could still be
ANSI-conforming.  When an array was declared, the compiler would
allocate extra space, say, just before the beginning of the array to
contain size information.  Pointers would be larger than a plain address
to contain similar information.  Every array reference and pointer
dereference would then be checked for bounds violation, at compile time
if possible, at run time otherwise.  This approach is in fact implemented
in some compilers.
-- 

Steve Clamage, TauMetric Corp, steve@taumet.com

george@hls0.hls.oz (George Turczynski) (08/30/90)

In article <7611@ucdavis.ucdavis.edu>, kuan@iris.ucdavis.edu (Frank [Who me?] Kuan) writes:
> 
> 	Why is it that most C compilers don't seem to support this
> 	nifty little feature?

Because most C compilers assume you know what you're doing :-)

> 	I'm working on a large project, and two of the worst debugging
> 	nightmares I've had were due to memory being overwritten from
> 	over indexing an array.

Array bounds, along with typos,  should be one of the first things you
check when looking for (discrete) bugs, though I guess you know that
now !
 
> 	I was thinking about writing some kind of preprocessor to
> 	check for this. Has anyone already written something like
> 	this?

What is it that you want ?  Something to check your array indexing
variables at compile-time ?  In most cases, this is _possible_, but in
others it's impossible.  If you want code generated to check array
indexing at run-time, then your code will run slower, which is why
C compilers don't do it.  We want the fastest possible code we can
get !

If you want these sorts of features, use PASCAL :-)
 
-- 
| George P. J. Turczynski.          |---------------------------------------------------- 
| Computer Systems Engineer.        | ACSnet: george@highland.oz | I can't speak for the |
| Highland Logic Pty. Ltd.          | Phone: +61 48 683490       | company, I can barely |
| Suite 1, 348-354 Argyle St        | Fax:   +61 48 683474       | speak for myself...   |
| Moss Vale. NSW. Australia. 2577   |---------------------------------------------------- 

cjr@cs.bham.ac.uk (Chris Ridd <RiddCJ>) (08/30/90)

In article <26196@mimsy.umd.edu> chris@mimsy.umd.edu (Chris Torek) writes:
>In article <7611@ucdavis.ucdavis.edu> kuan@iris.ucdavis.edu
>(Frank [Who me?] Kuan) writes:
>>Why is it that most C compilers don't seem to [check array bounds]?
>
>Mostly because it is hard.  Given `int *p', is `p[-1] = 3' valid?
>That depends on the value of p....
>
>There is a company called Saber that produces a product called
>Saber-C that does this and more.  It works quite well, although last
>I had heard it still objected to `&arr[sizeof arr/sizeof *arr]',
>which is Officially Legal.  (Fortunately you can turn off each
>individual objection.)

  Why is this?  I never could figure out why accessing the first
element *past* the end of an array should be legal.

   Chris

-- Chris Ridd, Computer Science, Birmingham Uni, UK -- RiddCJ@Cs.Bham.Ac.Uk --

"'It's going to look pretty good, then, isn't it,' said War testily, 'the One
Horseman and Three Pedestrians of the Apocralypse.'" - Sourcery

njk@diku.dk (Niels J|rgen Kruse) (08/30/90)

steve@taumet.com (Stephen Clamage) writes:

>There is nothing to prevent the C compiler from carrying around enough
>information with arrays and pointers to detect those problems at runtime
>which cannot be found at compile time.  (...)
>if possible, at run time otherwise.  This approach is in fact implemented
>in some compilers.
>--

Oh.  Which ones?

Assume the following code:

        char *a,*c; double *b,d[17/sizeof(double)];

        if (a = malloc (17)) {
          b = (double *)a;
          c = (char *)b;
          /*  A  */
        }

At location A, a[16] is of course legal and a + 17 is
computable but not dereferenceable.  Also, it is obvious that
b[17/sizeof(double) - 1] is legal and b + 17/sizeof(double) is
computable but not dereferenceable.  But what about c?
Is c[16] legal?  Note that c[16] does not constitute part of
any double within bounds of b (unless sizeof(double) == 17 or 1).
What kind of object is b pointing to?  How does it differ from
the object pointed to by (d+0)?

What does your bounds-checking C compiler have to say?
What does the standard say?
-- 
Niels J|rgen Kruse 	DIKU Graduate 	njk@diku.dk

karl@haddock.ima.isc.com (Karl Heuer) (08/31/90)

In article <1990Aug30.134537.26326@diku.dk> njk@diku.dk (Niels J|rgen Kruse) writes:
>Assume the following code [on a bounds-checking implementation]:
>        char *a,*c; double *b,d[17/sizeof(double)];
>        if (a = malloc (17)) {
>          b = (double *)a;
>          c = (char *)b;
>Is c[16] legal?

I believe it is, and therefore that the cast to (double *) must not actually
reduce the known range of the pointer to that which is pointable from a
double.  Thus, a bounds-checking C implementation must maintain the bounds of
a pointer by using a byte count (or byte pointer) rather than an object count
(or object pointer).

>What kind of object is b pointing to?  How does it differ from
>the object pointed to by (d+0)?

Assume for concreteness that sizeof(double)==8.  Then b is <double *, pointer
to beginning of 17-byte block>, which is room for 2 doubles plus a spare byte
at the end that cannot be referenced without casting b.  But d is <double *,
pointer to beginning of 16-byte block>, which is room for 2 doubles exactly.

>What does your bounds-checking C compiler have to say?
>What does the standard say?

This is my interpretation of the Standard.  I don't have a bounds-checking C
compiler at hand, and I wonder if it would get this right.  (Particularly on a
word-addressible architecture.)

Karl W. Z. Heuer (karl@kelp.ima.isc.com or ima!kelp!karl), The Walking Lint

gdtltr@freezer.it.udel.edu (Gary Duzan) (08/31/90)

In article <988@christopher-robin.cs.bham.ac.uk> cjr@christopher-robin.UUCP (Chris Ridd <RiddCJ>) writes:
=>In article <26196@mimsy.umd.edu> chris@mimsy.umd.edu (Chris Torek) writes:
=>>In article <7611@ucdavis.ucdavis.edu> kuan@iris.ucdavis.edu
=>>(Frank [Who me?] Kuan) writes:
=>>>Why is it that most C compilers don't seem to [check array bounds]?
=>>
=>>Mostly because it is hard.  Given `int *p', is `p[-1] = 3' valid?
=>>That depends on the value of p....
=>>
=>>There is a company called Saber that produces a product called
=>>Saber-C that does this and more.  It works quite well, although last
=>>I had heard it still objected to `&arr[sizeof arr/sizeof *arr]',
=>>which is Officially Legal.  (Fortunately you can turn off each
=>>individual objection.)
=>
=>  Why is this?  I never could figure out why accessing the first
=>element *past* the end of an array should be legal.
=>
   Correct me if I am wrong, but I don't believe accessing the element after
is legal, but the pointer is still legal. In other words:

int x,foo[foolen],*fooptr;

x=foo[foolen]; /* Illegal */

fooptr=foo+foolen; /* Legal, points one int past end of foo */
x=*fooptr;         /* Illegal */
--fooptr;          /* Legal, points to last element of foo */

                                        Gary Duzan
                                        Time  Lord
                                    Third Regeneration



--
                          gdtltr@freezer.it.udel.edu
   _o_                    --------------------------                      _o_
 [|o o|] If you can square, round, or cube a number, why not sphere it? [|o o|]
  |_O_|         "Don't listen to me; I never do." -- Doctor Who          |_O_|

richard@aiai.ed.ac.uk (Richard Tobin) (08/31/90)

In article <988@christopher-robin.cs.bham.ac.uk> cjr@christopher-robin.UUCP (Chris Ridd <RiddCJ>) writes:
>>I had heard it still objected to `&arr[sizeof arr/sizeof *arr]',
>>which is Officially Legal.

>  Why is this?  I never could figure out why accessing the first
>element *past* the end of an array should be legal.

Because it's not *accessing* it.  Note the ampersand.

It's very useful for loops:

    for(p = buf;  p < &buf[sizeof(buf)];  p++)

-- Richard
-- 
Richard Tobin,                       JANET: R.Tobin@uk.ac.ed             
AI Applications Institute,           ARPA:  R.Tobin%uk.ac.ed@nsfnet-relay.ac.uk
Edinburgh University.                UUCP:  ...!ukc!ed.ac.uk!R.Tobin

meissner@osf.org (Michael Meissner) (08/31/90)

In article <988@christopher-robin.cs.bham.ac.uk> cjr@cs.bham.ac.uk
(Chris Ridd <RiddCJ>) writes:

|   Why is this?  I never could figure out why accessing the first
| element *past* the end of an array should be legal.

So that you can do something like:

char array[ARRAY_SIZE];

void clear_array(){
	char *p;

	for (p = &array[0]; p < &array[ARRAY_SIZE]; p++)
		*p = '\0';
}
--
Michael Meissner	email: meissner@osf.org		phone: 617-621-8861
Open Software Foundation, 11 Cambridge Center, Cambridge, MA, 02142

Do apple growers tell their kids money doesn't grow on bushes?

henry@zoo.toronto.edu (Henry Spencer) (08/31/90)

In article <988@christopher-robin.cs.bham.ac.uk> cjr@christopher-robin.UUCP (Chris Ridd <RiddCJ>) writes:
>  Why is this?  I never could figure out why accessing the first
>element *past* the end of an array should be legal.

*Accessing* it is not legal, but taking its address is.  This is arguably
regrettable, but vast amounts of C code depend on it, and it is not a
serious problem to implement.
-- 
TCP/IP: handling tomorrow's loads today| Henry Spencer at U of Toronto Zoology
OSI: handling yesterday's loads someday|  henry@zoo.toronto.edu   utzoo!henry

daveg@near.cs.caltech.edu (Dave Gillespie) (09/01/90)

>>>>> On 31 Aug 90 14:26:36 GMT, meissner@osf.org (Michael Meissner) said:
> In article <988@christopher-robin.cs.bham.ac.uk> cjr@cs.bham.ac.uk
> (Chris Ridd <RiddCJ>) writes:
> |   Why is this?  I never could figure out why accessing the first
> | element *past* the end of an array should be legal.

> So that you can do something like:
>       ...
> 	for (p = &array[0]; p < &array[ARRAY_SIZE]; p++)
> 		*p = '\0';

Also, a pointer to the place just past the end of an array must legally
be allowed to exist, for even more innocuous code like:

	p = array;   /* same as "&array[0]" */
	for (i = 0; i < ARRAY_SIZE; i++)
		*p++ = '\0';

Notice that at the end of this loop, "p" points to an address which
would be illegal to access.  But ANSI requires that such a pointer must
work properly, even though saying "*p" or "p++" at this point is
allowed to delete all your files, launch a nuclear strike, or any
other kind of undefined result.  (Whether I would actually buy a
compiler that did this is another story...)

Since I can produce this legal pointer by saying "p++", it stands to
reason I should also be able to say "p = array + ARRAY_SIZE"; and we
all know this is equivalent in C to "p = &array[ARRAY_SIZE]".  It
would be a shame to let these equivalences break just in this one
special case.

								-- Dave
--
Dave Gillespie
  256-80 Caltech Pasadena CA USA 91125
  daveg@csvax.cs.caltech.edu, ...!cit-vax!daveg

campbell@redsox.bsw.com (Larry Campbell) (09/04/90)

Are there actually any current compilers out there that are so stupid that
they generate substantially different code for the following two code fragments?

    /* Fragment 1 */
    for (p = array; p < &array[ARRAY_SIZE]; p++)
	*p++ = '\0';

    /* Fragment 2 */
    for (i = 0; i < ARRAY_SIZE; i++)
	array[i] = '\0';

If the answer to my question is "no", then the only excuse for permitting
array indices past the end of an array is for compiling old, crufty code
from clay tablets.

If the answer to my question is "yes", then I think we should flame the
culpable compiler vendors to a crisp.

(hmmm, just tried this on SysV/386, and I guess I have to send a flame to
AT&T...  sigh...)
-- 
Larry Campbell                          The Boston Software Works, Inc.
campbell@redsox.bsw.com                 120 Fulton Street
wjh12!redsox!campbell                   Boston, MA 02109

cjr@cs.bham.ac.uk (Chris Ridd <RiddCJ>) (09/04/90)

In article <1990Aug31.163805.11232@zoo.toronto.edu>
henry@zoo.toronto.edu (Henry Spencer) writes:

>In article <988@christopher-robin.cs.bham.ac.uk>
cjr@christopher-robin.UUCP (Chris Ridd <RiddCJ>) writes:
>>  Why is this?  I never could figure out why accessing the first
>>element *past* the end of an array should be legal.
>
>*Accessing* it is not legal, but taking its address is.  This is arguably
>regrettable, but vast amounts of C code depend on it, and it is not a
>serious problem to implement.

  Thanks for the reminder!  I don't tend to use the technique much myself...

    Chris

-- Chris Ridd, Computer Science, Birmingham Uni, UK -- RiddCJ@Cs.Bham.Ac.Uk --

"'It's going to look pretty good, then, isn't it,' said War testily, 'the One
Horseman and Three Pedestrians of the Apocralypse.'" - Sourcery

amull@Morgan.COM (Andrew P. Mullhaupt) (09/04/90)

In article <1589@redsox.bsw.com>, campbell@redsox.bsw.com (Larry Campbell) writes:
> Are there actually any current compilers out there that are so stupid that
> they generate substantially different code for the following two code fragments?
> 
>     /* Fragment 1 */
>     for (p = array; p < &array[ARRAY_SIZE]; p++)
> 	*p++ = '\0';
> 
>     /* Fragment 2 */
>     for (i = 0; i < ARRAY_SIZE; i++)
> 	array[i] = '\0';

I hope that there are lots of compilers which can tell the difference
between these. "Off by one" in the number of increments is one of the
classic hazards of side effect assignments. (See my still as yet
unposted "for statement considered harmful in C"...) The next problem
can happed if p is of type (char *) and array is type (double[]). On
machines where sizeof(char) != sizeof(double) (and there are one or
two of these ill-behaved monstrosities around...) you don't get the
same result.

But seriously, folks... the exposure to aliasing of the two fragments
is different, and if p is not used carefully subsequent to exiting
the loop, optimization may be lost - i.e. a mere mortal compiler might
not be able to generate the identical code for the kind of loop where
the array is accessed via a pointer alias and the loop calls a function
with either p or array as an argument. (Suppose the source for the
function in the loop is not in the file where the loop is. How is
the compiler supposed to know that a side effect doesn't prevent the
use of a register for the aliased quanitity?)

Later,
Andrew Mullhaupt

chip@tct.uucp (Chip Salzenberg) (09/05/90)

Larry Campbell flames compiler writers for generating different code
for these fragments:

>    /* Fragment 1 */
>    for (p = array; p < &array[ARRAY_SIZE]; p++)
>	*p++ = '\0';
>
>    /* Fragment 2 */
>    for (i = 0; i < ARRAY_SIZE; i++)
>	array[i] = '\0';

Mr. Campbell here makes a good point about compiler optimization.
However, Mr. Campbell also writes:

>If the answer to my question is "no", then the only excuse for permitting
>array indices past the end of an array is for compiling old, crufty code
>from clay tablets.

Many C programmers use pointers for clarity, not just out of habit.
My code, including my brand new code, frequently uses pointer
navigation within arrays.  This feature is useful, and its use is not
confined to "old, crufty code".

Other programming languages get along without pointers into arrays.
That's one of the reasons I don't use those other languages.
-- 
Chip Salzenberg at Teltronics/TCT     <chip@tct.uucp>, <uunet!pdn!tct!chip>

mikey@ontek.com (Interplanetary Krill) (09/06/90)

In comp.lang.c, campbell@redsox.bsw.com (Larry Campbell) writes:
| Are there actually any current compilers out there that are so stupid 
| that they generate substantially different code for the following two 
| code fragments? 
| 
|     /* Fragment 1 */
|     for (p = array; p < &array[ARRAY_SIZE]; p++)
| 	*p++ = '\0';
| 
|     /* Fragment 2 */
|     for (i = 0; i < ARRAY_SIZE; i++)
| 	array[i] = '\0';

  Fragment 2 is superior in that it initializes the entire
  array rather than just elements with even numbered indexes.
			__
		       /x \/|
		       \__/\|

jimmy@tybalt.caltech.edu (Jimmy Hu) (09/07/90)

In article <1589@redsox.bsw.com> campbell@redsox.bsw.com (Larry Campbell) writes:
>Are there actually any current compilers out there that are so stupid that
>they generate substantially different code for the following two code fragments?
>
>    /* Fragment 1 */
>    for (p = array; p < &array[ARRAY_SIZE]; p++)
>	*p++ = '\0';
>
>    /* Fragment 2 */
>    for (i = 0; i < ARRAY_SIZE; i++)
>	array[i] = '\0';
>
>If the answer to my question is "no", then the only excuse for permitting
>array indices past the end of an array is for compiling old, crufty code
>from clay tablets.
>
>If the answer to my question is "yes", then I think we should flame the
>culpable compiler vendors to a crisp.

To answer the first question, "Does any compiler generate different
codes for the two fragments?" I would have to say definately yes.
I would expect the two fragments to generate different codes. The first
one uses a pointer to access the contents of the array, and the second one
uses an integer as an index to the array. The final result will still be the
same, in that the array is cleared to 0, but I don't think that the compiler
will "convert" or "optimize" the second to the first or vice versa. After all,
assuming it is trying to convert the second to the first, where is it going to
get a pointer from? Create one? How will i become ARRAYSIZE after exiting the
loop? There has to be SOME code difference. I don't think that a compiler
should have to be sophisticated enough to "guess" at the intent of your code.

(Another article stated that a MIPS compiler actually generated the SAME
code for the two fragments. I find that hard to believe unless index i was
never used anywhere else. Even so, that must have been a sophisticated
compiler.) 

As for the usefulness of "crufty code", I think that the first fragment
will run faster on most machines, since the address (array+i) doesn't have
to be computed at each step. Whether this savings of time is of any interest
to you is another matter.


To answer the second question, namely "Why don't compilers check array
bounds?", I would say that it is a very difficult task to check array
bounds. For simple cases, like:

int array[ARRAYSIZE];

for (i=1;i<=ARRAYSIZE;i++)   /* Error */
	array[i] = '\0';

perhaps a compiler could discover a possible error. But checking more
complicated examples would probably be as hard as checking correctness of
algorithms. For example:

for (i=0;i<ARRAYSIZE;i++)
	array[func(i,someparm)]=nextval();

How is the compiler going to know if the array bound will be exceeded?
It would need to check func and see if the return values fall within the
bounds (0..ARRAYSIZE-1). To further complicate things, func is dependent
on someparm, which perhaps if given the wrong values, can cause func to
return invalid indeces. Thus, the compiler would have to make sure that
someparm is in ITS valid range. As you can see, this problem can balloon
further, given more variables and dependencies.

I think that the best that can be done is to check for simple cases and to
give warnings where applicable. Other than that, it is quite unreasonable
to expect array bounds checking at compile time. I don't think other languages
have this, do they?  Run-time checking is more realistic, and I think I have
seen this in Pascal, but it does slow things down quite a bit. All I can
suggest to you is to find a compiler that has run-time checking that you
can turn on or off, so that once you're satisfied that you haven't made any
mistakes, you can turn the checking off and run your program at full speed.
--
----------------------------------------------------------------------
Jimmy Hu                                      jimmy@tybalt.caltech.edu
----------------------------------------------------------------------

bright@Data-IO.COM (Walter Bright) (09/08/90)

In comp.lang.c, campbell@redsox.bsw.com (Larry Campbell) writes:
| Are there actually any current compilers out there that are so stupid 
| that they generate substantially different code for the following two 
| code fragments? 
| 
|     /* Fragment 1 */
|     for (p = array; p < &array[ARRAY_SIZE]; p++)
| 	*p = '\0';		/* changed from *p++ = 0;	*/
| 
|     /* Fragment 2 */
|     for (i = 0; i < ARRAY_SIZE; i++)
| 	array[i] = '\0';

The optimization that converts 2 to 1 is called 'strength reduction'
with 'loop induction variable elimination'. It is an advanced capability
of global optimizing compilers, and few compilers do it. Zortech C/C++
does this optimization when it is compiling with full optimization.

bright@Data-IO.COM (Walter Bright) (09/08/90)

In article <1990Sep6.203031.29254@laguna.ccsf.caltech.edu> jimmy@tybalt.caltech.edu (Jimmy Hu) writes:
<In article <1589@redsox.bsw.com> campbell@redsox.bsw.com (Larry Campbell) writes:
<<Are there actually any current compilers out there that are so stupid that
<<they generate substantially different code for the following two code fragments?
<<
<<    /* Fragment 1 */
<<    for (p = array; p < &array[ARRAY_SIZE]; p++)
<<	*p = '\0';
<<
<<    /* Fragment 2 */
<<    for (i = 0; i < ARRAY_SIZE; i++)
<<	array[i] = '\0';
<<
<The final result will still be the
<same, in that the array is cleared to 0, but I don't think that the compiler
<will "convert" or "optimize" the second to the first or vice versa.

Zortech's optimizer will do this conversion.

<After all,
<assuming it is trying to convert the second to the first, where is it going to
<get a pointer from? Create one?

Yes.

<How will i become ARRAYSIZE after exiting the
<loop? There has to be SOME code difference. I don't think that a compiler
<should have to be sophisticated enough to "guess" at the intent of your code.

If i is used after the exit of the loop, the expression
	i = p - array;
is added to each branch that exits the loop.

<(Another article stated that a MIPS compiler actually generated the SAME
<code for the two fragments. I find that hard to believe unless index i was
<never used anywhere else. Even so, that must have been a sophisticated
<compiler.) 

Thanks for the compliment!

These kinds of optimizations are fairly well understood. Check out
Aho and Ullman's "Compilers" book (the dragon book).

userAKDU@mts.ucs.UAlberta.CA (Al Dunbar) (09/08/90)

In article <1990Sep6.203031.29254@laguna.ccsf.caltech.edu>, jimmy@tybalt.caltech.edu (Jimmy Hu) writes:
>In article <1589@redsox.bsw.com> campbell@redsox.bsw.com (Larry Campbell) writes:
>>Are there actually any current compilers out there that are so stupid that
>>they generate substantially different code for the following two code fragments?
>>
>>    /* Fragment 1 */
>>    for (p = array; p < &array[ARRAY_SIZE]; p++)
>>       *p++ = '\0';
>>
>>    /* Fragment 2 */
>>    for (i = 0; i < ARRAY_SIZE; i++)
>>       array[i] = '\0';
>>
 
Actually, the first fragment will set every other element of
the array to '?0', as the pointer is incremented twice per trip
through the loop.
-------------------+-------------------------------------------
Al Dunbar          |
Edmonton, Alberta  |   this space for rent
CANADA             |
-------------------+-------------------------------------------