[comp.lang.c] Array bounds checking: what is legal

chris@mimsy.umd.edu (Chris Torek) (09/02/90)

In article <26196@mimsy.umd.edu> I wrote:
>`&arr[sizeof arr/sizeof *arr]' ... is Officially Legal.

(Those who would dispute this are advised to see ANSI Standard
X3.159-1989, otherwise known as `The ANSI C Standard', sections 3.2.2.1
(Lvalues and function designators), 3.3.3.4 (The sizeof operator), and
3.3.6 (Additive operators).)

This seems to be rather universally misunderstood.  To amplify a bit:

In article <29051@nigel.ee.udel.edu> gdtltr@freezer.it.udel.edu (Gary Duzan)
writes:
>I don't believe accessing the element after is legal, but the pointer
>is still legal.

Correct.  Given `int a[4];', the following holds:

	int *p = a;			/* legal */
	a[0], a[1], a[2], a[3];		/* all legal */
	p[0], p[1], p[2], p[3];		/* all legal */
	p = &a[4];			/* legal */
	*p;				/* illegal (a[4] does not exist) */
	p--;				/* legal */
	p = a;				/* legal */
	p--;				/* illegal */
	p = &a[4];			/* legal */
	p[-4], p[-3], p[-2], p[-1];	/* all legal */

Note the last carefully: it is not the subscript itself that makes a
given x[i] legal or illegal, but rather whether x+i yields a legal address
and, if so, whether *(x+i) is also legal.
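
To make the x+i rule concrete, here is a minimal sketch.  The helper name
`from_end' and its parameters are hypothetical and appear nowhere in the
original post.  It forms the one-past-the-end address (the analogue of
&a[4]) and reaches real elements from it with a negative offset, without
ever dereferencing the end address itself:

```c
#include <stddef.h>

/* Return the k-th element counting back from the end of an n-element
 * array (k = 1 is the last element).  Hypothetical helper, for
 * illustration only.  The caller must ensure 1 <= k <= n. */
int from_end(const int *base, size_t n, size_t k)
{
	const int *end = base + n;	/* legal: one past the last element */

	/* end - k points at a real element for 1 <= k <= n, so the
	 * dereference is legal; *end itself would not be. */
	return *(end - k);
}
```

Given `int a[4] = {10, 20, 30, 40};', from_end(a, 4, 1) reads a[3] and
from_end(a, 4, 4) reads a[0], just as p[-1] and p[-4] do above.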

Now, as to why &a[4] is legal when a[4] is not, consider:

	int i;
	for (i = 0; i < 4; i++)
		printf("%d\n", i);

When this code is run, i takes on five values, namely 0, 1, 2, 3, and 4.
Even if we alter the loop slightly to get rid of the `4', i still takes
on the value 4:

	for (i = 0; i <= 3; i++)
		...

Now what happens if we loop `p' over the various elements in `a'?

	for (p = &a[0]; p < &a[4]; p++)
		...

p must eventually take on the value &a[4].  There is no way around it;
even if we get rid of the `&a[4]' in the loop, p still winds up with
&a[4] as its final value:

	for (p = &a[0]; p <= &a[3]; p++)
		...
	/* now p == &a[4] */
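
The same loop, written as a small function so the final value of p can be
checked.  This is only a sketch: the name `sum_forward' and the `reached'
out-parameter are assumptions made for illustration, not anything from the
Standard or the original post.

```c
#include <stddef.h>

/* Sum n ints by walking p up to, but never through, the
 * one-past-the-end address.  If `reached' is non-null, it is set to 1
 * when p finishes equal to that address, and to 0 otherwise. */
long sum_forward(const int *base, size_t n, int *reached)
{
	const int *end = base + n;	/* e.g. &a[4] for int a[4] */
	const int *p;
	long total = 0;

	for (p = base; p < end; p++)
		total += *p;		/* p is always a real element here */
	if (reached != NULL)
		*reached = (p == end);	/* p has walked one past the end */
	return total;
}
```

The loop only compares p against the end address; it never evaluates *end.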

Since this sort of thing happens all the time in existing code, there was
no choice but to make it Officially Legal and require all C compilers to
support it.  This, on the other hand, is not legal:

	for (p = &a[3]; p >= &a[0]; p--)	/* illegal */
		...

This loop supposedly terminates when p takes on the value &a[-1]; but as
noted above, &a[-1] is not a legal address, and in fact this code fails
on some machines---for instance, on a 68000 where the C compiler starts
the data space at location 2, and `a' is a global array of 32-bit `int's
that happens to be the first object in the data segment.  The code turns
into, e.g.,

loop:
	...
	subql	#4,a2		# p--
	cmpl	#2,a2		# (unsigned long)p < 2?
	jcs	out		# if so, exit loop
	jra	loop		# otherwise continue

and when p==&a[0], p==2, so subtracting 4 puts 0xfffffffe into p, which,
compared as an unsigned value, is still greater than or equal to 2; the
loop does not exit.

This is the same old fencepost problem that occurs everywhere.
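
One common way to run downward without ever forming &a[-1] is to start at
the one-past-the-end address and decrement before each use.  A minimal
sketch, with a hypothetical function name chosen for illustration:

```c
#include <stddef.h>

/* Copy n ints from src into dst in reverse order.  p starts at the
 * one-past-the-end address and is decremented before each use, so
 * its last value is &src[0]; it never becomes &src[-1]. */
void copy_reversed(int *dst, const int *src, size_t n)
{
	const int *p = src + n;

	while (p > src)
		*dst++ = *--p;
}
```

The comparison p > src only ever involves addresses within the array or
one past its end, so it is well defined even if the array is the first
object in the data segment.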

Incidentally, there is a way to keep p from taking on &a[4]:

	for (p = a;; p++) {
		...
		if (p == &a[3])
			break;
	}

This is the same solution required for loops that purport to run to
MAXINT or MAXULONG or other such maxima, and it shares their drawback:
such loops are exceedingly ugly.
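
For comparison, a sketch of the run-to-the-maximum case.  It uses
`unsigned char' and UCHAR_MAX from <limits.h> in place of MAXINT or
MAXULONG, purely so the loop is short; the function name is hypothetical.

```c
#include <limits.h>

/* Count the values from `start' through UCHAR_MAX inclusive.
 * Writing `for (c = start; c <= UCHAR_MAX; c++)' would never
 * terminate, because the test is always true for an unsigned char
 * and c wraps around to 0.  So the test goes at the bottom. */
unsigned long count_to_max(unsigned char start)
{
	unsigned char c;
	unsigned long n = 0;

	for (c = start;; c++) {
		n++;
		if (c == UCHAR_MAX)
			break;		/* exit before c++ can wrap */
	}
	return n;
}
```

The shape is the same as the pointer loop above: the body runs for the
last value, and the break keeps the increment from stepping past it.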
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 405 2750)
Domain:	chris@cs.umd.edu	Path:	uunet!mimsy!chris

6sigma2@polari.UUCP (Brian Matthews) (09/04/90)

In article <26327@mimsy.umd.edu> chris@mimsy.umd.edu (Chris Torek) writes:
|Correct.  Given `int a[4];', the following holds:
|	int *p = a;			/* legal */
|	a[0], a[1], a[2], a[3];		/* all legal */
|	p[0], p[1], p[2], p[3];		/* all legal */
|	p = &a[4];			/* legal */
|	*p;				/* illegal (a[4] does not exist) */
|	p--;				/* legal */
|	p = a;				/* legal */
|	p--;				/* illegal */
|	p = &a[4];			/* legal */
|	p[-4], p[-3], p[-2], p[-1];	/* all legal */

A minor clarification: where Chris says "illegal", read "undefined".
"Illegal" might lead one to believe the compiler won't accept the code
in question.  "Undefined" means the compiler may or may not accept the
code, and if it does, the resulting machine code may or may not do
something "useful", which is what actually happens in practice.
-- 
Brian L. Matthews	blm@6sceng.UUCP