chris@mimsy.umd.edu (Chris Torek) (09/02/90)
In article <26196@mimsy.umd.edu> I wrote:
>`&arr[sizeof arr/sizeof *arr]' ... is Officially Legal.

(Those who would dispute this are advised to see ANSI Standard X3.159-1989,
otherwise known as `The ANSI C Standard', sections 3.2.2.1 (Lvalues and
function designators), 3.3.3.4 (The sizeof operator), and 3.3.6 (Additive
operators).)

This seems to be rather universally misunderstood.  To amplify a bit:

In article <29051@nigel.ee.udel.edu> gdtltr@freezer.it.udel.edu (Gary Duzan) writes:
>I don't believe accessing the element after is legal, but the pointer
>is still legal.

Correct.  Given `int a[4];', the following holds:

    int *p = a;                     /* legal */
    a[0], a[1], a[2], a[3];         /* all legal */
    p[0], p[1], p[2], p[3];         /* all legal */
    p = &a[4];                      /* legal */
    *p;                             /* illegal (a[4] does not exist) */
    p--;                            /* legal */
    p = a;                          /* legal */
    p--;                            /* illegal */
    p = &a[4];                      /* legal */
    p[-4], p[-3], p[-2], p[-1];     /* all legal */

Note the last carefully: it is not the subscript itself that makes a
given x[i] legal or illegal, but rather whether x+i yields a legal address
and, if so, whether *(x+i) is also legal.

Now, as to why &a[4] is legal when a[4] is not, consider:

    int i;
    for (i = 0; i < 4; i++)
        printf("%d\n", i);

When this code is run, i takes on five values, namely 0, 1, 2, 3, and 4.
Even if we alter the loop slightly to get rid of the `4', i still takes
on the value 4:

    for (i = 0; i <= 3; i++)
        ...

Now what happens if we loop `p' over the various elements in `a'?

    for (p = &a[0]; p < &a[4]; p++)
        ...

p must eventually take on the value &a[4].  There is no way around it;
even if we get rid of the `&a[4]' in the loop, p still winds up with
&a[4] as its final value:

    for (p = &a[0]; p <= &a[3]; p++)
        ...
    /* now p == &a[4] */

Since this sort of thing happens all the time in existing code, there
was no choice but to make it Officially Legal and require all C compilers
to support it.

This, on the other hand, is not legal:

    for (p = &a[3]; p >= &a[0]; p--)    /* illegal */
        ...

This loop supposedly terminates when p takes on the value &a[-1]; but as
noted above, &a[-1] is not a legal address, and in fact this code fails
on some machines---for instance, on a 68000 where the C compiler starts
the data space at location 2, and `a' is a global array of 32-bit `int's
that happens to be the first object in the data segment.  The code turns
into, e.g.,

    loop:   ...
            subql   #4,a2       # p--
            cmpl    #2,a2       # (unsigned long)p < 2?
            jcs     out         # if so, exit loop
            jra     loop        # otherwise continue

and when p==&a[0], p==2, so p-4 puts 0xfffffffe into p, which is still
greater than or equal to 2.

This is the same old fencepost problem that occurs everywhere.

Incidentally, there is a way to keep p from taking on &a[4]:

    for (p = a;; p++) {
        ...
        if (p == &a[3])
            break;
    }

This is the same solution required for loops that purport to run to
MAXINT or MAXULONG or other such maxima, and it shares their drawback:
these are exceedingly ugly.
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 405 2750)
Domain: chris@cs.umd.edu    Path: uunet!mimsy!chris
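
[Illustration, not part of the original post.] The descending loop above
goes wrong only because its final `p--' forms &a[-1]. One common way to
walk an array downward without ever computing that address is to start at
the one-past-the-end pointer and decrement before each use. The following
is a minimal sketch of that idea; the function name `copy_reversed' and its
parameters are assumptions made for the example and do not come from the
thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: copies the n ints at a into out in reverse order.
 * p starts at a + n (one past the end, fine to form) and is decremented
 * before each dereference, so it never goes below a. */
void copy_reversed(const int *a, size_t n, int *out)
{
    const int *p;

    assert(a != NULL && out != NULL);   /* precondition of this sketch */

    p = a + n;              /* same address as &a[n]; never dereferenced */
    while (p != a)          /* stops at &a[0], not at &a[-1] */
        *out++ = *--p;
}
```

The comparison is done with `!=' against the start of the array, so the
loop needs no address below `a' to terminate, and it also handles n == 0.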
6sigma2@polari.UUCP (Brian Matthews) (09/04/90)
In article <26327@mimsy.umd.edu> chris@mimsy.umd.edu (Chris Torek) writes:
|Correct.  Given `int a[4];', the following holds:
|    int *p = a;                     /* legal */
|    a[0], a[1], a[2], a[3];         /* all legal */
|    p[0], p[1], p[2], p[3];         /* all legal */
|    p = &a[4];                      /* legal */
|    *p;                             /* illegal (a[4] does not exist) */
|    p--;                            /* legal */
|    p = a;                          /* legal */
|    p--;                            /* illegal */
|    p = &a[4];                      /* legal */
|    p[-4], p[-3], p[-2], p[-1];     /* all legal */

A minor clarification - where Chris says "illegal", read "undefined".
Admittedly a minor point, but "illegal" might lead one to believe the
compiler won't accept the code in question.  "Undefined" means the
compiler may or may not accept the code, and if accepted, the resulting
machine code may or may not do something "useful", which is actually
the case.
-- 
Brian L. Matthews   blm@6sceng.UUCP
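
[Illustration, not part of the original post.] Since a compiler may accept
the undefined lines without complaint, the distinction is easiest to see by
exercising only the lines the table marks as legal. The sketch below does
that for `int a[4]'; the function name `legal_table_demo' and the element
values are assumptions chosen for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical demo: performs only operations the quoted table marks
 * "legal" and returns the final offset of p from the start of a. */
ptrdiff_t legal_table_demo(void)
{
    int a[4] = {10, 20, 30, 40};
    int *p = &a[sizeof a / sizeof *a];  /* &a[4]: formed, never dereferenced */

    /* Negative subscripts are fine here because p + i stays inside a. */
    assert(p[-4] == a[0] && p[-1] == a[3]);

    p--;                    /* defined: p now points at a[3] */
    assert(*p == 40);

    return p - a;           /* distance from &a[0] to &a[3] */
}
```

The two entries the table marks illegal (`*p' at &a[4], and `p--' at
&a[0]) are deliberately left out: being undefined, they cannot be tested
portably.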