[comp.lang.c] Is &a[NTHINGS] legal

lvc@tut.cis.ohio-state.edu (Lawrence V. Cipriani) (04/30/88)

Is it legal to apply the & (address of) operator to an array
element that is non-existent?  Given:

	sometype a[NTHINGS], *p;

Should:

	for (p = a; p < &a[NTHINGS]; p++)	/* 1 */
		...
be written as:

	for (p = a; p <= &a[NTHINGS-1]; p++)	/* 2 */
		...

I like 1 better than 2 since there are fewer characters to
type and I find it quicker and easier to comprehend.  The
dpANS says & only applies to objects that are not bit fields
or have the register qualifier.  In this example, one could
argue that a[NTHINGS] doesn't even exist so that & should be
invalid on it.  Will 1 be guaranteed to work in ANSI-C?
Thanks,

-- 
Larry Cipriani, AT&T Network Systems and Ohio State University
Domain: lvc@tut.cis.ohio-state.edu
Path: ...!cbosgd!osu-cis!tut.cis.ohio-state.edu!lvc (weird but right)

gwyn@brl-smoke.ARPA (Doug Gwyn ) (04/30/88)

In article <12074@tut.cis.ohio-state.edu> lvc@tut.cis.ohio-state.edu (Lawrence V. Cipriani) writes:
>Is it legal to apply the & (address of) operator to an array
>element that is non-existent?  Given:
>	sometype a[NTHINGS], *p;
>Should:
>	for (p = a; p < &a[NTHINGS]; p++)	/* 1 */
>be written as:
>	for (p = a; p <= &a[NTHINGS-1]; p++)	/* 2 */
>		...
>Will 1 be guaranteed to work in ANSI-C?

Yes, it is.  This kind of code is quite pervasive, and if you
consider that NTHINGS might have been defined as 0 it is impossible
to avoid (in fact in that situation your case 2 is invalid).

Every object must have at least one addressable cell beyond it,
but not necessarily in front of it.  The reason the latter is not
required is that &a[-1] may be MANY bytes in front of allocated
storage if the array element is large, but &a[NTHINGS] will be
just one byte past the valid array locations.
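
A minimal sketch of the idiom described above, not taken from the
thread: the loop bound is the one-past-the-end address, which is
computed and compared but never dereferenced.  The name sum_things and
the int element type are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Sum n ints starting at a.  The end pointer is one past the last
 * element: legal to form and to compare against, never dereferenced. */
long sum_things(const int *a, size_t n)
{
    const int *end = a + n;     /* same address as &a[n] */
    const int *p;
    long total = 0;

    assert(a != NULL);
    for (p = a; p < end; p++)
        total += *p;
    return total;
}
```

Because the bound is a + n rather than a + n - 1, the loop body is
simply skipped when n is 0, and no address in front of the array is
ever formed.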

chris@mimsy.UUCP (Chris Torek) (04/30/88)

In article <12074@tut.cis.ohio-state.edu> lvc@tut.cis.ohio-state.edu
(Lawrence V. Cipriani) writes:
>Is it legal to apply the & (address of) operator to an array
>element that is non-existent?

Depends:

>Given:
>	sometype a[NTHINGS], *p;
>Should:
>	for (p = a; p < &a[NTHINGS]; p++)	/* 1 */
>be written as:
>	for (p = a; p <= &a[NTHINGS-1]; p++)	/* 2 */

This is not necessary.

>Will 1 be guaranteed to work in ANSI-C?

Yes.  If necessary, the compiler will put a one-byte (or word or
whatever) shim after the array in that array's address space, so
that &a[NTHINGS] will be meaningful.

Note, however, that the corresponding count-down loop

	for (p = &a[NTHINGS]; --p >= a;)

is not.  In particular, this sort of code fails if &a[-1] `wraps
around' the address space of the array.  It can even fail on a flat
address space machine like the Vax if sizeof(a[0]) is large enough.
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris
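
One hedged way to count down without ever forming &a[-1] is to start
at the one-past-the-end address, test against the base first, and only
then decrement.  This is a sketch, not code from the thread; the name
reverse_copy and the int element type are illustrative assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Copy src[0..n-1] into dst in reverse order, walking src from the top.
 * p starts one past the end and is decremented only while p != src,
 * so it never moves below the start of the array. */
void reverse_copy(int *dst, const int *src, size_t n)
{
    const int *p = src + n;

    assert(dst != NULL && src != NULL);
    while (p != src) {
        --p;                    /* now points at a real element */
        *dst++ = *p;
    }
}
```

The test happens before the decrement, so the lowest value p ever
takes is src itself.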

levy@ttrdc.UUCP (Daniel R. Levy) (05/01/88)

In article <7806@brl-smoke.ARPA>, gwyn@brl-smoke.ARPA (Doug Gwyn ) writes:
# In article <12074@tut.cis.ohio-state.edu> lvc@tut.cis.ohio-state.edu (Lawrence V. Cipriani) writes:
# >Is it legal to apply the & (address of) operator to an array
# >element that is non-existent?  Given:
# >	sometype a[NTHINGS], *p;
# >Should:
# >	for (p = a; p < &a[NTHINGS]; p++)	/* 1 */
# >be written as:
# >	for (p = a; p <= &a[NTHINGS-1]; p++)	/* 2 */
# >		...
# >Will 1 be guaranteed to work in ANSI-C?
# 
# Yes, it is.  This kind of code is quite pervasive, and if you
# consider that NTHINGS might have been defined as 0 it is impossible
# to avoid (in fact in that situation your case 2 is invalid).
# 
# Every object must have at least one addressable cell beyond it,
# but not necessarily in front of it.  The reason the latter is not
# required is that &a[-1] may be MANY bytes in front of allocated
# storage if the array element is large, but &a[NTHINGS] will be
# just one byte past the valid array locations.

A picky point if you will:  would not one expect that &a[NTHINGS] must have at
and beyond it at least as many bytes as the smallest object having the same
alignment requirement as an object having the type of a[0]?  Imagine a[]
being laid out in a segment of memory as close to the "top" as possible while
still having &a[NTHINGS] be valid, and you'll see what I mean.
-- 
|------------Dan Levy------------|  Path: ihnp4,<most AT&T machines>!ttrdc!levy
|              AT&T              |  Weinberg's Principle:  An expert is a
|       Data Systems Group       |  person who avoids the small errors while
|--------Skokie, Illinois--------|  sweeping on to the grand fallacy.

norvell@utcsri.UUCP (Theodore Stevens Norvell) (05/02/88)

In article <11289@mimsy.UUCP> chris@mimsy.UUCP (Chris Torek) writes:
>In article <12074@tut.cis.ohio-state.edu> lvc@tut.cis.ohio-state.edu
>(Lawrence V. Cipriani) writes:
>>Is it legal to apply the & (address of) operator to an array
>>element that is non-existent?
>>	for (p = a; p < &a[NTHINGS]; p++)	/* 1 */
>>Will 1 be guaranteed to work in ANSI-C?
>
>Yes.  If necessary, the compiler will put a one-byte (or word or
>whatever) shim after the array in that array's address space, so
>that &a[NTHINGS] will be meaningful.

But, doesn't the draft (my copy is from November) also say that p+i
(where p is a pointer) can only be dereferenced if it points to the
same array as does p? Since &a[NTHINGS] translates to &(*(a+NTHINGS))
and a+NTHINGS does not point to the same array as does a, the dereference
is undefined.  This allows array bounds checking in ANSI C.
Try     for(p = a; p < a+NTHINGS; p++)  /* 3 */

chris@mimsy.UUCP (Chris Torek) (05/02/88)

>>In article <12074@tut.cis.ohio-state.edu> lvc@tut.cis.ohio-state.edu
>>(Lawrence V. Cipriani) writes:
>>>	for (p = a; p < &a[NTHINGS]; p++)	/* legal? */

>In article <11289@mimsy.UUCP> I answered:
>>Yes.

In article <5997@utcsri.UUCP> norvell@utcsri.UUCP (Theodore Stevens Norvell)
writes:
>But, doesn't the draft ... also say that p+i (where p is a pointer)
>can only be dereferenced if it points to the same array as does p?

Yes.

>Since &a[NTHINGS] translates to &(*(a+NTHINGS))

No, not the way you mean this.  `Is equivalent to', not `evaluates as':
the indirection does not occur.  Only the address computation occurs;
the value at a+NTHINGS is not examined.

>... the dereference is undefined.

True.  It also does not occur.

>This allows array bounds checking in ANSI C.

Bounds checking is still possible, but &a[NTHINGS] must be allowed,
while both a[NTHINGS] and &a[NTHINGS+1] should be rejected.
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris
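
The distinction drawn above for a bounds-checking implementation can be
sketched as two helpers.  Both names (checked_addr, checked_get) are
invented here and describe no real compiler: forming an address accepts
any index from 0 through n inclusive, while reading an element accepts
only 0 through n-1.

```c
#include <assert.h>
#include <stddef.h>

/* Address of a[i], or NULL if i is out of range.  i == n is allowed:
 * that is the one-past-the-end address.  i > n is rejected. */
const int *checked_addr(const int *a, size_t n, size_t i)
{
    assert(a != NULL);
    return (i <= n) ? a + i : NULL;
}

/* Read a[i] into *out.  Returns 1 on success, 0 if i does not name a
 * real element (so i == n is rejected here, unlike in checked_addr). */
int checked_get(const int *a, size_t n, size_t i, int *out)
{
    assert(a != NULL && out != NULL);
    if (i >= n)
        return 0;
    *out = a[i];
    return 1;
}
```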

rcd@ico.ISC.COM (Dick Dunn) (05/03/88)

In article <11289@mimsy.UUCP>, chris@mimsy.UUCP (Chris Torek) writes:
> (Lawrence V. Cipriani) writes:
> >Is it legal to apply the & (address of) operator to an array
> >element that is non-existent?
...
> Yes.  If necessary, the compiler will put a one-byte (or word or
> whatever) shim after the array in that array's address space, so
> that &a[NTHINGS] will be meaningful.

Chris, is this really so?  I understand that &a[NTHINGS] should be valid,
but I don't see why the shim is necessary.  It seems that all you require
is that the address be meaningful and produce correct results on a
comparison--you can't dereference it because it doesn't exist.

For example, on the 286 (an all-too-frequent counterexample) you can have
contiguous objects up to 2^16 bytes without too much hardship--and since
it is often useful to have arrays of size power-of-two, you might like to
declare, say
	char stuff[65536];
or
	int nonsense[16384];
but you would certainly NOT want the compiler to try to allocate 65537 or
65540 bytes--that just won't work in the normal universe (large model or
less).  There's no reason it should try to do so, however.  It can form an
address &stuff[65536] which is usable for comparison--and address
arithmetic--even though it's not a valid address (nor even a valid segment,
for that matter).

However--here, 286 compiler folks take note--it does require treating
pointer comparison as a 32-bit operation for <, <=, >, >= (not just == and
!=).  The object "stuff" might have an address that looks like 008f0000.
The address of stuff[65535]--the last element--is 008fffff.  The next
valid user-space address (assuming sequential allocation of segments) after 
008fffff is 00970000.  A straightforward calculation of &stuff[65536] will
give 00900000--not a valid address; gives a protection violation from a
user program--but useful for a 32-bit unsigned comparison.
-- 
Dick Dunn      UUCP: {ncar,cbosgd,nbires}!ico!rcd       (303)449-2870
   ...Never attribute to malice what can be adequately explained by stupidity.
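
The comparison issue raised above can be modeled with plain integers.
This is a sketch only: it treats a pointer as the 32-bit value written
out in the post (segment part in the high 16 bits, offset in the low
16) and does not model real 80286 selectors or protection.  The
function names are invented for illustration.

```c
#include <stdint.h>

/* Compare two segment:offset values as whole 32-bit unsigned numbers. */
int ptr_less_32bit(uint32_t p, uint32_t q)
{
    return p < q;
}

/* Compare only the low 16 bits (the offsets), ignoring the segment. */
int ptr_less_offset_only(uint32_t p, uint32_t q)
{
    return (uint16_t)(p & 0xFFFFu) < (uint16_t)(q & 0xFFFFu);
}
```

With the array at 008f0000, the last element is at 008fffff and the
one-past-the-end value is 00900000.  The 32-bit comparison orders these
correctly; the offset-only comparison sees ffff against 0000 and gets
the order backwards.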

chris@mimsy.UUCP (Chris Torek) (05/03/88)

>In article <11289@mimsy.UUCP> I noted that
>>If necessary, the compiler will put a ... shim after the array ....
  -- --------- [emphasis just now added]

In article <4554@ico.ISC.COM> rcd@ico.ISC.COM (Dick Dunn) writes:
>Chris, is this really so?  I understand that &a[NTHINGS] should be valid,
>but I don't see why the shim is necessary.

It may not be.  Given your description of 80286 addressing, it is not
necessary there.  It might be required in others, particularly in other
segmented architectures.  I am not familiar enough with any others
(Burroughs, Honeywell, ...?) to say whether they would.
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris

carlp@iscuva.ISCS.COM (Carl Paukstis) (05/04/88)

In article <2637@ttrdc.UUCP> levy@ttrdc.UUCP (Daniel R. Levy) writes:
=In article <7806@brl-smoke.ARPA>, gwyn@brl-smoke.ARPA (Doug Gwyn ) writes:
=# In article <12074@tut.cis.ohio-state.edu> lvc@tut.cis.ohio-state.edu (Lawrence V. Cipriani) writes:
=# >Is it legal to apply the & (address of) operator to an array
=# >element that is non-existent?  Given:
=# >	sometype a[NTHINGS], *p;
=# >Should:
=# >	for (p = a; p < &a[NTHINGS]; p++)	/* 1 */
=# >be written as:
=# >	for (p = a; p <= &a[NTHINGS-1]; p++)	/* 2 */
=# >		...
=# >Will 1 be guaranteed to work in ANSI-C?
=# 
=# ...
=# Every object must have at least one addressable cell beyond it,
=# but not necessarily in front of it.  

I thought that the requirement was, in effect, that there must be at least
one valid ADDRESS beyond the composite object.  Maybe this is what you're
saying, but...

=A picky point if you will:  would not one expect that &a[NTHINGS] must have at
=and beyond it at least as many bytes as the smallest object having the same
=alignment requirement as an object having the type of a[0]?  Imagine a[]
=being laid out in a segment of memory as close to the "top" as possible while
=still having &a[NTHINGS] be valid, and you'll see what I mean.

The above notion follows from a quick reading of Mr. Gwyn's original
followup.  It seems, however, that there need not be an addressable OBJECT
beyond the array bound, and it is an error to attempt to dereference
a[NTHINGS+1].  There must be a valid ADDRESS (or at least simulation of
one), so that size calculations will work, mostly, but also so that
bound-checking as proposed in the original posting is possible.

Maybe I've just added to the confusion, here...

-- 
Carl Paukstis    +1 509 927 5600 x5321  |"I met a girl who sang the blues
                                        | and asked her for some happy news
UUCP:     carlp@iscuvc.ISCS.COM         | but she just smiled and turned away"
          ...uunet!iscuva!iscuvc!carlp  |                    - Don MacLean

davidsen@steinmetz.ge.com (William E. Davidsen Jr) (05/05/88)

I don't see that there should be any bounds checking until the pointer
or address is dereferenced. Doing a check is of dubious use and will
probably break as many valid programs as it helps.

Consider:
  char *x, a[NVAL];

  /* if this is legal */
  x = &a[0];
  /* and this is legal */
  x += NVAL;
  /* then why try to make this illegal? */
  x = &a[NVAL];

There are (valid) reasons for wanting to take an address outside the
range of an array. Consider an algorithm in which negative subscripts
are heavily used (say from Pascal, PL/I, Algol-60, etc). You can do an
add each time to normalize the values of the subscript, but it is (a)
slower, and (b) hard to read.

Therefore:
  char *x, a[NVAL];
  x = &a[NVAL+49];		/* for use with negative subscripts */
  x[-50] = 'a';			/* same as a[NVAL] */
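
Negative subscripts can also be had without pointing outside the
array: make the array cover the whole index range and aim the working
pointer at an interior element.  A minimal sketch of that alternative;
the range -50..50 and all the names are assumptions chosen for
illustration, not the code above.

```c
#include <assert.h>

#define LO (-50)                /* lowest subscript wanted  */
#define HI 50                   /* highest subscript wanted */

static char storage[HI - LO + 1];

/* x[0] is storage[50], a real element, so every subscript from
 * x[LO] through x[HI] stays inside storage. */
static char *const x = &storage[-LO];

void neg_put(int i, char c)
{
    assert(i >= LO && i <= HI);
    x[i] = c;
}

char neg_get(int i)
{
    assert(i >= LO && i <= HI);
    return x[i];
}
```

The cost is that the base pointer is set up once against a known lower
bound; no per-access normalizing add is needed.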

If dpANS were going to make C into Pascal, it should have been done by
having real enums instead of the botch we have now. The feature appeared
(I was told) when cpp ran out of symbol space. It was kept as is to "not
break existing programs."
-- 
	bill davidsen		(wedu@ge-crd.arpa)
  {uunet | philabs | seismo}!steinmetz!crdos1!davidsen
"Stupidity, like virtue, is its own reward" -me

gwyn@brl-smoke.ARPA (Doug Gwyn ) (05/05/88)

In article <1450@iscuva.ISCS.COM> carlp@iscuva.ISCS.COM (Carl Paukstis) writes:
>=In article <7806@brl-smoke.ARPA>, gwyn@brl-smoke.ARPA (Doug Gwyn ) writes:
>=# Every object must have at least one addressable cell beyond it,
>=# but not necessarily in front of it.  
>It seems, however, that there need not be an addressable OBJECT
>beyond the array bound, ... There must be a valid ADDRESS ...

I thought this was clear enough.  Obviously, if every object had to have
another finite object behind it, the existence of even one object would
force an infinity of them, which is patently absurd.  It's also not what
I said.

daveb@geac.UUCP (David Collier-Brown) (05/05/88)

In article <1450@iscuva.ISCS.COM> carlp@iscuva.ISCS.COM (Carl Paukstis) writes:
>The above notion follows from a quick reading of Mr. Gwyn's original
>followup.  It seems, however, that there need not be an addressable OBJECT
>beyond the array bound, and it is an error to attempt to dereference
>a[NTHINGS+1].  There must be a valid ADDRESS (or at least simulation of
>one), so that size calculations will work, mostly, but also so that
>bound-checking as proposed in the original posting is possible.

  My understanding is that one has to be able to generate the
address of the last-plus-one'th element of an array:
	thing a[NTHINGS];
	...

	if (x < &a[NTHINGS]) ...

  I'm not so sure about NTHINGS+1, which is the last-plus-two'th
(last-plus-tooth?) element.

  As you might expect, this requirement identifies a problem with
machines like the DPS-6, which will need the shim at &a[NTHINGS] to
keep from trapping when the address is loaded into a register and
checked for legality.

 --dave
-- 
 David Collier-Brown.                 {mnetor yunexus utgpu}!geac!daveb
 Geac Computers International Inc.,   |  Computer Science loses its
 350 Steelcase Road,Markham, Ontario, |  memory (if not its mind) 
 CANADA, L3R 1B3 (416) 475-0525 x3279 |  every 6 months.

brb@akgua.ATT.COM (Brian R. Bainter) (05/05/88)

From article <12074@tut.cis.ohio-state.edu>, by lvc@tut.cis.ohio-state.edu (Lawrence V. Cipriani):
> Is it legal to apply the & (address of) operator to an array
> element that is non-existent?  Given:
> 
> 	sometype a[NTHINGS], *p;
> 
> Should:
> 
> 	for (p = a; p < &a[NTHINGS]; p++)	/* 1 */
> 		...
> be written as:
> 
> 	for (p = a; p <= &a[NTHINGS-1]; p++)	/* 2 */
> 		...
> 
> I like 1 better than 2 since there are fewer characters to
> type and I find it quicker and easier to comprehend.  The
> dpANS says & only applies to objects that are not bit fields
> or have the register qualifier.  In this example, one could
> argue that a[NTHINGS] doesn't even exist so that & should be
> invalid on it.  Will 1 be guaranteed to work in ANSI-C?

I see no problem with the first example. Taking into consideration
that a is an address and NTHINGS is an offset to that address,
there should be no problem whatsoever with this construct. If C
was a more crude language which made limit checks on arrays, there
might be a problem in doing something like this. C however does not
make any limit checks for arrays. Arrays are nothing more than an
address and an offset. The exception is when you are defining the
array. At this point the compiler needs to know the limit of the
array so that memory may be allocated.

If you have a question with something like this, it may be most
worthwhile to write a test program and try the construct or algorithm
out. In most cases you can't hurt anything, and you may learn
exactly what is going on with the code. Using a source debugger
such as sdb or even printf statements may help.

          Brian R. Bainter

henry@utzoo.uucp (Henry Spencer) (05/06/88)

> I don't see that there should be any bounds checking until the pointer
> or address is dereferenced. Doing a check is of dubious use and will
> probably break as many valid programs as it helps.

The issue is not whether checks should be inserted deliberately, but whether
the hardware will even permit out-of-range pointers to be computed (with
meaningful results).  For example, if we assume a segmented architecture
in which pointer arithmetic affects only the offset part of the pointer,
with overflow simply wrapping around, the value of &a[10000] may well be
*less* than the value of &a[1].  Or the overflow might cause a trap, in
which case the value is not computable at all.  Short of making pointer
arithmetic much slower, there may be NO WAY TO AVOID THIS.

This isn't imaginary.  I know of at least one machine (not a common one)
in which pointer arithmetic was strictly offset arithmetic, with no carry
into the segment part; I don't remember whether overflow was trapped.

Code which computes &array[n], where array is of size m, and n < 0 or n > m
(ANSI having legitimized n == m), is not portable.  Period.
-- 
NASA is to spaceflight as            |  Henry Spencer @ U of Toronto Zoology
the Post Office is to mail.          | {ihnp4,decvax,uunet!mnetor}!utzoo!henry
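
The wraparound case described above is easy to reproduce with a 16-bit
offset modeled as an integer.  A sketch under stated assumptions: the
base offset 0xF000 and the 4-byte element size used below are made up,
and elem_offset (a hypothetical name) stands in for hardware that adds
to the offset and discards the carry.

```c
#include <assert.h>
#include <stdint.h>

/* Offset of a[index] within its segment when address arithmetic is
 * done modulo 65536 and nothing carries into the segment part. */
uint16_t elem_offset(uint16_t base, uint32_t index, uint32_t elem_size)
{
    uint32_t full;

    assert(elem_size > 0);
    full = (uint32_t)base + index * elem_size;
    return (uint16_t)(full & 0xFFFFu);
}
```

For a base of 0xF000 and 4-byte elements, element 1 lands at offset
0xF004 while element 10000 wraps to 0x8C40, so the computed "address"
of a[10000] compares less than that of a[1].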

biff@eyeball.ophth.bcm.tmc.edu (Brad Daniels) (05/06/88)

In article <12074@tut.cis.ohio-state.edu> lvc@tut.cis.ohio-state.edu
(Lawrence V. Cipriani) writes:
>Is it legal to apply the & (address of) operator to an array
>element that is non-existent?  Given:
>	sometype a[NTHINGS], *p;
>Should:
>	for (p = a; p < &a[NTHINGS]; p++)	/* 1 */
>		...
>be written as:
>	for (p = a; p <= &a[NTHINGS-1]; p++)	/* 2 */
>		...

Whether or not the above is legal, I think it should probably be written as:

	for (p = a; p < (a+NTHINGS); p++)

Or am I missing something obvious?

					- Brad
-
Brad Daniels			|	biff@eyeball.ophth.bcm.tmc.edu
The Low Vision Project		| If money can't buy happiness
Baylor College of Medicine	| I guess I'll have to rent it.  - Weird Al

carlp@iscuva.ISCS.COM (Carl Paukstis) (05/07/88)

In article <2699@geac.UUCP> daveb@geac.UUCP (David Collier-Brown) writes:
>In article <1450@iscuva.ISCS.COM> carlp@iscuva.ISCS.COM ([ME!]) writes:
>>                       .... it is an error to attempt to dereference
>>a[NTHINGS+1].  There must be a valid ADDRESS (or at least simulation of
>>one), so that size calculations will work, mostly, but also so that
>>bound-checking as proposed in the original posting is possible.
>
>  My understanding is that one has to be able to generate the
>address of the last-plus-one'th element of an array:
>	thing a[NTHINGS];
>	...
>
>	if (x < &a[NTHINGS]) ...
>
>  I'm not so sure about NTHINGS+1, which is the last-plus-two'th
>(last-plus-tooth?) element.

I guess it's been at least a week since I last said a[NTHINGS] to
indicate the final element of an array of NTHINGS elements.  I started
programming with FORTRAN IV, which I recall as having 1-based rather than
0-based subscripts.  Old habits die hard.

Of course I really meant to say that it's an error to dereference (is that
the right word, or does that apply only to pointers?) the element
"a[NTHINGS]", which as dave c-b points out is last-plus-one'th element.  I
think my understanding agrees with dave's: &a[NTHINGS] is a valid address,
and pointer arithmetic can be done with it; while a[NTHINGS] is NOT a valid
element and can't be used.  Both &a[NTHINGS+1] and a[NTHINGS+1] are
undefined.

Which is, of course, what doug gwyn was saying in the first place.  I tried
to clarify his statements after I saw some followups which misinterpreted
what he said.  It seems I only added to the confusion.  For this I
apologize, and I will now return to lurking quietly in the shadows...


-- 
Carl Paukstis    +1 509 927 5600 x5321  |"I met a girl who sang the blues
                                        | and asked her for some happy news
UUCP:     carlp@iscuvc.ISCS.COM         | but she just smiled and turned away"
          ...uunet!iscuva!iscuvc!carlp  |                    - Don MacLean

limes@sun.uucp (Greg Limes) (05/07/88)

In article <1086@gazette.bcm.tmc.edu> biff@eyeball.ophth.bcm.tmc.edu.UUCP (Brad Daniels) writes:
>
>Whether or not the above is legal, I think it should probably be written as:
>
>	for (p = a; p < (a+NTHINGS); p++)
>
>Or am I missing something obvious?

What if (a+NTHINGS) wraps around the address space in such a way that
(a+NTHINGS) < a? Loop #1 would execute 0 times, loop #2 would execute
properly ... translating to your style (which looks better to these
eyes),
	for (p = a; p <= (a+NTHINGS-1); p++)
-- 
   Greg Limes [limes@sun.com]			Illigitimi Non Carborundum

chris@mimsy.UUCP (Chris Torek) (05/07/88)

In article <1788@akgua.ATT.COM> brb@akgua.ATT.COM (Brian R. Bainter) writes:
>Taking into consideration that a is an address and NTHINGS is an
>offset to that address, there should be no problem whatsoever with
>this construct.

There are more hosts in heaven and earth, B.B., than are dreamt of
in your philosophy.  (Funny wording to make the syllables come out
right.  `B.B.' is only two syllables, but the accents fix that :-) .)

>C ... does not make any limit checks for arrays.

There is no such promise in the language.  An implementation is free
to check.  It is true that most do not, as this tends to worsen benchmark
numbers, and hence does little for sales.

>If you have a question with something like this, it may be most
>worthwhile to write a test program and try the construct or algorithm
>out.

While this remains good advice, remember that this provides only one
existence proof for a single implementation.  Hence you can say with
utter certainty that at least one C compiler simply adds an offset to
a base.  As it turns out, others do something different.  Fortunately
&a[NTHINGS] is legal by definition (or fiat, as it were), according
to the dpANS.
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris

gwyn@brl-smoke.ARPA (Doug Gwyn ) (05/08/88)

In article <52339@sun.uucp> limes@sun.UUCP (Greg Limes) writes:
>In article <1086@gazette.bcm.tmc.edu> biff@eyeball.ophth.bcm.tmc.edu.UUCP (Brad Daniels) writes:
>>	for (p = a; p < (a+NTHINGS); p++)
>What if (a+NTHINGS) wraps around the address space in such a way that
>(a+NTHINGS) < a? Loop #1 would execute 0 times, loop #2 would execute
>properly ... translating to your style (which looks better to these
>eyes),
>	for (p = a; p <= (a+NTHINGS-1); p++)

That wasn't the fellow's question.

In any case, your suggestion fails for NTHINGS==0, since a-1 CAN wrap
around the address space; the original CANNOT wrap around (it is
forbidden by the language rules).  You got this exactly backwards.

richardh@killer.UUCP (Richard Hargrove) (05/08/88)

In article <1988May5.194916.1971@utzoo.uucp>, henry@utzoo.uucp (Henry Spencer) writes:
> This isn't imaginary.  I know of at least one machine (not a common one)
> in which pointer arithmetic was strictly offset arithmetic, with no carry
> into the segment part; I don't remember whether overflow was trapped.
> 

This will be the case in the most useful operational mode of a soon-to-be
very common architecture: protected, USE32 mode of the 80386, where segments
have a maximum limit of 2^32 bytes but generally have much smaller actual
limits (maintained in the segment descriptor). Bound checking code will 
have to be VERY complicated to keep from generating an exception if a 
computed offset exceeds the segment limit (the limit must be read from the 
segment descriptor entry in the GDT or LDT, neither of which was designed
to be accessible from the user's code). Or else a 4 gigabyte limit must be
installed for every defined segment, which seems both unlikely and
self-defeating.

For what it's worth, the Intel 80386 tool, bnd386, adds pad bytes at the
end of defined segments to ensure, among other things, that a reference
such as a[NTHINGS] will NOT generate an exception (however a[NTHINGS+1]
may, depending on sizeof(a[0]) and segment alignment requirements).

...!{ihnp4 | codas | cbosgd}!killer!richardh
--------------------------------------------

jimp@cognos.uucp (Jim Patterson) (05/11/88)

In article <52339@sun.uucp> limes@sun.UUCP (Greg Limes) writes:
>In article <1086@gazette.bcm.tmc.edu> biff@eyeball.ophth.bcm.tmc.edu.UUCP (Brad Daniels) writes:
>>
>>Whether or not the above is legal, I think it should probably be written as:
>>
>>	for (p = a; p < (a+NTHINGS); p++)
>>
>>Or am I missing something obvious?
>
>What if (a+NTHINGS) wraps around the address space in such a way that
>(a+NTHINGS) < a?

The ANSI C draft explicitly allows for the syntax shown above (where a is
declared "a[NTHINGS]"). This is found in section 3.3.6 "Additive Operators"
of the draft. The last sentence reads:

    "However, if P points to the last member of an array object, the
    expression (P+1) - P has the value 1, even though P+1 does not 
    point to a member of the array object".

(ANSI Doc X3J11/88-002 page 48). 

This is further clarified in the Rationale document: the
intention is that such expressions should work as expected. In
practice, the effect is that generally at least one addressable byte
needs to follow any declared array such that its address follows that
of the last array element.

So, according to the ANSI committee, &a[NTHINGS] IS legal.
-- 
Jim Patterson                              Cognos Incorporated
UUCP:decvax!utzoo!dciem!nrcaer!cognos!jimp P.O. BOX 9707    
PHONE:(613)738-1440                        3755 Riverside Drive
                                           Ottawa, Ont  K1G 3Z4
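
The quoted sentence can be checked directly with pointer subtraction.
A small sketch; count_between is a name invented here and int is an
arbitrary element type.

```c
#include <assert.h>
#include <stddef.h>

/* Number of elements in the half-open range [first, last).  Both
 * pointers must point into, or one past the end of, the same array. */
ptrdiff_t count_between(const int *first, const int *last)
{
    assert(first != NULL && last != NULL);
    return last - first;
}
```

With P pointing at the last element of a five-element array,
count_between(P, P + 1) is 1, and count_between(a, &a[5]) is 5.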

peter@ficc.UUCP (Peter da Silva) (05/16/88)

In article <4023@killer.UUCP>, richardh@killer.UUCP (Richard Hargrove) writes:
> In article <1988May5.194916.1971@utzoo.uucp>, henry@utzoo.uucp (Henry Spencer) writes:
> > This isn't imaginary.  I know of at least one machine (not a common one)
> > in which pointer arithmetic was strictly offset arithmetic, with no carry
> > into the segment part; I don't remember whether overflow was trapped.

> This will be the case in ... protected USE32 mode of the 80386 ...

On the other hand I very much suspect that any useful 'C' compiler on the
386 will blow off that segment stuff and just stick everything in one big
segment. In which case a single pad at the end will solve the problem.

tneff@atpal.UUCP (Tom Neff) (05/20/88)

In article <778@.UUCP> peter@ficc.UUCP (Peter da Silva) writes:
>On the other hand I very much suspect that any useful 'C' compiler on the
>386 will blow off that segment stuff and just stick everything in one big
>segment...

Any *truly* useful 'C' compiler on the 386 will let the programmer
choose his own segmentation model like a big boy.  Accessing big
segments is a sine qua non, but forcibly limiting people to the "flat"
model would be fatal in the marketplace.  Fortunately nobody does this
or is planning to that I know of, so the subject need only come up in
net discussions. :-)


-- 
Tom Neff			UUCP: ...uunet!pwcmrd!skipnyc!atpal!tneff
	"None of your toys	CIS: 76556,2536		MCI: TNEFF
	 will function..."	GEnie: TOMNEFF		BIX: are you kidding?

peter@ficc.UUCP (Peter da Silva) (06/02/88)

In article <164@atpal.UUCP>, tneff@atpal.UUCP (Tom Neff) writes:
> In article <778@.UUCP> peter@ficc.UUCP (Peter da Silva) writes:
> >On the other hand I very much suspect that any useful 'C' compiler on the
> >386 will blow off that segment stuff and just stick everything in one big
> >segment...

> Any *truly* useful 'C' compiler on the 386 will let the programmer
> choose his own segmentation model like a big boy.

Why is this any more important than (say) allowing people to choose
which register to use for subroutine linkage on the PDP-11 or 68000?
This is something that's much more important (at least if you want
your 'C' to interface with other PDP-11 languages).

> Accessing big
> segments is a sine qua non, but forcibly limiting people to the "flat"
> model would be fatal in the marketplace.

Why? What reason could you possibly have, within the lifespan of the
80386, for using anything but a flat model in 'C'? 'C' doesn't apply
very well to intel segments in the first place (if you don't believe
me, explain "memory models": there are languages that handle segments
much more gracefully (PL/M-86, for example), but 'C' falls flat on its
face), and with 32 bits of address space per segment the need for even
dealing with them goes away.
-- 
-- Peter da Silva, Ferranti International Controls Corporation.
-- Phone: 713-274-5180. Remote UUCP: uunet!nuchat!sugar!peter.