[comp.lang.c] Contiguous Arrays

dmg@ssc-vax.UUCP (David Geary) (02/21/89)

  Are ALL arrays in C contiguous?  If I declare an array:

  int  x[10];

  am I GUARANTEED that x[0] through x[9] will occupy contiguous memory,
  or is the compiler free to scatter the elements around in memory?

  I have a difficult time imagining implementing C without contiguous
  arrays, but is the compiler free to implement non-contiguous arrays
  if so desired?

  I have a hard time understanding how one could implement non-contiguous
  arrays, especially when you have functions such as strcpy().  
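
  After all, the classic K&R-style strcpy() just marches two pointers
  through adjacent bytes, which only makes sense if the bytes really
  are adjacent:

	char *strcpy(s, t)		/* copy t into s, classic version */
	char *s, *t;
	{
		char *p = s;

		while (*p++ = *t++)	/* steps through adjacent bytes */
			;
		return s;
	}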

-- 
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~ David Geary, Boeing Aerospace,               ~ 
~ #define    Seattle     RAIN                  ~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

gwyn@smoke.BRL.MIL (Doug Gwyn ) (02/21/89)

In article <2508@ssc-vax.UUCP> dmg@ssc-vax.UUCP (David Geary) writes:
>  I have a difficult time imagining implementing C without contiguous
>  arrays, but is the compiler free to implement non-contiguous arrays
>  if so desired?

Only if the compiler can determine that the program won't notice the
difference.  In practice, no compiler is going to do this.

ark@alice.UUCP (Andrew Koenig) (02/21/89)

In article <2508@ssc-vax.UUCP>, dmg@ssc-vax.UUCP (David Geary) writes:
> 
>   Are ALL arrays in C contiguous?  If I declare an array:
> 
>   int  x[10];
> 
>   am I GUARANTEED that x[0] through x[9] will occupy contiguous memory,
>   or is the compiler free to scatter the elements around in memory?

You are guaranteed that if you say

	int *p;

	p = &x[k];		/* 0 <= k < 9 */
	p++;

then p now points to x[k+1].  It is possible to execute `p++' in
a separately compiled routine.

For that reason it's hard to see how a C implementation could possibly
do anything but put x in contiguous memory.
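
To make that concrete, here is a minimal sketch; the file names and
the function name `bump' are mine:

	/* bump.c -- compiled separately; this file knows nothing
	   about the array x */
	int *bump(p)
	int *p;
	{
		p++;			/* must advance by sizeof(int) */
		return p;
	}

	/* main.c */
	extern int *bump();

	int x[10];

	int check()
	{
		return bump(&x[3]) == &x[4];	/* guaranteed true */
	}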
-- 
				--Andrew Koenig
				  ark@europa.att.com

badri@valhalla.ee.rochester.edu (Badri Lokanathan) (02/22/89)

> In article <2508@ssc-vax.UUCP>, dmg@ssc-vax.UUCP (David Geary) writes:
> >   Are ALL arrays in C contiguous?  If I declare an array:
> > 
> >   int  x[10];
> > 
> >   am I GUARANTEED that x[0] through x[9] will occupy contiguous memory,
> >   or is the compiler free to scatter the elements around in memory?

In article <8943@alice.UUCP>, ark@alice.UUCP (Andrew Koenig) writes:
> You are guaranteed that if you say
> 
> 	int *p;
> 
> 	p = &x[k];		/* 0 <= k < 9 */
> 	p++;
> 
> then p now points to x[k+1].  It is possible to execute `p++' in
> a separately compiled routine.

On a similar note, just yesterday a friend and I were discussing
tricks to access arrays with different ranges.
For instance, if the array was to be x[101] to x[110] instead of
x[0] to x[9], what is the easiest way to do it?
For automatic arrays one could do
int space[10], *x; 
x = &space[0] - 101;

For malloc'd arrays, one could do something like

int *make_array(init_index,final_index)
int init_index, final_index;
{
	int *foo;

	foo = (int *) malloc((unsigned)
		((final_index - init_index + 1) * sizeof(int)));
	return (foo - init_index);
}

Kind of ugly; only a Pascal programmer would think of such grossness (:-)
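
For illustration, a hypothetical caller would look like this, setting
aside the out-of-range-pointer problems raised later in this thread
(note the pointer must be shifted back before freeing):

	int *x;

	x = make_array(101, 110);
	x[101] = 1;			/* the first allocated int */
	x[110] = 10;			/* the last one */
	free((char *) (x + 101));	/* x + 101 is what malloc returned */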

In any case, I am hard pressed to think of reasons for wanting
to store an automatic array in non-contiguous space.
-- 
"I care about my fellow man              {) badri@ee.rochester.edu
 Being taken for a ride,                //\\ {ames,cmcl2,columbia,cornell,
 I care that things start changing     ///\\\ garp,harvard,ll-xn,rutgers}!
 But there's no one on my side."-UB40   _||_   rochester!ur-valhalla!badri

hascall@atanasoff.cs.iastate.edu (John Hascall) (02/22/89)

In article <8943@alice.UUCP> ark@alice.UUCP (Andrew Koenig) writes:
>In article <2508@ssc-vax.UUCP>, dmg@ssc-vax.UUCP (David Geary) writes:
 
>>   Are ALL arrays in C contiguous?  If I declare an array:...

>You are guaranteed that if you say

>	int *p;

>	p = &x[k];		/* 0 <= k < 9 */
>	p++;

>then p now points to x[k+1].  It is possible to execute `p++' in
>a separately compiled routine.

>For that reason it's hard to see how a C implementation could possibly
>do anything but put x in contiguous memory.


     How about:

	 Assume ints are (say) 2 bytes.  Assume further that your
	 machine is stupid, and all accesses must be on an 8-byte boundary.

	 Suppose x[0] ends up at 0x01000, then
		 x[1] ends up at 0x01008, and
		 x[2]            0x01010, etc....

       address (0x01000 base):    0 1 2 3 4 5 6 7 8 9 a b c d e f 0 1 ...
	 x[?]                     0 0             1 1             2 2

    A non-contiguous array, and the compiler can always generate

	    p <- p + 8

    for p++; since it knows about the 8-byte alignment business.

  Or have I forgotten something?
  John Hascall / ISU Comp Center

henry@utzoo.uucp (Henry Spencer) (02/23/89)

In article <1828@valhalla.ee.rochester.edu> badri@valhalla.ee.rochester.edu (Badri Lokanathan) writes:
>For instance, if the array was to be x[101] to x[110] instead of
>x[0] to x[9], what is the easiest way to do it?
>For automatic arrays one could do
>int space[10], *x; 
>x = &space[0] - 101;

No, one couldn't, not if one wants to be portable.  There is absolutely no
guarantee that x[101] will be the same as space[0], and in fact there is
no guarantee that the computation of x won't trap and give you a core dump.
On a sufficiently odd segmented machine it might.  Even K&R1 warns you that
pointer arithmetic won't necessarily do what you think unless you keep it
within an array.
-- 
The Earth is our mother;       |     Henry Spencer at U of Toronto Zoology
our nine months are up.        | uunet!attcan!utzoo!henry henry@zoo.toronto.edu

walter@hpclwjm.HP.COM (Walter Murray) (02/23/89)

>  Are ALL arrays in C contiguous?

From the latest draft:

   "An array type describes a contiguously allocated nonempty set
   of objects with a particular member object type."  Section 3.1.2.5.

Walter Murray
-------------

badri@valhalla.ee.rochester.edu (Badri Lokanathan) (02/23/89)

> In article <1828@valhalla.ee.rochester.edu>, I said
> >For instance, if the array was to be x[101] to x[110] instead of
> >x[0] to x[9], what is the easiest way to do it?
> >For automatic arrays one could do
> >int space[10], *x; 
> >x = &space[0] - 101;

In article <1989Feb22.171441.7957@utzoo.uucp>,
henry@utzoo.uucp (Henry Spencer) writes:
> No, one couldn't, not if one wants to be portable.  There is absolutely no
> guarantee that x[101] will be the same as space[0], and in fact there is
> no guarantee that the computation of x won't trap and give you a core dump.
> On a sufficiently odd segmented machine it might.  Even K&R1 warns you that
> pointer arithmetic won't necessarily do what you think unless you keep it
> within an array.

First, a disclaimer.  My suggestion was just a possibility; personally
I would never use it.  If anything, I would resort to preprocessor abuse.
Now to Henry's reply:

Henry is probably referring to the comments on pages 98 and 189 of K&R1.
My argument is, since x is an independent variable (a pointer, but
still a variable), there is no way of checking any array bounds.
Thus, for consistency with the definition of pointer arithmetic,
x -= i; x += i;
should leave x with its original value (provided the resulting
intermediate values are within pointer range).
Since x can be initialized to any value, this should always be possible,
the result after each operation being an address appropriately offset
from the original object. Yes, accessing the contents of an illegal location
will almost certainly screw things up, but there is no access happening.

Further, the + and - operations group left to right and are
associative. Note that in C, a[i] is completely equivalent to *(a+i)
if a is a pointer (page 94).  But Henry's statement implies that on some
architectures

(x + i) - i != x + (i - i)

Sigh. Are we biblical scholars or what?
-- 
"Don't blame me for wanting more         {) badri@ee.rochester.edu
 The facts are too hard to ignore       //\\ {ames,cmcl2,columbia,cornell,
 I'm scared to death of poverty        ///\\\ garp,harvard,ll-xn,rutgers}!
 I only want what's best for me."-UB40   /\    rochester!ur-valhalla!badri

gwyn@smoke.BRL.MIL (Doug Gwyn ) (02/23/89)

In article <830@atanasoff.cs.iastate.edu> hascall@atanasoff.cs.iastate.edu (John Hascall) writes:
>	 Assume ints are (say) 2 bytes.  Assume further that your
>	 machine is stupid, and all accesses must be on an 8-byte boundary.

Then sizeof(int) must be 8, not 2.

chris@mimsy.UUCP (Chris Torek) (02/23/89)

In article <1831@valhalla.ee.rochester.edu> badri@valhalla.ee.rochester.edu
(Badri Lokanathan) writes:
>... But Henry's statement implies that on some architectures
>
>(x + i) - i != x + (i - i)
/* where x is a pointer */

This is correct: on some architectures, if `x' is one of the integral
types except the unsigned types, or if x is a pointer type, x + i may
cause an overflow trap, but only if the address computed by x+i is not within
the object denoted by x.  (That is: on some machines, there exists an
integer $i$ such that $x+i = \error$.)

The trick described is, however, portable whenever the `zero point' of
the derived array is contained within the array, because in that case
we know that 0 <= i <= N (where N is the number of elements in the array)
and the pANS guarantees that this address is computable (does not cause a
range error).  (That is: $\forall i \in [0,N], x+i$ is computable, and
$(x+i)-i = x$.)
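
For instance, a derived array whose zero point lies inside the real
one is safe (names invented for illustration):

	int buf[21];
	int *mid = &buf[10];	/* mid[-10] through mid[10] are valid */

	mid[-10] = 0;		/* = buf[0];  every address formed */
	mid[10] = 0;		/* = buf[20]; stays within the object */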
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris

gwyn@smoke.BRL.MIL (Doug Gwyn ) (02/23/89)

In article <1831@valhalla.ee.rochester.edu> badri@valhalla.ee.rochester.edu (Badri Lokanathan) writes:
>Yes, accessing the contents of an illegal location
>will almost certainly screw things up, but there is no access happening.

The problem is that even computing the address of an out-of-bounds location
can screw up (and it DOES, on some widespread segmented architectures).
Henry is correct; just because YOU know what the result of such pointer
arithmetic "should" be, does not mean that it will actually turn out that
way at run time.

chris@mimsy.UUCP (Chris Torek) (02/23/89)

In article <830@atanasoff.cs.iastate.edu> hascall@atanasoff.cs.iastate.edu
(John Hascall) writes:
>How about:
>
>Assume ints are (say) 2 bytes.  Assume further that your
>machine is stupid, and all accesses must be on an 8-byte boundary. ...
>
>	address (0x01000 base):   0 1 2 3 4 5 6 7 8 9 a b c d e f 0 1 ...
>	 x[?]                     0 0             1 1             2 2
>
>A non-contiguous array, and the compiler can always generate
>
>	p <- p + 8
>
>for p++; since it knows about the 8-byte alignment business.

The trick here is that it *appears* contiguous.  You cannot tell
(without stepping outside the bounds of the C language proper)
that there are holes.

A more interesting fact is that, because of the array-of-E /
pointer-to-E conversion rule, given a declaration like

	int x[3][2];

and such a compiler, you get this layout:

	addr:	0 1 2 3 4 5 6 7 8 9 a b c d e f 0 1 2 3 4 5 6 7 8 9 ...
		---		---		---		---
		 ^		 ^		 ^		 ^
	      x[0][0]	      x[0][1]	      x[1][0]	      x[1][1]

On a more typical machine, you get:

	addr:	0 1 2 3 4 5 6 7 8 9 a b
		--- --- --- --- --- ---
		0/0 0/1 1/0 1/1 2/0 2/1

If your compiler lets you get away with writing x[0][2]---if it does
not abort with a subscript range error---it will always do it by
talking to x[1][0].  (If C did not have malloc, the compiler could
arrange things otherwise.)
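
You can watch this with a trivial test; on a typical flat-memory
compiler it prints `same':

	#include <stdio.h>

	int x[3][2];

	int main()
	{
		/* one past the end of row 0 vs. the start of row 1 */
		printf((char *) &x[0][2] == (char *) &x[1][0]
			? "same\n" : "different\n");
		return 0;
	}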
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris

gordon@sneaky.TANDY.COM (Gordon Burditt) (02/23/89)

>On a similar note, just yesterday a friend and I were discussing
>tricks to access arrays with different ranges.
>For instance, if the array was to be x[101] to x[110] instead of
>x[0] to x[9], what is the easiest way to do it?
>For automatic arrays one could do
>int space[10], *x; 
>x = &space[0] - 101;

UNPORTABILITY ALERT!

There is no guarantee that there is a way to represent &space[0] - 101
at all.  Further, computing it may cause overflows such that

	(&space[0] - 101) + 101 != &space[0]

It could happen that (&space[0] - 101) == (int *) NULL.  Even computing
&space[0] - 101 could cause core dumps (MSDOS equivalent:  hang system,
forcing use of RESET, optionally trashing the FAT on the hard disk).
On *86 machines, consider what happens when space starts within 100
bytes of physical address 0.  Having it within 100 bytes of the data
segment origin may cause problems also.  These problems may vary with
model, load address, and the phase of the moon.

Given the declaration:
	int	space[10];
then
	&space[i]
has guaranteed meaning when i = 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10, and 
no other values of i.  You can't use space[10] because it's out of range, 
but &space[10] is guaranteed to have a representation by the ANSI standard.

	struct foobar space[10], *x;
	/* Right */
	for (x = &space[0]; x < &space[10]; x++)
		x->data = 0;

	/* Wrong - this loop may never execute */
	for (x = &space[9]; x > &space[-1]; x--)
		x->data = 0;
I got bit on a loop like the above on a 68000.  As it happened, &space[0]
was less than sizeof(struct foobar), so &space[-1] was positive and very
large.  Since &space[9] < &space[-1], the loop never executed.
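
A portable rewriting tests before decrementing, so that x never takes
a value outside the guaranteed range &space[0] through &space[10]:

	/* Right - x stays within &space[0] .. &space[10] */
	for (x = &space[10]; x > &space[0]; )
		(--x)->data = 0;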


					Gordon L. Burditt
					...!texbell!sneaky!gordon

cik@l.cc.purdue.edu (Herman Rubin) (02/23/89)

In article <1831@valhalla.ee.rochester.edu>, badri@valhalla.ee.rochester.edu (Badri Lokanathan) writes:
> > In article <1828@valhalla.ee.rochester.edu>, I said
> > >For instance, if the array was to be x[101] to x[110] instead of
> > >x[0] to x[9], what is the easiest way to do it?
> > >For automatic arrays one could do
> > >int space[10], *x; 
> > >x = &space[0] - 101;
> 
> In article <1989Feb22.171441.7957@utzoo.uucp>,
> henry@utzoo.uucp (Henry Spencer) writes:
> > No, one couldn't, not if one wants to be portable.  There is absolutely no
> > guarantee that x[101] will be the same as space[0], and in fact there is
> > no guarantee that the computation of x won't trap and give you a core dump.

The desired effect should be obtainable by a struct.  But what is really 
wanted here is the Fortran EQUIVALENCE statement to guarantee that the 
right thing is done.  Another way that this can be done is to use some
assembler code to force the arrays to line up.

Why do language gurus work so hard to keep us from doing the obvious?
-- 
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907
Phone: (317)494-6054
hrubin@l.cc.purdue.edu (Internet, bitnet, UUCP)

gwyn@smoke.BRL.MIL (Doug Gwyn ) (02/24/89)

In article <1146@l.cc.purdue.edu> cik@l.cc.purdue.edu (Herman Rubin) writes:
>The desired effect should be obtainable by a struct.

??

>But what is really wanted here is the Fortran EQUIVALENCE statement ...

Gag, choke.  Ask any competent computer/language architect what he
thinks about FORTRAN's EQUIVALENCE statement.  It really constrains
implementations in generally undesirable ways.

>Another way that this can be done is to use some
>assembler code to force the arrays to line up.

The idea wasn't to get x[] and space[] to "line up" so much as it
was to introduce space[] as a kludge in an attempt to avoid allocating
space for the unreferenced part of x[].  This doesn't work because of
the impossibility of computing out-of-range addresses for space[] on
some (e.g. segmented) architectures.

>Why do language gurus work so hard to keep us from doing the obvious?

Most Algol-like languages, and even FORTRAN-77, allow the exact range
of valid subscripts to be specified for an array.  C is rather unusual
in maintaining a simple model of array, always starting at [0].  There
are tradeoffs here; C's model has much nicer abstract properties, but
in exchange if the programmer wants offsets he has to arrange to get
them himself.  Fortunately that is quite trivial to accomplish, due
to the power of the C preprocessor.  In fact we discussed how to do
that in this newsgroup not very many months ago.

henry@utzoo.uucp (Henry Spencer) (02/24/89)

In article <1831@valhalla.ee.rochester.edu> badri@valhalla.ee.rochester.edu (Badri Lokanathan) writes:
>My argument is, since x is an independent variable (a pointer, but
>still a variable,) there is no way of checking any array bounds.
>Thus in order for consistency with the definition of pointer
>arithmetic,
>x -= i; x += i;
>should result in the original value of x (provided the resulting
>intermediate values are within pointer ranges.)

Exactly -- provided the intermediate values do not cause an overflow.
ON SOME MACHINES THEY CAN.  It is not true that all machines have a
single linear homogeneous address space.  Some have it broken up into
pieces, and bad things can happen if your pointer arithmetic goes over
the edge of a piece.

>Further, the + and - operations group left to right and are
>associative...

In the absence of overflow.
-- 
The Earth is our mother;       |     Henry Spencer at U of Toronto Zoology
our nine months are up.        | uunet!attcan!utzoo!henry henry@zoo.toronto.edu

henry@utzoo.uucp (Henry Spencer) (02/25/89)

In article <1146@l.cc.purdue.edu> cik@l.cc.purdue.edu (Herman Rubin) writes:
>... what is really 
>wanted here is the Fortran EQUIVALENCE statement to guarantee that the 
>right thing is done...
>
>Why do language gurus work so hard to keep us from doing the obvious?

Perhaps because, if one's brain has not been warped by too much Fortran,
the obvious solution is different:

	#define	x(n)	(space[(n)-101])

This involves one annoyance, the change from x[n] to x(n).  (But then,
a Fortran enthusiast shouldn't mind that! :-))  But it cures the problem
completely AND PORTABLY.  Furthermore, it shouldn't hurt the efficiency
of the code, because on machines where the original trickery was possible,
any decent compiler should implement the portable code that way, by folding
constants in the address calculation.  Doing that is one #@#%$%@# of a
lot easier than implementing EQUIVALENCE...
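
In use it looks like this; the -101 folds into the subscript
calculation at compile time:

	int space[10];
	int i;

	for (i = 101; i <= 110; i++)
		x(i) = 0;	/* expands to (space[(i)-101]) = 0 */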
-- 
The Earth is our mother;       |     Henry Spencer at U of Toronto Zoology
our nine months are up.        | uunet!attcan!utzoo!henry henry@zoo.toronto.edu

chasm@killer.DALLAS.TX.US (Charles Marslett) (02/25/89)

In article <7799@sneaky.TANDY.COM>, gordon@sneaky.TANDY.COM (Gordon Burditt) writes:
 > >On a similar note, just yesterday a friend and I were discussing
 > >tricks to access arrays with different ranges.
 > >For instance, if the array was to be x[101] to x[110] instead of
 > >x[0] to x[9], what is the easiest way to do it?
 > >For automatic arrays one could do
 > >int space[10], *x; 
 > >x = &space[0] - 101;
 > 
 > UNPORTABILITY ALERT!
 > 
 > There is no guarantee that there is a way to represent &space[0] - 101
 > at all.  Further, computing it may cause overflows such that
 > 
 > 	(&space[0] - 101) + 101 != &space[0]

Please explain?  What machine does this, or what architecture would allow
the machine or programming language to do it?

 > It could happen that (&space[0] - 101) == (int *) NULL.  Even computing
 > &space[0] - 101 could cause core dumps (MSDOS equivalent:  hang system,
 > forcing use of RESET, optionally trashing the FAT on the hard disk).

This is not necessarily a condemnation of any computer that shows such lack
of concern for basic mathematics, but I would expect a computer that does
anything but return equality in the above expression to be broken as a
mathematical engine.  If it is acceptable for addition and subtraction to
be other than inverse operations (taking into account the possibility of
an error trap or whatever) the thing you are using cannot support any
programming language I know of without simulating integer arithmetic with
some well protected external unit.

And to address the question of errors, if we limit arithmetic to addition
and subtraction of integer quantities, any computer that supports both
signed and unsigned arithmetic will have to be able to generate the result
of that expression without a fault or even an excessive number of machine
cycles wasted.  To make it even more serious, I do not know of any twos-
complement computer that even has a signed vs. unsigned add instruction.

So I have convinced myself that if this is really a problem it is a failure
of the compiler (over optimization or use of a specialized instruction that
really is inadequate?).  Can anyone get a bit more explicit about how this
bit of "fiction" in the addressing of an array fails on any computer, anywhere?
It works on the compilers I know of for everything from 1802s (with no real
ALU to speak of) to IBM mainframes (if you are careful) to dozens of minis.
The only well-known segmented architecture it might fail on is Burroughs', and
I say that only because it is the only one I know of that I don't know anything
about ... ;^).

Or did I completely miss the point (are addition and subtraction not required
of computers in these RISCy days?)???
 > On *86 machines, consider what happens when space starts within 100
 > bytes of physical address 0.  Having it within 100 bytes of the data
 > segment origin may cause problems also.  These problems may vary with
 > model, load address, and the phase of the moon.
 > 
 > Given the declaration:
 > 	int	space[10];
 > then
 > 	&space[i]
 > has guaranteed meaning when i = 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10, and 
 > no other values of i.  You can't use space[10] because it's out of range, 
 > but &space[10] is guaranteed to have a representation by the ANSI standard.

Here I agree, the standard (and most earlier compilers) guarantee you can use
&space[10], but in some cases I would bet this is not implemented (say in an
Intel box, where space is an integer array space[32768]).  &space[11] is not
required to be representable by the standard.  It just seems to be nearly
impossible to prevent it from being representable without exerting extra
effort.

 > 
 > 	struct foobar space[10], *x;
 > 	/* Right */
 > 	for (x = &space[0]; x < &space[10]; x++)
 > 		x->data = 0;
 > 
 > 	/* Wrong - this loop may never execute */
 > 	for (x = &space[9]; x > &space[-1]; x--)
 > 		x->data = 0;

This particular problem seems to me to have nothing to do with whether
space[9-9] is the same as space[0] or not.  That is, to speak more
precisely, (space-9)[9] is the same as space[0].

 > . . .
 > 
 > 					Gordon L. Burditt
 > 					...!texbell!sneaky!gordon

Charles Marslett
===========================================================================
[Former mathematician, former compiler writer, former ...]
Charles Marslett
STB Systems, Inc.  <== Apply all standard disclaimers
Wordmark Systems   <== No disclaimers required -- that's just me
chasm@killer.dallas.tx.us

chasm@killer.DALLAS.TX.US (Charles Marslett) (02/25/89)

In response to the discussion on using pointers to non-memory locations (or
non-pointers as pointers, or . . .),
In article <1989Feb24.171102.10397@utzoo.uucp>, henry@utzoo.uucp (Henry Spencer) writes:
> 	#define	x(n)	(space[(n)-101])
> 
> This involves one annoyance, the change from x[n] to x(n).  (But then,
> a Fortran enthusiast shouldn't mind that! :-))  But it cures the problem
> completely AND PORTABLY.  Furthermore, it shouldn't hurt the efficiency
> of the code, because on machines where the original trickery was possible,
> any decent compiler should implement the portable code that way, by folding
> constants in the address calculation.  Doing that is one #@#%$%@# of a
> lot easier than implementing EQUIVALENCE...

And I might add that on machines where direct memory addressing is faster than
register indirect addressing (Crays or Cybers, maybe? do any exist, really?),
this would be faster than referencing the array through the pointer in the
original statement.

> -- 
> The Earth is our mother;       |     Henry Spencer at U of Toronto Zoology
> our nine months are up.        | uunet!attcan!utzoo!henry henry@zoo.toronto.edu

Charles Marslett
[Future enquirng mind . . .]

===========================================================================
Charles Marslett
STB Systems, Inc.  <== Apply all standard disclaimers
Wordmark Systems   <== No disclaimers required -- that's just me
chasm@killer.dallas.tx.us

ark@alice.UUCP (Andrew Koenig) (02/26/89)

In article <7309@killer.DALLAS.TX.US>, chasm@killer.DALLAS.TX.US (Charles Marslett) writes:

>  > There is no guarantee that there is a way to represent &space[0] - 101
>  > at all.  Further, computing it may cause overflows such that
>  > 
>  > 	(&space[0] - 101) + 101 != &space[0]
> 
> Please explain?  What machine does this, or what architecture would allow
> the machine or programming language to do it?

The easiest case is if I give you a strictly checking C implementation
in which evaluating &space[0]-101 causes your program to terminate
with an error diagnostic.

For such reasons, C is explicit about what addresses can validly
be generated.  If a is an array with n elements, then &a[0],
&a[1], ... &a[n-1] are all valid addresses and can be used
freely.  Moreover, &a[n] can be generated and used for arithmetic
and comparisons, provided that no attempt is ever made to examine
or change the value of a[n].  A program that attempts to generate
the address of any other `element' of a is an invalid C program
and the implementation is permitted to do as it pleases with it.
-- 
				--Andrew Koenig
				  ark@europa.att.com

gwyn@smoke.BRL.MIL (Doug Gwyn ) (02/26/89)

In article <7309@killer.DALLAS.TX.US> chasm@killer.DALLAS.TX.US (Charles Marslett) writes:
>This is not necessarily a condemnation of any computer that shows such lack
>of concern for basic mathematics, but I would expect a computer that does
>anything but return equality in the above expression to be broken as a
>mathematical engine.  If it is acceptable for addition and subtraction to
>be other than inverse operations (taking into account the possibility of
>an error trap or whatever) the thing you are using cannot support any
>programming language I know of without simulating integer arithmetic with
>some well protected external unit.

First of all, you're wrong.  The problem with out-of-range pointer
arithmetic arises from the way that addresses have to be represented
in a segmented architecture: a (segment identifier, offset) pair.
Any arithmetic that would result in an out-of-segment address is simply
illegal; anything could happen, including hardware trap of the illegal
operation.  However, in-range address computation presents no problem.
I know of no Algol-like programming language that guarantees the ability
to compute out-of-range pointers even as an intermediate step in a
longer computation, probably because such a guarantee would severely
constrain implementations on some architectures, requiring considerably
less efficient data representations and generated code.

Secondly, no computer fully supports the kind of real-number arithmetic
(including integer arithmetic) that you were taught in school.  Instead
of saying that therefore all computers are broken, which is not helpful,
most of us learn how to remain within the range of validity of actual
hardware operations.  The proposed C Standard makes explicit what is
guaranteed here, and it does not constrain the programmer any more than
the facts of reality already do.  Out-of-range address computation
causes trouble sooner or later; I've seen many examples of that already.

>To make it even more serious, I do not know of any twos-
>complement computer that even has a signed vs. unsigned add instruction.

They are one and the same.

henry@utzoo.uucp (Henry Spencer) (02/26/89)

In article <9718@smoke.BRL.MIL> gwyn@brl.arpa (Doug Gwyn (VLD/VMB) <gwyn>) writes:
>>To make it even more serious, I do not know of any twos-
>>complement computer that even has a signed vs. unsigned add instruction.
>
>They are one and the same.

Only if the instructions don't do anything drastic about overflow.  If they
do, then the two are different, as on (for example) the MIPS machines, which
have (if memory serves) a signed add that traps overflows and an unsigned
add that just wraps around.
-- 
The Earth is our mother;       |     Henry Spencer at U of Toronto Zoology
our nine months are up.        | uunet!attcan!utzoo!henry henry@zoo.toronto.edu

throopw@agarn.dg.com (Wayne A. Throop) (02/28/89)

> henry@utzoo.uucp (Henry Spencer)
> [... avoid calculating addresses that fall outside an array because ...]
> It is not true that all machines have a
> single linear homogeneous address space.  Some have it broken up into
> pieces, and bad things can happen if your pointer arithmetic goes over
> the edge of a piece.

Several people have stated the problem as if it were absent on
architectures with linear homogeneous address spaces.  This is not the
case.  Even linear homogeneous address spaces can have over (and under)
flow problems for address calculations occurring on arrays allocated
"near" the beginning or ending of the whole address space.

The example given was adjusting the zero-point of an array like so:

     int space[10], *x=(space-101);

so that x[101] through x[110] would refer to allocated chunks of memory,
without wasting the space for elements 0-100.  In other words, a poor
person's (int [101:110]) type.

My point is this.  Assumng for whatever reason that one has a large,
homogenous, linear address space, and pointer arithmetic checks for
overflow and underflow... AND the "space" array is allocated within
100 integer-sized words from the beginning of the address space.
Naturally, the calculation that initializes x will fail.

And most important to note: LOOK MA: NO SEGMENTATION!

( It seems to me that Henry and Chris and other gurus are aware of this
  fact.  I just thought it ought to be pointed out more strongly that
  segmentation isn't at fault here. )  (For once.)
--
"Here, look him up.  Oh yeah, it's under soh-creits."
                              --- Bill and Ted's Excellent Adventure
Wayne Throop      <the-known-world>!mcnc!rti!xyzzy!throopw

gordon@sneaky.TANDY.COM (Gordon Burditt) (02/28/89)

> > UNPORTABILITY ALERT!
> > There is no guarantee that there is a way to represent &space[0] - 101
> > at all.  Further, computing it may cause overflows such that
> > 
> > 	(&space[0] - 101) + 101 != &space[0]
>Please explain?  What machine does this, or what architecture would allow
>the machine or programming language to do it?

Ok, I'm going to use the Intel 80386 as an example.  There are lots of
combinations of memory models and processor modes to consider.
It is perhaps possible to make unportable code that creates pointers that
run off the end of arrays like this "work right" but it comes at a very 
large cost in code size and execution time.

Problem 1:  in protected mode, loading a bad segment number (putting a bad
pointer into a register) causes traps.
Problem 2:  in non-protected mode, the pointer math necessary to 
handle both out-of-range pointers and arrays > 64k is very expensive.
It is possible to blow it in a way that works on an 8086 and fails on an 80386.

> > It could happen that (&space[0] - 101) == (int *) NULL.  Even computing
> > &space[0] - 101 could cause core dumps (MSDOS equivalent:  hang system,
> > forcing use of RESET, optionally trashing the FAT on the hard disk).
>
>This is not necessarily a condemnation of any computer that shows such lack
>of concern for basic mathematics, but I would expect a computer that does
>anything but return equality in the above expression to be broken as a

*POINTER* arithmetic is not basic mathematics.  Segmented architectures
usually have a segment number, which is ordinarily a "magic token" that
you don't do math on, and an offset, which you can do math on.  How (and
whether) you handle carries from the offset to the segment depends on
the memory model.  There isn't usually an "increment pointer" instruction.

* * * WARNING * * * READING FURTHER MAY INDUCE VOMITING * * * WARNING * * *

Consider the 80386 protected mode (with 16-bit addressing).  A segment 
register (16 bits) contains an index (segment number), a table indicator 
(global or local), and a privilege level.  A memory access requires a
16-bit offset, computed from fields of the instruction and various registers,
depending on the address mode, and a reference to one of 6 segment registers.
The machine instructions don't provide a carry from the offset part to the
segment.

When you load a segment register, information is loaded from the descriptor
tables on the origin, length, and privileges of the segment.  If the 
segment isn't valid, you get one of several types of traps.  NO ACCESS 
NEED BE ATTEMPTED - JUST LOADING A BAD SEGMENT NUMBER INTO A SEGMENT
REGISTER CAUSES A TRAP.  (A NULL pointer is a special case:  you can load it,
but it traps if you try to dereference it).  Why do it this way?  It saves
a substantial amount of time doing this when the segment register is loaded 
instead of fetching and checking this information on every access.

Small model:  pointers are 16 bits, and pointer math works fine. 64k data max.
Large model:  pointers are 32 bits, and pointer math works fine, with no
carries into the segment number.  No object (including whole arrays) may 
exceed 64k.
Huge model:  pointers are 32 bits.  Pointer math must propagate carries
into the segment number.  (To further complicate things, carries have to
propagate into the segment number field, which does NOT start in the low-order
bit of the segment register).  Huge arrays must have consecutive segment 
numbers.  

Now, what happens if you want to compute &foo[-256] (foo is an int array)?  
foo starts at 88:0x0000.  (Note:  NULL is 0:0x0000, and pointer comparisons
(OUGHT TO) compare 32 bits)  So &foo[-256] is 87:0xFE00.  As it happens, 
there is no segment 87 allocated.  If you put &foo[-256] into a register 
(pair), you get a General Protection Trap.  NO ACCESS NEED BE ATTEMPTED.

Consider the 80386 protected mode (with 32-bit addressing).  This is
the same as 16-bit addressing, except the offset can be much bigger, and
each pointer gets 16 bits larger.  One segment can be so big nobody would 
ever need more than one, right? Wasn't it 15 years ago when 64k was 
considered more than anyone would need on a micro? Chances are there won't 
be much widespread serious need for anything but small model for a decade 
or two.  It is possible for an OS to not support 32-bit addressing, so an 
application might be stuck with 16-bit addressing.  Also, 32-bit addressing 
makes the code bigger.

Consider 80386 real address mode, or virtual 8086 mode.  In this mode, the
segment register is a number of 16-byte "paragraphs", so that 
physical address = 16 * segment register + offset, rather than a segment
number.  The machine instructions don't provide a carry from the offset
computation into the segment part.

Small model:  pointers are 16 bits, and pointer math works fine. 64k data max.
Large model:  pointers are 32 bits, and pointer math works fine, with no
carries into the segment register.  No object, including arrays, may exceed 64k.
Huge model:  pointers are 32 bits.  Pointer math must propagate carries into
the segment number in a weird sort of way.  After doing pointer math, it is 
usually necessary to "normalize" the pointer before using it.  Pretending that 
a pointer is a structure, which it's not, normalization can be done by: 
	ptr.seg += ptr.offset >> 4; ptr.offset &= 0xf;  
This is fairly expensive in instructions to compute.  Also, this doesn't 
always work for out-of-range pointers.  
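
As a sketch, with an invented struct standing in for the pieces of a
far pointer (real compilers do this inline or in a runtime-library
helper):

	struct farptr {
		unsigned seg;		/* paragraph number, real mode */
		unsigned off;
	};

	void normalize(p)
	struct farptr *p;
	{
		p->seg += p->off >> 4;	/* fold 16-byte paragraphs into seg */
		p->off &= 0xf;		/* leave an offset in 0..15 */
	}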

Why normalize?  Consider trying to access a 2-byte quantity at 0x2400:FFFF
( = 0x33FFF).  On a 80386 in real address or virtual 8086 mode, you get a 
General Protection Trap.  On a real 8086, you get one byte from 0x2400:FFFF
and one byte from 0x2400:0000 ( = 0x33FFF and 0x24000), which is wrong.
If you normalize this to 0x33FF:000F, the access works properly.

Now, consider that &foo is at 0x0000:0022.  ( = physical address 0x00022)
&foo[-256] has many representations, the normalized one being 0xFFE2:0002.  
The trouble is, on a 80386 (real address or virtual 8086 mode), adding 256
( * sizeof int) to this (giving 0xFFE2:0202) accesses physical address 
0x100022, which is the wrong memory location.  On a real 8086, this accesses 
physical address 0x00022, which is correct.  (If you don't believe these 
differences, read the 80386's Programmer's Reference Manual, Section 14.7, 
paragraphs 7 and 18.  The 80386 in these modes can access 1 meg + 64k).  This 
part isn't a problem if you don't let the pointer value go outside the array.

The point of this longwinded mess is that in order to handle BOTH > 64k arrays
and arrays with a 0-origin outside the array, EVERY access must involve 
pointer arithmetic followed by normalization, and you can't use the
displacement field in instructions, even if it is known that you just
normalized the pointer, and the displacement is small and positive.  Even
if you're trying to access the second half of a 2-byte quantity.  Pointer
normalization is expensive, nearly a dozen instructions.  Just handling
> 64k arrays is bad enough.  Complicating it with out-of-range pointers
makes it much worse.

Microsoft's huge model doesn't handle the full generality required here.
It requires that if you have an array > 128k, its size be a power of
2.  Guess why these restrictions exist?

I think all of the mentioned characteristics of the 80386 in real-address 
mode or 16-bit protected mode also apply to the 80286.

				Gordon L. Burditt
				...!texbell!sneaky!gordon

chasm@killer.DALLAS.TX.US (Charles Marslett) (03/02/89)

In article <8966@alice.UUCP>, ark@alice.UUCP (Andrew Koenig) writes:
> In article <7309@killer.DALLAS.TX.US>, chasm@killer.DALLAS.TX.US (Charles Marslett) writes:
> 
> >  > There is no guarantee that there is a way to represent &space[0] - 101
> >  > at all.  Further, computing it may cause overflows such that
> >  > 
> >  > 	(&space[0] - 101) + 101 != &space[0]
> > 
> > Please explain?  What machine does this, or what architecture would allow
> > the machine or programming language to do it?
> 
> The easiest case is if I give you a strictly checking C implementation
> in which evaluating &space[0]-101 causes your program to terminate
> with an error diagnostic.

I did not disagree that it was possible to design a compiler/computer
combination that had this difficulty -- my point was that given the amount
of code it was likely to break, would anyone write such a compiler?  Or,
more precisely, again: HAS ANYONE SEEN THIS PROBLEM, and what computer was it
on, and what compiler?  I want a specific "the KLUTZY 29020 with the GCC 5.13
compiler does blah, blah, blah" ...

Otherwise, yes you can do anything you please with the architecture and the
language once you leave the standards protected area (and some do not even
keep the K&R or Ansi defined language sacred).  Some such implementations are
relatively useless though, and just never appear in practice.  I was
suggesting this limitation is one (not that it is impossible).

Charles
> . . .  A program that attempts to generate
> the address of any other `element' of A is an invalid C program
> and the implementation is permitted to do as it pleases with it.

Once again, I did not say that this behavior is covered by the standard,
I said it is useful behavior and universal (or nearly so).

Before the recent rounds of Ansi drafts, there was no requirement (oops,
someone may nail me on this, but if I err in detail, something had to be
unspecified) that sizeof(xxx) return the size of xxx in units of "char"s.
All (or almost all) C compilers do so and have since the beginning of time
(or Unix).  I am sure much code would break if they did not.
[if this is not an accurate statement of the interpretation of "sizeof",
substitute something that was undefined under K&R, often or always implemented
one way or another, and standardized under ANSI -- there is such a beast,
isn't there?]

> -- 
> 				--Andrew Koenig

--
Charles Marslett

chasm@killer.DALLAS.TX.US (Charles Marslett) (03/02/89)

In article <9718@smoke.BRL.MIL>, gwyn@smoke.BRL.MIL (Doug Gwyn ) writes:
> In article <7309@killer.DALLAS.TX.US> chasm@killer.DALLAS.TX.US (Charles Marslett) writes:
> >This is not necessarily a condemnation of any computer that shows such lack
> >of concern for basic mathematics, but I would expect a computer that does
> >anything but return equality in the above expression to be broken as a
> >mathematical engine.  If it is acceptable for addition and subtraction to
> >be other than inverse operations (taking into account the possibility of
> >an error trap or whatever) the thing you are using cannot support any
> >programming language I know of without simulating integer arithmetic with
> >some well protected external unit.
> 
> First of all, you're wrong.  The problem with out-of-range pointer
> arithmetic arises from the way that addresses have to be represented
> in a segmented architecture: a (segment identifier, offset) pair.
> Any arithmetic that would result in an out-of-segment address is simply
> illegal; anything could happen, including hardware trap of the illegal
> operation.  However, in-range address computation presents no problem.

To repeat my original statement:  ANYTHING COULD HAPPEN, but anything does
not happen, because it makes the construction of the computer more difficult.
My point was that making funny things happen with some arithmetic and not
with other arithmetic (since integer arithmetic is constrained by expectations
if not by the standards) is hard to do (so designers don't).

> I know of no Algol-like programming language that guarantees the ability
> to compute out-of-range pointers even as an intermediate step in a
> longer computation, probably because such a guarantee would severely
> constrain implementations on some architectures, requiring considerably
> less efficient data representations and generated code.

I disagree that it is not guaranteed because of some positive good from
not guaranteeing it.  It is not guaranteed because it is not very useful
in a language that has 0-based arrays.  A computer that made such references
"hard" to do would almost certainly result in, as you said, less efficient
data representations and generated code -- if the language were Fortran or
Pascal (where arrays do not have to start at 0 or even at a representable
address).  One might be forced to do an additional subtraction for every
subscript reference.

> Secondly, no computer fully supports the kind of real-number arithmetic
> (including integer arithmetic) that you were taught in school.  Instead
> of saying that therefore all computers are broken, which is not helpful,
> most of us learn how to remain within the range of validity of actual
> hardware operations.  . . .

But do any fail to support this subset?  (I have guessed Burroughs may;
Intel's architectures, except perhaps for the 432, certainly do allow
such pointers, and the pointers work as expected in the large and small
models -- in the huge model, compiler bugs keep much portable code from
working, but I think if all portable code could be made to work in that
model, this would as well.)

> >To make it even more serious, I do not know of any twos-
> >complement computer that even has a signed vs. unsigned add instruction.
> 
> They are one and the same.

You fell into the same "oops" I did: the IBM 360 had two different instructions
(has ...) that differ only in the way the condition codes are set (overflow
and the like).  So there is a difference.  But not for the purposes of this
discussion, since the condition code can be ignored (and usually is).


So, . . . any real machine/compiler combinations out there?

Please, . . .

Charles

gwyn@smoke.BRL.MIL (Doug Gwyn ) (03/03/89)

In article <7389@killer.DALLAS.TX.US> chasm@killer.DALLAS.TX.US (Charles Marslett) writes:
>> The easiest case is if I give you a strictly checking C implementation
>> in which evaluating &space[0]-101 causes your program to terminate
>> with an error diagnostic.
>I did not disagree that it was possible to design a compiler/computer
>combination that had this difficulty -- my point was that given the amount
>of code it was likely to break, would anyone write such a compiler?

There are compilers like that for architectures like that, as
recently explained in detail by somebody else, and reasonably-
written code works fine.  Why do you think it should be valid
to express the address of a nonexistent object?

>Before the recent rounds of Ansi drafts, there was no requirement (oops,
>someone may nail me on this, but if I err in detail, something had to be
>unspecified) that sizeof(xxx) return the size of xxx in units of "char"s.

Yes, the units of "sizeof" weren't specified.  Since all common
early implementations of C chose to use units of chars, too much
code became utterly dependent on sizeof(char)==1.  I argued for
not requiring that in the proposed standard, but was outnumbered
by those who wanted to retroactively bless that assumption.

dan@ccnysci.UUCP (Dan Schlitt) (03/10/89)

Please allow me to try to cut through all of this complicated
discussion of specific behavior of arithmetic for various data types
on sundry computers.  If one insists on thinking like a mathematician
then one must understand that the mathematical objects that we call
ints, floats, pointers, .... are special objects with peculiar
properties.  They simply are not the objects that we learned about in
elementary school.  Any expectations derived from our experience with
the natural numbers or the reals may be wrong.

They are finite sets.  They frequently are not closed under common
operations like "addition".  When they are closed the behavior may be
strange.

Most folk who deal with computers avoid this swamp by not thinking
like mathematicians.  They deal with the objects as they find them.
This generally keeps them out of trouble.  Most of the answers in this
thread have been of this sort.  They are correct answers, but they
don't satisfy the mathematician in us.

Even though my background is theoretical physics and quantum field
theory, I have never found the discussion of computer arithmetic as a
topic in abstract mathematics very attractive.  But if you are going
to do it then you have to do it right.  A good first step probably is
to stop using "+" and "-" as binary operators in the discussion.  Then
you won't be confused by the arithmetic of natural numbers.

This, of course, ignores all of the botches which have been imposed on
us by various designers.  W.M. Kahan didn't write the program called
paranoia just for fun.  The type of errors that it tests for actually
occur in the computer arithmetic of well-known computers.  (The program
is available from netlib.)  Some examples are given in his paper in
Information Processing 71, North-Holland (1972).
-- 
Dan Schlitt                        Manager, Science Division Computer Facility
dan@ccnysci                        City College of New York
dan@ccnysci.bitnet                 New York, NY 10031
                                   (212)690-6868

mat@mole-end.UUCP (Mark A Terribile) (03/11/89)

>  > There is no guarantee that there is a way to represent &space[0] - 101
>  > at all.  Further, computing it may cause overflows such that
>  > 	(&space[0] - 101) + 101 != &space[0]
> ... what architecture would allow [this]?
>  > ...  Even computing &space[0] - 101 could cause core dumps ...
 
> ... but I would expect a computer that does anything but return equality in
> the above expression to be broken as a mathematical engine.

Actually not.  It's broken as an addressing engine.

Some machines require pointers to be handled in special registers, or to be
composed of more than one register (Intel segmented architecture).  Address
computation is not performed by addition instructions but by loading a base
register and an index register and executing a ``load address'' or ``compute
address'' instruction.  

Unfortunately, some such architectures, in order to maintain performance
based on the way their designers assumed that the registers would be used
(I don't know if they were still studying FORTRAN execution on early
System 360s or what) force the address to be validated against limit registers
and page valid registers and so forth when one of the address registers is
loaded.  Because of the normal limitations in the ``normal'' machine models,
the Intel machines are less vulnerable than they might be, but one can easily
conceive of a machine in which addressing exceptions would be generated at the
decrement of a register.  Or in which an address register would be marked
INVALID and all future instructions ignored until the register was reloaded
from memory.

Why not use ordinary instructions and only load the address registers when you
really want to perform the indirection?  Perhaps the address space is 18 bits
or 22 bits wide and all your arithmetic instructions are limited to 8 bits
and an add-with-carry takes half-again as long as an add.  (It *shouldn't*,
but in a world where Intel can create their bizarre segmented architecture,
anything can happen.)  Perhaps the address validation on the segment register
is *sooo* costly that the compiler prefers to keep it loaded.  (Validation on
the 286 in virtual mode requires two table lookups accomplished in firmware
and takes more than 20 clock cycles.)

Such designs may be DUMB, but that doesn't stop people from thinking that
they are neat ideas.  One of these days I will read up on the 386 and see
what they had to bolix to get the large contiguous instruction space.
-- 

(This man's opinions are his own.)
From mole-end				Mark Terribile

aglew@mcdurb.Urbana.Gould.COM (03/16/89)

>I have never found the discussion of computer arithmetic as a
>topic in abstract mathematics very attractive.

Topic for another newsgroup, but abstractions of computer arithmetic
can be interesting.  For example, binary encoding gives us addition,
parallel multiplication, and comparison in O(lg N) time in the number
of bits; redundant encodings give us O(1) parallel addition at the cost
of more difficult comparisons; residue encodings give O(1) parallel
addition and multiplication (well, actually O(lg M), where M is a number
related to the number of primes within the range you want to work on)
but considerably more complicated comparisons; and logarithmic encoding
gives O(1) parallel multiplication and division, at the cost of subtraction...
    Q: is there any sort of "Conservation of [parallel] complexity"
theorem for finite arithmetic?

mouse@mcgill-vision.UUCP (der Mouse) (03/26/89)

In article <1388@ccnysci.UUCP>, dan@ccnysci.UUCP (Dan Schlitt) writes:
> W.M. Kahan didn't write the program called paranoia just for fun.
> [...] (The program is available from netlib.)

My letter's gone unanswered (so far), if in fact it ever got to
ccnysci, and Dan's posting is about to expire....

What's netlib?  How can I get hold of this, given Internet access and
UUCP mail, but not bitnet?  (If I can, that is.)

					der Mouse

			old: mcgill-vision!mouse
			new: mouse@larry.mcrcim.mcgill.edu