[comp.lang.c] is this array access portable?

mouse@thunder.mcrcim.mcgill.edu (der Mouse) (06/24/91)

Suppose `sometype' is a type, and `something' is a valid thing to
assign to an object of that type, and I write

	sometype foo[40][50];
	sometype *fp;
	int i;

	fp = &foo[0][0];
	for (i=2000;i>0;i--) *fp++ = something;

Is this portable?  (The significant question is whether the wraparound
from the end of one row to the beginning of the next is guaranteed to
work correctly.)  I *think* enough promises are made that this will
work, but I'd like to get a few second opinions before writing this
into some code.  The thing that makes me think it has to work is a
sentence in the New Testament A7.4.8 (page 204): ``[...]: the size of
an array of n elements is n times the size of one element.'', which,
applied recursively, appears to imply that the elements of an
N-dimensional array must be packed together as if in a one-dimensional
array of the appropriate size.  (I'm also not *quite* certain that even
if this is true, a pointer can sleaze past the boundary safely....)

					der Mouse

			old: mcgill-vision!mouse
			new: mouse@larry.mcrcim.mcgill.edu

worley@compass.com (Dale Worley) (06/24/91)

In article <1991Jun23.185351.5695@thunder.mcrcim.mcgill.edu> mouse@thunder.mcrcim.mcgill.edu (der Mouse) writes:
   The thing that makes me think it has to work is a
   sentence in the New Testament A7.4.8 (page 204): ``[...]: the size of
   an array of n elements is n times the size of one element.'', which,
   applied recursively, appears to imply that the elements of an
   N-dimensional array must be packed together as if in a one-dimensional
   array of the appropriate size.  (I'm also not *quite* certain that even
   if this is true, a pointer can sleaze past the boundary safely....)

I think the trouble you get into is that you can't guarantee to
increment the pointer from the last element of one row to the first
row of another, even if they are stored adjacently.  (See 3.3.6.)
However, I can't work out a practical example.

Dale Worley		Compass, Inc.			worley@compass.com
--
Taking a trip by air? Next time ask to be seated in the "Comedy" section.

kremer@cs.odu.edu (Lloyd Kremer) (06/25/91)

In article <1991Jun23.185351.5695@thunder.mcrcim.mcgill.edu> mouse@thunder.mcrcim.mcgill.edu (der Mouse) writes:
>
>	sometype foo[40][50];
>	sometype *fp;
>	int i;
>
>	fp = &foo[0][0];
>	for (i=2000;i>0;i--) *fp++ = something;
>
>Is this portable?  (The significant question is whether the wraparound
>from the end of one row to the beginning of the next is guaranteed to
>work correctly.)

Arrays are guaranteed contiguous, therefore arrays of arrays (being arrays
themselves) are guaranteed contiguous.

foo is an array of 40 contiguous objects each of which is an array of 50
contiguous sometypes.  So foo occupies exactly 40 * 50 * sizeof(sometype)
contiguous bytes of memory, the same as "sometype foo[2000]" would.

					Lloyd Kremer
					Hilton Systems, Inc.
					kremer@cs.odu.edu

baligag@gtephx.UUCP (Ganesh Baliga) (06/26/91)

In article <1991Jun25.135920.4120@cs.odu.edu>, kremer@cs.odu.edu (Lloyd Kremer) writes:
> In article <1991Jun23.185351.5695@thunder.mcrcim.mcgill.edu> mouse@thunder.mcrcim.mcgill.edu (der Mouse) writes:
> >
> >	sometype foo[40][50];
> >	sometype *fp;
> >	int i;
> >
> >	fp = &foo[0][0];
> >	for (i=2000;i>0;i--) *fp++ = something;
> >
> >Is this portable?  (The significant question is whether the wraparound
> >from the end of one row to the beginning of the next is guaranteed to
> >work correctly.)
> 
> Arrays are guaranteed contiguous, therefore arrays of arrays (being arrays
> themselves) are guaranteed contiguous.
> 
> foo is an array of 40 contiguous objects each of which is an array of 50
> contiguous sometypes.  So foo occupies exactly 40 * 50 * sizeof(sometype)
> contiguous bytes of memory, the same as "sometype foo[2000]" would.
> 
> 					Lloyd Kremer
> 					Hilton Systems, Inc.
> 					kremer@cs.odu.edu
 

Is the row major addressing scheme part of the C language definition
or is it one of those unwritten rules for C compiler writers ? If
this varies over compilers, all hell could break loose.

Could someone please comment ?

msb@sq.sq.com (Mark Brader) (06/27/91)

> 	sometype foo[40][50];
> 	sometype *fp;
> 	int i;
> 
> 	fp = &foo[0][0];
> 	for (i=2000;i>0;i--) *fp++ = something;
> 
> Is this portable?  (The significant question is whether the wraparound
> from the end of one row to the beginning of the next is guaranteed to
> work correctly.)

This is interesting.  The answer is that the *above* is valid ANSI C, but
the seemingly equivalent last line

	for (i=0;i<2000;++i) fp[i] = something;   i=0;

*isn't* valid.

The reason is this.  The standard does guarantee that array elements
are stored in contiguous fashion as you expect, and that two-dimensional
arrays work in row major order (because they are really arrays of arrays).
Consequently, &foo[0][50] and &foo[1][0] are equal.  And since fp is
equal to the pointer to which foo[0] decays, &fp[50] equals &foo[1][0].

You might expect that &fp[51] would therefore be equal to &foo[1][1].
It *isn't* -- any more than 3*xx/xx is equal to 3 when xx is zero.
Computing &fp[51] involves an out-of-bounds array reference and is
undefined behavior in ANSI C, whereas &foo[1][1] is valid.

Computing &fp[50] is also an out-of-bounds array reference, but you're
allowed to go *one* position past the upper bound in ANSI C if you don't
dereference the resulting pointer.  Assigning to fp[50] is an error.
The second version of the loop could fail on the iteration when i is 50.

If this is so, why is the first version valid?

Well, the difficulty in the second version is not the transition from
the first to the second row of the large array; it's in the addition
of 51 to fp.  If you increase the pointer value by steps of 1 at a time,
in due course you reach the magic value which is one past the end of the
first row -- AND is guaranteed to also be the beginning of the second row.

That is, in the first version of the loop, at some point you have the
value of &foo[0][49] in fp.  You indirect through that, which assigns to
foo[0][49].  Then you increment fp.  This increment takes it from &foo[0][49]
to &foo[0][50], which is valid since it doesn't go more than one place past
the end of the array foo[0].  Now ordinarily you couldn't indirect through
this pointer value.  But in this case you *can*, because it is known to
be equal to &foo[1][0], which you can indirect through all right.  And
then when you increment this, of course, you get &foo[1][1], and so on
to the end.

I checked by email with Doug Gwyn before posting this, and he confirmed
that there had been an interpretation ruling on this or a very similar
case.

Now I have answered the question theoretically in terms of ANSI C.
In terms of K&R C, both loops are illegal, as the "one past the end" rule
didn't exist in K&R.  And in practical terms, very few implementations
would reject either one, simply because very few do array bounds checking.

But bounds checking *is* the only issue here; the alignment of the array
elements is guaranteed.  If you wanted to write a loop like the second
one without problems from bounds checking, you could either use an array
in a union:

	union {
 		sometype f1 [40][50];
 		sometype f2 [40*50];
	} foo;

or you could malloc() the array.
-- 
Mark Brader, Toronto	    "If you feel [that Doug Gwyn] has a bad attitude,
utzoo!sq!msb, msb@sq.com     then use lint (or Chris Torek...)" -- Joe English

This article is in the public domain.

paul@u02.svl.cdc.com (Paul Kohlmiller) (06/28/91)

baligag@gtephx.UUCP (Ganesh Baliga) writes:

>Is the row major addressing scheme part of the C language definition
>or is it one of those unwritten rules for C compiler writers ? If
>this varies over compilers, all hell could break loose.

>Could someone please comment ?
I don't think you will see the words "row major" in the ANSI standard but then
I also don't think you will see multidimensional arrays either. C has arrays of
arrays but not multidimensional arrays. This distinction answers your question.
The item a[0][1] follows a[0][0] because the array a[0] must be filled up 
before preceeding with array[1]. Did I explain that well enough?
I don't know of any C compilers that do not use row major order because it 
would violate this C notion of arrays of arrays. On the other hand, I do know
of Fortran compilers that can switch between row major and column major arrays.
BTW, earlier in this thread someone said that arrays fill contiguous space but
there is a catch in there. The X3J11 committee said the overindexing arrays to
get to another part of the array is not guaranteed to work. For example;
int a[4][4];
/* Is a[0][4] the same memory location as a[1][0] ? */
The committee said that it is not necessarily so but it is one of those
arguments that involves a compiler noone uses.
(Assuming I wrote it down right.)
Paul Kohlmiller

--
     // Paul H. Kohlmiller           //  "Cybers, Macs and Mips"         //
     // Control Data Corporation     // Internet: paul@robin.svl.cdc.com   //
     // All comments are strictly    // America Online: Paul CDC         //
     // my own.                      // Compuserve: 71170,2064           //

mouse@thunder.mcrcim.mcgill.edu (der Mouse) (06/29/91)

In article <1991Jun25.135920.4120@cs.odu.edu>, kremer@cs.odu.edu (Lloyd Kremer) writes:
> In article <1991Jun23.185351.5695@thunder.mcrcim.mcgill.edu> mouse@thunder.mcrcim.mcgill.edu (der Mouse) writes:
>>	sometype foo[40][50]; sometype *fp; int i;
>>	fp = &foo[0][0]; for (i=2000;i>0;i--) *fp++ = something;
>> Is this portable?

> Arrays are guaranteed contiguous, therefore arrays of arrays (being
> arrays themselves) are guaranteed contiguous.

Yes, but that doesn't necessarily mean that a pointer can run through
the whole array as I outlined above.

(By the way, does the answer to my original question change any if
"sometime" is known to be "char"?)

					der Mouse

			old: mcgill-vision!mouse
			new: mouse@larry.mcrcim.mcgill.edu

mouse@thunder.mcrcim.mcgill.edu (der Mouse) (06/29/91)

>> 	for (i=2000;i>0;i--) *fp++ = something;

> This is interesting.  The answer is that the *above* is valid ANSI C,
> but the seemingly equivalent last line

> 	for (i=0;i<2000;++i) fp[i] = something;   i=0;

> *isn't* valid.

> [...] Consequently, &foo[0][50] and &foo[1][0] are equal.

They are?  Is it legal to compare them?  (Do we even know that
foo[0][49] is stored adjacent to foo[1][0], as the stuff I quoted from
the sizeof section seems to imply?  I'm not as convinced of this as I'd
like to be.  But let's ignore that for now.)

> That is, [...] at some point you have the value of &foo[0][49] in fp.
> [...].  Then you increment fp.  This increment takes it from
> &foo[0][49] to &foo[0][50], which is valid since it doesn't go more
> than one place past the end of the array foo[0].  Now ordinarily you
> couldn't indirect through this pointer value.  But in this case you
> *can*, because it is known to be equal to &foo[1][0], which you can
> indirect through all right.

That's not entirely clear to me.  Why does &foo[0][50] have to point
anywhere?  And if it does point somewhere, why must it point to
&foo[1][0]?

> I checked by email with Doug Gwyn before posting this, and he
> confirmed that there had been an interpretation ruling on this or a
> very similar case.

That's reassuring, but also somewhat disturbing, because of the
uncertainties above.  Is the text of that ruling available?

I feel somewhat more confident of my actual code, because in that case
"sometype" is "char", to which special guarantees seem to apply, due to
the existence of memset and similar functions and the grandfather
clause equivalencing void * and char *.

					der Mouse

			old: mcgill-vision!mouse
			new: mouse@larry.mcrcim.mcgill.edu