[comp.lang.pascal] Array storage order

garry@batcomputer.tn.cornell.edu (Garry Wiegand) (05/30/87)

We're trying to arrange for a C subroutine library to be callable from
Fortran and Pascal and also transportable between machines. 

One of the questions that has come up is with respect to the storage
order of 2-dimensional arrays. In C, array element [1][2] is stored
in memory immediately after element [1][1]. I believe this is locked
into the language - pointer arithmetic and all that. On the Fortran
compilers I can get to, (2,1), not (1,2), immediately follows (1,1).
The pascal I haven't investigated yet.

Question: is this array storage order mandated by the Fortran-77 standard?
How about the Pascal standard?   If so, then I'll document the arrays as being
"backwards" in these languages. If not, I'll keep two code versions. 
Obviously, two manual versions would be less hassle.

Help appreciated.

garry wiegand   (garry@oak.cadif.cornell.edu - ARPA)
		(garry@crnlthry - BITNET)

papowell@umn-cs.UUCP (Patrick Powell) (05/30/87)

In article <1215@batcomputer.tn.cornell.edu> garry@oak.cadif.cornell.edu writes:
>We're trying to arrange for a C subroutine library to be callable from
>Fortran and Pascal and also transportable between machines. 
>
>One of the questions that has come up is with respect to the storage
>order of 2-dimensional arrays. In C, array element [1][2] is stored
>in memory immediately after element [1][1]. I believe this is locked
>into the language - pointer arithmetic and all that. On the Fortran
>compilers I can get to, (2,1), not (1,2), immediately follows (1,1).
>The pascal I haven't investigated yet.
>
>Question: is this array storage order mandated by the Fortran-77 standard?
>How about the Pascal standard?   If so, then I'll document the arrays as being
>"backwards" in these languages. If not, I'll keep two code versions. 
>Obviously, two manual versions would be less hassle.
>
>Help appreciated.
>
>garry wiegand   (garry@oak.cadif.cornell.edu - ARPA)
>		(garry@crnlthry - BITNET)

This is a rather involved discussion of arrays in FORTRAN, PASCAL, and
C If you want more details, I suggest that you might want to look at a
book on programming languages, such as "Principles of Programming
Languages" by Bruce MacLennan, "Programming Languages" by Ghezzi and
Jazayeri,  or "Design and Implementation of Programming Languages" by
Pratt.  The MacLennan is the easiest to read,  the Ghezzi is the most
technical.

If you document things,  document them correctly.  Briefly,  FORTRAN
is COLUMN major,  C and Pascal are ROW major.

FORTRAN
-------
Array storage order in FORTRAN is mandated
by the FORTRAN language standard (ANSI X3.9-1966 and 1977)
to be COLUMN MAJOR.  For example, A(3,2) would have representation in memory
A(1,1) A(2,1) A(3,1) A(1,2) A(2,2) A(3,2)

Actually,  what is specified is a method of translation of an array
subscript expressions into a position in the array.  For an array
declaration of the form: A(ll1:ul1, ll2:ul2,...lln:uln) (Max of 3
dimensionsin F66,7 in F77) a subscript expression of the form:
A(i1,i2,i3,...i7) has a subscript value of the form:
SF1*(i1-ll1)+SF2*(i2-ll2)+...+SFn(in-lln), where SFi is a "scale
factor";  if you have an array of ojects,
SF1 = 1
SF2 = (ul1-ll1+1)*1 = (elements in previous dimension)*(size of element)
    = M1*SF1
...
SFn = (ul(n-1)-ll(n-1)+1)*SF(n-1)
The size (number of elements) in the array is SZ=M1*M2*...*Mn = SFn+1

The array is actually a "chunk" of memory, of size SZ.
A position in the array is given by the value of the subscript
value.
If you work out the arithmetic,  you will find that A(1,x,y...) is
immediately followed by A(2,x,y...),  etc.,  which indicates that
the first column is stored first,  the next column next,
and so forth: i.e.- column major or column order.

What happens when you pass an array or an array element to a
subroutine/function? (I will refer to this as procedure).
Fortran passes variables (arrays and variables) by REFERENCE; it passes
expressions by VALUE.  Usually this is implemented by passing the
address of an array or variable,  and by evaluating an expression,
stuffing the value in some "anonymous variable", and passing the
address of the anonymous variable.

If the formal parameter (dummy variable) of a procedure is declared as an
array,
the actual parameter must be an array,  or an array element.
The size/dimensions of the array may be different than that of the
original array;  for example,  the following is (ugly) but acceptable:

A(2,3)           SUBROUTINE X(V)
... CALL X(A)    REAL V(2,2,2)

In the subroutine,  the array is "bound" to start at the actual parameter
position,  and continue until the end of the actual parameter array.
For this reason:
A(1,1) -> V(1,1,1)
A(2,1) -> V(2,1,1)
A(3,1) -> V(1,2,1)
A(1,2) -> V(2,2,1)
A(2,2) -> V(1,1,2)
A(3,2) -> V(2,1,2)

What about V(1,2,2)?  This references beyond the original (actual) array bounds,
and is an "undefined" (NOT ILLEGAL:  UNDEFINED!) reference.

By the way,  this is also legal:

A(2,3)           SUBROUTINE X(V)
... CALL X(A)    REAL W(1)
                    W(1) = W(2)+ W(3) + W(4)+ ...

Clearly CALL X(A(1,2)) will result in the following
A(1,2) -> V(1,1,1)
A(2,2) -> V(2,1,1)
A(3,2) -> V(1,2,1)
 ... (undefined past here)

You can pass a column of an array at a time,  using:
CALL A(1,1)    SUBROUTINE A(X)
CALL A(1,2)    REAL X(2)
               X(1) = X(2) ....

PASCAL
------

In pascal, arrays are declared as:
v: array [type] of base; {type can be subrange like 1..3, base is base type}
v: array [1..3] of real;
x: array[1..3] of array [1..2] of real;

To make life easier on people,  the language has a "rewriting rule" or
"sytactic redefintion rule",  stating that:
x : array [1..3,1..2] of real;  
is transformed/rewritten as:
x: array[1..3] of array [1..2] of real;

Similarly,  the array subscript x[i,j] ===> x[i][j] ===>(x[i])[j]
The last is actually the precedence relation of the subscript operator.

Now,  let us assume that elements in an array (single dimension array)
are stored in sequential order in memory.  AS FAR AS I KNOW,  THIS IS NOT
REPEAT NOT a language requirement.  However,  there are few reasons to
do it otherwise.  (Quiet in the back,  you animals!  Sparse arrays be damned!)

Now we have the interesting table: 
v[1,1] v[1,2] v[2,1] v[2,2] v[3,1] v[3,2]
That is,  the first ROW is stored first,  the next second, etc.

It turns out that the formula that was used for determing the position
of an element in an array,  which was used for FORTRAN,  can also be used
for PASCAL.

A(i1,i2,i3,...i7) will access the element in position:
SF1*(i1-ll1)+SF2*(i2-ll2)+...+SFn(in-lln), where SFi is a "scale
factor";
SFn = 1
SFn-1 = (uln-lln+1)*1 = (elements in previous dimension)*(size of element)
    = Mn-1*SF1
...
SF1 = (ul2-ll2+1)*SF2
SF0 = (ul1-ll1+1)*SF1 = size of the array.

When you pass an value in PASCAL,  you pass by value or reference;
in addition,  the TYPES of the formal and actual parameter must agree.
Thus, the usual game playing that is done in FORTRAN with resizing and
slicing cannot be done.  However,  you have another problem.
Since Pascal is a moderately strongly typed language,  and requires
the types of actual and formals to agree,  then you run into the problem
when you want a parametric or polymorphic typed function.  Say you want
to search an array for an element:
o
type s5= array [1..5] of char;
s6 = array [1..6] of char;
var p5:s5; p6:s6;
function S5(var x: array [1..5] of char ); system; {don't ask}
function S6(var x: array [1..5] of char ); system; {don't ask}

For each different size of string, you need to have A NEW FUNCTION.
To fix this problem,  Pascal has been extended to have conformant arrays.

function S(var x: array [ll..ul:int] of char ); system;

This definition will cause ll and ul to have the values of the lower and upper
limits of the array bounds in the body of the function.  How, you might ask
does a person implement this?  By using DOPE VECTORS (descriptors in some
languages), which contain all of the information about an array.
The format of the table is usually (but not always) in the form:

------------------------------------------------------
Address | (points to the array area)
------------------------------------------------------
ll1 | (lower limit of 1st dimension)
ul1 | (lower limit of 1st dimension)
SF1 | (Scale factor for 1st dimension)
------------------------------------------------------
ll2 | (lower limit of 2st dimension)
ul2 | (lower limit of 2st dimension)
SF2 | (Scale factor for 2st dimension)
------------------------------------------------------
.....
ll2 | (lower limit of 2st dimension)
ul2 | (lower limit of 2st dimension)
SF2 | (Scale factor for 2st dimension)
------------------------------------------------------

Note that all of the information below the first entry is static,
and determined at compilation time.

When a conformant array is passed,  usually a pointer to a dope vector
is passed.  Some times the dope vector has the form:
------------------------------------------------------
Address | (points to the array area)
------------------------------------------------------
Descriptor | (points to a descriptor)
------------------------------------------------------

Sometimes there are two things passed:  the address and the descriptor.
Sometimes this is done for ALL arrays;  sometimes only for conformant
arrays.  If it is done at all.

Have fun with this one!

C
_

In C, arrays are stored in ROW major form.
For example:
real w[3][2]; /* w[1][1] w[1][2] w[2][1] w[2][2] w[3][1] w[3][2] */
This relates to the C language definition syntax, which is described
in detail in Kernighan and Ritchie,  "The C Programming Language"

The C language passed parameters by value.  However, when you pass an
array,  say: printall(w), the C language requires a pointer to be
passed.  That is, the name of an array is COERCED or cast into a
pointer value that is the start of the array.  However,  if you pass
printit(w(1)),  the value of w(1) is passed.  If you want to pass the
address of the element w(1),  you have to use: printit(&w(1)) or
printit(w+1).

SUMMARY
-------
Fortran is COLUMN MAJOR
C and Pascal are ROW MAJOR
Fortran by reference (address)
Pascal is by value/reference,  exact format is anybodys guess.
C by reference (address)


Patrick Powell
-- 
Patrick Powell, Dept. Computer Science, 136 Lind Hall, 207 Church St. SE,
University of Minnesota,  Minneapolis, MN 55455 (612)625-3543/625-4002

eao@anumb.UUCP (e.a.olson) (06/30/87)

> We're trying to arrange for a C subroutine library to be callable from
> Fortran and Pascal and also transportable between machines. 
> 
> One of the questions that has come up is with respect to the storage
> order of 2-dimensional arrays. In C, array element [1][2] is stored
> in memory immediately after element [1][1]. I believe this is locked
> into the language - pointer arithmetic and all that. On the Fortran
> compilers I can get to, (2,1), not (1,2), immediately follows (1,1).
> The pascal I haven't investigated yet.
> 
> Question: is this array storage order mandated by the Fortran-77 standard?
> How about the Pascal standard?   If so, then I'll document the arrays as being
> "backwards" in these languages. If not, I'll keep two code versions. 
> Obviously, two manual versions would be less hassle.
> 
> Help appreciated.
> 
> garry wiegand   (garry@oak.cadif.cornell.edu - ARPA)
> 		(garry@crnlthry - BITNET)

    The order is mandated by the standard.
    In fortran, arrays are stored in row-major order.  In 'C'
    and Pascal arrays are stored in column-major order.

padpowell@watvlsi.UUCP (06/30/87)

In article <105@anumb.UUCP> eao@anumb.UUCP (e.a.olson) writes:
>> 
>> One of the questions that has come up is with respect to the storage
>> order of 2-dimensional arrays. In C, array element [1][2] is stored
>> in memory immediately after element [1][1]. I believe this is locked
>> into the language - pointer arithmetic and all that. On the Fortran
>> compilers I can get to, (2,1), not (1,2), immediately follows (1,1).
>> The pascal I haven't investigated yet.

>> Question: is this array storage order mandated by the Fortran-77 standard?
>> How about the Pascal standard?   If so, then I'll document the arrays as being
>> garry wiegand   (garry@oak.cadif.cornell.edu - ARPA)
>> 		(garry@crnlthry - BITNET)
>    The order is mandated by the standard.
>    In fortran, arrays are stored in row-major order.  In 'C'
>    and Pascal arrays are stored in column-major order.

NO NO NO... Other way around.  Fortran Columns, Pascal and C are rows.
Don't worry,  I've had days like that.   In fact,  I've had LIVES like that.

Patrick ("Read the manual?  What manual?  OHHH! THAT manual...") Powell

garry@batcomputer.tn.cornell.edu (Garry Wiegand) (07/05/87)

Apologies for not getting back sooner; I've been off the net. The
consensus of replies to my array-storage question were:

  Fortran: Storage order is visible in the language (the Equivalence 
           statement), and the standard mandates column-major storage.

  Pascal:  It's not visible in the (standard) language, and the standard does
           not mandate anything. It is "generally" implemented as row-major.

  C:       It is visible in the language, and the standard mandates row-major
           storage.

  Ada:     It is visible in the language, but the standard neglects to mandate
           anything.

Which means that Ada beats Fortran for being strange, and that I am living
on borrowed time if I assume anything about Pascal.

BTW, if you have trouble remembering what "row-major" and "column-major"
mean, "row-major" means that a 2-D array will be stored in memory row
by row by row (the first index is the row number), and "column-major" means
that it's stored column by column by column.

My thanks to the several people who replied.

garry wiegand   (garry@oak.cadif.cornell.edu - ARPA)
		(garry@crnlthry - BITNET)

dik@cwi.nl (Dik T. Winter) (07/06/87)

In article <1615@batcomputer.tn.cornell.edu> garry@oak.cadif.cornell.edu writes:
 > Apologies for not getting back sooner; I've been off the net. The
 > consensus of replies to my array-storage question were:
 > 
 >   Pascal:  It's not visible in the (standard) language, and the standard does
 >            not mandate anything. It is "generally" implemented as row-major.
 > 
 >   Ada:     It is visible in the language, but the standard neglects to mandate
 >            anything.
 > 
I disagree, the situation in Ada is similar to the situation in Pascal,
although the hints to use row major order are stronger in Pascal
(a: array [1..5,1..7] of integer;  is equivalent to
 a: array [1..5] of array [1..7] of integer;  in Pascal).
-- 
dik t. winter, cwi, amsterdam, nederland
INTERNET   : dik@cwi.nl
BITNET/EARN: dik@mcvax

eao@anumb.UUCP (e.a.olson) (07/07/87)

> >> Question: is this array storage order mandated by the Fortran-77 standard?
> >> How about the Pascal standard?   If so, then I'll document the arrays as being
> >> garry wiegand   (garry@oak.cadif.cornell.edu - ARPA)
> >> 		(garry@crnlthry - BITNET)
> >    The order is mandated by the standard.
> >    In fortran, arrays are stored in row-major order.  In 'C'
> >    and Pascal arrays are stored in column-major order.
> 
> NO NO NO... Other way around.  Fortran Columns, Pascal and C are rows.
> Don't worry,  I've had days like that.   In fact,  I've had LIVES like that.
> 

ooh sorry.. no wonder the students always got
that question wrong when i was teaching as a grad student!
now which goes across..???

bs@pbhye.UUCP (07/07/87)

I can never thank enough the FORTRAN teacher I had who taught me that
FORTRAN array storage works just like a cars odometer, i.e. the right
most digit/subscript increments the fastest.

He was also very adamant that the compiler didn't give squat about 
rows and columns.  Besides, what do you start calling things after
2, 3, or 4 dimensions.

	THANKS - Dr. Robert Fagen

Bruce Skelly
Do I need a disclaimer here?

eugene@pioneer.UUCP (07/07/87)

In article <1782@pbhye.UUCP> bs@pbhye.UUCP (Bruce Skelly) writes:
>
>I can never thank enough the FORTRAN teacher I had who taught me that
>FORTRAN array storage works just like a cars odometer, i.e. the right
>most digit/subscript increments the fastest.
>
>He was also very adamant that the compiler didn't give squat about 
>rows and columns.  Besides, what do you start calling things after
>2, 3, or 4 dimensions.

Wrong!  In FORTRAN, it's the left most subscript, but it's a nice attempt
at the analogy.  In Pascal, it's the rightmost (as well as C and Ada)
and it is defined in the Standard are right-fastest varying and
row-major language.  I am working on some interesting tests on subscript
calculation (FORTRAN).  (A Cray X-MP is the base machine for measurement
reasons.) Any other interesting material would be appreciated.

Geez, aren't there any other members of the FORTRAN or Pascal Standards
committees who read this group anymore?  Arrgh!  What happen to Klein,
Gustafson, and Price [I know what happened to Haynes].

--eugene miya
  formerly Joint ANSI X3J9/IEEE P770 Pascal Language Standards Committee
  eugene@ames-aurora.ARPA
  "You trust the `reply' command with all those different mailers out there?"
  {hplabs,hao,ihnp4,decwrl,allegra,tektronix,menlo70}!ames!aurora!eugene

bs@pbhye.UUCP (Bruce Skelly) (07/08/87)

In article <1782@pbhye.UUCP>, bs@pbhye.UUCP (Bruce Skelly) writes:
> 
> I can never thank enough the FORTRAN teacher I had who taught me that
> FORTRAN array storage works just like a cars odometer, i.e. the right
----------------------------------------- ^^^^^^^^^^^^^
Now that several people have reminded me, I think he must have said
"just the opposite of a cars odometer"

> most digit/subscript increments the fastest.

Bruce Skelly
I guess I did need a disclaimer,  against stupidity.