[comp.lang.misc] Row/Column Major Definition and His

dougs@hcx2.SSD.HARRIS.COM (09/06/88)

Row and/or column major refers to the way the arrays are stored in memory.
This is only relevant for 2D or greater dimensioned arrays.  Row major
means that as you move through *sequentially addressed* memory locations,
the right-most subscript of an array reference will increase fastest.  Column
major means the left-most subscript varies fastest.

When you dealt with graphs or arrays in school, the most usual way for you
to envision them was probably row-major.
	
	a(1,1..4)   5 6 7 8
        a(2,1..4)   6 1 8 0
        a(3,1..4)   2 8 4 7
          ...

It is interesting to note that almost all languages have array accessing
defined as row-major (C, Pascal, blah-blah-blah) but FORTRAN is column
major.  Why? ....

Douglas G. Scofield				dougs@hcx2.ssd.harris.com
Harris Computer Systems Division
2101 W. Cypress Creek Rd.
Ft. Lauderdale, FL  33069

eugene@eos.UUCP (Eugene Miya) (09/09/88)

Still no one can answer why column-major over row?
I suspect this had to do with early implementations of compilers simply
allocating space on the fly from left to right.  I'll ask Backus
when I get the chance in a couple of weeks.

Another gross generalization from

--eugene miya, NASA Ames Research Center, eugene@aurora.arc.nasa.gov
  resident cynic at the Rock of Ages Home for Retired Hackers:
  "Mailers?! HA!", "If my mail does not reach you, please accept my apology."
  {uunet,hplabs,ncar,decwrl,allegra,tektronix}!ames!aurora!eugene
  "Send mail, avoid follow-ups.  If enough, I'll summarize."

dik@cwi.nl (Dik T. Winter) (09/09/88)

In article <44600002@hcx2> dougs@hcx2.SSD.HARRIS.COM writes:
 > Row and/or column major refers to the way the arrays are stored in memory.
 > This is only relevant for 2D or greater dimensioned arrays.  Row major
 > means that as you move through *sequentially addressed* memory locations,
 > the right-most subscript of an array reference will increase fastest.  Column
 > major means the left-most subscript varies fastest.
...
 > It is interesting to note that almost all languages have array accessing
 > defined as row-major (C, Pascal, blah-blah-blah) but FORTRAN is column
(There are many languages where it is *not* defined.)
 > major.  Why? ....
 > 
Well, Fortran is an old language, so compiler writers had to use older
technology.  To see why people thought about column-major first, consider
how an array element is accessed.  Given the declaration:
	DIMENSION X(15, 10)
to access X(I, J) the following code has to be executed:
	&X + I + J * 15 - 16.
(Of course I and J may be expressions!)
The simplest way to do this is:
	push(&X), push(16), push(I), push(15), push(J), R=pop()*pop(),
	R=R*pop(), R=R+pop(), R=R-pop(), R=R+pop()
So essentially the access code is in lexical order.
To do this with row-major order involves a non-lexical order or
playing with the stack (which older machines did not readily allow).

Pascal is row major because
	array [1..5,1..10] of integer;
is defined to be equivalent to
	array [1..5] of array [1..10] of integer;

Interestingly, for languages that use dope vectors (Algol 60, Algol 68) for
arrays, it does not matter what way the array is stored.
However, in Algol 60, many implementations had facilities to do
I/O on arrays; and it was thought more pleasing to have the output
row major.  The Algol 68 compiler for CDC Cybers has a flag to generate
code for row major order or column major order.  And routines compiled
one way could cooperate without problems with routines compiled the
other way!  To see how this is done, let's first consider dopevectors.
Assume in Algol 60, an array declared as
	real array a[l1:u1,l2:u2,l3:u3]
The address of a[i,j,k] is:
	&a + (i-l1)*(u2-l2+1)*(u3-l3+1) + (j-l2)*(u3-l3+1) + (k-l3).
The terms (u.-l.+1) are called strides.
We can rearrange this to:
	&a + ((i-l1)*(u2-l2+1) + (j-l2))*(u3-l3+1) + (k-l3).
So we need the following values:
	&a, l1, (u2-l2+1), l2, (u3-l3+1) and l3.
We can do better (by semi-constant folding):
	&a-((l1*(u2-l2+1)+l2)*(u3-l3+1)+l3 + ((i*(u2-l2+1)+j)*(u3-l3+2)+l,
where the first term is constant during the lifetime of the array.
This must be adapted for Algol 68, because for arrays it is not
necessary that there is any dimension with stride 1.
Given the declarations:
	# this is the long form declaration, it can be done shorter. #
	ref [,,] real a = loc [l1:u1,l2:u2,l3:u3] real;
	ref [,] real b = a[,,k];
Now b is a two-dimensional array that accesses parts of a; and if we
assume row-major for Algol 68, no dimension has stride 1.  The best way
to deal with this is to calculate (for b[i,j]) as follows:
	&b-(factor to compensate for lower bounds) + i*s1 + j*s2
where s1 and s2 are the strides for the first and second dimension.
This is better on machines that have more than one register available.
But now it no longer matters whether s2 evenly divides s1, and so
the system may choose to make s1 < s2.  And this is only a function
of the code to manipulate the declaration; access remains the same,
whatever way is chosen.

It is a pity that a fairly modern language as Ada did not go this
way, and dissallows the selections of parts of more-dimensional
arrays.
--
Disclaimer: I am not a compiler writer.
-- 
dik t. winter, cwi, amsterdam, nederland
INTERNET   : dik@cwi.nl
BITNET/EARN: dik@mcvax