[net.lang.c] ... datum ** and ij

phipps@fortune.UUCP (Clay Phipps) (08/22/84)

Lines beginning "!?  ": Chris Torek (does he really want to be named ?)
Lines beginning "    ": Clay [editing indicated by -- CP]

!?  let's take a look at a three dimensional [row major -- CP] array.  
!?  Given ``int a[4][6][5];'' as the declaration 
!?  and ``a[2][3][4]'' as the reference, 
!?  we take 2, multiply by six, add three, 
!?  multiply the result by five, and add four, giving 79.

The above calculations are correct.
	
!?  We can generalize the first as follows:
!?
!?      offset = 0;
!?      for (i = 0, j = num_dimens - 1; i < num_dimensions; i++, j--) {
!?  	    offset = offset + array_index[i];
!?  	    offset = offset * array_sub[j];
!?      }
   
!?  ... I know it's not optimized C code.  This is just an illustration.  
!?  ... I see some smiles.  You, you look like you want to say something.

Yes, teacher, your program fragment is just plain WRONG.
It gives an incorrect answer, 142,
instead of the correct answer, 79, that was worked out above.

!?  where 
!?  num_dimens		is the number of dimensions in the array,
!?  array_index[]	contains the constants 
!?  			used in the declaration of the array, 

The text above does not describe array "indexes" (or "indices"),
but it does describe the "size" (K&R p. 93) of the dimensions.
To most people, apparently including K&R, 
an "index" is synonymous with a "subscript".
This counter-mnemonic name was a continual source of confusion
while I was trying to work through some examples.
Perhaps this even confused the teacher.
	
!?  array_sub[]	contains the values used in the reference, and
!?  offset	computes the offset from the base of the storage area.
	 
Assuming that you have followed C style,
and that the above compiler arrays subscripts 0, 1, 2
correspond to the compiled source code subscripts written left to right,
the following is correct for row major arrays in C:
    
    offset = array_sub[0];
    for (i = 1; i < num_dimens; i++) {
    	offset = offset * array_size[i];
    	offset = offset + array_sub[i];
    }

where 
array_size[]	is what Chris unfortunately named "array_index";
		it contains the constants 
		used in the declaration of the array. 

A better program fragment would be the following:	
    
    offset = array_sub[0];
    for (i = 1; i < num_dimens; i++) {
    	offset = offset * array_size[i] + array_sub[i];
    }
    
Note that the code above takes advantage of the fact
that C array declarations, like old-style FORTRAN (IV, 66),
specify a size, instead of lower and upper bounds.
With other languages like PL/I, Pascal, and Full FORTRAN 77,
the dimension size t be calculated, usually at compilation time.

!?  Ok, that's all for today.  Quiz Thursday.  Good night.

I think you had better review Gries, p. 178 (Pratt isn't much help;
don't know about the Dragon Book) before you spring it on us netlanders.

-- Clay Phipps

-- 
            { amd  hplabs!hpda  sri-unix  ucbvax!amd }          
                                                      !fortune!phipps
   { ihnp4  cbosgd  decvax!decwrl!amd  harpo  allegra}

chris@umcp-cs.UUCP (Chris Torek) (08/24/84)

Oops.  Clay Phipps' corrections are definitely in order.  For some
reason when I wrote that I thought of the indicies being stored in the
sub[] guys in backwards order.  I never did specify that (obviously I
wasn't thinking clearly there).  Oh well, at least I got the pointers
part right.  (Well, that is, I *assume* I got it right, or someone
would have pointed out any errors there, too, right? :-) )

``array_index'' *was* a lousy name, wasn't it?

By the way, (just to get revenge (-: ) this wasn't quite precise:

> Note that the code above takes advantage of the fact
> that C array declarations, like old-style FORTRAN (IV, 66),
> specify a size, instead of lower and upper bounds.
> With other languages like PL/I, Pascal, and Full FORTRAN 77,
> the dimension size t be calculated, usually at compilation time.

Actually, in the case of FORTRAN IV and FORTRAN 66, you'd also have to
subtract one from the given indicies, since the legal index values are
1 to ``size'' instead of C's 0 to ``size''-1.  In languages with
compiler support for arbitrary subscript ranges (in F77, stuff like
``INTEGER I(-10:10)'') the lower bounds have to be saved, too, along
with the sizes (or the upper bounds).  If the compiler is to do bounds
checking, then the upper bounds are required (and it's up to the
compiler writer to decide whether to compute the sizes once and save
them, or once for each array reference).

A complete row-major order implementation (that worked only with
constants for subscripts) might look like this:

/* a single array dimension, i.e., the -10:10 part of an F77 array */
struct arydim {
	int	ad_low;		/* lower bound */
	int	ad_high;	/* upper bound */
	int	ad_size;	/* high-low+1 */
};

#define MAXDIM 7		/* good enough for FORTRAN anyway */
				/* or have they changed it again? */

/* an array */
struct ary_descriptor {
	char	*a_name;	/* name of the array, for the assembler */
	int	a_ndims;	/* number of dimensions */
	struct arydim a_dim[MAXDIM];/* the dimensions themselves */
};

/* Compute the offset in the array described by `d' given the indicies
   in `sub'.  The number of subscripts has already been verified.
   The subscripts are stored such that sub[0] == leftmost,
   sub[MAXDIM-1] == rightmost possible. */
int
compute_offset (d, sub)
struct ary_descriptor *d;
int sub[MAXDIM];
{
	int i, off, t;

	off = 0;
	for (i = 0; i < d -> a_ndims; i++) {
		off *= d -> a_dim[i].ad_size;
		t = sub[i];
		if (t < d->a_dim[i].ad_low || t > d->a_dim[i].ad_high)
			uerror("array index out of range");
		off += t - d -> a_dim[i].ad_low;
	}
	return off;
}

One can generalize even further, to arrays whose sizes are computed
at run time, to extensible arrays, and so forth.
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci (301) 454-7690
UUCP:	{seismo,allegra,brl-bmd}!umcp-cs!chris
CSNet:	chris@umcp-cs		ARPA:	chris@maryland