[comp.lang.fortran] Conformant Arrays in C

dgh%dgh@Sun.COM (David Hough) (02/20/88)

I have been wondering about a problem that's been discussed,
often incorrectly, in comp.lang.c.  The problem is easy enough
to understand if you've ever written linear algebra software,
but the correct "in the spirit of C" solution wasn't clear
to me.

Fortunately Richard O'Keefe has volunteered some ideas.
I'd appreciate feedback on them.  If you're simply opposed
to making C suitable for scientific programming, the net has
already heard from you, so you needn't repeat your views,
but if you have ideas for better syntax or better ways to
achieve the goal, please speak up.  The following is one of my
comments on the January 1988 X3J11 draft for ANSI C,
followed by the proposed solution.


Comment #24, Section 3.5.4.2:   fix arrays

     I know no C translation that's as clear as the  follow-
ing Fortran code:

        SUBROUTINE MATMUL(X,LX,Y,LY,Z,LZ,NX,NY,NZ)
        REAL X(LX,*),Y(LY,*),Z(LZ,*)
        DO 1 I=1,NX
                DO 2 J=1,NZ
                SUM=0
                        DO 3 K=1,NY
 3                      SUM=SUM+X(I,K)*Y(K,J)
 2              Z(I,J)=SUM
 1      CONTINUE
        END

Code like this is at the heart of most of the major portable
Fortran  libraries  of  mathematical software developed over
the last twenty years. The declared leading dimensions of X,
Y, and Z are not known until runtime.

     The Draft, like traditional C, disallows the equivalent

        void matmul(x,lx,cx,...)
        int lx, cx;
        double x[lx][cx] ;

unless lx and cx are  known  at  compile  time.   Equivalent
functionality  can  be  obtained  by  treating all arrays as
one-dimensional and doing the  subscripting  "by  hand",  so
that  it's harder to get right and harder to optimize.  Fix-
ing C's handling of arrays would be a far worthier task  for
the  X3J11 experts than (for instance) specifying details of
the math library.  Maybe there's some good fundamental  rea-
son  why  variables  are disallowed as array dimensions; C's
treatment of arrays has always been confusing.

     Note that the section numberings in the Draft  and  Ra-
tionale are out of synch in the declarations subchapter.

     Recommendation: Fix, by language design  if  necessary,
the  treatment  of  arrays  in  C  to  properly  accommodate
multiply-dimensioned arrays whose  dimensions  vary  at  run
time.   The  goal is not to duplicate Fortran's features ex-
actly, but rather to insure that portable linear algebra li-
braries are as easy to create in C as in Fortran.

Array proposal
----- --------

     Richard O'Keefe has kindly provided the following  out-
line of how an array facility might work in C:

     The  smallest  possible  change  would  be  "conformant
arrays" much as in ISO Pascal or in Turing, where the param-
eter declaration was something like

        void matmul(double a[p][q], b[q][r], c[p][r])
            {
                int i, j, k;
                double t;
                for (i = p; --i >= 0; )
                    for (j = r; --j >= 0; ) {
                        t = 0.0;
                        for (k = q; --k >= 0; ) t *= a[i][j]*b[j][k];
                        c[i][j] = t;
                    }
            }

If p, q, and r are #defined to be constant expressions, this
is already legal, so we need one more thing to indicate that
p,q,r are being defined here, not used.  Consider  the  fol-
lowing:

        declarator:     ...
                  |     declarator '[' subscript_spec ']'
                  ;

        subscript_spec: '?' identifier
                      | constant_expression
                      | /* empty */
                      ;

where  /*  empty  */  is   only   allowed   as   the   first
subscript_spec,  and  ?id  is  only  allowed  in  a function
header.  To avoid having to specify what happens if  ?x  ap-
pears several times but the arguments don't agree with that,
make it illegal, so the first suggestion would  have  to  be
written

        void matmul(double a[?ar][?ac], /* ar >= p, ac >= q */
                    double b[?br][?bc], /* br >= q, bc >= r */
                    double c[?cr][?cc], /* cr >= p, cc >= r */
                    int p,              /* 0 <= p <= min(ar,cr) */
                    int q,              /* 0 <= q <= min(ac,br) */
                    int r)              /* 0 <= r <= min(bc,cc) */
            {
                int i, j, k;
                double t;
                for (i = p; --i >= 0; )
                    for (j = r; --j >= 0; ) {
                        t = 0.0;
                        for (k = q; --k >= 0; ) t *= a[i][j]*b[j][k];
                        c[i][j] = t;
                    }
            }

A conformant array may be passed as a parameter to  a  func-
tion  provided  the function's prototype was visible to con-
firm that a conformant array parameter was intended.

     The simplest way of treating sizeof is to rule it  out:
if  the  description of 'a' involves any ?s, you can't apply
sizeof to it.  So given a parameter

        float fred[?f1][?f2][10];

sizeof fred and sizeof fred[1] would be illegal, but  sizeof
fred[1][2] and sizeof fred[1][2][3] would be legal.


David Hough

ARPA: dhough@sun.com
UUCP: {ucbvax,decvax,decwrl,seismo}!sun!dhough

gwyn@brl-smoke.ARPA (Doug Gwyn ) (02/20/88)

In article <42529@sun.uucp> dgh%dgh@Sun.COM (David Hough) writes:
>     The Draft, like traditional C, disallows the equivalent
>        void matmul(x,lx,cx,...)
>        int lx, cx;
>        double x[lx][cx] ;

Yes, indeed, this is a pain.  I discussed the possibility of
allowing something like this several years ago with Dennis
Ritchie and Steve Johnson, and the consensus was that it was
"doable", although specifying it correctly is tricky.

>        void matmul(double a[?ar][?ac], /* ar >= p, ac >= q */
>                    double b[?br][?bc], /* br >= q, bc >= r */
>                    double c[?cr][?cc], /* cr >= p, cc >= r */
>                    int p,              /* 0 <= p <= min(ar,cr) */
>                    int q,              /* 0 <= q <= min(ac,br) */
>                    int r)              /* 0 <= r <= min(bc,cc) */

I didn't see how this proposal would work.  For correct code
to be generated, all but one of the array dimensions has to
be known via the calling sequence.  There seem to be only
two ways to achieve this:  (1) rely on explicit dimension
parameters, as in the first example; (2) automatically pass
extra size information along with the array address.  The
second approach is incompatible with current treatment of
arrays, unless the function prototype has parameters declared
some special way (perhaps using yet another type qualifier),
so that the compilers will pass array information specially
for the particular function.  In other words, to make method
(2) work, more must be involved than just the function
definition.  Is this what was intended by your second example?
(Since the body of the example didn't make use of ar, ac, etc.,
it's hard to be sure.)

roy@phri.UUCP (Roy Smith) (02/21/88)

	I'm not sure I follow all the details of David's note, but I
thought I would throw in my suggestion anyway.  Some time ago I had need
for variable-dimension arrays in C.  What I ended up using was vectored
arrays.  I wrote an array allocator which called malloc to get memory for
the main array and for the row address vector, and which initialized the
vector to point to the right places in the main array.  After that, you
could pretend you had a variable dimensional array; i.e. you could pass a
subroutine the pointer returned by arrayalloc and the dimensions you used.
It was posted a few years back to mod.sources.
-- 
Roy Smith, {allegra,cmcl2,philabs}!phri!roy
System Administrator, Public Health Research Institute
455 First Avenue, New York, NY 10016

ok@quintus.UUCP (Richard A. O'Keefe) (02/22/88)

Background:
	David Hough sent me his comments on dpANS C.
	He wants a language which can do at least as much as Fortran could.

	One of his criteria was that you should be able to write
	subroutines that handle multi-dimensional arrays sensibly.

	I wrote back to him about some of his points.  I started to
	argue that this criterion required much too big a change to
	the language, and ended up talking myself out of that
	position.

	David Hough sent (part of) my very sketchy proposal to
	comp.lang.c.

I should point out that *no* part of the suggestion is original to me.
Conformant arrays are part of ISO Pascal, and Turing has them.

Doug Gwyn writes:
> I didn't see how this proposal would work.  For correct code
> to be generated, all but one of the array dimensions has to
> be known via the calling sequence.  There seem to be only
> two ways to achieve this:  (1) rely on explicit dimension
> parameters, as in the first example; (2) automatically pass
> extra size information along with the array address.  The
> second approach is incompatible with current treatment of
> arrays, unless the function prototype has parameters declared
> some special way (perhaps using yet another type qualifier),
> so that the compilers will pass array information specially
> for the particular function.  In other words, to make method
> (2) work, more must be involved than just the function
> definition.  Is this what was intended by your second example?
> (Since the body of the example didn't make use of ar, ac, etc.,
> it's hard to be sure.)
	[ there was meant to be an assert() checking the comments,
	  but I forgot.  Sorry about that, but it wasn't ready for
	  "publication".
	]
The answer is that I wrote that it would not be legal to call a
function with a conformant array parameter unless the actual
function definition or a prototype were in scope, so that the
compiler would know to pass the dimension information.  This is
easily checked by a tool like "lint":  simply mark in the symbol
table each function which is called through through an implicit
or old-fashioned declaration, and complain about any marked
function which has a conformation array parameter.  Functions
which take a variable number of parameters are already subject to
just such a restriction.  The declaration of conformant
dimension parameters "? id" is precisely the "special way" of
declaration which Doug Gwyn rightly claims to be necessary.

And yes, conformant array parameters *are* incompatible with the
current treatment of arrays.  If you have
	p[D1]...[Dn]
then
	p[E1]...[Ek]
would be a pointer only if none of Dk+1,...,Dn were a conformant dimension
parameter (?id) [that's "only if", not "if and only if"], and
 sizeof p[E1]...[Ek]
would be legal if and only if none of Dk+1,...,Dn were a conformant
dimension parameter.

Oddly enough, a weak spot in C comes to our aid:  Pascal has this
nasty problem:  what does
	procedure p(var a: array [la..ua: integer] of char;
		    var b: array [lb..ub: integer] of char);
	    begin
		a := b;
	    end {p};
mean?  C wouldn't have that problem, because it lacks array assignment.


I have managed to convince myself that
 o David Hough is right: something like this IS necessary if one is
   to be able to write mathematical libraries in C with at least the
   same generality as Fortran (e.g. EISPACK, LINPACK, IMSL, NAG, ...)
 o conformant array parameters, though a *hack*, are at least a
   hack which is known to work
 o conformant array parameters can be added to C without major
   distortion, though they would be exceptions to "sizeof" and
   quiet-conversion-to-pointer

I have NOT yet managed to convince myself that
 ? conformant array parameters should be added to C
but I think one could make a considerably better argument for this
addition than for 'noalias'.

chris@trantor.umd.edu (Chris Torek) (02/22/88)

In article <676@cresswell.quintus.UUCP> ok@quintus.UUCP
(Richard A. O'Keefe) writes:
>Background:
>	David Hough sent me his comments on dpANS C.
>	He wants a language which can do at least as much as Fortran could.
>
>	One of his criteria was that you should be able to write
>	subroutines that handle multi-dimensional arrays sensibly.

I would claim that the `best' way to do this is to use the row-vector
approach.  To create a two-dimensional array, e.g., do the following:

	typedef struct dmat2 {
		int	m2_nrows;
		int	m2_ncols;
		double	**m2_m;		/* row vectors */
		double	*m2_data;
	} dmat2;

	dmat2 *
	create(rows, cols)
		register int rows, cols;
	{
		register dmat2 *m;
		register int r;

		m = (dmat2 *)malloc(sizeof(dmat2));
		if (m == NULL)
			return (NULL);
		m->m2_rows = rows;
		m->m2_cols = cols;
		m->m2_m = (double **)malloc(rows * sizeof(double *));
		if (m->m2_m == NULL) {
			free((char *)m);
			return (NULL);
		}
		m->m2_data = (double *)malloc(rows * cols * sizeof(double));
		if (m->m2_data == NULL) {
			free((char *)m->m2_m);
			free((char *)m->m2_data);
		}
		for (r = 0; r < rows; r++)
			m->m2_m[r] = &m->m2_data[r * cols];
		return (m);
	}

These arrays can then be used as

	m->m2_m[r][c] = val;

and parts of the array can be passed to functions with

		f(&m->m2_m[off], rows - off, cols);

Note that a whole matrix can be passed (by reference) with

	f(m);

If you dislike having to specify the range for subarrays, you could
create a `make subarray' function:

	dmat2 *
	makesub(m, roff, coff, rsize, csize)
		dmat2 *m;
		int roff, coff, rsize, csize;
	{

		if (m->m2_rows - roff < rsize || m->m2_rows - coff < csize)
			/* it is not that big! */...
		etc.
	}

This does, of course, require more memory to describe, but row vectors
are generally faster anyway.  Since the whole matrix is still there
as a single block of memory, you can use existing routines as well
(by passing m->m2_data).

So why is this solution insufficient?
-- 
In-Real-Life: Chris Torek, Univ of MD Computer Science, +1 301 454 7163
(hiding out on trantor.umd.edu until mimsy is reassembled in its new home)
Domain: chris@mimsy.umd.edu		Path: not easily reachable

ok@quintus.UUCP (Richard A. O'Keefe) (02/23/88)

In article <2336@umd5.umd.edu>, chris@trantor.umd.edu (Chris Torek) writes:
> In article <676@cresswell.quintus.UUCP> ok@quintus.UUCP
> (Richard A. O'Keefe) writes:
> >Background:
> >	David Hough sent me his comments on dpANS C.
> >	He wants a language which can do at least as much as Fortran could.
> >
> >	One of his criteria was that you should be able to write
> >	subroutines that handle multi-dimensional arrays sensibly.
> 
> I would claim that the `best' way to do this is to use the row-vector
> approach.  To create a two-dimensional array, e.g., do the following:
> This does, of course, require more memory to describe, but row vectors
> are generally faster anyway.  Since the whole matrix is still there
> as a single block of memory, you can use existing routines as well
> (by passing m->m2_data).
> 
> So why is this solution insufficient?

I like the approach Torek recommends, having met this approach on the
B6700.  But the sad fact of the matter is that C doesn't support it.
C *will* let you do it yourself, but it *won't* do it for you.
Torek's own example is a better illustration of how much *work* it is
to use this approach in C than I could have come up with, and the
result doesn't look much like ordinary array access in C.  The point
of the conformant array suggestion is that
 o it is a proven technique
 o for hacking around a basic design flaw in Pascal that is also present in C
 o it lets you use array parameters *directly* 

Torek is absolutely right that "row vectors" are a very sensible
implementation technique, but having the machinery show through is
not a sensible >language< technique.

I'm sure that something better than conformant arrays can be done.
But we're looking for something that makes it EASY to pass array
parameters and EASY to use them, nearly as easy as passing fixed
size arrays would be.  We're also looking for something which will
be the same for all C programmers:  if I want to pass an array to
a function I got from Fred, and to pass the same array to a function
I got from Joe, I really do *not* want to find out that Fred used
row pointers and Joe used column pointers.

kurtk@tekcae.TEK.COM (Kurt Krueger) (02/26/88)

Look at "The C Programming Language" by Kernighan & Ritchie on p. 110.  If you
set up your array like  int *b[10]  you can treat it as a conformal 2-d array
without having to pass ANY bounds information (except for whatever the
procedure logic requires).

henry@utzoo.uucp (Henry Spencer) (02/26/88)

>      Recommendation: Fix, by language design  if  necessary,
> the  treatment  of  arrays  in  C  to  properly  accommodate
> multiply-dimensioned arrays whose  dimensions  vary...

This is an interesting idea, and fortunately you are just in time to have
some chance of getting it into the next revision of C, five years or so
hence.  Here's what you do:

	(a) implement it in a C compiler
	(b) use it, and have others use it, for a few years
	(c) write it up and submit it formally

I guarantee that if you follow these steps, IN THE ABOVE ORDER, X3J11 will
at least listen attentively when you propose it.  Note that writing up the
proposal and submitting it comes after you've tried it out, *in C*, not
before.

(Yes, X3J11 has done a few things that have not been based on prior
experience.  Conformant arrays are arguably a better idea than, say,
noalias.  This is not much of a recommendation, however, since noalias
is increasingly looking like the biggest mistake X3J11 has ever made.)

Alternatively you might consider doing it in C++, which probably won't
require any compiler changes at all.  X3J11 has never claimed that C is
the answer to all mankind's problems.  You will be listened to *more*
attentively if you can explain why C++ does *not* solve your problem.
-- 
Those who do not understand Unix are |  Henry Spencer @ U of Toronto Zoology
condemned to reinvent it, poorly.    | {allegra,ihnp4,decvax,utai}!utzoo!henry

edw@IUS1.CS.CMU.EDU (Eddie Wyatt) (02/26/88)

> Look at "The C Programming Language" by Kernighan & Ritchie on p. 110.  If you
> set up your array like  int *b[10]  you can treat it as a conformal 2-d array
> without having to pass ANY bounds information (except for whatever the
> procedure logic requires).


   But you have to access the array as if it where a n by 10 matrix.
This method does not provide for n by m by .... arrays.  At most
you may leave the first dimension (the rows) unspecified.




-- 

Eddie Wyatt 				e-mail: edw@ius1.cs.cmu.edu

reeder@ut-emx.UUCP (William P. Reeder) (02/27/88)

In article <1491@tekcae.TEK.COM>, kurtk@tekcae.TEK.COM (Kurt Krueger) writes:
> Look at "The C Programming Language" by Kernighan & Ritchie on p. 110.  If you
> set up your array like  int *b[10]  you can treat it as a conformal 2-d array
> without having to pass ANY bounds information (except for whatever the
> procedure logic requires).

Since no one else has had the gumption to say it, I will.

Can this discussion continue without cross-posting to comp.lang.fortran?

I know it started out as a discussion of what fortran could learn from
C, but it has long since deviated to be a pure C discussion.

Thanks.

peter@sugar.UUCP (Peter da Silva) (03/06/88)

In article <42529@sun.uucp>, dgh%dgh@Sun.COM (David Hough) writes:
>         void matmul(double a[?ar][?ac], /* ar >= p, ac >= q */
>                     double b[?br][?bc], /* br >= q, bc >= r */
>                     double c[?cr][?cc], /* cr >= p, cc >= r */
>                     int p,              /* 0 <= p <= min(ar,cr) */
>                     int q,              /* 0 <= q <= min(ac,br) */
>                     int r)              /* 0 <= r <= min(bc,cc) */

WOw. I wouldn't have believed it... I prefer Fortran's syntax on this one.

	void matmul(double a[ar][ac], int ar, int ac,
	            double b[br][bc], int br, int bc,
	            double c[cr][cc], int cr, int cc,
		    int p,
		    int q,
		    int r)
	{
	}

Or:

	void matmul(a, ar, ac,
	            b, br, bc,
	            c, cr, cc,
		    p,
		    q,
		    r)
	int ar, ac, br, bc, cr, cc, p, q, r;
	double a[ar][ac];
	double b[br][bc];
	double c[cr][cc];
	{
	}

Doesn't require any new operators, doesn't require passing descriptors
around on the stack (yech), and allows you to allocate some space and
give it different dimensions in different places.
-- 
-- Peter da Silva  `-_-'  ...!hoptoad!academ!uhnix1!sugar!peter
-- Disclaimer: These U aren't mere opinions... these are *values*.