[comp.lang.c] Conformant Arrays in C

dgh%dgh@Sun.COM (David Hough) (02/20/88)

I have been wondering about a problem that's been discussed,
often incorrectly, in comp.lang.c.  The problem is easy enough
to understand if you've ever written linear algebra software,
but the correct "in the spirit of C" solution wasn't clear
to me.

Fortunately Richard O'Keefe has volunteered some ideas.
I'd appreciate feedback on them.  If you're simply opposed
to making C suitable for scientific programming, the net has
already heard from you, so you needn't repeat your views,
but if you have ideas for better syntax or better ways to
achieve the goal, please speak up.  The following is one of my
comments on the January 1988 X3J11 draft for ANSI C,
followed by the proposed solution.


Comment #24, Section 3.5.4.2:   fix arrays

     I know no C translation that's as clear as the  follow-
ing Fortran code:

        SUBROUTINE MATMUL(X,LX,Y,LY,Z,LZ,NX,NY,NZ)
        REAL X(LX,*),Y(LY,*),Z(LZ,*)
        DO 1 I=1,NX
                DO 2 J=1,NZ
                SUM=0
                        DO 3 K=1,NY
 3                      SUM=SUM+X(I,K)*Y(K,J)
 2              Z(I,J)=SUM
 1      CONTINUE
        END

Code like this is at the heart of most of the major portable
Fortran  libraries  of  mathematical software developed over
the last twenty years. The declared leading dimensions of X,
Y, and Z are not known until runtime.

     The Draft, like traditional C, disallows the equivalent

        void matmul(x,lx,cx,...)
        int lx, cx;
        double x[lx][cx] ;

unless lx and cx are  known  at  compile  time.   Equivalent
functionality  can  be  obtained  by  treating all arrays as
one-dimensional and doing the  subscripting  "by  hand",  so
that  it's harder to get right and harder to optimize.  Fix-
ing C's handling of arrays would be a far worthier task  for
the  X3J11 experts than (for instance) specifying details of
the math library.  Maybe there's some good fundamental  rea-
son  why  variables  are disallowed as array dimensions; C's
treatment of arrays has always been confusing.

     Note that the section numberings in the Draft  and  Ra-
tionale are out of synch in the declarations subchapter.

     Recommendation: Fix, by language design  if  necessary,
the  treatment  of  arrays  in  C  to  properly  accommodate
multiply-dimensioned arrays whose  dimensions  vary  at  run
time.   The  goal is not to duplicate Fortran's features ex-
actly, but rather to insure that portable linear algebra li-
braries are as easy to create in C as in Fortran.

Array proposal
----- --------

     Richard O'Keefe has kindly provided the following  out-
line of how an array facility might work in C:

     The  smallest  possible  change  would  be  "conformant
arrays" much as in ISO Pascal or in Turing, where the param-
eter declaration was something like

        void matmul(double a[p][q], b[q][r], c[p][r])
            {
                int i, j, k;
                double t;
                for (i = p; --i >= 0; )
                    for (j = r; --j >= 0; ) {
                        t = 0.0;
                        for (k = q; --k >= 0; ) t *= a[i][j]*b[j][k];
                        c[i][j] = t;
                    }
            }

If p, q, and r are #defined to be constant expressions, this
is already legal, so we need one more thing to indicate that
p,q,r are being defined here, not used.  Consider  the  fol-
lowing:

        declarator:     ...
                  |     declarator '[' subscript_spec ']'
                  ;

        subscript_spec: '?' identifier
                      | constant_expression
                      | /* empty */
                      ;

where  /*  empty  */  is   only   allowed   as   the   first
subscript_spec,  and  ?id  is  only  allowed  in  a function
header.  To avoid having to specify what happens if  ?x  ap-
pears several times but the arguments don't agree with that,
make it illegal, so the first suggestion would  have  to  be
written

        void matmul(double a[?ar][?ac], /* ar >= p, ac >= q */
                    double b[?br][?bc], /* br >= q, bc >= r */
                    double c[?cr][?cc], /* cr >= p, cc >= r */
                    int p,              /* 0 <= p <= min(ar,cr) */
                    int q,              /* 0 <= q <= min(ac,br) */
                    int r)              /* 0 <= r <= min(bc,cc) */
            {
                int i, j, k;
                double t;
                for (i = p; --i >= 0; )
                    for (j = r; --j >= 0; ) {
                        t = 0.0;
                        for (k = q; --k >= 0; ) t *= a[i][j]*b[j][k];
                        c[i][j] = t;
                    }
            }

A conformant array may be passed as a parameter to  a  func-
tion  provided  the function's prototype was visible to con-
firm that a conformant array parameter was intended.

     The simplest way of treating sizeof is to rule it  out:
if  the  description of 'a' involves any ?s, you can't apply
sizeof to it.  So given a parameter

        float fred[?f1][?f2][10];

sizeof fred and sizeof fred[1] would be illegal, but  sizeof
fred[1][2] and sizeof fred[1][2][3] would be legal.


David Hough

ARPA: dhough@sun.com
UUCP: {ucbvax,decvax,decwrl,seismo}!sun!dhough

gwyn@brl-smoke.ARPA (Doug Gwyn ) (02/20/88)

In article <42529@sun.uucp> dgh%dgh@Sun.COM (David Hough) writes:
>     The Draft, like traditional C, disallows the equivalent
>        void matmul(x,lx,cx,...)
>        int lx, cx;
>        double x[lx][cx] ;

Yes, indeed, this is a pain.  I discussed the possibility of
allowing something like this several years ago with Dennis
Ritchie and Steve Johnson, and the consensus was that it was
"doable", although specifying it correctly is tricky.

>        void matmul(double a[?ar][?ac], /* ar >= p, ac >= q */
>                    double b[?br][?bc], /* br >= q, bc >= r */
>                    double c[?cr][?cc], /* cr >= p, cc >= r */
>                    int p,              /* 0 <= p <= min(ar,cr) */
>                    int q,              /* 0 <= q <= min(ac,br) */
>                    int r)              /* 0 <= r <= min(bc,cc) */

I didn't see how this proposal would work.  For correct code
to be generated, all but one of the array dimensions has to
be known via the calling sequence.  There seem to be only
two ways to achieve this:  (1) rely on explicit dimension
parameters, as in the first example; (2) automatically pass
extra size information along with the array address.  The
second approach is incompatible with current treatment of
arrays, unless the function prototype has parameters declared
some special way (perhaps using yet another type qualifier),
so that the compilers will pass array information specially
for the particular function.  In other words, to make method
(2) work, more must be involved than just the function
definition.  Is this what was intended by your second example?
(Since the body of the example didn't make use of ar, ac, etc.,
it's hard to be sure.)

roy@phri.UUCP (Roy Smith) (02/21/88)

	I'm not sure I follow all the details of David's note, but I
thought I would throw in my suggestion anyway.  Some time ago I had need
for variable-dimension arrays in C.  What I ended up using was vectored
arrays.  I wrote an array allocator which called malloc to get memory for
the main array and for the row address vector, and which initialized the
vector to point to the right places in the main array.  After that, you
could pretend you had a variable dimensional array; i.e. you could pass a
subroutine the pointer returned by arrayalloc and the dimensions you used.
It was posted a few years back to mod.sources.
-- 
Roy Smith, {allegra,cmcl2,philabs}!phri!roy
System Administrator, Public Health Research Institute
455 First Avenue, New York, NY 10016

ok@quintus.UUCP (Richard A. O'Keefe) (02/22/88)

Background:
	David Hough sent me his comments on dpANS C.
	He wants a language which can do at least as much as Fortran could.

	One of his criteria was that you should be able to write
	subroutines that handle multi-dimensional arrays sensibly.

	I wrote back to him about some of his points.  I started to
	argue that this criterion required much too big a change to
	the language, and ended up talking myself out of that
	position.

	David Hough sent (part of) my very sketchy proposal to
	comp.lang.c.

I should point out that *no* part of the suggestion is original to me.
Conformant arrays are part of ISO Pascal, and Turing has them.

Doug Gwyn writes:
> I didn't see how this proposal would work.  For correct code
> to be generated, all but one of the array dimensions has to
> be known via the calling sequence.  There seem to be only
> two ways to achieve this:  (1) rely on explicit dimension
> parameters, as in the first example; (2) automatically pass
> extra size information along with the array address.  The
> second approach is incompatible with current treatment of
> arrays, unless the function prototype has parameters declared
> some special way (perhaps using yet another type qualifier),
> so that the compilers will pass array information specially
> for the particular function.  In other words, to make method
> (2) work, more must be involved than just the function
> definition.  Is this what was intended by your second example?
> (Since the body of the example didn't make use of ar, ac, etc.,
> it's hard to be sure.)
	[ there was meant to be an assert() checking the comments,
	  but I forgot.  Sorry about that, but it wasn't ready for
	  "publication".
	]
The answer is that I wrote that it would not be legal to call a
function with a conformant array parameter unless the actual
function definition or a prototype were in scope, so that the
compiler would know to pass the dimension information.  This is
easily checked by a tool like "lint":  simply mark in the symbol
table each function which is called through through an implicit
or old-fashioned declaration, and complain about any marked
function which has a conformation array parameter.  Functions
which take a variable number of parameters are already subject to
just such a restriction.  The declaration of conformant
dimension parameters "? id" is precisely the "special way" of
declaration which Doug Gwyn rightly claims to be necessary.

And yes, conformant array parameters *are* incompatible with the
current treatment of arrays.  If you have
	p[D1]...[Dn]
then
	p[E1]...[Ek]
would be a pointer only if none of Dk+1,...,Dn were a conformant dimension
parameter (?id) [that's "only if", not "if and only if"], and
 sizeof p[E1]...[Ek]
would be legal if and only if none of Dk+1,...,Dn were a conformant
dimension parameter.

Oddly enough, a weak spot in C comes to our aid:  Pascal has this
nasty problem:  what does
	procedure p(var a: array [la..ua: integer] of char;
		    var b: array [lb..ub: integer] of char);
	    begin
		a := b;
	    end {p};
mean?  C wouldn't have that problem, because it lacks array assignment.


I have managed to convince myself that
 o David Hough is right: something like this IS necessary if one is
   to be able to write mathematical libraries in C with at least the
   same generality as Fortran (e.g. EISPACK, LINPACK, IMSL, NAG, ...)
 o conformant array parameters, though a *hack*, are at least a
   hack which is known to work
 o conformant array parameters can be added to C without major
   distortion, though they would be exceptions to "sizeof" and
   quiet-conversion-to-pointer

I have NOT yet managed to convince myself that
 ? conformant array parameters should be added to C
but I think one could make a considerably better argument for this
addition than for 'noalias'.

chris@trantor.umd.edu (Chris Torek) (02/22/88)

In article <676@cresswell.quintus.UUCP> ok@quintus.UUCP
(Richard A. O'Keefe) writes:
>Background:
>	David Hough sent me his comments on dpANS C.
>	He wants a language which can do at least as much as Fortran could.
>
>	One of his criteria was that you should be able to write
>	subroutines that handle multi-dimensional arrays sensibly.

I would claim that the `best' way to do this is to use the row-vector
approach.  To create a two-dimensional array, e.g., do the following:

	typedef struct dmat2 {
		int	m2_nrows;
		int	m2_ncols;
		double	**m2_m;		/* row vectors */
		double	*m2_data;
	} dmat2;

	dmat2 *
	create(rows, cols)
		register int rows, cols;
	{
		register dmat2 *m;
		register int r;

		m = (dmat2 *)malloc(sizeof(dmat2));
		if (m == NULL)
			return (NULL);
		m->m2_rows = rows;
		m->m2_cols = cols;
		m->m2_m = (double **)malloc(rows * sizeof(double *));
		if (m->m2_m == NULL) {
			free((char *)m);
			return (NULL);
		}
		m->m2_data = (double *)malloc(rows * cols * sizeof(double));
		if (m->m2_data == NULL) {
			free((char *)m->m2_m);
			free((char *)m->m2_data);
		}
		for (r = 0; r < rows; r++)
			m->m2_m[r] = &m->m2_data[r * cols];
		return (m);
	}

These arrays can then be used as

	m->m2_m[r][c] = val;

and parts of the array can be passed to functions with

		f(&m->m2_m[off], rows - off, cols);

Note that a whole matrix can be passed (by reference) with

	f(m);

If you dislike having to specify the range for subarrays, you could
create a `make subarray' function:

	dmat2 *
	makesub(m, roff, coff, rsize, csize)
		dmat2 *m;
		int roff, coff, rsize, csize;
	{

		if (m->m2_rows - roff < rsize || m->m2_rows - coff < csize)
			/* it is not that big! */...
		etc.
	}

This does, of course, require more memory to describe, but row vectors
are generally faster anyway.  Since the whole matrix is still there
as a single block of memory, you can use existing routines as well
(by passing m->m2_data).

So why is this solution insufficient?
-- 
In-Real-Life: Chris Torek, Univ of MD Computer Science, +1 301 454 7163
(hiding out on trantor.umd.edu until mimsy is reassembled in its new home)
Domain: chris@mimsy.umd.edu		Path: not easily reachable

gwyn@brl-smoke.ARPA (Doug Gwyn ) (02/22/88)

In article <676@cresswell.quintus.UUCP> ok@quintus.UUCP (Richard A. O'Keefe) writes:
-I have NOT yet managed to convince myself that
- ? conformant array parameters should be added to C
-but I think one could make a considerably better argument for this
-addition than for 'noalias'.

Send this in as a formal public comment and I'll support it.

gwyn@brl-smoke.ARPA (Doug Gwyn ) (02/22/88)

In article <2336@umd5.umd.edu> chris@trantor.umd.edu (Chris Torek) writes:
>So why is this solution insufficient?

Because it's a pain in the lower part of the anatomy!
Or, in other words, just because it is theoretically sufficient
does not mean that it is convenient.
This is one of the limitations of C that Fortran programmers
have difficulty with; they're used to being able to write
reasonable array manipulation routines.

edw@IUS1.CS.CMU.EDU (Eddie Wyatt) (02/23/88)

> >So why is this solution insufficient?
> 
> Because it's a pain in the lower part of the anatomy!
> Or, in other words, just because it is theoretically sufficient
> does not mean that it is convenient.
> This is one of the limitations of C that Fortran programmers
> have difficulty with; they're used to being able to write
> reasonable array manipulation routines.

   How about matrix assignment (a = b).  If you have matrices
contiguously in memory you can do a "single" bcopy to move the data.
The optional representation doesn't allow this.

  Matrix addition  (a = b + c).  while (--size >= 0) *a++ = *b++ + *c++;

    Questionable whether this is actually faster than a[i][j] method
on the non-contiguous matrix.

   A pain that I'm faced with is transferring data between machines.
If the machines are of the same type, I do some heavy optimizations
on data transfer.  In the case of the matrices, I write the whole
matrix at once into a message.  If I where to use the optional 
representation I couldn't perform this optimization.


-- 

Eddie Wyatt 				e-mail: edw@ius1.cs.cmu.edu

gwyn@brl-smoke.ARPA (Doug Gwyn ) (02/23/88)

In article <929@PT.CS.CMU.EDU> edw@IUS1.CS.CMU.EDU (Eddie Wyatt) writes:
-   How about matrix assignment (a = b).  If you have matrices
-contiguously in memory you can do a "single" bcopy to move the data.
-The optional representation doesn't allow this.

We weren't proposing to make [][] objects have non-contiguous storage.

ok@quintus.UUCP (Richard A. O'Keefe) (02/23/88)

In article <2336@umd5.umd.edu>, chris@trantor.umd.edu (Chris Torek) writes:
> In article <676@cresswell.quintus.UUCP> ok@quintus.UUCP
> (Richard A. O'Keefe) writes:
> >Background:
> >	David Hough sent me his comments on dpANS C.
> >	He wants a language which can do at least as much as Fortran could.
> >
> >	One of his criteria was that you should be able to write
> >	subroutines that handle multi-dimensional arrays sensibly.
> 
> I would claim that the `best' way to do this is to use the row-vector
> approach.  To create a two-dimensional array, e.g., do the following:
> This does, of course, require more memory to describe, but row vectors
> are generally faster anyway.  Since the whole matrix is still there
> as a single block of memory, you can use existing routines as well
> (by passing m->m2_data).
> 
> So why is this solution insufficient?

I like the approach Torek recommends, having met this approach on the
B6700.  But the sad fact of the matter is that C doesn't support it.
C *will* let you do it yourself, but it *won't* do it for you.
Torek's own example is a better illustration of how much *work* it is
to use this approach in C than I could have come up with, and the
result doesn't look much like ordinary array access in C.  The point
of the conformant array suggestion is that
 o it is a proven technique
 o for hacking around a basic design flaw in Pascal that is also present in C
 o it lets you use array parameters *directly* 

Torek is absolutely right that "row vectors" are a very sensible
implementation technique, but having the machinery show through is
not a sensible >language< technique.

I'm sure that something better than conformant arrays can be done.
But we're looking for something that makes it EASY to pass array
parameters and EASY to use them, nearly as easy as passing fixed
size arrays would be.  We're also looking for something which will
be the same for all C programmers:  if I want to pass an array to
a function I got from Fred, and to pass the same array to a function
I got from Joe, I really do *not* want to find out that Fred used
row pointers and Joe used column pointers.

edw@IUS1.CS.CMU.EDU (Eddie Wyatt) (02/23/88)

> 
> We weren't proposing to make [][] objects have non-contiguous storage.

  Excuse me Chris.  I glanced over your post and ASS-u-ME-d
that you were proposing the array of pointers to arrays solution
without carefully looking at you code.  On second glance I noticed that
you allocate the whole memory block contiguously hence my conplaints
are no valid.  I'll go crawl back into my whole now. 8-)



-- 

Eddie Wyatt 				e-mail: edw@ius1.cs.cmu.edu

ERICMC@USU.BITNET (Eric McQueen) (02/24/88)

Summary:  Standard run-time function would be a better solution.

Problems with conformant arrays:
  - Complicates what should be a simple compiler (we don't want ADA, please!)
  - Does not allow for allocation of matrices at run time (unless I mis-
        understand the proposal) -- why make C as bad as Fortran?

I believe the addition of a couple of simple functions to the standard library
would provide more power than conformant arrays, is more within the "spirit
of C", and does not complicate the compiler (I can provide source for these
routines written in portable C -- no extra effort required when developing the
compiler for a new environment).

In message <42529@sun.uucp> David Hough and/or Richard O'Keefe write:
>     I know no C translation that's as clear as the  follow-
>ing Fortran code:
>        SUBROUTINE MATMUL(X,LX,Y,LY,Z,LZ,NX,NY,NZ)
>        REAL X(LX,*),Y(LY,*),Z(LZ,*)
>[arrays are accessed via "X(I,J)"]
>[then proposes the following C syntax]
>        void matmul(double a[?ar][?ac], /* ar >= p, ac >= q */
>                    double b[?br][?bc], /* br >= q, bc >= r */
>                    double c[?cr][?cc], /* cr >= p, cc >= r */
>                    int p,              /* 0 <= p <= min(ar,cr) */
>                    int q,              /* 0 <= q <= min(ac,br) */
>                    int r)              /* 0 <= r <= min(bc,cc) */
>[arrays are accessed via "a[i][j]"]
>[mentions having to disallow sizeof() in some cases]

As has been pointed out several times [recently in <2336@umd5.umd.edu> by Chris
Torek (chris@trantor.umd.edu)], the following is already possible:

void matmul( double **a, double **b, double **c, int p, int q, int r)
{       /* arrays are accessed via "a[i][j]" */

and is much simpler.  The only problem I see with this is that it is difficult
to declare such an array in the calling procedure.  Several people have
proposed methods to allocate "row vector" arrays.  I feel that some such
method should be standardized by the committee and routines to handle it be
provided in the standard library.  The allocation of arrays would be as simple
as:

  float ***m;   /* 3-Dimensional Matrix ALLOCation: */
        m = d3malloc( sizeof(float), 2, 3, 4 );

or:

  float **m;
  int d[] = { 2, 3, 4 };        /* Multi-Dimensional Matrix ALLOCation: */
        m = mdmalloc( sizeof(**m), 3, d );

Both of which would yield:

  m:  &a  &b   0        /* Pointers ("0" means "NULL"): */
  a:  &c  &d  &e   0
  b:  &f  &g  &h   0

  c:  000 001 002 003   /* Elements (must be contiguous): */
  d:  010 011 012 013
  e:  020 021 022 023
  f:  100 101 102 103
  g:  110 111 112 113
  h:  120 121 122 123

where "102" represents where "m[1][0][2]" is stored.  We allocate room for
11 pointers in a contiguous block and for 24 numbers in a contiguous block.
The space required for the pointers would on the order of that required if you
increased the last array dimension by 1 (not too bad).  The null pointers are
included to allow efficient traversal of the entire array.

Of course, this method doesn't provide for the Fortran trick of:
        real a(10,20), b(9,14), c(5,11)
        call matmult( a(2,3), 10, b(4,1), 9, c(2,5), 5, 2, 2, 2 )
for getting at strange sub-matrices.  But I think you should do this via:
        void matmult( a, ar, b, br, c, cr, p, q, r )
          float *a, *b, *c;  int ar, br, cr, p, q, r;
        #define  A(i,j)  a[ar*i+j]
which is what Fortran is really doing anyway.

I plan to send a more detailed proposal along these lines to the committee.
Any suggestions about my choice of allocation style would be appreciated.
For the sake of definateness, here are the basic prototypes:

void *mdmalloc(
  size_t siz    /* Size of elements in the array. */
, int    ndim   /* Number of dimensions. */
, int    d[]    /* Maximum dimensions "arr[ d[0] ]...[ d[ndim-1] ]". */
);

void *d2malloc( size_t siz, int d1, int d2 ) {
  int d[2];  d[0] = d1;  d[1] = d2;  return( mdmalloc( siz, 2, d );
}
void *d3malloc( size_t, siz, int d1, int d2, int d3 ) {
  int d[3];  d[0] = d1;  d[1] = d2;  d[2] = d3;  return( mdmalloc( siz, 3, d );
}

void mdmfree( void *arr, int ndim );
#define   d2free( arr )   d2free( arr, 2 )
#define   d3free( arr )   d2free( arr, 3 )

I think some strange systems would want to provide special versions that deal
only with certain data types if, say, sizeof(float *) < sizeof(void *):

float   **d2falloc( int d1, int d2 );
double ***d3dalloc( int d1, int d2, int d3 );

But I think most systems would just use:

#define  d3dalloc(d1,d2,d3)  (double ***) d3malloc( sizeof(double), d1, d2, d3 )

Is there any reason to:
    *Explicitly* provide for allocating multi-dimensional arrays of anything
        other than float, double, int, or long?
    Force non-strange systems to provide {md,d2,d3}{f,d,i,l}alloc()?
    Provide for the allocation of rows separately in case of fragmented memory?
    Make the inclusion of the null pointers optional or drop them altogether?
    Include the dimensions or number of dimensions in the array structure?
    Not include these routines in the standard?  (no-cost added option?)

Comments not of general interest to the net could be mailed directly to me.
I will summarize significant comments.  (Comments not of significant interest
should just be shouted at your roommate/boss/dog.)

---
Eric Tye McQueen          Mathematics Department        Also at (after some
ericmc@usu.bitnet         Utah State University           time in March):
 (801) 753-4683           Logan, Utah  84322-3900       ericmc@usu.usu.edu

   UUCP:  ...{uunet,psuvax1}!usu.bitnet!ericmc     "Nothing is obvious
  CSNET:  ericmc%usu.bitnet@relay.cs.net            unless you're over-
   Arpa:  ericmc%usu.bitnet@cunyvm.cuny.edu         looking something."

bvs@light.uucp (Bakul Shah) (02/25/88)

Multi-dimensional arrays are a useful feature and David Hough / Richard
O'Keefe's proposal of conformant arrays is usable & implementable.  But
the proposed conformant arrays are not elegant and, in my opinion, it is
a *bad* idea to add them to C (atleast as bad as adding structure
comparison or nested procedures or name-your-favorite-non-C-feature).

Such things are best left for C++.  No new features are needed to
implement multi-dimensional arrays in C++, Chris Torek's ``row vectors''
would do just fine AND, and the machinery will be hidden (what O'Keefe
wants in a >sensible< language).

If they are added to C, C++ will be forced to carry them (thereby
introducing an unneeded hack)  OR, C++ will *not* be upward compatible
to C.  Since conformant arrays do not exist in C today, only new code
will use this feature (if added).  Why not use C++ or Chris Torek's idea
for new code?

C just doesn't have the bandwidth to do many sensible things.  Attempts
to increase this bandwidth in an elegant way is extremely difficult, if
not impossible.

My two cent's worth.
-- 
Bakul Shah

..!{ucbvax,sun}!amdcad!light!bvs

kurtk@tekcae.TEK.COM (Kurt Krueger) (02/26/88)

Look at "The C Programming Language" by Kernighan & Ritchie on p. 110.  If you
set up your array like  int *b[10]  you can treat it as a conformal 2-d array
without having to pass ANY bounds information (except for whatever the
procedure logic requires).

henry@utzoo.uucp (Henry Spencer) (02/26/88)

>      Recommendation: Fix, by language design  if  necessary,
> the  treatment  of  arrays  in  C  to  properly  accommodate
> multiply-dimensioned arrays whose  dimensions  vary...

This is an interesting idea, and fortunately you are just in time to have
some chance of getting it into the next revision of C, five years or so
hence.  Here's what you do:

	(a) implement it in a C compiler
	(b) use it, and have others use it, for a few years
	(c) write it up and submit it formally

I guarantee that if you follow these steps, IN THE ABOVE ORDER, X3J11 will
at least listen attentively when you propose it.  Note that writing up the
proposal and submitting it comes after you've tried it out, *in C*, not
before.

(Yes, X3J11 has done a few things that have not been based on prior
experience.  Conformant arrays are arguably a better idea than, say,
noalias.  This is not much of a recommendation, however, since noalias
is increasingly looking like the biggest mistake X3J11 has ever made.)

Alternatively you might consider doing it in C++, which probably won't
require any compiler changes at all.  X3J11 has never claimed that C is
the answer to all mankind's problems.  You will be listened to *more*
attentively if you can explain why C++ does *not* solve your problem.
-- 
Those who do not understand Unix are |  Henry Spencer @ U of Toronto Zoology
condemned to reinvent it, poorly.    | {allegra,ihnp4,decvax,utai}!utzoo!henry

ok@quintus.UUCP (Richard A. O'Keefe) (02/26/88)

In article <1988Feb24.165307.4938@light.uucp>, bvs@light.uucp (Bakul Shah) writes:
> Multi-dimensional arrays are a useful feature and David Hough / Richard
> O'Keefe's proposal of conformant arrays is usable & implementable.  But
...
> Such things are best left for C++.
...
> If they are added to C, C++ will be forced to carry them (thereby
> introducing an unneeded hack)  OR, C++ will *not* be upward compatible
> to C.  Since conformant arrays do not exist in C today, only new code
> will use this feature (if added).  Why not use C++ or Chris Torek's idea
> for new code?

I am sorry to blaspheme against anyone's religion, but IHMO, C++ itself
is an unneeded hack.  Well, add a smiley to "unneeded", but "hack" I am
serious about.  C++ is the Fortran 8X of object-oriented languages.
X3J3, in trying to put together a language with all sorts of things which
are in themselves good, yet which contains Fortran 77 as a subset, have
created a monster.  C++ is just such a monster.  The things that Stroustrup
added to C are in themselves fine things, the trouble was that more things
should have been left out.  If C++ is not upwards compatible with C, so
much the better, and about time too!  There is something distinctly odd
about letting the C++ tail wag the C dog.

C is not the only language in the world.  It would be so nice to have
intercallability between ANSI C and ISO Pascal.  It is particularly
important not to commit to a particular device (such as Chris Torek's
idea) so that an implementor can provide something which can be connected
in a sensible fashion to Pascal or Turing or PL/I or even (via a wee bit of
macro expansion) to Fortran.  If I go to the trouble of coding a simplex
algorithm in C, and run it on an IBM mainframe, I would like the C vendor
to have been able to select an implementation of conformant array parameters
which is compatible with PL/I.  Or if I find that there is a better one
already there in PL/I, I don't want to have written my program using
edge-vectors so that I am *guaranteed* incompatibility with everything in
sight!

It is only too credible that conformant array parameters may not be the
best way of achieving the goals of
  o minimal change to the language and implementation
  o making it straightforward to write functions with multidimensional
    array parameters
  o avoiding over-specification, so that intercallability with other
    language is not impaired more than we can help
But it is CERTAIN that Bakul Shah's suggested alternatives
> Why not use C++ or Chris Torek's idea for new code?
do not meet those goals.

ark@alice.UUCP (02/26/88)

In article <694@cresswell.quintus.UUCP>, ok@quintus.UUCP writes:
> The things that Stroustrup
> added to C are in themselves fine things, the trouble was that more things
> should have been left out.

A challenge: name five things that ``should have been left out.''

edw@IUS1.CS.CMU.EDU (Eddie Wyatt) (02/26/88)

> Look at "The C Programming Language" by Kernighan & Ritchie on p. 110.  If you
> set up your array like  int *b[10]  you can treat it as a conformal 2-d array
> without having to pass ANY bounds information (except for whatever the
> procedure logic requires).


   But you have to access the array as if it where a n by 10 matrix.
This method does not provide for n by m by .... arrays.  At most
you may leave the first dimension (the rows) unspecified.




-- 

Eddie Wyatt 				e-mail: edw@ius1.cs.cmu.edu

edw@IUS1.CS.CMU.EDU (Eddie Wyatt) (02/26/88)

> 
> C is not the only language in the world.  It would be so nice to have
> intercallability between ANSI C and ISO Pascal.  It is particularly
> important not to commit to a particular device (such as Chris Torek's
> idea) so that an implementor can provide something which can be connected
> in a sensible fashion to Pascal or Turing or PL/I or even (via a wee bit of
> macro expansion) to Fortran.  If I go to the trouble of coding a simplex
> algorithm in C, and run it on an IBM mainframe, I would like the C vendor
> to have been able to select an implementation of conformant array parameters
> which is compatible with PL/I.  Or if I find that there is a better one
> already there in PL/I, I don't want to have written my program using
> edge-vectors so that I am *guaranteed* incompatibility with everything in
> sight!

  The issue of intercallability procedures is no simple matter.  You
have to deal with the mapping of data types from one language to the
next.   A mapping of a data type between languages need not be trivial
or computational inexpensive.

   Unions in C to the language of your choice is one example of
a non-trivial mapping.  Union in C are indiscriminate - you need not
have a tag fields explicitly determining the data type of the 
union.  Because of this you can't determine the equivalent representation
in another language.  Another example would be mapping structures
in C to Fortran.  Fortran doesn't have a niece equivalent.

   I seem to recall that FORTRASH :-) (or at least some versions) uses column
major form for storing arrays and C uses row major form.  To move
multi-dimensional arrays (tensor) from one language to the next
you must preform a transpose.  This could get expensive for large
matrices.

  However, I didn't miss part of what you were advocating which
was common representation for data types.  Well, let me add why I
don't think this is a good idea - at least not yet and
here's an example why.  I seem to remember Simula  uses row-vector
representation for matrices.  One of the advantages of this
is that the matrix my have dynamic bounds.  A disadvantage
it that indexing becomes more expensives (a couple levels of
indirection per matrix access).  The point being made it that
the choice of data representations is dictated by the 
design goals of the language.  In many cases there are tradeoffs
between flexablity  and efficiency.
-- 

Eddie Wyatt 				e-mail: edw@ius1.cs.cmu.edu

reeder@ut-emx.UUCP (William P. Reeder) (02/27/88)

In article <1491@tekcae.TEK.COM>, kurtk@tekcae.TEK.COM (Kurt Krueger) writes:
> Look at "The C Programming Language" by Kernighan & Ritchie on p. 110.  If you
> set up your array like  int *b[10]  you can treat it as a conformal 2-d array
> without having to pass ANY bounds information (except for whatever the
> procedure logic requires).

Since no one else has had the gumption to say it, I will.

Can this discussion continue without cross-posting to comp.lang.fortran?

I know it started out as a discussion of what fortran could learn from
C, but it has long since deviated to be a pure C discussion.

Thanks.

karl@haddock.ISC.COM (Karl Heuer) (02/27/88)

In article <8802240725.AA22255@jade.berkeley.edu> ERICMC@USU.BITNET (Eric McQueen) writes:
>Summary:  Standard run-time function would be a better solution.
>
>As has been pointed out several times [recently in <2336@umd5.umd.edu> by
>Chris Torek (chris@trantor.umd.edu)], the following is already possible:
>void matmul( double **a, double **b, double **c, int p, int q, int r) {...}
>and is much simpler.  The only problem I see with this is that it is
>difficult to declare such an array in the calling procedure.

Yes.

>[X3J11 should standardize a function to allocate such row-vector arrays, e.g.
>  float **a = d2malloc( sizeof(float), nrows, ncols ); ]

Now we have a problem.  How does the routine know that the row vectors should
be "float *" rather than some other pointer type?  This design implicitly
assumes that all pointers look alike.

Now, you can (and do) assert that only "strange systems" have this problem,
but that doesn't make it go away.  If you insist that an ANSI implementation
must supply d2malloc(), and that strange systems must therefore be non-ANSI,
you're going against the X3J11 charter to work on as many systems as possible.
If you allow an implementation to omit d2malloc(), it kind of defeats the
purpose of putting it into the standard.

To put it another way, if I know of a system where your function can't
possibly be implemented, I can't comfortably use it in my portable programs,
even if it somehow gets put in the standard.

What can we salvage?  As you mentioned, you could make explicit functions for
the most common datatypes (int, long, float, double).  The implicit rule that
all struct pointers are equivalent allows you to do it for generic structs,
too; this ought to be useful.

Alternately, you could petition X3J11 to add this with the syntax
  float **a = d2malloc( float, nrows, ncols );
where the first argument is the type.  On non-strange systems this has the
simple implementation you outlined; the first argument is used only as a
parameter to sizeof().  On strange systems, this would require assistance from
the compiler via a builtin.  (Cf. <stdarg.h>)

Karl W. Z. Heuer (ima!haddock!karl or karl@haddock.isc.com), The Walking Lint

ok@quintus.UUCP (Richard A. O'Keefe) (02/27/88)

In article <973@PT.CS.CMU.EDU>, edw@IUS1.CS.CMU.EDU (Eddie Wyatt) writes:
>   The issue of intercallability procedures is no simple matter.  You
> have to deal with the mapping of data types from one language to the
> next.   A mapping of a data type between languages need not be trivial
> or computational inexpensive.

>    I seem to recall that FORTRASH :-) (or at least some versions) uses column
> major form for storing arrays and C uses row major form.  To move
> multi-dimensional arrays (tensor) from one language to the next
> you must preform a transpose.  This could get expensive for large
> matrices.
> 
>   However, I didn't miss part of what you were advocating which
> was common representation for data types.

I begin to appreciate the problems that X3J11 have faced.

NO!  I was not advocating common representation of data types!
At least, not in the standard as such.
What I was advocating was that the standard should not go out
of its way to PRECLUDE intercallability, which is a different point.

The point about Fortran using column-major and other languages (Algol
60, Pascal, ADA, C, ...) using row-major is a rather old red herring.
So a Fortran subroutine might use MAT(I,J) and the C caller would
identify the exact same element as mat[j][i]. So what?  ADA[*] uses the
same order as C, and nobody seems to think ADA can't interface to
Fortran.  Since one is passing a description of an array to the other
language, the cost of transposing the array proper is not an issue:
what matters is what it costs to pass the description of the transpose.
(I am *not* suggesting that descriptor-passing should be *required* or
that the need for transposition is genuine.)  Similarly, C would use a
fixed lower bound of zero, but Turing would use a constant expression,
and Pascal would be given two bounds.  So what?  Any reasonable
bijection will do, it doesn't have to be the identity.

(By the way, my spinor calculus teacher would kill you for confusing
arrays and tensors.  Fortunately, he's not here.  :-)

The C standard cannot possibly require intercallibility, because it has
no authority over other languages.  All I claim for the conformant
array parameter approach is that it avoids over-specification, and,
since this technique is already in use in other languages, is unlikely
to make intercallability worse, and if any other mechanism were to be
adopted, not *precluding* intercallability is a good criterion.

Someone suggested taking the auxiliary-array-of-pointers approach and
requiring it in the standard library.  That approach may be a wonderful
implementation method, but it is too specific.  (C is already too much
like Lisp, no point in making it worse.)  The worst thing about that
approach is that it requires explicit memory management.  It's great
that C permits explicit memory management, but *requiring* it does not
come under the heading of "acts of kindness".  Why should I have to
explicitly manage a block of pointers to do a trivial thing that
ALGOL 60 did 28 years ago, PL/I does, Pascal does, ... and so on.

Oh well, someone promised me a copy of GNU CC, I guess I'd just better
wait for the other half to arrive, then start hacking.

[*] ADA is a birthmark of the DuD UFO.

ok@quintus.UUCP (Richard A. O'Keefe) (02/29/88)

In article <7715@alice.UUCP>, ark@alice.UUCP writes:
> In article <694@cresswell.quintus.UUCP>, ok@quintus.UUCP writes:
> > The things that Stroustrup
> > added to C are in themselves fine things, the trouble was that more things
> > should have been left out.
> 
> A challenge: name five things that ``should have been left out.''

The challenge is easily met.  But we have to nail down an ambiguity
in "should have been left out" first.

Given that Stroustrup *wanted* a high-powered language which was upwards
compatible with C, he didn't have any choices at all about what to leave
out:  *nothing* could be left out.

But the greatest virtue of C is that it doesn't get in your way much.
It doesn't have a lot of positive virtues.
This is no reflection on the designers of C.  C wasn't *supposed* to
be an early ADA, it was supposed to be a fairly minimal sort of tool.

So when I said that "more things should have been left out", the
question I was addressing is "what should a successor to C look like,
which tries to stay as comprehensible as C, but which tries to make it
easier to write correct programs?"  I repeat that this is not the
problem that Stroustrup was trying to solve, so my claim that some
things we'd be better off without is not a reflection on Stroustrup.

1.  Integer types as 'char', 'short', 'long' &c.
    This is KNOWN to be a cause of portability problems.
    It's fine when you are programming a particular machine, and want
    to be certain that things are the size you think they are.
    It's terrible for writing portable programs.

    These days, there is no good reason for a programming language to
    let the machine dictate the range of integers.  (The machine does
    dictate what range of integers is *efficient*, but that's another
    question.)  SUN didn't let the fact that MC68010s have no 32-bit
    multiply instruction dictate to them that int==short, so why should
    the absence of 48-bit integer instructions mean that I can't have
    a variable capable of holding 48-bit integers?  Yes, C++ will let
    me define such a data-type, and will let me define operations on
    it.  But it's my job, and it should be the compiler's.

2.  Identification of integer and boolean types.
    This is a case of over-specification.  On some machines, it might
    be more efficient to use (< 0)=true,(>=0)=false.  It is also a
    cause of confusion.  Suppose that integer and boolean were
    distinct types.  Then
	if (i = 0) ...
    would be a type error.  Similarly, the quiet conversion of
    pointers and floating-point numbers to boolean (in each case,
    equality to 0 is falsehood and difference from 0 is truth)
    is something we'd be better off without.

3.  Implicit 'int'.
    "register i;" is a legal C declaration, hence legal in C++.

    Things used as functions are implicitly "int foo(...);".

    In fact, to keep the number down, let's include old-fashioned
    function declaration syntax.

4.  Untagged unions.

    Not only does C++ have C's unions, it goes out of its way to make
    it easier to trip over your own feet.  I refer to anonymous unions.

    I could write a hymn of praise to tagged unions, discrimination
    case statements, and polymorphic types, and an execration text
    for untagged unions, but let's keep it down.

5.  Identification of 'endcase' and 'break'.

    Possibly the most frequent reason for the presence of a label in
    my C code is that C uses the same symbol for "finish case" and
    "finish loop".  This is particularly bewildering, because BCPL
    had different symbols for the two ('endcase' and 'break'), and
    it didn't make the BCPL compiler any more complicated.

    In fact, let's broaden this to the switch statement as a whole.
    A statement like

	switch (i) { int i; { int i; case 1: ...; }
			if (...) case 2: ...;
			    else case 3: ...;
		   }

    is in fact a legal C statement, and from my reading of the C++
    book it is a legal C++ statement.  (The book explicitly says
    that the controlled statement doesn't have to be compound, and
    *may* contain declarations.)

6.  C-style arrays.

    I can go into more detail if anyone's interested.
    The problem is that the size of an array is part of its type,
    sort of.  It's all very confusing, really.

SUMMARY.
    Each of the things I have mentioned is an obstacle to the
    writing of reliable and portable programs.  Number 6 in particular
    is responsible for the large number of UNIX utilities which die
    horribly in the input data exceed some undocumented limit (and
    the rather wierd limits that *are* documented, see xargs(1)).
    Each of them is something that Stroustrup had to include in C++
    *if* it was to be upwards compatible with C.  Which is why I
    want to use "D", not "C++".

edw@IUS1.CS.CMU.EDU (Eddie Wyatt) (02/29/88)

In article <708@cresswell.quintus.UUCP>, ok@quintus.UUCP (Richard A. O'Keefe) writes:
> 
> 4.  Untagged unions.
> 
>     Not only does C++ have C's unions, it goes out of its way to make
>     it easier to trip over your own feet.  I refer to anonymous unions.
> 
>     I could write a hymn of praise to tagged unions, discrimination
>     case statements, and polymorphic types, and an execration text
>     for untagged unions, but let's keep it down.
> 

 There are tradeoffs in requiring or not requiring tag fields.  It's
the space vs readablity problem again.   In the system that I maintain,
I use indiscriminate unions for space and time reasons.  I use arrays of
indiscriminate unions (TOKENs) to basically represent C structures.  There
are a number of fields predefined by the system for systems use.  One
of these fields is an index into a table that discribes the structure
of the TOKEN.  So the tag fields of the union can be view as external
to the union that they are associated with in my example.   This
save a lot of space.  Time savings comes into play when transferring
these guys over the network.  I only transfer the data object, and
no tag field information.  Again tables  are used to discriminate
the object coming across the net.

BTW I not gunning for you Rich.  I know that I some of my posts may seem to 
be attacking your views.  They really are not, I'm just adding my .02 
to the discussion.  And I have to say, I agree with a lot of your posts.

(P.S. I think I have to get out my differential geometry notes and 
review tensor :-))
-- 

Eddie Wyatt 				e-mail: edw@ius1.cs.cmu.edu

msb@sq.uucp (Mark Brader) (03/01/88)

Karl Heuer (karl@haddock.ima.isc.com) writes:
> ... Alternately, you could petition X3J11 to add this with the syntax
>   float **a = d2malloc( float, nrows, ncols );
> where the first argument is the type. ... [in general]
> ... this would require assistance from the compiler via a builtin.
> 
> Karl W. Z. Heuer (ima!haddock!karl or karl@haddock.isc.com), The Walking Lint

Well, there is also this:

	float **a;
	d2malloc (a, float, nrows, ncols);

with
	#define d2malloc(var,type,nr,nc) { \
		int _i, _j = (nc);\
		(var) = ((type) **) malloc ((_i = (nr)) * sizeof ((type) *));\
		while (_i) (var)[--(_i)] = ((type) *) malloc (sizeof ((type)));\
	}
[Not tested; checks of return status of malloc() omitted for brevity]

But I find this at least equally unsatisfactory.  Certainly there is nothing
in the Draft now that requires a macro that may expand to a statement, and
the above can't be done without temporary variables, thus requiring {...}.

And besides all THAT, what do you do when you want a *3*-dimensional array?

To me, the choice is between the Fortran approach and no change.

Mark Brader		    "Howeb45 9 qad no5 und8ly diturvrd v7 7jis dince
SoftQuad Inc., Toronto	     9 qas 8mtillihemt mot ikkfavpur4d 5esoyrdeful
utzoo!sq!msb, msb@sq.com     abd fill if condif3nce on myd3lf."      -- Cica

cabo@tub.UUCP (Carsten Bormann) (03/01/88)

In article <694@cresswell.quintus.UUCP> ok@quintus.UUCP (Richard A. O'Keefe) writes:
() C++ is the Fortran 8X of object-oriented languages.

What a piece of nonsense.

[For the uninitiated: C++ is not an ``object-oriented language'', it is
 a language that retains the spirit of C and that among other (often more
 important) improvements on C facilitates object-oriented programming.  If
 you compare the position of C++ in the development of the C language to
 the position of a FORTRAN dialect in the history of the FORTRAN language,
 you're better off using FORTRAN-77 as a reference.  Somehow, the recent
 comp.lang.c discussions about ``D'' (e.g. about conformant arrays) seem
 to have picked up the spirit of FORTRAN 8X.]
-- 
Carsten Bormann, <cabo@tub.UUCP> <cabo@db0tui6.BITNET> <cabo@tub.BITNET>
Communications and Operating Systems Research Group
Technical University of Berlin (West, of course...)
Path: ...!pyramid!tub!cabo from the world, ...!unido!tub!cabo from Europe only.

bvs@light.uucp (Bakul Shah) (03/01/88)

In an earlier article I wrote that multi-dimensional arrays are best
left out of ANSI C and for C++.

Richard A. O'Keefe <ok@quintus.UUCP> responded:
>I am sorry to blaspheme against anyone's religion, but IHMO, C++ itself
>is an unneeded hack.  Well, add a smiley to "unneeded", but "hack" I am
>serious about.  C++ is the Fortran 8X of object-oriented languages.
> ...
>It is only too credible that conformant array parameters may not be the
>best way of achieving the goals of
>  o minimal change to the language and implementation
>  o making it straightforward to write functions with multidimensional
>    array parameters
>  o avoiding over-specification, so that intercallability with other
>    language is not impaired more than we can help
>But it is CERTAIN that Bakul Shah's suggested alternatives
>> Why not use C++ or Chris Torek's idea for new code?
>do not meet those goals.

Somewhat different set of goals make sense to me.

1.  Avoid any extensions to C at this point -- this would only delay the
    the standard.  Conformant array parameters or any other means of
    making multi-dimensional arrays a first class object would be a major
    extension.  Leave such extensions to a successor language (I used C++
    in this sense, as an *example* -- I don't care to join a discussion
    of whether C++ is an abortion or an immaculate conception).

2.  A successor language, call it NEXT(C), should first clean up C and
    provide a more regular set of features.

3.  For NEXT(C) minimal change is not as critical as making all new
    features orthogonal (to get maximal use out of a minimal number of
    features).  Conformant array parameters extension may be *minimal*,
    it certainly isn't as orthogonal as some other mechanisms.

4.  Do not standardize a new feature until you have implemented and used
    it for a few years.

Related topics of
    o  what KIND of features should go in NEXT(C),
    o  whether NEXT(C) should be upward compatible to C or a *subset* of C,
    o  whether NEXT(C) should be a high level assembler (like C) or allow
       data abstraction facilities (like CLU does), and,
    o  whether C++ fits the bill or not
warrant further discussion, separate from this one.  I bring up NEXT(c)
mainly to point out that it will be a better language to add MD arrays
(multi-dimensional arrays -- sorry, I can't think of a better name).

In C there ARE some ways of implementing functions that take MD arrays.
They may not be pretty but will get the job done.  In my previous article
I used Chris Torek's proposal as an *example*, not as the *right* way to
implement MD arrays.  Besides, in a modern language you would want more
than one representation for MD arrays.

BTW, MD arrays in C++ using a variation of Torek's idea is *not* an over-
specification.  It is just another piece of useful code, not an extension
to the language.  Now if someone proposed legislating it as the only way
to do MD arrays in C++, I would be very surprised and definitely fight it.

Intercallability is not an issue since an MD array can be mapped one-to-
one from one language to another but you will need some glue in some
cases; even Pascal and Fortran are not intercallable without some glue.
But compatibility with Pascal's conformant array parameters can not be a
desirable goal -- conformant arrays are a hack in *any* language!

-- 
Bakul Shah

..!{ucbvax,sun,py

ok@quintus.UUCP (Richard A. O'Keefe) (03/01/88)

In article <996@PT.CS.CMU.EDU>, edw@IUS1.CS.CMU.EDU (Eddie Wyatt) writes:
> In article <708@cresswell.quintus.UUCP>, ok@quintus.UUCP (Richard A. O'Keefe) writes:
> >     I could write a hymn of praise to tagged unions, discrimination
> >     case statements, and polymorphic types, and an execration text
> >     for untagged unions, but let's keep it down.
> > 
[Recall that this started with me saying that C/C++ compatibility didn't
 strike me as important because I thought C++ would be a better language
 without some of C's features, and having been challenged to name five
 such features, I named six.  C's unions are what they are, and a good C
 programmer will learn to live with them.  I do not advocate changing C.
] Wyatt writes:
> There are tradeoffs in requiring or not requiring tag fields.
> It's the space vs readablity problem again.

I can't really agree.  In Wyatt's own example, he makes it plain that
the information as to what his unions are is held in a field, it's
just that the field is somewhere else.  If you are storing information
about a union somewhere, what's so terrible about storing it as part
of the union?

There is something of a cultural conflict here.  My basic preference is
for languages like Simula and ML, where there is no such thing as an
untagged union, and you don't have to debug using-the-wrong-union bugs
because they just can't happen.  This goes with a view of what a union
*is*, which says that 'union' ("sum" types) is the categorial dual of
'struct' ("product" types), that the "fields" of a union are injection
functions, and it goes with a pattern-matching programming style (the
Simula INSPECT statement, for example).  On the other hand, C has
nothing but untagged union, which goes with a quite different view of
what a union is, namely that it is a storage overlay.  Pascal tried to
compromise, which meant that it offered neither the security of sum
types nor the storage efficiency of untagged unions.

I think everyone will admit that sum types are theoretically clean,
and that union types are riskier.  Further, sum types are better news
for optimising compilers.  I think we can discount the "space saving"
argument:  if space saving were of paramount concern in C, it would
have dynamically allocated arrays, not fixed-size arrays.  As for
transmitting things over networks, presumably there is some reason
why SUN's XDR coding doesn't pack things down to byte, but works in
multiples of 32 bits...

C's unions are what they are, and there are legitimate uses of them.
But are they a desirable feature in a successor to C?

Perhaps we can steer this discussion back to something of more immediate
use to C programmers:

    the six points I listed in the message Wyatt quoted:
  o do other C programmers experience them as nuisances?
  o how do people work around them?

pardo@june.cs.washington.edu (David Keppel) (03/02/88)

In article <1988Feb29.205138.28452@sq.uucp> msb@sq.UUCP (Mark Brader) writes:
 >
 >Karl Heuer (karl@haddock.ima.isc.com) writes:
 >> ... Alternately, you could petition X3J11 to add this with the syntax
 >>   float **a = d2malloc( float, nrows, ncols );
 >> where the first argument is the type. ... [in general]
 >> ... this would require assistance from the compiler via a builtin.
 >
 >Well, there is also this:
 >
 >	float **a;
 >	d2malloc (a, float, nrows, ncols);
 >
 >with
 >	#define d2malloc(var,type,nr,nc) { \
 >		int _i, _j = (nc);\
 >		(var) = ((type) **) malloc ((_i = (nr)) * sizeof ((type) *));\
 >		while (_i) (var)[--(_i)] = ((type) *) malloc (sizeof ((type)));\
 >	}
 >[Not tested; checks of return status of malloc() omitted for brevity]
 >But I find this at least equally unsatisfactory.

Or possibly:

    float **a = (float **) dmalloc( dims, dimsvals, ptrsizes );

where "dims" is an array of n+1 ints for n dimensions, zero terminated,
sizebase is the sizes of the intermediate objects:

    static int dismvals[] = { 5, 6 }
    static int ptrsizes = { sizeof(float), sizeof(float *) }

This assumes that the important thing about the intermediate type pointers
is their size and not their format.  While (I think) this is mostly portable,
I can bet that it isn't entirely portable.  I'm not sure that making it a
builtin would make it any more portable.

	;-D on  (Just my $0.02 in real *copper* pennies)  Pardo

karl@haddock.ISC.COM (Karl Heuer) (03/02/88)

In article <4340@june.cs.washington.edu> pardo@uw-june.UUCP (David Keppel) writes:
> >Karl Heuer (karl@haddock.ima.isc.com) writes:
> >> ... Alternately, you could petition X3J11 to add this with the syntax
> >>   float **a = d2malloc( float, nrows, ncols );
> >> where the first argument is the type. ... [in general]
> >> ... this would require assistance from the compiler via a builtin.
>
>Or possibly: [pass an array of { sizeof(float), sizeof(float *) }]
>This assumes that the important thing about the intermediate type pointers
>is their size and not their format.

There are systems where (char *) and (int *) have the same size but different
formats.  If the type in question is (char[4]), I think the distinction would
be lost completely in your model.

>I'm not sure that making it a builtin would make it any more portable.

Assuming that there are only a finite number of pointer formats, the compiler
could have a builtin `__typeof(type)` which returns a magic number.  The
macro could be defined as
  #define d2malloc(type, nr, nc) _d2malloc(__typeof(type *), nr, nc)
and the library routine could be written
  void *_d2malloc(int typecode, size_t nr, size_t nc) {
    switch (typecode) {
    case __typeof(char *): ...;
    case __typeof(int *): ...;
    }
  }
This seems like a fairly clean solution, and it would make the usage portable.

(Recall that we were assuming that X3J11 would require this to be supported
somehow.  The above shows that my model could be supported on a strange
architecture, in contrast to the original proposal.)

(I did assume that appending "*" yields a valid type; this restriction is
similar to one already in place in <stdarg.h>.  It's still possible without
that restriction, but I find the implementation a bit less clean.)

Karl W. Z. Heuer (ima!haddock!karl or karl@haddock.isc.com), The Walking Lint

throopw@xyzzy.UUCP (Wayne A. Throop) (03/04/88)

> karl@haddock.ISC.COM (Karl Heuer)
>> pardo@uw-june.UUCP (David Keppel)
>>Or possibly: [pass an array of { sizeof(float), sizeof(float *) }]
> [...this is not sufficent because...]
> There are systems where (char *) and (int *) have the same size but different
> formats.  If the type in question is (char[4]), I think the distinction would
> be lost completely in your model.

Right.  Karl's proposed solution:

>   void *_d2malloc(int typecode, size_t nr, size_t nc) {
>     switch (typecode) {
>     case __typeof(char *): ...;
>     case __typeof(int *): ...;
>     }
>   }

... would work, and allow one to code functions to create such array
structures for arbitrary levels of dimensions (albeit nonportably), but
it is an interesting problem to come up with a compact way of defining
these things to any depth of dimension with no extensions to current C.
And try to get it coded portably.  Hmmmm...  (imagine Lurch-like
sandpapery sound of fingertips above keyboard...)

Howzabout this:  A macro alloc_dimention( type, n, initial_value ),
which returns a (type *) pointer to the initial element of an array of n
elements of (type), each initialized to the given initial_value (the
initial value being evaluated "by-name" for each element, of course).
Thus, to get the two-D, N-by-M float array, one would say

        { float ** a = alloc_dimention( float *, N,
                       alloc_dimention( float, M, 0.0 ));
            ...
        }

and references to elements of the array are just a[i][j], and the whole
shebang can be passed to subroutines, wherein references to a[i][j]
would work just as well.  To get a three-dimentional structure declared,
just do the obvious:

        { float *** a = alloc_dimention( float **, I,
                        alloc_dimention( float *, J,
                        alloc_dimention( float, K, 0.0 )));
            ...
        }                

Now, can this macro be implemented?  Well... sort of.  Consider:

    /* This include file defines the "alloc_dimension" macro, and
       associated support cruft.  The macro takes a type, a dimension limit,
       and an initial value.  An array n of type is allocated, each element
       of which is assigned the by-name value of the initial value supplied,
       and a pointer to the first element of the array is returned.
    
       This version assumes that setjmp.h is already included, and malloc
       already defined.
    
    */
    
    typedef struct _lab_node_s { struct _lab_node_s * next;
                                 jmp_buf jbuf;
                                 void *p;
                                 int i;
                               } _lab_node_t;
    static _lab_node_t *_lab_list = 0, *_cur_lab;
    static void *_cur_p;
    
    #define alloc_dimension( t, n, v ) \
        (_cur_lab = (_lab_node_t *)malloc(sizeof(_lab_node_t)),\
         _cur_lab->next = _lab_list, _lab_list = _cur_lab,\
         _lab_list->p = malloc(sizeof(t)*(n)),\
         _lab_list->i = 0,\
         setjmp( _lab_list->jbuf ) < 2 ?(\
             ((t*)_lab_list->p)[_lab_list->i] = (v),\
             ++(_lab_list->i) < (n) ? longjmp(_lab_list->jbuf,1)\
                                    : longjmp(_lab_list->jbuf,2),\
         0):0,\
         _cur_lab = _lab_list,_lab_list = _lab_list->next,\
         _cur_p = _cur_lab->p,free(_cur_lab),(t*)_cur_p)

Of course, this has some unpleasant limitations

- it isn't clear to me that setjmp and longjmp are guaranteed to work
  inside comma expressions like this (though I *think* they are), so
  this may not really be portable code
- freeing the thing becomes a hassle, requires another macro
- some errorchecking was omitted for brevity (whew)
- insufficent error checking for values of dimention limits
- the overkill of using setjmp
- the tedium of using a dynamic frame mechanism when a simple
  static mechanism would do if only we had compile-time execution
  of code a-la lisp macros
- the general cruftiness, slowness, and bulkiness of the code


Now it seems to me that, allowing non-constant expressions in the bounds
of formal array arguments is the minimal, conservative, non-dope-vector,
covers-most-bases solution.  That is:

        g(){
            double a[I][J][K];
            f( a, I, J, K );
        }
        f(a,ilim,jlim,klim)
            double a[ilim][jlim][klim]; /*currently illegal*/
            int ilim,jlim,klim;
        {
            int i,j,k;
            /* do something to each array element */
            for( i=0; i<ilim; ++i ){
                for( j=0; j<jlim; ++j ){
                    for( k=0; k<klim; ++k ){
                        ... a[i][j][k] ...;
                    }
                }
            }
        }

... would be made legal.  The problematic point that the type of the
formal a isn't precisely known until runtime can be made a special case
to lint, and be declared to have unknown behavior when the bounds of the
actual passed don't match those of the formal at run-time.  Probably the
harshest problem is that pointer formats for arrays of arrays of... to
any number of "arrays of" levels all be the same, which isn't true now.

I'd say either leave it as it is and do such things with pointers, or
take the above rather conservative step.  But then, who listens to me?

--
A LISP programmer knows the value of everything, but the cost of nothing.
                                        --- Alan J. Perlis
-- 
Wayne Throop      <the-known-world>!mcnc!rti!xyzzy!throopw

ok@quintus.UUCP (Richard A. O'Keefe) (03/04/88)

In article <662@xyzzy.UUCP>, throopw@xyzzy.UUCP (Wayne A. Throop) writes:
> Now it seems to me that, allowing non-constant expressions in the bounds
> of formal array arguments is the minimal, conservative, non-dope-vector,
> covers-most-bases solution.  That is:
> 
>         g(){
>             double a[I][J][K];
>             f( a, I, J, K );
>         }
>         f(a,ilim,jlim,klim)
>             double a[ilim][jlim][klim]; /*currently illegal*/
>             int ilim,jlim,klim;
>         {
	      ...
>         }
> 
> ... would be made legal.  The problematic point that the type of the
> formal a isn't precisely known until runtime can be made a special case

(That's not a problematic point in my view; the bounds of an array *are*
currently regarded as part of its type, *but* that is a major design
error shared with Pascal.)

What are the differences between this and the C.A.P. proposal, other than
the fact that GNU CC is said to support it, so that this proposal *is*
prior art, and the C.A.P. proposal isn't?

(1) This is exactly the Fortran approach.  There is lots of experience
    with it, and the numerical people for whose sake the C.A.P. proposal
    was made will already be familiar with it.

(2) One of the benefits is that you can easily declare that several
    array parameters are the same size, which is not possible in the
    current version of the C.A.P. proposal.

(3) A major problem is that you can get the arguments wrong.

(4) A minor problem is that a C program may alter arguments like ilim,
    jlim, klim (though presumably lint would warn about this), so a
    compiler either has to be prepared to make a copy of them.
    Conformant array bound parameters are 'const int' by definition, so
    this problem cannot arise.

The current version of the C.A.P. proposal replaces the rather ugly '?'
marker by the alternative keywords 'auto' and 'register', so that one
would declare f in the C.A.P. proposal as

	void f(double a[auto ilim][auto jlim][auto klim])
	    {
		...
	    }

(Note that the type is implicitly 'int', and no explicit over-ride of
this is possible.)  

I think conformant arrays are cleaner than the Fortran hack, but I'd
be happy with either.

peter@sugar.UUCP (Peter da Silva) (03/06/88)

In article <42529@sun.uucp>, dgh%dgh@Sun.COM (David Hough) writes:
>         void matmul(double a[?ar][?ac], /* ar >= p, ac >= q */
>                     double b[?br][?bc], /* br >= q, bc >= r */
>                     double c[?cr][?cc], /* cr >= p, cc >= r */
>                     int p,              /* 0 <= p <= min(ar,cr) */
>                     int q,              /* 0 <= q <= min(ac,br) */
>                     int r)              /* 0 <= r <= min(bc,cc) */

WOw. I wouldn't have believed it... I prefer Fortran's syntax on this one.

	void matmul(double a[ar][ac], int ar, int ac,
	            double b[br][bc], int br, int bc,
	            double c[cr][cc], int cr, int cc,
		    int p,
		    int q,
		    int r)
	{
	}

Or:

	void matmul(a, ar, ac,
	            b, br, bc,
	            c, cr, cc,
		    p,
		    q,
		    r)
	int ar, ac, br, bc, cr, cc, p, q, r;
	double a[ar][ac];
	double b[br][bc];
	double c[cr][cc];
	{
	}

Doesn't require any new operators, doesn't require passing descriptors
around on the stack (yech), and allows you to allocate some space and
give it different dimensions in different places.
-- 
-- Peter da Silva  `-_-'  ...!hoptoad!academ!uhnix1!sugar!peter
-- Disclaimer: These U aren't mere opinions... these are *values*.

throopw@xyzzy.UUCP (Wayne A. Throop) (03/07/88)

> ok@quintus.UUCP (Richard A. O'Keefe)
>> throopw@xyzzy.UUCP (Wayne A. Throop)
>> Now it seems to me that, allowing non-constant expressions in the bounds
>> of formal array arguments is the minimal, conservative, non-dope-vector,
>> covers-most-bases solution.
> What are the differences between this and the C.A.P. proposal, other than
> the fact that GNU CC is said to support it, so that this proposal *is*
> prior art, and the C.A.P. proposal isn't?
> 
> (1) This is exactly the Fortran approach.  There is lots of experience
>     with it, and the numerical people for whose sake the C.A.P. proposal
>     was made will already be familiar with it.

This first point hits the nail directly on the head.  At this stage of
standardization, it seems a Bad Idea to build a paper-mache feature on
top of a plaster-of-paris patch to a weak point in the original K&R
version of the language.  I mean, we are treading on ground related to
array "promotion" to pointers to start with, and are tacking on dubious
and unproven syntax in a newly-proposed portion of the language
(prototypes), and an area for which alternative proposals are likely to
be made (for handling passing arrays by-value, for example).  This
definitely seems the wrong time to be tinkering around in this
particular area in this particular fashion.  Allowing non-constant
expressions in this context doesn't alter the syntax much, and seems a
safer step to take.

But of course, I'm *really* upset that nobody blanched at my
demonstration of how to shoehorn a loop into the body of a
value-returning macro, to get iterated by-name evaluation.  I guess you
folks have stronger stomachs that I gave y'all credit for...

--
The LISP programmer knows the value of everything, but the cost of nothing.
                                        --- Alan J. Perlis
-- 
Wayne Throop      <the-known-world>!mcnc!rti!xyzzy!throopw