dgh%dgh@Sun.COM (David Hough) (02/20/88)
I have been wondering about a problem that's been discussed, often incorrectly, in comp.lang.c. The problem is easy enough to understand if you've ever written linear algebra software, but the correct "in the spirit of C" solution wasn't clear to me. Fortunately Richard O'Keefe has volunteered some ideas. I'd appreciate feedback on them. If you're simply opposed to making C suitable for scientific programming, the net has already heard from you, so you needn't repeat your views, but if you have ideas for better syntax or better ways to achieve the goal, please speak up. The following is one of my comments on the January 1988 X3J11 draft for ANSI C, followed by the proposed solution. Comment #24, Section 3.5.4.2: fix arrays I know no C translation that's as clear as the follow- ing Fortran code: SUBROUTINE MATMUL(X,LX,Y,LY,Z,LZ,NX,NY,NZ) REAL X(LX,*),Y(LY,*),Z(LZ,*) DO 1 I=1,NX DO 2 J=1,NZ SUM=0 DO 3 K=1,NY 3 SUM=SUM+X(I,K)*Y(K,J) 2 Z(I,J)=SUM 1 CONTINUE END Code like this is at the heart of most of the major portable Fortran libraries of mathematical software developed over the last twenty years. The declared leading dimensions of X, Y, and Z are not known until runtime. The Draft, like traditional C, disallows the equivalent void matmul(x,lx,cx,...) int lx, cx; double x[lx][cx] ; unless lx and cx are known at compile time. Equivalent functionality can be obtained by treating all arrays as one-dimensional and doing the subscripting "by hand", so that it's harder to get right and harder to optimize. Fix- ing C's handling of arrays would be a far worthier task for the X3J11 experts than (for instance) specifying details of the math library. Maybe there's some good fundamental rea- son why variables are disallowed as array dimensions; C's treatment of arrays has always been confusing. Note that the section numberings in the Draft and Ra- tionale are out of synch in the declarations subchapter. Recommendation: Fix, by language design if necessary, the treatment of arrays in C to properly accommodate multiply-dimensioned arrays whose dimensions vary at run time. The goal is not to duplicate Fortran's features ex- actly, but rather to insure that portable linear algebra li- braries are as easy to create in C as in Fortran. Array proposal ----- -------- Richard O'Keefe has kindly provided the following out- line of how an array facility might work in C: The smallest possible change would be "conformant arrays" much as in ISO Pascal or in Turing, where the param- eter declaration was something like void matmul(double a[p][q], b[q][r], c[p][r]) { int i, j, k; double t; for (i = p; --i >= 0; ) for (j = r; --j >= 0; ) { t = 0.0; for (k = q; --k >= 0; ) t *= a[i][j]*b[j][k]; c[i][j] = t; } } If p, q, and r are #defined to be constant expressions, this is already legal, so we need one more thing to indicate that p,q,r are being defined here, not used. Consider the fol- lowing: declarator: ... | declarator '[' subscript_spec ']' ; subscript_spec: '?' identifier | constant_expression | /* empty */ ; where /* empty */ is only allowed as the first subscript_spec, and ?id is only allowed in a function header. To avoid having to specify what happens if ?x ap- pears several times but the arguments don't agree with that, make it illegal, so the first suggestion would have to be written void matmul(double a[?ar][?ac], /* ar >= p, ac >= q */ double b[?br][?bc], /* br >= q, bc >= r */ double c[?cr][?cc], /* cr >= p, cc >= r */ int p, /* 0 <= p <= min(ar,cr) */ int q, /* 0 <= q <= min(ac,br) */ int r) /* 0 <= r <= min(bc,cc) */ { int i, j, k; double t; for (i = p; --i >= 0; ) for (j = r; --j >= 0; ) { t = 0.0; for (k = q; --k >= 0; ) t *= a[i][j]*b[j][k]; c[i][j] = t; } } A conformant array may be passed as a parameter to a func- tion provided the function's prototype was visible to con- firm that a conformant array parameter was intended. The simplest way of treating sizeof is to rule it out: if the description of 'a' involves any ?s, you can't apply sizeof to it. So given a parameter float fred[?f1][?f2][10]; sizeof fred and sizeof fred[1] would be illegal, but sizeof fred[1][2] and sizeof fred[1][2][3] would be legal. David Hough ARPA: dhough@sun.com UUCP: {ucbvax,decvax,decwrl,seismo}!sun!dhough
gwyn@brl-smoke.ARPA (Doug Gwyn ) (02/20/88)
In article <42529@sun.uucp> dgh%dgh@Sun.COM (David Hough) writes: > The Draft, like traditional C, disallows the equivalent > void matmul(x,lx,cx,...) > int lx, cx; > double x[lx][cx] ; Yes, indeed, this is a pain. I discussed the possibility of allowing something like this several years ago with Dennis Ritchie and Steve Johnson, and the consensus was that it was "doable", although specifying it correctly is tricky. > void matmul(double a[?ar][?ac], /* ar >= p, ac >= q */ > double b[?br][?bc], /* br >= q, bc >= r */ > double c[?cr][?cc], /* cr >= p, cc >= r */ > int p, /* 0 <= p <= min(ar,cr) */ > int q, /* 0 <= q <= min(ac,br) */ > int r) /* 0 <= r <= min(bc,cc) */ I didn't see how this proposal would work. For correct code to be generated, all but one of the array dimensions has to be known via the calling sequence. There seem to be only two ways to achieve this: (1) rely on explicit dimension parameters, as in the first example; (2) automatically pass extra size information along with the array address. The second approach is incompatible with current treatment of arrays, unless the function prototype has parameters declared some special way (perhaps using yet another type qualifier), so that the compilers will pass array information specially for the particular function. In other words, to make method (2) work, more must be involved than just the function definition. Is this what was intended by your second example? (Since the body of the example didn't make use of ar, ac, etc., it's hard to be sure.)
roy@phri.UUCP (Roy Smith) (02/21/88)
I'm not sure I follow all the details of David's note, but I thought I would throw in my suggestion anyway. Some time ago I had need for variable-dimension arrays in C. What I ended up using was vectored arrays. I wrote an array allocator which called malloc to get memory for the main array and for the row address vector, and which initialized the vector to point to the right places in the main array. After that, you could pretend you had a variable dimensional array; i.e. you could pass a subroutine the pointer returned by arrayalloc and the dimensions you used. It was posted a few years back to mod.sources. -- Roy Smith, {allegra,cmcl2,philabs}!phri!roy System Administrator, Public Health Research Institute 455 First Avenue, New York, NY 10016
ok@quintus.UUCP (Richard A. O'Keefe) (02/22/88)
Background: David Hough sent me his comments on dpANS C. He wants a language which can do at least as much as Fortran could. One of his criteria was that you should be able to write subroutines that handle multi-dimensional arrays sensibly. I wrote back to him about some of his points. I started to argue that this criterion required much too big a change to the language, and ended up talking myself out of that position. David Hough sent (part of) my very sketchy proposal to comp.lang.c. I should point out that *no* part of the suggestion is original to me. Conformant arrays are part of ISO Pascal, and Turing has them. Doug Gwyn writes: > I didn't see how this proposal would work. For correct code > to be generated, all but one of the array dimensions has to > be known via the calling sequence. There seem to be only > two ways to achieve this: (1) rely on explicit dimension > parameters, as in the first example; (2) automatically pass > extra size information along with the array address. The > second approach is incompatible with current treatment of > arrays, unless the function prototype has parameters declared > some special way (perhaps using yet another type qualifier), > so that the compilers will pass array information specially > for the particular function. In other words, to make method > (2) work, more must be involved than just the function > definition. Is this what was intended by your second example? > (Since the body of the example didn't make use of ar, ac, etc., > it's hard to be sure.) [ there was meant to be an assert() checking the comments, but I forgot. Sorry about that, but it wasn't ready for "publication". ] The answer is that I wrote that it would not be legal to call a function with a conformant array parameter unless the actual function definition or a prototype were in scope, so that the compiler would know to pass the dimension information. This is easily checked by a tool like "lint": simply mark in the symbol table each function which is called through through an implicit or old-fashioned declaration, and complain about any marked function which has a conformation array parameter. Functions which take a variable number of parameters are already subject to just such a restriction. The declaration of conformant dimension parameters "? id" is precisely the "special way" of declaration which Doug Gwyn rightly claims to be necessary. And yes, conformant array parameters *are* incompatible with the current treatment of arrays. If you have p[D1]...[Dn] then p[E1]...[Ek] would be a pointer only if none of Dk+1,...,Dn were a conformant dimension parameter (?id) [that's "only if", not "if and only if"], and sizeof p[E1]...[Ek] would be legal if and only if none of Dk+1,...,Dn were a conformant dimension parameter. Oddly enough, a weak spot in C comes to our aid: Pascal has this nasty problem: what does procedure p(var a: array [la..ua: integer] of char; var b: array [lb..ub: integer] of char); begin a := b; end {p}; mean? C wouldn't have that problem, because it lacks array assignment. I have managed to convince myself that o David Hough is right: something like this IS necessary if one is to be able to write mathematical libraries in C with at least the same generality as Fortran (e.g. EISPACK, LINPACK, IMSL, NAG, ...) o conformant array parameters, though a *hack*, are at least a hack which is known to work o conformant array parameters can be added to C without major distortion, though they would be exceptions to "sizeof" and quiet-conversion-to-pointer I have NOT yet managed to convince myself that ? conformant array parameters should be added to C but I think one could make a considerably better argument for this addition than for 'noalias'.
chris@trantor.umd.edu (Chris Torek) (02/22/88)
In article <676@cresswell.quintus.UUCP> ok@quintus.UUCP (Richard A. O'Keefe) writes: >Background: > David Hough sent me his comments on dpANS C. > He wants a language which can do at least as much as Fortran could. > > One of his criteria was that you should be able to write > subroutines that handle multi-dimensional arrays sensibly. I would claim that the `best' way to do this is to use the row-vector approach. To create a two-dimensional array, e.g., do the following: typedef struct dmat2 { int m2_nrows; int m2_ncols; double **m2_m; /* row vectors */ double *m2_data; } dmat2; dmat2 * create(rows, cols) register int rows, cols; { register dmat2 *m; register int r; m = (dmat2 *)malloc(sizeof(dmat2)); if (m == NULL) return (NULL); m->m2_rows = rows; m->m2_cols = cols; m->m2_m = (double **)malloc(rows * sizeof(double *)); if (m->m2_m == NULL) { free((char *)m); return (NULL); } m->m2_data = (double *)malloc(rows * cols * sizeof(double)); if (m->m2_data == NULL) { free((char *)m->m2_m); free((char *)m->m2_data); } for (r = 0; r < rows; r++) m->m2_m[r] = &m->m2_data[r * cols]; return (m); } These arrays can then be used as m->m2_m[r][c] = val; and parts of the array can be passed to functions with f(&m->m2_m[off], rows - off, cols); Note that a whole matrix can be passed (by reference) with f(m); If you dislike having to specify the range for subarrays, you could create a `make subarray' function: dmat2 * makesub(m, roff, coff, rsize, csize) dmat2 *m; int roff, coff, rsize, csize; { if (m->m2_rows - roff < rsize || m->m2_rows - coff < csize) /* it is not that big! */... etc. } This does, of course, require more memory to describe, but row vectors are generally faster anyway. Since the whole matrix is still there as a single block of memory, you can use existing routines as well (by passing m->m2_data). So why is this solution insufficient? -- In-Real-Life: Chris Torek, Univ of MD Computer Science, +1 301 454 7163 (hiding out on trantor.umd.edu until mimsy is reassembled in its new home) Domain: chris@mimsy.umd.edu Path: not easily reachable
ok@quintus.UUCP (Richard A. O'Keefe) (02/23/88)
In article <2336@umd5.umd.edu>, chris@trantor.umd.edu (Chris Torek) writes: > In article <676@cresswell.quintus.UUCP> ok@quintus.UUCP > (Richard A. O'Keefe) writes: > >Background: > > David Hough sent me his comments on dpANS C. > > He wants a language which can do at least as much as Fortran could. > > > > One of his criteria was that you should be able to write > > subroutines that handle multi-dimensional arrays sensibly. > > I would claim that the `best' way to do this is to use the row-vector > approach. To create a two-dimensional array, e.g., do the following: > This does, of course, require more memory to describe, but row vectors > are generally faster anyway. Since the whole matrix is still there > as a single block of memory, you can use existing routines as well > (by passing m->m2_data). > > So why is this solution insufficient? I like the approach Torek recommends, having met this approach on the B6700. But the sad fact of the matter is that C doesn't support it. C *will* let you do it yourself, but it *won't* do it for you. Torek's own example is a better illustration of how much *work* it is to use this approach in C than I could have come up with, and the result doesn't look much like ordinary array access in C. The point of the conformant array suggestion is that o it is a proven technique o for hacking around a basic design flaw in Pascal that is also present in C o it lets you use array parameters *directly* Torek is absolutely right that "row vectors" are a very sensible implementation technique, but having the machinery show through is not a sensible >language< technique. I'm sure that something better than conformant arrays can be done. But we're looking for something that makes it EASY to pass array parameters and EASY to use them, nearly as easy as passing fixed size arrays would be. We're also looking for something which will be the same for all C programmers: if I want to pass an array to a function I got from Fred, and to pass the same array to a function I got from Joe, I really do *not* want to find out that Fred used row pointers and Joe used column pointers.
kurtk@tekcae.TEK.COM (Kurt Krueger) (02/26/88)
Look at "The C Programming Language" by Kernighan & Ritchie on p. 110. If you set up your array like int *b[10] you can treat it as a conformal 2-d array without having to pass ANY bounds information (except for whatever the procedure logic requires).
henry@utzoo.uucp (Henry Spencer) (02/26/88)
> Recommendation: Fix, by language design if necessary, > the treatment of arrays in C to properly accommodate > multiply-dimensioned arrays whose dimensions vary... This is an interesting idea, and fortunately you are just in time to have some chance of getting it into the next revision of C, five years or so hence. Here's what you do: (a) implement it in a C compiler (b) use it, and have others use it, for a few years (c) write it up and submit it formally I guarantee that if you follow these steps, IN THE ABOVE ORDER, X3J11 will at least listen attentively when you propose it. Note that writing up the proposal and submitting it comes after you've tried it out, *in C*, not before. (Yes, X3J11 has done a few things that have not been based on prior experience. Conformant arrays are arguably a better idea than, say, noalias. This is not much of a recommendation, however, since noalias is increasingly looking like the biggest mistake X3J11 has ever made.) Alternatively you might consider doing it in C++, which probably won't require any compiler changes at all. X3J11 has never claimed that C is the answer to all mankind's problems. You will be listened to *more* attentively if you can explain why C++ does *not* solve your problem. -- Those who do not understand Unix are | Henry Spencer @ U of Toronto Zoology condemned to reinvent it, poorly. | {allegra,ihnp4,decvax,utai}!utzoo!henry
edw@IUS1.CS.CMU.EDU (Eddie Wyatt) (02/26/88)
> Look at "The C Programming Language" by Kernighan & Ritchie on p. 110. If you > set up your array like int *b[10] you can treat it as a conformal 2-d array > without having to pass ANY bounds information (except for whatever the > procedure logic requires). But you have to access the array as if it where a n by 10 matrix. This method does not provide for n by m by .... arrays. At most you may leave the first dimension (the rows) unspecified. -- Eddie Wyatt e-mail: edw@ius1.cs.cmu.edu
reeder@ut-emx.UUCP (William P. Reeder) (02/27/88)
In article <1491@tekcae.TEK.COM>, kurtk@tekcae.TEK.COM (Kurt Krueger) writes: > Look at "The C Programming Language" by Kernighan & Ritchie on p. 110. If you > set up your array like int *b[10] you can treat it as a conformal 2-d array > without having to pass ANY bounds information (except for whatever the > procedure logic requires). Since no one else has had the gumption to say it, I will. Can this discussion continue without cross-posting to comp.lang.fortran? I know it started out as a discussion of what fortran could learn from C, but it has long since deviated to be a pure C discussion. Thanks.
peter@sugar.UUCP (Peter da Silva) (03/06/88)
In article <42529@sun.uucp>, dgh%dgh@Sun.COM (David Hough) writes: > void matmul(double a[?ar][?ac], /* ar >= p, ac >= q */ > double b[?br][?bc], /* br >= q, bc >= r */ > double c[?cr][?cc], /* cr >= p, cc >= r */ > int p, /* 0 <= p <= min(ar,cr) */ > int q, /* 0 <= q <= min(ac,br) */ > int r) /* 0 <= r <= min(bc,cc) */ WOw. I wouldn't have believed it... I prefer Fortran's syntax on this one. void matmul(double a[ar][ac], int ar, int ac, double b[br][bc], int br, int bc, double c[cr][cc], int cr, int cc, int p, int q, int r) { } Or: void matmul(a, ar, ac, b, br, bc, c, cr, cc, p, q, r) int ar, ac, br, bc, cr, cc, p, q, r; double a[ar][ac]; double b[br][bc]; double c[cr][cc]; { } Doesn't require any new operators, doesn't require passing descriptors around on the stack (yech), and allows you to allocate some space and give it different dimensions in different places. -- -- Peter da Silva `-_-' ...!hoptoad!academ!uhnix1!sugar!peter -- Disclaimer: These U aren't mere opinions... these are *values*.