fangchin@elaine46.stanford.edu (Chin Fang) (02/02/91)
Greetings, I recently started thinking whether it is possible to take advantages many numerical liner algebra algorithms to their fullest extend in C, and if so, how to come up a clean and hard-limit free implementation of it. Let's use 2d matrices for discussion for now: as is well know among numerical computing people that matrix algorithms have to be "gaxpy" (defined below) rich to gain high performance: gaxpy: given x in Rn, y in Rn and A in Rmxn, the following algorithm computes z=y+Ax. function: z=gaxpy(A,x,y) n=cols(A); z=y for j=1:n z=z+x(j)A(:,j) end end gaxpy I used Cleve Moler's Matlab language for the above pseudo-code for compactness. Now, note that this algorithm requires most rapid changes in indixing occuring in COLUMNS, not rows! I believe in default C design, data is stored in rows (discussed in K&R II) instead in columns. It is almost very common these days people use array of pointers to array as a way to simulate 2d matrices so that (1) no requirement to declare array dimensions beforehand, (2) flexibility in passing "matrices" to functions (3) allowing index offsetting if that's desired. However, if a so declared matrix is used in gaxpy, the memory traffic caused by pointer jumping (figuratively speaking) for a large matrix is significant and I have a strong suspicision that such traffic would slow down one's code. Note, however, FORTRAN store data in columns, so gaxpy is perfectly suitable for codes coded in Fortran and compilied with a good FORTRAN compiler. Since this group is not for numerical computations, I won't get in depth why gaxpy type algorithms are much perferred than others. For details, please see: Gene H. Golub & Charles F. Fan Loan, Matrix Computations, 2nd, 1989, The Johns Hopkins University Press. The leading author is The no. 1 person in the world now in this field and is the head of Stanford's CS dept. Numerical Computations group. BTW, in his group, Fortran is THE language of choice, though not eveyone likes it. Please no flame regarding above comments. I am not interested in C vs. F I would like to know, assuming the way C stores it's data is a deficiency from the point of view of numerical computations, can someone come up a good, hard-limit free way as flexible as the conventional method yet provides more efficiency? Or, if you DON'T think the way data stored in C is a deficiency, what is your justifications? Any hard proofs? Please email me if possible if you like to respond. I will, after three weeks or so, post a summary if there is enough interest. Let me emphasize again, your rationale, experimental results, hardware plateforms (CISC or RISC), running conditions (all in core or swapping occured), OS used, and a simple example of how you implement a matrix to make it more efficient than the conventional way mentioned above all are welcomed, preferrably all included in your response. I would imagine it would be quite difficult to judge a claim without all of these available. Best Regards and Look forward to hearing from you people Chin Fang Mechanical Engineering Department Stanford University fangchin@portia.stanford.edu
willing@pantheon.seas.ucla.edu (Scott Willingham) (02/02/91)
In article <1991Feb1.214342.4982@portia.Stanford.EDU> fangchin@elaine46.stanford.edu (Chin Fang) writes: >... >Now, note that this algorithm requires most rapid changes in indixing occuring >in COLUMNS, not rows! > >I believe in default C design, data is stored in rows (discussed in K&R II) >instead in columns. It is almost very common these days people use >array of pointers to array as a way to simulate 2d matrices so that ... It seems to me that this is just a matter of syntax. Whether you assign the first or second _index_ of a C array to correspond to a matrix row or column is arbitrary. You just must be consistent in your usage throughout the algorithm. -- Scott D. Willingham
fangchin@portia.Stanford.EDU (Chin Fang) (02/02/91)
In article <1748@lee.SEAS.UCLA.EDU> willing@pantheon.seas.ucla.edu (Scott Willingham) writes: >In article <1991Feb1.214342.4982@portia.Stanford.EDU> fangchin@elaine46.stanford.edu (Chin Fang) writes: >>... >>Now, note that this algorithm requires most rapid changes in indixing occuring >>in COLUMNS, not rows! >> >... > >It seems to me that this is just a matter of syntax. Whether you assign >the first or second _index_ of a C array to correspond to a matrix row >or column is arbitrary. You just must be consistent in your usage >throughout the algorithm. > Scott, yes. if you are willing to accept row index as column index. You can use gaxpy ops. But, what a contortion! In addition, for people not aware the importance of gaxpy ops, each "seemly" trivial column element access in fact involves a long ptr jump equal the length of entire row. ie. when they use row index as row index, col index as col index. To me, reversing index would fall into the catagory of "kuldge". But you are right, you can do gaxpy ops with such matrix declarations. Remember I didn't say in my 1st posting that it couldn't be done. In fact, the reason that I posted the question because I used the "kludge" way of using matrices and got teased by my fellow FORTRAN programmers and so far I haven't come up any less kludgy way yet. Finally, I want to make it clear that when I said most rapid changes in indixing occuring in COLUMNS, I meant that the ops "marching" in columns, not rows. The index used in the sentence didn't mean array element index! I was using MATLAB syntax, not C. Regards, Chin Fang Mechanical Engineering Department Stanford University fangchin@portia.stanford.edu
sarima@tdatirv.UUCP (Stanley Friesen) (02/03/91)
In article <1991Feb1.214342.4982@portia.Stanford.EDU> fangchin@elaine46.stanford.edu (Chin Fang) writes: >Now, note that this algorithm requires most rapid changes in indixing occuring >in COLUMNS, not rows! > >I believe in default C design, data is stored in rows (discussed in K&R II) >instead in columns. ... >However, if a so declared matrix is used in gaxpy, the memory traffic caused >by pointer jumping (figuratively speaking) for a large matrix is significant Good grief!! The answer is really quite simple. Store the 'transpose' of the external matrix in the C matrix! That is reverse the order of the indexes, storing the 'real' columns in C 'rows'. The whole thing is just a linguistic convention anyway - if you define the last index as refering to columns instead of rows, the columns are now magically varying the fastest. In this mode it is only during input and output that you do the nasty pointer jumping. And I/O delays will dominate over any inefficiencies there, so it doesn't really matter as much as during calculations. -- --------------- uunet!tdatirv!sarima (Stanley Friesen)
willing@pantheon.seas.ucla.edu (Scott Willingham) (02/03/91)
In article <1991Feb2.071930.9879@portia.Stanford.EDU> fangchin@portia.Stanford.EDU (Chin Fang) writes: >In article <1748@lee.SEAS.UCLA.EDU> willing@pantheon.seas.ucla.edu (Scott Willingham) writes: >>In article <1991Feb1.214342.4982@portia.Stanford.EDU> fangchin@elaine46.stanford.edu (Chin Fang) writes: >>>... >>>Now, note that this algorithm requires most rapid changes in indixing occuring >>>in COLUMNS, not rows! >>> >>... >> >>It seems to me that this is just a matter of syntax. Whether you assign >>the first or second _index_ of a C array to correspond to a matrix row >>or column is arbitrary. You just must be consistent in your usage >>throughout the algorithm. >> >Scott, yes. if you are willing to accept row index as column index. You >can use gaxpy ops. But, what a contortion! > >In addition, for people not aware the importance of gaxpy ops, each >"seemly" trivial column element access in fact involves a long ptr jump >equal the length of entire row. ie. when they use row index as row index, >col index as col index. Why do you insist on naming C array indices as row & column indices? They are simply 1st, 2nd, 3rd, etc. indices. Their syntax makes sense in the context of a computer language's design. I agree that it is convenient for a programming language's systax to correspond directly to common mathematical notation, but I wouldn't call it a kludge to be otherwise. Further, I don't consider it all that confusing -- and most mathematicians seem to adapt to many notational changes much better than I do. I would suggest that you stay with such notation and be sure to comment your code well. Then again, maybe FORTRAN is better suited to your programming needs. However, it has its own set of "inconveniences". It seems to me that a translation of a mathematical algorithm to computer code requires an intimate familiarity with both notations, no matter what language is chosen. Maybe some other net contributers can come up with a more creative solution that I have suggested. I wish you good luck on your project. -- Scott D. Willingham
fangchin@portia.Stanford.EDU (Chin Fang) (02/03/91)
In article <15054@smoke.brl.mil> gwyn@smoke.brl.mil (Doug Gwyn) writes: >In article <1991Feb2.071930.9879@portia.Stanford.EDU> fangchin@portia.Stanford.EDU (Chin Fang) writes: >>"seemly" trivial column element access in fact involves a long ptr jump >>equal the length of entire row. > >If you are concerned about such things, you should traverse the elements >in an order more suited to the implementation language. Many experienced >C programmers would in fact convert some array operations to use one- >dimensional traversals with access via an appropriately incremented >pointer, rather than using [] operators. Doug, you might think my question is trival and I didn't do my homework. I typed in the following code segment (errors may exist however) to show that I am aware what you are talking about, if not in depth. My question is not related the capability of C whatsoever. Maybe I didn't make my wording right in my 1st question. The thing is, sometimes we have our preferences (could be unreasonable ones too, like some people insist ident N blanks). I happen to be uneasy with switching indices, and have been trying to use the following scheme to rid of my own uneasiness. (sort of like Numerical Recipes authors like to use non-zero offsets). From my own point of view, the scheme below works but probably not efficient. That's why I posted my 1st question to see if there are better solutions. Perhaps I should have included the attachment below the first time. Well, if you can forgive my idiosyncrasy.. Yeah, you don't like N.R.'s way of indexing, I guess you will blame me for the same reason. Sigh.... Best Regards, Chin Fang Mechanical Engineering Department Stanford University fangchin@portia.stanford.edu **************** How I hide interchanged row, cols ************** #typedef struct {int nrl, nrh, ncl, nch; DATA **m} * Matrix; Matrix Matrix_alloc(int nrl, int nrh, int ncl, int nch) { /* routine modified from Numerical Recipes utility matrix() */ int i; Matrix m; if(nrl > nrh || ncl > nch) errormsg("bounds are wrong in allocation!"); /* no direct malloc() here, get memory from a free block cache using */ /* routine memory_alloc() */ m=(DATA **)memory_alloc((nch-ncl+1)*sizeof(DATA *)); assert(m); m-=ncl; for(i=ncl;i<=nch;i++){ m[i]=(DATA *)memory_alloc((nrh-nrl+1)*sizeof(DATA)); assert(m[i]); m[i]-=nrl; } return m; } #ifdef MATRIX_DEBUG DATA bound_checked_access(Matrix m,int row, int col) { int WRONG=FALSE; if(row <m->ncl||row >m->nch||col <m->nrl||col >m->nrh) WRONG = TRUE; if(WRONG) errormsg("input bounds wrong in matrix access!"); return m->DATA[col][row]; /* we switch indices here so user won't know */ } #endif #ifdef MATRIX_DEBUG DATA bound_checked_access(Matrix, int, int); #define M(m,row,col) bound_checked_access(m,row,col) #else #define M(m,row,col) m->DATA[col-m->nrl][row-m->ncl] #endif
mcdaniel@adi.com (Tim McDaniel) (02/05/91)
In article <1991Feb1.214342.4982@portia.Stanford.EDU> fangchin@elaine46.stanford.edu (Chin Fang) writes: as is well know among numerical computing people that matrix algorithms have to be "gaxpy" (defined below) rich to gain high performance: Well, talk to the Center for Supercomputing Research and Development at the University of Illinois, for one (Kuck, Lawrie, Padua, et al). Or Ken Kennedy at Rice. Or Ron Cytron at IBM-Yorktown Heights. They would probably dispute this "well known" statement. gaxpy: given x in Rn, y in Rn and A in Rmxn, the following algorithm computes z=y+Ax. Where I came from, the convention was that an m*n matrix had m rows and n columns, and if x is in Rn and A is in Rm*n, then A*x is in Rm. In that case, "z=y+Ax" above makes no sense. (Also, I've seen the problem referred to as SAXPY, for the single-precision BLAS routine.) Now, note that this algorithm [omitted] requires most rapid changes in indexing occurring in COLUMNS, not rows! Then don't use THIS algorithm; use ANOTHER algorithm. Carefully distinguish "a problem" from "an algorithm". The problem is a goal; the algorithm is one means to that end. You're trying to solve a problem; any particular algorithm is simply one approach to the problem, to be discarded when conditions change. The first algorithm proposed is suited for FORTRAN; devise one more suited for C. Just optimizing matrix multiply for one given machine is hard. For a thorough job, I would suppose you should consider vector register length, number of registers, total cache size, cache line length, page size, memory latency, and function unit overlap, just for starters. No doubt there are other factors I haven't thought of. A very-high-performance supercomputer SAXPY should be in a hand-tuned library; the average expert is not familiar enough with the problem to code an excellent solution. -- Tim McDaniel Applied Dynamics Int'l.; Ann Arbor, Michigan, USA Work phone: +1 313 973 1300 Home phone: +1 313 677 4386 Internet: mcdaniel@adi.com UUCP: {uunet,sharkey}!amara!mcdaniel
bliss@sp64.csrd.uiuc.edu (Brian Bliss) (02/06/91)
So reverse the row/column index, and #define elmt(array, row, column) array[column][row] bb