[comp.lang.c] Column-wise data storage for matrices in C, any advantage

fangchin@elaine46.stanford.edu (Chin Fang) (02/02/91)

Greetings,
 
I recently started thinking whether it is possible to take advantages many
numerical liner algebra algorithms to their fullest extend in C, and if so,
how to come up a clean and hard-limit free implementation of it.

Let's use 2d matrices for discussion for now:

as is well know among numerical computing people that matrix algorithms 
have to be "gaxpy" (defined below) rich to gain high performance:

gaxpy: given x in Rn, y in Rn and A in Rmxn, the following algorithm
       computes z=y+Ax.

       function: z=gaxpy(A,x,y)
             n=cols(A); z=y
             for j=1:n
                 z=z+x(j)A(:,j) 
             end
       end gaxpy

I used Cleve Moler's Matlab language for the above pseudo-code for compactness.

Now, note that this algorithm requires most rapid changes in indixing occuring
in COLUMNS, not rows!

I believe in default C design, data is stored in rows (discussed in K&R II)
instead in columns.  It is almost very common these days people use 
array of pointers to array as a way to simulate 2d matrices so that 
(1) no requirement to declare array dimensions beforehand,
(2) flexibility in passing "matrices" to functions    
(3) allowing index offsetting if that's desired.

However, if a so declared matrix is used in gaxpy, the memory traffic caused
by pointer jumping (figuratively speaking) for a large matrix is significant
and I have a strong suspicision that such traffic would slow down one's code.

Note, however, FORTRAN store data in columns, so gaxpy is perfectly suitable
for codes coded in Fortran and compilied with a good FORTRAN compiler.

Since this group is not for numerical computations, I won't get in depth why
gaxpy type algorithms are much perferred than others.  For details, please
see:
 
Gene H. Golub & Charles F. Fan Loan, Matrix Computations, 2nd, 1989,
                                     The Johns Hopkins University Press.

The leading author is The no. 1 person in the world now in this field and
is the head of Stanford's CS dept. Numerical Computations group.  BTW, in
his group, Fortran is THE language of choice, though not eveyone likes it.

Please no flame regarding above comments.  I am not interested in C vs. F
I would like to know, assuming the way C stores it's data is a deficiency
from the point of view of numerical computations, can someone come up a 
good, hard-limit free way as flexible as the conventional method yet provides
more efficiency?  Or, if you DON'T think the way data stored in C is a 
deficiency, what is your justifications?  Any hard proofs?

Please email me if possible if you like to respond.  I will, after three
weeks or so, post a summary if there is enough interest.

Let me emphasize again, your rationale, experimental results, hardware
plateforms (CISC or RISC), running conditions (all in core or swapping 
occured), OS used, and a simple example of how you implement a matrix
to make it more efficient than the conventional way mentioned above
all are welcomed, preferrably all included in your response.  I would
imagine it would be quite difficult to judge a claim without all of these
available.

Best Regards and Look forward to hearing from you people
 
Chin Fang
Mechanical Engineering Department
Stanford University
fangchin@portia.stanford.edu

willing@pantheon.seas.ucla.edu (Scott Willingham) (02/02/91)

In article <1991Feb1.214342.4982@portia.Stanford.EDU> fangchin@elaine46.stanford.edu (Chin Fang) writes:
>...
>Now, note that this algorithm requires most rapid changes in indixing occuring
>in COLUMNS, not rows!
>
>I believe in default C design, data is stored in rows (discussed in K&R II)
>instead in columns.  It is almost very common these days people use 
>array of pointers to array as a way to simulate 2d matrices so that 
...

It seems to me that this is just a matter of syntax.  Whether you assign
the first or second _index_ of a C array to correspond to a matrix row
or column is arbitrary.  You just must be consistent in your usage
throughout the algorithm.

-- Scott D. Willingham

fangchin@portia.Stanford.EDU (Chin Fang) (02/02/91)

In article <1748@lee.SEAS.UCLA.EDU> willing@pantheon.seas.ucla.edu (Scott Willingham) writes:
>In article <1991Feb1.214342.4982@portia.Stanford.EDU> fangchin@elaine46.stanford.edu (Chin Fang) writes:
>>...
>>Now, note that this algorithm requires most rapid changes in indixing occuring
>>in COLUMNS, not rows!
>>
>...
>
>It seems to me that this is just a matter of syntax.  Whether you assign
>the first or second _index_ of a C array to correspond to a matrix row
>or column is arbitrary.  You just must be consistent in your usage
>throughout the algorithm.
>
Scott, yes. if you are willing to accept row index as column index.  You 
can use gaxpy ops. But, what a contortion!

In addition, for people not aware the importance of gaxpy ops, each 
"seemly" trivial column element access in fact involves a long ptr jump
equal the length of entire row. ie. when they use row index as row index,
col index as col index.

To me, reversing index would fall into the catagory of "kuldge".  But you 
are right, you can do gaxpy ops with such matrix declarations.  Remember I 
didn't say in my 1st posting that it couldn't be done.

In fact, the reason that I posted the question because I used the "kludge"
way of using matrices and got teased by my fellow FORTRAN programmers and 
so far I haven't come up any less kludgy way yet.

Finally, I want to make it clear that when I said most rapid changes in 
indixing occuring in COLUMNS, I meant that the ops "marching" in columns,
not rows.  The index used in the sentence didn't mean array element index!
I was using MATLAB syntax, not C.

Regards,

Chin Fang
Mechanical Engineering Department
Stanford University
fangchin@portia.stanford.edu

sarima@tdatirv.UUCP (Stanley Friesen) (02/03/91)

In article <1991Feb1.214342.4982@portia.Stanford.EDU> fangchin@elaine46.stanford.edu (Chin Fang) writes:
>Now, note that this algorithm requires most rapid changes in indixing occuring
>in COLUMNS, not rows!
>
>I believe in default C design, data is stored in rows (discussed in K&R II)
>instead in columns. ...
>However, if a so declared matrix is used in gaxpy, the memory traffic caused
>by pointer jumping (figuratively speaking) for a large matrix is significant

Good grief!!  The answer is really quite simple.  Store the 'transpose' of
the external matrix in the C matrix!  That is reverse the order of the
indexes, storing the 'real' columns in C 'rows'.   The whole thing is just
a linguistic convention anyway - if you define the last index as refering
to columns instead of rows, the columns are now magically varying the fastest.

In this mode it is only during input and output that you do the nasty pointer
jumping. And I/O delays will dominate over any inefficiencies there, so it
doesn't really matter as much as during calculations.
-- 
---------------
uunet!tdatirv!sarima				(Stanley Friesen)

willing@pantheon.seas.ucla.edu (Scott Willingham) (02/03/91)

In article <1991Feb2.071930.9879@portia.Stanford.EDU> fangchin@portia.Stanford.EDU (Chin Fang) writes:
>In article <1748@lee.SEAS.UCLA.EDU> willing@pantheon.seas.ucla.edu (Scott Willingham) writes:
>>In article <1991Feb1.214342.4982@portia.Stanford.EDU> fangchin@elaine46.stanford.edu (Chin Fang) writes:
>>>...
>>>Now, note that this algorithm requires most rapid changes in indixing occuring
>>>in COLUMNS, not rows!
>>>
>>...
>>
>>It seems to me that this is just a matter of syntax.  Whether you assign
>>the first or second _index_ of a C array to correspond to a matrix row
>>or column is arbitrary.  You just must be consistent in your usage
>>throughout the algorithm.
>>
>Scott, yes. if you are willing to accept row index as column index.  You 
>can use gaxpy ops. But, what a contortion!
>
>In addition, for people not aware the importance of gaxpy ops, each 
>"seemly" trivial column element access in fact involves a long ptr jump
>equal the length of entire row. ie. when they use row index as row index,
>col index as col index.

Why do you insist on naming C array indices as row & column indices?
They are simply 1st, 2nd, 3rd, etc. indices.  Their syntax makes sense
in the context of a computer language's design.  I agree that it is
convenient for a programming language's systax to correspond directly
to common mathematical notation, but I wouldn't call it a kludge to
be otherwise.  Further, I don't consider it all that confusing -- and
most mathematicians seem to adapt to many notational changes much better
than I do.  I would suggest that you stay with such notation and be sure
to comment your code well.

Then again, maybe FORTRAN is better suited to your programming needs.
However, it has its own set of "inconveniences".  It seems to me
that a translation of a mathematical algorithm to computer code
requires an intimate familiarity with both notations, no matter
what language is chosen.

Maybe some other net contributers can come up with a more creative
solution that I have suggested.  I wish you good luck on your project.

-- Scott D. Willingham

fangchin@portia.Stanford.EDU (Chin Fang) (02/03/91)

In article <15054@smoke.brl.mil> gwyn@smoke.brl.mil (Doug Gwyn) writes:
>In article <1991Feb2.071930.9879@portia.Stanford.EDU> fangchin@portia.Stanford.EDU (Chin Fang) writes:
>>"seemly" trivial column element access in fact involves a long ptr jump
>>equal the length of entire row.
>
>If you are concerned about such things, you should traverse the elements
>in an order more suited to the implementation language.  Many experienced
>C programmers would in fact convert some array operations to use one-
>dimensional traversals with access via an appropriately incremented
>pointer, rather than using [] operators.

Doug, you might think my question is trival and I didn't do my homework.
I typed in the following code segment (errors may exist however) to show 
that I am aware what you are talking about, if not in depth.  My question is not
related the capability of C whatsoever.  Maybe I didn't make my wording 
right in my 1st question.  The thing is, sometimes we have our preferences
(could be unreasonable ones too, like some people insist ident N blanks).

I happen to be uneasy with switching indices, and have been trying to use 
the following scheme to rid of my own uneasiness.  (sort of like Numerical
Recipes authors like to use non-zero offsets).  From my own point of view,
the scheme below works but probably not efficient.  That's why I posted my
1st question to see if there are better solutions.  Perhaps I should have 
included the attachment below the first time.

Well, if you can forgive my idiosyncrasy..

Yeah, you don't like N.R.'s way of indexing, I guess you will blame me 
for the same reason. Sigh....

Best Regards,

Chin Fang
Mechanical Engineering Department
Stanford University
fangchin@portia.stanford.edu

**************** How I hide interchanged row, cols **************

#typedef struct {int nrl, nrh, ncl, nch;
                 DATA **m} * Matrix;

Matrix Matrix_alloc(int nrl, int nrh, int ncl, int nch)
{   
    /* routine modified from Numerical Recipes utility matrix() */

    int i;
    Matrix m;

    if(nrl > nrh || ncl > nch) errormsg("bounds are wrong in allocation!");

    /* no direct malloc() here, get memory from a free block cache using */
    /* routine memory_alloc()                                            */

    m=(DATA **)memory_alloc((nch-ncl+1)*sizeof(DATA *));
    assert(m);
    m-=ncl;

    for(i=ncl;i<=nch;i++){
	m[i]=(DATA *)memory_alloc((nrh-nrl+1)*sizeof(DATA));
	assert(m[i]);
	m[i]-=nrl;
    }
    return m;
}

#ifdef MATRIX_DEBUG

DATA bound_checked_access(Matrix m,int row, int col)
{
    int WRONG=FALSE;

    if(row <m->ncl||row >m->nch||col <m->nrl||col >m->nrh) WRONG = TRUE;
    if(WRONG) errormsg("input bounds wrong in matrix access!");
    return m->DATA[col][row]; /* we switch indices here so user won't know */
}

#endif

#ifdef MATRIX_DEBUG

    DATA bound_checked_access(Matrix, int, int);
    #define M(m,row,col) bound_checked_access(m,row,col)

#else

    #define M(m,row,col)   m->DATA[col-m->nrl][row-m->ncl]

#endif

mcdaniel@adi.com (Tim McDaniel) (02/05/91)

In article <1991Feb1.214342.4982@portia.Stanford.EDU>
fangchin@elaine46.stanford.edu (Chin Fang) writes:

   as is well know among numerical computing people that matrix algorithms 
   have to be "gaxpy" (defined below) rich to gain high performance:

Well, talk to the Center for Supercomputing Research and Development
at the University of Illinois, for one (Kuck, Lawrie, Padua, et al).
Or Ken Kennedy at Rice.  Or Ron Cytron at IBM-Yorktown Heights.  They
would probably dispute this "well known" statement.

   gaxpy: given x in Rn, y in Rn and A in Rmxn, the following algorithm
          computes z=y+Ax.

Where I came from, the convention was that an m*n matrix had m rows
and n columns, and if x is in Rn and A is in Rm*n, then A*x is in Rm.
In that case, "z=y+Ax" above makes no sense.  (Also, I've seen the
problem referred to as SAXPY, for the single-precision BLAS routine.)

   Now, note that this algorithm [omitted] requires most rapid changes
   in indexing occurring in COLUMNS, not rows!

Then don't use THIS algorithm; use ANOTHER algorithm.  Carefully
distinguish "a problem" from "an algorithm".  The problem is a goal;
the algorithm is one means to that end.  You're trying to solve a
problem; any particular algorithm is simply one approach to the
problem, to be discarded when conditions change.  The first algorithm
proposed is suited for FORTRAN; devise one more suited for C.

Just optimizing matrix multiply for one given machine is hard.  For a
thorough job, I would suppose you should consider vector register
length, number of registers, total cache size, cache line length, page
size, memory latency, and function unit overlap, just for starters.
No doubt there are other factors I haven't thought of.

A very-high-performance supercomputer SAXPY should be in a hand-tuned
library; the average expert is not familiar enough with the problem to
code an excellent solution.

--
Tim McDaniel                 Applied Dynamics Int'l.; Ann Arbor, Michigan, USA
Work phone: +1 313 973 1300                        Home phone: +1 313 677 4386
Internet: mcdaniel@adi.com                UUCP: {uunet,sharkey}!amara!mcdaniel

bliss@sp64.csrd.uiuc.edu (Brian Bliss) (02/06/91)

 So reverse the row/column index, and 

 #define  elmt(array, row, column)	array[column][row]

 bb