[comp.sys.mips] optimizations, funny results

mcdonald@aries.scs.uiuc.edu (Doug McDonald) (02/11/90)

>In article <4616@brazos.Rice.edu> [of comp.lang.misc]
                           preston@titan.rice.edu (Preston Briggs) writes:
>>I'm still hoping to see somebody recode matrix multiply in C, using 
>>all the source hacks they want, so we can compare it to what we get
>>using Fortran and our (not mine, friends') latest transformations.
>>
>>We'd run it on a MIPS M/120, with 32 integer registers and 16 fp regs,
>
>>The Fortran source is
>>
>>	subroutine mm(A, B, C, n)
>>	do 1 i = 1, n
>>	  do 1 j = 1, n
>>	    A(i, j) = 0.0
>>	    do 1 k = 1, n
>>1	      A(i, j) = A(i, j) + B(i, k) * C(k, j)
>>	end

Which, of course, is not legal Fortran, being missing the declarations
for a, b, and c.

I took the following programs

        double precision a(200,200), b(200,200), c(200,200)
        do 1 i=1,200
          do 2 j=1,200
          b(i,j) = j
          c(i,j) = i
2         continue
1       continue
        call mm(a,b,c,200)
        end

	subroutine mm(A, B, C, n)
        double precision a(n,n), b(n,n), c(n,n), t
	do 1 i = 1, n
	  do 2 j = 1, n
	    t = 0.0
	    do 3 k = 1, n
	      t = t + B(i, k) * C(k, j)
              A(i,j) = t
3           continue 
2         continue
1       continue
	end

*******************************************************************************

double a[200][200], b[200][200], c[200][200];

main()
{
int i,j;

    for(i = 0; i < 200; i++) {
        for(j = 0; j < 200; j++){
          b[i][j] = j;
          c[i][j] = i;
        }
    }
    mm(a,b,c,200);
}

mm(a, b, c,  n)
double *a, *b, *c;
int n;
{
double t;
double *bb, *cc;
int i, j, k;

    for(i = 0; i < n; i++) {
       for(j = 0; j < n ;j++) {
	    t = 0;
            bb = b + n*i;
            cc = c + j;
            for ( k = 0; k < n; k++) {
                t += *bb++ * *cc;
                cc += n;
            }
            a[i*n + j] = t;
       }
    } 
}

and ran them, once on my Dell 310 (MicroWay NDPC and NDPFortran,
-n2 -n3 -OLM -on switches, and three times each on a MIPS 120,
-O3 switch. Here are the results, in stopwatch time (the three tries were
identical on the MIPS):

               Fortran                     C
Dell          1:23                        1:25          (minutes:seconds)
MIPS          1:33                        0:48

Something else must have been running on the MIPS - it is actually
about 2.5 times faster than a Dell 310. 

Surprised? I sure am. Why should the MIPS Fortran results be so slow?
I am cross posting this to comp.sys.mips - maybe somebody there
can figure it out.

Doug McDonald

preston@titan.rice.edu (Preston Briggs) (02/11/90)

In article <1990Feb10.160605.25254@ux1.cso.uiuc.edu> mcdonald@aries.scs.uiuc.edu (Doug McDonald) writes:

>	subroutine mm(A, B, C, n)
>        double precision a(n,n), b(n,n), c(n,n), t
>	do 1 i = 1, n
>	  do 2 j = 1, n
>	    t = 0.0
>	    do 3 k = 1, n
>	      t = t + B(i, k) * C(k, j)
>              A(i,j) = t
>3           continue 
>2         continue
>1       continue
>	end

>Surprised? I sure am. Why should the MIPS Fortran results be so slow?

The Fortran source is incorrect.  Correct is:

	subroutine mm(A, B, C, n)
        double precision a(n,n), b(n,n), c(n,n), t
	do 1 i = 1, n
	  do 2 j = 1, n
	    t = 0.0
	    do 3 k = 1, n
	      t = t + B(i, k) * C(k, j)
3           continue 
            A(i,j) = t
2         continue
1       continue
	end

Note the interchange of the store to A(i,j) and the continue.

Preston