[comp.sys.sun] Example pgm that is 3X slower on SS2 than on SS1+

mark@mips.com (Mark G. Johnson) (02/02/91)

Here is a short C program that runs 3 times faster on a SS1+ than on a
SS2.  Measured with /bin/time:

    Compiled with               SPARCstation 2     SPARCstation 1+
 --------------------------------------------------------------------
    cc -O prog.c -lm             29.3 user sec       9.5 user sec


/* ************* code follows *************** */
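#include <stdio.h>
#include <math.h>   /* nothing from math.h is actually called */
/* compute the first 1000 digits of PI = 4*arctan(1), four digits per pass */
int main(void)
{
  long d = 4, r = 10000, n = 251;
  long m = (long)(3.322*n*d);   /* terms needed: about log2(10) = 3.322 per digit */
  long i, k, q;
  static long a[3340];          /* m + 1 = 3336 elements are used */

  /* all 2s: pi = 2 + 1/3(2 + 2/5(2 + 3/7(2 + ...))) in mixed radix */
  for (i = 0; i <= m; i++) a[i] = 2;
  a[m] = 4;

  for (i = 1; i <= n; i++) {
    q = 0;
    /* multiply by r = 10000 and propagate carries from the right;
       every step does one divide and several multiplies */
    for (k = m; k > 0; k--) {
      a[k] = a[k]*r+q;
      q = a[k]/(2*k+1);
      a[k] -= (2*k+1)*q;
      q *= k;
    }
    a[0] = a[0]*r+q;
    q = a[0]/r;                 /* next group of four decimal digits */
    a[0] -= q*r;
    printf("%04ld%s", q, i & 7 ? "  " : "\n");   /* newline after every 8th group */
  }
  return 0;
}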
/* ************* end of code *************** */

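The reason for this unexpected slowdown is rather obscure: the subroutines
that implement integer multiplication and division (these SPARCs have no
hardware multiply or divide instructions, so the compiler emits calls to
runtime routines) get placed in a very unlucky spot in the SS2's cache.
In the SS1+ they get plopped in a less dangerous area.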

You can see this by getting the compiler and/or the OS to move the
multiplication and division subroutines to new positions, or by
rearranging the *other* subroutines (e.g. the fp math library) that might
collide with the mult and div subroutines in the cache.  When you do this,
the SS2 becomes faster than the SS1+:

    Compiled with               SPARCstation 2     SPARCstation 1+
 --------------------------------------------------------------------
    cc -O prog.c -lm             29.3 user sec       9.5 user sec
    cc -O -Bstatic prog.c -lm     5.2 user sec       8.9 user sec
    cc -O prog.c                  5.5 user sec       9.5 user sec

Mark Johnson	
 	MIPS Computer Systems, 930 E. Arques M/S 2-02, Sunnyvale, CA 94086
	(408) 524-8308    mark@mips.com  {or ...!decwrl!mips!mark}

stpeters@dawn.crd.ge.com (Dick St.Peters) (02/19/91)

In article <1548@brchh104.bnr.ca> mark@mips.com (Mark G. Johnson) writes:
>
>Here is a short C program that runs 3 times faster on a SS1+ than on a
>SS2.

Well, it may run faster on Mark's SS1+, but when I just tried it, it sure
ran a *lot* faster on a 2 (5.5u) than a 1+ (59.8u).

Those are the times for the program exactly as he posted it, compiled with
the standard cc, also just as he posted.

However, Mark does seem to be onto something.  Compiling the program in a
variety of ways, I did get an enormous range of run times on both
machines.

Dick St.Peters, GE Corporate R&D, Schenectady, NY
stpeters@crd.ge.com	uunet!crd.ge.com!stpeters