mark@mips.com (Mark G. Johnson) (02/02/91)
Here is a short C program that runs 3 times faster on a SS1+ than on a SS2.
Measured with /bin/time:

Compiled with                 SPARCstation 2    SPARCstation 1+
--------------------------------------------------------------------
cc -O prog.c -lm              29.3 user sec      9.5 user sec

/* ************* code follows *************** */
#include <stdio.h>
#include <math.h>

/* compute the first 1000 digits of PI = 4arctan(1) */
int main(void)
{
    long d = 4, r = 10000, n = 251, m = 3.322*n*d;
    long i, k, q;
    static long a[3340];

    for (i = 0; i <= m; i++)
        a[i] = 2;
    a[m] = 4;

    for (i = 1; i <= n; i++) {
        q = 0;
        for (k = m; k > 0; k--) {
            a[k] = a[k]*r+q;
            q = a[k]/(2*k+1);
            a[k] -= (2*k+1)*q;
            q *= k;
        }
        a[0] = a[0]*r+q;
        q = a[0]/r;
        a[0] -= q*r;
        printf("%04ld%s", q, i & 7 ? " " : "\n");   /* q is long: use %ld */
    }
    return 0;
}
/* ************* end of code *************** */

The reason for this unexpected slowdown is rather obscure: the subroutines that implement integer multiplication and division (the SPARC processors in these machines have no integer multiply or divide instructions, so these are library calls) get placed in a very unlucky spot in the SS2's cache. In the SS1+ they get plopped in a less dangerous area.

You can see this by having the compiler and/or the OS move the multiplication and division subroutines to new positions, or by finagling the *other* subroutines (e.g. fp math) that might get in the way of the mult and div subroutines. When you do this, the SS2 becomes faster than the SS1+:

Compiled with                 SPARCstation 2    SPARCstation 1+
--------------------------------------------------------------------
cc -O prog.c -lm              29.3 user sec      9.5 user sec
cc -O -Bstatic prog.c -lm      5.2 user sec      8.9 user sec
cc -O prog.c                   5.5 user sec      9.5 user sec

Mark Johnson
MIPS Computer Systems, 930 E. Arques M/S 2-02, Sunnyvale, CA 94086
(408) 524-8308    mark@mips.com {or ...!decwrl!mips!mark}
stpeters@dawn.crd.ge.com (Dick St.Peters) (02/19/91)
In article <1548@brchh104.bnr.ca> mark@mips.com (Mark G. Johnson) writes:
>
>Here is a short C program that runs 3 times faster on a SS1+ than on a
>SS2.

Well, it may run faster on Mark's SS1+, but when I just tried it, it sure ran a *lot* faster on a 2 (5.5u) than on a 1+ (59.8u). Those are times for the program exactly as he posted it, compiled with the standard cc, also just as he posted.

However, Mark does seem to be onto something. Compiling the program in a variety of ways, I got an enormous range of run times on both machines.

Dick St.Peters, GE Corporate R&D, Schenectady, NY
stpeters@crd.ge.com    uunet!crd.ge.com!stpeters