juan@burdell.gatech.edu (Juan Orlandini) (12/18/90)
Can anyone tell me why the following program takes only .9 user seconds and
.1 sys seconds on a Sun 4/370 and 5.0 user seconds and 39.0 sys seconds
on SPARC 1's (1, 1+, IPC)?

---------------------------- cut here -----------------------------
#include <stdio.h>

main()
{
    int num_times,row1,col2,col1,num_cols1,num_cols2,num_rows;
    float mat1[4][4],mat2[4][4],res[4][4];
    float temp;

    num_rows = 4;
    num_cols1 = 4;
    num_cols2 = 4;
    for(num_times=0;num_times<10000;num_times++)
    {
        for(row1=0;row1<num_rows;row1++)
        {
            for(col2=0;col2<num_cols2;col2++)
            {
                temp = 0;
                for(col1=0;col1<num_cols1;col1++)
                {
                    temp = temp + mat1[row1][col1]*mat2[col1][col2];
                }
                res[row1][col2] = temp;
            }
        }
    }
}
------------------------ cut here too ------------------------------

Thanks,
Juan

=======================================================================
Juan Orlandini            ///  "We have not inherited this       ///
Super User At Large      ///    earth from our parents, but     ///
College of Computing \\\///     rather borrowed it from our \\\///
juan@cc.gatech.edu    \XX/      children." -- Unknown        \XX/
=======================================================================
juan@burdell.gatech.edu (Juan Orlandini) (12/18/90)
As a follow-up, we compiled the code on the SPARC 1 and tested it on the
4/370, and it took 44.0 seconds (most of it system) to run. So we compiled
it again on the 4/370 and ran that binary on the SPARC 1's, and presto, it
ran in .9 seconds real time. Then we did a diff on the compiler binaries
and libraries, and they turned out to be the same.

What's going on?

Juan
wallace@math.ksu.edu (Wallace Bow) (12/18/90)
In article <MCCALPIN.90Dec17134832@pereland.cms.udel.edu> mccalpin@perelandra.cms.udel.edu (John D. McCalpin) writes:
>> On 17 Dec 90 17:19:00 GMT, juan@burdell.gatech.edu (Juan Orlandini) said:
>>
>> Can anyone tell me why the following program takes only .9 user
>> seconds and .1 sys seconds on a sun 4/370 and 5.0 user seconds
>> and 39.0 sys seconds on sparc 1's (1, 1+, IPC)?
>
>I would guess that it is spending all of its time in a floating-point
>exception handler, since the total time is cut to 2.9 seconds (on a
>Sparc I, no optimization) if you initialize the two arrays before
>using them!

You guess correctly. Look at this:

---------------------------------------------------------------------------
Script started on Mon Dec 17 18:16:34 1990
debbie:/home/debbie/wjbow>cat tempfile.c
#include <stdio.h>
#include <floatingpoint.h>

int boom()
{
    printf("Game over!\n");
    exit(-1);
}

main()
{
    int num_times,row1,col2,col1,num_cols1,num_cols2,num_rows;
    float mat1[4][4],mat2[4][4],res[4][4];
    float temp;

    ieee_handler("set","all",boom);
    num_rows = 4;
    num_cols1 = 4;
    num_cols2 = 4;
    for(num_times=0;num_times<10000;num_times++)
    {
        for(row1=0;row1<num_rows;row1++)
        {
            for(col2=0;col2<num_cols2;col2++)
            {
                temp = 0;
                for(col1=0;col1<num_cols1;col1++)
                {
                    temp = temp + mat1[row1][col1]*mat2[col1][col2];
                }
                res[row1][col2] = temp;
            }
        }
    }
}
debbie:/home/debbie/wjbow>junk
Game over!
debbie:/home/debbie/wjbow>exit
---------------------------------------------------------------------------

ieee_handler(3M) catches the exception. See the man page on how to use it
and put it in all of your floating point programs. It can save you hours
of time with dbx.

--
Wallace J. Bow Jr., wjbow@gorman.cs.sandia.gov
tim@proton.amd.com (Tim Olson) (12/18/90)
In article <605@mephisto.edu> juan@burdell.gatech.edu (Juan Orlandini) writes:
|
| Can anyone tell me why the following program takes only .9 user seconds
| and .1 sys seconds on a sun 4/370 and 5.0 user seconds and 39.0 sys seconds
| on sparc 1's (1, 1+, IPC)?
|
| ---------------------------- cut here -----------------------------
| #include <stdio.h>
|
| main()
| {
|     int num_times,row1,col2,col1,num_cols1,num_cols2,num_rows;
|     float mat1[4][4],mat2[4][4],res[4][4];
|     float temp;
|
|     num_rows = 4;
|     num_cols1 = 4;
|     num_cols2 = 4;
|     for(num_times=0;num_times<10000;num_times++)
|     {
|         for(row1=0;row1<num_rows;row1++)
|         {
|             for(col2=0;col2<num_cols2;col2++)
|             {
|                 temp = 0;
|                 for(col1=0;col1<num_cols1;col1++)
|                 {
|                     temp = temp + mat1[row1][col1]*mat2[col1][col2];
|                 }
|                 res[row1][col2] = temp;
|             }
|         }
|     }
| }
|

It's probably because the arrays mat1 and mat2 are local to the function
main(), but are never initialized. Therefore, they can contain garbage.
The large amount of system time you see is probably the Sparcstation
taking FP traps on invalid or denormalized inputs to fix up the output
per IEEE 754. I'm not sure about the 4/370, but it may handle some of
these in hardware, or it may just not have the same random garbage on
the stack.

If you move the declaration of mat1 and mat2 out of main, so they are
global and get initialized to zero (luckily, the bss zero is the same as
a single-precision 0.0, here), then you get closer to the time you
expect:

    1.6u 0.0s 0:01 100% 0+116k 4+0io 3pf+0w    (on a Sparcstation 1)

This is a graphic example of why benchmarks should be self-checking, or
at least produce some output that can be checked by hand....

--
	-- Tim Olson
	Advanced Micro Devices
	(tim@amd.com)