[comp.benchmarks] Sun weirdness.

juan@burdell.gatech.edu (Juan Orlandini) (12/18/90)

Can anyone tell me why the following program takes only .9 user seconds
and .1 sys seconds on a Sun 4/370, but 5.0 user seconds and 39.0 sys seconds
on sparc 1's (1, 1+, IPC)?

---------------------------- cut here -----------------------------
#include <stdio.h>

main()
{
    int    num_times,row1,col2,col1,num_cols1,num_cols2,num_rows;
    float  mat1[4][4],mat2[4][4],res[4][4];
    float  temp;
    
    num_rows = 4;
    num_cols1 = 4;
    num_cols2 = 4;
    for(num_times=0;num_times<10000;num_times++)
    {
      for(row1=0;row1<num_rows;row1++)
      {
        for(col2=0;col2<num_cols2;col2++)
        {
            temp = 0;
            for(col1=0;col1<num_cols1;col1++)
            {
              temp = temp + mat1[row1][col1]*mat2[col1][col2];
            }
            res[row1][col2] = temp;
        }
      }
    }
}
            

------------------------ cut here too ------------------------------


Thanks,

Juan


=======================================================================
Juan Orlandini                /// "We have not inherited this       ///
Super User At Large          ///  earth from our parents, but      ///
College of Computing     \\\///   rather borrowed it from our  \\\///
juan@cc.gatech.edu        \XX/    children." -- Unknown         \XX/
=======================================================================

juan@burdell.gatech.edu (Juan Orlandini) (12/18/90)

As a follow-up, we compiled the code on the sparc1 and tested it on the 4/370,
and it took 44.0 seconds (most of it system) to run. So we compiled it again
on the 4/370 and ran that binary on the sparc1's, and presto, it ran in .9
seconds real time. Then we did a diff on the compiler binaries and libraries
and they turned out to be the same. What's going on?

Juan


wallace@math.ksu.edu (Wallace Bow) (12/18/90)

In article <MCCALPIN.90Dec17134832@pereland.cms.udel.edu> mccalpin@perelandra.cms.udel.edu (John D. McCalpin) writes:
>> On 17 Dec 90 17:19:00 GMT, juan@burdell.gatech.edu (Juan Orlandini) said:
>>
>> Can anyone tell me why the following program takes only .9 user
>> seconds and .1 sys seconds on a sun 4/370 and 5.0 user seconds
>> and 39.0 sys seconds on sparc 1's (1, 1+, IPC)?
>
>I would guess that it is spending all of its time in a floating-point
>exception handler, since the total time is cut to 2.9 seconds (on a
>Sparc I, no optimization) if you initialize the two arrays before
>using them!

	You guess correctly.  Look at this:

---------------------------------------------------------------------------

Script started on Mon Dec 17 18:16:34 1990
debbie:/home/debbie/wjbow>cat tempfile.c
#include <stdio.h>
#include <floatingpoint.h>

int boom()
{
 printf("Game over!\n");
 exit(-1);
}

main()
{
    int    num_times,row1,col2,col1,num_cols1,num_cols2,num_rows;
    float  mat1[4][4],mat2[4][4],res[4][4];
    float  temp;

    ieee_handler("set","all",boom);

    num_rows = 4;
    num_cols1 = 4;
    num_cols2 = 4;

    for(num_times=0;num_times<10000;num_times++)
    {
      for(row1=0;row1<num_rows;row1++)
      {
        for(col2=0;col2<num_cols2;col2++)
        {
            temp = 0;
            for(col1=0;col1<num_cols1;col1++)
            {
              temp = temp + mat1[row1][col1]*mat2[col1][col2];
            }
            res[row1][col2] = temp;
        }
      }
    }
}
debbie:/home/debbie/wjbow>junk
Game over!
debbie:/home/debbie/wjbow>exit

---------------------------------------------------------------------------

ieee_handler(3M) catches the exception.  See the man page for how to use it,
and put a call to it in all of your floating-point programs.  It can save you
hours of time with dbx.


--
Wallace J. Bow Jr., wjbow@gorman.cs.sandia.gov

sig .sig .sig .sig .sig .sig .sig .sig .sig .sig .sig .sig .sig .sig .sig 
sig .sig .sig .sig .sig .sig .sig .sig .sig .sig .sig .sig .sig .sig .sig 

tim@proton.amd.com (Tim Olson) (12/18/90)

In article <605@mephisto.edu> juan@burdell.gatech.edu (Juan Orlandini) writes:
| 
| Can anyone tell me why the following program takes only .9 user seconds
| and .1 sys seconds on a sun 4/370 and 5.0 user seconds and 39.0 sys seconds
| on sparc 1's (1, 1+, IPC)?
| 
| ---------------------------- cut here -----------------------------
| #include <stdio.h>
| 
| main()
| {
|     int    num_times,row1,col2,col1,num_cols1,num_cols2,num_rows;
|     float  mat1[4][4],mat2[4][4],res[4][4];
|     float  temp;
|     
|     num_rows = 4;
|     num_cols1 = 4;
|     num_cols2 = 4;
|     for(num_times=0;num_times<10000;num_times++)
|     {
|       for(row1=0;row1<num_rows;row1++)
|       {
|         for(col2=0;col2<num_cols2;col2++)
|         {
|             temp = 0;
|             for(col1=0;col1<num_cols1;col1++)
|             {
|               temp = temp + mat1[row1][col1]*mat2[col1][col2];
|             }
|             res[row1][col2] = temp;
|         }
|       }
|     }
| }
|             

It's probably because the arrays mat1 and mat2 are local to the
function main(), but are never initialized.  Therefore, they can
contain garbage.  The large amount of system time you see is probably
the Sparcstation taking FP traps on invalid or denormalized inputs to
fix up the output per IEEE 754.  I'm not sure about the 4/370, but it
may handle some of these in hardware, or it may just not have the same
random garbage on the stack.

If you move the declaration of mat1 and mat2 out of main, so they are
global and get initialized to zero (luckily, the bss zero is the same
as a single-precision 0.0, here), then you get closer to the time you
expect:

	1.6u 0.0s 0:01 100% 0+116k 4+0io 3pf+0w (on a Sparcstation 1)


This is a graphic example of why benchmarks should be self-checking,
or at least produce some output that can be checked by hand....

--
	-- Tim Olson
	Advanced Micro Devices
	(tim@amd.com)