[comp.arch] Nsieve Results

aburto@marlin.NOSC.MIL (Alfred A. Aburto) (09/23/88)

Guess I goofed it up !!!  Previous posting was NSIEVE results of course.

Al Aburto
aburto@marlin.nosc.mil.UUCP

aburto@marlin.NOSC.MIL (Alfred A. Aburto) (09/23/88)

---------

I Goofed up the previous posting!!!!  Should have been NSIEVE results.

Al Aburto
aburto@marlin.nosc.mil.UUCP

aburto@marlin.NOSC.MIL (Alfred A. Aburto) (09/23/88)

----------
These are the NSIEVE (Sieve of Eratosthenes) results I have at this time.
I have also updated NSIEVE.c.  Added 'free(ptr)' to the SIEVE() routine;
the program was not freeing allocated memory previously.  Added error
checks based on the number of primes found for each array size.  The
program will not bomb if 'malloc()' returns a null pointer.  Also added a
timer routine for Microsoft C.  I didn't change the Unix timing routines,
as I think it is better to have the user confirm/input the right 'HZ'
value, which is usually given in the 'times()' documentation.  Also,
while <sys/param.h> should contain the right 'HZ' or 'COUNTS' value, this
may not always be the case (neither HZ nor COUNTS was defined on our
system, so I had to input it anyway).  Sorry about the 'Primes/sec' output,
anyway there is a 'Primes/sec' output now (calculated as
Primes/sec = 1899 / ( Average RunTime(sec) ) ). I'll repost NSIEVE week.
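
A minimal sketch of the kind of routine described above, for readers who
do not have NSIEVE.c at hand.  This is NOT the actual NSIEVE.c source: the
names sieve_count() and sieve_ok() are made up for illustration, and the
loop is assumed to follow the classic BYTE Sieve layout, where flag i
stands for the odd number 2*i + 3.  It shows the three changes mentioned:
a null-pointer check after malloc(), a free() before returning, and an
error check on the number of primes found.

```c
#include <stdlib.h>
#include <string.h>

/* Count the primes marked in a flag array of nbytes bytes.
 * flags[i] stands for the odd number 2*i + 3 (assumed layout).
 * Returns the count, or -1 if the array could not be allocated. */
long sieve_count(size_t nbytes)
{
    unsigned char *flags;
    size_t i, k, prime;
    long count = 0;

    if (nbytes == 0)
        return 0;

    flags = malloc(nbytes);
    if (flags == NULL)          /* do not bomb on a null pointer */
        return -1;

    memset(flags, 1, nbytes);   /* preset: every candidate is "prime" */
    for (i = 0; i < nbytes; i++) {
        if (flags[i]) {
            prime = i + i + 3;
            /* strike out the odd multiples of this prime */
            for (k = i + prime; k < nbytes; k += prime)
                flags[k] = 0;
            count++;
        }
    }

    free(flags);                /* release the array before returning */
    return count;
}

/* Error check: 1 if the count matches the expected value, else 0. */
int sieve_ok(size_t nbytes, long expected)
{
    return sieve_count(nbytes) == expected;
}
```

With this layout an 8191-byte array should yield 1899 primes, the same
constant that appears in the Primes/sec formula, so sieve_ok(8191, 1899)
is the natural self-test.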

NSIEVE (Scaled to 10 Iterations):
Array Size   --------------------RunTime(sec)----------------------------
 (Bytes)       1         2         3         4         5           6
             Amdahl    Amdahl    McCray     MIPS     McCray    Sun 3/280
              5890    5890-300E AMD 29000   R2000   AMD 29000    68020
             (gcc)      (cc)     BTC ON     M/120    BTC OFF      (cc)

    8191      0.033     0.050     0.116     0.130     0.183       0.267
   10000      0.050     0.083     0.150     0.150     0.200       0.300
   20000      0.117     0.133     0.300     0.320     0.450       0.650
   40000      0.200     0.300     0.616     0.630     0.900       1.333
   80000      0.483     0.683     1.233     1.270     1.816       2.917
  160000      1.200     1.533     2.633     2.580     3.833       7.833
  320000      2.583     3.333     5.300     5.570     7.680      17.600

  Average RunTime With Respect to the 8191 size array:
              0.049     0.067     0.126     0.131     0.185       0.315
  Primes/sec:
              38755     28343     15071     14496     10265        6029



Array Size ----------------------RunTime(sec)------------------------------
 (Bytes)       7              8          9           10          11
            VAX 8600     Turbo-Amiga    Amiga       Z-248       Z-248
           (12.5 MHz)    (14.32 MHz)  (7.16 MHz)  (8.00 MHz)  (8.00 MHz)
                            68020       68000       80286       80286
                                                   (small)     (huge)
    8191      0.267         0.480       2.297       4.830       5.660
   10000      0.383         0.582       2.801       5.930       6.970
   20000      0.800         1.180       5.699      12.030      14.170
   40000      1.767         2.359      11.539      24.380      28.670
   80000      3.800         4.820      23.340      ------      ------
  160000      8.167         9.726      47.180      ------      ------
  320000     17.733        19.660      95.262      ------      ------

  Average RunTime With Respect to the 8191 size Array:
              0.362         0.489       2.362       4.902       5.761
  Primes/sec:
               5245          3883         804         387         330
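
The post does not spell out how the "Average RunTime With Respect to the
8191 size array" row is computed.  The sketch below is one reading that
appears consistent with the tabulated figures: scale each run time by
8191 / (array size in bytes), then take the plain average over the sizes
that were actually run.  The function names are hypothetical, and the
constant 1899 is simply taken from the Primes/sec formula given earlier.

```c
#include <stddef.h>

#define REF_BYTES  8191.0   /* reference array size */
#define PRIMES_REF 1899.0   /* constant from the Primes/sec formula */

/* Average of the run times after scaling each one to the 8191-byte
 * array size.  bytes[i] is the array size used for run time secs[i].
 * Returns 0.0 if no sizes were run. */
double avg_runtime_wrt_ref(const double *bytes, const double *secs, size_t n)
{
    double sum = 0.0;
    size_t i;

    if (n == 0)
        return 0.0;
    for (i = 0; i < n; i++)
        sum += secs[i] * (REF_BYTES / bytes[i]);
    return sum / (double)n;
}

/* Primes/sec = 1899 / (Average RunTime in seconds). */
double primes_per_sec(double avg_secs)
{
    return PRIMES_REF / avg_secs;
}
```

Fed the seven Amdahl 5890 (gcc) run times from column 1, this gives about
0.0485 sec, which rounds to the 0.049 shown; the four Z-248 'small' run
times give about 4.902.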


 (1) Amdahl 5890, Using GCC (compiled with 'gcc -S -O -DUNIX nsieve.c').
     From Chuck Simmons at Amdahl,  Sunnyvale CA.

 (2) Amdahl 5890-300E, SYS V Unix, cc -O nsieve.c
     From Chuck Simmons at Amdahl,  Sunnyvale CA.

 (3) AMD 29000 at 25 MHz.  Branch Target Cache (BTC) was ON.  Metaware
     High C 29000 V2.1 with -O option. No effective memory wait states.
     Memory was all physical (i.e., no caching).
     From Trevor Marshall, BIX 'supermicros/bench #925', 07 Sep 1988.

 (4) MIPS R2000 in M/120, 16.7 MHz, 128K Cache, low-latency memory system.
     From John Mashey at MIPS, Sunnyvale CA.

 (5) AMD 29000 at 25 MHz.  Branch Target Cache (BTC) was OFF.  Metaware
     High C 29000 V2.1 with -O option. No effective memory wait states.
     Memory was all physical (i.e., no caching).

 (6) SUN 3/280, 68020 at 25 MHz.  Compiled with 'cc -O nsieve.c'.  The
     ICache was ON.

 (7) VAX 8600, 12.5 MHz.  Compiled with 'cc -O nsieve.c'.

 (8) Amiga with 68020 at 14.32 MHz, 32-bit memory at 14.32 MHz.  Compiled
     with Manx Aztec C V3.4B using 'cc +2 +L +ff nsieve.c'.  The ICache
     was ON.

 (9) Amiga with 68000 at  7.16 MHz, 16-bit memory at  7.16 MHz.  Compiled
     with Manx Aztec C V3.4B using 'cc +L +ff nsieve.c'.

(10) Zenith Z-248, 80286 at 8.00 MHz.  Turbo C with 'small' option set.
     Compiled for 'speed'.  Used Registers, register optimization, and
     jump optimization.

(11) Zenith Z-248, 80286 at 8.00 MHz.  Turbo C V1.0 'huge' option set.
     Compiled for 'speed', used registers, register optimization, and jump
     optimization.

Al Aburto.
aburto@marlin.nosc.mil.UUCP
'ala' on BIX

aburto@marlin.NOSC.MIL (Alfred A. Aburto) (05/18/89)

----------

NSIEVE C and Assembly Results For Various Array Sizes (17 May 1989).
NSIEVE is the SIEVE program, but run with various array sizes (up to 320K
using malloc() ).

All results are scaled to 10 iterations.  The last column in the table
below shows the average run time scaled with respect to the 8191-byte
array size.

There is interesting information in these results.  Primarily, differences
due to Cache type, Cache size, and main memory speed show up as we go to
the larger array sizes (things we don't see with the standard Sieve and
Dhrystone results).
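
One hedged way to put a number on that size dependence is to compare the
run time per byte at a large array size with the run time per byte at the
8191-byte size.  The helper below is only an illustration (the name
slowdown_factor() is made up, and it assumes "320K" means 320000 bytes);
a result near 1.0 means run time grew in proportion to array size, while
larger values suggest cache or main memory effects.

```c
/* Ratio of per-byte run time at a large array size to per-byte run time
 * at a small one.  1.0 means run time grew in proportion to array size. */
double slowdown_factor(double t_small, double n_small,
                       double t_large, double n_large)
{
    return (t_large / n_large) / (t_small / n_small);
}
```

For example, with the Sun 4/280 'cc -O2' figures from the table (0.133 sec
at 8191 bytes, 10.250 sec at 320K) the factor comes out at about 2.0,
while the Sun 4/110 (01) 'cc -O2' figures (0.183 and 8.017 sec) give
about 1.1.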


                      ---------- Array Size (KiloBytes) ------------  Average
                      8.191K     20K     40K     80K    160K    320K  WRT
   SYSTEM             -------------- Run Time (sec) ----------------  8191 
                                                                      Byte
                                                                      Array.

 1 Amdahl 5890         0.033   0.117   0.200   0.483   1.200   2.583   0.049
 2 Amdahl 5890-300E    0.050   0.133   0.300   0.683   1.533   3.333   0.067
 3 McCray AMD29000     0.116   0.300   0.616   1.233   2.633   5.300   0.126
 4 MIPS R2000          0.130   0.320   0.630   1.270   2.580   5.570   0.131
 5 68020 25.0MHz Assem 0.149   0.368   0.808   1.632   3.300   6.663   0.160
 6 Sun 4/280           0.117   0.350   0.717   1.433   3.567  10.200   0.162
 7 Sun 4/280           0.133   0.350   0.700   1.417   3.583  10.250   0.164
 8 Sun 4/280           0.133   0.350   0.700   1.433   3.583  10.200   0.164
 9 McCray AMD29000     0.183   0.450   0.900   1.816   3.833   7.680   0.185
10 Sun 4/110 (01)      0.183   0.450   0.983   1.933   3.950   8.017   0.195
11 Sun 4/110 (01)      0.183   0.483   0.967   1.967   3.967   8.033   0.197
12 HP 9000/370         0.180   0.460   0.940   1.940   4.540   9.560   0.205
13 Sun 4/110 (02)      0.200   0.533   1.100   2.217   4.517   9.183   0.224
14 Sun 4/280           0.200   0.517   1.100   2.267   5.381  13.217   0.241
15 HP 9000/350         0.260   0.640   1.300   2.640   5.340  10.760   0.267
16 68020 14.3MHz Assem 0.260   0.642   1.410   2.850   5.758  11.632   0.279
17 Sun 3/280           0.250   0.650   1.300   2.967   7.800  17.400   0.313
18 Sun 3/60            0.333   0.833   1.700   3.450   6.917  13.933   0.347
19 VAX 8600            0.255   0.788   1.778   3.850   8.150  17.883   0.353
20 Amiga w/LUCAS 68020 0.354   0.868   1.752   3.540   7.160  14.500   0.360
21 VAX 8600            0.283   0.782   1.778   3.800   8.233  18.400   0.361
22 Sun 3/280           0.317   0.783   1.567   3.600   8.383  18.750   0.362
23 Amiga w/LUCAS 68020 0.372   0.902   1.826   3.700   7.460  15.140   0.376
24 Amiga w/LUCAS 68020 0.436   1.072   2.164   4.379   8.820  17.758   0.444
25 Sun 3/50  (01)      0.450   1.117   2.233   4.533   9.200  18.633   0.461
26 Sun 3/50  (02)      0.450   1.083   2.250   4.567   9.217  18.600   0.462
27 Sun 386i/250 (01)   0.450   1.150   2.317   4.750   9.650  19.800   0.478
28 Sun 386i/250 (02)   0.450   1.133   2.367   4.783   9.800  19.983   0.481
29 Amiga 2000 w/CSA020 0.470   1.162   2.348   4.760   9.580  19.300   0.481
30 Sun 386i/250 (03)   0.450   1.167   2.383   4.850   9.883  20.233   0.486
31 Amiga 2500 w/CBM020 0.480   1.277   2.480   4.961  10.062  21.156   0.510
32 Sun 386i/250 (02)   0.500   1.250   2.550   5.200  10.617  21.683   0.522
33 Sun 386i/250 (04)   0.533   1.350   2.850   5.733  11.883  24.167   0.576
34 Amiga 2000 w/CSA020 0.695   1.742   3.477   7.055  14.234  28.719   0.714
35 Micronics 80386     1.310   3.350   6.760  ------  ------  ------   1.354
36 Micronics 80386     1.370   3.300   6.810  ------  ------  ------   1.357
37 Micronics 80386     1.380   3.510   7.090  ------  ------  ------   1.415
38 Micronics 80386     1.370   3.520   7.080  ------  ------  ------   1.426
39 Micronics 80386     1.590   3.850   7.850  ------  ------  ------   1.576
40 Micronics 80386     1.540   3.900   7.860  ------  ------  ------   1.580
41 Micronics 80386     1.710   4.230   8.570  ------  ------  ------   1.725
42 Micronics 80386     1.710   4.290   8.620  ------  ------  ------   1.734
43 Micronics 80386     2.090   5.270  10.660  ------  ------  ------   2.148
44 IBM XT w/Hauppauge  2.300   5.820  11.750  ------  ------  ------   2.223
45 Micronics 80386     2.300   5.650  11.480  ------  ------  ------   2.317
46 Amiga 2000          2.297   5.699  11.539  23.340  47.180  95.262   2.362
47 Zenith Z-248        4.830  12.030  24.380  ------  ------  ------   4.902
48 Zenith Z-248        5.600  13.900  28.170  ------  ------  ------   5.669



 (1) Amdahl 5890, Using GCC (compiled with 'gcc -S -O -DUNIX nsieve.c').
     From Chuck Simmons, 24 Aug 1988. 

 (2) Amdahl 5890-300E, SYS V Unix, compiled with 'cc -O -DUNIX nsieve.c'.
     From Chuck Simmons, 03 Sep 1988. 
 
 (3) AMD 29000 at 25 MHz.  Branch Target Cache (BTC) was ON.  Metaware 
     High C 29000 V2.1 with -O option. No effective memory wait states
     (There was one wait state apparently, but it was 'hidden' by the
     pre-fetching).  Memory was all physical (i.e., no caching).
     From Trevor Marshall, BIX 'supermicros/bench #925', 07 Sep 1988.

 (4) MIPS R2000 in M/120, 16.7 MHz, 128K Cache, low-latency memory system.
     From John Mashey at MIPS, Sunnyvale CA, 25 Aug 1988.

 (5) This is the Amiga 14.32 MHz Assembly result (16) scaled to 25 MHz.
     There certainly are 25, 33, and 40 MHz 68020 systems, but I have not
     yet obtained any real results from them.  Actual results will of
     course depend upon the Cache type and size as well as the main
     memory speed.  I know YARC Systems makes a 40 MHz co-processor board
     for PC-AT type systems.

 (6) Sun 4/280 SPARC, 16.67 MHz MB86900 CPU. SunOS Release 4.0. This system 
     had a 128 KByte virtual address, write-back data, and instruction Cache.
     Compiled with 'cc -O1 -DUNIX nsieve.c -o nsieve'.  Note that performance
     drops off for the 320K array size.  This system must use relatively slow
     main memory.

 (7) Sun 4/280 SPARC, 16.67 MHz MB86900 CPU. SunOS Release 4.0. It had a
     128 KByte virtual address, write-back data, and instruction Cache.
     Compiled with 'cc -O2 -DUNIX nsieve.c -o nsieve'.

 (8) Sun 4/280 SPARC, 16.67 MHz MB86900 CPU. SunOS Release 4.0. It had a
     128 KByte virtual address, write-back data, and instruction Cache.
     Compiled with 'cc -O3 -DUNIX nsieve.c -o nsieve'.

 (9) AMD 29000 at 25 MHz.  Branch Target Cache (BTC) was OFF.  Metaware
     High C 29000 V2.1 with -O option. No effective memory wait states
     (There was one wait state apparently, but it was 'hidden' by the
     pre-fetching).  Memory was all physical (i.e., no caching).

(10) Sun 4/110 SPARC (01), 14.28 MHz MB86900 CPU, SunOS Release 4.0.
     System had no specific cache memory.  It used Static-Column DRAM.
     Compiled with 'cc -O2 -DUNIX nsieve.c -o nsieve'.  This was Sun 4/110
     System 01 that I tested (SATURN). This system outperformed the
     Sun 4/280 (which has a faster clock and a large Cache) at the
     largest (320K) array size, which tells us something about using
     SCRAM (Static-Column RAM) versus a cache backed by slow main memory.

(11) Sun 4/110 SPARC (01), 14.28 MHz MB86900 CPU, SunOS Release 4.0.
     System had no specific cache memory.  It used Static-Column DRAM.
     Compiled with 'cc -O1 -DUNIX nsieve.c -o nsieve'.  This was Sun 4/110
     System 01 that I tested (SATURN).

(12) HP 9000/370, MC68030 CPU, 33.00 MHz. I assume the DCache and ICache
     were ON.  Don't know what the Cache size was.  From Bo Thide'
     Swedish Institute of Space Physics, Uppsala Sweden, 31 Jan 1989.
     UUCP: !enea!kuling!irfu!bt

(13) Sun 4/110 SPARC (02), 14.28 MHz MB86900 CPU, SunOS Release 4.0.
     System had no specific cache memory.  It used Static-Column DRAM.
     Compiled with 'cc -O1 -DUNIX nsieve.c -o nsieve'.  This was Sun 4/110
     System 02 that I tested (MARS).

(14) Sun 4/280 SPARC, 16.67 MHz MB86900 CPU. SunOS Release 4.0. It had a
     128 KByte virtual address, write-back data, and instruction Cache.
     Compiled with 'cc -DUNIX nsieve.c -o nsieve'.

(15) HP 9000/350, MC68020 CPU, 25.00 MHz. I assume the ICache was ON.
     Don't know what the Cache size was.  From Bo Thide'
     Swedish Institute of Space Physics, Uppsala Sweden, 31 Jan 1989.
     UUCP: !enea!kuling!irfu!bt

(16) Amiga with CSA 68020 (14.32 MHz), 512K of 32-bit memory (14.32 MHz).
     The 68020 internal chip ICache was ON.  No external Cache.  RAM was
     no-wait-state Static RAM.  Hand-optimized Assembly version: optimized
     the Sieve array preset loop, removed all CMP's, minimized the number
     of branch instructions, and removed all extended addressing mode
     instructions.  No loop unrolling was done.  Otherwise this Assembly
     version works just like the original Sieve.c.

(17) Sun 3/280, 68020 at 25 MHz.  Compiled with 'cc -O -DUNIX nsieve.c'.
     The 68020 ICache was ON. The Sun 3/280 has a 64 KByte virtual address,
     write-back Cache.

(18) Sun 3/60,  68020 at 20 MHz. This system had no Cache. Compiled with
     'cc -O1 -DUNIX nsieve.c -o nsieve'.

(19) VAX 8600, 12.5 MHz.  Compiled with 'cc -O -DUNIX nsieve.c -o nsieve'.
     The VAX 8600 also has a 64KByte Cache (I think).

(20) Amiga with PD LUCAS 68020 board running at 20.00 MHz with one wait
     state 32-bit memory. ICache was ON. Using Manx Aztec C V3.6a with
     cc +L +ff nsieve.c. From Brad Fowles (March 89).

(21) VAX 8600, 12.5 MHz.  Compiled with 'cc -DUNIX nsieve.c -o nsieve'.
     The VAX 8600 also has a 64KByte Cache (I think).

(22) Sun 3/280, 68020 at 25 MHz.  Compiled with 'cc -DUNIX nsieve.c'.
     The 68020 internal ICache was ON. The Sun 3/280 has a 64KByte Cache.

(23) Amiga PD 68020 LUCAS board running at 16.00 MHz using Manx Aztec C
     V3.6a with cc +L +ff nsieve.c. ICache was on. From Brad Fowles.

(24) Amiga PD 68020 LUCAS board running at 14.32 MHz using Manx Aztec C
     V3.6a with cc +L +ff nsieve.c. ICache was ON. From Brad Fowles.

(25) Sun 3/50,  68020 at 15 MHz. Compiled with 'cc -O1 -DUNIX nsieve.c'.
     The 68020 internal ICache was ON.  This system had no external
     Cache.  This was the first Sun 3/50 tested (VENUS). Sun UNIX 4.2
     Release 3.4.

(26) Sun 3/50,  68020 at 15 MHz. Compiled with 'cc -O1 -DUNIX nsieve.c'.
     The 68020 internal ICache was ON.  This system had no external
     Cache.  This was the second Sun 3/50 tested (MERCURY).
     Sun UNIX 4.2 Release 3.4.

(27) Sun 386i/250 with 25 MHz 80386 CPU. XP Cache Memory.  Compiled with
     'cc -O -DUNIX nsieve.c -o nsieve'.  SunOS Release 4.0.1. This was
     Sun 386i/250 system 01 tested (PLUTO).

(28) Sun 386i/250 with 25 MHz 80386 CPU. XP Cache Memory.
     Compiled with 'cc -O -DUNIX nsieve.c -o nsieve'. SunOS Release 4.0.1.
     This was Sun 386i/250 system 02 tested (RIGEL).

(29) Amiga with 68020 at 14.32 MHz, 32-bit memory at 14.32 MHz.  Compiled
     with Manx Aztec C V3.4B using 'cc +2 +L +ff nsieve.c'.  The ICache
     was ON. Used a Computer System Associates (CSA) 68020 CPU board and
     512K 32-bit Static RAM board.

(30) Sun 386i/250 with 25 MHz 80386 CPU. XP Cache Memory.  Compiled with
     'cc -O -DUNIX nsieve.c -o nsieve'.  SunOS Release 4.0.1.
     This was Sun 386i/250 system 03 tested (NEMESIS).

(31) Amiga 2500 with CBM 2620 68020 coprocessor board (14.32 MHz).
     ICache was ON. Manx Aztec C V3.6a using 'cc +2 +L +ff nsieve'.
     From Brad Fowles (March 1989).

(32) Sun 386i/250 with 25 MHz 80386 CPU. XP Cache Memory.
     Compiled with 'cc -DUNIX nsieve.c -o nsieve'. SunOS Release 4.0.1.
     This was Sun 386i/250 system 02 tested (RIGEL).

(33) Sun 386i/250 with 25 MHz 80386 CPU. XP Cache Memory.  Compiled with
     'cc -O -DUNIX nsieve.c -o nsieve'.  SunOS Release 4.0.1.
     This was Sun 386i/250 system 04 tested (URANUS).

(34) Amiga with 68020 at 14.32 MHz, 32-bit memory at 14.32 MHz.  Compiled
     with Manx Aztec C V3.4B using 'cc +2 +L +ff nsieve.c'.  The ICache
     was OFF. CSA 68020 Coprocessor board with 512k 32-bit RAM board.

(35) Micronics 20.00 MHz 80386 AT-Clone. 2 MegaBytes of 80 nsec 32-bit
     RAM.  Microsoft C Version 5.10. 'Medium' model. Max Optimization:
     [/Oailt /Gs]. From Mike Slifcak. Note: MSC V5.10 did NOT generate
     specific 80386 code or optimizations!  See the Sun 386i results for
     80386-specific code results.

(36) Micronics 20.00 MHz 80386 AT-Clone. 2 MegaBytes of 80 nsec 32-bit
     RAM.  Microsoft C Version 5.10. 'Small' model. Max Optimization:
     [/Oailt /Gs]. From Mike Slifcak. Note: MSC V5.10 did NOT generate
     specific 80386 code or optimizations!

(37) Micronics 20.00 MHz 80386 AT-Clone. 2 MegaBytes of 80 nsec 32-bit
     RAM.  Microsoft C Version 5.10. 'Compact' model. Max Optimization:
     [/Oailt /Gs]. From Mike Slifcak. Note: MSC V5.10 did NOT generate
     specific 80386 code or optimizations!

(38) Micronics 20.00 MHz 80386 AT-Clone. 2 MegaBytes of 80 nsec 32-bit
     RAM.  Microsoft C Version 5.10. 'Large' model. Max Optimization:
     [/Oailt /Gs]. From Mike Slifcak. Note: MSC V5.10 did NOT generate
     specific 80386 code or optimizations.

(39) Micronics 20.00 MHz 80386 AT-Clone. 2 MegaBytes of 80 nsec 32-bit
     RAM.  Microsoft C Version 5.10. 'Medium' model. No Optimization.
     From Mike Slifcak. Note: MSC V5.10 did NOT generate specific
     80386 code.

(40) Micronics 20.00 MHz 80386 AT-Clone. 2 MegaBytes of 80 nsec 32-bit
     RAM.  Microsoft C Version 5.10. 'Small' model. No Optimization.
     From Mike Slifcak. Note: MSC V5.10 did NOT generate specific
     80386 code.

(41) Micronics 20.00 MHz 80386 AT-Clone. 2 MegaBytes of 80 nsec 32-bit
     RAM.  Microsoft C Version 5.10. 'Compact' model. No Optimization.
     From Mike Slifcak. Note: MSC V5.10 did NOT generate specific
     80386 code.

(42) Micronics 20.00 MHz 80386 AT-Clone. 2 MegaBytes of 80 nsec 32-bit
     RAM.  Microsoft C Version 5.10. 'Large' model. No Optimization.
     From Mike Slifcak. Note: MSC V5.10 did NOT generate specific
     80386 code.

(43) Micronics 20.00 MHz 80386 AT-Clone. 2 MegaBytes of 80 nsec 32-bit
     RAM.  Microsoft C Version 5.10. 'Huge' model. No Optimization.
     From Mike Slifcak. Note: MSC V5.10 did NOT generate specific
     80386 code.

(44) IBM XT with Hauppauge 80386 MotherBoard. 386 Modular BIOS V3.03a.
     16 MHz 80386 with 16 MHz 32-bit RAM.  Turbo C V1.0. 'Medium' Model
     with Max Optimization. Turbo C V1.0 did not generate 80386 specific
     code!

(45) Micronics 20.00 MHz 80386 AT-Clone. 2 MegaBytes of 80 nsec 32-bit
     RAM.  Microsoft C Version 5.10. 'Huge' model. Max Optimization:
     [/Oailt /Gs]. From Mike Slifcak. Note: MSC V5.10 did NOT generate
     specific 80386 code or optimizations.

(46) Amiga with 68000 at  7.16 MHz, 16-bit memory at  7.16 MHz.  Compiled
     with Manx Aztec C V3.4B using 'cc +L +ff nsieve.c'.

(47) Zenith Z-248, 80286 at 8.00 MHz.  Turbo C V1.0 with 'small' option set.
     Compiled for 'speed'.  Used Registers, register optimization, and
     jump optimization.

(48) Zenith Z-248, 80286 at 8.00 MHz.  Turbo C V1.0 'huge' option set.
     Compiled for 'speed', used registers, register optimization, and jump
     optimization.

Al Aburto.
aburto@marlin.nosc.mil.UUCP
'ala' on BIX