aburto@marlin.NOSC.MIL (Alfred A. Aburto) (09/23/88)
Guess I goofed it up !!! Previous posting was NSIEVE results of course. Al Aburto aburto@marlin.nosc.mil.UUCP
aburto@marlin.NOSC.MIL (Alfred A. Aburto) (09/23/88)
--------- I Goofed up the previous posting!!!! Should have been NSIEVE results. Al Aburto aburto@marlin.nosc.mil.UUCP
aburto@marlin.NOSC.MIL (Alfred A. Aburto) (09/23/88)
----------

These are the NSIEVE (Sieve of Eratosthenes) results I have at this time.

I have also updated NSIEVE.c. Added 'free(ptr)' to the SIEVE() routine;
the program was not freeing allocated memory previously. Added error
checks based on the number of primes found for each array size. The
program will no longer bomb if 'malloc()' returns a null pointer. Also
added a timer routine for Microsoft C.

I didn't change the Unix timing routines, as I think it is probably
better to have the user confirm/input the right 'HZ' value, which is
usually given in the 'times()' documentation file. Also, while
<sys/param.h> should contain the right 'HZ' or 'COUNTS' values, this may
not always be the case (neither HZ nor COUNTS was defined on our system,
so I had to input it anyway).

Sorry about the 'Primes/sec' output, but some people seem to prefer this
over just the RunTime output. So anyway there is a 'Primes/sec' output
now (calculated as Primes/sec = 1899 / ( Average RunTime(sec) ) ).

I'll repost NSIEVE week.

NSIEVE (Scaled to 10 Iterations):

Array Size  -------------------------RunTime(sec)-------------------------
(Bytes)      1          2          3          4          5          6
             Amdahl     Amdahl     McCray     MIPS       McCray     Sun 3/280
             5890       5890-300E  Amd 29000  R2000      AMD 29000  68020
             (gcc)      (cc)       BTC ON     M/120      BTC OFF    (cc)

  8191       0.033      0.050      0.116      0.130      0.183      0.267
 10000       0.050      0.083      0.150      0.150      0.200      0.300
 20000       0.117      0.133      0.300      0.320      0.450      0.650
 40000       0.200      0.300      0.616      0.630      0.900      1.333
 80000       0.483      0.683      1.233      1.270      1.816      2.917
160000       1.200      1.533      2.633      2.580      3.833      7.833
320000       2.583      3.333      5.300      5.570      7.680     17.600

Average RunTime With Respect to the 8191 size array:
             0.049      0.067      0.126      0.131      0.185      0.315
Primes/sec:
             38755      28343      15071      14496      10265       6029

Array Size  ----------------------RunTime(sec)----------------------
(Bytes)      7            8            9            10           11
             VAX 8600     Turbo-Amiga  Amiga        Z-248        Z-248
             (12.5 MHz)   (14.32 MHz)  (7.16 MHz)   (8.00 MHz)   (8.00 MHz)
                          68020        68000        80286        80286
                                                    (small)      (huge)

  8191       0.267        0.480        2.297        4.830        5.660
 10000       0.383        0.582        2.801        5.930        6.970
 20000       0.800        1.180        5.699       12.030       14.170
 40000       1.767        2.359       11.539       24.380       28.670
 80000       3.800        4.820       23.340       ------       ------
160000       8.167        9.726       47.180       ------       ------
320000      17.733       19.660       95.262       ------       ------

Average RunTime With Respect to the 8191 size array:
             0.362        0.489        2.362        4.902        5.761
Primes/sec:
             5245         3883         804          387          330

(1)  Amdahl 5890, using GCC (compiled with 'gcc -S -O -DUNIX nsieve.c').
     From Chuck Simmons at Amdahl, Sunnyvale CA.

(2)  Amdahl 5890-300E, SYS V Unix, compiled with 'cc -O nsieve.c'.
     From Chuck Simmons at Amdahl, Sunnyvale CA.

(3)  AMD 29000 at 25 MHz. Branch Target Cache (BTC) was ON.
     Metaware High C 29000 V2.1 with -O option. No effective memory
     wait states. Memory was all physical (i.e., no caching).
     From Trevor Marshall, BIX 'supermicros/bench #925', 07 Sep 1988.

(4)  MIPS R2000 in M/120, 16.7 MHz, 128K Cache, low-latency memory
     system. From John Mashey at MIPS, Sunnyvale CA.

(5)  AMD 29000 at 25 MHz. Branch Target Cache (BTC) was OFF.
     Metaware High C 29000 V2.1 with -O option. No effective memory
     wait states. Memory was all physical (i.e., no caching).

(6)  Sun 3/280, 68020 at 25 MHz. Compiled with 'cc -O nsieve.c'.
     The ICache was ON.

(7)  VAX 8600, 12.5 MHz. Compiled with 'cc -O nsieve.c'.

(8)  Amiga with 68020 at 14.32 MHz, 32-bit memory at 14.32 MHz.
     Compiled with Manx Aztec C V3.4B using 'cc +2 +L +ff nsieve.c'.
     The ICache was ON.

(9)  Amiga with 68000 at 7.16 MHz, 16-bit memory at 7.16 MHz.
     Compiled with Manx Aztec C V3.4B using 'cc +L +ff nsieve.c'.

(10) Zenith Z-248, 80286 at 8.00 MHz. Turbo C with 'small' option set.
     Compiled for 'speed'. Used registers, register optimization, and
     jump optimization.

(11) Zenith Z-248, 80286 at 8.00 MHz. Turbo C V1.0 with 'huge' option
     set. Compiled for 'speed'. Used registers, register optimization,
     and jump optimization.

Al Aburto.
aburto@marlin.nosc.mil.UUCP
'ala' on BIX
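The changes described in the post above (freeing the array, surviving a failed 'malloc()', checking the prime count, and the Primes/sec figure) can be sketched in C roughly as follows. This is not the actual NSIEVE.c source: the function names sieve_once(), average_wrt_8191() and primes_per_sec() are made up for illustration, and the averaging formula (each run time multiplied by 8191/size, then averaged) is an assumption, although it does reproduce the 0.049 figure for column 1.

```c
#include <stddef.h>
#include <stdlib.h>

#define BASE_SIZE   8191.0   /* reference array size in bytes */
#define BASE_PRIMES 1899.0   /* primes found for the 8191-byte array */

/* One sieve pass over `size` flag bytes; index i stands for the odd
   number 2*i + 3. Returns the number of primes found, or -1 if the
   size is not positive or malloc() fails. (Hypothetical helper.) */
long sieve_once(long size)
{
    char *flags;
    long i, k, prime, count = 0;

    if (size <= 0)
        return -1;
    flags = (char *)malloc((size_t)size);
    if (flags == NULL)
        return -1;                /* report failure instead of bombing */

    for (i = 0; i < size; i++)    /* preset: every odd number is a candidate */
        flags[i] = 1;

    for (i = 0; i < size; i++) {
        if (flags[i]) {
            prime = i + i + 3;
            for (k = i + prime; k < size; k += prime)
                flags[k] = 0;     /* strike out multiples of this prime */
            count++;
        }
    }

    free(flags);                  /* release the array after every pass */
    return count;
}

/* Assumed formula: scale each run time to the 8191-byte array
   (t * 8191 / size) and take the mean over the n array sizes. */
double average_wrt_8191(const double *sizes, const double *secs, size_t n)
{
    double sum = 0.0;
    size_t i;

    if (n == 0)
        return 0.0;
    for (i = 0; i < n; i++)
        sum += secs[i] * BASE_SIZE / sizes[i];
    return sum / (double)n;
}

/* Primes/sec = 1899 / (Average RunTime in seconds), as stated in the post. */
double primes_per_sec(double avg_secs)
{
    return BASE_PRIMES / avg_secs;
}
```

A caller could compare sieve_once(8191) against 1899 as the error check and treat a return value of -1 as "array too large for this machine", which would correspond to the '------' entries in the tables.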
aburto@marlin.NOSC.MIL (Alfred A. Aburto) (05/18/89)
----------

NSIEVE C and Assembly Results For Various Array Sizes (17 May 1989).

NSIEVE is the SIEVE program, but run with various array sizes (up to
320K using malloc() ). All results are scaled to 10 iterations. The last
column in the table below shows the run time also scaled with respect to
the 8191 array size.

There is interesting information in these results. Primarily, differences
due to Cache type, size, and main memory speed show up as we go to the
larger array sizes (things we don't see with the standard Sieve and
Dhrystone results).

                        ------------ Array Size (KiloBytes) -----------  Average
                         8.191K     20K     40K     80K    160K    320K  WRT 8191
   SYSTEM               ---------------- Run Time (sec) ---------------  Byte Array

 1 Amdahl 5890            0.033   0.117   0.200   0.483   1.200   2.583   0.049
 2 Amdahl 5890-300E       0.050   0.133   0.300   0.683   1.533   3.333   0.067
 3 McCray AMD29000        0.116   0.300   0.616   1.233   2.633   5.300   0.126
 4 MIPS R2000             0.130   0.320   0.630   1.270   2.580   5.570   0.131
 5 68020 25.0MHz Assem    0.149   0.368   0.808   1.632   3.300   6.663   0.160
 6 Sun 4/280              0.117   0.350   0.717   1.433   3.567  10.200   0.162
 7 Sun 4/280              0.133   0.350   0.700   1.417   3.583  10.250   0.164
 8 Sun 4/280              0.133   0.350   0.700   1.433   3.583  10.200   0.164
 9 McCray AMD29000        0.183   0.450   0.900   1.816   3.833   7.680   0.185
10 Sun 4/110 (01)         0.183   0.450   0.983   1.933   3.950   8.017   0.195
11 Sun 4/110 (01)         0.183   0.483   0.967   1.967   3.967   8.033   0.197
12 HP 9000/370            0.180   0.460   0.940   1.940   4.540   9.560   0.205
13 Sun 4/110 (02)         0.200   0.533   1.100   2.217   4.517   9.183   0.224
14 Sun 4/280              0.200   0.517   1.100   2.267   5.381  13.217   0.241
15 HP 9000/350            0.260   0.640   1.300   2.640   5.340  10.760   0.267
16 68020 14.3MHz Assem    0.260   0.642   1.410   2.850   5.758  11.632   0.279
17 Sun 3/280              0.250   0.650   1.300   2.967   7.800  17.400   0.313
18 Sun 3/60               0.333   0.833   1.700   3.450   6.917  13.933   0.347
19 VAX 8600               0.255   0.788   1.778   3.850   8.150  17.883   0.353
20 Amiga w/LUCAS 68020    0.354   0.868   1.752   3.540   7.160  14.500   0.360
21 VAX 8600               0.283   0.782   1.778   3.800   8.233  18.400   0.361
22 Sun 3/280              0.317   0.783   1.567   3.600   8.383  18.750   0.362
23 Amiga w/LUCAS 68020    0.372   0.902   1.826   3.700   7.460  15.140   0.376
24 Amiga w/LUCAS 68020    0.436   1.072   2.164   4.379   8.820  17.758   0.444
25 Sun 3/50 (01)          0.450   1.117   2.233   4.533   9.200  18.633   0.461
26 Sun 3/50 (02)          0.450   1.083   2.250   4.567   9.217  18.600   0.462
27 Sun 386i/250 (01)      0.450   1.150   2.317   4.750   9.650  19.800   0.478
28 Sun 386i/250 (02)      0.450   1.133   2.367   4.783   9.800  19.983   0.481
29 Amiga 2000 w/CSA020    0.470   1.162   2.348   4.760   9.580  19.300   0.481
30 Sun 386i/250 (03)      0.450   1.167   2.383   4.850   9.883  20.233   0.486
31 Amiga 2500 w/CBM020    0.480   1.277   2.480   4.961  10.062  21.156   0.510
32 Sun 386i/250 (02)      0.500   1.250   2.550   5.200  10.617  21.683   0.522
33 Sun 386i/250 (04)      0.533   1.350   2.850   5.733  11.883  24.167   0.576
34 Amiga 2000 w/CSA020    0.695   1.742   3.477   7.055  14.234  28.719   0.714
35 Micronics 80386        1.310   3.350   6.760  ------  ------  ------   1.354
36 Micronics 80386        1.370   3.300   6.810  ------  ------  ------   1.357
37 Micronics 80386        1.380   3.510   7.090  ------  ------  ------   1.415
38 Micronics 80386        1.370   3.520   7.080  ------  ------  ------   1.426
39 Micronics 80386        1.590   3.850   7.850  ------  ------  ------   1.576
40 Micronics 80386        1.540   3.900   7.860  ------  ------  ------   1.580
41 Micronics 80386        1.710   4.230   8.570  ------  ------  ------   1.725
42 Micronics 80386        1.710   4.290   8.620  ------  ------  ------   1.734
43 Micronics 80386        2.090   5.270  10.660  ------  ------  ------   2.148
44 IBM XT w/Hauppauge     2.300   5.820  11.750  ------  ------  ------   2.223
45 Micronics 80386        2.300   5.650  11.480  ------  ------  ------   2.317
46 Amiga 2000             2.297   5.699  11.539  23.340  47.180  95.262   2.362
47 Zenith Z-248           4.830  12.030  24.380  ------  ------  ------   4.902
48 Zenith Z-248           5.600  13.900  28.170  ------  ------  ------   5.669

(1)  Amdahl 5890, using GCC (compiled with 'gcc -S -O -DUNIX nsieve.c').
     From Chuck Simmons, 24 Aug 1988.

(2)  Amdahl 5890-300E, SYS V Unix, compiled with
     'cc -O -DUNIX nsieve.c'. From Chuck Simmons, 03 Sep 1988.

(3)  AMD 29000 at 25 MHz. Branch Target Cache (BTC) was ON.
     Metaware High C 29000 V2.1 with -O option. No effective memory
     wait states (there was one wait state apparently, but it was
     'hidden' by the pre-fetching). Memory was all physical (i.e., no
     caching). From Trevor Marshall, BIX 'supermicros/bench #925',
     07 Sep 1988.

(4)  MIPS R2000 in M/120, 16.7 MHz, 128K Cache, low-latency memory
     system. From John Mashey at MIPS, Sunnyvale CA, 25 Aug 1988.

(5)  This is the Amiga 14.32 MHz Assembly result scaled to 25 MHz.
     There are for sure 25, 33, and 40 MHz 68020 systems, but I have
     not as yet obtained any real results from these systems. The
     results will of course depend upon the Cache type and size as well
     as the main memory speed. YARC Systems, I know, makes a 40 MHz
     co-processor board for the PC-AT type systems.

(6)  Sun 4/280 SPARC, 16.67 MHz MB86900 CPU. SunOS Release 4.0.
     This system had a 128 KByte virtual address, write-back data and
     instruction Cache. Compiled with 'cc -O1 -DUNIX nsieve.c -o nsieve'.
     Note that performance drops off for the 320K array size. This
     system must use relatively slow main memory.

(7)  Sun 4/280 SPARC, 16.67 MHz MB86900 CPU. SunOS Release 4.0.
     It had a 128 KByte virtual address, write-back data and
     instruction Cache. Compiled with 'cc -O2 -DUNIX nsieve.c -o nsieve'.

(8)  Sun 4/280 SPARC, 16.67 MHz MB86900 CPU. SunOS Release 4.0.
     It had a 128 KByte virtual address, write-back data and
     instruction Cache. Compiled with 'cc -O3 -DUNIX nsieve.c -o nsieve'.

(9)  AMD 29000 at 25 MHz. Branch Target Cache (BTC) was OFF.
     Metaware High C 29000 V2.1 with -O option. No effective memory
     wait states (there was one wait state apparently, but it was
     'hidden' by the pre-fetching). Memory was all physical (i.e., no
     caching).

(10) Sun 4/110 SPARC (01), 14.28 MHz MB86900 CPU, SunOS Release 4.0.
     System had no specific cache memory. It used Static-Column DRAM.
     Compiled with 'cc -O2 -DUNIX nsieve.c -o nsieve'. This was Sun
     4/110 System 01 that I tested (SATURN). This system outperformed
     the Sun 4/280 (which has a faster clock speed and a large Cache)
     for the larger array sizes ... tells us something about using
     SCRAM vice cache and slow main memory.

(11) Sun 4/110 SPARC (01), 14.28 MHz MB86900 CPU, SunOS Release 4.0.
     System had no specific cache memory. It used Static-Column DRAM.
     Compiled with 'cc -O1 -DUNIX nsieve.c -o nsieve'. This was Sun
     4/110 System 01 that I tested (SATURN).

(12) HP 9000/370, MC68030 CPU, 33.00 MHz. I assume the DCache and
     ICache were ON. Don't know what the Cache size was.
     From Bo Thide', Swedish Institute of Space Physics, Uppsala,
     Sweden, 31 Jan 1989. UUCP: !enea!kuling!irfu!bt

(13) Sun 4/110 SPARC (02), 14.28 MHz MB86900 CPU, SunOS Release 4.0.
     System had no specific cache memory. It used Static-Column DRAM.
     Compiled with 'cc -O1 -DUNIX nsieve.c -o nsieve'. This was Sun
     4/110 System 02 that I tested (MARS).

(14) Sun 4/280 SPARC, 16.67 MHz MB86900 CPU. SunOS Release 4.0.
     It had a 128 KByte virtual address, write-back data and
     instruction Cache. Compiled with 'cc -DUNIX nsieve.c -o nsieve'.

(15) HP 9000/350, MC68020 CPU, 25.00 MHz. I assume the ICache was ON.
     Don't know what the Cache size was.
     From Bo Thide', Swedish Institute of Space Physics, Uppsala,
     Sweden, 31 Jan 1989. UUCP: !enea!kuling!irfu!bt

(16) Amiga with CSA 68020 (14.32 MHz), 512K of 32-bit memory
     (14.32 MHz). The 68020 internal chip ICache was ON. No external
     Cache. RAM was no-wait-state Static RAM. Hand-optimized Assembly
     version. Optimized the Sieve array preset loop. Removed all
     CMP's. Minimized the number of branching instructions. Removed
     all extended addressing mode instructions. No loop unrolling was
     done. Otherwise this Assembly version works just like the
     original Sieve.c.

(17) Sun 3/280, 68020 at 25 MHz. Compiled with 'cc -O -DUNIX nsieve.c'.
     The 68020 ICache was ON. The Sun 3/280 has a 64 KByte virtual
     address, write-back Cache.

(18) Sun 3/60, 68020 at 20 MHz. This system had no Cache.
     Compiled with 'cc -O1 -DUNIX nsieve.c -o nsieve'.

(19) VAX 8600, 12.5 MHz. Compiled with 'cc -O -DUNIX nsieve.c -o nsieve'.
     The VAX 8600 also has a 64 KByte Cache (I think).

(20) Amiga with PD LUCAS 68020 board running at 20.00 MHz with one
     wait state 32-bit memory. ICache was ON. Using Manx Aztec C
     V3.6a with 'cc +L +ff nsieve.c'. From Brad Fowles (March 89).

(21) VAX 8600, 12.5 MHz. Compiled with 'cc -DUNIX nsieve.c -o nsieve'.
     The VAX 8600 also has a 64 KByte Cache (I think).

(22) Sun 3/280, 68020 at 25 MHz. Compiled with 'cc -DUNIX nsieve.c'.
     The 68020 internal ICache was ON. The Sun 3/280 has a 64 KByte
     Cache.

(23) Amiga PD 68020 LUCAS board running at 16.00 MHz using Manx Aztec
     C V3.6a with 'cc +L +ff nsieve.c'. ICache was ON.
     From Brad Fowles.

(24) Amiga PD 68020 LUCAS board running at 14.32 MHz using Manx Aztec
     C V3.6a with 'cc +L +ff nsieve.c'. ICache was ON.
     From Brad Fowles.

(25) Sun 3/50, 68020 at 15 MHz. Compiled with 'cc -O1 -DUNIX nsieve.c'.
     The 68020 internal ICache was ON. This system had no external
     Cache. This was the first Sun 3/50 tested (VENUS).
     Sun UNIX 4.2 Release 3.4.

(26) Sun 3/50, 68020 at 15 MHz. Compiled with 'cc -O1 -DUNIX nsieve.c'.
     The 68020 internal ICache was ON. This system had no external
     Cache. This was the second Sun 3/50 tested (MERCURY).
     Sun UNIX 4.2 Release 3.4.

(27) Sun 386i/250 with 25 MHz 80386 CPU. XP Cache Memory.
     Compiled with 'cc -O -DUNIX nsieve.c -o nsieve'. SunOS Release
     4.0.1. This was Sun 386i/250 system 01 tested (PLUTO).

(28) Sun 386i/250 with 25 MHz 80386 CPU. XP Cache Memory.
     Compiled with 'cc -O -DUNIX nsieve.c -o nsieve'. SunOS Release
     4.0.1. This was Sun 386i/250 system 02 tested (RIGEL).

(29) Amiga with 68020 at 14.32 MHz, 32-bit memory at 14.32 MHz.
     Compiled with Manx Aztec C V3.4B using 'cc +2 +L +ff nsieve.c'.
     The ICache was ON. Used a Computer System Associates (CSA) 68020
     CPU board and 512K 32-bit Static RAM board.

(30) Sun 386i/250 with 25 MHz 80386 CPU. XP Cache Memory.
     Compiled with 'cc -O -DUNIX nsieve.c -o nsieve'. SunOS Release
     4.0.1. This was Sun 386i/250 system 03 tested (NEMESIS).

(31) Amiga 2500 with CBM 2620 68020 coprocessor board (14.32 MHz).
     ICache was ON. Manx Aztec C V3.6a using 'cc +2 +L +ff nsieve'.
     From Brad Fowles (March 1989).

(32) Sun 386i/250 with 25 MHz 80386 CPU. XP Cache Memory.
     Compiled with 'cc -DUNIX nsieve.c -o nsieve'. SunOS Release
     4.0.1. This was Sun 386i/250 system 02 tested (RIGEL).

(33) Sun 386i/250 with 25 MHz 80386 CPU. XP Cache Memory.
     Compiled with 'cc -O -DUNIX nsieve.c -o nsieve'. SunOS Release
     4.0.1. This was Sun 386i/250 system 04 tested (URANUS).

(34) Amiga with 68020 at 14.32 MHz, 32-bit memory at 14.32 MHz.
     Compiled with Manx Aztec C V3.4B using 'cc +2 +L +ff nsieve.c'.
     The ICache was OFF. CSA 68020 Coprocessor board with 512K 32-bit
     RAM board.

(35) Micronics 20.00 MHz 80386 AT-Clone. 2 MegaBytes of 80 nsec
     32-bit RAM. Microsoft C Version 5.10. 'Medium' model.
     Max Optimization: [/Oailt /Gs]. From Mike Slifcak.
     Note: MSC V5.10 did NOT generate specific optimized 80386 code!
     See the Sun 386i results for 80386 code results.

(36) Micronics 20.00 MHz 80386 AT-Clone. 2 MegaBytes of 80 nsec
     32-bit RAM. Microsoft C Version 5.10. 'Small' model.
     Max Optimization: [/Oailt /Gs]. From Mike Slifcak.
     Note: MSC V5.10 did NOT generate specific 80386 code or
     optimizations!

(37) Micronics 20.00 MHz 80386 AT-Clone. 2 MegaBytes of 80 nsec
     32-bit RAM. Microsoft C Version 5.10. 'Compact' model.
     Max Optimization: [/Oailt /Gs]. From Mike Slifcak.
     Note: MSC V5.10 did NOT generate specific 80386 code or
     optimizations!

(38) Micronics 20.00 MHz 80386 AT-Clone. 2 MegaBytes of 80 nsec
     32-bit RAM. Microsoft C Version 5.10. 'Large' model.
     Max Optimization: [/Oailt /Gs]. From Mike Slifcak.
     Note: MSC V5.10 did NOT generate specific 80386 code or
     optimizations.

(39) Micronics 20.00 MHz 80386 AT-Clone. 2 MegaBytes of 80 nsec
     32-bit RAM. Microsoft C Version 5.10. 'Medium' model.
     No Optimization. From Mike Slifcak.
     Note: MSC V5.10 did NOT generate specific 80386 code.

(40) Micronics 20.00 MHz 80386 AT-Clone. 2 MegaBytes of 80 nsec
     32-bit RAM. Microsoft C Version 5.10. 'Small' model.
     No Optimization. From Mike Slifcak.
     Note: MSC V5.10 did NOT generate specific 80386 code.

(41) Micronics 20.00 MHz 80386 AT-Clone. 2 MegaBytes of 80 nsec
     32-bit RAM. Microsoft C Version 5.10. 'Compact' model.
     No Optimization. From Mike Slifcak.
     Note: MSC V5.10 did NOT generate specific 80386 code.

(42) Micronics 20.00 MHz 80386 AT-Clone. 2 MegaBytes of 80 nsec
     32-bit RAM. Microsoft C Version 5.10. 'Large' model.
     No Optimization. From Mike Slifcak.
     Note: MSC V5.10 did NOT generate specific 80386 code.

(43) Micronics 20.00 MHz 80386 AT-Clone. 2 MegaBytes of 80 nsec
     32-bit RAM. Microsoft C Version 5.10. 'Huge' model.
     No Optimization. From Mike Slifcak.
     Note: MSC V5.10 did NOT generate specific 80386 code.

(44) IBM XT with Hauppauge 80386 MotherBoard. 386 Modular BIOS V3.03a.
     16 MHz 80386 with 16 MHz 32-bit RAM. Turbo C V1.0. 'Medium'
     model with Max Optimization.
     Turbo C V1.0 did not generate 80386 specific code!

(45) Micronics 20.00 MHz 80386 AT-Clone. 2 MegaBytes of 80 nsec
     32-bit RAM. Microsoft C Version 5.10. 'Huge' model.
     Max Optimization: [/Oailt /Gs]. From Mike Slifcak.
     Note: MSC V5.10 did NOT generate specific 80386 code or
     optimizations.

(46) Amiga with 68000 at 7.16 MHz, 16-bit memory at 7.16 MHz.
     Compiled with Manx Aztec C V3.4B using 'cc +L +ff nsieve.c'.

(47) Zenith Z-248, 80286 at 8.00 MHz. Turbo C V1.0 with 'small' option
     set. Compiled for 'speed'. Used registers, register optimization,
     and jump optimization.

(48) Zenith Z-248, 80286 at 8.00 MHz. Turbo C V1.0 with 'huge' option
     set. Compiled for 'speed'. Used registers, register optimization,
     and jump optimization.

Al Aburto.
aburto@marlin.nosc.mil.UUCP
'ala' on BIX
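Note (5) in the post above says the 25 MHz 68020 assembly row is the measured 14.32 MHz assembly row (16) scaled to 25 MHz. A minimal sketch of that kind of scaling, assuming a plain linear clock ratio (time * measured MHz / target MHz), is shown here. The function name scale_to_clock() is hypothetical, and the post does not state the exact formula used; a linear ratio matches most of the row 5 figures to within rounding (for example 0.260 sec becomes about 0.149 sec).

```c
/* Estimate the run time at a different clock speed by assuming the run
   time is inversely proportional to the clock (an assumption: it
   ignores Cache and memory-speed effects, which the post warns
   about). Hypothetical helper, not part of NSIEVE.c. */
double scale_to_clock(double secs, double measured_mhz, double target_mhz)
{
    if (target_mhz <= 0.0)
        return 0.0;               /* guard against a bad divisor */
    return secs * measured_mhz / target_mhz;
}
```

For instance, scale_to_clock(11.632, 14.32, 25.0) gives roughly 6.663, the 320K entry in row 5. As the post itself cautions, real 25 MHz systems would likely differ depending on Cache type, Cache size, and main memory speed.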