caf@omen.UUCP (Chuck Forsberg WA7KGX) (11/25/86)
Siev and BCT Benchmarks: Comdex 1986 Edition

The Siev benchmark has a new winner: the 25 MHz 68020-powered Definicon Systems entry ran siev in .34 seconds. This is more than twice as fast as the fastest 386 box in captivity, and nearly as fast as a big mainframe.

An interesting note is the Computer Dynamics 18 MHz 386 system (Intel motherboard): the benchmark was compiled with both 8086 and 80286 code generation. The 8086 siev object file was 113 bytes of executable code and the 286 file was 108 bytes, a reduction of about 5 per cent in code space. Yet the 8086 code consistently ran about four per cent faster than the more compact 286 code! It is also interesting (and disappointing) to note that the Intel 386 motherboard is 50 per cent slower than an 8 MHz AT clone when forced to use 16 bit memory.

BCT additions:

	time bc <<f
	2 ^ 4096
	f

Save these three lines as bct and make bct executable. Clear the screen (no scrolling please). Then run it.

Real Time   System/comments (ws = wait state(s))
0:03.6      Amdahl 580 3 users 2/86 ames!aurora!eugene
0:06.6      Gould UTX32 6/84
0:07.6      Vax 8600 running 4.3BSD Beta, 16 Feb 1986
0:09.5 u    Sun 3/260 68020 "25 MHz" BSD 4.3 Unix-EXPO 10/86
0:14.5 u    HP 9000/840 Spectrum RISC HP-Unix Unix-EXPO 10/86
0:15        DG/UX 2.01, DG MV/10000SX, 8MB
0:15.7 u    Compaq 386 80386 Xenix 5 Unix-EXPO 10/86
0:17.6 u    Corvus 386 80386 Xenix 5 Unix-EXPO 10/86
0:31        QIClabs AT 10 MHz 0ws + UniPort Unix SYS V, 8/86
0:33        PC-AT 9.05 MHz + SCO SYS V Xenix 2/86
0:40        Computer Dynamics 386 18 MHz SCO SYS V Xenix 11/86 *

* = User programs running from a 16 bit AST Advantage! board; special 32 bit memory plug-ins are not yet available for the Intel 386 motherboard.

Sieve benchmark (slightly modified from the Byte Magazine version)
11-25-86 Chuck Forsberg, Omen Technology Inc

Modifications consist of placing the variables in the "best" place and shortening the variable names to make keyboarding easier. There have been slight differences in the variable names, which should not affect the benchmark results.
The correct answer is 1899 primes. The order of register declarations is important on some machines. This benchmark gives a shorter program and runs somewhat faster than the original Byte Magazine version due to the register declarations. Size in bytes refers to the main() function text (code) size only. A 32-bit version to use with 8086-type CPUs (to make a fair comparison with 68000 and VAX computers) is shown at the end.

siev.c:

#include <stdio.h>
#define S 8190
char f[S+1];
int main()
{
/*	register long i,p,k,c,n;	For 32 bit entries for PC (then print c with %ld) */
	register int i,p,k,c,n;

	for (n = 1; n <= 10; n++) {
		c = 0;
		for (i = 0; i <= S; i++)
			f[i] = 1;
		for (i = 0; i <= S; i++) {
			if (f[i]) {
				p = i + i + 3;
				k = i + p;
				while (k <= S) {
					f[k] = 0;
					k += p;
				}
				c++;
			}
		}
	}
	printf("\n%d primes.\n", c);
	return 0;
}

Results "32 bit" systems
Sorted by execution real time

Compile/Link  Execute
Real   User   Real   User   Bytes System
7.4    .8     .34    .3416  124   Definicon SYS 68020 25 MHz SiVlly 11/86
1.8    .31    .37    .2     200   Amdahl 470-V8 + UTS (160 users) %
25     1.3    0.8    .45    224   Gould SEL 32/87 (loaded) %
27     6.6    1.1    1.0    140   Charles River + Unos %
22     1.8    2.0    1.2    144   Parallel 68k + Zenix #
9      2.4    2.3    2.3    140   SUN 68010 4.2 BSD 6/84
8.1    1.8    2.6    2.2    104   4.2 BSD VAX 11/780
18.2   4.4    2.7    2.5    72    H-P 32 bit mini
14.6   -      3.3    -      148   Sun Microsystems 68k
29.4   4.84   3.44   3.28   305   8 MHz 0ws 1mb QIC-AT SCO -M2h 6/86
22     1.8    3.7    1.3    144   Parallel 68k + Zenix %#
26     8.7    5      4.9    148   Cosmos 10 MHz
31     5.5    5.0    4.5    148   Momentum Hawk 32
38     6.8    5.0    3.9    148   CYB Multibox (Sun Board)
17.5   5.0    5.8    5.7    267   9 MHz PC-AT Zenix + huge model 5/85
41     7.3    6.0    5.2    148   Lisa + Unisoft
33     9.8    7.0    5.1    136   CIE 680/30 + Regulus 4.03 -L $
15     3.8    9      6      88    VAX 11/730 + 4.1 BSD
46     2.8    9      7.7    142   CIE 680/20 + Regulus -L
45     8.2    9.0    8.2    128   NS16032 6 MHz + "4.1 BSD" &&
50     -      30     -      252   IBMPC DOS 2.1 Lattice 2.00 8/84 +

Sorted by compile/link real time

Compile/Link  Execute
Real   User   Real   User   Bytes System
1.8    .31    .37    0.2    200   Amdahl 470-V8 + UTS (160 users) %
7.4    .8     .34    .3416  124   Definicon SYS 68020 25 MHz SiVlly 11/86
8.1    1.8    2.6    2.2    104   4.2 BSD VAX 11/780
9      2.4    2.3    2.3    140   SUN 68010 4.2 BSD 6/84
14.6   -      3.3    -      148   Sun Microsystems 68k
15     3.8    9      6      88    VAX 11/730 + 4.1 BSD
17.5   5.0    5.8    5.7    267   9 MHz PC-AT Zenix + huge model 5/85
18.2   4.4    2.7    2.5    72    H-P 32 bit mini
22     1.8    2.0    1.2    144   Parallel 68k + Zenix #
22     1.8    3.7    1.3    144   Parallel 68k + Zenix %#
25     1.3    0.8    .45    224   Gould SEL 32/87 (loaded) %
26     8.7    5      4.9    148   Cosmos 10 MHz
27     6.6    1.1    1.0    140   Charles River + Unos %
29.4   4.84   3.44   3.28   305   8 MHz 0ws 1mb QIC-AT SCO -M2h 6/86
31     5.5    5.0    4.5    148   Momentum Hawk 32
33     9.8    7.0    5.1    136   CIE 680/30 + Regulus 4.03 -L $
38     6.8    5.0    3.9    148   CYB Multibox (Sun Board)
41     7.3    6.0    5.2    148   Lisa + Unisoft
45     8.2    9.0    8.2    128   NS16032 6 MHz + "4.1 BSD" &&
46     2.8    9      7.7    142   CIE 680/20 + Regulus -L
50     -      30     -      252   IBMPC DOS 2.1 Lattice 2.00 8/84 +

Results 8-16 bit systems
Sorted by real execution time

Compile/Link  Execute       Text (code)
Real   User   Real   User   Bytes System
-      -      0.742  -      113   PC Limited 386 Xen/XC -M0 11/86
-      -      0.745  -      113   Laser Pacer 386 Xen/XC -M0 11/86
-      -      0.78   -      113   CompDyn 386 18 MHz Xen/XC -M0 11/86
-      -      0.81   -      108   CompDyn 386 18 MHz Xen/XC -M2 11/86
-      -      0.852  -      113   Data Bank 386 Xen/XC -M0 11/86
-      -      0.895  -      113   Kaypro 386 Xen/XC -M0 11/86
-      -      0.972  -      113   ALR 386 16 MHz Xen/XC -M0 11/86
26.9   4.2    1.51   1.49   108   8 MHz 0ws QIC/AT 1mb SCO SYS V -K %
-      -      1.64   -      -     Macrotech 7.159 MHz 0ws DRC v1.11 5/85
12.4   5.4    1.88   1.85   108   9 MHz PC-AT Xenix 1.00 -K %
15.6   5.4    1.9    1.9    113   9 MHz PC-AT Xenix 1.00
18     1.1    2.0    1.5    96    11/70 + V7 (loaded)
-      -      2.1    -      113   PC-AT 8 MHz Microsoft C 3.0
39     -      2.1    -      135   Macrotech 80286 6 MHz MPM DRI C 1.11
26     3.7    2.1    2.0    110   Zilog Model 11
11.7   6.2    2.32   2.26   108   Comp.Dynamics 386 18 MHz Xenix 11/86!!
14.8   4.9    2.42   2.39   136   8 MHz 0ws QIC/AT 2.5m Microport 1.32 6/86
16     3.3    2.7    2.2    96    Plexus P-25 5 MHz SYS III 1.1
-      -      2.8    -      113   PC-AT 6 MHz Microsoft C 3.0
-      -      2.9    -      126   PC-ATx Mark Williams C DOS 3.0 8/84 @
9      5.9    3.0    2.6    136   Intel 310 6 MHz 1ws 80286 + Xenix 6/84
-      -      3.0    -      161   PC-ATx 8 MHz C86 2.2h 12/84
21     6.4    3.2    2.8    98    Plexus P-40 4 MHz
21     -      3.5    -      192   PC-ATx 8 MHz Aztec C 1.05i +MASM+LINK
9.3    -      3.73   -      135   PC-ATx DOS 3.0 Lattice 2 8/84 @
14.4   -      3.73   -      135   PC-ATx DOS 3.0 Lattice 2 8/84
-      -      4.22   -      161   PC-ATx DOS 3.0 C86 2.10j 8/84 @
33     9.8    5.0    3.7    114   CIE 680/30 + Regulus 4.03 16 bit ints
105    -      5.9    -      126   NEC APC CP/M + CC86 one drive ^
217    -      5.9    -      126   NEC APC CP/M + CC86 two drives ^
31.7   -      6      -      126   Mark Williams C Sperry PC 7 MHz 8/84
50     3.3    7.0    5.7    114   CIE 680/20 + Regulus
95     -      7.8    -      126   Control-C CC86 IBMPC CP/M-86 5"fd ^
-      -      8.2    -      126   Mark Williams C IBM PC 8/84 @
31.1   8.5    8.7    8.0    126   Coherent +IBMPC-XT
46     14.7   9      8.2    122   PC-XT Venix 6/84
49     16.7   9.4    7.9    117   PC-XT SCO XENIX 6/84
75     -      10     -      135   Zenith Z100 5"fd + Lattice C 1.01 ^
-      -      10     -      194   Introl 6809 C 2 MHz *
110    17.4   10.4   7.7    117   PC-XT 512kb SCO XENIX 11/84
82     -      11     -      135   IBMPC 5"fd PCDOS1.1 + Lattice 1.01 ^
65     -      -      -      135   IBMPC 5"fd PCDOS2.0 + Lattice 1.01 &
18.4   -      11     -      135   IBMPC E-disk PCDOS1.1+JEL+Lattice 1.01
39     -      11.1   -      135   IBMPC DOS 2.1 Lattice 2 8/84 @
32     13     11     .3     112   PC-XT PCIX 6/84
32     5.9    13     12     152   IBM 4954 (Series/1 middle)
37     -      13.4   -      161   IBMPC DOS 2.1 C86 2.10j 8/84 @
35     -      13.6   -      165   IBMPC DOS 2.1 C86 2.07a @
38     -      13.6   -      165   IBMPC DOS 2.1 C86 2.00a @
-      -      14     -      244   Telecon C on 2 MHz 6809 **
-      -      22     -      -     CII C86 1.26 IBMPC CP/M-86 5"fd ^!
17.5   -      23.7   -      -     Televideo 820H +BDS C + L2 + EX ^
19     -      31     -      -     Z89+ CDI MEG-6 +BDS C + L2 + EX ^

Sorted by real compile/link time

Compile/Link  Execute       Text (code)
Real   User   Real   User   Bytes System
9      5.9    3.0    2.6    136   Intel 310 6 MHz 1ws 80286 + Xenix 6/84
9.3    -      3.73   -      135   PC-ATx DOS 3.0 Lattice 2 8/84 @
11.7   6.2    2.32   2.26   108   Comp.Dynamics 386 18 MHz Xenix 11/86!!
12.4   5.4    1.88   1.85   108   9 MHz PC-AT Xenix 1.00 -K %
14.4   -      3.73   -      135   PC-ATx DOS 3.0 Lattice 2 8/84
14.8   4.9    2.42   2.39   136   8 MHz 0ws QIC/AT 2.5m Microport 1.32 6/86
15.6   5.4    1.9    1.9    113   9 MHz PC-AT Xenix 1.00
16     3.3    2.7    2.2    96    Plexus P-25 5 MHz SYS III 1.1
17.5   -      23.7   -      -     Televideo 820H +BDS C + L2 + EX ^
18     1.1    2.0    1.5    96    11/70 + V7 (loaded)
18.4   -      11     -      135   IBMPC E-disk PCDOS1.1+JEL+Lattice 1.01
19     -      31     -      -     Z89+ CDI MEG-6 +BDS C + L2 + EX ^
21     6.4    3.2    2.8    98    Plexus P-40 4 MHz
21     -      3.5    -      192   PC-ATx 8 MHz Aztec C 1.05i +MASM+LINK
26     3.7    2.1    2.0    110   Zilog Model 11
26.9   4.2    1.51   1.49   108   8 MHz 0ws QIC/AT 1mb SCO SYS V -K %
31.1   8.5    8.7    8.0    126   Coherent +IBMPC-XT
31.7   -      6      -      126   Mark Williams C Sperry PC 7 MHz 8/84
32     13     11     .3     112   PC-XT PCIX 6/84
32     5.9    13     12     152   IBM 4954 (Series/1 middle)
33     9.8    5.0    3.7    114   CIE 680/30 + Regulus 4.03 16 bit ints
34     -      13.6   -      165   IBMPC DOS 2.1 C86 2.07a @
37     -      13.4   -      161   IBMPC DOS 2.1 C86 2.10j 8/84 @
38     -      13.6   -      165   IBMPC DOS 2.1 C86 2.00a @
39     -      2.1    -      135   Macrotech 80286 6 MHz MPM DRI C 1.11
39     -      11.1   -      135   IBMPC DOS 2.1 Lattice 2 @
46     14.7   9      8.2    122   PC-XT Venix 6/84
49     16.7   9.4    7.9    117   PC-XT SCO XENIX 6/84
50     3.3    7.0    5.7    114   CIE 680/20 + Regulus
65     -      -      -      135   IBMPC 5"fd PCDOS2.0 + Lattice 1.01 &
75     -      10     -      135   Zenith Z100 5"fd + Lattice C 1.01 ^
82     -      11     -      135   IBMPC 5"fd PCDOS1.1 + Lattice 1.01 ^
95     -      7.8    -      126   Control-C CC86 IBMPC CP/M-86 5"fd ^
105    -      5.9    -      126   NEC APC CP/M + CC86 one drive ^
110    17.4   10.4   7.7    117   PC-XT 512kb SCO XENIX 11/84
217    -      5.9    -      126   NEC APC CP/M + CC86 two drives ^

Notes:
$  Compiled with "register long" and the -L option "for large programs". 16 bit integers vs. 32 bit pointers cause portability problems, especially with printf and scanf control strings.
%  Outer loop increased to 100 and result times adjusted because of the fast execution times indicated. The faster 9 MHz PC-AT times were with an AST Advantage! board (1152k total) and some sticky bits set.
!! User programs running from 16 bit memory (AST Advantage!).
#  Significant difference between real and user times not due to system load (user time suspect).
&& System still being developed. Compiled size dropped from 136 to 128 bytes and times by >40% during the Unicom show. Stay tuned ...
^  Real time measured from beginning of program execution to program finish (omitting loading and reboot time).
&  Compile/link times sequenced by a "batch" file; 20 buffers, NO verify. Verify adds about 15 seconds.
!  A much faster code generator has been promised.
*  Data from compiler vendor re Byte Magazine version.
** Data from compiler vendor re Byte Magazine version. An optimizer under development provides about 30% improvement in code density and execution speed.
@  Compiler/linker executables on hard disk, other files on electronic disk. IBM PC, DOS 2.1, Maynard WS-1 hard disk, except for IBMPC-AT (extended) as shown.
+  Compiled with the Lattice 8086 D model (big data memory, <64k code) and 32 bit index variables. Listed with the 32 bit systems for comparison with the 68000-based systems there, since the 68000 compilers listed use long integers and can address >64k without compiling in special modes. No electronic disk used.

-------------- Comment ------------
These times are approximate and may improve as product development proceeds on the newer systems. They should be used as general information regarding the levels of performance possible, and not in any specific purchasing decision without independent confirmation.

It should be noted that the short loops in this program may penalize highly pipelined machines such as the 16032 more than other programs that are more representative of normal usage.

The Regulus software is listed with different times in the 16 and 32 bit categories because the compiler uses 16 bit integers and defaults to 16 bit addressing except for pointers.

On some systems there is a considerable discrepancy between the real and user times that cannot be explained by other demands on the system. Usually the real and user times are nearly the same when a CPU-bound program is run on an otherwise unloaded Unix(TM) system.

The compile/link times are often more significant in predicting how responsive a system will be in a software development context.

On BDS C, the variables are made externs to optimize 8080 execution speed and code density. BDS C lacks longs, floats, and some other aspects of C, but it produces reasonable code density and is an excellent compromise for the CP/M environment.

Compile times were influenced by the structure of the compiler. The Unix(TM) compilers had up to 5 passes (preprocessor, c0, c1, c2, as), while Lattice and BDS C have only two passes to produce object code (BDS uses no intermediate file). Lattice, C86 and Coherent/CC86 bypass the assembly pass. The compile/link times are also affected by the size of the library that must be scanned, which tends to penalize the more nearly complete implementations such as C86 and Coherent/Mark Williams.

Some MS-DOS implementations place uninitialized externs in the .exe file. For example, Lattice C v2.0 produces a 19584 byte .exe file for siev, while the C86 v2.10j .exe file is 9466 bytes!

Unix, Zenix, VAX, et al. are trade-marks.

From lanl-a!jlg Wed May 11 14:39:55 1983
Subject: sieve
Newsgroups: net.micro

The sieve program used in BYTE was not really a very good benchmark of larger machines.
The problem is, there is too much stuff in the algorithm that is not necessary (i.e. never used or printed). A good compiler on a large machine will probably 'optimize' all of this stuff out. The result is that the same algorithm is not performed on each machine. No one with access to both IBM and CRAY machines will really believe that the IBM numbers in the January BYTE are correct. The CRAY Fortran numbers (CFT is not very good at global optimization) are pretty accurate; also, a hand-coded assembly version of the algorithm (which implements the whole benchmark with nothing optimized out) beats the best IBM numbers by 50%. The advantage of vector arch....

This may not seem to be relevant to the discussion of micros, but the newer machines, now and in the near future, will be a lot more sophisticated than those most micro fans are familiar with. Be on the lookout for bad benchmarks! Mainframe people have had to face this problem many times.

J.L. Giles (...!lasl-a!jlg)

From ixn5h!dcn Thu May 12 05:48:56 1983
Subject: Re: Aztec C Review
Newsgroups: net.micro.apple

I decided to quantify my complaint about the slow compilation of Aztec C on the Apple by running the benchmark program in the January 1983 issue of Byte. I still have the interpreted version, V1.03, with two drives. I was also lazy enough to leave off the comments. The results for the sieve program are:

Compile: compile  = 1:03 (min:sec)
         assemble = 0:33
         link     = 1:17
         total    = 2:53
Execute: 6:37 or 397 seconds

I also tried the Pascal version, with these results:

Compile: 0:18
Execute: 8:31 or 511 seconds

By comparison, the Integer BASIC execute time was 1850 seconds and Applesoft BASIC was 2806 seconds. I know Pascal is easy to compile, but should it take so long to compile the C code? The compile times for other machines in the article were an order of magnitude faster, so maybe it's just not optimized for the Apple. I'm looking forward to trying the native-code compiler.
Dave Newkirk ihnp4!ixn5h!dcn

/*
 * Huge model siev needed for fair comparison of 16 bit CPUs with 68000
 * and VAX types. Compile with huge model. Set SIZE to 80190 and SIZEPL
 * to 80191, which gives 14713 primes, to make sure it really IS huge
 * model code.
 */
#include <stdio.h>
#define SIZE 8190
#define SIZEPL 8191
char f[SIZEPL];
int main()
{
	register long i,p,k;
	register int c, n;

	for (n = 1; n <= 10; n++) {
		c = 0;
		for (i = 0; i <= SIZE; i++)
			f[i] = 1;
		for (i = 0; i <= SIZE; i++) {
			if (f[i]) {
				p = i + i + 3;
				k = i + p;
				while (k <= SIZE) {
					f[k] = 0;
					k += p;
				}
				c++;
			}
		}
	}
	printf("\n%d primes.\n", c);
	return 0;
}

Chuck Forsberg WA7KGX           Author of Pro-YAM communications Tools for PCDOS and Unix
...!tektronix!reed!omen!caf     Omen Technology Inc "The High Reliability Software"
Voice: 503-621-3406             17505-V Northwest Sauvie Island Road Portland OR 97231
TeleGodzilla BBS: 621-3746 2400/1200   CIS:70007,2304   Genie:CAF   Source:TCE022
omen Any ACU 1200 1-503-621-3746 se:--se: link ord: Giznoid in:--in: uucp
omen!/usr/spool/uucppublic/FILES lists all uucp-able files, updated hourly
G.MDP@score.stanford.edu (Mike Peeler) (11/29/86)
It's worth noting that optimizing doesn't improve the benchmark. What I'm interested in is how fast TYPICAL CODE runs. Typical code isn't optimal. Typical C code doesn't put all variables in registers. I don't always have source and I don't have the time to hand-optimize every program I run. What I want from a benchmark is a basis for a price-performance decision. If an automatic optimizer is available and typical code can take advantage of it, it's ok if the benchmark uses it. But if typical C programs have sub-optimal data declarations, I want to make my comparison on that basis. Thanks, Mike Peeler <G.MDP@Score.Stanford.EDU> -------
caf@omen.UUCP (Chuck Forsberg WA7KGX) (12/01/86)
In article <1172@brl-adm.ARPA> G.MDP@score.stanford.edu (Mike Peeler) writes:
:It's worth noting that optimizing doesn't improve the benchmark.
:What I'm interested in is how fast TYPICAL CODE runs. Typical
:code isn't optimal. Typical C code doesn't put all variables in
:registers. I don't always have source and I don't have the time
:to hand-optimize every program I run.
:
:What I want from a benchmark is a basis for a price-performance
:decision. If an automatic optimizer is available and typical
:code can take advantage of it, it's ok if the benchmark uses it.
:But if typical C programs have sub-optimal data declarations, I
:want to make my comparison on that basis.
Something I want from a benchmark is the ability to run it on real-life
machines. Tales of Intel 386 boards getting 4000 to 6000 on Dhrystone
don't mean much to me when the fastest I can get my Intel 386
motherboard to go is about a third of that (my 9 MHz IBM PC-AT actually
runs faster, and doesn't lock up the keyboard either). It is also
interesting which vendors won't allow you to run your own benchmark at a
show; for example, none of the vendors of 386 accelerator boards would
allow my siev benchmark to run at Comdex. The 386 motherboard machines
that I was allowed to run it on were less than half as fast as the fastest
68k system I tried siev on.
One advantage of the siev benchmark is that it is simple enough to
understand. When making comparisons between different types of
systems, the text size of the main function is useful information. It
is even possible to look at a disassembly of the resultant code and make
some sense out of it, to see if the compiler is using 16 or 32 bit
operations, for example. This last point is especially relevant on 386
and 68k systems, where the program may run in either a 16 or 32 bit
model.
The advantage of the 2^4096 benchmark is that it is easy to type in and
run on a Unix system, even if the compiler is not present. When a 386
Unix system runs it more slowly than a PC-AT, you'd better believe some
tuning needs to be done.