[comp.sys.misc] Siev and 2^4096 Benchmarks Updated

caf@omen.UUCP (Chuck Forsberg WA7KGX) (11/25/86)

Siev and BCT Benchmarks:	Comdex 1986 Edition

The Siev benchmark has a new winner: the 25 mHz 68020 powered Definicon
Systems ran siev in .34 seconds.  This is more than twice as fast as the
fastest 386 box in captivity, and nearly as fast as a big mainframe.

An interesting note is the Computer Dynamics 18 mHz 386 System (Intel
Motherboard): The benchmark was compiled with both 8086 and 80286 code
generation.  The 86 siev object file was 113 bytes of executable code,
and the 286 file was 108 bytes, a reduction of 5 per cent code space. 
Yet the 8086 code consistently ran about four per cent faster than the
more compact 286 code!  It is also interesting (and disappointing) to
note that the Intel 386 Motherboard is 50 per cent slower than an 8mHz
AT clone when forced to use 16 bit memory. 

		BCT additions:

bct:
time bc <<f
2 ^ 4096
f

Make bct executable.  Clear the screen (no scrolling please). Then run it.

Real Time	System/comments (ws = wait state(s))

0:03.6		Amdahl 580 3 users 2/86 ames!aurora!eugene
0:06.6		Gould UTX32 6/84
0:07.6		Vax 8600 running 4.3BSD Beta, 16 Feb 1986
0:09.5	u	Sun 3/260 68020 "25 Mhz" BSD 4.3	Unix-EXPO 10/86
0:14.5	u	HP 9000/840  Spectrum RISC HP-Unix	Unix-EXPO 10/86
0:15		DG/UX 2.01, DG MV/10000SX, 8MB
0:15.7	u	Compaq 386      80386 Xenix 5	Unix-EXPO 10/86
0:17.6	u	Corvus 386      80386 Xenix 5	Unix-EXPO 10/86
0:31		QIClabs AT 10 mHz 0ws + UniPort Unix SYS V, 8/86
0:33		PC-AT 9.05 mHz + SCO SYS V Xenix 2/86
0:40		Computer Dynamics 386 18 mHz SCO SYS V Xenix 11/86 *

* = User programs running from 16 bit AST Advantage! board; special 32 bit
memory plug-ins are not yet available for the Intel 386 Motherboard.


	Sieve benchmark (Slightly modified from Byte Magazine version)
		11-25-86 Chuck Forsberg Omen Technology Inc

Modifications consist of placing the variables in the "best" place
and shortening the variable names to make keyboarding easier.  There
have been slight differences in the variable names which should not affect
the benchmark results.  The correct answer is 1899 primes.  The order of
register declarations is important on some machines.  This benchmark gives a
shorter program and runs somewhat faster than the original Byte Magazine
version due to the register declarations.  Size in bytes refers to the main()
function text (code) size only.  A 32-bit version to use with 8086 type CPUs
(to make a fair comparison with 68000 and VAX computers) is shown at the end.

siev.c:
#define S 8190
char f[S+1];
main()
{
/*	register long i,p,k,c,n;	For 32 bit entries for PC */
	register int i,p,k,c,n;
	for (n = 1; n <= 10; n++) {
		c = 0;
		for (i = 0; i <= S; i++) f[i] = 1;
		for (i = 0; i <= S; i++) {
			if (f[i]) {
				p = i + i + 3; k = i + p;
				while (k <= S) { f[k] = 0; k += p; }
				c++;
			}
		}
	}
	printf("\n%d primes.\n", c);
}

		Results "32 bit" systems

		Sorted by Execution real time
Compile - Link		Execute
Real	User	Real	User	Bytes	System

7.4	.8	.34	.3416	124	Definicon SYS 68020 25mHz SiVlly 11/86
1.8	.31	.37	.2	200	Amdahl 470-V8 + UTS (160 users) %
25	1.3	0.8	.45	224	Gould SEL 32/87 (loaded) %
27	6.6	1.1	1.0	140	Charles Rivers + Unos %
22	1.8	2.0	1.2	144	Parallel 68k + Zenix #
9	2.4	2.3	2.3	140	SUN 68010 4.2 BSD 6/84
8.1	1.8	2.6	2.2	104	4.2 BSD VAX 11/780
18.2	4.4	2.7	2.5	72	H-P 32 bit mini
14.6	-	3.3	-	148	Sun Microsystems 68k
29.4	4.84	3.44	3.28	305	8mHz 0ws 1mb QIC-AT SCO -M2h 6/86
22	1.8	3.7	1.3	144	Parallel 68k + Zenix %#
26	8.7	5	4.9	148	Cosmos 10 mHz
31	5.5	5.0	4.5	148	Momentum Hawk 32
38	6.8	5.0	3.9	148	CYB Multibox (Sun Board)
17.5	5.0	5.8	5.7	267	9mHz PC-AT Zenix + huge model 5/85
41	7.3	6.0	5.2	148	Lisa + Unisoft
33	9.8	7.0	5.1	136	CIE 680/30 + Regulus 4.03 -L $
15	3.8	9	6	88	VAX 11/730 + 4.1 BSD
46	2.8	9	7.7	142	CIE 680/20 + Regulus -L
45	8.2	9.0	8.2	128	NS16032 6mHz + "4.1 BSD" &&
50	-	30	-	252	IBMPC DOS 2.1 Lattice 2.00 8/84 +

		Sorted by Compile/link real time
Compile - Link		Execute
Real	User	Real	User	Bytes	System

1.8	.31	.37	0.2	200	Amdahl 470-V8 + UTS (160 users) %
7.4	.8	.34	.3416	124	Definicon SYS 68020 25mHz SiVlly 11/86
8.1	1.8	2.6	2.2	104	4.2 BSD VAX 11/780
9	2.4	2.3	2.3	140	SUN 68010 4.2 BSD 6/84
14.6	-	3.3	-	148	Sun Microsystems 68k
15	3.8	9	6	88	VAX 11/730 + 4.1 BSD
17.5	5.0	5.8	5.7	267	9mHz PC-AT Zenix + huge model 5/85
18.2	4.4	2.7	2.5	72	H-P 32 bit mini
22	1.8	2.0	1.2	144	Parallel 68k + Zenix #
22	1.8	3.7	1.3	144	Parallel 68k + Zenix %#
25	1.3	0.8	.45	224	Gould SEL 32/87 (loaded) %
26	8.7	5	4.9	148	Cosmos 10 mHz
27	6.6	1.1	1.0	140	Charles Rivers + Unos %
29.4	4.84	3.44	3.28	305	8mHz 0ws 1mb QIC-AT SCO -M2h 6/86
31	5.5	5.0	4.5	148	Momentum Hawk 32
33	9.8	7.0	5.1	136	CIE 680/30 + Regulus 4.03 -L $
38	6.8	5.0	3.9	148	CYB Multibox (Sun Board)
41	7.3	6.0	5.2	148	Lisa + Unisoft
45	8.2	9.0	8.2	128	NS16032 6mHz + "4.1 BSD" &&
46	2.8	9	7.7	142	CIE 680/20 + Regulus -L
50	-	30	-	252	IBMPC DOS 2.1 Lattice 2.00 8/84 +

		Results 8-16 bit systems

		Sorted by real execution time
Compile/Link	Execute		Text (code)
Real	User	Real	User	Bytes	System


-	-	0.742	-	113	PC Limited 386 Xen/XC -M0 11/86
-	-	0.745	-	113	Laser Pacer 386 Xen/XC -M0 11/86
-	-	0.78	-	113	CompDyn 386 18mHz Xen/XC -M0 11/86
-	-	0.81	-	108	CompDyn 386 18mHz Xen/XC -M2 11/86
-	-	0.852	-	113	Data Bank 386 Xen/XC -M0 11/86
-	-	0.895	-	113	Kaypro 386 Xen/XC -M0 11/86
-	-	0.972	-	113	ALR 386 16mHz Xen/XC -M0 11/86
26.9	4.2	1.51	1.49	108	8mHz 0ws QIC/AT 1mb SCO SYS V -K %
-	-	1.64	-	-	Macrotech 7.159mHz 0ws DRC v1.11 5/85
12.4	5.4	1.88	1.85	108	9mHz PC-AT Xenix 1.00 -K %
15.6	5.4	1.9	1.9	113	9mHz PC-AT Xenix 1.00
18	1.1	2.0	1.5	96	11/70 + V7 (loaded)
-	-	2.1	-	113	PC-AT 8mHz Microsoft C 3.0
39	-	2.1	-	135	Macrotech 80286 6mHz MPM DRI C 1.11
26	3.7	2.1	2.0	110	Zilog Model 11
11.7	6.2	2.32	2.26	108	Comp.Dynamics 386 18 mHz Xenix 11/86!!
14.8	4.9	2.42	2.39	136	8mHz 0ws QIC/AT 2.5m Microport 1.32 6/86
16	3.3	2.7	2.2	96	Plexus P-25 5 mHz SYS III 1.1
-	-	2.8	-	113	PC-AT 6mHz Microsoft C 3.0
-	-	2.9	-	126	PC-ATx Mark Williams C DOS 3.0 8/84 @
9	5.9	3.0	2.6	136	Intel 310 6 mHz 1ws 80286 + Xenix 6/84
-	-	3.0	-	161	PC-ATx 8mHz C86 2.2h 12/84
21	6.4	3.2	2.8	98	Plexus P-40 4 mHz
21	-	3.5	-	192	PC-ATx 8mHz Aztec C 1.05i +MASM+LINK
9.3	-	3.73	-	135	PC-ATx DOS 3.0 Lattice 2 8/84 @
14.4	-	3.73	-	135	PC-ATx DOS 3.0 Lattice 2 8/84
-	-	4.22	-	161	PC-ATx DOS 3.0 C86 2.10j 8/84 @
33	9.8	5.0	3.7	114	CIE 680/30 + Regulus 4.03 16 bit ints
105	-	5.9	-	126	NEC APC CP/M + CC86 one drive ^
217	-	5.9	-	126	NEC APC CP/M + CC86 two drives ^
31.7	-	6	-	126	Mark Williams C Sperry PC 7mHz 8/84
50	3.3	7.0	5.7	114	CIE 680/20 + Regulus
95	-	7.8	-	126	Control-C CC86 IBMPC CP/M-86 5"fd ^
-	-	8.2	-	126	Mark Williams C IBM PC  8/84 @
31.1	8.5	8.7	8.0	126	Coherent +IBMPC-XT
46	14.7	9	8.2	122	PC-XT Venix 6/84
49	16.7	9.4	7.9	117	PC-XT SCO XENIX 6/84
75	-	10	-	135	Zenith Z100 5"fd + Lattice C 1.01 ^
-	-	10	-	194	Introl 6809 C 2mHz *
110	17.4	10.4	7.7	117	PC-XT 512kb SCO XENIX 11/84
82	-	11	-	135	IBMPC 5"fd PCDOS1.1 + Lattice 1.01 ^
65	-	-	-	135	IBMPC 5"fd PCDOS2.0 + Lattice 1.01 &
18.4	-	11	-	135	IBMPC E-disk PCDOS1.1+JEL+Lattice 1.01
39	-	11.1	-	135	IBMPC DOS 2.1 Lattice 2 8/84 @
32	13	11	.3	112	PC-XT PCIX 6/84
32	5.9	13	12	152	IBM 4954 (Series/1 middle)
37	-	13.4	-	161	IBMPC DOS 2.1 C86 2.10j 8/84 @
35	-	13.6	-	165	IBMPC DOS 2.1 C86 2.07a @
38	-	13.6	-	165	IBMPC DOS 2.1 C86 2.00a @
-	-	14	-	244	Telecon C on 2mHz 6809 **
-	-	22	-	-	CII C86 1.26 IBMPC CP/M-86 5"fd ^!
17.5	-	23.7	-	-	Televideo 820H +BDS C + L2 + EX ^
19	-	31	-	-	Z89+ CDI MEG-6 +BDS C + L2 + EX ^

		Sorted by real compile/link time
Compile/Link	Execute		Text (code)
Real	User	Real	User	Bytes	System

9	5.9	3.0	2.6	136	Intel 310 6 mHz 1ws 80286 + Xenix 6/84
9.3	-	3.73	-	135	PC-ATx DOS 3.0 Lattice 2 8/84 @
11.7	6.2	2.32	2.26	108	Comp.Dynamics 386 18 mHz Xenix 11/86!!
12.4	5.4	1.88	1.85	108	9mHz PC-AT Xenix 1.00 -K %
14.4	-	3.73	-	135	PC-ATx DOS 3.0 Lattice 2 8/84
14.8	4.9	2.42	2.39	136	8mHz 0ws QIC/AT 2.5m Microport 1.32 6/86
15.6	5.4	1.9	1.9	113	9mHz PC-AT Xenix 1.00
16	3.3	2.7	2.2	96	Plexus P-25 5 mHz SYS III 1.1
17.5	-	23.7	-	-	Televideo 820H +BDS C + L2 + EX ^
18	1.1	2.0	1.5	96	11/70 + V7 (loaded)
18.4	-	11	-	135	IBMPC E-disk PCDOS1.1+JEL+Lattice 1.01
19	-	31	-	-	Z89+ CDI MEG-6 +BDS C + L2 + EX ^
21	6.4	3.2	2.8	98	Plexus P-40 4 mHz
21	-	3.5	-	192	PC-ATx 8mHz Aztec C 1.05i +MASM+LINK
26	3.7	2.1	2.0	110	Zilog Model 11
26.9	4.2	1.51	1.49	108	8mHz 0ws QIC/AT 1mb SCO SYS V -K %
31.1	8.5	8.7	8.0	126	Coherent +IBMPC-XT
31.7	-	6	-	126	Mark Williams C Sperry PC 7mHz 8/84
32	13	11	.3	112	PC-XT PCIX 6/84
32	5.9	13	12	152	IBM 4954 (Series/1 middle)
33	9.8	5.0	3.7	114	CIE 680/30 + Regulus 4.03 16 bit ints
34	-	13.6	-	165	IBMPC DOS 2.1 C86 2.07a @
37	-	13.4	-	161	IBMPC DOS 2.1 C86 2.10j 8/84 @
38	-	13.6	-	165	IBMPC DOS 2.1 C86 2.00a @
39	-	2.1	-	135	Macrotech 80286 6mHz MPM DRI C 1.11
39	-	11.1	-	135	IBMPC DOS 2.1 Lattice 2 @
46	14.7	9	8.2	122	PC-XT Venix 6/84
49	16.7	9.4	7.9	117	PC-XT SCO XENIX 6/84
50	3.3	7.0	5.7	114	CIE 680/20 + Regulus
65	-	-	-	135	IBMPC 5"fd PCDOS2.0 + Lattice 1.01 &
75	-	10	-	135	Zenith Z100 5"fd + Lattice C 1.01 ^
82	-	11	-	135	IBMPC 5"fd PCDOS1.1 + Lattice 1.01 ^
95	-	7.8	-	126	Control-C CC86 IBMPC CP/M-86 5"fd ^
105	-	5.9	-	126	NEC APC CP/M + CC86 one drive ^
110	17.4	10.4	7.7	117	PC-XT 512kb SCO XENIX 11/84
217	-	5.9	-	126	NEC APC CP/M + CC86 two drives ^

Notes:
	$ Compiled with "register long" and -L option "for large programs".
16 bit integers vs. 32 bit pointers cause portability problems, especially
with printf and scanf control strings.
	% Outer loop increased to 100 and result times adjusted because of
fast execution times indicated.

	The faster 9 mHz PC-AT times were with an AST Advantage! board
(1152k total) and some sticky bits set.

	!! User programs running from 16 bit memory (AST Advantage!)

	# Significant difference between real and user times not due
to system load (user time suspect).

	&& System still being developed. Compiled size dropped from 136
to 128 bytes and times by >40% during the Unicom show.  Stay tuned ...

	^ Real time measured from beginning of program execution to program
finish (omitting loading and reboot time).

	& Compile/link times; sequenced by "batch" file; 20 buffers, NO verify.
Verify adds about 15 seconds.

	! A much faster code generator has been promised.

	* Data from compiler vendor re Byte Magazine version.

	** Data from compiler vendor re Byte Magazine version.
An optimizer under development provides about 30% improvement in code density
and execution speed.

	@ compiler/linker Executables on hard disk, other files on
electronic disk IBM PC, DOS 2.1, Maynard WS-1 Hard Disk except for
IBMPC-AT(extended) as shown.

	+ Compiled with the Lattice 8086 D model (big data memory,
<64k code), and 32 bit index variables.
For comparison with 68000 based systems listed as 32 bit systems, as
the 68000 compilers listed there use long integers and can address >64k
without compiling in special modes.  No electronic disk used.

		-------------- Comment ------------

These times are approximate and may improve as product development
proceeds on the newer systems.  They should be used as general
information regarding the levels of performance possible and not in any
specific purchasing decision without independent confirmation.  Note
that the short loops in this program may penalize highly pipelined
machines such as the 16032 more than programs that are more
representative of normal usage would.

The Regulus software is listed with different times in 16 and 32 bit
categories because the compiler uses 16 bit integers and defaults to
16 bit addressing except for pointers.

On some systems, there is a considerable discrepancy between the real
and user times that cannot be explained by other demands on the
system.  Usually, the real and user times are nearly the same when a
cpu bound program is run on an otherwise unloaded Unix(TM) system.

The compile/link times are often more significant in predicting how
responsive a system will be in a software development context.

On BDS C, the variables are made externs to optimize 8080 execution
speed and code density.  BDS C lacks longs, floats, and some other
aspects of C, but it produces reasonable code density and is an
excellent compromise for the CP/M environment.

Compile times were influenced by the structure of the compiler.  The
Unix(TM) compilers had up to 5 passes (preprocessor, c0, c1, c2, as) while
Lattice and BDS C have but two passes to produce object code (BDS uses
no intermediate file).  Lattice, C86 and Coherent/CC86 bypass the assembly
pass.

The compile/link times are affected by the size of the library
that must be scanned.  This tends to penalize the more nearly complete
implementations such as C86 and Coherent/Mark Williams.

Some MS-DOS implementations place uninitialized externs in the .exe
file.  For example, the Lattice C v2.0 results in a 19584 byte .exe
file for siev, while the C86 v2.10j .exe file is 9466 bytes!

Unix, Zenix, VAX, et al. are trade-marks.

From lanl-a!jlg Wed May 11 14:39:55 1983
Subject: sieve
Newsgroups: net.micro

The sieve program used in BYTE was not really a very good benchmark of
larger machines.  The problem is, there is too much stuff in the algorithm
that is not necessary (i.e. never used or printed).  A good compiler on a
large machine will probably 'optimize' all of this stuff out.  The result
is that the same algorithm is not performed on each machine.  

No one with access to both IBM and CRAY machines will really believe that
the IBM numbers in the January BYTE are correct.  The CRAY fortran numbers
(CFT is not very good at global optimization) are pretty accurate.  Also,
a hand coded assembly version of the algorithm (which implements the whole
benchmark with nothing optimized out) beats the best IBM numbers by 50%.
The advantage of vector arch....    

This may not seem to be relevant to the discussion of micros, but the newer
machines now and in the near future will be a lot more sophisticated than
those most micro fans are familiar with.  Be on the lookout for bad benchmarks!
Mainframe people have had to face this problem many times.

                           J.L. Giles
                           (...!lasl-a!jlg)

From ixn5h!dcn Thu May 12 05:48:56 1983
Subject: Re: Aztec C Review
Newsgroups: net.micro.apple


	I decided to quantify my complaint about the slow compilation
	of Aztec C on the Apple by running the benchmark program in
	the January 1983 issue of Byte.  I still have the interpreted
	version, V1.03, with two drives.  I was also lazy enough to 
	leave off the comments.  The results for the sieve program are:

		Compile:	compile  = 1:03 (min:sec)
				assemble = 0:33
				link     = 1:17
				total    = 2:53
		Execute:	6:37 or 397 seconds

	I also tried the Pascal version, with these results:

		Compile: 0:18
		Execute: 8:31 or 511 seconds

	By comparison, the Integer BASIC execute time was 1850 seconds
	and Applesoft BASIC was 2806 seconds.

	I know Pascal is easy to compile, but should it take so long to
	compile the C code?  The compile times for other machines in the
	article were an order of magnitude faster, so maybe it's just not
	optimized for the Apple.  I'm looking forward to trying the native-
	code compiler.
					Dave Newkirk
					ihnp4!ixn5h!dcn

/*
 * Huge model siev needed for fair comparison of 16 bit CPUs with 68000
 * and VAX types.  Compile with huge model.  Use 80190 and 80191 for sizes,
 * which gives 14713 primes, to make sure it really IS huge model code.
 */
#include <stdio.h>

#define SIZE 8190
#define SIZEPL 8191
char f[SIZEPL];
main()
{
	register long i,p,k;
	register int c, n;

	for (n = 1; n <= 10; n++) {
		c = 0;
		for (i = 0; i <= SIZE; i++) f[i] = 1;
		for (i = 0; i <= SIZE; i++) {
			if (f[i]) {
				p = i + i + 3; k = i + p;
				while (k <= SIZE) { f[k] = 0; k += p; }
				c++;
			}
		}
	}
	printf("\n%d primes.\n", c);
}

Chuck Forsberg WA7KGX Author of Pro-YAM communications Tools for PCDOS and Unix
...!tektronix!reed!omen!caf  Omen Technology Inc "The High Reliability Software"
  Voice: 503-621-3406  17505-V Northwest Sauvie Island Road Portland OR 97231
TeleGodzilla BBS: 621-3746 2400/1200  CIS:70007,2304  Genie:CAF  Source:TCE022
  omen Any ACU 1200 1-503-621-3746 se:--se: link ord: Giznoid in:--in: uucp
  omen!/usr/spool/uucppublic/FILES lists all uucp-able files, updated hourly

G.MDP@score.stanford.edu (Mike Peeler) (11/29/86)

It's worth noting that optimizing doesn't improve the benchmark.
What I'm interested in is how fast TYPICAL CODE runs.  Typical
code isn't optimal.  Typical C code doesn't put all variables in
registers.  I don't always have source and I don't have the time
to hand-optimize every program I run.

What I want from a benchmark is a basis for a price-performance
decision.  If an automatic optimizer is available and typical
code can take advantage of it, it's ok if the benchmark uses it.
But if typical C programs have sub-optimal data declarations, I
want to make my comparison on that basis.

Thanks,
   Mike Peeler <G.MDP@Score.Stanford.EDU>
-------

caf@omen.UUCP (Chuck Forsberg WA7KGX) (12/01/86)

In article <1172@brl-adm.ARPA> G.MDP@score.stanford.edu (Mike Peeler) writes:
:It's worth noting that optimizing doesn't improve the benchmark.
:What I'm interested in is how fast TYPICAL CODE runs.  Typical
:code isn't optimal.  Typical C code doesn't put all variables in
:registers.  I don't always have source and I don't have the time
:to hand-optimize every program I run.
:
:What I want from a benchmark is a basis for a price-performance
:decision.  If an automatic optimizer is available and typical
:code can take advantage of it, it's ok if the benchmark uses it.
:But if typical C programs have sub-optimal data declarations, I
:want to make my comparison on that basis.

Something I want from a benchmark is the ability to run it on real life
machines.  Tales of Intel 386 boards getting 4000 to 6000 on Dhrystone
don't mean much to me when the fastest I can get my Intel 386
motherboard to go is about a third of that (my 9 mHz IBM PC-AT actually
runs faster, and doesn't lock up the keyboard either).  It is also
interesting which vendors won't allow you to run your own benchmark at a
show; for example, none of the vendors of 386 accelerator boards would
allow my siev benchmark to run at Comdex.  The 386 motherboard machines
that I was allowed to run it on were less than half as fast as the fastest
68k system I tried siev on.

One advantage of the siev benchmark is that it is simple enough to
understand.  When making comparisons between different types of
systems, the text size of the main function is useful information.  It
is even possible to look at a disassembly of the resultant code and make
some sense out of it, to see if the compiler is using 16 or 32 bit
operations, for example.  This last point is especially relevant on 386
and 68k systems, where the program may run in either a 16 or 32 bit
model. 


The advantage of the 2^4096 benchmark is that it is easy to type in and
run on a Unix system, even if the compiler is not present.  When a 386
Unix system runs it more slowly than a PC-AT, you'd better believe some
tuning needs to be done.