[comp.unix] Workstation benchmarks

raveling@isi.edu (Paul Raveling) (07/14/90)

	Last week I ran some benchmarks with interesting results
	for evaluating some combinations of workstations, operating systems,
	and C compilers.  Following the formfeed below is a rather long
	report on the results.  The two systems being most seriously
	compared were an HP 9000/370 and a Sun 4, but various results
	are included for a Sun 3 and a VAX 8650.

	Beyond some obvious conclusions about which {hardware/OS/compiler}
	is fastest in various circumstances, one result that I find
	interesting supports an old hypothesis of mine in the area
	of OS theory.  This hypothesis is essentially that context switch
	overhead is the principal determinant of OS performance in the
	presence of a typical multi-process workload.


	Please note that this is cross-posted to several newsgroups
	that may have an interest in the machines, OS's, and compilers
	that were compared.  It would be appropriate to edit the
	Newsgroups line in any followups.  Also, please be aware that
	I don't subscribe to most of these newsgroups; the best way
	to get a question to me would be by email or by a followup
	to comp.sys.hp.


----------------
Paul Raveling
Raveling@isi.edu

	Last week I ran two suites of benchmarks to compare various
	combinations of workstation hardware, operating systems, and
	C compilers.  Emphasis was on:

		**  HP 9000/370 vs Sun 4 vs Sun 3
		**  HP-UX vs BSD
		**  Native C compilers versus gcc

	One suite was the small collection that I've been using for
	a couple of years; the other was the BYTE UNIX benchmarks
	published recently on comp.sources.unix.


	Some Conclusions
	----------------

	--  Comparisons between HP-UX and BSD on HP 9000/370's
	    indicate that BSD is generally much faster.  The main
	    differences are in speed of context switching and i/o.

	--  Context switch overhead is probably a key determinant
	    of overall system performance.  The HP-UX/BSD comparison
	    shows strong similarity between relative speed ratios for
	    BYTE's system loading test and context switch benchmarks;
	    the same correlation does not apply well to other low level
	    benchmarks.

	--  The C compiler that produced the fastest code at maximum
	    optimization was the vendor's C compiler on both the
	    HP 9000/370 and the Sun 4.  However, gcc may produce
	    faster floating point code on the Sun 4.

	--  Processor speed tests show that the HP 9000/370 and
	    Sun 4 are about equally matched, except in two areas:
	    The Sun 4 is faster in floating point and recursion.

	--  BYTE's I/O throughput tests showed that the Sun 4 was
	    surprisingly slow.  Both the HP and a Sun 3 were faster.


	Measured Results
	----------------

	All results that follow are expressed as relative speed ratios
	based on some measured quantity:  User process time, system time,
	real time, or i/o rates.

		1	is assigned to the fastest measured result.
		n	means "n times slower than fastest"; "n" is
			expressed to 2 fractional digits (e.g. "1.23")

	I.e., the lower the speed ratio, the faster the performance.

	In a few cases two or more different machines/systems/compilers
	produced a dead tie for the fastest measured result.  In these
	cases each shows "1" as its relative speed ratio.  "1.00"
	indicates a speed very slightly slower than the fastest,
	for which the ratio rounds to 1.00.
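
	The ratio convention above can be sketched in a few lines of
	Python.  This is only an illustration for time-based measurements;
	the function names and the sample times are hypothetical, not
	taken from the benchmark suites.  For rate-based results (such as
	dhrystones/second) the division would presumably be inverted,
	fastest rate over each rate.

```python
def relative_ratios(times):
    """Map each label to its time divided by the fastest (smallest) time."""
    fastest = min(times.values())
    return {label: t / fastest for label, t in times.items()}


def format_ratio(ratio):
    """Show "1" for an exact tie with the fastest, else two fractional digits."""
    return "1" if ratio == 1.0 else f"{ratio:.2f}"


# Hypothetical user times in seconds, NOT measurements from this report.
times = {"cc": 10.0, "gcc": 12.3, "other cc": 10.02}
table = {label: format_ratio(r) for label, r in relative_ratios(times).items()}
print(table)
```

	Here "other cc" is very slightly slower than the fastest, so it
	is shown as "1.00" rather than "1".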


	1.  Best optimizing compiler:

	    On HP 9000/370's it was HP-UX's compiler.  On Sun 4's it
	    was Sun's, except that gcc was better in BYTE's floating point
	    math tests.  Measured results were user process time, and
	    on benchmarks marked with "(r)", the "{dhry/whet}stones/second"
	    rating reported by the benchmark.


	    Compilers on the HP were:

		"HP-UX cc":	Native compiler from HP-UX 6.5
		"gcc":		gcc 1.37.1
		"BSD cc":	gcc 1.34, as supplied by Utah for BSD

	    Compilers on the Sun 4 were:

		"Sun cc":	Native compiler from SunOS 4.0.3
		"gcc":		gcc 1.37.1


			     HP 9000/370		   Sun 4
	Benchmark      HP-UX cc	 gcc   BSD cc	       Sun cc	 gcc
	---------      --------	 ---   ------	       ------	 ---

	dhrystone	1	1.26	1.19		1	1.17
	dhrystone(r)	1	1.25	1.43		1	1.17
	whetstone	1	1.15	1.10		1.01	1
	whetstone(r)	1	1.16	1.07		1.01	1
	tak		1	2.10	2.06		1	1.24
	dhrystone2a(r)	1	1.15	1.41		1	1.76
	dhrystone2b(r)	1	1.14	1.39		1	1.79
	arithoh		1	2.18	1.76		1	1
	register	1.01	1.02	1		1      10.51
	short		1.12	1	1.00		1.00	1
	int		1.01	1.03	1		1	1.03
	long		1.01	1.03	1		1	1.02
	float		1.07	1	1.60		2.42	1
	double		1	1.10	1.04		1.14	1
	tower of hanoi	1	1.83	1.83		1	1



	2.  Relative processing [hardware] speeds:

	    These results also are based on user process time.
	    For the HP and Sun 4, the measurements used are those for
	    whichever compiler's executable was fastest.  Only the
	    installed "cc" was used on the Sun 3 and the VAX.

	    This doesn't precisely show relative hardware speed
	    because it's at the mercy of the available C compilers.
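
	    The selection described above is a two-step reduction: take
	    the fastest compiler's time on each machine, then normalize
	    across machines.  A minimal Python sketch follows; the helper
	    names and all times are hypothetical and serve only to show
	    the procedure.

```python
def best_per_machine(times):
    """times: {machine: {compiler: seconds}} -> {machine: fastest seconds}."""
    return {machine: min(by_compiler.values())
            for machine, by_compiler in times.items()}


def normalize(best):
    """Divide each machine's best time by the overall fastest time."""
    fastest = min(best.values())
    return {machine: round(t / fastest, 2) for machine, t in best.items()}


# Hypothetical user times in seconds, NOT measurements from this report.
times = {
    "HP 9K/370": {"HP-UX cc": 6.0, "gcc": 7.5, "BSD cc": 7.2},
    "Sun 4": {"Sun cc": 5.0, "gcc": 5.8},
    "Sun 3": {"cc": 25.0},
}
ratios = normalize(best_per_machine(times))
print(ratios)
```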


	Benchmark     HP 9K/370	Sun 4	Sun 3  VAX 8650	
	---------     --------- -----	-----  --------

	dhrystone	1.18	1	5.02	2.44
	dhrystone(r)	1.18	1	5.03	2.45
	whetstone	1	1.01   23.65	2.62
	whetstone(r)	1	1      24.34	3.77
	tak		1.88	1	3.91	3.47
	dhrystone2a(r)	1.13	1	3.73	2.08
	dhrystone2b(r)	1.12	1	3.74	2.16
	arithoh		1	1.18	4.12	(Test failed on VAX)
	register	1.38	1.40	2.39	1
	short		1	1.40	2.01	1.16
	int		1.26	1.50	2.17	1
	long		1	1.20	1.73	1.37
	float		2.23	1      67.62	4.91
	double		1.54	1      39.15	2.94
	tower of hanoi	1.50	1	4.25	(Test failed on VAX)



	    See item 4 below for a comparison of relative i/o speeds.
	    These depend largely on hardware, but as item 3 shows,
	    the choice of operating system is also significant.


	3.  Relative operating system speeds:

	    Direct comparison of HP-UX 6.5 and BSD 4.3 on identical
	    HP 9000/370's.  Tests included 3 types of benchmarks:

	    --  Low level processor-intensive tests
	    --	Low level i/o-intensive tests
	    --  High level tests of a simulated workload


	Low level processor-intensive tests:


				System Time		 Real Time
				:::::::::::		 :::::::::
	Benchmark		HP-UX	BSD		HP-UX	BSD
	---------		-----	---		-----	---

	pt [context switch]	2.08	1		2.21	1
	iocall			1	1.15		1	1.13
	system call overhead	1	1.19		1	1.19
	pipe throughput		1.33	1		1.28	1
	pipe-based context sw.	2.74	1		2.15	1
	process creation	1.33	1		1.15	1
	execl throughput	1.45	1		1	1.28


	Low level i/o-intensive tests:

	    Filesystem throughput, based on reported KBytes/second.


	Test Time	System		Read	Write	Copy
	---------	------		----	-----	----

	1 sec		HP-UX		1	1.17	1.27
			BSD		1.08	1	1

	10 sec		HP-UX		1.27	1.29	1.91
			BSD		1	1	1

	20 sec		HP-UX		1.48	1.48	1.67
			BSD		1	1	1



	High level tests of a simulated workload:

	    Bourne shell script and UNIX utilities


	Concurrent Background			........Time........
	Processes	System & Compiler	User   System	Real
	---------	-----------------	----   -----	----

	1		HP-UX	cc		1.04	2.09	2.02
			HP-UX	gcc		1	2.22	1.98
			BSD	cc		1.29	1	1

	2		HP-UX	cc		1	2.22	2.15
			HP-UX	gcc		1.02	2.28	2.22
			BSD	cc		1.36	1	1

	4		HP-UX	cc		1.03	2.21	2.43
			HP-UX	gcc		1	2.20	2.12
			BSD	cc		1.32	1	1

	8		HP-UX	cc		1	2.29	2.16
			HP-UX	gcc		1.01	2.30	1.73
			BSD	cc		1.30	1	1
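
	One rough way to check the claim that the workload ratios track
	the context switch ratios is to compare each low level benchmark's
	HP-UX/BSD real-time ratio with the mean HP-UX/BSD ratio of the
	workload test.  The Python sketch below uses the real-time figures
	from the tables in this item (HP-UX cc rows for the workload).
	The "gap" measure is an assumption made for illustration; it is
	a simple distance, not a statistical correlation, and not
	necessarily how the comparison was originally made.

```python
# Real-time ratios (HP-UX, BSD) copied from the low level table in item 3.
low_level = {
    "pt [context switch]": (2.21, 1.00),
    "iocall": (1.00, 1.13),
    "system call overhead": (1.00, 1.19),
    "pipe throughput": (1.28, 1.00),
    "pipe-based context sw.": (2.15, 1.00),
    "process creation": (1.15, 1.00),
    "execl throughput": (1.00, 1.28),
}
# HP-UX cc real-time ratios for 1, 2, 4 and 8 background processes (BSD = 1).
workload = [2.02, 2.15, 2.43, 2.16]

mean_load = sum(workload) / len(workload)


def hpux_over_bsd(pair):
    """Convert a (HP-UX, BSD) ratio pair into a single HP-UX/BSD ratio."""
    hpux, bsd = pair
    return hpux / bsd


# Distance of each low level ratio from the mean workload ratio.
gaps = {name: abs(hpux_over_bsd(pair) - mean_load)
        for name, pair in low_level.items()}
closest = sorted(gaps, key=gaps.get)

for name in closest:
    ratio = hpux_over_bsd(low_level[name])
    print(f"{name:24s} {ratio:.2f}  gap {gaps[name]:.2f}")
```

	With these figures the two context switch benchmarks land closest
	to the workload ratio; every other test is off by roughly 0.9
	or more.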



	4.  Net relative OS-related system speeds, comparing all
	    tested combinations of hardware, OS's, and C compilers.

	Comparisons in the immediately following table are based on
	measured real time, except for the "n-sec" i/o benchmarks.


			     HP 9000/370	   Sun 4       Sun 3	VAX
			:::::::::::::::::::	   :::::       :::::	:::
			   HP-UX	BSD	   SunOS       SunOS	BSD
			:::::::::::	:::	:::::::::::    :::::	:::
	Benchmark	cc	gcc	cc	cc	gcc	cc	cc
	---------	--	---	--	--	---	--	--

	pt		2.39	2.45	1.08	1	1.10	2.29	1.52
	iocall		2.08	2.34	2.34	1.04	1	3.92	2.27
	sys call ovhd	1.12	1.20	1.33	1	1.02	3.61	1.20
	pipe th'put	2.09	3.08	1.63	1.03	1	3.36	1.44
	context sw.	2.15	3.69	1	1	1	2.28	1
	process creat'n	1.37	1.54	1.19	3.46	3.40	7.05	1
	execl th'put	1.01	1.03	1.29	1.84	1.79	3.51	1

	1-sec read	1.03	1.10	1.15	1.26	1.31	1      [0.17]
	1-sec write	1.14	1.21	1	1.40	1.45	1.10   [0.08]
	1-sec copy	1.43	1.15	1	1.31	1.31	1.17   [0.36]
	10-sec read	1.29	1.25	1	2.50	2.25	1.50   [0.13]
	10-sec write	1.29	1.29	1	2.50	2.25	1.50   [0.11]
	10-sec copy	1.87	1.95	1	3.07	2.26	1.65   [0.25]
	20-sec read	1.44	1.53	1	2.55	2.55	1.53   [0.14]
	20-sec write	1.44	1.53	1	2.55	2.55	1.53   [0.12]
	20-sec copy	1.64	1.71	1	2.12	2.25	1.33   [0.27]

	sh+ut load(1)	2.02	1.98	1	1.72	1.51	1.64	1.33
	sh+ut load(2)	2.15	2.22	1	5.20	1.99	1.98	1.29
	sh+ut load(4)	2.43	2.12	1	1.65	1.75	1.99	1.33
	sh+ut load(8)	2.16	1.73	1	1.49	1.47	1.78	1.12




	Hardware Configurations
	-----------------------

	HP 9000/370:	24 MB RAM, 68881 floating point (no FPA)
			I/O via NFS mounts to another HP 9K/370
			on local ethernet

	Sun 4		24 MB RAM, programs loaded from local disk,
			other I/O via NFS mounts to VAX 8650

	Sun 3/80	8 MB RAM, programs loaded from local disk,
			other I/O via NFS mounts to VAX 8650

	VAX 8650	20 MB RAM, I/O to local disk


	Notes
	-----

	1.  All tests were run at least 3 times, and the BYTE benchmarks
	    ran many tests 6 times.  The results reported are mean values
	    for all trials.

	2.  Measurements based on real time should be treated with
	    a bit of suspicion, particularly on the VAX, which supports
	    a substantial amount of activity in both user jobs and
	    NFS i/o.   The BYTE benchmarks reported 95 interactive users
	    when they started on the VAX.

	    **	A notable case is that variance was unusually high for
		the "whetstones per second" rate reported on the VAX.
		However, user process times reported for the same tests
		were much more consistent.

	    The workstations should be fairly safe from loading by local
	    processes, but their i/o speeds are vulnerable to loading on
	    the local ethernet and file servers.

	    **	And yes, gcc-generated code WAS slower by an order of
		magnitude on the Sun 4 "register" benchmark.  This is so
		blatantly odd that I repeated both compiling and running
		this test to be sure the numbers were correct and consistent.

	3.  The VAX's I/O was MUCH faster than the workstations',
	    sometimes by up to an order of magnitude.  This may be
	    partly due to use of only local disks rather than NFS-mounted
	    file systems on the VAX.  However, older benchmarks also
	    had suggested that workstations using local disks still
	    offered much less data bandwidth than the VAX.

	    In order to provide a meaningful comparison among workstations
	    for i/o, performance ratio "1" was assigned to the fastest
	    workstation.  This is why the VAX's performance is fractional.

	    **	A particularly interesting result was that i/o to/from
		an NFS-mounted file system was slower on the Sun 4
		than on the Sun 3.  Both machines were using the same
		file system on the same server.
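
	    The normalization described in note 3 can be sketched as
	    follows: the baseline is the best throughput among the
	    workstations only, so a faster machine outside that group
	    ends up with a ratio below 1.  The function name and the
	    throughput figures are hypothetical; only the procedure
	    comes from the note.

```python
def io_ratios(rates, baseline_group):
    """Ratio = best rate within baseline_group divided by each system's rate."""
    best = max(rates[name] for name in baseline_group)
    return {name: round(best / rate, 2) for name, rate in rates.items()}


# Hypothetical throughputs in KBytes/second, NOT measurements from this report.
rates = {"HP/BSD": 200.0, "Sun 4": 80.0, "Sun 3": 125.0, "VAX": 1000.0}
workstations = ["HP/BSD", "Sun 4", "Sun 3"]

ratios = io_ratios(rates, workstations)
print(ratios)
```

	    In this made-up case the VAX's ratio of 0.2 means it moved
	    data five times as fast as the fastest workstation.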