schreiber@schreiber.asd.sgi.com (Olivier Schreiber) (03/23/91)
Hi! I am looking for standard workload multiuser musbus results for
various machines.  Thanks in advance.
--
Olivier Schreiber   Technical Marketing   schreiber@sgi.com   (415) 335 7353
MS/7L580  Silicon Graphics Inc., 2011 North Shoreline Blvd.
Mountain View, CA 94039-7311
Better to be rich and healthy than poor and sick.
kenj@yarra.oz.au (Ken McDonell) (03/26/91)
schreiber@schreiber.asd.sgi.com (Olivier Schreiber) writes:
>Hi! I am looking for standard workload multiuser musbus results
>for various machines.

I think the time has come to repeat some things I first said 7 years ago
(about 3 years after I developed MUSBUS) ...

Of course I have many sets of results, but these have been obtained as a
consequence (a) of machines we [the CS Department at Monash University]
have, or (b) consulting to vendors or purchasers in tender procedures
[since I joined Pyramid full-time in 1988 I have not done much of this for
systems other than Pyramids! :-)], or (c) non-disclosure agreements (e.g.
products yet to be announced).

I receive many requests for copies of previous results; however, only those
in (a) could be made available, and even then I tend NOT to do this, for the
following reasons:

(0) They are quickly out of date and of little relevance.

(1) Different h/w configurations, versions of the same Unix port and C
compilers, and versions of the MUSBUS programs themselves vary with time to
such an extent that labelling one set of figures as from Brand X Model Y is
misleading to all concerned.  Getting the best MUSBUS result is not as
trivial as maximizing a *stone number or SPECmark.  There are questions of
price-performance -- results from a CPU-bound configuration are very
different to the results for the same CPU when configured so that the
benchmark runs disk I/O bound.  Full disclosure of the configuration and
environment requires about a screenful of text, and I know that if this
were included with the results, after a very short time the results would
remain but the other information would be "lost" in the interests of
brevity!

(2) MUSBUS is intended to be reconfigured in the critical multiuser
simulated workload test to reflect the work profile of a particular user
site.  Whenever different workloads are used the results cannot be
compared.  MUSBUS was intended as a multiuser benchmark framework -- at the
time I did not appreciate how difficult most people regarded the task of
workload definition, and with time most results have been created with the
default workload I supplied -- this is helping the CS Department at Monash
choose an appropriate system to meet their *1980* needs, but not much else.
This trap of a default workload plagues most multi-user benchmarks; AIM
Suite 3, SDE, and Gaede all have mechanisms for defining an
application-specific workload, but this is rarely done.  People tend to use
the default workloads without ever asking "does this profile look anything
like my intended system usage?".  The most interesting MUSBUS results are
the ones that have NOT used the default workload, but promulgation of these
results would only inject further confusion for those looking to compare
systems without checking the fine print.

(3) Deliberately the MUSBUS tests are in two distinct categories, raw speed
and multiuser.  The former are useful for diagnostic purposes only and give
little useful information for a potential purchaser.  The latter test gives
good predictions of system performance.  Not everyone appreciates this, and
some rather silly conclusions have been drawn as a result.  I regret ever
distributing the raw speed tests, and plan to drop them in a future MUSBUS
release.
(4) In MUSBUS there is (quite deliberately) no single figure of merit (this
is not the holy grail) -- rather we see the effects of increasing load (the
time-constraining of user input makes the system lightly loaded at low
levels of concurrency, unlike other multiuser benchmarks) as we move
towards some form of resource depletion.  The level of concurrency is a
free variable, and there is no reason to suspect that useful results will
be derived by picking the same concurrency levels for all systems.  Given a
*set* of results for each system, comparisons between systems are
difficult, and require selection of a computed metric appropriate to one's
needs, e.g. CPU time per user in the limit, user load at which elapsed time
increases by X%, user load at which CPU utilization exceeds Y%, aggregate
throughput in commands (processes) per unit time, ... (a sketch of one such
computation follows below, after this post).

This has been something of a philosophical diatribe, but I hope it goes
some way to explaining why I think people should not promulgate MUSBUS
results without paying special attention to the points I have raised.

Since KENBUS is basically the MUSBUS multiuser test (default workload, no
comms I/O -- but that is another story), people should bear in mind the
points I've raised when SPEC Release 2.0 results start to be circulated.
--
Ken McDonell                      E-mail: kenj@pyramid.com  kenj@yarra.oz.au
Performance Analysis Group        Phone:  +61 3 820 0711
Pyramid Technology Corporation    Disclaimer: I speak for me alone, of course.
Melbourne, Australia
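The following is a rough, hypothetical sketch of the kind of derived metric
mentioned in point (4) above: given a table of (concurrency, mean elapsed
time) results, find the user load at which elapsed time first exceeds the
lightly loaded baseline by more than some chosen percentage.  This is not
MUSBUS code and not a method from the post; the data points, the 20%
threshold, and all names are invented purely for illustration.

/*
 * Illustrative sketch only: derive one comparison metric of the kind
 * described in point (4) -- the user load at which mean elapsed time
 * exceeds the lightly loaded baseline by more than THRESHOLD_PCT.
 * The result table and threshold below are invented, not MUSBUS output.
 */
#include <stdio.h>

struct result {
    int    users;          /* simulated concurrent users            */
    double elapsed_secs;   /* mean elapsed time per workload script */
};

#define THRESHOLD_PCT 20.0  /* hypothetical "X%" degradation limit */

int main(void)
{
    /* hypothetical results, ordered by increasing concurrency */
    struct result r[] = {
        {  1, 610.0 },
        {  4, 618.0 },
        {  8, 640.0 },
        { 16, 700.0 },
        { 24, 780.0 },
        { 32, 960.0 },
    };
    int n = sizeof(r) / sizeof(r[0]);
    double baseline = r[0].elapsed_secs;   /* lightly loaded reference */
    int i;

    for (i = 1; i < n; i++) {
        double pct = 100.0 * (r[i].elapsed_secs - baseline) / baseline;
        printf("%3d users: elapsed %7.1f s (+%5.1f%%)\n",
               r[i].users, r[i].elapsed_secs, pct);
        if (pct > THRESHOLD_PCT) {
            printf("elapsed time first degrades by more than %.0f%% "
                   "somewhere between %d and %d users\n",
                   THRESHOLD_PCT, r[i-1].users, r[i].users);
            return 0;
        }
    }
    printf("elapsed time never degraded by more than %.0f%%\n",
           THRESHOLD_PCT);
    return 0;
}

The same loop could just as easily report a different metric -- say,
aggregate commands completed per unit time at each load level -- and, as
the post argues, which metric is appropriate depends entirely on the
purchaser's intended workload, not on any single figure of merit.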