[comp.org.usenix] Benchmarking the 532, 68030, MIPS, 386...at a Usenix!

gnu@hoptoad.uucp (John Gilmore) (05/15/87)

In article <4294@nsc.nsc.com>, grenley@nsc.nsc.com (George Grenley) writes:
> So, here's the deal.  I invite Mot, Intel, and other interested parties
> to work with me in defining some sort of realistic benchmark, which we'll
> run (in public).  I expect to have system level hardware late this year,
> so if we get started now, we'll have very interesting Xmas presents...
> May the best CPU win!  

I'd like to join in the hoopla, rahrah, etc that has followed this
suggestion, and make a further one:

	Let's have the bake-off in the trade show at, say, next Winter
	Usenix.  Probably the actual setup and running of the benchmarks
	can be done a day or two before the show, so the results can be
	printed for distribution, and to give the losers time to think
	up (and print up) good explanations before we descend on them :-).

	Let's also make the same setup of machines available for people
	to run their own benchmarks.  It'd be easiest if they were all
	on a network, of course, though the benchmarks should in
	general be run from local disk to eliminate networking delays.
	Except for multi-CPU machines which want to show off their
	multitasking, only one person should be running on any machine
	at once.  But you could load a tape of benchmarks
	onto a server machine somewhere, then as time became free on
	each machine of interest, rcp over (or cp via NFS) your data,
	compile, and run.  It might even be possible to just have a
	bank of terminals where you could rlogin to each system in
	turn, using some simple scheme to avoid multiple people getting
	through at once, rather than schlepping around the trade show
	floor.  Anyone could verify the benchmark results from the
	bake-off by rerunning them themselves.

	Each such machine should have its full configuration posted
	prominently, with the list price of the configuration, if for
	sale, or its ballpark price and expected availability if the
	machine is not announced or not shipping.  If I go to Usenix
	and run some benchmarks, then go home and buy a system based on
	them, I want to be able to reproduce the configuration that won
	on my purchase order.

	Anybody who's willing to bring a machine to the trade show
	floor (and pay for the booth...) should be able to enter, e.g.
	there's no reason to restrict it to "interesting new micros".
	Of course, the benchmark machines should be available for
	benchmarking full time while the show is open, so the vendors
	should bring a second machine for demos unless they want to
	degrade their benchmark results.  To encourage prototypes to
	appear, there should be no requirement that the stuff be for
	sale yet either.  If they'll bring it, we'll benchmark it!

	Any other Usenix members interested in this?  Think we can
	get the conference committee to go for it?
-- 
Copyright 1987 John Gilmore; you may redistribute only if your recipients may.
(This is an effort to bend Stargate to work with Usenet, not against it.)
{sun,ptsfa,lll-crg,ihnp4,ucbvax}!hoptoad!gnu	       gnu@ingres.berkeley.edu

daveb@rtech.UUCP (05/15/87)

In article <2128@hoptoad.uucp> gnu@hoptoad.uucp (John Gilmore) writes:
>
>	Let's have the bake-off in the trade show at, say, next Winter
>	Usenix.  Probably the actual setup and running of the benchmarks
>	can be done a day or two before the show, so the results can be
>	printed for distribution, and to give the losers time to think
>	up (and print up) good explanations before we descend on them :-).
>
>	Let's also make the same setup of machines available for people
>	to run their own benchmarks...

At last winter's Uniforum, I went around to a number of booths trying to
run the infamous

	/bin/time bc << !
	2^4096
	!

At a distressing number of places the sales creatures in the booth would
say things like, "I don't believe we're interested in running any
benchmarks today.  Let me show you vi."  Now there are some good reasons
for this, but it sure sounded like there was something being hidden.

Problem 1 is getting some benchmarks run.  Problem 2 is trying to get a
straight answer on the price of the system.  What you really want is the
bang/buck of different benchmarks on different boxes.  The results would
be an embarrassment to many people wearing suits, which is why it may be
difficult to get a lot of cooperation.

-dB

PS:  Given my druthers, I'd like to see:

	* the bc benchmark above
	* Dhrystone
	* Whetstones
	* A paging thrasher.
	* A system call overhead checker (looped getpid()s maybe).
	* A process thrasher.

I'd probably give up on disk speed and tty i/o.
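
For the syscall item, something as dumb as the sketch below would do.
getpid() does almost no work, so the loop mostly times the trap into and
out of the kernel.  (A sketch only; the loop count is arbitrary -- pick one
that runs for a few seconds -- and a C library that caches getpid() in user
space would need some other cheap call substituted.)

	/*
	 * Crude system-call overhead check: time N trips through getpid()
	 * with /bin/time; nearly all of the elapsed time is trap overhead.
	 */
	#include <stdio.h>
	#include <unistd.h>

	int main(void)
	{
		long i, n = 1000000L;	/* arbitrary; make it run for seconds */

		for (i = 0; i < n; i++)
			(void) getpid();
		printf("%ld getpid() calls\n", n);
		return 0;
	}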

-- 
{amdahl, cbosgd, mtxinu, ptsfa, sun}!rtech!daveb daveb@rtech.uucp

grenley@nsc.nsc.com (George Grenley) (05/15/87)

A while back I invited other chip manufacturers to join me in a CPU
horse race.  I'm happy to see this request is generating interest.
However, I haven't heard much from Mot or Intel.  Are you guys
listening?  I know you're out there...

In article <826@rtech.UUCP> daveb@rtech.UUCP (Dave Brower) writes:
>At last winter's Uniforum, I went around to a number of booths trying to
>run the infamous
>	/bin/time bc << !
>	2^4096
>	!
>At a distressing number of places the sales creatures in the booth would
>say things like, "I don't believe we're interested in running any
>benchmarks today.  Let me show you vi."  Now there are some good reasons
>for this, but it sure sounded like there was something being hidden.

No, they just don't want to risk a system crash, or other malicious use.
Most show-people aren't Unix gurus, and are hesitant to let a hacker play
with the system on the show floor.

>Problem 1 is getting some benchmarks run.  Problem 2 is trying to get a
>straight answer on the price of the system.  What you really want is the
>bang/buck of different benchmarks on different boxes.  The results would
>be an embarrassment to many people wearing suits, which is why it may be
>difficult to get a lot of cooperation.
>PS:  Given my druthers, I'd like to see:
>
>	* the bc benchmark above
>	* Dhrystone
>	* Whetstones
>	* A paging thrasher.
>	* A system call overhead checker (looped getpid()s maybe).
>	* A process thrasher.

The problem you're experiencing in getting what you call straight
answers lies in your methods.  As a working engineer who knows how to
wear a suit and tie (not common, I know, but some of us manage it) and who
has logged many hours of booth duty, I sympathize with the booth person
who is hesitant to allow you to run any program they're not familiar with.
There are a lot of *ssholes at tradeshows who delight in trying (and
occasionally succeeding) to crash systems.  Personally, I never let
such a person near a machine, period, no matter how much he protests
the "innocence" of his program.

If you want cooperation, I suggest you work with some of the others on
this net (including myself) to define a reasonable benchmark before the
show, run it under realistic conditions, and let the manufacturers
publish the results jointly.  

I have seen more than one instance where one cooperative manufacturer
ran a "real-world" benchmark, only to be pilloried later by having it
compared to some bs benchmark put out by a less scrupulous competitor.

As to your suggested benchmark list:  Dhrystone has come under fire as
not being very reliable, due to compiler optimization problems.  Likewise,
most Unix machines aren't floating point oriented.
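
The classic failure mode is an optimizer noticing that a benchmark's work
feeds nothing visible and deleting it outright.  A contrived illustration
(a sketch, not Dhrystone itself -- just the shape of the problem):

	/*
	 * If nothing ever looks at "sum", a good optimizer is entitled to
	 * throw the whole loop away, and the "benchmark" then times an
	 * empty program.  Printing the result keeps the work honest;
	 * Dhrystone has subtler versions of the same problem with string
	 * copies and inlining.
	 */
	#include <stdio.h>

	int main(void)
	{
		long i, sum = 0;

		for (i = 0; i < 10000000L; i++)
			sum += i & 7;	/* trivial stand-in for real work */

		printf("%ld\n", sum);	/* omit this and many compilers drop the loop */
		return 0;
	}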

Consider also that those of us who are chip manufacturers are primarily
interested in CPU benchmarks, not system benchmarks.  Unix is NOT the
entire world, yet.

TO BETTER BENCHMARKS!
George

mkhaw@teknowledge-vaxc.ARPA (Michael Khaw) (05/16/87)

In article <826@rtech.UUCP> daveb@rtech.UUCP (Dave Brower) writes:
...
>At last winter's Uniforum, I went around to a number of booths trying to
>run the infamous
>
>	/bin/time bc << !
>	2^4096
>	!
...

What exactly does this exercise (i.e., how does "dc" do "unlimited
precision arithmetic")?  Why not run

	/bin/time dc << X
	2 4096 ^ p
	X

(possibly less overhead than the "bc" version but 4 more characters to type?)
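
(For what it's worth: on most Unix systems bc is a front end that hands the
arithmetic to dc, and dc keeps its numbers as arrays of digits, so 2^4096 is
essentially a long run of multiple-precision integer work -- no floating
point, very little I/O.  The toy sketch below shows that flavor of
digit-array arithmetic; it is not dc's actual code or radix, and it computes
the power by repeated doubling rather than however dc chooses to.)

	/*
	 * Toy multiple-precision computation of 2^4096 in base 10, just to
	 * show the kind of carry-propagating inner loop bc/dc spend their
	 * time in for this benchmark.
	 */
	#include <stdio.h>
	#include <string.h>

	#define NDIG 2500		/* 2^4096 has about 1234 decimal digits */

	static void double_it(int a[])	/* a = a * 2, little-endian digits */
	{
		int i, carry = 0;

		for (i = 0; i < NDIG; i++) {
			int t = a[i] * 2 + carry;
			a[i] = t % 10;
			carry = t / 10;
		}
	}

	int main(void)
	{
		int a[NDIG], i;

		memset(a, 0, sizeof a);
		a[0] = 1;
		for (i = 0; i < 4096; i++)
			double_it(a);
		for (i = NDIG - 1; i > 0 && a[i] == 0; i--)
			;		/* skip leading zeros */
		for (; i >= 0; i--)
			putchar('0' + a[i]);
		putchar('\n');
		return 0;
	}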




Mike Khaw
-- 
internet:  mkhaw@teknowledge-vaxc.arpa
usenet:	   {hplabs|sun|ucbvax|decwrl|sri-unix}!mkhaw%teknowledge-vaxc.arpa
USnail:	   Teknowledge Inc, 1850 Embarcadero Rd, POB 10119, Palo Alto, CA 94303

larry@mips.UUCP (05/16/87)

In article <826@rtech.UUCP> daveb@rtech.UUCP (Dave Brower) writes:
>In article <2128@hoptoad.uucp> gnu@hoptoad.uucp (John Gilmore) writes:
>>
>>	Let's have the bake-off in the trade show at, say, next Winter
>>	Usenix.  Probably the actual setup and running of the benchmarks
>>	can be done a day or two before the show, so the results can be
>>	printed for distribution, and to give the losers time to think
>>	up (and print up) good explanations before we descend on them :-).
>>
>>	Let's also make the same setup of machines available for people
>>	to run their own benchmarks...
>
>At last winter's Uniforum, I went around to a number of booths trying to
>run the infamous
>
>	/bin/time bc << !
>	2^4096
>	!
>
>At a distressing number of places the sales creatures in the booth would
>say things like, "I don't believe we're interested in running any
>benchmarks today.  Let me show you vi."  Now there are some good reasons
>for this, but it sure sounded like there was something being hidden.

I think we should aim for the bake-off to be run by the respective
engineering staffs.  I really like the sales folks, but this is a
technical endeavor.  Having the benchmarks at a show is a wonderful
idea.  It gives the engineering staffs a chance to explain, brag, boast
or promise their results to lots of people.  By having each machine start
with a 'clean' benchmark tape we can remove all doubt about whether everyone
used exactly the same sources and ran them under the same conditions.

>Problem 1 is getting some benchmarks run.  Problem 2 is trying to get a
>straight answer on the price of the system.  What you really want is the
>bang/buck of different benchmarks on different boxes.  The results would
>be an embarrassment to many people wearing suits, which is why it may be
>difficult to get a lot of cooperation.

I think the benchmark should be defined well in advance and be
made available to the 'world'.  There is too much comparison of machines
using different definitions of performance.  This activity would perform
a valuable service for the industry.

>PS:  Given my druthers, I'd like to see:
>
>	* the bc benchmark above
>	* Dhrystone
>	* Whetstones
>	* A paging thrasher.
>	* A system call overhead checker (looped getpid()s maybe).
>	* A process thrasher.
>
>I'd probably give up on disk speed and tty i/o.

The benchmarks should strive to illustrate how real-world programs run
on the machines.  Dhrystone, as maligned as it is, is useful only as
one of a number of larger programs - we will need to carefully document
the results over a range of compiler optimizations.  A page thrasher would be
wonderful BUT it is highly dependent on I/O system, configurations, page
size, MMU ... in fact so many things that I suspect it wouldn't be
useful.

I encourage the readers of this group to search for real programs that
range from modest to large size (maybe a couple of hundred Kbytes) that
can be run without elaborate setup.  They should:
	Be easily checked for correctness
	Not rely on system files (e.g., grep of passwd)
	Not use any system commands - if you want to grep, then the
	  code should be part of the benchmark
	Be examples of integer, single and double precision float, character
	  oriented, and pointer oriented code - in short, a nice mix of
	  different application areas
	Run long enough to be meaningful - none of these 0.1u timings
	  that have more timing error than meaning
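
A driver shaped roughly like the sketch below would cover the last two
points at once: repeat the workload until it has run long enough to time,
and verify a checksum so a broken run can't masquerade as a fast one.
(A sketch only -- work() and EXPECTED stand in for a real benchmark kernel
and its known-good answer.)

	#include <stdio.h>
	#include <time.h>

	#define MIN_SECONDS 10.0
	#define EXPECTED    4500000L	/* known-good answer for this work() */

	static long work(void)		/* placeholder for a real kernel */
	{
		long i, sum = 0;

		for (i = 0; i < 1000000L; i++)
			sum += i % 10;
		return sum;
	}

	int main(void)
	{
		clock_t t0 = clock();
		long reps = 0, result = 0;

		do {			/* run long enough to be worth timing */
			result = work();
			reps++;
		} while ((double)(clock() - t0) / CLOCKS_PER_SEC < MIN_SECONDS);

		if (result != EXPECTED)
			printf("WRONG ANSWER: got %ld\n", result);
		printf("%ld repetitions in %.1f cpu seconds\n",
		       reps, (double)(clock() - t0) / CLOCKS_PER_SEC);
		return 0;
	}
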
My suggestions include:
	Common benchmarks	Dhrystone, Whetstone, Linpack, Stanford
	Real programs		Doduc, Timberwolf, UCB Spice, YACC, C
				  compiler (from Stallman)

We should agree ahead of time how the results are to be reported.  I
suggest that we list individual results under specific conditions and have some
weighting method to give a simple result.  Maybe the organizing group
could select a base machine and weight the values so that the base machine
is one.  The VAX 11/780 is often used for this - so why not use it.
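
To make the weighting concrete, the arithmetic could be as simple as the
sketch below: divide the base machine's time by the tested machine's time
for each benchmark, then combine the ratios (a geometric mean is one common
choice; the weights and the times shown are invented and would be whatever
the organizing group agrees on).  Compile with -lm.

	#include <stdio.h>
	#include <math.h>

	int main(void)
	{
		/* seconds per test; invented numbers for illustration only */
		double base[]   = { 95.0, 310.0, 41.0 };	/* e.g. VAX-11/780 */
		double tested[] = { 22.0,  80.0, 13.0 };	/* machine under test */
		int i, n = 3;
		double logsum = 0.0;

		for (i = 0; i < n; i++) {
			double ratio = base[i] / tested[i];	/* >1 means faster than base */
			printf("test %d: %.2f x base\n", i + 1, ratio);
			logsum += log(ratio);
		}
		printf("overall: %.2f x the base machine\n", exp(logsum / n));
		return 0;
	}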

It is very good that non-vendors get involved to make sure that
fair representation is preserved.  Maybe the Uniforum organizing committee
can help identify the leaders.  Or maybe one of you wants to take the
lead.  Perhaps it will be known as the X suite, where X is YOU.

LET'S DO IT...

jack@mcvax.cwi.nl (Jack Jansen) (05/17/87)

In article <396@gumby.UUCP> larry@gumby.UUCP (Larry Weber) writes:
>...  A page thrasher would be
>wonderful BUT it is highly dependent on I/O system, configurations, page
>size, MMU ... in fact so many things that I suspect it wouldn't be
>useful.
For me, it would be useful. If I'm looking for a machine to put 
30 first-year students on, I'm not interested in CPU performance at
all. I just want the system to run fast with 30 vi/cc/a.out users.

As an example: the 3B15 comes out of most benchmarks as a truly
lousy machine. However, something these benchmarks never show is
that the I/O system is very fast. The same more-or-less holds for
VAXen.

My favorite benchmark:
time ex /usr/dict/words <<FOO
10000,10060d
w /tmp/words
FOO
time diff /usr/dict/words /tmp/words >/dev/null

This gives you a reasonable idea of how fast vi starts (*very*
important), and how fast the I/O system (plus disks, etc) is.
-- 
	Jack Jansen, jack@cwi.nl (or jack@mcvax.uucp)
	The shell is my oyster.

spaf@gatech.edu (Gene Spafford) (05/17/87)

Let me make an actual offer:  if suitable benchmarks are available by
February 1988, we can make them available for a "bench-off" (as per
John Gilmore's article, et al) at the 1988 Annual ACM Computer Science
Conference in Atlanta (February 23-25).

I'm on the program committee for the CSC and would be happy to try to
arrange something, either in conjunction with the exhibit or as one of
the program events...but only if there is likely to be vendor
participation.  Contact me if you're interested, either by E-mail or at
(404) 894-3807.  If I don't get evidence of sufficient interest by
about July 15, it won't happen.

...and just wait til you hear some of the other things we've got lined
up for the program!  Mark those dates on your calendar.
-- 
Gene Spafford
Software Engineering Research Center (SERC), Georgia Tech, Atlanta GA 30332
CSNet:	Spaf @ GATech		ARPA:	Spaf@gatech.EDU
uucp:	...!{akgua,decvax,hplabs,ihnp4,linus,seismo,ulysses}!gatech!spaf

newton@cit-vax.Caltech.Edu (Mike Newton) (05/18/87)

	The list of benchmarks given seems to ignore several important 
cases.  One of these -- a large graphics program -- was pointed out by Tim Kay.

	Another thing that I would like to see benchmarked is a large AI
program running through either a Lisp or Prolog compiler/interpreter.  I
personally use Prolog a lot -- and often end up running the naive reverse
benchmark on a version of C-Prolog that I have ported to many machines.
While licensing restrictions may prevent this at USENIX, the benchmark has
proved useful.  For example:
	[1] The C-Prolog interpreter was FASTER than the HP Prolog
COMPILER on HP 9000/200 series machines!!  (The compiler did have many
nice features though...)
	[2] On many machines there exists a rough correspondence between
Dhrystone benchmarks and LIPS.  However, with increasing pipelining, this
may no longer be true.  Prolog uses LOTS of pointers.

	With micros capable of ``17 Million sustained MIPS'' it would 
probably be possible to write a Prolog compiler that got a fair fraction
of 1 million LIPS  (current systems on vaxen and suns get ~5000 LIPS
interpreted).  This claim of speed would be especially true on machines
like the proposed AMD 29000 with 64 general-use registers -- one of the
main things that slowed down the Prolog compiler I wrote the code generator
for was the lack of registers on the IBM.  Our compiler ran at about
one million LIPS on an IBM 3090.  It would be a lot more economically
feasible for the average Prolog user to buy a microprocessor based
system than a 3090 :-) !

	Until that time a standard AI program could be:
[1] A mini-prolog interpreter running a parsing problem, or,
[2] xlisp 1.7? running a small program, or...??

- mike

newton@csvax.caltech.edu		818 356 6771 (afternoons, nights)
amdahl!cit-vax!newton			Caltech 256-80,  Pasadena CA  91125

ralph@ralmar.UUCP (Ralph Barker) (05/18/87)

Assuming success in obtaining manufacturer participation in the proposed
"Bench-Off", what are the thoughts of using Neal Nelson's Business
Benchmark(tm) for this purpose?  

The series of 18 tests (and multiple copies of each test) which the Nelson
benchmark runs appears to provide an excellent picture not only of the various
aspects of system performance ("raw" computing power, I/O, disk speed,
etc.), but also of multi-user performance.  The Business Benchmark(tm)
does not cover the graphics or compiler performance (e.g., Prolog) issues
raised earlier in this discussion, but it does seem to cover most of the
other areas.  

Although I have personally used Nelson's Business Benchmark in connection
with a hardware review done for UNIX/World Magazine, and found the results
to be highly useful, others with more extensive benchmarking experience may
have other thoughts on the issue (or the validity of Nelson's approach).
(NOTE:  Due to the number of iterations within each test, Nelson's benchmark
typically takes several hours to run, thus a special subset might be more
appropriate within the context of a Usenix conference.)  

-- 
Ralph Barker, RALMAR Business Systems, 640 So Winchester Blvd, San Jose,CA 95128
uucp: ...{ucbvax,hplabs}!sun!idi---\!ralmar!ralph
      ...pyramid!amdahl!unixprt----/             Voice: (408) 248-8649

johnw@astroatc.UUCP (John F. Wardale) (05/18/87)

In article <396@gumby.UUCP> larry@gumby.UUCP (Larry Weber) writes:
>In article <826@rtech.UUCP> daveb@rtech.UUCP (Dave Brower) writes:

>>	* A paging thrasher.

>  A page thrasher would be
>wonderful BUT it is highly dependent on I/O system, configurations, page
>size, MMU ... in fact so many things that I suspect it wouldn't be
>useful.

Just the opposite!!  People use WHOLE systems, so a benchmark that
loads the WHOLE system IS a very useful benchmark!!!

Before you flame, note:  I agree that a loop like
	for ( i=start; i<end; i+=big ) {
		sum += array[i];
	}
would NOT be very useful, but 50-500 lines of code that includes
computation and frequent, random accesses to an array of several
megabytes would be GOOD.
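
Something along the lines of the sketch below, say: a few megabytes touched
at random with a little arithmetic per touch, so the MMU and paging machinery
work as hard as the ALU.  (Sizes and counts here are pulled out of the air;
on a machine with more real memory than the array, this mostly measures
cache/TLB behaviour instead of paging.)

	#include <stdio.h>
	#include <stdlib.h>

	#define ARRAY_BYTES (8L * 1024 * 1024)	/* "several megabytes" */
	#define TOUCHES     (10L * 1000 * 1000)

	int main(void)
	{
		long n = ARRAY_BYTES / sizeof(long);
		long *a = malloc(ARRAY_BYTES);
		long i;
		unsigned long sum = 0, seed = 12345;

		if (a == NULL) {
			perror("malloc");
			return 1;
		}
		for (i = 0; i < n; i++)		/* touch every page once */
			a[i] = i;
		for (i = 0; i < TOUCHES; i++) {
			seed = seed * 1103515245UL + 12345;	/* cheap pseudo-random index */
			sum += a[seed % (unsigned long) n]++;
		}
		printf("checksum %lu\n", sum);	/* also defeats dead-code removal */
		free(a);
		return 0;
	}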


			John W

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 
Name:	John F. Wardale
UUCP:	... {seismo | harvard | ihnp4} !uwvax!astroatc!johnw
arpa:   astroatc!johnw@rsch.wisc.edu
snail:	5800 Cottage Gr. Rd. ;;; Madison WI 53716
audio:	608-221-9001 eXt 110

To err is human, to really foul up world news requires the net!

howard@cpocd2.UUCP (Howard A. Landman) (05/20/87)

>In article <826@rtech.UUCP> daveb@rtech.UUCP (Dave Brower) writes:
>>At last winter's Uniforum, I went around to a number of booths trying to
>>run the infamous
>>	/bin/time bc << !
>>	2^4096
>>	!
>>At a distressing number of places the sales creatures in the booth would
>>say things like, "I don't believe we're interested in running any
>>benchmarks today.  Let me show you vi."  Now there are some good reasons
>>for this, but it sure sounded like there was something being hidden.

In article <4329@nsc.nsc.com> grenley@nsc.UUCP (George Grenley) writes:
>No, they just don't want to risk a system crash, or other malicious use.
>Most show-people aren't Unix gurus, and are hesitant to let a hacker play
>with the system on the show floor.
>
>There are a lot of *ssholes at tradeshows who delight in trying (and
>occasionally succeeding) to crash systems.  Personally, I never let
>such a person near a machine, period, no matter how much he protests
>the "innocence" of his program.

I'm glad SOMEONE has a way to tell a customer from an *sshole. ;-)

At Electro a few years back, I was allowed to type on a teensy UNIX box in
someone's booth.  I decided to see if there was anyone else logged on (it
was claimed to be a multi-user system, and there were other terminals scattered
about) and try "write" or "talk".  There was only one user logged in, however:
"root".  This kind of idiocy is probably why many "malicious" crashes occur.
Far too many sales people leave themselves logged in as root so they don't
ever run into permission problems.  I was almost tempted to do "rm -r /".

DEC uVAX-IIs come from the factory with an account "field" that has no
password and has root privileges.  I found several machines at DAC last year
which still had that account active, with no password, and random users were
logging in on them.  All it would have taken is "cat /etc/passwd", and ...

Also at last year's DAC, it was lots of fun hanging around the booth of a Very
Very Large Computer Company and watching their so-called RISC workstation
crash.  For example, their csh crashed when fed "set i = 1; @ i++; echo $i",
which should simply echo "2".  (It's the @ i++ that died.  To be fair, the
latest release of their OS fixes this bug.)  And they left one machine dead
with a panic message on its screen for over 10 minutes before one of the
sales people noticed me peering at it; his solution was to stand between me
and the screen!  No *ssholes were required, just bugs!

A computer needs to be *RELIABLE*.  You find out how reliable by, among other
methods, stress testing the system, trying to exercise *ALL* the features,
not just the ones in the canned demo.  If I can crash a system in five minutes
doing things that are normal, legal, and *NECESSARY* for everyday function,
then I know it can't possibly be reliable.  Does this make me malicious?

I am reminded of the account in Richard Feynman's biography of his exploits
as an amateur safecracker.  Once, he told a military officer that his security
procedures were lax, because it was possible to figure out the combination of
a safe by playing with it while the door was open.  He then recommended that
people be told to keep their safes closed except when necessary.  The
officer's solution to the problem was to tell everyone in the facility to
change their lock combination each time Feynman had been seen in their area!
-- 
	Howard A. Landman
	...!intelca!mipos3!cpocd2!howard
	howard%cpocd2%sc.intel.com@RELAY.CS.NET
	"You just ask them?"

davidsen@steinmetz.steinmetz.UUCP (William E. Davidsen Jr) (05/20/87)

In article <401@ralmar.UUCP> ralph@ralmar.UUCP (Ralph Barker) writes:
|
|Assuming success in obtaining manufacturer participation in the proposed
|"Bench-Off", what are the thoughts of using Neal Nelson's Business
|Benchmark(tm) for this purpose?  

As the developer of a benchmark suite of my own I would love to cast
bricks at the Nelson suite. In truth it's a pretty good set of benchmarks,
and has been run on hundreds of configurations. I agree that it would be
a suitable measure of machines.

After doing benchmarks for about 15 years now, I will assure everyone
that the hard part is not getting reproducible results, but (a)
deciding how these relate to the problem you want to solve, and (b) getting
people to believe that there is no "one number" which can be used to
characterize performance. If pressed I use the reciprocal of the total
real time to run the suite. It's as good as any other voodoo number...

-- 
bill davidsen			sixhub \	ARPA: wedu@ge-crd.arpa
      ihnp4!seismo!rochester!steinmetz ->  crdos1!davidsen
				chinet /
"Stupidity, like virtue, is its own reward"

mike@hcr.UUCP (Mike Tilson) (05/21/87)

There has been a lot of discussion about running some good benchmarks
at Usenix.  Many people have made suggestions of what should be run,
with no consensus emerging, except a general feeling that benchmarks
at the vendor exhibit would be a good thing, and that a grand comparison
involving all vendors should be conducted.

What people are really trying to do is to generate a single "performance"
number, and to circumvent the "less-than-trustworthy" guys in suits who
prevent ready access to machines at a show.  However, I think a good
benchmark requires a lot of thought, and it can only be run in reference
to an intended application.  That is why some commercial benchmark
products take hours to run -- they gather lots of data that a customer
can interpret depending on the intended application.  (Maybe you want
I/O performance, or MIPs, or FLOPs, or paging rate, etc.)  This kind of
careful analysis can't be done properly on the floor of a trade show.

I think a "Usenix Benchmark Contest" would tend to perpetuate the
misuse of benchmarks, and for that reason I'd suggest it isn't a good idea.
Benchmarks are readily available for use by customers who are serious
about comparing performance.  A trade show benchmark war would encourage
increased marketing hype with very little hard technical information.
I'd like to see less of this at Usenix, rather than more.

/Michael Tilson, HCR Corporation, {utzoo,ihnp4,...}!hcr!mike

ed@plx.UUCP (Ed Chaban) (05/21/87)

In article <4329@nsc.nsc.com>, grenley@nsc.nsc.com (George Grenley) writes:
> A while back I invited other chip manufacturers to join me in a CPU
> horse race.  I'm happy to see this request is generating interest.
> However, I haven't heard much from Mot or Intel.  Are you guys
> listening?  I know you're out there...
> 
> >At a distressing number of places the sales creatures in the booth would
> >say things like, "I don't believe we're interested in running any
> >benchmarks today.  Let me show you vi."  Now there are some good reasons
> >for this, but it sure sounded like there was something being hidden.
 
 It seems that Neal Nelson's Benchmark has been getting a *LOT* of
 ink lately.  Most of my observations about RISC and CISC systems
 come from my work with this particular chunk of code.

 Those who have seen it will attest to the almost Fortranlike 
 abuse of GOTOs, but I can't imagine why this should give a CISC
 machine an advantage.

 -ed-

csg@pyramid.UUCP (05/21/87)

I've always used a vendor's willingness to let me hack on their trade-show
machines as an indicator of how solid that machine was. Some vendors (notably
Symmetric, and recently DEC) actually grab people off the floor and encourage
them to come play with their machine. I also recall bringing up Interleaf on
an Apollo DOMAIN/IX node at a show, and subsequently found myself demoing the
product since I knew more about Interleaf than the sales critters did.... 
(Of course, I explained to the observers that Interleaf was also available on
Sun and MicroVAX workstations.... :-))

I also know of at least one person who checks out all the machines and tries
to break security -- an impromptu version of Gould's Secure UNIX challenge.
The variations from vendor to vendor on the show floor are astonishing.

>If I can crash a system in five minutes doing things that are normal, legal,
>and *NECESSARY* for everyday function, then I know it can't possibly be
>reliable.

At the above Apollo demo, I innocently managed to crash the node. That did not
lower my opinion of the system, though; I would rather they encouraged me to
hack on a system that they openly admitted was a pre-release product, instead
of hiding behind a cloak of secrecy and insisting they had a finished product.
(And the bugs I found were fixed before FCS.)

>I am reminded of the account in Richard Feynman's biography of his exploits
>as an amateur safecracker....

This is not at all different from trade-show booths that "blackball" certain
people, knowing they are competitors or crackers. Yes, they really do this.

<csg>

mash@mips.UUCP (John Mashey) (05/22/87)

In article <7387@boring.mcvax.cwi.nl> jack@boring.UUCP (Jack Jansen) writes:
>In article <396@gumby.UUCP> larry@gumby.UUCP (Larry Weber) writes:
>>...  A page thrasher would be
>>wonderful BUT it is highly dependent on ....

>For me, it would be useful. If I'm looking for a machine to put 
>30 first-year students on, I'm not interested in CPU performance at
>all. I just want the system to run fast with 30 vi/cc/a.out users....

>My favorite benchmark:
>time ex /usr/dict/words <<FOO
>10000,10060d
>w /tmp/words
>FOO
>time diff /usr/dict/words /tmp/words >/dev/null

As Larry says, real page-thrashers are highly dependent on a lot of attributes.
That doesn't mean they're bad tests, merely that they're extremely hard
to do in a controlled way.  In particular, you often see radically different
results according to buffer cache sizes, for example.

Also, on this test, CPU (user + sys) is about 70-85% of real time, i.e.,
I think you would care about CPU performance, since it's as important
as the disks in the timing.  What sort of numbers were you getting on the 3B15s?
-- 
-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: 	{decvax,ucbvax,ihnp4}!decwrl!mips!mash, DDD:  	408-720-1700, x253
USPS: 	MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086

roger@celtics.UUCP (05/26/87)

In article <691@cpocd2.UUCP> howard@cpocd2.UUCP (Howard A. Landman) writes:
>Far too many sales people leave themselves logged in as root so they don't
>ever run into permission problems.  I was almost tempted to do "rm -r /".
>
Why is this unreasonable?  It's THEIR demo... if they don't bolt down
a laser printer they're exhibiting and turn their backs, do you have the
right to steal it because it's "unprotected"?  People running a booth at a
trade show are often (a) technically out of their league, and (b) there
to perform sales-oriented activities, which is their skill.  We often 
cannot afford to have heavy tech types in booths; in fact, it's often
counterproductive.  (I think of the technical marketing person who stood
in our booth a few years ago, and when asked: "Do you have NFS?"  "Do
you have LISP?"  "Do you have MACSYMA?"  "Do you have a version of TeX?"
"Do you run GNU Emacs?"... responded, "NO!  These are our products, just 
look at the list."  Made a lot of friends, she did... and, by the way,
all the requested stuff was either about to be released or being worked
on at customer sites...)

I can understand the temptation to exercise known bugs.  But there's no
reason to interfere with people's livelihood when your test is either
destructive or time-wasting.  If you want to test these things, either make
arrangements to do them at a local office or during slow booth-time, or
check with the booth staff and let them know the possible consequences of
your acts.  The public does need to be protected from genuinely bad
products, but the sort of "I'm gonna trash you - you deserve it because
you haven't fixed an obscure bug or you left your system wide open to me"
games often played by hackers who are in an exhibition hall to exhibit
themselves and not to see and evaluate the products legitimately are
just indefensible.  Those hackers generally show themselves off, all
right, in the most appropriate light.

>And they left one machine dead
>with a panic message on its screen for over 10 minutes before one of the
>sales people noticed me peering at it; his solution was to stand between me
>and the screen!  No *ssholes were required, just bugs!
>
Odds are the salesperson COULDN'T reboot the system.  Given a choice
between my reps knowing how to boot my system and knowing how to prospect,
I'll take the latter any day.  If you're such a big shot as to take pleasure
in bringing their demo system down, bring it up again... if I owned a
grocery store and you knocked down a display, I'd expect you to at least 
offer to pick it up.

>A computer needs to be *RELIABLE*.  You find out how reliable by, among other
>methods, stress testing the system, trying to exercise *ALL* the features,
>not just the ones in the canned demo.  If I can crash a system in five minutes
>doing things that are normal, legal, and *NECESSARY* for everyday function,
>then I know it can't possibly be reliable.  Does this make me malicious?
>
If you're doing it in a public exhibition, yes.  The point of security is 
to protect systems and data THAT ARE REASONABLY AT RISK.  At a show, the
risk is not reasonable; it's imposed by crybabies who have nothing better
to do.  Systems at a trade show are physically secure, in that their owners
control physical access.  If you are granted access, you're a guest, and
should behave like one.  By all means, exercise the systems (within the
time and resource limits given you by the vendor), but if you feel the
urge to destroy, go out and punch a Bo-Bo doll.

-- 
 ///==\\   (No disclaimer - nobody's listening anyway.)
///        Roger B.A. Klorese, CELERITY (Northeast Area)
\\\        40 Speen St., Framingham, MA 01701  +1 617 872-1552
 \\\==//   celtics!roger@seismo.CSS.GOV - seismo!celtics!roger

bzs@bu-cs.UUCP (05/28/87)

Although I have nothing *against* a benchmark suite I still claim that
this is becoming less of an issue when compared against the richness
of the environment. Going full throttle for the flames, I wouldn't
trade my (mere :-) 2MIP Sun3/160 on (beside) my desk for a 10 MIP,
vanilla System V dumb terminal, no network, no job-control
environment, you'd have to pry the Sun out of my dead hands (tho I'd
take a 1 MIP SYSV over a 100MIP VMS system, it's all relative.)

I suspect I'm not alone in my opinion (not the particular systems, but
the idea that the quality of the software is beginning to outweigh
mere speed improvement.)

Think of it this way: I have an IBM 3090/200 with two vector processors
(that's around 40MIPs and Cray-1S on floating point) at my disposal,
trust me, it's faster 'n hell, it's astoundingly fast, but the
software environment is so primitive I rarely log into it (and I
certainly know my way around an IBM system, it's not some fundamental
problem.) Oh, some number crunchers use it, and good for them, but boy
is that crowd getting relatively small (there are plenty of number
crunchers around here who would rather wait for their SUN3 or similar
box as far as I can tell.)

How we gonna measure that? I honestly think beyond some lower bounds
software is getting very important (and besides, they go hand in hand
to a great extent, you don't see too many window-oriented systems on
.5 MIP boxes, then again, the Mac comes close and I'd be happy to argue
its virtues for getting one thru the day, we have those also, blows
the doors off the 3090 on people-performance for many daily tasks.)

And what about things like upgradeability (like my Encore that I can
jack up to around 40 (parallel) MIPs by just adding CPU boards)? I
know one major vendor whose only idea of an "upgrade" is you throw the
'old' $400K box away and buy a new $800K one...swell. Or a coherent
plan to spread the MIPs into the user's offices?

I still think there's a certain air of unreality to this whole "my
iron is bigger than your iron" thing. Oh, it's important, it's just no
longer a sufficient claim, certainly not enough to sell me on a box.

You can say "well, then assuming the two software environments are
equal..."  But they rarely are; often they're disastrously different
between two boxes.

I agree it's a much harder measure, but is that what we're after? The
cheap shots?

	-Barry Shein, Boston University

ps@celerity.UUCP (Pat Shanahan) (05/28/87)

In article <2128@hoptoad.uucp> gnu@hoptoad.uucp (John Gilmore) writes:
>In article <4294@nsc.nsc.com>, grenley@nsc.nsc.com (George Grenley) writes:
>> So, here's the deal.  I invite Mot, Intel, and other interested parties
>> to work with me in defining some sort of realistic benchmark, which we'll
>> run (in public).  I expect to have system level hardware late this year,
>> so if we get started now, we'll have very interesting Xmas presents...
>> May the best CPU win!  

Defining a realistic benchmark set seems a good objective. I do think that
it is important to realize that performance is a multi-dimensional quantity.
The only situation in which it is realistic to say "machine X is twice as
fast as machine Y" is if X and Y have identical architectures. Any benchmark
that produces a single number is likely to be misleading, unless it is
understood to be very specialized.

>...
>I'd like to join in the hoopla, rahrah, etc that has followed this
>suggestion, and make a further one:
>
>	Let's have the bake-off in the trade show at, say, next Winter
>	Usenix.  Probably the actual setup and running of the benchmarks
>	can be done a day or two before the show, so the results can be
>	printed for distribution, and to give the losers time to think
>	up (and print up) good explanations before we descend on them :-).
>...
>-- 
>Copyright 1987 John Gilmore; you may redistribute only if your recipients may.
>(This is an effort to bend Stargate to work with Usenet, not against it.)
>{sun,ptsfa,lll-crg,ihnp4,ucbvax}!hoptoad!gnu	       gnu@ingres.berkeley.edu


I doubt whether I could get anything unusual run on a system that is going
to a trade show during the last few days before the show. The system would
normally be loaded with known demo programs and crated up for
transport. It is also a very busy time for the people involved in the show.

I think it would be better to define the benchmark in advance and encourage
manufacturers to run it on a variety of configurations, rather than just on
what happens to be at one show.

I suggest using existing benchmarks where possible. For example, measure
Fortran loop performance with Livermore loops, rather than writing a new
benchmark.
-- 
	ps
	(Pat Shanahan)
	uucp : {decvax!ucbvax || ihnp4 || philabs}!sdcsvax!celerity!ps
	arpa : sdcsvax!celerity!ps@nosc

rcopm@yabbie.oz (Paul Menon) (05/30/87)

> In-reply-to: mike@hcr.UUCP's message of 20 May 87 22:59:51 GMT
> 
> Although I have nothing *against* a benchmark suite I still claim that
> this is becoming less of an issue when compared against the richness
> of the environment. Going full throttle for the flames, I wouldn't
> trade my (mere :-) 2MIP Sun3/160 on (beside) my desk for a 10 MIP,
> vanilla System V dumb terminal, no network, no job-control
> environment; you'd have to pry the Sun out of my dead hands (tho I'd
> take a 1 MIP SYSV over a 100MIP VMS system, it's all relative.)

  I totally agree!  It all boils down to how productive these machines allow
you to be, AND STILL LET YOU SMILE AT THE END OF THE DAY!  That's my definition
of a friendly user interface.  After all, it is *you*, the programmer or
end user, that is the biggest factor affecting progress.  It doesn't matter
how hefty a machine you have in front of you, if it don't make you smile,
it don't make you work.  If you don't work - it don't work.  Simple.
It seems the trend with computers is that the more powerful they are, the more
moronic they get.  What good is brawn without brains?  Sure they run heaps
faster, but who develops the programs for them?  Surely we can't count on
all-but-extinct languages (FORTRAN, COBOL) to hang around forever running on
state-of-the-art hardware?  Then again, I guess it depends on mentality, eh?
  Apologies if people deem my response to be in the wrong newsgroup, but
some of us would be in the dark if it wasn't for bignoses catching the light.


Paul Menon.

    Dept of Communication & Electronic Engineering,
    Royal Melbourne Institute of Technology,
    124 Latrobe St, Melbourne, 3000, Australia
 
ACSnet: rcopm@yabbie             UUCP: ...!seismo!munnari!yabbie.rmit.oz!rcopm
CSNET:  rcopm@yabbie.rmit.oz     ARPA: rcopm%yabbie.rmit.oz@seismo
BITNET: rcopm%yabbie.rmit.oz@CSNET-RELAY
PHONE:  +61 3 660 2619.