[net.arch] Speed is the one true performance metric

ccplumb@watnot.UUCP (Colin Plumb) (11/02/86)

In article <1903@mmintl.UUCP> franka@mmintl.UUCP (Frank Adams) writes:
>>> Comments? What sorts of metrics are important to the people who read
>>> this newsgroup? What kinds of constraints?  How do you buy machines?
>>> If you buy CPU chips, how do you decide what to pick?
>>
>>The metrics I'm interested in measure speed.  (Basically, I'm hooked
>>on fast machines.)  Other constraints are less interesting because:
>>(1) I will buy the fastest machine I can afford, and (2) in terms of
>>architecture, speed is the bottom line -- all else is just
>>mitigating circumstances.
>
>I must disagree.  Reliability is at least as important as speed.

I must disagree.  The idea is to get as much effective speed out of the
machine as possible.  A machine that is down 50% of the time delivers 1/2
of its operational speed to the user as throughput. Turnaround time (which is
what most people are interested in) will suffer more, under most circumstances.

Still, I'd prefer exclusive use of a big VAX that's down from midnight to noon
to exclusive use of a smaller one that's almost never down.  My only interest
is how fast the machine gets my work done.
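
A back-of-envelope sketch of the arithmetic being appealed to here (the
speeds and uptime fractions below are made-up illustrative numbers, not
anything measured): effective throughput is just raw speed scaled by
availability, but turnaround also depends on how long each outage lasts.

#include <stdio.h>

int main(void)
{
    /* made-up numbers: a big machine twice as fast but down half the
       time, vs. a small machine that is almost never down */
    double big_speed = 2.0,   big_uptime = 0.50;
    double small_speed = 1.0, small_uptime = 0.99;

    printf("big machine effective throughput:   %.2f\n",
           big_speed * big_uptime);
    printf("small machine effective throughput: %.2f\n",
           small_speed * small_uptime);

    /* Turnaround is worse than throughput suggests: a job submitted
       just after the big machine goes down waits out the whole outage
       before it even starts, so mean turnaround grows with the length
       of each outage, not just with the fraction of time lost. */
    return 0;
}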

           -Colin Plumb (ccplumb@watnot.UUCP)

Will someone tell me why everybody puts disclaimers down here?

eugene@nike.uucp (Eugene Miya N.) (11/03/86)

>> Summary: Words about reliability, speed, thruput, etc.

The only problem with speed as the one true performance metric is
Harlan Mills' "Law."  He says that he will take any program and make it
five times faster or use five times less storage (note: but not both).
The problem with speed is that it is frequently interchangeable with storage.
The real trouble comes with problems which have massive storage AND speed
requirements.  Yes, getting your job done fast is important, granted,
but people with foresight understand the costs and tradeoffs associated
with "speed at all costs."

We put disclaimers on the bottoms of some of our postings 1) for humor, like
line eater quotes at the heads of some messages, and 2) because some of us
have been burned [what goes around, comes around].  The latter is quite
serious.  I know enough now NOT to post DEC, Cray, IBM, etc. internal
material even though I've not signed any non-disclosure
agreements.  As Gary Perlman has pointed out, the Net is a great place
to do industrial information gathering (espionage).

--eugene miya
  NASA Ames Research Center

cdshaw@alberta.UUCP (11/03/86)

In article <12142@watnot.UUCP> ccplumb@watnot.UUCP (Colin Plumb) writes:
>In article <1903@mmintl.UUCP> franka@mmintl.UUCP (Frank Adams) writes:
>>>The metrics I'm interested in measure speed.  (Basically, I'm hooked
>>>on fast machines.)  Other constraints are less interesting because:
>>
>>I must disagree.  Reliability is at least as important as speed.
>
>I must disagree.  The idea is to get as much effective speed out of the
>machine as possible.  A machine that is down 50% of the time delivers 1/2
>of its operational speed to the user as throughput. Turnaround time (which is
>what most people are interested in) will suffer more, under most circumstances.
..whatever that paragraph means.

There are a number of things wrong with this attitude. It is completely bogus 
to assume that "50% downtime" means "up from noon to midnite only". 

A lack of reliability implies that you cannot predict your downtime. 50%
downtime could easily mean that your "big VAX", which takes 20 minutes
to boot, is ready to accept logins for only 20 minutes at a stretch before
crashing and spending the next 20 minutes rebooting. Machine crashes are a
stochastic process, and that is no good at all if your probability of
failure is high.
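
A rough simulation of that point (the MTBF figures and the fixed 20-minute
reboot below are assumptions for illustration, not Chris's numbers): with
exponentially distributed time-to-crash, a short mean time between failures
leaves almost no usable uptime, while a long MTBF with the same reboot cost
is barely noticeable.

#include <stdio.h>
#include <stdlib.h>
#include <math.h>

#define REBOOT_MIN 20.0   /* fixed cost to reboot, in minutes */

static double usable_fraction(double mtbf_min)
{
    double up = 0.0, total = 0.0;
    int i;
    for (i = 0; i < 100000; i++) {
        /* draw one up-interval from an exponential distribution */
        double u = (rand() + 1.0) / ((double)RAND_MAX + 2.0);
        double run = -mtbf_min * log(u);
        up += run;
        total += run + REBOOT_MIN;
    }
    return up / total;
}

int main(void)
{
    printf("MTBF  1 min: %4.1f%% usable\n", 100.0 * usable_fraction(1.0));
    printf("MTBF 20 min: %4.1f%% usable\n", 100.0 * usable_fraction(20.0));
    printf("MTBF 12 hrs: %4.1f%% usable\n", 100.0 * usable_fraction(720.0));
    return 0;
}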

Then again, this whole argument is silly. Reliability is in some sense 
orthogonal to the other performance statistics. If your machine crashes
all the time, speed simply doesn't enter into the calculation because
the break it puts in the users' work habits is unacceptable. The probability 
of losing valuable work, computed results, etc. is also too high.
I suppose one could trade off reliability for speed, but most manufacturers
realize that unreliable machines are extremely costly in service time and
annoyance, and so they try to maximize reliability. Unreliable machines are
hard to sell.

>           -Colin Plumb (ccplumb@watnot.UUCP)


Chris Shaw    cdshaw@alberta
University of Alberta
CatchPhrase: Bogus as HELL !

singer@spar.SPAR.SLB.COM (David Singer) (11/04/86)

In article <12142@watnot.UUCP> ccplumb@watnot.UUCP (Colin Plumb) writes:
>In article <1903@mmintl.UUCP> franka@mmintl.UUCP (Frank Adams) writes:
>>>The metrics I'm interested in measure speed.  (Basically, I'm hooked
>>>on fast machines.)  Other constraints are less interesting because:
>>>(1) I will buy the fastest machine I can afford, and (2) in terms of
>>>architecture, speed is the bottom line -- all else is just
>>>mitigating circumstances.
>>
>>I must disagree.  Reliability is at least as important as speed.
>
>I must disagree.  The idea is to get as much effective speed out of the
>machine as possible.  A machine that is down 50% of the time delivers 1/2
>of its operational speed to the user as throughput. Turnaround time (which is
>what most people are interested in) will suffer more, under most circumstances.

I really fail to see how an immediate or rapid answer you can't trust
is of any use at all.

jeffr@sri-spam.istc.sri.com (Jeff Rininger) (11/04/86)

In article <798@spar.SPAR.SLB.COM> singer@spar.UUCP (David Singer) writes:
>I really fail to see how an immediate or rapid answer you can't trust
>is of any use at all.


	In some domains, a rapid, if somewhat uncertain, answer is
	much better than no answer at all.  One example is the
	defensive software for the DARPA "pilot's associate" research.
        I may be able to supply a reference if anyone is interested.

greg@utcsri.UUCP (Gregory Smith) (11/04/86)

In article <798@spar.SPAR.SLB.COM> singer@spar.UUCP (David Singer) writes:
>In article <12142@watnot.UUCP> ccplumb@watnot.UUCP (Colin Plumb) writes:
>>In article <1903@mmintl.UUCP> franka@mmintl.UUCP (Frank Adams) writes:
>>>I must disagree.  Reliability is at least as important as speed.
>>
>>I must disagree.  The idea is to get as much effective speed out of the
>>machine as possible.  A machine that is down 50% of the time delivers 1/2
>>of its operational speed to the user as throughput. Turnaround time (which is
>>what most people are interested in) will suffer more, under most circumstances.
>
>I really fail to see how an immediate or rapid answer you can't trust
>is of any use at all.

This is silly. Broken computers don't give wrong answers. They crash,
or they log soft errors, or they act flaky. It is almost impossible to
imagine a hardware fault that would have no visible effect other than
to make the 'value' (whatever it may be) of the output wrong.

Of course, floating point hardware is a little different, since it
is used only for numerical calculations which are part of the problem
(as opposed to the CPU ALU, which is also used for indexing, etc.).
You can always arrange to run an FPU diagnostic every 5 minutes if this
is an issue.
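
A minimal sketch of the sort of periodic check meant here (the particular
test values and the cron arrangement are my own assumptions): exercise a few
operations with exactly representable answers and complain if the hardware
disagrees.

#include <stdio.h>
#include <math.h>

static int fpu_ok(void)
{
    /* volatile keeps the compiler from folding the arithmetic away */
    volatile double a = 3.0, b = 4.0;

    if (a * b != 12.0)              return 0;
    if (a + b != 7.0)               return 0;
    if (sqrt(a * a + b * b) != 5.0) return 0;   /* 3-4-5 triangle */
    return 1;
}

int main(void)
{
    /* In practice this would be run from cron every few minutes and
       would log or page someone on failure. */
    if (!fpu_ok()) {
        fprintf(stderr, "FPU diagnostic FAILED\n");
        return 1;
    }
    printf("FPU diagnostic passed\n");
    return 0;
}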

There are few things more useless than a computer that executes
instructions correctly 999 times out of 1000....

-- 
----------------------------------------------------------------------
Greg Smith     University of Toronto      UUCP: ..utzoo!utcsri!greg
Have vAX, will hack...

ccplumb@watnot.UUCP (Colin Plumb) (11/04/86)

(in response to my posting)...
>
>I really fail to see how an immediate or rapid answer you can't trust
>is of any use at all.

Hm... you have a point there.  I was thinking of reliability as an up/down
distinction.  You're right that a wrong answer is even worse than no answer.
Still, incorrect answers are rarely hardware faults... they're more often
software problems.

        -Colin Plumb (ccplumb@watnot.UUCP)

rentsch@unc.UUCP (Tim Rentsch) (11/07/86)

In article <3576@utcsri.UUCP> greg@utcsri.UUCP (Gregory Smith) writes:

> There are few things more useless than a computer that executes
> instructions correctly 999 times out of 1000....


This prompted a memory which I could not resist sharing with
netland.  I know net.arch is not the appropriate place, so followups
to /dev/null, and no flames, ok?

(The following is not original, but I do not remember the source.)

	"The code is 99% debugged...."

	one in every hundred statements is WRONG!

cheers,

txr

paul@unisoft.UUCP (Paul Campbell) (11/09/86)

In article <3576@utcsri.UUCP> greg@utcsri.UUCP (Gregory Smith) writes:
>
>This is silly. Broken computers don't give wrong answers. They crash,
>or they log soft errors, or they act flaky. It is almost impossible to
>imagine a hardware fault that would have no visible effect other than
>to make the 'value' (whatever it may be) of the output wrong.
>

	I guess you asked for it ..... here is my war story.  At a previous
job our Burroughs 6700 (remember those? they took up a whole room, about as
fast as a 780) had a problem.  One of the people using the stats packages
checked his work by hand (I guess he didn't trust the machine), and it turned
out that sometimes it was wrong (but always consistently wrong for the same
input data).  It took months to track it down to the MOD instruction, which
gave wrong answers only for some input values.  Of course the engineer didn't
believe us until we could give him really hard proof.  He said something like


	"This is silly. Broken computers don't give wrong answers"    (:-)




		Paul Campbell
		ucbvax!unisoft!paul