[comp.arch] Killer Micros and vectorized code

hamrick@convex1.convex.com (Ed Hamrick) (03/15/90)

Mr. Brooks,

I've greatly enjoyed the articles you've written regarding the performance
of "Killer Micros" relative to larger, more costly machines.  Even though
there are exceptions to any general rule, I agree with much of what you've
been saying, but must disagree with the overall conclusion.  The key
generalizations that I agree with are:

(1) The price/performance ratio of a wide range of applications is better
    on smaller machines than larger machines.  This applies primarily
    to applications dominated by scalar code that aren't amenable to
    vectorization or massive parallelism.  This is particularly applicable
    if applications have a locality of reference that can make effective
    use of high-speed cache.

(2) The price per megabyte of disk storage is better for lower-speed and
    lower-density disk drives.

(3) The price per megabyte of memory is better when memory is slower and
    interleaved less.

Many people will argue with all of these generalizations by citing specific
counter-examples, but I believe reasonable people would agree that these
generalizations have some merit.  I also believe that these generalizations
have been valid only in the past five years, and that there have been times
in the past that the opposite has been true.

The conclusion you've reached, and that I must admit I have been tempted to
reach myself over the past few years, is that "No one will survive the
attack of the killer micros!".  As a number of people have pointed out, there
are many factors counterbalancing the price/performance advantage of
smaller systems.  One key counter-argument is that machines ought to be
judged on price per unit of productivity improvement.  A faster machine
gives people higher productivity: less time is wasted waiting for jobs,
and more design cycles can be performed in a given time.  Anything that
decreases time-to-market or improves
product quality is worth intrinsically more.  This is one of the traditional
justifications for supercomputers.  You noted that a Cray CPU-hour costs
significantly more than people earn per hour, but this doesn't take
into account that companies can significantly improve their time-to-market
and product quality with faster machines, albeit machines that cost more
per unit of useful work.  This may not matter in some application areas
such as computational physics, but a company like Boeing or McDonnell
Douglas can lose billions of dollars if it is six months late getting
new products designed.  There are also significant cost multipliers
involved in producing a better product - for instance, a small increase
in airplane fuel efficiency can win significantly more market share
from your competition.  Some people have noted that some companies
are willing to pay almost anything to get the fastest computers, and this
is one of the underlying economic reasons for this willingness.

Big companies and government labs tend to use this rationale to justify
procuring computers based on single-job performance.  However, when you
visit these facilities, typically large Cray sites, the machines are generally
used as large timesharing facilities.  People are finding that machines that
were procured to run large jobs in hours are instead running small jobs in
days.  Compounding the problem of having 500 users on a supercomputer is
the tendency of these companies and labs to make the use of these machines
"free".  (Just in passing I'd like to note that the direct result of making
CPU time on Crays "free" is that 90% of the CPU cycles get used by 10% of the
users, which can hurt time-to-market and reduce productivity.  Charging for
CPU time causes a vicious feedback loop where fewer users cause higher costs
which in turn cause fewer users, etc.  The Share Scheduler fixes much of this.)

I've felt for some time that there are fundamental reasons that large
computer system makers are still surviving, and in the case of CONVEX, growing
and prospering.  Even though the argument is made that faster machines improve
time-to-market, they are almost always used as timesharing systems, often
giving no better job turn-around time than workstations.  Some companies are
surviving because of the immense base of existing applications.  Some companies
prosper because of good customer service, some by finding vertical market
segments to dominate.  Every company has unique, non-architectural ways of
marketing products that may not have the best price/performance ratio.

However, I believe that there are several key strategic reasons that larger,
centralized/departmentalized computer systems will in the long run prevail
over the killer micros:

(1) A single computer user usually consumes CPU cycles irregularly.  A user
    often will have short periods of intense computer activity, followed by
    long periods of low utilization.  I've analyzed almost a year's worth of
    data from a typical engineering computer system (more than 500,000 data
    samples), and have seen that the number of jobs an individual (or group
    of individuals) runs at a time approximates a Poisson distribution.
    This matches what one would expect intuitively - that even heavily
    loaded systems have some percentage of their CPU cycles that go to the
    null process.  If J is the average number of jobs a person runs at any
    given time, then EXP(-J) is the fraction of wasted CPU cycles on a
    single-user system.  For instance, if someone is performing a task where
    they are running 4 jobs at a time on average (sometimes 6, sometimes 2),
    then the workstation they are using will have EXP(-4) or 2% wasted cycles.
    Similarly, if there is an average of 1 job at a time, there will be 36%
    wasted cycles, and 0.25 jobs results in 78% wasted cycles.  I would
    maintain that the average number of runnable jobs on workstations is less
    than 0.1, resulting in greater than 90% wasted CPU cycles.  This statistical
    character of workloads provides strong economic incentives for people to
    pool their resources and purchase departmentalized/centralized computer
    resources.  A group of 20 people using a single machine will result in
    14% idle CPU time compared with 90% idle CPU time if they use 20
    workstations (assuming each user runs an average of 0.1 jobs at a time).
    This gives a factor of 10 advantage in usable price/performance to the
    centralized/departmentalized machine.  (A short sketch of this
    arithmetic appears just after this list.)

(2) The argument for the centralization/departmentalization of disk resources
    closely parallels the argument for CPU resources.  If each user is given
    dedicated disks on workstations, then significant amounts of total disk
    space and total disk bandwidth go to waste.  There is significant
    economic incentive to centralizing/departmentalizing disk storage for
    this reason, as well as other reasons relating to data security and
    data archiving.

(3) I would maintain that the amount of memory needed by a job is roughly
    proportional to the amount of CPU time needed to run the job.  This is
    a very imprecise correlation, but is true to some degree across a wide
    range of problems.  I would also maintain that if an N-Megabyte program
    takes M seconds to run in N megabytes of physical memory, then it will
    take approximately 6*M seconds to run in N/2 megabytes of physical memory.
    This factor of 6 performance degradation holds true for a wide range of
    large memory application programs.  This gives a strong economic incentive
    for users to centralize/departmentalize their memory, and run large memory
    jobs in series.  For instance, assume two workstation users each have
    64 MBytes of memory and need to run 128 MByte jobs.  Assume these jobs
    take 12 hours apiece when run in 64 MBytes.  If the two workstation users
    put all 128 MBytes of memory on one workstation, and junked the second
    workstation, they could get both jobs done in 4 hours (2 hours per job)
    by running the two jobs in series on the large-memory workstation.  There
    is an additional economic incentive to centralizing memory that comes from
    the statistical nature of memory utilization by a group of users.  Using
    similar arguments to (1) above, you can easily show that a computing
    architecture with centralized/departmentalized high-speed memory is much
    more cost effective than distributing memory across multiple workstations.
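
For concreteness, here is a short sketch of the arithmetic in points (1)
and (3) above.  The 0.1-jobs-per-user figure and the factor-of-6 memory
penalty are the assumptions stated above, not measurements:

    import math

    def idle_fraction(avg_jobs):
        # P(zero runnable jobs) for a Poisson job count with mean avg_jobs,
        # i.e. the fraction of wasted CPU cycles.
        return math.exp(-avg_jobs)

    # Point (1): lone workstations vs. a pooled machine, 0.1 jobs per user.
    print("%.0f%% idle" % (100 * idle_fraction(0.1)))       # ~90%, one user
    print("%.0f%% idle" % (100 * idle_fraction(20 * 0.1)))  # ~14%, 20 users

    # Point (3): two 128 MByte jobs, 12 hours each when thrashing in 64 MBytes.
    hours_per_job = 12.0 / 6       # remove the factor-of-6 penalty -> 2 hours
    print("%.0f hours for both jobs in series" % (2 * hours_per_job))   # 4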

Obviously, there is much more involved in selecting the optimal computing
architecture for a given workload.  Just as I disagree with you that simple
measures of price/performance will predict the success or demise of a product,
many people would probably maintain that my arguments about centralizing
compute/disk/memory resources are also simplistic.  There are many counter-
arguments favoring distributed computing solutions, and many more arguments
favoring centralization.  The main point I wanted to make in this note is
that simple price/performance measures are poor predictors of the long-term
viability of a company's products.  I'm sure that most readers of this
newsgroup could post a long list of companies that had/have excellent
price/performance but that are/will be out of business.

Regards,
Ed Hamrick  (hamrick@convex.com)
Area Systems Engineer
CONVEX Computer Corporation

wsd@cs.brown.edu (Wm. Scott `Spot' Draves) (03/15/90)

In article <100598@convex.convex.com> hamrick@convex1.convex.com (Ed Hamrick) writes:

   ...

   However, I believe that there are several key strategic reasons that larger,
   centralized/departmentalized computer systems will in the long run prevail
   over the killer micros:

   (1) A single computer user usually consumes CPU cycles irregularly.  A user
       often will have short periods of intense computer activity, followed by
       long periods of low utilization.  

       [ personal workstations' CPU's are underutilized 
         compared to centralized CPUs ]

This is true today, but I think it will change.  Some (many?)
applications can be distributed over a network of workstations.  With
the right software this can be nearly transparent to both the person
getting the work done, and to those whose workstations' cycles are
being "borrowed".

   ...

   Regards,
   Ed Hamrick  (hamrick@convex.com)
   Area Systems Engineer
   CONVEX Computer Corporation


Scott Draves			Space... The Final Frontier
wsd@cs.brown.edu
uunet!brunix!wsd
Box 2555 Brown U Prov RI 02912

brooks@maddog.llnl.gov (Eugene Brooks) (03/17/90)

In article <100598@convex.convex.com> hamrick@convex1.convex.com (Ed Hamrick)
writes a long article discussing the problems of memory and disk resource
distribution and low processor utilization in "single user systems."

I hope no one took my articles to imply that I think that
single user systems are a good thing; I agree with Ed's position completely.
I have utilization data for a large population of single user workstations
at LLNL, on the order of 300 workstations, and the data is so compelling
with regard to the "utilization argument" that I have been requested not
to distribute it.  Companies with a large population of workstations
should use the "rup" command to collect similar data, first sitting down
before looking at the results.  You will be completely shocked to see how
low the processor utilization of single user workstations is.  The
small size of the utilization factor completely negates the cost performance
edge of the Killer Micro inside it.  This is not, however, an argument against
the Killer Micros themselves.  It is an argument against single user workstations
that spend almost ALL their time in the kernel idle loop, or the X screen lock
display program as is often the case.
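
A rough sketch of the sort of collection script I have in mind, assuming
the usual BSD-style "rup" output ("host  up ...,  load average: 0.40,
0.39, 0.30"); the host names are placeholders:

    import re, subprocess

    hosts = ["ws01", "ws02", "ws03"]          # your workstation pool here
    loads = []
    for host in hosts:
        out = subprocess.run(["rup", host], capture_output=True,
                             text=True).stdout
        m = re.search(r"load average:\s*([\d.]+)", out)
        if m:
            loads.append(float(m.group(1)))   # 1-minute load average

    if loads:
        print("mean 1-minute load over %d hosts: %.2f"
              % (len(loads), sum(loads) / len(loads)))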

Computers are best utilized as shared resources; your Killer Micros should
be many to a box and sitting in the computer room where the fan noise does
not drive you nuts.  This is where I keep MY Killer Micros.

The sentence I have often used, "No one will survive the attack of the
Killer Micros," is not to be misinterpreted as "No one will survive the
attack of the Killer Single User WorkStations."  The single user workstations
are indeed Killers, but they are essentially wasted computer resources.
Corporate America will eventually catch on to this and switch to
X display stations and efficiently shared computer resources.

To use the "efficient utilization argument" to support the notion that
low volume custom processor architectures might possibly survive the
attack of the Killer Micros is pretty foolish, however.  Ed, would you
care to run the network simulator and Monte Carlo code I posted results
of on the Convex C210, and post the results to this group?  I won't
ruin the surprise by telling you how it is going to come out...

Perhaps we can get the fellows at Alliant to do the same with their new
28 processor Killer Micro powered machine.  That i860 is definitely a
Killer Micro. After we compare single CPU performances, perhaps we could
then run the MIMD parallel versions on the Convex C240 and the Alliant 28
processor Killer Micro powered box.  Yes, there are MIMD parallel versions
of both codes which could probably be made to run on both machines.


         NO ONE WILL SURVIVE THE ATTACK OF THE KILLER MICROS!


brooks@maddog.llnl.gov, brooks@maddog.uucp

jkrueger@dgis.dtic.dla.mil (Jon) (03/19/90)

brooks@maddog.llnl.gov (Eugene Brooks) writes:

>You will be completely shocked to see how
>low the processor utilization of single user workstations is.  The
>small size of the utilization factor completely negates the cost performance
>edge of the Killer Micro inside it.

This is quite correct, and therefore we should stop using personal
automobiles, too.  Instead we should use taxis, car pools, and
other forms of better sharing the same basic hardware.  This will
increase the <10% utilization of most cars.

OK, ob. smiley.  Yes, we like having our own cars, and we like having
our own local source of computation, and we're going to continue
to choose this whenever we have a choice.  It's a fact of life.

The point that you can "switch between processors in milliseconds" is
quite correct, and equally compelling when applied on the other side of
the argument.  When the individual grants use of his processor to
others, more generally when individuals share their processors with
each other, they make use of millisecond-flexible sharing, but retain
control of local resources.

No question about it, you waste a lot of resources by keeping them
isolated and idle.  The point here is that this isn't a technology
decision, it's a policy decision.  The ability for the individual
to have 100% of his local computational power available to him
on demand is a policy widely favored by individuals.  The ability
to get the most computation per dollar is a policy widely favored
by central planners.

No one argues that these policies are in any way compatible.  They both
exist, and each drives a different kind of purchase decision.  Neither
has anything to do with how you build technology.  Both have much to do
with how you buy it, and rather little to do with computer
architecture, at this late date.

-- Jon
-- 
Jonathan Krueger    jkrueger@dtic.dla.mil   uunet!dgis!jkrueger
The Philip Morris Companies, Inc: without question the strongest
and best argument for an anti-flag-waving amendment.

peter@ficc.uu.net (Peter da Silva) (03/20/90)

Even if your MIPS in the workstation are wasted, it might still be worthwhile
to put them there. It all depends on how much the MIPS cost. Certainly if you
have a 20 MIPS processor in a $2000 box, it really doesn't matter that you
only need a 0.5 MIPS processor in a $1800 box... the marginal cost of the
extra 19.5 MIPS is low enough that you might as well get them. If they go
to waste 95% of the time, who cares?
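
Spelling out the arithmetic with those hypothetical prices:

    # Hypothetical prices from the paragraph above.
    marginal_dollars = 2000.0 - 1800.0       # $200 extra for the big CPU
    marginal_mips = 20.0 - 0.5               # 19.5 extra MIPS
    print(marginal_dollars / marginal_mips)  # ~$10 per extra MIPS
    print(1800.0 / 0.5)                      # vs. $3600 per MIPS in the base box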

Too bad you can't run NeWS on it and really benefit from those extra server
CPU cycles...
-- 
 _--_|\  `-_-' Peter da Silva. +1 713 274 5180. <peter@ficc.uu.net>.
/      \  'U`
\_.--._/
      v

bs@linus.UUCP (Robert D. Silverman) (03/20/90)

In article <798@dgis.dtic.dla.mil> jkrueger@dgis.dtic.dla.mil (Jon) writes:
:brooks@maddog.llnl.gov (Eugene Brooks) writes:
:
:>You will be completely shocked to see how
:>low the processor utilization of single user workstations is.  The
:>small size of the utilization factor completely negates the cost performance
:>edge of the Killer Micro inside it.
:
:This is quite correct, and therefore we should stop using personal
:automobiles, too.  Instead we should use taxis, car pools, and
:other forms of better sharing the same basic hardware.  This will
:increase the <10% utilization of most cars.
:
:OK, ob. smiley.  Yes, we like having our own cars, and we like having
:our own local source of computation, and we're going to continue
:to choose this whenever we have a choice.  It's a fact of life.
 
I invite everyone to read the following paper:

Robert Silverman & Sidney Stuart
"A Network Batching System for Parallel Processing"
Software Practice & Experience Vol 19, #12, pp. 1163-1174

We describe a system that allowed us to soak up all the excess
processing time on a SUN network, while not impairing the interactive
use of workstations.

-- 
Bob Silverman
#include <std.disclaimer>
Mitre Corporation, Bedford, MA 01730
"You can lead a horse's ass to knowledge, but you can't make him think"

sysmgr@KING.ENG.UMD.EDU (Doug Mohney) (03/20/90)

In article <52661@lll-winken.LLNL.GOV>, brooks@maddog.llnl.gov (Eugene Brooks)
writes:

>The sentence I have often used, "No one will survive the attack of the
>Killer Micros," is not to be misinterpreted as "No one will survive the
>attack of the Killer Single User WorkStations."  The single user workstations
>are indeed Killers, but they are essentially wasted computer resources.
>Corporate America will eventually catch on to this and switch to
>X display stations and efficiently shared computer resources.

By the time you buy a loaded X-terminal with 4MB of RAM and a large
screen, you might as well pay the $2K extra for a small swappin' disk and a
full-blown CPU. The jury (at least MY jury) is still out on X-terminals.

If shared resources are such wonderful critters, how come multiuser Macs
aren't popular? Or '386es? You could conceivably hang multiple terminals
from a '386 or '486 box, but I haven't heard of people rushing out to do so.

Long live the revolution; I'll figure out what to do with all the MIPS later.
Solbourne is supposed to be coming out with a 40 MIPS/$10K workstation by
the end of the year. Batten the hatches folks; life is going to get more,
not less, interesting....

lamaster@ames.arc.nasa.gov (Hugh LaMaster) (03/20/90)

In article <798@dgis.dtic.dla.mil> jkrueger@dgis.dtic.dla.mil (Jon) writes:
>brooks@maddog.llnl.gov (Eugene Brooks) writes:

<Various arguments deleted.>

Eugene Brooks' first argument in this thread, many months ago, was that
commodity micros are going to become the basic building blocks of *most*
systems; he supplied some rather humorous descriptions of how microprocessor
based systems are now as fast as, or faster than, some of the fastest systems
based on specially designed CPUs: e.g. Cray.

Then, he added a correction stating that he was *not* arguing that desktop
*systems* were going to replace all other systems.

Various polemics followed :-)

****************************************************

The question: what is a more optimal *system*: a network of personal
workstations or fewer, more centralized servers giving individual users
X terminals?

Answer: (Mine, of course): *it depends*.  On the same campus, some
people are better served by one model of computing, some by another.  In
my experience, it depends on a number of factors:

1) Availability on the existing staff of *experienced system administrators*.
If you already have someone, you have more choices.  If you don't, usually
a more centralized system will serve you better, because someone else will
do all the system care and feeding.  Someone who is an expert may be able
to take care of the job with only a little overhead; a novice may get 
consumed by it.

2) Whether or not *time critical* work is done.  Most people believe, and
rightly so, in my experience, that time critical data acquisition and
analysis *cannot* be done reliably on shared resources.  It just doesn't
cut it to say that you are losing $10,000 an hour on an expensive test
because the shared compute resource is saturated.

3) The nature of the job, and what kind of *networking* resources it
demands.  "MIPS", whatever they are, are almost free these days for most
general purpose computing.  But, *systems* are not free.  Systems require
memory, access to data, and may require moving data across various low
bandwidth and expensive channels, like networks.  The cost of the 
processors is a relatively low part of the overall system cost these
days for many systems.  You have to look at the entire picture.  It may 
be cheaper to keep many processors idle most of the time if it means
better *network utilization*, because networks are a more costly resource
than "MIPS" in today's typical computing environment.

4) The cost of coordinating work.  People's time costs money, and the more
distant someone is in an organization, the harder it is to share common
resources.  This isn't "wasteful"; *time is money*.  Anyone's job is to
get results.  The cost of coordinating with a lot of people to use
hardware "efficiently" may be *much* greater than the cost of the "wasted"
hardware.  This is particularly true if computing resources are on the
critical path in any project.  Project management of large projects is
tough.  Why let computer utilization become an issue?  The goal is to
do the most with the least, and overall cost and efficiency are much
more important than the utilization of any one resource, including office
space, computers, etc.   That being said, it usually isn't the issue for 
most numerical simulations, where the speed of the hardware determines 
what is possible, and efficient utilization of scarce resources, like
Cray memory, memory bandwidth, and I/O bandwidth, is a day-to-day task.

********************************************************


So, I don't think there is one answer for what is most cost effective.
It all depends on the job at hand.  That is why you need engineers, to
figure out how to get something done as cheaply as possible.  You don't
need a "policy" to decide that desktop systems, file servers, or
supercomputers are best: you need to study the job at hand and figure out
how to do it best.  Whoops: back to work :-)


********************************************************
********************************************************


Speaking of architectural issues, how is the BBN TC 2000 working out?
It should be a perfect example of Killer Micros in action.  But,
I was rather surprised that the TC 2000 Butterfly switch is only 8 bits (!)
wide and only supports a maximum memory bandwidth of 2.4 GBytes/sec
for a 63 processor system.  A Cray Y-MP has about 40 GBytes/sec of total
memory bandwidth, for reference.
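
For scale, the per-processor arithmetic those figures imply (a rough
sketch; the Y-MP CPU count of 8 is my assumption, not from the spec):

    tc2000_per_cpu = 2.4e9 / 63              # ~38 MB/s per processor
    ymp_per_cpu = 40e9 / 8                   # ~5 GB/s per CPU
    print("TC 2000: %.0f MB/s per CPU" % (tc2000_per_cpu / 1e6))
    print("Y-MP:    %.1f GB/s per CPU" % (ymp_per_cpu / 1e9))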

 

  Hugh LaMaster, M/S 233-9,  UUCP ames!lamaster
  NASA Ames Research Center  ARPA lamaster@ames.arc.nasa.gov
  Moffett Field, CA 94035     
  Phone:  (415)604-6117       

philip@Kermit.Stanford.EDU (Philip Machanick) (03/20/90)

In article <00933EBB.E972FCA0@KING.ENG.UMD.EDU>, sysmgr@KING.ENG.UMD.EDU
(Doug Mohney) writes:

> If shared resources are such wonderful critters, how come multiuser Macs
> aren't popular? Or '386es? You could conceivably hang multiple terminals
> from a '386 or '486 box, but I haven't heard of people rushing out to do so.

Predictable response time...This is also (one of the reasons, anyway) why
Apple does not support pre-emptive multi-tasking. I'm using a 16Mbyte
DECstation 3100 and despite the faster processor, it doesn't compare with
a 68030 Mac on user interface responsiveness. And the DECstation is hardly ever
used by other users. Moral of the story? A multi-tasking OS with virtual memory
etc. has its price. Of course, if you aren't doing much "interactive" stuff
(e.g., large-scale compiles or number crunching), the trade-offs are
different. I would go with a Mac as a user interface engine (scrap the
X-terminal idea), with a networked high-speed machine (or machines) to do the
number crunching, large-scale file system, database etc.

Philip Machanick
philip@pescadero.stanford.edu

hamrick@convex1.convex.com (Ed Hamrick) (03/20/90)

Mr. Brooks,

I read your recent article regarding killer micros with great interest.
I'd like to comment on a few of the points you made below:

In article <52661@lll-winken.LLNL.GOV> brooks@maddog.llnl.gov (Eugene Brooks) writes:
> Computers are best utilized as shared resources, your Killer Micros should
> be many to a box and sitting in the computer room where the fan noise does
> not drive you nuts.  This is where I keep MY Killer Micros.

I received a lot of mail regarding this very point, and you were one of the
few people who agreed with me.  I'd like to qualify this point by saying that
too much centralization is inefficient also.  A good rule of thumb is to
centralize to the point where 50% to 80% of the compute cycles are used.
Sharing at the departmental level also alleviates many of the problems of
corporate-wide centralization.
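
Plugging that rule of thumb back into the Poisson model from my earlier
note (still assuming roughly 0.1 runnable jobs per user, which is an
estimate rather than a measurement), the pool sizes work out to:

    import math

    jobs_per_user = 0.1
    for target in (0.50, 0.80):
        # utilization = 1 - EXP(-users * jobs_per_user); solve for users
        users = -math.log(1.0 - target) / jobs_per_user
        print("%.0f%% busy needs about %.0f users" % (100 * target, users))
    # ~7 users for 50% utilization, ~16 users for 80%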

A much more interesting subject is the one you raise below:

In article <52661@lll-winken.LLNL.GOV> brooks@maddog.llnl.gov (Eugene Brooks) writes:
> To use the "efficient utilization argument" to support the notion that
> low volume custom processor architectures might possibly survive the
>attack of the Killer Micros is pretty foolish, however.  Ed, would you
> care to run the network simulator and Monte Carlo code I posted results
> of on the Convex C210, and post the results to this group?  I won't
> ruin the surprise by telling you how it is going to come out...

I'd be happy to run these programs on a C210.  I think you'd find that
the C210 does much better than the 25 MHz clock would otherwise lead
you to predict.  However, most of CONVEX's customers purchase our
machines for more than the excellent scalar performance - a large
number of important scientific and engineering applications require
high speed vector performance along with large memory, 2 GByte virtual
address space, and high-speed I/O.

It would be interesting to see the performance of these scalar codes
on various architectures, relative to the clock speed of the machines
implementing these architectures, especially the Cray numbers.

The cost of processors is a very small part of the total cost of a
departmental compute server.  How much do you think Alliant pays for
the 8 i860 chips in their low-end $500K product?  The design of the
memory system is the dominant factor in system performance and system
cost for departmental supercomputers.

There is no question that all computer vendors will some day implement their
particular architectures in a small number of chips.  The only question
is when.  Making this decision too early might cause you to make premature
architectural trade-offs in order to reduce the number of gates needed
for today's chips.  For example, the i860 uses reciprocal approximation
for the divide and square root functions.  If space for more gates had
been available, the i860 might have been implemented differently.

> Perhaps we can get the fellows at Alliant to do the same with their new
> 28 processor Killer Micro powered machine.  That i860 is definitely a
> Killer Micro. After we compare single CPU performances, perhaps we could
> then run the MIMD parallel versions on the Convex C240 and the Alliant 28
> processor Killer Micro powered box.  Yes, there are MIMD parallel versions
> of both codes which could probably be made to run on both machines.

If you have a chance, ask the Alliant people what their Linpack 100x100
performance is, and see how well it scales up to 28 processors.  Try to
get real runs, not estimates.  I'd also be curious about main memory
bandwidth (not crossbar bandwidth).  Information like number of banks,
number of bytes read per bank access, and bank cycle time would be
particularly interesting.  It would also be useful to run the MIMD versions
of your codes on both the Alliant and the C240, and compare the parallel
speed-ups.  It would also be revealing to run MIMD scalar codes (and vector
codes) that have a low cache hit rate on both the Alliant and CONVEX.

As an aside, I was curious why you were asked not to release information
about the low utilization of the 300 workstations you mentioned.  I can't think
of any reason Livermore wouldn't want this information publicly available,
since this is likely to be true of any organization using large numbers of
single user workstations.  It would do a great service to people considering
lots of single-user killer micros to have this data publicly available.

Regards,
Ed Hamrick

davidsen@crdos1.crd.ge.COM (Wm E Davidsen Jr) (03/20/90)

In article <00933EBB.E972FCA0@KING.ENG.UMD.EDU> sysmgr@KING.ENG.UMD.EDU (Doug Mohney) writes:

| If shared resources are such wonderful critters, how come multiuser Macs
| aren't popular? Or '386es? You could conceivably hang multiple terminals
| from a '386 or '486 box, but I haven't heard of people rushing out to do so.

  You haven't been listening. A 386 box is about 3x the original VAX,
and will happily support 8 users with the response you would like, or 32
with response slightly better than the old VAX did under that load.
There are MANY of these systems sitting in offices running Xenix and
supporting 4-8 users.

  Because they're so cheap people usually buy another rather than load
them to death, but a 386 will do reasonably well even with a load average
up around six, providing you have enough memory.
-- 
bill davidsen	(davidsen@crdos1.crd.GE.COM -or- uunet!crdgw1!crdos1!davidsen)
            "Stupidity, like virtue, is its own reward" -me

peter@ficc.uu.net (Peter da Silva) (03/20/90)

> Predictable response time...This is also (one of the reasons, anyway) why
> Apple does not support pre-emptive multi-tasking.

Pre-emptive multitasking has nothing to do with it. The Amiga O/S uses
pre-emptive multitasking, and I'll put it up for speed and efficiency against
that Mac's system software any day. The first time I used a Mac II with a color
screen it was distinctly less peppy than my Amiga 1000 with a 7.16 MHz 68000.

Well, this was a predictable response. :->
-- 
 _--_|\  `-_-' Peter da Silva. +1 713 274 5180. <peter@ficc.uu.net>.
/      \  'U`
\_.--._/
      v

ktl@wag240.caltech.edu (Kian-Tat Lim) (03/21/90)

In article <100701@convex.convex.com>, hamrick@convex1 (Ed Hamrick) writes
[in reference to the Alliant FX/2800]:
>If you have a chance, ask the Alliant people what their Linpack 100x100
>performance is, and see how well it scales up to 28 processors.  Try to
>get real runs, not estimates.  I'd also be curious about main memory
>bandwidth (not crossbar bandwidth).  Information like number of banks,
>number of bytes read per bank access, and bank cycle time would be
>particularly interesting.

From publicly-available Alliant literature:

	MEMORY SYSTEM
	Cache size 512KB per module, 4MB max
	Processor to Cache Bandwidth: 1.28GB/sec [through the crossbar]
	Maximum Physical Memory: 1GB
	Interleaving: 16-way on a single board
	Memory Bus Bandwidth: 640MB/sec

I believe that Alliant has run 100x100 Linpack on a 28 processor
system, but I'm not sure if that figure has been made public.  It's
probably obvious that it won't be 28 times the raw i860 number (11
MFLOPS).
-- 
Kian-Tat Lim (ktl@wagvax.caltech.edu, KTL @ CITCHEM.BITNET, GEnie: K.LIM1)
Perl is the Swiss Army chainsaw [of Unix programming]. -- Dave Platt's friend

daveh@cbmvax.commodore.com (Dave Haynie) (03/21/90)

In article <1990Mar19.234839.13829@Neon.Stanford.EDU> philip@pescadero.stanford.edu writes:

>Predictable response time...This is also (one of the reasons, anyway) why
>Apple does not support pre-emptive multi-tasking. 

Pre-emptive multitasking has nothing at all to do with predictable response time.

>I'm using a 16Mbyte DECstation 3100 and despite the faster processor, it doesn't compare 
>with a 68030 Mac on user interface responsiveness. And the DECstation is hardly ever
>used by other users. 

The way UNIX implements its multitasking has everything to do with the
unpredictable response time you get on UNIX workstations.  Same reason the
NeXT box has a "jumpy" feel to the user. 

I use two non-UNIX systems with pre-emptive multitasking -- Apollos (under
Aegis, or DomainOS, or whatever they call it these days) and Amigas.  Both
of these systems, especially the Amiga, are extremely responsive.  In fact,
moreso than the Mac.  For example, on the Amiga, the main things governing
user-interaction, such as mouse and keyboard response, are interrupt driven
and managed by a high priority task.  The user interface also runs at a
higher priority than the average user task.  So when you start that 64k x
64k spreadsheet to recalculating, you don't have the mouse drop dead, and
you can still move windows around. 

What makes the difference is real time response, an operating systems
issue, but not the same thing as pre-emptive multitasking. 

>Moral of the story? A multi-tasking OS with virtual memory etc. has its price. 

The real moral of the story is that operating systems originally designed
for multi-user operation with users hooked in via serial line text
terminals may not provide the best feel when adapted for use as the
operating system for GUI based, single-user workstations.  At least not
without a great deal of rethinking, which apparently hasn't yet been
completed by most of the folks building these systems. 

>Philip Machanick
>philip@pescadero.stanford.edu


-- 
Dave Haynie Commodore-Amiga (Systems Engineering) "The Crew That Never Rests"
   {uunet|pyramid|rutgers}!cbmvax!daveh      PLINK: hazy     BIX: hazy
                    Too much of everything is just enough

rogerk@mips.COM (Roger B.A. Klorese) (03/21/90)

In article <798@dgis.dtic.dla.mil> jkrueger@dgis.dtic.dla.mil (Jon) writes:
>This is quite correct, and therefore we should stop using personal
>automobiles, too.  Instead we should use taxis, car pools, and
>other forms of better sharing the same basic hardware.  This will
>increase the <10% utilization of most cars.

If you follow the transportation debate, you will find that there are
many voices agreeing with your strawman.  The difference is that, unlike
in the computing world, the networking, connectivity and flexibility of
mass transportation are unsatisfactory in most areas.
-- 
ROGER B.A. KLORESE      MIPS Computer Systems, Inc.      phone: +1 408 720-2939
MS 4-02    928 E. Arques Ave.  Sunnyvale, CA  94086             rogerk@mips.COM
{ames,decwrl,pyramid}!mips!rogerk                                 "I'm the NLA"
"Two guys, one cart, fresh pasta... *you* figure it out." -- Suzanne Sugarbaker

ktl@wag240.caltech.edu (Kian-Tat Lim) (03/21/90)

In article <14357@cit-vax.Caltech.Edu>, ktl@wag240 (Kian-Tat Lim) writes:
>I believe that Alliant has run 100x100 Linpack on a 28 processor
>system, but I'm not sure if that figure has been made public.  It's
>probably obvious that it won't be 28 times the raw i860 number (11
>MFLOPS).

I've been told that the numbers are public:

Alliant FX/2808 (8 processors, 4 in one cluster):
LINPACK DP 100x100:	 20
	 1000x1000:	220

Alliant FX/2828 (28 processors, 14 in one cluster):
LINPACK DP 100x100:	 42
	 1000x1000:	720
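
For what it's worth, the scaling arithmetic implied by those numbers and
the raw 11 MFLOPS i860 figure from my earlier post (reported figures,
not my own runs):

    single = 11.0                            # one i860, 100x100 DP, MFLOPS
    for procs, mflops in ((8, 20.0), (28, 42.0)):
        speedup = mflops / single
        print("%d procs: %.1fx speedup, %.0f%% parallel efficiency"
              % (procs, speedup, 100 * speedup / procs))
    # 8 procs: 1.8x (23%); 28 procs: 3.8x (14%) on 100x100.  The
    # 1000x1000 numbers (220 -> 720) hold roughly 26 MFLOPS per processor.
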
-- 
Kian-Tat Lim (ktl@wagvax.caltech.edu, KTL @ CITCHEM.BITNET, GEnie: K.LIM1)
Perl is the Swiss Army chainsaw [of Unix programming]. -- Dave Platt's friend

sysmgr@KING.ENG.UMD.EDU (Doug Mohney) (03/21/90)

In article <2168@crdos1.crd.ge.COM>, davidsen@crdos1.crd.ge.COM (Wm E Davidsen Jr) writes:
>  .............. A 386 box is about 3x the original VAX,
>and will happily support 8 users with the response you would like, or 32
>with response slightly better than the old VAX did under that load.
>There are MANY of these systems sitting in offices running Xenix and
>supporting 4-8 users.
>
Sure. There are many more people who don't run Xenix and are running
Novell with Ethernet or Token Ring. Or even Banyan Vines, for that matter.

			Doug
 

cik@l.cc.purdue.edu (Herman Rubin) (03/21/90)

In article <37193@mips.mips.COM>, rogerk@mips.COM (Roger B.A. Klorese) writes:
> In article <798@dgis.dtic.dla.mil> jkrueger@dgis.dtic.dla.mil (Jon) writes:
> >This is quite correct, and therefore we should stop using personal
> >automobiles, too.  Instead we should use taxis, car pools, and
> >other forms of better sharing the same basic hardware.  This will
> >increase the <10% utilization of most cars.
 
> If you follow the transportation debate, you will find that there are
> many voices agreeing with your strawman.  The difference is that, unlike
> in the computing world, the networking, connectivity and flexibility of
> mass transportation are unsatisfactory in most areas.

I have opposed sharing in the transportation debate, and I oppose it here.

In the computing world, the networking, connectivity and flexibility of
sharing non-specific resources are unsatisfactory in most areas.  Other than
such things as text files in ASCII, nothing is easily shared unless the same
machine, or at best the same type of machine, is used, and it may even be
necessary to use the same language.  Even different compilers for the same
language can give problems.  A Maserati and a Yugo are more similar than
two different computers are.
-- 
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907
Phone: (317)494-6054
hrubin@l.cc.purdue.edu (Internet, bitnet, UUCP)

jesup@cbmvax.commodore.com (Randell Jesup) (03/21/90)

In article <1990Mar19.234839.13829@Neon.Stanford.EDU> philip@pescadero.stanford.edu writes:
>In article <00933EBB.E972FCA0@KING.ENG.UMD.EDU>, sysmgr@KING.ENG.UMD.EDU
>(Doug Mohney) writes:
>
>> If shared resources are such wonderful critters, how come multiuser Macs
>> aren't popular? Or '386es? You could conceivably hang multiple terminals
>> from a '386 or '486 box, but I haven't heard of people rushing out to do so.
>
>Predictable response time...This is also (one of the reasons, anyway) why
>Apple does not support pre-emptive multi-tasking. I'm using a 16Mbyte
>DECstation 3100 and despite the faster processor, it doesn't compare with
>a 68030 Mac on user interface reponsiveness. And the DECstation is hardly ever
>used by other users. Moral of the story? A multi-tasking OS with virtual memory
>etc. has its price.

	You're arguing a deficiency of most Unixes, not of multi-tasking
per se.  A good counter-example is the Amiga - it has preemptive multitasking
but provides excellent response time even on a lowly 68000.  Most Unixes are
not optimized for user response time, their schedulers just weren't designed
with that as a major consideration.  On an Amiga, the highest-priority task
gets 100% of available cycles, or round-robins with tasks of the same
priority, on a many-times-per-second basis.  Combined with interrupt and
DMA driven IO, this produces very fast user response times.  Light-weight
tasks (faster task-switching) help here also.
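
	As a toy sketch of that pick-next rule (highest priority wins,
round-robin among equals) - purely illustrative, not the actual Exec
scheduler:

    from collections import deque

    class ToyScheduler:
        def __init__(self):
            self.ready = {}                  # priority -> FIFO of task names

        def add(self, name, priority):
            self.ready.setdefault(priority, deque()).append(name)

        def pick_next(self):
            if not self.ready:
                return None                  # nothing ready: idle loop
            top = max(self.ready)            # highest-priority queue wins
            queue = self.ready[top]
            task = queue.popleft()
            queue.append(task)               # round-robin among equals
            return task

    s = ToyScheduler()
    s.add("input handler", 20)               # mouse/keyboard, high priority
    s.add("spreadsheet recalc", 0)
    print(s.pick_next())                     # "input handler" while it's ready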

	I suspect the main reason Apple hasn't gone preemptive is that their
system was designed so that preemption would be a massive problem, at best.
All those "low-memory-globals", etc that programs modify  would cause major
havoc to support, or require massive changes of the "rules", making most
applications that had been written correctly become "broken".

	Those of us in the micro market often have to bow and scrape to
the Great God of Compatibility.  :-(  We here at Commodore have been stuck
with our own early design decisions in some cases.

	There are other ways to improve user response time, most of them
"classical".  Stratus VOS (last I looked) bumped the priority of a task that
just got input from a user temporarily.  This improves the "feel" of 
responsiveness.

-- 
Randell Jesup, Keeper of AmigaDos, Commodore Engineering.
{uunet|rutgers}!cbmvax!jesup, jesup@cbmvax.cbm.commodore.com  BIX: rjesup  
Common phrase heard at Amiga Devcon '89: "It's in there!"

brooks@maddog.llnl.gov (Eugene Brooks) (03/21/90)

In article <100701@convex.convex.com> hamrick@convex1.convex.com (Ed Hamrick) writes:
>I'd be happy to run these programs on a C210.  I think you'd find that
>the C210 does much better than the 25 MHz clock would otherwise lead
>you to predict.

A friendly fellow on the Internet has taken care of this for you.
I won't use his name to protect the innocent!

The score for the network simulator SIM was 31% of IBM 530 performance.
The score for the Monte Carlo was 46% of IBM 530 performance.
The Convex C2 looks pretty good relative to the XMP, given the price,
but its performance pales against any Killer Micro.

Both programs were compiled with -O2.  The clock speed of the 530
is the same as that of the C210; I would say that the IBM is doing
something nice.  The Convex compilers are nothing to sneeze at.

brooks@maddog.llnl.gov, brooks@maddog.uucp