[comp.arch] LINPACK 1000x1000 MFLOPS per $$$

mccalpin@pereland.cms.udel.edu (John D. McCalpin) (07/19/90)

Stardent has recently been running an advertisement in Supercomputing
Review which uses the measure of LINPACK 1000x1000 MFLOPS per Dollar
to evaluate several computers --- specifically the IBM 3090/180VF, the
Alliant FX/80, the SGI 4D/240, and the Stardent 3040.

Just for the hell of it, I decided to put the IBM Power Station 320
and the Cray Y/MP on the chart, and have reproduced the expanded chart
below.

	Machine			MFLOPS		Price		Ratio
	--------------------------------------------------------------
   ****	IBM Power Station 320	13.26		  13,000	 37.0 ****
	Stardent 3040		77		 162,500	 17.0
	SGI 4D/240		17		 158,000	  3.9
	Alliant FX/80		69		 650,000	  3.8
   ****	Cray Y/MP-8	      2144	      25,000,000	  3.1 ****
   **** Cray Y/MP	       300	       4,000,000	  2.7 ****
	IBM 3090/180VF		92	       3,300,000	  1.0
	--------------------------------------------------------------
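
For reference, the Ratio column works out to LINPACK MFLOPS per dollar,
normalized so that the IBM 3090/180VF is 1.0.  A throwaway C check of
the arithmetic (the IBM 320 entry comes out nearer 36.6 than 37.0,
presumably just rounding):

#include <stdio.h>

/* Reproduce the "Ratio" column: LINPACK 1000x1000 MFLOPS per dollar,
 * normalized to the IBM 3090/180VF.  Figures are from the chart above. */
int main(void)
{
    struct { const char *name; double mflops, price; } m[] = {
        { "IBM Power Station 320",   13.26,    13000.0 },
        { "Stardent 3040",           77.0,    162500.0 },
        { "SGI 4D/240",              17.0,    158000.0 },
        { "Alliant FX/80",           69.0,    650000.0 },
        { "Cray Y/MP-8",           2144.0,  25000000.0 },
        { "Cray Y/MP",              300.0,   4000000.0 },
        { "IBM 3090/180VF",          92.0,   3300000.0 },
    };
    double base = 92.0 / 3300000.0;   /* MFLOPS per dollar, 3090/180VF */
    int i;

    for (i = 0; i < 7; i++)
        printf("%-22s %5.1f\n", m[i].name,
               (m[i].mflops / m[i].price) / base);
    return 0;
}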

Those of you who are long-term readers of this group will know that I
have never been too supportive of Eugene Brooks' "Attack of the Killer
Micros" thesis.  I am now.

For fully vectorizable application codes I can get 1/25 of a Cray Y/MP
for under $10,000 (with University discounts).  This is equivalent to
one Cray Y/MP hour/calendar day, or 30 Cray hours/month, or 360 Cray
hours/year.  I don't believe that I can get allocations that large at
the national supercomputing centers, and if I did, then having the
calculations done locally would still be an advantage.
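
The arithmetic behind that equivalence, as a trivial C check (the 1/25
figure is explained in note 3 below; rounding 0.96 up to one hour/day
gives the 30/month and 360/year figures):

#include <stdio.h>

/* A dedicated machine at 1/25 the speed of one Y/MP cpu, running
 * 24 hours a day, delivers just under one Cray-hour of equivalent
 * work per calendar day.                                           */
int main(void)
{
    double per_day = 24.0 / 25.0;              /* 0.96 Cray-hours   */

    printf("%.2f Cray-hrs/day, %.0f/month, %.0f/year\n",
           per_day, per_day * 30.0, per_day * 365.0);
    return 0;
}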

Notes:

(1) The 13.26 MFLOPS on the IBM 320 was using an 8-column block-mode
solver written by Earl Killian at MIPS (earl@mips.com).  The standard
version of LINPACK with unrolled BLAS runs at 8.4 MFLOPS.
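
For the curious: the win of a block-mode solver comes mostly from
memory traffic.  Updating the trailing matrix eight pivot columns at a
time means each matrix element is read and written once per block
rather than once per column, and the eight multipliers can stay in
registers.  The sketch below shows the shape of such an update in C
(column-major, as in the Fortran original); it is not the MIPS code,
just an illustration of the blocking idea:

/* Given the eight pivot columns L (n x 8) and their multipliers
 * M (8 x m), update the trailing block A (n x m):  A := A - L*M.     */

/* eight separate rank-1 sweeps (the unrolled-DAXPY style): all of A
 * is read and written once per pivot column, eight times in total    */
void update_rank1_x8(int n, int m, double *a,
                     const double *l, const double *mult)
{
    int i, j, k;
    for (k = 0; k < 8; k++)
        for (j = 0; j < m; j++) {
            double t = mult[k + j*8];
            for (i = 0; i < n; i++)
                a[i + j*n] -= t * l[i + k*n];
        }
}

/* the same arithmetic fused into one rank-8 update: A is read and
 * written only once, and the eight multipliers stay in registers     */
void update_rank8(int n, int m, double *a,
                  const double *l, const double *mult)
{
    int i, j, k;
    for (j = 0; j < m; j++) {
        double t[8], s;
        for (k = 0; k < 8; k++)
            t[k] = mult[k + j*8];
        for (i = 0; i < n; i++) {
            s = a[i + j*n];
            for (k = 0; k < 8; k++)
                s -= t[k] * l[i + k*n];
            a[i + j*n] = s;
        }
    }
}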

(2) The $13,000 configuration includes no monitor or graphics adapter,
etc.  It is strictly a server, configured with 16 MB RAM and 120 MB
disk.  NFS is used to store results directly onto my graphics
workstation.  IBM's prices for memory are a bit steep --- almost
$600/MB (list) --- but several 3rd-parties are already at work on
cloning the memory boards, which should drop the price to well under
$200/MB.  The machine can be configured with up to 32 MB using 1 Mbit
technology and 128 MB using 4 Mbit technology.

(3) The figure of 1/25 of a Cray comes from the average of the
performance of two fully vectorizable three-dimensional ocean
circulation models that I have run on the Cray and the IBM 320.  Both
models run at speeds in excess of 120 MFLOPS on one cpu of the Cray.
One code runs at about 1/30 of the Cray and the other at 1/20.
A less vectorizable two-dimensional ocean circulation model runs at
1/3 of the Cray's performance level!!!
--
John D. McCalpin			mccalpin@perelandra.cms.udel.edu
Assistant Professor			mccalpin@vax1.udel.edu
College of Marine Studies, U. Del.	J.MCCALPIN/OMNET

bloepfe@ethz.UUCP (Bruno Loepfe) (07/19/90)

In article <MCCALPIN.90Jul18175935@pereland.cms.udel.edu> mccalpin@pereland.cms.udel.edu (John D. McCalpin) writes:
>Stardent has recently been running an advertisement in Supercomputing
>Review which uses the measure of LINPACK 1000x1000 MFLOPS per Dollar
>to evaluate several computers --- specifically the IBM 3090/180VF, the
>Alliant FX/80, the SGI 4D/240, and the Stardent 3040.
>
> [chart deleted]
>
>Those of you who are long-term readers of this group will know that I
>have never been too supportive of Eugene Brooks "Attack of the Killer
>Micros" thesis.  I am now.
>

If you do this kind of calculation, it turns out that a BICYCLE gives you
the most mph per $$$.  How come, then, that (almost) everybody drives a
CAR?  This fact makes me think that some other arguments have to be
considered as well, i.e. it's probably not feasible to watch the "real world"
only through a filter of "$$$ per <something>", where <something> is your
favorite attribute of a computer (or anything else)...

-------------------------------------------------------------------------------
Bruno Loepfe                     	u36@czheth5a.bitnet
Computing Center                 	loepfe@rz.id.ethz.ch
Federal Institute of Technology  	bloepfe@ethz.uucp
Zuerich, Switzerland		 	..!uunet!mcsun!ethz!bloepfe (UUCP)

mccalpin@pereland.cms.udel.edu (John D. McCalpin) (07/19/90)

In article <> mccalpin@perelandra.cms.udel.edu I wrote about

>[using] the measure of LINPACK 1000x1000 MFLOPS per Dollar
>to evaluate several computers --- specifically the IBM 3090/180VF, the
>Alliant FX/80, the SGI 4D/240, and the Stardent 3040.

In article <5180@ethz.UUCP> bloepfe@ethz.uucp (Bruno Loepfe) replies:

>If you do this kind of calculation, it turns out that a BICYCLE gives you
>the most mph per $$$.  How come, then, that (almost) everybody drives a
>CAR?  This fact makes me think that some other arguments have to be
>considered as well, i.e. it's probably not feasible to watch the "real world"
>only through a filter of "$$$ per <something>", where <something> is your
>favorite attribute of a computer (or anything else)...
>Bruno Loepfe                     	u36@czheth5a.bitnet

Apparently I did not make myself clear.  I am certainly taking more
into account than simple price/performance, as I stated in the
previous posting; that has been the main point of my earlier
postings on the subject of "Killer Micros".

What is novel about this machine is that now:
	(1) I can get the machine for under $10,000
	(2) I can run my *fully vectorizable* application codes
	    at speeds that give me the equivalent of more Cray
	    time than I can reasonably expect to have access to.

So on applications for which the Cray is *most efficient*, the IBM
machine provides an effective throughput (measured in calculations per
calendar month, or some similar units) that exceeds what I can get
access to on a Cray.

I suspect that most of us also have computer allocations which are
equivalent to less than 30 Cray Y/MP hours per month.

This in no way diminishes the usefulness of supercomputers.  I can
still get 1500 MFLOPS performance levels on one of my codes on an
8-cpu Y/MP.  What it does shift is the *length* of the jobs for which
the faster machine is required.  Since I work on projects with annual
sorts of time scales, and am willing to run a calculation for 6 months
or so, the Cray is only going to be required if I need more than 180
Cray hours in a 6-month period.  

There are a number of "Grand Challenge" sorts of projects that require
that sort of investment in time, but the dividing line between the
projects that can be done in my office and those that must be done at
a remote supercomputer site is shifting rapidly toward only the
largest of projects.

--
John D. McCalpin			mccalpin@perelandra.cms.udel.edu
Assistant Professor			mccalpin@vax1.udel.edu
College of Marine Studies, U. Del.	J.MCCALPIN/OMNET

tif@doorstop.austin.ibm.com (Paul Chamberlain) (07/20/90)

In article <5180@ethz.UUCP> bloepfe@bernina.ethz.ch.UUCP (Bruno Loepfe) writes:
>In article <MCCALPIN.90Jul18175935@pereland.cms.udel.edu> mccalpin@pereland.cms.udel.edu (John D. McCalpin) writes:
>>Stardent has recently been running an advertisement in Supercomputing
>>Review which uses the measure of LINPACK 1000x1000 MFLOPS per Dollar ...
>... it's probably not feasible to watch the "real world"
>through a filter "$$$ per <something>" only, where <something> is your
>favorite attribute of a computer (or anything else)...

For this MFLOPS per Dollar to be useful you obviously have to begin
by eliminating those with less MFLOPS than you HAVE to have, and
by eliminating those with more DOLLARS than you can afford.  My
problem is that at that stage, I end up with the null set.

Paul Chamberlain | I do NOT represent IBM         tif@doorstop, sc30661@ausvm6
512/838-7008     | ...!cs.utexas.edu!ibmaus!auschs!doorstop.austin.ibm.com!tif

davidsen@crdos1.crd.ge.COM (Wm E Davidsen Jr) (07/20/90)

In article <MCCALPIN.90Jul18175935@pereland.cms.udel.edu> mccalpin@pereland.cms.udel.edu (John D. McCalpin) writes:

| (2) The $13,000 configuration includes no monitor or graphics adapter,
| etc.  It is strictly a server, configured with 16 MB RAM and 120 MB
| disk.  NFS is used to store results directly onto my graphics
| workstation.  

  You have defined the solution by picking the dataset... You are
talking about a tiny problem here, not at all typical of what is run on
a Cray. Certainly there are problems requiring lots of CPU and tiny
memory, and it's nice that you have one. Workstations are good at that.
We run dedicated troff servers here, and they're workstations, too.

  If you define the dataset to be typical Cray size, say 500MB, the
workstation becomes impractical. And if you assume non-vectorizable very
large problems, the Cray-2 has the edge in scalar speed.

  This is a lot like saying that you want to haul a bag of groceries at
100mph, and therefore sports cars are killing trucks. You have a sports
car problem here, and your solution is cost effective. So? We still need
trucks.
-- 
bill davidsen	(davidsen@crdos1.crd.GE.COM -or- uunet!crdgw1!crdos1!davidsen)
            "Stupidity, like virtue, is its own reward" -me

mccalpin@perelandra.cms.udel.edu (John D. McCalpin) (07/20/90)

In article <> mccalpin@pereland.cms.udel.edu I wrote about MFLOPS/$:
 
>| (2) The $13,000 configuration includes no monitor or graphics adapter,
>| etc.  It is strictly a server, configured with 16 MB RAM and 120 MB
>| disk.  NFS is used to store results directly onto my graphics
>| workstation.  
 
In article <> davidsen@crdos1.crd.ge.COM (Wm E Davidsen Jr) replies:
>   You have defined the solution by picking the dataset... You are
> talking about a tiny problem here, not at all typical of what is run on
> a Cray. Certainly there are problems requiring lots of CPU and tiny
> memory, and it's nice that you have one. Workstations are good at that.
> We run dedicated troff servers here, and they're workstations, too.

The configuration that I quoted has a rather small memory by current
supercomputer standards, but 2 MW (64-bit) is hardly "tiny".  As soon
as 3rd-party vendors start delivering memory boards at competitive
prices, the machine will be upgradable to 4 MW (64-bit) for about
$2000.  Since the machine was designed to accept 4 Mbit technology, it
is possible to configure it with up to 16 MW of memory.   I expect
that it will be a few months before IBM releases any boards based on 4
Mbit chips, and then a few more months before clones are available
from 3rd parties.  Estimated cost for a full 128 MB = 16 MW is about
$20,000 in addition to the base price of $8700 for the machine.

Since Cray is still selling lots of Y/MP's with 32 MW memories, it is
hardly fair to criticize a single-user workstation on that account.

As far as disk storage goes, I have 1.5 GB of disk space on my
graphics workstation, and will soon have a 2.3 GB tape drive.  So
manipulating 500 MB datasets (see below) is entirely practical.

The whole setup:
	Silicon Graphics 4D/25TG
		1.5 GB disk (2x760MB)
		2.3 GB tape
		32 MB RAM
		150 MB tape
	IBM 320 server
		16 MB RAM
		120 MB disk
is under $50,000 at University prices.

>   If you define the dataset to be typical Cray size, say 500MB, the
> workstation becomes impractical. And if you assume non-vectorable very
> large problems the Cray2 has the edge in scalar speed.

How did you decide that 500 MB was a "typical" Cray dataset?  There is
such a large variety of jobs that are run on Crays that defining a
"typical" job seems counter-productive.  There are *many* important
problems which are cpu-intensive that can fit comfortably into
machines with 2, 4, 8, or 16 MW of memory.  After all, Cray has only
been shipping X and Y machines with more than 8 MW of memory for about
2 years now.

Concerning the Cray-2 --- if the job *absolutely requires* at least
256 MW of real memory, then there are not many options (though I
believe that the Convex C-240 can be configured with 256 MW at
considerably less cost).  On the other hand, it might be more
cost-effective in the longer term to spend the programmer salary
required to port the application to run out-of-core on a much cheaper
machine. 
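
To make the out-of-core option concrete: the usual approach is to
leave the full dataset on disk and stream it through a modest buffer
one slab at a time.  A minimal C sketch, with a made-up file name,
slab size, and a placeholder loop standing in for the real work:

#include <stdio.h>
#include <stdlib.h>

#define SLAB_DOUBLES  (1 << 20)          /* 1 MW (8 MB) working buffer  */

int main(void)
{
    FILE *f = fopen("field.dat", "r+b"); /* hypothetical large dataset  */
    double *slab = malloc(SLAB_DOUBLES * sizeof(double));
    size_t n, i;

    if (!f || !slab) { perror("setup"); return 1; }

    while ((n = fread(slab, sizeof(double), SLAB_DOUBLES, f)) > 0) {
        for (i = 0; i < n; i++)          /* placeholder for real work   */
            slab[i] *= 2.0;

        /* seek back and write the updated slab in place                */
        if (fseek(f, -(long)(n * sizeof(double)), SEEK_CUR) != 0 ||
            fwrite(slab, sizeof(double), n, f) != n) {
            perror("write-back"); return 1;
        }
        fflush(f);   /* required between a write and the next read      */
    }
    free(slab);
    fclose(f);
    return 0;
}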

>   This is a lot like saying that you want to haul a bag of groceries at
> 100mph, and therefore sports cars are killing trucks. You have a sports
> car problem here, and your solution is cost effective. So? We still need
> trucks.

I said precisely the same thing at the end of my original posting.
However, I disagree that memory size is the primary dividing line
between jobs which require supercomputers and those which do not.
The IBM 320 that I described only has 2 memory board slots.  The
server configurations have more slots and can be configured with up to
512 MB = 64 MW of RAM (depending on the model) using 4 Mbit
technology.  Since most 8-cpu Y/MP's are shipping with memories of
this same size, it hardly seems like a clear distinction.

As other people have pointed out, the choice of a computational
platform is a multivariate constrained optimization problem.  Some of
the constraints are:
	(1) The cost must be within the available budget.
	    This includes the cost of porting the code as well.
	(2) The wall-clock turnaround must be within the limits
	    of the research project.
	(3) Point (2) usually requires sufficient memory to make
	    the problem core-containable.
	(4) Sufficient mass storage space and access speed must be
	    available to save intermediate and permanent results
	    without slowing down the calculation past the constraints
	    of point (2).
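
A crude way to picture the screening is to reduce these constraints to
simple thresholds and test each candidate against them.  The numbers
below are made-up placeholders, not quotes; only the shape of the
decision matters:

#include <stdio.h>

struct machine {
    const char *name;
    double cost;          /* purchase plus porting, dollars            */
    double cray_ratio;    /* effective speed relative to one Y/MP cpu  */
    double memory_mw;     /* real memory, 64-bit MW                    */
    double disk_gb;       /* local mass storage, GB                    */
};

int main(void)
{
    /* hypothetical project requirements */
    double budget   = 60000.0;    /* dollars                           */
    double cray_hrs = 200.0;      /* total work, Y/MP cpu-hours        */
    double max_days = 365.0;      /* acceptable wall-clock turnaround  */
    double need_mw  = 4.0;        /* memory to stay core-containable   */
    double need_gb  = 1.0;        /* space for intermediate results    */

    struct machine m[] = {
        { "dedicated workstation", 50000.0, 1.0/25.0,  4.0,  1.5 },
        /* the shared machine's ratio models an allocation of roughly
         * one cpu-hour per calendar day, as discussed above            */
        { "shared supercomputer",      0.0, 1.0/24.0, 32.0, 10.0 },
    };
    int i;

    for (i = 0; i < 2; i++) {
        double days = cray_hrs / m[i].cray_ratio / 24.0;
        int ok = m[i].cost      <= budget   &&
                 days           <= max_days &&
                 m[i].memory_mw >= need_mw  &&
                 m[i].disk_gb   >= need_gb;
        printf("%-22s %5.0f days  %s\n", m[i].name, days,
               ok ? "feasible" : "fails a constraint");
    }
    return 0;
}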

An anecdote:
  I recently submitted a proposal to the NSF to do some cpu-intensive
studies of the equations governing a theoretical two-dimensional
ocean.  The calculations are estimated to require 200 hours of Cray
Y/MP time.  I don't consider this a trivial expenditure....
With an IBM 320, I would probably be able to finish all of the 
calculations before the proposal even completes the review process!

> bill davidsen	(davidsen@crdos1.crd.GE.COM -or- uunet!crdgw1!crdos1!davidsen)
> 	    "Stupidity, like virtue, is its own reward" -me
--
John D. McCalpin			mccalpin@perelandra.cms.udel.edu
Assistant Professor			mccalpin@vax1.udel.edu
College of Marine Studies, U. Del.	J.MCCALPIN/OMNET

dhinds@portia.Stanford.EDU (David Hinds) (07/21/90)

In article <2349@crdos1.crd.ge.COM> davidsen@crdos1.crd.ge.com (bill davidsen) writes:
>In article <MCCALPIN.90Jul18175935@pereland.cms.udel.edu> mccalpin@pereland.cms.udel.edu (John D. McCalpin) writes:
>
>| (2) The $13,000 configuration includes no monitor or graphics adapter,
>| etc.  It is strictly a server, configured with 16 MB RAM and 120 MB
>| disk.  NFS is used to store results directly onto my graphics
>| workstation.  
>
>  You have defined the solution by picking the dataset... You are
>talking about a tiny problem here, not at all typical of what is run on
>a Cray. Certainly there are problems requiring lots of CPU and tiny
>memory, and it's nice that you have one. Workstations are good at that.
>We run dedicated troff servers here, and they're workstations, too.

    I think you underestimate the amount of code that falls in your
"workstation" catagory that is run on Crays.  In my field (biochemistry),
things like protein structure prediction, drug design, and molecular
dynamics calculations are routinely done on Crays by many people.  Our
lab recently got an SGI 4D/240 system to do our MD work.  We get much
more done than we ever could when we were using a Cray.  Our code would
easily fit in this RS6000 configuration, and would probably beat the 4D.

 -David Hinds
  dhinds@popserver.stanford.edu

elm@sprite.Berkeley.EDU (ethan miller) (07/21/90)

In article <MCCALPIN.90Jul20102428@pereland.cms.udel.edu>,
mccalpin@perelandra.cms.udel.edu (John D. McCalpin) writes:
%In article <> mccalpin@pereland.cms.udel.edu I wrote about MFLOPS/$:
%The configuration that I quoted has a rather small memory by current
%supercomputer standards, but 2 MW (64-bit) is hardly "tiny".

Problem #1 shows up right now.  You compare price/performance for the $13000
machine, and then turn around and say that you'd actually need to pay two
or three times that for a machine that will do what you want, or even
what you claim.

%Estimated cost for a full 128 MB = 16 MW is about
%$20,000 in addition to the base price of $8700 for the machine.

So let's assume you only get 8MW of memory, at a total price of about $20000
for the machine.  You've cut your advantage in half, and all you've done
is buy more memory.  Now start buying enough disk space for that simulation
data....

%Since Cray is still selling lots of Y/MP's with 32 MW memories, it is
%hardly fair to criticize a single-user workstation on that account.

Cray's I/O system is able to handle much more "paging," where programmers
shuttle data in and out to fit in a tiny memory space.  Can the PowerStation
accommodate this?  It's not just a question of I/O bus bandwidth; the disks
must be able to keep up as well.

%As far as disk storage goes, I have 1.5 GB of disk space on my
%graphics workstation, and will soon have a 2.3 GB tape drive.  So
%manipulating 500 MB datasets (see below) is entirely practical.
%
%The whole setup:
%[... setup deleted; see referenced article]
%is under $50,000 at University prices.

So now we're up to $50000 for the configuration that you're racing against
a Cray.  Suddenly, the killer micro isn't as killer.  Of course, if all
you want is lots of MIPS and MFLOPS, and you don't need much memory,
you're still OK.  However, the original cost/performance ratio has just
dropped by a factor of 4 or 5 because you've added enough components to make
a real system.

%	(1) The cost must be within the available budget.
%	    This includes the cost of porting the code as well.

Is it any harder to port code to the Cray than to other machines?  How
about other supercomputers, such as Convex?  There will certainly be
porting costs, but I don't think they'll be much worse for a supercomputer
than for any other computer.  Please correct me if I'm wrong on this,
though.

%	(2) The wall-clock turnaround must be within the limits
%	    of the research project.

If you suffer a 25 to 1 slowdown of CPU time, that will change turnaround
times from overnight to one month.  That's a big difference.

%	(3) Point (2) usually requires sufficient memory to make
%	    the problem core-containable.

Not necessarily, especially if you're running on a computer with lots of
I/O bandwidth (assuming you have the devices to feed it).  There are also
quite a few simulations that aren't core-containable on any Y-MP.  What
then?

%	(4) Sufficient mass storage space and access speed must be
%	    available to save intermediate and permanent results
%	    without slowing down the calculation past the constraints
%	    of point (2).

It is this element that can contribute lots of cost to a computer system.

%  I recently submitted a proposal to the NSF to do some cpu-intensive
%studies of the equations governing a theoretical two-dimensional
%ocean.  The calculations are estimated to require 200 hours of Cray
%Y/MP time.  I don't consider this a trivial expenditure....
%With an IBM 320, I would probably be able to finish all of the 
%calculations before the proposal even completes the review process!

Really?  That's 5000 hours on a PowerStation (using the 25/1 ratio from
the table).  That's about 200 days, assuming you use every single CPU
cycle on the machine.  Since you'll be doing some I/O, though, I'd be
surprised to see better than 50-75% utilization, which brings total
running time to close to a year.  Granted, it's cheaper than Cray time,
but is it practical to wait a year for a single simulation to finish?

There are simulations that can run on a workstation instead of a
supercomputer.  These tend to be smaller simulations, though, for
memory, disk/tape storage, and CPU speed reasons.  You can probably
increase one, perhaps two, of these axes and stay with a
workstation.  Once you increase all three, you have a supercomputer,
or pretty close to it.

ethan
=================================
ethan miller--cs grad student   elm@sprite.berkeley.edu
#include <std/disclaimer.h>     {...}!ucbvax!sprite!elm
Witty signature line condemned due to major quake damage.

mccalpin@perelandra.cms.udel.edu (John D. McCalpin) (07/21/90)

In article <MCCALPIN.90Jul20102428@pereland.cms.udel.edu>,
mccalpin@perelandra.cms.udel.edu (John D. McCalpin) I wrote:
> [on the topic of KILLER MICROS]
> %	(1) The cost must be within the available budget.
> %	    This includes the cost of porting the code as well.
In article <37683@ucbvax.BERKELEY.EDU> elm@sprite.Berkeley.EDU (ethan miller) asks:
> Is it any harder to port code to the Cray than to other machines?  How
> about other supercomputers, such as Convex?  There will certainly be
> porting costs, but I don't think they'll be much worse for a supercomputer
> than for any other computer.  Please correct me if I'm wrong on this,
> though.

Actually, this was a veiled reference to another "Killer Micro", the
Intel i860, which is already notorious for the considerable
programming effort required to obtain good performance (relative
to its own theoretical peak performance).  The i860 system benchmarks
that I have seen to date suggest that at 33 MHz it is slower on
compiled Fortran than the IBM 320.  These configurations are also more
expensive than the IBM.

> %	(2) The wall-clock turnaround must be within the limits
> %	    of the research project.
> 
> If you suffer a 25 to 1 slowdown of CPU time, that will change turnaround
> times from overnight to one month.  That's a big difference.

I have already addressed this at some length in other postings.  Of
course, if you gave me a whole Cray I could get my answers faster.
But in the real world, lots of people share these big machines and
most users receive allocations that come out on the order of one cpu
hour per day.  So it is the *total productivity* summed over some
months that is the appropriate metric for this part of the problem.
With respect to this metric, a machine which is 1/25 as fast as the
Cray, but dedicated to my job, is competitive.

> [my comments on running the equivalent of a 200 hour Cray job
>  on an IBM 320 deleted]
 
> Really?  That's 5000 hours on a PowerStation (using the 25/1 ratio from
> the table).  That's about 200 days, assuming you use every single CPU
> cycle on the machine.  Since you'll be doing some I/O, though, I'd be
> surprised to see better than 50-75% utilization, which brings total
> running time to close to a year.  Granted, it's cheaper than Cray time,
> but is it practical to wait a year for a single simulation to finish?

200 days is a good starting estimate.  I actually get more like 95%
cpu utilization on this job, so figure 7 months calendar time.  The
interesting aspect is that the whole process of proposal review and
funding takes about 6 months from the time the proposal is submitted
to the time that the money is available. If you include the time spent
actually writing the proposal, then 7 months on the calendar is not at
all an unreasonable estimate.  Then once the proposal is funded, I
have to make another proposal to the NSF supercomputer centers and
once that is funded I have to fight the queues to get my job through.
Each of these phases will take several months as well.  So call it 13
months from start to finish --- assuming that everything goes well and
that proposals (remember that there are two of them now) do not have
to get re-written, etc.

In light of this, waiting 7 months for a dedicated server to finish is
not so silly as it first sounded.
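
Spelling the timing out, using the 25:1 ratio and the 95% utilization
figure above:

#include <stdio.h>

/* 200 Y/MP cpu-hours at 1/25 the speed, on a dedicated machine with
 * about 95% cpu utilization: how long on the calendar?              */
int main(void)
{
    double station_hours = 200.0 * 25.0;          /* 5000 cpu-hours   */
    double days = station_hours / 0.95 / 24.0;    /* ~219 days        */

    printf("%.0f hours -> %.0f days -> %.1f months\n",
           station_hours, days, days / 30.4);
    return 0;
}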

Another point is that the time on the Cray has a real world cost of
several hundred dollars per hour, split sort of evenly between
depreciation and operations/maintenance/utilities.  At $500/hour, the
200-hour Cray calculation is costing us taxpayers about $100,000 ---
not counting my salary while I write the proposals, the time spent by
the mail reviewers and the panels reviewing the proposals, the
salaries of the administrators and paper-pushers, etc, etc, etc....
This is compared to the use of about $20,000 of hardware for
about 1/4 of its useful life span.
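
And the corresponding dollars, on a straight amortization of the
figures above (back-of-the-envelope only):

#include <stdio.h>

/* 200 Cray-hours at ~$500/hour, versus tying up a ~$20,000 server
 * for roughly a quarter of its useful life.                         */
int main(void)
{
    double cray_cost   = 200.0 * 500.0;     /* $100,000               */
    double server_cost = 20000.0 * 0.25;    /* ~$5,000 amortized      */

    printf("Cray: $%.0f   dedicated server: ~$%.0f\n",
           cray_cost, server_cost);
    return 0;
}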

And as a final point, recall that the codes in question are fairly
well optimized for the Cray, running at >120 MFLOPS sustained speeds.
The ratio of Cray Y/MP to IBM 320 performance is much poorer than 25:1
for less well-vectorized codes (my example from the earlier posting
was 3:1 for another of my FP-intensive applications with a scalar
bottleneck). 
--
John D. McCalpin			mccalpin@perelandra.cms.udel.edu
Assistant Professor			mccalpin@vax1.udel.edu
College of Marine Studies, U. Del.	J.MCCALPIN/OMNET

jkrueger@dgis.dtic.dla.mil (Jon) (07/23/90)

mccalpin@perelandra.cms.udel.edu (John D. McCalpin) wrote:
>> [on the topic of KILLER MICROS]
> %	(1) The cost must be within the available budget.
> %	    This includes the cost of porting the code as well.

Two data points on that.  I ported a standalone 500 line C program to
the RS6000.  Its performance was about 10 times what I got on a
VAX-11/780 running 4.3 BSD UNIX.  I also ported a 3 line awk script to
the RS6000.  Its performance was zero: it hung.  The process went fully
compute-bound and couldn't be killed.  Performance estimate: (10+0)/2 =
5 times a VAX-11/780.  (You wanted a geometric mean?)

There was no floating point in the C program, and probably none in the
awk.  The RS6000's excellent floating point performance did not help it
here.  Reliable system software would have helped its performance.

-- Jon
-- 
Jonathan Krueger    jkrueger@dtic.dla.mil   uunet!dgis!jkrueger
Drop in next time you're in the tri-planet area!

kahn@batcomputer.tn.cornell.edu (Shahin Kahn) (07/25/90)

As far as the 'supercomputer' users go, I think that
those who do not have workstations will have workstations, and
as long as they don't have one, they will complain about supercomputers.
Once they get one, they will continue to use supercomputers.  This time happily!

Consider this my 'law'!  I don't want to re-open the discussions of
killer-micros (with whose sensationalistic aspects I strongly disagree!)
nor the 'central vs. distributed' debate.

I actually have a presentation on these very subjects and a brief,
informal write-up.  Let me know and I will send a copy.  It is a little
long and it has (a minimal amount of) advertisements in it, so I won't post it.

The gist of it is that there are ways to get your price-performance and keep
it for a good few years and move with the high-end technology and maintain
binary compatibility on a standard open multivendor platform.
Too good to be true?  Not if you have the funding!  And not if you are
willing to not have the bestest technology every 8-10 months!

Shahin Khan.

leadley@uhura.cc.rochester.edu (Scott Leadley) (07/25/90)

In article <MCCALPIN.90Jul20220234@pereland.cms.udel.edu> mccalpin@perelandra.cms.udel.edu (John D. McCalpin) writes:
...
>In article <37683@ucbvax.BERKELEY.EDU> elm@sprite.Berkeley.EDU (ethan miller) asks:
...
>> Really?  That's 5000 hours on a PowerStation (using the 25/1 ratio from
>> the table).  That's about 200 days, assuming you use every single CPU
>> cycle on the machine.  Since you'll be doing some I/O, though, I'd be
>> surprised to see better than 50-75% utilization, which brings total
>> running time to close to a year.  Granted, it's cheaper than Cray time,
>> but is it practical to wait a year for a single simulation to finish?
>
>200 days is a good starting estimate.  I actually get more like 95%
>cpu utilization on this job, so figure 7 months calendar time.  The
>interesting aspect is that the whole process of proposal review and
>funding takes about 6 months from the time the proposal is submitted
>to the time that the money is available. If you include the time spent
>actually writing the proposal, then 7 months on the calendar is not at
>all unreasonable an estimate.
...
>In light of this, waiting 7 months for a dedicated server to finish is
>not so silly as it first sounded.

	What is the mean time between crashes on your PowerStation?  What is
the mean time between power glitches?  (Maybe a UPS would be a good
investment.)  I agree that having a personal PowerStation is the most cost
effective per MFLOP for your purposes.  However, I don't think that is the only
measure of worth in this debate.  Here are some other measures of worth:

	- is the simulation amenable to checkpointing?  This is important if
		you have to worry about the mean time between failure of the
		system as a whole.
	- how much effort does it take on the part of the researcher to restart
		from a checkpoint?  This can be a significant time sink with
		some programs.
	- does the work require substantial support by a human?  Mounting
		and unmounting tapes, for example.
	- is the system hardware and software reliable when used for your
		purposes?  All systems require some administrative effort.
		However, having just one user and a well-behaved task can
		reduce this effort to close to nothing.
	- does use of the system require interacting with other people?  Having
		to coordinate your work with other people is sometimes a
		hassle (depending on the people or the situation).  Having to
		conform to the procedures of a bureaucracy is always a hassle.
	- how easy is it to recover from a system failure?  If the researcher
		is not familiar with the system and doesn't have access to the
		services of a knowledgeable systems support person, system
		failures (due to crashes, file corruption, inept system
		administration, etc.) can be catastrophic.

	I'm playing devil's advocate here; in a similar situation I'd make the
same choice you did (and probably quicker, too, since I am a fan of
decentralized computing).
-- 
					Scott Leadley - leadley@cc.rochester.edu

mccalpin@perelandra.cms.udel.edu (John D. McCalpin) (07/26/90)

In article <MCCALPIN.90Jul20220234@pereland.cms.udel.edu> 
mccalpin@perelandra.cms.udel.edu (John D. McCalpin) I wrote about
running a simulation on a "Killer Micro" for 7 months instead of
waiting even longer for equivalent time to become available on a Cray.

In response, 
In article <8576@ur-cc.UUCP> leadley@uhura.cc.rochester.edu (Scott Leadley) asks:
> 
> 	What is the mean time between crashes on your PowerStation?  What is
> the mean time between power glitches?  (Maybe a UPS would be a good
> investment.)  I agree that having a personal PowerStation is the most cost
> effective per MFLOP for your purposes.  However, I don't think that is the only
> measure of worth in this debate.  Here are some other measures of worth:
>
> 	- is the simulation amenable to checkpointing?  This is important if
> 		you have to worry about the mean time between failure of the
> 		system as a whole.
   ==> As with most computational fluid dynamics codes, checkpointing is
	trivial.  All that is required is writing out all the fields at
	the current and previous time levels, and then restarting using
	those fields.
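
	A sketch of the shape of such a dump-and-restart in C, with
	made-up field names and sizes (not any particular model's code):

#include <stdio.h>

#define NX 200
#define NY 200

struct state {
    long   step;                /* current time step                  */
    double psi_now[NX][NY];     /* field at time level n              */
    double psi_old[NX][NY];     /* field at time level n-1            */
};

int checkpoint(const struct state *s)
{
    /* write to a temporary file and rename, so a crash mid-write
     * never clobbers the last good checkpoint                        */
    FILE *f = fopen("restart.tmp", "wb");
    if (!f) return -1;
    if (fwrite(s, sizeof(*s), 1, f) != 1) { fclose(f); return -1; }
    fclose(f);
    return rename("restart.tmp", "restart.dat");
}

int restart(struct state *s)
{
    FILE *f = fopen("restart.dat", "rb");
    if (!f) return -1;          /* no checkpoint yet: cold start      */
    if (fread(s, sizeof(*s), 1, f) != 1) { fclose(f); return -1; }
    fclose(f);
    return 0;
}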

> 	- how much effort does it take on the part of the researcher to restart
> 		from a checkpoint?  This can be a significant time sink with
> 		some programs.
   ==> It is trivial to make the saving of checkpoint/restart data
	automatic.  It is not difficult to make a 'cron' entry that will
	automagically restart the code from the most recent complete 
	checkpoint if it is not running.

> 	- does the work require substantial support by a human?  Mounting
> 		and unmounting tapes, for example.
   ==> Archiving results to tape is necessary, but with sufficient local
	disk it can be done once every week or so.

> 	- is the system hardware and software reliable when used for your
> 		purposes?  All systems require some administrative effort.
> 		However, having just one user and a well-behaved task can
> 		reduce this effort to close to nothing.
   ==> Since I already have a personal graphics workstation, the
	added administration required is minimal.

> 	- does use of the system require interacting with other people?  Having
> 		to coordinate your work with other people is sometimes a a
> 		hassle (depending on the people or the situation).  Having to
> 		conform to the procedures of a bureaucracy is always a hassle.
   ==> That's why it is so nice that the entry cost is only about $10,000.
	I buy it for myself and don't have to share it with anyone.
	Other "Killer Micros" that have been proposed as competitive with
	supercomputers have been *much* more expensive -- typically $50,000 
	or more for Stardent 3010, SGI 4D/2x0, or other systems.

> 	- how easy is it to recover from a system failure?  If the researcher
> 		is not familar with the system and doesn't have access to the
> 		services of a knowledgeable systems support person, system
> 		failures (due to crashes, file corruption, inept system
> 		administration, etc.) can be catastrophic.
   ==> Well, I am an experienced system administrator from my graduate
	student days, so I don't expect any trouble there.  The IBM AIX
	filesystem is supposed to be much less corruptible than the standard
	System V or BSD filesystems, though I have not looked into that 
	in detail yet.

> 	I'm playing devil's advocate here, in a similar situation I'd make the
> same choice you did (and probably quicker too since I am a fan of decentralized
> computing).
> -- 
> 					Scott Leadley - leadley@cc.rochester.edu
--
John D. McCalpin			mccalpin@perelandra.cms.udel.edu
Assistant Professor			mccalpin@vax1.udel.edu
College of Marine Studies, U. Del.	J.MCCALPIN/OMNET