[comp.arch] chewing up mips with graphics

chuck@amdahl.amdahl.com (Charles Simmons) (06/10/87)

In my recent posting, I forgot to mention that I was exposed to video
tapes of computer generated animation by Eugene Miya at NASA.  Eugene
also mentions that I didn't nearly cover the possible applications
for high-powered computers running graphics programs.  Some of the
other applications he mentions are weather forecasting, medical imaging,
CAD/CAM, computer enhancement of satellite photos, and I think there
were one or two more that I'm still forgetting.

Eugene also asks me to mention the upcoming SIGGRAPH meeting
to be held on Thursday, June 18, 1987 at the Exploratorium in San
Francisco.  The speaker will be Alvy Ray Smith, the vice president
of Pixar.  He will be talking on the history of computer graphics.
A recent posting (5 June 87) in comp.graphics gives more details.

-- Chuck

cowburn@rocky2.UUCP (06/10/87)

In article <8270@amdahl.amdahl.com>, chuck@amdahl.amdahl.com (Charles Simmons) writes:
> In my recent posting, I forgot to mention that I was exposed to video
> tapes of computer generated animation by Eugene Miya at NASA.  Eugene
> also mentions that I didn't nearly cover the possible applications
> for high-powered computers running graphics programs.  Some of the
> other applications he mentions are weather forecasting, medical imaging,
> CAD/CAM, computer enhancement of satellite photos, and I think there
> were one or two more that I'm still forgetting.
> 

Perhaps another point to consider is that in the applications mentioned,
graphics is still just the first part of an interpretive effort which has
yet to be reduced satisfactorily to machine activity.  Oil companies like
to look at the maps before oil wells are dug, and the day when the Cray
spits out "Drill Here" (and is accepted) is distant; the radiologist still
likes to hold the film up to the lightbox; and no doubt we would all be
unhappy with some SUN in the operating room directing the surgeon where to
make the first incision for the appendectomy.  Yet these tasks are clearly
reducible to machine activity by appropriate scene analysis and "AI"-style
incorporation of preexisting expertise.  There is, of course, a severe lag
in incorporating even existing algorithms into satisfactory code for
supercomputers.

oconnor@sunray.steinmetz (Dennis Oconnor) (06/12/87)

[ Are there really line eaters? ]

Enuff on how graphics will chew MIPS. It will, but not like SIMULATION !
A revolution is occurring today in engineering, where more and more
computers, chemicals, parts and systems are thoroughly SIMULATED before
even a first prototype is built. On the horizon are even MORE complex
simulations, like the effect of drugs on biological systems, and
accurate faster-than-life atmospheric simulation. Even really simple
simulations ( like 100k-transistor CPUs ) eat MIPS. Simulation will
gobble up parallelism as well - what a bargain !

When it comes to producing new products in competitive arenas, simulation
has been shown to be faster and cheaper than traditional methods. The
more MIPS you have, the more complex a system people will be trying
to simulate. So bring on the MIPS ! ( MFLOPS too, of course. )
--
	Dennis O'Connor 	oconnor@sungoddess.steinmetz.UUCP ??
				ARPA: OCONNORDM@ge-crd.com
 "Everything { used_to_have | has | will_have } a niche, even my opinions."

dzzr@beta.UUCP (06/15/87)

Does somebody REALLY think we are going to run out of work before
we run out of MIPS? Fat chance.

My group is developing large-scale discrete event simulations in
KEE (Knowledge Engineering Environment - an AI shell) with lots of
extra LISP code defining events. We are running on Symbolics 36xx machines. 
Unfortunately, we have run the technology into the ground several
times in the last few years (run times growing out of bounds).
Luckily, every time we ended up against the wall, technology would
spring to the rescue with some kind of hardware speed-up or
other.
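
To make the discrete-event style concrete, here is a minimal sketch of the
core loop in C (illustrative only -- the real system is LISP on top of KEE,
and the "arrival" event below is invented for the example): events sit on a
time-ordered agenda, and each event can schedule further events.

    /* Minimal discrete-event simulation loop: a time-ordered event list.
       Illustrative sketch only, not the KEE/LISP code described above. */
    #include <stdio.h>
    #include <stdlib.h>

    struct event {
        double time;                  /* simulated time at which event fires */
        void (*action)(double);       /* what the event does when it fires   */
        struct event *next;
    };

    static struct event *agenda = NULL;    /* events sorted by ascending time */

    static void schedule(double t, void (*action)(double))
    {
        struct event *e = malloc(sizeof *e), **p = &agenda;
        e->time = t;
        e->action = action;
        while (*p && (*p)->time <= t)       /* keep the agenda sorted */
            p = &(*p)->next;
        e->next = *p;
        *p = e;
    }

    static void arrival(double t)           /* an invented event type */
    {
        printf("arrival at t=%.2f\n", t);
        if (t < 5.0)
            schedule(t + 1.0, arrival);     /* events schedule more events */
    }

    int main(void)
    {
        schedule(0.0, arrival);
        while (agenda) {                    /* pop the earliest event ...  */
            struct event *e = agenda;
            agenda = e->next;
            e->action(e->time);             /* ... advance the clock, fire */
            free(e);
        }
        return 0;
    }

Runs of tens of thousands of such events, each doing real work plus graphics,
are what eat the machine.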

One of our current projects is causing smoke to pour out of a poor
overworked 3600: lots of events (tens of thousands in a run) PLUS
lots of graphics.  Just when things were beginning to look bleak
(again), along comes relief, Real Soon Now, in the form of faster
hardware.  Symbolics has announced a new product, the Ivory
LISP-on-a-chip architecture, with potential speed increases of
3-5X.  Texas Instruments has announced a similar VLSI
LISP-on-a-chip product.  In another hardware arena, the SUN4 RISC
machine sounds promising.

We already have plans for all of these new, faster machines!

--Doug

wood@dg_rtp.UUCP (Tom Wood) (06/16/87)

The recent discussion on how to feed your poor starving CPU leaves me
a bit in the dark.  What architectural features help in dealing with
these various applications (graphics, simulation, object oriented AI,
CAD)?

For the sake of discussion, I propose we normalize the O/S to Unix.  Are
these applications implemented using multiple processes?  If so, a
multiprocessor could be of help.  If not, it seems you'd have to try
to build a "Cray": one fast monolithic processor.

Personally, I believe the 90% solution to obtaining parallelism is to
take advantage of multiple independent computations.  (It's much easier
to make 100 compiles go 100 times faster by using 100 machines than it
is to make 1 machine go 100 times faster on each compile.)
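
(To be concrete about what "multiple independent computations" buys you: the
sketch below, in C, just forks N jobs that never talk to each other and waits
for them all.  The jobs here are stand-ins; in the compile example each one
would be a compile farmed out to its own machine.)

    /* Coarse-grained parallelism: N independent jobs as N processes.
       Illustrative sketch; do_job() is a stand-in for one compile. */
    #include <stdio.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    #define NJOBS 4

    static void do_job(int i)
    {
        printf("job %d running in pid %d\n", i, (int)getpid());
    }

    int main(void)
    {
        int i;
        for (i = 0; i < NJOBS; i++) {
            if (fork() == 0) {          /* child: one independent job */
                do_job(i);
                _exit(0);
            }
        }
        for (i = 0; i < NJOBS; i++)
            wait(NULL);                 /* no communication; just collect them */
        return 0;
    }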

So how many fast CPUs can these applications benefit from?  My guess is
that most would have a hard time using more than 1, and that some may
be able to use 2 or so.  What would you do (short of rewriting the
application) if you were given a machine with 10-15 fast CPUs to put
on your desk?
-- 
			Tom Wood             (919) 248-6067
			Data General, Research Triangle Park, NC
			{the known world}!mcnc!rti-sel!dg_rtp!wood

franka@mmintl.UUCP (Frank Adams) (06/16/87)

In article <6328@beta.UUCP> dzzr@beta.UUCP (Douglas J Roberts) writes:
>Does somebody REALLY think we are going to run out of work before
>we run out of MIPS? Fat chance.

To be fair, the person who presented that view stated that there are plenty
of applications which will use all the processing power available, and then
some.  The question is, can we make effective use of greatly increased
processing power in common computing contexts -- the PCs on people's desks;
the Apollo and Sun workstations; the departmental VAX?

I still think the answer is yes.  It is true that a 100-fold increase in
power in the next 3 months would create a temporary overcapacity; one it
might take a couple of years to make much use of.  But I see no sign of such
a quantum leap.  As long as progress continues at its current (amazing)
rate, we will be able to use what becomes available in relatively short
order.
-- 

Frank Adams                           ihnp4!philabs!pwa-b!mmintl!franka
Ashton-Tate          52 Oakland Ave North         E. Hartford, CT 06108

herndon@umn-cs.UUCP (06/16/87)

In article <2120@dg_rtp.UUCP>, wood@dg_rtp.UUCP (Tom Wood) writes:
>                ...  What architectural features help in dealing with
> these various applications (graphics, simulation, object oriented AI,
> CAD)?
>                             ...   If not, it seems you'd have to try
> to build a "Cray": one fast monolithic processor.

  Cray has been selling multiprocessor systems for a few years now.
Even the machines Seymour Cray designed while at CDC were multiprocessors
(central processor and lots of little peripheral processors) and have
come in dual central processor models for some time.

> 
> So how many fast CPUs can these applications benefit from?  My guess is
> that most would have a hard time using more than 1, and that some may
> be able to use 2 or so.
> 			Tom Wood             (919) 248-6067
  Many applications would benefit significantly from parallelism.
I'm informed that ray-tracing graphics is trivially parallelizable
to many many machines.  Many types of simulation are also parallelizable
(weather prediction, thermal simulations, flow simulation, ...).
Most physical simulations can be mapped onto a planar 3D mesh for
significant speed gain.
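
(A sketch of why the simple case is trivial, in C: each pixel is computed
independently, so scanlines can simply be dealt out to workers.  The
trace_pixel() below is a placeholder, not a real ray tracer.)

    /* Screen-space decomposition of a ray tracer.  Worker w takes every
       NWORKERS-th scanline; no worker ever needs another worker's result. */
    #include <stdio.h>

    #define WIDTH    64
    #define HEIGHT   48
    #define NWORKERS  4

    static double trace_pixel(int x, int y)     /* placeholder shading */
    {
        return (double)(x ^ y) / (WIDTH + HEIGHT);
    }

    static void render_stripe(int w, double image[HEIGHT][WIDTH])
    {
        int x, y;
        for (y = w; y < HEIGHT; y += NWORKERS)
            for (x = 0; x < WIDTH; x++)
                image[y][x] = trace_pixel(x, y);
    }

    int main(void)
    {
        static double image[HEIGHT][WIDTH];
        int w;
        for (w = 0; w < NWORKERS; w++)   /* serial here; in practice each */
            render_stripe(w, image);     /* stripe goes to its own CPU    */
        printf("corner sample: %f\n", image[0][0]);
        return 0;
    }

The hard part, as discussed in the follow-ups, begins when the scene data
itself no longer fits on every machine.
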
  Many applications are not easily parallelized, but a surprising
number are.  Rethinking algorithms for massive
parallelism often yields surprising speedups.  See the December
1986 CACM article on the Connection Machine for some interesting
parallel algorithms.

-- 
Robert Herndon				Dept. of Computer Science,
...!ihnp4!umn-cs!herndon		Univ. of Minnesota,
herndon@umn-cs.ARPA			136 Lind Hall, 207 Church St. SE
herndon.umn-cs@csnet-relay.ARPA		Minneapolis, MN  55455

eugene@pioneer.UUCP (06/18/87)

In article <1658@umn-cs.UUCP> herndon@umn-cs.UUCP (Robert Herndon) writes:
>
>  Cray has been selling multiprocessor systems for a few years now.
>Even the machines Seymour Cray designed while at CDC were multiprocessors
>(central processor and lots of little peripheral processors) and have
>come in dual central processor models for some time.

Evans and Sutherland has also been selling high-performance multiprocessor
graphics systems for years.  In some ways they outperform Crays
in floating-point multiplication (send inquiries to esvax).

>  Many applications would benefit significantly from parallelism.
>I'm informed that ray-tracing graphics is trivially parallelizable

Trivial eh?  This is not a pursuit game.  Trivial ray examples
parallelize trivially.  You only need look at the world around you.

>to many many machines.  Many types of simulation are also parallelizable
>(weather prediction, thermal simulations, flow simulation, ...).
>Most physical simulations can be mapped onto a planar 3D mesh for
>significant speed gain.

While you use the word "most," understand that many 3-D simulations and
analyses use higher-order dimensions to hold other useful data: an alpha
channel for transparency in graphics systems, homogeneous coordinate
systems, audio-channel sync data, and so forth.  At LANL and LLNL there
are 7-D problems, limited to seven dimensions only by FORTRAN.  Don't
restrict your view to 3-D just because you conceptualize in 3-D.

From the Rock of Ages Home for Retired Hackers:

--eugene miya
  NASA Ames Research Center
  eugene@ames-aurora.ARPA
  "You trust the `reply' command with all those different mailers out there?"
  "Send mail, avoid follow-ups.  If enough, I'll summarize."
  {hplabs,hao,ihnp4,decwrl,allegra,tektronix,menlo70}!ames!aurora!eugene

grunwald@uiucdcsm.cs.uiuc.edu (06/19/87)

I have to agree with Eugene Miya on the ``ray tracing is trivially parallel only
if you do trivial things''.

It's true that you can just subdivide your view space. I moved a ray tracer
to a hypercube and got almost linear speedup. However, the ``trivial'' part
dies quickly. Doing larger scenes (which is what's desired anyway)
requires subdividing your data space as well. Suddenly your problem is
no longer as simple as it used to be, and you start making trade-offs among
replication of data, how much data each node holds, speed-up, and messaging.

But an hour or two is still faster than waiting a day for the VAX to crank
out a picture.

fay@encore.UUCP (Peter Fay) (06/19/87)

In article <6240@steinmetz.steinmetz.UUCP> OCONNORDM@ge-crd.com@ARPA (Dennis Oconnor) writes:
>
>Enuff on how graphics will chew MIPS. It will, but not like SIMULATION !
>A revolution is occurring today in engineering, where more and more
>computers, chemicals, parts and systems are thoroughly SIMULATED before
>even a first prototype is built. On the horizon are even MORE complex
>simulations, like the effect of drugs on biological systems, and
>accurate faster-than-life atmospheric simulation. Even really simple
>simulations ( like 100k-transistor CPUs ) eat MIPS. Simulation will
>gobble up parallelism as well - what a bargain !
>
>When it comes to producing new products in competitive arenas, simulation
>has been shown to be faster and cheaper than traditional methods. The
>more MIPS you have, the more complex a system people will be trying
>to simulate. So bring on the MIPS ! ( MFLOPS too, of course. )

I was wondering how long it would be before someone said this. There is no
question in my mind that this will become a huge market. Imagine
simulating the cache coherency of a 1000-MIP machine with 128 CPUs and
hierarchical caches. And that's only what we're doing today, not a few
years from now.

There's no question that this current work would be impossible without
nice machines like our 35-MIPS multiprocessor. (Now we can run the Mach
operating system - picture one Mach simulation task running with 2000
light-weight threads of control instead of one Unix job creating 2000
processes.)
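
(To illustrate the thread-versus-process point: the sketch below uses POSIX
threads rather than the Mach interface, and a token count of 16 rather than
2000, but the idea is the same -- all the workers share one address space, so
spawning them is cheap compared with creating that many Unix processes.)

    /* Many lightweight threads in one address space; sim_worker() is a
       stand-in for a simulation activity.  Compile with -lpthread. */
    #include <pthread.h>
    #include <stdio.h>

    #define NTHREADS 16

    static long counters[NTHREADS];      /* shared memory, one slot per thread */

    static void *sim_worker(void *arg)
    {
        long id = (long)arg, i;
        for (i = 0; i < 100000; i++)     /* stand-in for simulation work */
            counters[id]++;
        return NULL;
    }

    int main(void)
    {
        pthread_t tid[NTHREADS];
        long i, total = 0;
        for (i = 0; i < NTHREADS; i++)
            pthread_create(&tid[i], NULL, sim_worker, (void *)i);
        for (i = 0; i < NTHREADS; i++)
            pthread_join(tid[i], NULL);
        for (i = 0; i < NTHREADS; i++)
            total += counters[i];
        printf("total work units: %ld\n", total);
        return 0;
    }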

Of course it would be nice to have 30 times our current power, which is
why we're designing it.



				peter fay
				research
				encore computer

{allegra|decvax|ihnp4|linus|princeton|pur-ee|talcott}!encore!fay
				fay@multimax.arpa

bacon@alberta.UUCP (Dave Bacon) (06/23/87)

In article <2194@mmintl.UUCP>, franka@mmintl.UUCP (Frank Adams) writes:
> In article <6328@beta.UUCP> dzzr@beta.UUCP (Douglas J Roberts) writes:
> >Does somebody REALLY think we are going to run out of work before
> >we run out of MIPS? Fat chance.
> ....  It is true that a 100-fold increase in
> power in the next 3 months would create a temporary overcapacity; one it
> might take a couple of years to make much use of.  But I see no sign of such

	If the production of such machines were limited (which could
be expected with such a quantum leap), I think one could easily
find the people who could use such machines, especially if they were
priced in the same range as current machines and were capable of running
current software with minimal change. We had one person here who used
~100 hours of Cray time creating a molecular animation sequence, and
we would be QUITE happy to be able to do things like that in (almost)
real time.

  Granted, not EVERYBODY could make reasonable use of such machines, but if
you gave me an immensely powerful machine, I'm sure I could find at least
a dozen people who would be happy to use it within a week (assuming the
software could be ported over that fast).

    Stephen Samuel {ihnp4,seismo!mnetor,vax135}!alberta!edm!steve

gwu@vlsi.cs.cmu.edu (George Wu) (06/23/87)

     I doubt a 100-fold increase in computing power would create an
overcapacity. If someone did come out with such a machine, he'd charge a
bundle and clean up! Like today's supercomputers, or maybe more like the
supercomputers of a few years ago, it'd be very expensive to get time on such
a machine, let alone purchase one. Access would be limited to a few customers,
those who really need the power and can afford to pay for it.

     Of course, if enough different vendors came out with such machines,
forcing the dollars per MIPS down, maybe there would be such an excess capacity.

						George

eugene@pioneer.arpa (Eugene Miya N.) (06/23/87)

>In article <2194@mmintl.UUCP>, franka@mmintl.UUCP (Frank Adams) writes:
>> In article <6328@beta.UUCP> dzzr@beta.UUCP (Douglas J Roberts) writes:
>> >Does somebody REALLY think we are going to run out of work before
>> >we run out of MIPS? Fat chance.
>> ....  It is true that a 100-fold increase in
>> power in the next 3 months would create a temporary overcapacity; one it
>> might take a couple of years to make much use of.  But I see no sign of such
>

Actually, I have given some thought to this temporary overcapacity as a
useful measure.  It took 3 weeks before one of our X-MPs (a 4-CPU
upgrade) became saturated.  I think this might give a useful measurement
of the derivative of user usage.  Other sites using other smaller CPUs
reported at SIGMETRICS that their machine became saturated at the 3
month mark.  This makes a lot of assumptions, but I feel that it makes
a somewhat useful subjective measure.

From the Rock of Ages Home for Retired Hackers:

--eugene miya
  NASA Ames Research Center
  eugene@ames-aurora.ARPA
  "You trust the `reply' command with all those different mailers out there?"
  "Send mail, avoid follow-ups.  If enough, I'll summarize."
  {hplabs,hao,ihnp4,decwrl,allegra,tektronix,menlo70}!ames!aurora!eugene

johnw@astroatc.UUCP (John F. Wardale) (06/24/87)

In article <2120@dg_rtp.UUCP> wood@dg_rtp.UUCP (Tom Wood) writes:
>Personally, I believe the 90% solution to obtaining parallelism is to
>take advantage of multiple independent computations.  (It's much easier
>to make 100 compiles go 100 times faster by using 100 machines than it
>is to make 1 machine go 100 times faster on each compile.)

This may be true, but for most "real" problems, some well-known
person determined that the average code spends 90% of its time
executing 10% of its code.

This and other related studies show that a large fraction of
problems have no, or very limited, parallelism.

A couple of months ago there was some discussion of somebody's
challenge (with a moderate cash prize) .... As I recall, you had
to speed up a general problem (not limited to HIS problem set, but
he could reject anything that was "embarrassingly parallel" [like
the 100 compiles example]) by a factor of 100, and you could use
as many processors as you wanted.   Did anyone save any of these?
Has anyone won the prize yet?




			John W

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 
Name:	John F. Wardale
UUCP:	... {seismo | harvard | ihnp4} !uwvax!astroatc!johnw
arpa:   astroatc!johnw@rsch.wisc.edu
snail:	5800 Cottage Gr. Rd. ;;; Madison WI 53716
audio:	608-221-9001 eXt 110

To err is human, to really foul up world news requires the net!

hwe@beta.UUCP (Skip Egdorf) (06/25/87)

In article <1873@ames.UUCP>, eugene@pioneer.arpa (Eugene Miya N.) writes:
> Actually, I have given some thought to this temporary overcapacity as a
> useful measure.  It took 3 weeks before one of our X-MPs (a 4-CPU
> upgrade) became saturated.  I think this might give a useful measurement
> of the derivative of user usage.  Other sites using other smaller CPUs
> reported at SIGMETRICS that their machine became saturated at the 3
> month mark.  This makes a lot of assumptions, but I feel that it makes
> a somewhat useful subjective measure.
> 
> From the Rock of Ages Home for Retired Hackers:
> 
> --eugene miya
>   NASA Ames Research Center
>   eugene@ames-aurora.ARPA

When Los Alamos went from 6 to 7 Crays a few years back, the Cray folks
said that we did something that they had never seen...
Gone from 6 saturated Crays at 6 PM to 7 saturated Crays at 6:30 the next
morning.

Give me a hundred mips on my desk tomorrow, and I will use it up on BOTH
graphics and simulation.
				Skip Egdorf
				hwe@lanl.gov

bradley@think.uucp (Bradley Kuszmaul) (06/25/87)

In article <337@astroatc.UUCP> johnw@astroatc.UUCP (John F. Wardale) writes:
>This may be true, but for most "real" problems, some well-known
>person determined that the average code spends 90% of its time
>executing 10% of its code.
>
>This and other related studies show that a large fraction of
>problems have no, or very limited, parallelism.
Actually, I interpret the 90%-10% rule as indicating that there might be
a lot of parallelism in problems.  Since 90% of the time is spent
executing a very small amount of code, it seems likely that there is a
lot of data involved.  It is also likely that the dependencies between
data are something smaller than "completely connected", and so it is
likely that different parts of the data can be processed in parallel.

Now, I am the first to admit that everything I just said has a lot of
"maybes" and "likelys" in it, and while I personally believe that in fact
there is a lot of parallelism lying around, I would not expect you to
believe it based on such a weak argument.

My feeling is that the lesson to be learned here is not "There is X
amount of parallelism lying around in current applications".  The lesson
is "Rules of thumb, such as Amdahl's Law, don't have much to say about
parallel computing".

>A couple of months ago there was some discussion of somebody's
>challenge (with a moderate cash prize) .... As I recall, you had
>to speed up a general problem (not limited to HIS problem set, but
>he could reject anything that was "embarrassingly parallel" [like
>the 100 compiles example]) by a factor of 100, and you could use
>as many processors as you wanted.   Did anyone save any of these?
>Has anyone won the prize yet?

As far as I know, no one has seriously tried to solve the Karp problem.
I think it is only a matter of time.  The problem is that in order to
even have a chance at solving the Karp problem, you have to have at least
100 processors in your parallel machine (otherwise there is no way you are
going to get a factor of 100 speedup).  Most of the machines which have <1000
processors in them (e.g. the BBN Butterfly or Intel iPSC) have relatively
slow communications networks, and thus find it difficult not to lose a
factor of 10 due to that sort of inefficiency.  If we assume (handwave
handwave) that parallel computing carries some constant performance cost
(say a factor of 10), then we need at least 1000 processors.  There are
very few machines with >1000 processors in them, and most of them are not
quite general-purpose enough to solve the problem.

Company plug:  The Connection Machine, which has 64000 processors in it,
probably has the best chance of solving the Karp problem, but it seems a
little unfair to compare 64000 one-bit processors to a single one-bit
processor (for a performance improvement of 64000) or to compare a
64000-processor Connection Machine to a Cray (which is made out of much
more expensive technology).  My feeling is that the Karp problem should
have been phrased as "compare your parallel machine with a single-processor
serial machine made out of about the same amount of hardware" (or perhaps
the same cost?).  This would pit a Connection Machine against about a
third of a Cray 1, or perhaps a couple of VAX 8800s or a high-end IBM
mainframe.  There are several applications for which the Connection Machine
can beat those machines by an order of magnitude.  One can imagine that
there are some applications for which the Connection Machine is not well
suited, but, aside from simulating a network of Cray computers, I can't
think of any :-)
 -Bradley




Bradley C. Kuszmaul, Thinking Machines Corporation, Cambridge, MA
 bradley@think.com
 bradley@think.uucp (i.e. seismo!think!bradley)

mac@uvacs.CS.VIRGINIA.EDU (Alex Colvin) (06/25/87)

> 
> When Los Alamos went from 6 to 7 Crays a few years back, the Cray folks
> said that we did something that they had never seen...
> Gone from 6 saturated Crays at 6 PM to 7 saturated Crays at 6:30 the next
> morning.

Some load isn't spread out by adding processors.  I hear that the Los Alamos
Crays spend much of their time processing characters in the tty drivers
and screen editors.

pase@ogcvax.UUCP (Douglas M. Pase) (06/26/87)

	In article <2120@dg_rtp.UUCP> wood@dg_rtp.UUCP (Tom Wood) writes:

	Personally, I believe the 90% solution to obtaining parallelism is to
	take advantage of multiple independent computations.  (It's much easier
	to make 100 compiles go 100 times faster by using 100 machines than it
	is to make 1 machine go 100 times faster on each compile.)

I hope I'm not missing the point, but I think you're off by 10% -- that is,
the only approach to parallelism is, by definition, taking advantage of
multiple independent computations.  There are lots of levels to choose from, not
just one.  What you have mentioned here (with the 100 compiles) is parallelism
at the process level.  This approach is the easiest, the best understood, and
many vendors provide commercial products which successfully take advantage of
this type of parallelism.  Honeywell has been doing this for years with CP-6,
and Sequent and Apollo are two newer entries.  (Oh, I see.  I bet this level
is what you meant by "independent" -- correct me if I still misunderstand.)
The advantage of this level is that the overhead required to parallelize the
computation is relatively small and controlled by the system (e.g., in system
locks and resource scheduling) - not introduced into the computation itself.

The next level has multiple co-operating tasks, as in a producer/consumer
relationship and similar approaches.  At this level the overhead is built
directly into the application.  Sometimes the overhead required to distribute
sufficient information to run multiple parallel tasks cancels any benefits
that might have accrued.  Keller's Readyflow system operates at this level.
The Cray-X-MP, Sequent, Alliant, and some other shared memory machines can be
operated at this level using some form of microtasking.  All distributed memory
machines (such as the Intel Hypercube and NCube's machine) are operated at this
level.  (By the way, "large grain dataflow" is at this level.)
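
A minimal sketch of this second level, using two Unix processes joined by a
pipe (the squares are a stand-in for whatever the producer really computes):
the coordination lives in the application itself, which is where the overhead
mentioned above comes from.

    /* Producer/consumer as two cooperating processes connected by a pipe. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void)
    {
        int fd[2], i;

        if (pipe(fd) < 0) { perror("pipe"); return 1; }

        if (fork() == 0) {               /* child: the producer */
            close(fd[0]);
            for (i = 0; i < 10; i++) {
                int item = i * i;        /* stand-in for a produced result */
                write(fd[1], &item, sizeof item);
            }
            close(fd[1]);
            _exit(0);
        }

        close(fd[1]);                    /* parent: the consumer */
        {
            int item;
            while (read(fd[0], &item, sizeof item) == sizeof item)
                printf("consumed %d\n", item);
        }
        close(fd[0]);
        wait(NULL);
        return 0;
    }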

Another level is the instruction level.  At this level, instructions are
scheduled independently and in parallel.  The MIT dataflow machine is an
example of this.  Operands accumulate in a "waiting-matching store" until
all operands required by an operator have accumulated.  At that time the
operator and its operands are placed in a queue, and executed as soon as a
processor becomes available.  The Manchester dataflow machine works very
similarly to the MIT machine.  The Cray machines also take advantage of this
level of parallelism.

Perhaps the bottom level is the microcode level.  Any machine (such as the
DEC 8600 series) which pipelines its microcode is executing in parallel.
The Goodyear MPP is the machine which offers the most parallelism at this
level (although it's not exactly pipelined).  It weighs in at 16K 8-bit
processors.

	In article <astroatc.337> johnw@astroatc.UUCP (John F. Wardale) writes:

	This may be true, but for most "real" problems, some well-known
	person determined that the average code spends 90% of its time
	executing 10% of its code.

	This and other related studies show that a large fraction of
	problems have no, or very limited, parallelism.

As you have stated the study, it does *not* support your conclusion.  Tight
loops in FORTRAN (shudder) programs may often be parallelized, by pipelining
instructions.  Kuck's work on parallelizing compilers has shown amazing
improvements can be gained by pipelining and vectorizing DO loops.  (Yes,
both are a form of parallelism.)  It is the data dependencies which determine
the available parallelism, not the size of the code.
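
A two-loop example in C makes the point about dependencies (the arrays are
just filler data):

    /* Data dependence, not code size, limits parallelism. */
    #include <stdio.h>

    #define N 1000

    int main(void)
    {
        static double a[N], b[N], c[N];
        int i;

        for (i = 0; i < N; i++) { b[i] = i; c[i] = 2.0 * i; }

        /* Independent iterations: a[i] uses only b[i] and c[i], so the
           loop vectorizes or runs on many processors directly. */
        for (i = 0; i < N; i++)
            a[i] = b[i] + c[i];

        /* Loop-carried dependence: a[i] needs a[i-1], so iteration i
           cannot start until iteration i-1 finishes. */
        for (i = 1; i < N; i++)
            a[i] = a[i - 1] + b[i];

        printf("a[N-1] = %f\n", a[N - 1]);
        return 0;
    }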

	A couple of months ago there was some discussion of somebody's
	challenge (with a moderate cash prize) .... As I recall, you had
	to speed up a general problem (not limited to HIS problem set, but
	he could reject anything that was "embarrassingly parallel" [like
	the 100 compiles example]) by a factor of 100, and you could use
	as many processors as you wanted.   Did anyone save any of these?
	Has anyone won the prize yet?

				John W
	- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 
	Name:	John F. Wardale
	UUCP:	... {seismo | harvard | ihnp4} !uwvax!astroatc!johnw
	arpa:   astroatc!johnw@rsch.wisc.edu
	snail:	5800 Cottage Gr. Rd. ;;; Madison WI 53716
	audio:	608-221-9001 eXt 110

This seems pretty easy to me.  Any fluid modeling or simulation problem such
as numerical weather forecasting, dynamic air-flow analysis, oceanographic
simulation, planet/galaxy formation, gas dispersion, etc., would benefit
a lot from just about any level of parallelism.  If this problem set isn't
sufficiently "real", how about finite-element analysis, or image processing?
I would rather solve any of these problems on the MPP than on any single 8-bit
processor.
--
Doug Pase   --   ...ucbvax!tektronix!ogcvax!pase  or  pase@Oregon-Grad (CSNet)

fu@hc.DSPO.GOV (Castor L. Fu) (06/26/87)

In article <1613@uvacs.CS.VIRGINIA.EDU> mac@uvacs.CS.VIRGINIA.EDU (Alex Colvin) writes:
>> 
>> When Los Alamos went from 6 to 7 Crays a few years back, the Cray folks
>> said that we did something that they had never seen...
>> Gone from 6 saturated Crays at 6 PM to 7 saturated Crays at 6:30 the next
>> morning.
>
>Some load isn't spread out by adding processors.  I hear that the Los Alamos
>Crays spend much of their time processing characters in the tty drivers
>and screen editors.

Gosh, as far as I know, there aren't any screen editors on the Crays.
In fact, it is quite common for people to do most of their code preparation
on other computers which have friendly editors, etc.

bs@linus.UUCP (Robert D. Silverman) (06/26/87)

>The Goodyear MPP is the machine which offers the most parallelism at this
>level (although it's not exactly pipelined).  It weighs in at 16K 8-bit
>processors.
 
Slight correction here. The MPP consists of 16K 1-bit processors, not 8.

                                                                         
                                                                       
Bob Silverman

baum@apple.UUCP (Allen J. Baum) (06/29/87)

--------
[]

>Gosh, as far as I know, there aren't any screen editors on the Crays.
>In fact, it is quite common for people to do most of their code preparation
>on other computers which have friendly editors, etc.

Well, now you know differently.  We are running VI on ours, and we may
be getting EMACS in the near future.

--
{decwrl,hplabs,ihnp4}!nsc!apple!baum		(408)973-3385

johnw@astroatc.UUCP (John F. Wardale) (06/29/87)

In article <1323@ogcvax.UUCP> pase@ogcvax.UUCP (Douglas M. Pase) writes:
>	In article <2120@dg_rtp.UUCP> wood@dg_rtp.UUCP (Tom Wood) writes:
>	to make 100 compiles go 100 times faster by using 100 machines than it

>	In article <astroatc.337> (ME) johnw@astroatc.UUCP (John F. Wardale) writes:
>
>	This and other related studies show that a large fraction of
>	problems have no, or very limited, parallelism.
>
>As you have stated the study, it does *not* support your conclusion.  Tight
>loops in FORTRAN (shudder) programs may often be parallelized, by pipelining
>instructions.  Kuck's work on parallelizing compilers has shown amazing
>improvements can be gained by pipelining and vectorizing DO loops.  (Yes,
>both are a form of parallelism.)  It is the data dependencies which determine
>the available parallelism, not the size of the code.

Right, and *MOST* programs have dependencies that limit the parallelism to
one or two digits' worth.  Tom's original posting was looking for three
digits' worth.  To get that, you need a problem that is "embarrassingly
parallel" or an amazing system (hw+sw).

>This seems pretty easy to me.  Any <long list of ap's> 
> would benefit a lot from just about any level of parallelism.

True,    ***BUT***
Can you achieve a >= 100x speed-up by parallelizing the examples you give?
Use as many processors as you want!  You can have a case of <beer,???>
from me if you succeed!


			John W

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 
Name:	John F. Wardale
UUCP:	... {seismo | harvard | ihnp4} !uwvax!astroatc!johnw
arpa:   astroatc!johnw@rsch.wisc.edu
snail:	5800 Cottage Gr. Rd. ;;; Madison WI 53716
audio:	608-221-9001 eXt 110

To err is human, to really foul up world news requires the net!

bs@linus.UUCP (Robert D. Silverman) (07/02/87)

In article <343@astroatc.UUCP> johnw@astroatc.UUCP (John F. Wardale) writes:
>In article <1323@ogcvax.UUCP> pase@ogcvax.UUCP (Douglas M. Pase) writes:
>>	to make 100 compiles go 100 times faster by using 100 machines than it
>
>>	In article <astroatc.337> (ME) johnw@astroatc.UUCP (John F. Wardale) writes:
>>
>>	This and other related studies show that a large fraction of
>>	problems have no, or very limited, parallelism.
>>
>>As you have stated the study, it does *not* support your conclusion.  Tight
>>loops in FORTRAN (shudder) programs may often be parallelized, by pipelining
>>instructions.  Kuck's work on parallelizing compilers has shown amazing
>>improvements can be gained by pipelining and vectorizing DO loops.  (Yes,
>>both are a form of parallelism.)  It is the data dependencies which determine
>>the available parallelism, not the size of the code.
>
>Right, and *MOST* programs have dependencies that limit the paralleliam to
>one or two digits worth.  Tom's original posting was looking for three
>digits worth.  To get that, you need a problem that is "embarrassingly
>parallel" or an amazing system (hw+sw).
 

Interestingly enough, I have an algorithm (actually two separate algorithms)
which can take FULL advantage of ALL the parallelism available. These
algorithms are the Quadratic Sieve and the Elliptic Curve Method for factoring
integers. I have demonstrated that N MIMD processors yield an N-fold speed-up
for both of these algorithms running on a distributed network of SUN-3's.

I have a paper describing the Quadratic Sieve work that will appear this
fall in the Journal of Supercomputing. My current set of hardware consists
of about 22 SUN's and they quite definitely yield a 22-fold speedup.
If I could get hold of 100 SUN-3's on a single network file system I could
meet the conditions of the Karp challenge.

The Elliptic Curve Method is a randomized algorithm that does some group (ring)
computations on a random elliptic curve. If the order of the group of that
curve is divisible by ONLY small primes then the algorithm will succeed in
factoring a very large number. If the group's order isn't divisible by only
small primes, one gives up and tries another random curve.

By giving separate random curves to each processor, N processors will run N
curves at once rather than one after another. A consequence is that during
the computations the processors need NOT communicate with one another.
If one processor succeeds it simply halts the others. This sort of thing
would also run very nicely on a SIMD machine. I suspect, however, that
Karp would consider this sort of thing to be in the category of 'trivially
parallelizable'.
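
The coordination pattern is simple enough to sketch in C (the "curve" below is
a stand-in test, not the elliptic curve arithmetic itself): each worker makes
independent random attempts, and the first success halts the rest.

    /* Fork N workers; each tries independent random attempts with no
       communication.  The first success is reported and the rest halted. */
    #include <signal.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    #define NWORKERS 8

    int main(void)
    {
        int fd[2], i;
        pid_t pid[NWORKERS];
        long answer;

        if (pipe(fd) < 0) { perror("pipe"); return 1; }

        for (i = 0; i < NWORKERS; i++) {
            pid[i] = fork();
            if (pid[i] == 0) {                 /* worker: fully independent */
                srand((unsigned)getpid());     /* its own random stream     */
                for (;;) {
                    long x = rand();           /* stand-in for "try a curve" */
                    if (x % 1000 == 0) {       /* stand-in success test      */
                        write(fd[1], &x, sizeof x);
                        _exit(0);
                    }
                }
            }
        }

        close(fd[1]);                          /* parent waits for a winner */
        if (read(fd[0], &answer, sizeof answer) == sizeof answer)
            printf("first success: %ld\n", answer);

        for (i = 0; i < NWORKERS; i++)         /* halt the remaining workers */
            kill(pid[i], SIGTERM);
        while (wait(NULL) > 0)
            ;
        return 0;
    }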

Comments Please?


Bob Silverman