[comp.parallel] CM-2 SIMD question

john%ghostwheel.unm.edu@ariel.unm.edu (John Prentice) (01/08/91)

I have a SIMD question motivated by the CM-2.  We are doing finite
difference calculations on a simple structured grid.  It is trivial to
port our code to the CM-2 to do the difference stencil since it is
inherently a SIMD type problem.  However, different nodes in our mesh
use different equations of state.  Some are tabular, some are
analytic, some are iterative.  Clearly you can't perform all these
calculations in lock step.  So, I assume we have to turn off all processors
except those using a specific equation of state, do the EOS calculations for 
those nodes, and then turn those processors off and turn on those for the next
EOS, etc... until we have exhausted all the types of EOS we need.  Is there
a better way to do this?


John K. Prentice
Amparo Corporation
john@unmfys.unm.edu

pjn@cs.UMD.EDU (P. J. Narayanan) (01/09/91)

<I have a SIMD question motivated by the CM-2.  We are doing finite
<difference calculations on a simple structured grid.  It is trivial to
<port our code to the CM-2 to do the difference stencil since it is
<inherently a SIMD type problem.  However, different nodes in our mesh
<use different equations of state.  Some are tabular, some are
<analytic, some are iterative.  Clearly you can't perform all these
<calculations in lock step.  So, I assume we have to turn off all processors
<except those using a specific equation of state, do the EOS calculations for 
<those nodes, and then turn those processors off and turn on those for the next
<EOS, etc... until we have exhausted all the types of EOS we need.  Is there
<a better way to do this?

	I am afraid there isn't really a better way to do it, if the
individual methods to compute equations of state are quite different,
as you seem to imply. Each PE of the Connection Machine can
communicate independently with any other PE. There is a trick by which
each can address its own memory (an array in it) independently of one
another also. But all PEs are constrained to do identical operations
(add/subtract/memory read etc), or none at all, at all times.

	If the computations are not *very* different from each other,
you may be able to "parametrize" them into a table of values and bring
the computation into a pure SIMD framework. For instance, compute all
logical functions using AND and NOT alone, with the right values taken
from a table and in the right order. Similarly, subtraction can be
avoided by using a multiplication by a table value and addition -- use
1 as the table value to get addition, and -1 to get subtraction etc. I
don't think anything more complicated than toy-programs can be
transformed efficiently this way.

<John K. Prentice

P J Narayanan

stein@dhw68k.cts.com (Rick 'Transputer' Stein) (01/09/91)

In article <12515@hubcap.clemson.edu> john%ghostwheel.unm.edu@ariel.unm.edu (John Prentice) writes:
>inherently a SIMD type problem.  However, different nodes in our mesh
>use different equations of state.  Some are tabular, some are
>analytic, some are iterative.  Clearly you can't perform all these
>calculations in lock step.  So, I assume we have to turn off all processors
>except those using a specific equation of state, do the EOS calculations for 
>those nodes, and then turn those processors off and turn on those for the next
>EOS, etc... until we have exhausted all the types of EOS we need.  Is there
>a better way to do this?
>John K. Prentice
If I understand SIMD class computation the way I think (and this always
suspect, at least at this time), you have two options:

1) either all cpus execute the same instruction,
2) or, they ignore it.

Hence, you've got to do one set of equations (or, equivalently operate)
on a type of data stream on x processors, feed them a no-op, and then do
the other data stream on y processors, while no-oping the other x cpus.

Apologies if this is b.s. (What can you say, I'm a MIMD-type at heart).
-- 
Richard M. Stein (aka, Rick 'Transputer' Stein)
Sole proprietor of Rick's Software Toxic Waste Dump and Kitty Litter Co.
"You build 'em, we bury 'em." uucp: ...{spsd, zardoz, felix}!dhw68k!stein

alan@msc.edu (Alan Klietz) (01/09/91)

>From: john%ghostwheel.unm.edu@ariel.unm.edu (John Prentice)
>
>I have a SIMD question motivated by the CM-2.  We are doing finite
>difference calculations on a simple structured grid.  It is trivial to
>port our code to the CM-2 to do the difference stencil since it is
>inherently a SIMD type problem.  However, different nodes in our mesh
>use different equations of state.  Some are tabular, some are
>analytic, some are iterative.  Clearly you can't perform all these
>calculations in lock step.  So, I assume we have to turn off all processors
>except those using a specific equation of state, do the EOS calculations for 
>those nodes, and then turn those processors off and turn on those for the next
>EOS, etc... until we have exhausted all the types of EOS we need.  Is there
>a better way to do this?

There a few of approaches you can take, depending on how much rewriting of 
your code you willing to do.

First, you could use a more general solution method which may be
more expensive to solve but is applicable to all nodes.  For example,
I wrote a ray tracer on the CM-2 that originally iterated over each type of
surface (cone, sphere, plane, etc.)  I discovered that it was faster
to just solve the general 3-D quartic equation for all surfaces rather than
using a n equations for each special surface.  In general, you could
start from physical principles and derive a new equation or you could do
it the brute force way: collect all terms in each (original) expression 
into a single large polynomial where various coefficients are zero
on different processors.

Second, you could use the voting trick that Frye and Myczkowski
used to solve Conway's blocks-in-a-box puzzle.  It only works
if your states are dynamic (perhaps not suitable in your case,
but it's a neat concept anyway :-).  Each processor 'votes' for the state
it would like to see executed and whichever state wins a plurality
of votes wins.  Some of the winning processors change state as a
result, and the cycle is repeated.  Watch out for deadlocks, though.

Third, you could wait a couple years for MSIMD machines to come out :-).

--
Alan E. Klietz
Minnesota Supercomputer Center, Inc.
1200 Washington Avenue South
Minneapolis, MN  55415
Ph: +1 612 626 1734    Internet: alan@msc.edu
				"If only the CM-2 acted more like a Cray 
				 and a Cray acted more like the CM-2."

mccalpin@perelandra.cms.udel.edu (John D. McCalpin) (01/09/91)

> On 8 Jan 91 13:49:05 GMT, john@ghostwheel.unm.edu (John Prentice) said:

John> I have a SIMD question motivated by the CM-2.  We are doing
John> finite difference calculations on a simple structured grid.
John> [....]  However, different nodes in our mesh use different
John> equations of state.  Some are tabular, some are analytic, some
John> are iterative.  Clearly you can't perform all these calculations
John> in lock step.  So, I assume we have to turn off all processors
John> except those using a specific equation of state, do the EOS
John> calculations for those nodes, and then turn those processors off
John> and turn on those for the next EOS, etc... until we have
John> exhausted all the types of EOS we need.  Is there a better way
John> to do this?

Not easily!

I talked with Danny Hillis about this last year.  In his original
plan, there were to be multiple instruction streams operating
simultaneously.  Each processor would have a few bits telling it which
instruction stream to process.  In the end they decided that most
cases could be handled by a simple masking bit, so that is what got
put it ---- I guess it was a lot cheaper and it had the advantage of
not requiring any more bandwidth from the front end. 

I was interested at the time because of the possibility of
implementing boundary condition code using alternate instruction
streams, but we also had users who had locally changing physics (cloud
modellers) who needed essentially the same thing that you are
discussing....

I still think that having several instruction streams (4 or 8 ?) would
greatly increase the flexibility of the machine.  It would allow a lot
of the benefits of MIMD processing with all the advantages of the SIMD
architecture....
--
John D. McCalpin			mccalpin@perelandra.cms.udel.edu
Assistant Professor			mccalpin@brahms.udel.edu
College of Marine Studies, U. Del.	J.MCCALPIN/OMNET

pratt@cs.stanford.edu (Vaughan Pratt) (01/10/91)

	I talked with Danny Hillis about this last year.  In his
	original plan, there were to be multiple instruction streams
	operating simultaneously.  Each processor would have a few bits
	telling it which instruction stream to process.  In the end
	they decided that most cases could be handled by a simple
	masking bit,

I'm all for fine-grain parallelism so long as each grain has a grain of
common sense---say around IQ 130 or so.

Grains this smart have a substantial volume today, but they'll get
finer.

Who will make this fine grain, said the little red hen.
-v