john%ghostwheel.unm.edu@ariel.unm.edu (John Prentice) (01/08/91)
I have a SIMD question motivated by the CM-2. We are doing finite difference calculations on a simple structured grid. It is trivial to port our code to the CM-2 to do the difference stencil since it is inherently a SIMD type problem. However, different nodes in our mesh use different equations of state. Some are tabular, some are analytic, some are iterative. Clearly you can't perform all these calculations in lock step. So, I assume we have to turn off all processors except those using a specific equation of state, do the EOS calculations for those nodes, and then turn those processors off and turn on those for the next EOS, etc... until we have exhausted all the types of EOS we need. Is there a better way to do this?

John K. Prentice
Amparo Corporation
john@unmfys.unm.edu
pjn@cs.UMD.EDU (P. J. Narayanan) (01/09/91)
<I have a SIMD question motivated by the CM-2. We are doing finite
<difference calculations on a simple structured grid. It is trivial to
<port our code to the CM-2 to do the difference stencil since it is
<inherently a SIMD type problem. However, different nodes in our mesh
<use different equations of state. Some are tabular, some are
<analytic, some are iterative. Clearly you can't perform all these
<calculations in lock step. So, I assume we have to turn off all processors
<except those using a specific equation of state, do the EOS calculations for
<those nodes, and then turn those processors off and turn on those for the next
<EOS, etc... until we have exhausted all the types of EOS we need. Is there
<a better way to do this?

I am afraid there isn't really a better way to do it, if the individual methods to compute equations of state are quite different, as you seem to imply.

Each PE of the Connection Machine can communicate independently with any other PE. There is a trick by which each can address its own memory (an array in it) independently of one another also. But all PEs are constrained to do identical operations (add/subtract/memory read etc), or none at all, at all times.

If the computations are not *very* different from each other, you may be able to "parametrize" them into a table of values and bring the computation into a pure SIMD framework. For instance, compute all logical functions using AND and NOT alone, with the right values taken from a table and in the right order. Similarly, subtraction can be avoided by using a multiplication by a table value and addition: use 1 as the table value to get addition, and -1 to get subtraction etc. I don't think anything more complicated than toy-programs can be transformed efficiently this way.

<John K. Prentice

P J Narayanan
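The add/subtract parametrization described above can be sketched in a few lines. This is only an illustration in plain Python, not CM-2 code: each list index stands for one PE, and the names `uniform_add_or_subtract` and `sign_table` are made up for the example.

```python
# Each list index stands for one processing element (PE).
# Illustration only; the function and table names are hypothetical.

def uniform_add_or_subtract(a, b, sign_table):
    """Every PE performs the same multiply-then-add.

    The per-PE table value (+1 or -1) decides whether the net effect
    on that PE is a + b or a - b, so no PE has to be switched off.
    """
    return [x + s * y for x, y, s in zip(a, b, sign_table)]


a = [10, 20, 30, 40]
b = [1, 2, 3, 4]
sign_table = [1, -1, 1, -1]  # +1 means "add" on that PE, -1 means "subtract"

result = uniform_add_or_subtract(a, b, sign_table)
print(result)
```

The price is that every PE pays for a multiplication it may not need, which is one reason this approach stops paying off once the per-PE computations differ by more than a few constants.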
stein@dhw68k.cts.com (Rick 'Transputer' Stein) (01/09/91)
In article <12515@hubcap.clemson.edu> john%ghostwheel.unm.edu@ariel.unm.edu (John Prentice) writes:
>inherently a SIMD type problem. However, different nodes in our mesh
>use different equations of state. Some are tabular, some are
>analytic, some are iterative. Clearly you can't perform all these
>calculations in lock step. So, I assume we have to turn off all processors
>except those using a specific equation of state, do the EOS calculations for
>those nodes, and then turn those processors off and turn on those for the next
>EOS, etc... until we have exhausted all the types of EOS we need. Is there
>a better way to do this?
>John K. Prentice

If I understand SIMD-class computation the way I think I do (and this is always suspect, at least at this time), each cpu has two options: 1) execute the same instruction as all the others, or 2) ignore it. Hence, you've got to do one set of equations (or, equivalently, operate on one type of data stream) on x processors while the rest get a no-op, and then do the other data stream on y processors while no-oping the x cpus.

Apologies if this is b.s. (What can you say, I'm a MIMD-type at heart).
--
Richard M. Stein (aka, Rick 'Transputer' Stein)
Sole proprietor of Rick's Software Toxic Waste Dump and Kitty Litter Co.
"You build 'em, we bury 'em."
uucp: ...{spsd, zardoz, felix}!dhw68k!stein
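The execute-or-ignore scheme discussed in this thread can be sketched as one masked pass per EOS type. The sketch below is plain Python with one list element per processor. The three EOS functions, the table contents, and the constants are all invented stand-ins, not anyone's actual physics.

```python
# Hypothetical stand-ins for three kinds of equation of state (EOS).

def eos_analytic(rho, e):
    # Ideal-gas-like form p = (gamma - 1) * rho * e with gamma = 1.4 assumed.
    return 0.4 * rho * e

def eos_tabular(rho, e):
    # Lookup in a made-up table indexed by int(rho), clamped to the last entry.
    table = [0.0, 1.0, 4.0, 9.0]
    return table[min(int(rho), len(table) - 1)]

def eos_iterative(rho, e):
    # Newton iteration for p * p = rho * e, a made-up implicit relation.
    target = rho * e
    p = max(target, 1.0)
    for _ in range(30):
        p = 0.5 * (p + target / p)
    return p

EOS_TABLE = {
    "analytic": eos_analytic,
    "tabular": eos_tabular,
    "iterative": eos_iterative,
}

def masked_eos_sweep(eos_kind, rho, e):
    """Run one pass per EOS type; only the processors whose mask is set update."""
    pressure = [0.0] * len(rho)
    passes = 0
    for kind, fn in EOS_TABLE.items():
        mask = [k == kind for k in eos_kind]  # per-processor on/off flag
        if not any(mask):                     # nobody uses this EOS: skip the pass
            continue
        passes += 1
        for i, active in enumerate(mask):
            if active:                        # inactive processors sit idle
                pressure[i] = fn(rho[i], e[i])
    return pressure, passes


eos_kind = ["analytic", "tabular", "analytic", "iterative"]
rho = [1.0, 2.0, 3.0, 4.0]
e = [2.5, 1.0, 1.0, 1.0]
pressure, passes = masked_eos_sweep(eos_kind, rho, e)
print(pressure, passes)
```

On a real SIMD machine the inner loop is a single broadcast step, so the elapsed time is roughly the sum of the per-EOS costs no matter how few processors use each one. Skipping EOS types that no processor needs is the only saving this sketch makes.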
alan@msc.edu (Alan Klietz) (01/09/91)
>From: john%ghostwheel.unm.edu@ariel.unm.edu (John Prentice)
>
>I have a SIMD question motivated by the CM-2. We are doing finite
>difference calculations on a simple structured grid. It is trivial to
>port our code to the CM-2 to do the difference stencil since it is
>inherently a SIMD type problem. However, different nodes in our mesh
>use different equations of state. Some are tabular, some are
>analytic, some are iterative. Clearly you can't perform all these
>calculations in lock step. So, I assume we have to turn off all processors
>except those using a specific equation of state, do the EOS calculations for
>those nodes, and then turn those processors off and turn on those for the next
>EOS, etc... until we have exhausted all the types of EOS we need. Is there
>a better way to do this?

There are a few approaches you can take, depending on how much rewriting of your code you are willing to do.

First, you could use a more general solution method which may be more expensive to solve but is applicable to all nodes. For example, I wrote a ray tracer on the CM-2 that originally iterated over each type of surface (cone, sphere, plane, etc.). I discovered that it was faster to just solve the general 3-D quartic equation for all surfaces rather than using n equations for each special surface. In general, you could start from physical principles and derive a new equation, or you could do it the brute force way: collect all terms in each (original) expression into a single large polynomial where various coefficients are zero on different processors.

Second, you could use the voting trick that Frye and Myczkowski used to solve Conway's blocks-in-a-box puzzle. It only works if your states are dynamic (perhaps not suitable in your case, but it's a neat concept anyway :-). Each processor 'votes' for the state it would like to see executed, and whichever state wins a plurality of votes wins. Some of the winning processors change state as a result, and the cycle is repeated. Watch out for deadlocks, though.

Third, you could wait a couple years for MSIMD machines to come out :-).
--
Alan E. Klietz
Minnesota Supercomputer Center, Inc.
1200 Washington Avenue South
Minneapolis, MN 55415
Ph: +1 612 626 1734
Internet: alan@msc.edu
"If only the CM-2 acted more like a Cray and a Cray acted more like the CM-2."
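Two of the ideas in this post can be sketched in plain Python, with one list element per processor. The first function is the brute-force "single large polynomial" approach: every processor runs the same Horner steps and terms it does not need carry a zero coefficient. The second is a guess at the plurality-voting loop based only on the description in the post, not on the published implementation. It assumes every winning processor changes state, and adds a hypothetical `max_rounds` cap as a crude guard against runs that never finish. All names are invented for the example.

```python
from collections import Counter

def horner_all(coeffs_per_pe, x_per_pe):
    """Evaluate one polynomial per processor with identical steps on all of them.

    coeffs_per_pe[i][k] is the coefficient of x**k on processor i; every
    processor holds the same number of coefficients, padded with zeros.
    """
    degree = len(coeffs_per_pe[0]) - 1
    acc = [c[degree] for c in coeffs_per_pe]
    for k in range(degree - 1, -1, -1):
        acc = [a * x + c[k] for a, x, c in zip(acc, x_per_pe, coeffs_per_pe)]
    return acc

def run_by_plurality(states, transitions, max_rounds=1000):
    """Repeatedly execute whichever state the most processors are waiting on.

    states: current state label per processor (None means finished).
    transitions: maps a state label to the next label (or None).
    Returns the order in which states were broadcast.
    """
    schedule = []
    for _ in range(max_rounds):
        live = [s for s in states if s is not None]
        if not live:
            return schedule
        winner = Counter(live).most_common(1)[0][0]  # plurality vote
        schedule.append(winner)
        # Only processors that voted for the winner execute and move on.
        states = [transitions[s] if s == winner else s for s in states]
    if all(s is None for s in states):
        return schedule
    raise RuntimeError("processors did not all finish within max_rounds")


# Processor 0 evaluates 2x + 1 at x = 5; processor 1 evaluates x^2 - 3 at x = 4.
poly_values = horner_all([[1, 2, 0], [-3, 0, 1]], [5, 4])

schedule = run_by_plurality(
    ["A", "A", "B", "A", "C"],
    {"A": "B", "B": "C", "C": None},
)
print(poly_values, schedule)
```

In the voting example the processors start spread over three states but converge after the first round, so three broadcasts finish the run. The general polynomial trades wasted multiplications by zero for the absence of any masking at all.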
mccalpin@perelandra.cms.udel.edu (John D. McCalpin) (01/09/91)
> On 8 Jan 91 13:49:05 GMT, john@ghostwheel.unm.edu (John Prentice) said:
John> I have a SIMD question motivated by the CM-2. We are doing
John> finite difference calculations on a simple structured grid.
John> [....] However, different nodes in our mesh use different
John> equations of state. Some are tabular, some are analytic, some
John> are iterative. Clearly you can't perform all these calculations
John> in lock step. So, I assume we have to turn off all processors
John> except those using a specific equation of state, do the EOS
John> calculations for those nodes, and then turn those processors off
John> and turn on those for the next EOS, etc... until we have
John> exhausted all the types of EOS we need. Is there a better way
John> to do this?
Not easily!
I talked with Danny Hillis about this last year. In his original
plan, there were to be multiple instruction streams operating
simultaneously. Each processor would have a few bits telling it which
instruction stream to process. In the end they decided that most
cases could be handled by a simple masking bit, so that is what got
put in -- I guess it was a lot cheaper and it had the advantage of
not requiring any more bandwidth from the front end.
I was interested at the time because of the possibility of
implementing boundary condition code using alternate instruction
streams, but we also had users who had locally changing physics (cloud
modellers) who needed essentially the same thing that you are
discussing....
I still think that having several instruction streams (4 or 8 ?) would
greatly increase the flexibility of the machine. It would allow a lot
of the benefits of MIMD processing with all the advantages of the SIMD
architecture....
--
John D. McCalpin mccalpin@perelandra.cms.udel.edu
Assistant Professor mccalpin@brahms.udel.edu
College of Marine Studies, U. Del. J.MCCALPIN/OMNET
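A toy model may help show what a few stream-select bits per processor would buy over a single mask bit. This is a sketch of the idea as described in the post above, not of any actual hardware. The "instructions" are ordinary Python functions, and the names `run_multistream`, `run_masked`, and `stream_select` are invented for the illustration.

```python
def run_multistream(streams, stream_select, values):
    """Each cycle, every processor executes the next instruction of ITS stream.

    streams: list of instruction lists (each instruction is a one-argument function).
    stream_select: per-processor index saying which stream to follow.
    Returns (final values, number of broadcast cycles).
    """
    cycles = max(len(s) for s in streams)
    for t in range(cycles):
        values = [
            streams[sel][t](v) if t < len(streams[sel]) else v  # idle once done
            for sel, v in zip(stream_select, values)
        ]
    return values, cycles

def run_masked(streams, stream_select, values):
    """Single instruction stream plus a mask: the streams run one after another."""
    cycles = 0
    for sid, stream in enumerate(streams):
        for instr in stream:
            values = [
                instr(v) if sel == sid else v  # masked-off processors ignore it
                for sel, v in zip(stream_select, values)
            ]
            cycles += 1
    return values, cycles


interior = [lambda v: v * 2, lambda v: v + 1]  # hypothetical interior update
boundary = [lambda v: 0]                       # hypothetical boundary condition
streams = [interior, boundary]
stream_select = [1, 0, 0, 1]                   # the end points are boundary nodes
values = [5, 6, 7, 8]

multi_values, multi_cycles = run_multistream(streams, stream_select, values)
mask_values, mask_cycles = run_masked(streams, stream_select, values)
print(multi_values, multi_cycles, mask_values, mask_cycles)
```

Both schemes produce the same values. In this model the multi-stream version takes as many cycles as the longest stream, while the masked version takes the sum of all stream lengths, at the cost of the front end broadcasting several instructions per cycle.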
pratt@cs.stanford.edu (Vaughan Pratt) (01/10/91)
> I talked with Danny Hillis about this last year. In his original
> plan, there were to be multiple instruction streams operating
> simultaneously. Each processor would have a few bits telling it which
> instruction stream to process. In the end they decided that most
> cases could be handled by a simple masking bit,

I'm all for fine-grain parallelism so long as each grain has a grain of common sense---say around IQ 130 or so. Grains this smart have a substantial volume today, but they'll get finer.

Who will make this fine grain, said the little red hen.

-v