[comp.arch] MIMD on Connection Machine

barmar@think.COM (Barry Margolin) (08/10/89)

In article <5796@pt.cs.cmu.edu> jps@cat.cmu.edu (James Salsman) writes:
>In article <2238@jhunix.HCF.JHU.EDU> ins_atge@jhunix.UUCP (Thomas G Edwards) writes:
>> And just because every processor gets the same instruction feed, one must not
>> think that every processor is "doing the same thing." Each CM processor
>> can hold an index to an array located in that processor,
>> so with the right software, the CM could become a MIMD machine.  
>
>Please do not attempt this.  I have tried for months to
>optimize a MIMD-emulator on the CM-2 with very disappointing
>results.  It might be a fun class project for a bunch of
>novice hackers learning parallelism.

Right.  I assume Thomas was thinking of keeping a macro-instruction
stream in the CM processor memory and stepping through that using the
indirect addressing feature.  You could then implement a parallel
simulator to interpret the macro-instruction stream.  However, only a
small part of that interpreter can actually run fully parallel.  Once
the macro-instruction is decoded and it's time to actually execue it,
you essentially have a big CASE (or, in *Lisp syntax, *COND)
statement.  Each branch will be executed in parallel by all the
processors that pass its associated test, but you still have to run
through all the tests sequentially.  The code might look something
like:

(*cond ((load-op-p opcode) (do-load operands))
       ((store-op-p opcode) (do-store operands))
       ((add-op-p opcode) (do-add operands))
       ...)

This will first do all the simultaneous LOAD operations in parallel,
then all the simultaneous STOREs, then all the simultaneous ADDs, etc.
Therefore, the speed of this MIMD simulator will be O(n), where n is
the number of different opcodes.  I can think of some ways to adjust
the above tests dynamically to change it to O(m), where m is the
average number of different opcodes being executed simultaneously, but
that's still not very good.

It'll never be a very good MIMD machine.  The individual processors
aren't very powerful.  It might be reasonable for research on
programming techniques for use on kiloprocessor MIMD machines until
the real things come along (it's probably better than trying to
simulate them on uniprocessors, since the performance there is O(p),
where p is tne number of processors being simulated).

Barry Margolin
Thinking Machines Corp.

barmar@think.com
{uunet,harvard}!think!barmar

rjc@maui.cs.ucla.edu (Robert Collins) (08/10/89)

In article <26356@news.Think.COM> barmar@kulla.UUCP (Barry Margolin) writes:
>In article <5796@pt.cs.cmu.edu> jps@cat.cmu.edu (James Salsman) writes:
>>In article <2238@jhunix.HCF.JHU.EDU> ins_atge@jhunix.UUCP (Thomas G Edwards) writes:
>>> And just because every processor gets the same instruction feed, one must not
>>> think that every processor is "doing the same thing." Each CM processor
>>> can hold an index to an array located in that processor,
>>> so with the right software, the CM could become a MIMD machine.  
>>
>>Please do not attempt this.  I have tried for months to
>>optimize a MIMD-emulator on the CM-2 with very disappointing
>>results.  It might be a fun class project for a bunch of
>>novice hackers learning parallelism.

Please do attempt this.  It is both kinda fun and definitely will
teach you a whole lot about SIMD programming and performance
issues.  It is pretty easy to get something running.  I think
that the first major problem that you will hit is:  What the hell
do you do with 64k MIMD processors??? (that wouldn't be easier to
do with SIMD).

> [ ... ]
>This will first do all the simultaneous LOAD operations in parallel,
>then all the simultaneous STOREs, then all the simultaneous ADDs, etc.
>Therefore, the speed of this MIMD simulator will be O(n), where n is
>the number of different opcodes.  I can think of some ways to adjust
>the above tests dynamically to change it to O(m), where m is the
>average number of different opcodes being executed simultaneously, but
>that's still not very good.

Right.  There are lots of things you can do to optimize the throughput
of the virtual MIMD machine, especially if you include expensive (slow),
rarely used virtual MIMD instructions, in your instruction set.  What
really matters is the `effective instruction set size' (Barry's m).

I looked into the design/performance/implementation issues for my MS
thesis (now I just have to finish writing it up!).

>It'll never be a very good MIMD machine.  The individual processors
>aren't very powerful.  It might be reasonable for research on
>programming techniques for use on kiloprocessor MIMD machines until
>the real things come along (it's probably better than trying to
>simulate them on uniprocessors, since the performance there is O(p),
>where p is tne number of processors being simulated).

I think if you are serious about doing MIMD computation, you want to
buy a MIMD computer.  If you are doing research for which a
simulator is good enough (and you have a CM sitting around), this is
a good way to go.  It will kick a** over a uniprocessor simulator.

I think the CM is advertised as 2500 MIPS.  The best I got from any of
my emulated machines was about 450 MIPS.  I typically would get
100-200 MIPS from a variety of emulated architectures/instruction set
designs.  That is between 4% and 18% of `full speed'.  All things
considered, this is not too bad (IMHO).

>Barry Margolin
>Thinking Machines Corp.
>
>barmar@think.com
>{uunet,harvard}!think!barmar

As was Barry mentioned, SIMD conditional statements can lead to really
bad performance.  What do you do if your SIMD program requires a
wide conditional?  One option is to compile the conditional statement
to virtual MIMD code, and emulated it as above.  In some cases, this
might be a big win, because you can potentially take advantage of
more of the inherent parallelism.  I view this as a new kind of SIMD
language compiler optimization.

rob collins

-------------------------------------------------------------------------------
rjc@cs.ucla.edu	                    C++/Paris on the CM2:  The only way to fly.
-------------------------------------------------------------------------------