barmar@think.COM (Barry Margolin) (08/10/89)
In article <5796@pt.cs.cmu.edu> jps@cat.cmu.edu (James Salsman) writes:
>In article <2238@jhunix.HCF.JHU.EDU> ins_atge@jhunix.UUCP (Thomas G Edwards) writes:
>> And just because every processor gets the same instruction feed, one must not
>> think that every processor is "doing the same thing."  Each CM processor
>> can hold an index to an array located in that processor,
>> so with the right software, the CM could become a MIMD machine.
>
>Please do not attempt this.  I have tried for months to
>optimize a MIMD-emulator on the CM-2 with very disappointing
>results.  It might be a fun class project for a bunch of
>novice hackers learning parallelism.

Right.  I assume Thomas was thinking of keeping a macro-instruction
stream in the CM processor memory and stepping through that using the
indirect addressing feature.  You could then implement a parallel
simulator to interpret the macro-instruction stream.

However, only a small part of that interpreter can actually run fully
parallel.  Once the macro-instruction is decoded and it's time to
actually execute it, you essentially have a big CASE (or, in *Lisp
syntax, *COND) statement.  Each branch will be executed in parallel by
all the processors that pass its associated test, but you still have to
run through all the tests sequentially.  The code might look something
like:

  (*cond ((load-op-p opcode) (do-load operands))
         ((store-op-p opcode) (do-store operands))
         ((add-op-p opcode) (do-add operands))
         ...)

This will first do all the simultaneous LOAD operations in parallel,
then all the simultaneous STOREs, then all the simultaneous ADDs, etc.
Therefore, the speed of this MIMD simulator will be O(n), where n is
the number of different opcodes.  I can think of some ways to adjust
the above tests dynamically to change it to O(m), where m is the
average number of different opcodes being executed simultaneously, but
that's still not very good.

It'll never be a very good MIMD machine.  The individual processors
aren't very powerful.  It might be reasonable for research on
programming techniques for use on kiloprocessor MIMD machines until the
real things come along (it's probably better than trying to simulate
them on uniprocessors, since the performance there is O(p), where p is
the number of processors being simulated).

Barry Margolin
Thinking Machines Corp.

barmar@think.com
{uunet,harvard}!think!barmar
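One possible reading of the dynamic-test adjustment Barry alludes to
above (a sketch only, not TMC code, reusing the hypothetical LOAD-OP-P
/ DO-LOAD helpers and the OPCODE / OPERANDS pvars from his example) is
to have the front end do a cheap global-OR per opcode and skip any
branch that no processor wants this cycle:

  ;;; Sketch only.  OPCODE and OPERANDS are the per-processor pvars
  ;;; from the *COND above; LOAD-OP-P etc. are the same hypothetical
  ;;; parallel predicates and branch bodies.
  (defun step-mimd-cycle ()
    (when (*or (load-op-p opcode))     ; global-OR: any processor LOADing?
      (*when (load-op-p opcode)        ; select just those processors
        (do-load operands)))
    (when (*or (store-op-p opcode))
      (*when (store-op-p opcode)
        (do-store operands)))
    (when (*or (add-op-p opcode))
      (*when (add-op-p opcode)
        (do-add operands)))
    ;; ... one such clause per opcode in the virtual instruction set ...
    )

The n global-OR tests are still scanned sequentially, but they are
cheap next to the branch bodies, so when the bodies dominate the
per-cycle cost tracks m (the opcodes actually present) rather than n.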
rjc@maui.cs.ucla.edu (Robert Collins) (08/10/89)
In article <26356@news.Think.COM> barmar@kulla.UUCP (Barry Margolin) writes:
>In article <5796@pt.cs.cmu.edu> jps@cat.cmu.edu (James Salsman) writes:
>>In article <2238@jhunix.HCF.JHU.EDU> ins_atge@jhunix.UUCP (Thomas G Edwards) writes:
>>> And just because every processor gets the same instruction feed, one must not
>>> think that every processor is "doing the same thing."  Each CM processor
>>> can hold an index to an array located in that processor,
>>> so with the right software, the CM could become a MIMD machine.
>>
>>Please do not attempt this.  I have tried for months to
>>optimize a MIMD-emulator on the CM-2 with very disappointing
>>results.  It might be a fun class project for a bunch of
>>novice hackers learning parallelism.

Please do attempt this.  It is both kinda fun and will definitely teach
you a whole lot about SIMD programming and performance issues.  It is
pretty easy to get something running.  I think the first major problem
you will hit is: what the hell do you do with 64k MIMD processors (that
wouldn't be easier to do with SIMD)?

> [ ... ]
>This will first do all the simultaneous LOAD operations in parallel,
>then all the simultaneous STOREs, then all the simultaneous ADDs, etc.
>Therefore, the speed of this MIMD simulator will be O(n), where n is
>the number of different opcodes.  I can think of some ways to adjust
>the above tests dynamically to change it to O(m), where m is the
>average number of different opcodes being executed simultaneously, but
>that's still not very good.

Right.  There are lots of things you can do to optimize the throughput
of the virtual MIMD machine, especially if you include expensive
(slow), rarely used virtual MIMD instructions in your instruction set.
What really matters is the `effective instruction set size' (Barry's
m).  I looked into the design/performance/implementation issues for my
MS thesis (now I just have to finish writing it up!).

>It'll never be a very good MIMD machine.  The individual processors
>aren't very powerful.  It might be reasonable for research on
>programming techniques for use on kiloprocessor MIMD machines until
>the real things come along (it's probably better than trying to
>simulate them on uniprocessors, since the performance there is O(p),
>where p is the number of processors being simulated).

I think if you are serious about doing MIMD computation, you want to
buy a MIMD computer.  If you are doing research for which a simulator
is good enough (and you have a CM sitting around), this is a good way
to go.  It will kick a** over a uniprocessor simulator.  I think the CM
is advertised as 2500 MIPS.  The best I got from any of my emulated
machines was about 450 MIPS.  I typically would get 100-200 MIPS from a
variety of emulated architectures/instruction set designs.  That is
between 4% and 18% of `full speed'.  All things considered, this is not
too bad (IMHO).

>Barry Margolin
>Thinking Machines Corp.
>
>barmar@think.com
>{uunet,harvard}!think!barmar

As Barry mentioned, SIMD conditional statements can lead to really bad
performance.  What do you do if your SIMD program requires a wide
conditional?  One option is to compile the conditional statement to
virtual MIMD code and emulate it as above.  In some cases, this might
be a big win, because you can potentially take advantage of more of the
inherent parallelism.  I view this as a new kind of SIMD language
compiler optimization.
rob collins
-------------------------------------------------------------------------------
rjc@cs.ucla.edu           C++/Paris on the CM2:  The only way to fly.
-------------------------------------------------------------------------------
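A toy *Lisp sketch of the compile-the-wide-conditional idea Rob
describes above (an illustration under assumed names, not the scheme
from his thesis): each processor first records, in a per-processor
`virtual opcode' pvar, which branch of the wide conditional it would
take, and a dispatch pass then executes only the branches actually
present, skipping absent ones with a cheap front-end global-OR.  The
BRANCH-TEST-n!! and DO-BRANCH-n names are made up for the example.

  ;;; Sketch only: BRANCH-TEST-n!! return boolean pvars, DO-BRANCH-n do
  ;;; the real (expensive) work; BRANCH-OP!! is a per-processor pvar
  ;;; used as the virtual opcode.
  (*defvar branch-op!! (!! 0))

  (defun record-branches ()
    ;; Cheap pass: each processor just notes which branch it would take.
    (*cond ((branch-test-1!!) (*set branch-op!! (!! 1)))
           ((branch-test-2!!) (*set branch-op!! (!! 2)))
           ((!! t)            (*set branch-op!! (!! 3)))))

  (defun execute-branches ()
    ;; Expensive pass: run a branch body only where it is actually wanted.
    (when (*or (=!! branch-op!! (!! 1)))
      (*when (=!! branch-op!! (!! 1)) (do-branch-1)))
    (when (*or (=!! branch-op!! (!! 2)))
      (*when (=!! branch-op!! (!! 2)) (do-branch-2)))
    (when (*or (=!! branch-op!! (!! 3)))
      (*when (=!! branch-op!! (!! 3)) (do-branch-3))))

The recording pass is cheap (each clause is just a *SET), so only the
branches some processor actually wants pay for their bodies -- the same
`effective instruction set size' effect Rob describes above.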