jps@cat.cmu.edu (James Salsman) (06/30/89)
I really hate having to deal with indirect addressing on most SIMD machines. I wish someone would build a SIMD array using PE's with address buffers. Just one tiny address buffer per processor is all I want... nothing fancy.

As long as *ALL* the memory addresses have to come over the global instruction stream and thus are the *SAME* for each element, a lot of potential processing power is going to waste!

For example, on the Connection Machine in *Lisp, indirect aref!!'s take FOR EVER. This is SERIOUSLY slowing down the Production System that I wrote in *Lisp (regardless, it's faster than CMU/Soar's Production System Machine or any other implementation of a production system that I've heard about).

TMC has added something called "sideways arrays" to help indirect addressing, but the *Lisp manual is totally obscure (so what else is new) and from what I can tell, it looks like "sideways" means "spread out over several physical processors." Ack/Pft!

The way I would hack an address buffer in to the CM is by employing a shift register added to each PE.

(1) Add a new nanoinstruction pin [or two] that selects memory input to the ALU between "Address A [or B]" from the instruction pins and the contents of the indirection register.

(2) Add a new nanoinstruction pin that causes the output from the ALU to be shifted into the indirection register.

That's all there is to it. Two or three new pins, a shift register, and PE memory indirection takes 13 nanocycles instead of a zillion.

I am sure that a similar thing could be done to other SIMD architectures. If anybody thinks that indirect addressing is not worth a register and a couple of new nanopins.

:James

P.S. If anyone wants to use this idea :-) it's free -- I think patents are morally wrong.

--
:James P. Salsman (jps@CAT.CMU.EDU)
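
To make the proposal above concrete, here is a minimal Python sketch of a SIMD array in which each PE has one indirection register. It is a toy model, not the Connection Machine's actual hardware or the way *Lisp implements aref!!. The names `PE`, `broadcast_read`, `shift_into_indirection` and `emulated_indirect_read` are all hypothetical, and the sweep in `emulated_indirect_read` is only one naive way to fake indirection when every PE must use the same broadcast address.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PE:
    """One processing element with local memory (toy model)."""
    memory: List[int]
    indirection: int = 0  # hypothetical per-PE address register


def broadcast_read(pes, global_addr, use_indirection=False):
    """One SIMD step: every PE reads one word of its own memory.

    With use_indirection=False all PEs use the broadcast address.
    With use_indirection=True each PE uses its own indirection register,
    which stands in for the extra select pin in step (1).
    """
    return [
        pe.memory[pe.indirection if use_indirection else global_addr]
        for pe in pes
    ]


def shift_into_indirection(pes, pointer_addr):
    """Stand-in for step (2): load each PE's register from its own memory."""
    for pe in pes:
        pe.indirection = pe.memory[pointer_addr]


def emulated_indirect_read(pes, pointer_addr):
    """Naive indirection without the register (an assumption for illustration).

    Sweep every possible address; a PE keeps the value only when the
    broadcast address matches its pointer. Cost grows with memory size.
    """
    result = [None] * len(pes)
    steps = 0
    for addr in range(len(pes[0].memory)):
        steps += 1
        values = broadcast_read(pes, addr)
        for i, pe in enumerate(pes):
            if pe.memory[pointer_addr] == addr:
                result[i] = values[i]
    return result, steps


# Word 0 of each PE's memory holds that PE's pointer.
pes = [PE([3, 10, 20, 30]), PE([1, 11, 21, 31]), PE([2, 12, 22, 32])]

swept, sweep_steps = emulated_indirect_read(pes, pointer_addr=0)

shift_into_indirection(pes, pointer_addr=0)
direct = broadcast_read(pes, global_addr=0, use_indirection=True)

print(swept, sweep_steps)  # [30, 11, 22] 4
print(direct)              # [30, 11, 22]
```

Both paths return the same values, but the sweep needs one broadcast step per memory location while the register path needs a load and a single read. The step counts here are properties of the toy model only and say nothing about real nanocycle timings.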
prins@prins.cs.unc.edu (Jan Prins) (07/03/89)
In article <5886@hubcap.clemson.edu>, jps@cat.cmu.edu (James Salsman) writes:
> I really hate having to deal with indirect addressing on
> most SIMD machines. I wish someone would build a SIMD array
> using PE's with address buffers. [...]

Early proposals for SIMD parallel computers included indirect addressing. But when you build a massively parallel processor, the wiring to bring 65K (or however many) individual addresses out to the memories from the PEs is daunting.

> The way I would hack an address buffer in to the CM is by
> employing a shift register added to each PE.
>
> (1) Add a new nanoinstruction pin [or two] that selects
> memory input to the ALU between "Address A [or B]"
> from the instruction pins and the contents of the
> indirection register.
>
> (2) Add a new nanoinstruction pin that causes the output
> from the ALU to be shifted into the indirection register.
>
> That's all there is to it. Two or three new pins, a shift
> register, and PE memory indirection takes 13 nanocycles
> instead of a zillion.

Where would this shift register reside? If it is on chip with the PEs, then you suddenly have a lot of extra address lines to bring off chip -- with 16 PEs, and 16 bits of local addressing that amounts to 256 extra wires! If the register is off-chip, say in the memory, then you can fill it bit-serially without extra wires but you need logic to use it as an address. I was under the impression that TMC used standard memory parts (or was that only for the CM-1?), so the latter approach would be extremely cumbersome in that setting.

> I am sure that a similar thing could be done to other SIMD
> architectures. [...]

There are examples of massively-parallel SIMD architectures that support indirect addressing. One of them is BLITZEN, an extension of MPP that permits the contents of the PE shift register to be used as a local modification to the global address.
The wiring problem is solved by placing PEs and memories on the same chip, although this approach limits the size of local memory so that very fast I/O to external memory is required. The current BLITZEN design places 128 PEs, each with 1K of local memory, per chip.

> :James P. Salsman (jps@CAT.CMU.EDU)

Jan Prins (prins@cs.unc.edu)
Dept. of Computer Science
UNC - Chapel Hill

Blevins, Davis, Heaton, Reif, "BLITZEN: A Highly Integrated Massively Parallel Machine", Frontiers Mass. Par. Comp. 1988.
Heaton, Blevins, "BLITZEN: A VLSI Array Processing Chip", IEEE CICC 1989.
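
The wire count quoted in the reply (16 PEs times 16 address bits gives 256 extra wires) can be checked with a short Python sketch. The function names are hypothetical. The BLITZEN figure at the end assumes that "1K of local memory" means 1024 individually addressable locations per PE, which the post does not state.

```python
def address_bits_for(locations: int) -> int:
    """Smallest number of address bits that can select `locations` cells."""
    if locations < 1:
        raise ValueError("need at least one location")
    return (locations - 1).bit_length()


def extra_address_wires(pes_per_chip: int, address_bits: int) -> int:
    """Address lines needed if every PE drives its own full address."""
    return pes_per_chip * address_bits


# Figures from the reply: 16 PEs per chip, 16 bits of local addressing.
print(extra_address_wires(16, 16))  # 256

# BLITZEN figures from the reply, under the 1024-location assumption.
bits = address_bits_for(1024)
print(bits)                            # 10
print(extra_address_wires(128, bits))  # 1280
```

Under that assumption a BLITZEN chip would need on the order of a thousand per-PE address lines, which is consistent with the point that such wiring is only practical when the PEs and their memories share a chip.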