[comp.arch] parallel pipelines

levisonm@qucis.queensu.CA (Mark Levison) (06/29/89)

    I have been wondering for a while now why the only pipeline
designs that I have seen are all sequential.

i.e.  |
      |  stage 1 (3 clock cycles)
      v
      - <-----  buffer 1 (1 clock cycle delay)
      |
      |  stage 2 (6 clock cycles)
      v
      - <-----  buffer 2 (1 clock cycle delay)
      |
      |  stage 3 (2 clock cycles)
      v

    The traditional approach to solving the problem that stage 2 doubles
the pipeline cycle time is to split it into two stages.

i.e.  |
      |  stage 1 (3 clock cycles)
      v
      - <-----  buffer 1 (1 clock cycle delay)
      |
      |  stage 2 (3 clock cycles)
      v
      - <-----  buffer 2 (1 clock cycle delay)
      |
      |  stage 2.5 (3 clock cycles)
      v
      - <-----  buffer 3 (1 clock cycle delay)
      |
      |  stage 3 (2 clock cycles)
      v
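
The arithmetic behind the two diagrams above can be sketched in a few
lines. This is only an illustration under assumptions that are not
spelled out in the post: every buffer between two stages costs one
cycle of latency, and a new item can enter only as often as the slowest
stage finishes. The name `linear_pipeline` is made up for this example.

```python
def linear_pipeline(stage_cycles, buffer_cycles=1):
    """Return (latency, interval) in clock cycles for a linear pipeline.

    latency  = time for one item to pass through every stage and buffer
    interval = cycles between successive results in steady state
    Assumption: the slowest stage alone sets the interval.
    """
    n_buffers = len(stage_cycles) - 1          # one buffer between neighbours
    latency = sum(stage_cycles) + buffer_cycles * n_buffers
    interval = max(stage_cycles)
    return latency, interval


print(linear_pipeline([3, 6, 2]))     # original layout   -> (13, 6)
print(linear_pipeline([3, 3, 3, 2]))  # stage 2 split     -> (14, 3)
```

Under these assumptions the split halves the interval from 6 cycles to
3 but raises the latency from 13 cycles to 14 because of the extra
buffer.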

   But this has two potential problems. First, it may not be possible
to break the original stage 2 into two parts. Second, the split adds
another one-clock-cycle buffer delay, which increases the latency. So
why not create two instances of stage 2 and time-multiplex them? Item
one goes through stage 2 (a) and item two goes through stage 2 (b).
When the third item is ready, stage 2 (a) is free again, and so on.


      |
      |  stage 1 (3 clock cycles)
      v
     ----    buffer delay (1 clock cycle)
   |      |
   | (a)  | (b) stage 2 (6 clock cycles)
   v      v
     ----    buffer delay (1 clock cycle)
      |
      |  stage 3 (2 clock cycles)
      v
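
A rough way to check the timing of the duplicated stage is a small
simulation. It is a sketch under assumptions that go beyond the post:
all items are waiting at the input, each inter-stage buffer costs one
cycle and can hold any number of items, and the copies of a replicated
stage are fed round-robin. The function `simulate` and its parameters
are hypothetical names chosen for this example.

```python
def simulate(stages, n_items, buffer_cycles=1):
    """Return the completion time (in cycles) of each item.

    stages is a list of (cycles, copies) pairs, one per pipeline stage.
    """
    # free_at[s][c] = cycle at which copy c of stage s can take a new item
    free_at = [[0] * copies for _, copies in stages]
    done = []
    for item in range(n_items):
        ready = 0                                # item is waiting at the input
        for s, (cycles, copies) in enumerate(stages):
            c = item % copies                    # round-robin over the copies
            start = max(ready, free_at[s][c])    # wait until that copy is free
            finish = start + cycles
            free_at[s][c] = finish
            last = (s == len(stages) - 1)
            ready = finish + (0 if last else buffer_cycles)
        done.append(ready)
    return done


print(simulate([(3, 1), (6, 1), (2, 1)], 4))          # [13, 19, 25, 31]
print(simulate([(3, 1), (3, 1), (3, 1), (2, 1)], 4))  # [14, 17, 20, 23]
print(simulate([(3, 1), (6, 2), (2, 1)], 4))          # [13, 16, 19, 22]
```

In this model the two-copy layout delivers a result every 3 cycles,
like the split layout, while keeping the 13-cycle latency of the
original.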

   The big problem that I can see with this is the additional silicon
(or GaAs (sp?)) that having two copies of stage 2 and the multiplexor
is going to entail.

   Has anyone seen this sort of pipeline in the real world, or even
proposed on paper, or have I missed a really big flaw?

Mark Levison
levisonm@qucis.queensu.ca  | These are my opinions and no one else's.
Computer Science Dept      | -----------------------------------------
Queen's University         |
Kingston, Ont              |    Someone who thinks time pyramids are
Canada                     | neat (better than digital watches).

slackey@bbn.com (Stan Lackey) (06/30/89)

In article <209@qusunb.queensu.CA> levisonm@qucis.queensu.CA (Mark Levison) writes:
>    I have been wondering for a while now why the only pipeline
>designs that I have seen are all sequential.

>    The traditional approach to solving the problem that stage 2 doubles
>the pipeline cycle time is to split it into two stages.

>   But this has two potential problems, first it may not be possible
>to break the original stage 2 into two parts (and thereby increases
>the latency). The second problem is it adds an additional one clock
>cycle buffer delay. So why not create 2 instances of stage 2 and time
>multiplex them.

The Alliant vector processor in the FX/8 does this.  The floating-point
multiplier took two clocks for a double-precision multiply.  The
multiplier wasn't broken into two stages because the chip used the
same multiply array twice.  We simply used two of them, connected in
parallel, one for even operands and one for odd.
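
The even/odd arrangement can be pictured with a toy schedule. This is
not the FX/8 design, only a sketch that assumes one new operand pair
arrives per clock and that each multiplier is busy for two clocks per
multiply. The name `even_odd_schedule` and the tuple layout are
invented for the example.

```python
def even_odd_schedule(n_operands, multiply_clocks=2, units=("even", "odd")):
    """Return (index, unit, issue_clock, result_clock) for each operand."""
    busy_until = {u: 0 for u in units}
    schedule = []
    for i in range(n_operands):
        unit = units[i % len(units)]           # alternate between the units
        issue = max(i, busy_until[unit])       # operand i arrives at clock i
        busy_until[unit] = issue + multiply_clocks
        schedule.append((i, unit, issue, issue + multiply_clocks))
    return schedule


for row in even_odd_schedule(4):
    print(row)
# (0, 'even', 0, 2)
# (1, 'odd', 1, 3)
# (2, 'even', 2, 4)
# (3, 'odd', 3, 5)
```

Each unit sees a new operand only every second clock, so neither one
stalls, and results come out one per clock.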

I don't know of other implementations of this strategy, but that
doesn't mean there aren't any.
-Stan