[sci.electronics] programmable bit permutation circuit

touch@dsl.cis.upenn.edu (Joe Touch) (11/15/90)

In article <HUGHES.90Nov14134247@locusts.Berkeley.EDU> hughes@locusts.Berkeley.EDU (Eric Hughes) writes:
>I am looking for a circuit which can arbitrarily permute a fix number
>of bits (e.g. 64).

>any such circuit which uses O(n log n) gates?

Use a Batcher-banyan organization.  It has (n/2)log(n) elements,
and can realize any permutation.  

See references on Multistage Interconnection Networks.

Joe

wos@Oxford.COM (Olin Sibert) (11/15/90)

In article <HUGHES.90Nov14134247@locusts.Berkeley.EDU> hughes@locusts.Berkeley.EDU (Eric Hughes) writes:
 > I am looking for a circuit which can arbitrarily permute a fix number
 > of bits (e.g. 64).  The particular permutation will normally be held
 > in some latches.
 > 
 > Now a crossbar switch suffices, but uses O(n^2) gates.  Is there
 > any such circuit which uses O(n log n) gates?

I believe that even if there is a O(n log n) answer (or related), it
depends on your speed requirements and what implementation technology
you're using as to what's "optimal".  When you're only talking about 64
bits, O(anything) can be quite misleading, because packaging, speed, and
parts availability considerations often dominate.

For example, it's clear how to implement arbitrarily large permutations
with just two shift registers and and some clocking circuitry, if you're
willing to wait O(n^2) cycles.  That's O(n) gates (and a small value of
O, too), but sluggish.

A variation on that idea is to use one small-to-large permutation
network (say, permute 8 input bits to 64 output bits) several times, and
combine the results.  I think that might be O(n^1.5) gates, and it is
reasonably zippy, though still not flow-through.  It's a lot like what
one does with software table lookup.

If one is doing it out of SSI (does anyone *do* that any more?  Shows
how obsolete I am), a tree of many-to-1 multiplexors is probably better
than a ton of little gates, even though that's more gates total.  In any
discrete implementation, one would need lots of buffering to overcome
fan-in/fan-out problems.

If custom (or gate array) implementation is the target, then I dunno.
My guess is that something like a tree of multiplexors is the right
choice, just to avoid fan-in/fan-out problems.  I'd ask my gate array
vendor.

For off-the-shelf parts, I think a good choice would be Electrically
Programmable Logic Devices (e.g., Xilinx, Cypress).  This is especially
true if the permutation doesn't change very often and can thus be
programmed in as opposed to loading it into latches on the EPLD.  If a
single EPLDs can't hack the whole 64-to-64 at once, then combine a
handful of them externally, say four 16-to-64s, or share a single one
and combine in several clocks.

A multi-clock approach is usually OK for a microprocessor peripheral,
since it's going to take the processor a few clocks to get the data in
and out each time anyway.  If it's part of a black box where permutation
is the clear bottleneck, of course, that doesn't apply.

Basically, it all depends on what you've got to implement it with and
how much of a hurry you're in for each answer.
-- 
Olin Sibert           |Internet: Sibert@Oxford.COM
Oxford Systems, Inc.  |UUCP:     uunet!oxford!sibert