touch@dsl.cis.upenn.edu (Joe Touch) (11/15/90)
In article <HUGHES.90Nov14134247@locusts.Berkeley.EDU> hughes@locusts.Berkeley.EDU (Eric Hughes) writes: >I am looking for a circuit which can arbitrarily permute a fix number >of bits (e.g. 64). >any such circuit which uses O(n log n) gates? Use a Batcher-banyan organization. It has (n/2)log(n) elements, and can realize any permutation. See references on Multistage Interconnection Networks. Joe
wos@Oxford.COM (Olin Sibert) (11/15/90)
In article <HUGHES.90Nov14134247@locusts.Berkeley.EDU> hughes@locusts.Berkeley.EDU (Eric Hughes) writes: > I am looking for a circuit which can arbitrarily permute a fix number > of bits (e.g. 64). The particular permutation will normally be held > in some latches. > > Now a crossbar switch suffices, but uses O(n^2) gates. Is there > any such circuit which uses O(n log n) gates? I believe that even if there is a O(n log n) answer (or related), it depends on your speed requirements and what implementation technology you're using as to what's "optimal". When you're only talking about 64 bits, O(anything) can be quite misleading, because packaging, speed, and parts availability considerations often dominate. For example, it's clear how to implement arbitrarily large permutations with just two shift registers and and some clocking circuitry, if you're willing to wait O(n^2) cycles. That's O(n) gates (and a small value of O, too), but sluggish. A variation on that idea is to use one small-to-large permutation network (say, permute 8 input bits to 64 output bits) several times, and combine the results. I think that might be O(n^1.5) gates, and it is reasonably zippy, though still not flow-through. It's a lot like what one does with software table lookup. If one is doing it out of SSI (does anyone *do* that any more? Shows how obsolete I am), a tree of many-to-1 multiplexors is probably better than a ton of little gates, even though that's more gates total. In any discrete implementation, one would need lots of buffering to overcome fan-in/fan-out problems. If custom (or gate array) implementation is the target, then I dunno. My guess is that something like a tree of multiplexors is the right choice, just to avoid fan-in/fan-out problems. I'd ask my gate array vendor. For off-the-shelf parts, I think a good choice would be Electrically Programmable Logic Devices (e.g., Xilinx, Cypress). This is especially true if the permutation doesn't change very often and can thus be programmed in as opposed to loading it into latches on the EPLD. If a single EPLDs can't hack the whole 64-to-64 at once, then combine a handful of them externally, say four 16-to-64s, or share a single one and combine in several clocks. A multi-clock approach is usually OK for a microprocessor peripheral, since it's going to take the processor a few clocks to get the data in and out each time anyway. If it's part of a black box where permutation is the clear bottleneck, of course, that doesn't apply. Basically, it all depends on what you've got to implement it with and how much of a hurry you're in for each answer. -- Olin Sibert |Internet: Sibert@Oxford.COM Oxford Systems, Inc. |UUCP: uunet!oxford!sibert