[comp.sys.transputer] Sigma networks

conor@inmos.co.uk (Conor O'Neill) (01/26/89)

I'm interested to know if anyone has used Sigma networks for transputers.
I first heard of them at the Occam User Group at Southampton in Sep 88,
but I've forgotten who mentioned them.

Basically, they give you a network of 2-to-the-n processors, with 4 links
each, and a maximum distance between processors of n. This compares with
a hypercube's requirement of n links per processor.
Unfortunately you don't get this for free, because the *average* message
distance is higher, and routing is not so simple.
As a bonus though, the connections leave some links unused so that
you can connect something else to the network!
As you can see, these could be very useful for transputer systems.

I will briefly describe how to create these.
Suppose we have n=4 (i.e. 16 processors). Then each processor is numbered
in binary with n bits: ABCD.  This processor is connected to four other
processors, whose numbers are found by shifting ABCD either right or left,
and adding a 0 or a 1. So ABCD is connected to 0ABC, 1ABC, BCD0, BCD1.

You notice that if ABCD = 0000, then it connects two links back to itself,
which is where the spare links come from.
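Here is a quick sketch of that connection rule (Python purely for
illustration, and the function name is my own):

```python
def neighbours(node, n):
    """The four sigma-network neighbours of `node` in a 2**n-node network.

    Shift the n-bit address right (prepending a 0 or a 1) or left
    (appending a 0 or a 1), as described above.
    """
    mask = (1 << n) - 1
    right = node >> 1                      # drop the low bit
    left = (node << 1) & mask              # drop the high bit
    high = 1 << (n - 1)
    return [right, right | high,           # 0ABC, 1ABC
            left, left | 1]                # BCD0, BCD1

print(neighbours(0b101, 3))   # [2, 6, 2, 3] i.e. 010, 110, 010, 011
print(neighbours(0b000, 3))   # [0, 4, 0, 1] - two self-loops at 000
```

Note that node 101 gets 010 twice (the double link drawn as `=` in the
picture below), and 000 gets itself twice, which are the spare links.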

If you draw out one of these networks, they tend to look pretty irregular,
but here's the n=3 picture:

    001---------011
   / | \       / | \
000  |  010=101  |  111     (both 000 and 111 have their other 2 links
   \ | /       \ | /         connected together)
    100---------110

I haven't attempted to draw a 64 processor network, but you can use the
fact that there's a lot of symmetry in there (vertical, horizontal, and
rotational) to help draw them.

A simplistic way to route a message is to slowly shift the bits of the
destination address into the current address. This will almost always
take n steps, so it is not very efficient.

E.g. to get from 000 to 101, shift the 000 right and shift in the rightmost
bit of the destination to get to 100. Then shift in the middle bit of the
destination to get to 010. Finally you get to 101 in 3 steps.
Hence it is easy to see that you can always get from any processor
to any other in n steps. The trick is to make use of the higher
connectivity to reduce the number of hops.
Also, this simple method has the disadvantage that it doesn't fully use
all of the links: for example, all messages leaving node ABC go to
either 0AB or 1AB, never to BC0 or BC1.
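
The simplistic scheme can be sketched like this (Python for illustration
only; the function name is mine):

```python
def simple_route(src, dst, n):
    """Naive sigma-network routing: shift the destination's bits in
    from the right, one per hop.  Always takes exactly n hops."""
    path = [src]
    node = src
    for i in range(n):                        # dst bits, rightmost first
        bit = (dst >> i) & 1
        node = (node >> 1) | (bit << (n - 1))  # shift right, prepend bit
        path.append(node)
    return path

# The 000 -> 101 example from the text:
print(simple_route(0b000, 0b101, 3))   # [0, 4, 2, 5], i.e. 000,100,010,101
```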

There would be a lot of scope for recognising sub-strings within the
addresses and using that to reduce the number of hops, but I haven't
been able to give this a lot of thought. It would become very
important as you increased the number of nodes.
This might also make better use of all four links leaving each processor.
For example, with n=10, sending a message from 1010110101 to 1010111111
would take 10 steps if you did it that simple way, or you could shift
in 11111 from the right instead, in 5 steps. Alternatively you could use
a mixture of shifts, if the common pattern was in the middle of the 
bit-pattern. So you could get from 0111111110 to 1111111111 by going to:
111111110x, 1111111110, 1111111111. (where x is any value)
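
One way to see how much the overlap recognition could save is a plain
breadth-first search over the network, which finds the true minimum hop
count (a sketch, Python for illustration; it reuses the shift rule above):

```python
from collections import deque

def neighbours(node, n):
    """The four shift-neighbours of `node` in a 2**n-node sigma network."""
    mask = (1 << n) - 1
    return [node >> 1, (node >> 1) | (1 << (n - 1)),
            (node << 1) & mask, ((node << 1) & mask) | 1]

def shortest_hops(src, dst, n):
    """Breadth-first search: the minimum hop count from src to dst,
    exploiting whatever bit-pattern overlap exists."""
    dist = {src: 0}
    q = deque([src])
    while q:
        node = q.popleft()
        if node == dst:
            return dist[node]
        for nb in neighbours(node, n):
            if nb not in dist:
                dist[nb] = dist[node] + 1
                q.append(nb)

# The two n = 10 examples from the text:
print(shortest_hops(0b1010110101, 0b1010111111, 10))  # 5, not 10
print(shortest_hops(0b0111111110, 0b1111111111, 10))  # 3
```

Of course a real router can't afford a global search; the point is just
that the optimal hop counts confirm the savings claimed above.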

Anyway, has anyone actually used one of these?

Incidentally, they are not restricted to 4-way nodes. If you had 8 links,
simply number the nodes in base 4, i.e. ABCD is connected to
0ABC, 1ABC, 2ABC, 3ABC, BCD0, BCD1, BCD2, BCD3.
Similarly for any *even* number of links.
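
The generalised rule, sketched the same way (Python for illustration, and
the function name is mine; b = 2 reproduces the 4-link case):

```python
def neighbours_base(node, n, b):
    """Neighbours in the base-b generalisation: addresses are n digits
    in base b, giving 2*b links per node (8 links for b = 4)."""
    size = b ** n
    right = node // b                    # drop the low digit
    left = (node * b) % size             # drop the high digit
    top = b ** (n - 1)
    return ([right + d * top for d in range(b)] +   # dABC
            [left + d for d in range(b)])           # BCDd

# b = 4, n = 2: node "12" (base 4) = 6 connects to
# 01, 11, 21, 31 and 20, 21, 22, 23 (all base 4):
print(neighbours_base(6, 2, 4))   # [1, 5, 9, 13, 8, 9, 10, 11]
```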

The person at the Occam User Group was actually discussing his work
on finding other networks with better values for maximum distance versus
number of processors and valency (number of links).
I vaguely remember him reporting a network of distance 10, with over
10,000 processors, but I can't remember whether that was valency 4 or 10.
Does anyone have any similar graph-theoretic results?
-- 
Conor O'Neill, Software Group, INMOS Ltd.    conor@inmos.co.uk
Disclaimer: All views are my own, not those of INMOS.

CBKM23@ujvax.ulster.ac.uk (02/02/89)

        In response to the enquiry by C. O'Neill, 30-1-89, I think that these
are a variant of the perfect shuffle interconnection first presented by H.S. 
Stone in 1971. This pattern looks a lot more familiar than the graph shown 
in the message, but is topologically equivalent. 
        
        I am using this interconnection at present, with 16 transputers 
mounted on four B012 motherboards, and have found it quite tricky to wire up,
because the motherboards have hardwire connections for some of the links, and
I have had to find four disjoint 'runs' of processors in the interconnection 
pattern. 
        
        As to routing algorithms, I had a rummage through the literature a few
months ago, but apart from the obvious algorithm came up with nothing. However,
there is a vast North American research effort on network topologies, so 
there may well be a paper buried in there somewhere.

        Another way of viewing the topology is as two bi-directional binary
trees, meshed together in a slightly tricky way. The two root nodes are
the processors, one down from the top, and one up from the bottom of the 
standard diagram. (This diagram is : split the processors into 'fronts' and
'backs', then draw a column of 'fronts' each with two wires emanating from
them, connected to a column of 'backs'. Number the processors from top to 
bottom.) This view suggests clever ways of routing the data, but I haven't
actually managed to produce an appropriate algorithm.

        Your question about optimal graphs for fixed valencies is also of
interest. I would also love to know if there is an answer.

        The topic appeared to be an unsolved question in graph theory in the
late 1970s; see Bollobas, 'Extremal Graph Theory' (Chapter 5, I think...), 
but perhaps an answer has been found by now.

        By the way, the problem is cast in Bollobas as finding the best pattern 
of direct flights between a set of airports, in such a way as to minimise the 
average flight time between airports, where indirect flights are defined 
as a concatenation of direct flights, and the number of flights leaving or 
ending at any one airport is limited. Obviously, the airports correspond to 
processors, and direct flights to links! 

Mary Shapcott,
Department of Computing Science,
University of Ulster at Jordanstown,
Co. Antrim BT37 0QB
tel : (0232) 365131 ext. 2671

References :

Stone H.S., 1971. "Parallel Processing with the Perfect Shuffle",
        IEEE Transactions on Computers, Vol. C-20, No. 2, pp. 153-161.

Parker D.S., 1980. "Notes on Shuffle/Exchange-type Switching Networks",
        IEEE Transactions on Computers, Vol. C-29, pp. 213-222.

JRA@vax1.esprit.southampton.ac.uk (02/07/89)

   I think I was the person who Conor O'Neill heard talking about sigma
networks at the Southampton Occam User Group. I believe they are more 
widely known as delta networks (I chose the wrong Greek symbol at some point).
The generalization of these to any even valency can safely be referred to as 
de Bruijn graphs.
   The largest known 4-valent graph of diameter 10 does indeed have over 10,000
nodes. A table of the current largest known graphs of small valency and
diameter is maintained by J.-C. Bermond of Universite de Paris-Sud. Here is the
most up-to-date version of this table that I have for 3, 4 and 5-valent graphs.

              3-valent     4-valent     5-valent
Diameter  2     10 nodes     15 nodes      24 nodes
   ''     3     20  ''       40  ''        70  ''
   ''     4     38  ''       95  ''       182  ''
   ''     5     70  ''      364  ''       532  ''
   ''     6    128  ''      734  ''      2742  ''
   ''     7    184  ''     1081  ''      4368  ''
   ''     8    320  ''     2943  ''     11200  ''
   ''     9    540  ''     7439  ''     33600  ''
   ''    10    938  ''    15657  ''    123120  ''
                                                 James Allwright

James Allwright
Dept. of Electronics & Computer Science
Southampton University
JRA@UK.AC.SOTON.ESP.V1 or JRA@UK.AC.SOTON.CM