[comp.sys.transputer] BYTE article on H1

wxh@lanl.gov (William x Harvey) (05/31/90)

The following information is extracted from April 1990 BYTE (European
version), Virtual Channels:  The Next Generation of Transputers, by Dick
Pountain. 
 
Current transputer hardware and software only partially embody the
Communicating Sequential Processes (CSP) model.  Distributing processes
onto multiple transputers is limited because the transputer has only four
links, which cramps the way programs are written and loses some of the
advantages of the CSP model.
 
To reduce the number of channels used, you must either rewrite the
single-processor program, which defeats one objective of Occam by giving
up its "scheduling invariance", or force the processes to share links by
multiplexing.  An example of this limitation is the need to use additional
transputers at each node of a hypercube just to get enough link
connectivity.
 
David May at Inmos, Bristol, released some information on the H1, which
offers a solution to the interconnectivity problem.  Inmos felt that adding
more on-chip links would not help much, and instead added multiplexing
hardware that works with a high-speed routing chip to implement packet
switching.
 
Links now carry virtual channels, with each program having as many as it
needs.  The routing chip delivers messages to a named destination.  Two
processes anywhere in a network communicate over a named channel without
caring about routing, which makes programs more portable.
 
The H1 still has four links on board, but it divides messages into packets
and interleaves these onto a single link.  The links operate at 100 megabits
per second, full duplex.  Messages are split into 32-byte packets, each sent
with a one- or two-byte header and an end-of-packet or end-of-message
trailer.  The receiver acknowledges every packet with an empty packet.
Each virtual link has two virtual channels, one to send and one to receive.
A sending process waits until its last packet is acknowledged; the
acknowledgment is sent as soon as the first byte of the packet arrives,
allowing unbroken transmission if the receiver is ready.  If it is not
ready, a one-packet buffer for each virtual link is kept in memory, which
is what permits any number of virtual channels.
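
As a rough illustration of the packetization (not Inmos's actual packet
format; the two-byte header and the trailer codes below are invented for
the example), splitting a message into 32-byte packets for one virtual
channel might look like this in Python:

# Illustrative sketch only: the article gives the 32-byte payload, 1- or
# 2-byte header and end-of-packet/end-of-message trailer, but not the
# encoding; the constants here are made up.

PAYLOAD = 32           # data bytes per packet
EOP, EOM = 0xF0, 0xF1  # hypothetical trailer values: end of packet / message

def packetize(message: bytes, channel: int) -> list[bytes]:
    """Split one message into packets for a single virtual channel."""
    packets = []
    for i in range(0, len(message), PAYLOAD):
        chunk = message[i:i + PAYLOAD]
        last = i + PAYLOAD >= len(message)
        header = channel.to_bytes(2, "big")      # 2-byte virtual-channel header
        trailer = bytes([EOM if last else EOP])  # marks packet or message end
        packets.append(header + chunk + trailer)
    return packets

# The receiver would acknowledge each packet with an empty packet as soon as
# the packet starts to arrive, letting the sender stream without a gap.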
 
To use the H1 as anything more than an improved T800 requires the C104
routing chip, which contains 32 transputer links and a 32-by-32 crossbar
switch.  Connecting multiple C104 chips allows complex switching networks
to be built.  The C104 contains only simple logic to route messages.  It
uses the technique of wormhole routing, in which the header instructs the
C104 which link to connect, and the connection is closed at the end of the
message.  A link already in use will stall a competing message, but with a
good routing strategy low latency is observed.
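
A toy model of the wormhole idea, assuming nothing about the C104's real
implementation: the header claims an output link for the whole message,
later parts of the message follow the established path, and the path is
torn down when the end-of-message marker passes through.

class WormholeSwitch:
    """Toy wormhole switch: not the C104, just the set-up/tear-down idea."""

    def __init__(self, route):
        self.route = route  # function: header value -> output link number
        self.path = {}      # input link -> output link held by its message

    def forward(self, inp, flit):
        if inp not in self.path:              # header: try to claim a link
            out = self.route(flit["header"])
            if out in self.path.values():     # link busy: this message stalls
                return "stalled"
            self.path[inp] = out
        out = self.path[inp]
        if flit.get("end_of_message"):        # close connection at message end
            del self.path[inp]
        return f"forwarded on link {out}"

# Hypothetical usage: a routing function that picks a link from the header.
sw = WormholeSwitch(route=lambda header: header % 4)
sw.forward(0, {"header": 6})                  # header claims link 2
sw.forward(0, {"end_of_message": True})       # end of message releases it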
 
Inmos chose interval routing, which is complete (every packet eventually
arrives), deadlock-free, inexpensive, fast, scalable (good for any size of
network), versatile (good for any type of topology), and can be near
optimal (shortest routes are followed).  Each node is assigned a number,
which serves as its address.  At every routing switch, each link knows
which range of header values it should recognize.  These intervals are
nonoverlapping and numbered so that every header falls into exactly one
interval.  Each interval effectively points to the subtree of the network
containing those processor numbers.  The attraction of this scheme is that
the routing decision can be computed with a single comparison.  The C104
allows one- or two-byte headers, giving up to 65,536 transputers in a
single network.  Labelling algorithms exist that avoid cyclic paths and so
keep the routing deadlock-free.
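
A minimal sketch of the interval idea (the network size and the particular
intervals below are invented): each output link of a switch owns a
nonoverlapping, half-open interval of node numbers, and the destination
number in the packet header selects the link.

def make_router(intervals):
    """intervals: list of (lo, hi, link) with nonoverlapping [lo, hi)."""
    def route(destination):
        for lo, hi, link in intervals:
            if lo <= destination < hi:   # the cheap per-interval comparison
                return link
        raise ValueError("destination outside every interval")
    return route

# Hypothetical labelling for one switch in a 16-node network whose four
# links lead towards nodes 0-3, 4-7, 8-11, and 12-15 respectively.
route = make_router([(0, 4, 0), (4, 8, 1), (8, 12, 2), (12, 16, 3)])
assert route(6) == 1
assert route(13) == 3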
 
An optimal labeling scheme for an arbitrary network is not known, but
schemes for the more common topologies such as rings, trees, and
two-dimensional arrays are.  Grid and hypercube schemes can be constructed
from these topologies.
 
When too many messages pass through a single node, a hot spot develops,
stalling a large number of packets.  This can be avoided by distributing
the flow evenly.  The C104 can do this using universal routing: each packet
is first sent to a randomly chosen node, from which it is forwarded to its
destination.  Latency increases and throughput decreases somewhat, but
worst-case performance is near optimal.  Extra headers can also be used to
join additional networks together, to whatever level is needed.
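
The two-phase idea can be sketched in a few lines (the node count is a
placeholder and this is the general randomized-routing scheme, not
C104-specific detail):

import random

def universal_path(destination, n_nodes):
    """Targets visited by a packet under two-phase (universal) routing."""
    intermediate = random.randrange(n_nodes)  # phase 1: random intermediate
    return [intermediate, destination]        # phase 2: on to the real target

# e.g. universal_path(42, 16384) might return [9031, 42]; the random first
# hop spreads traffic evenly and so avoids hot spots.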
 
Packet switching removes certain restrictions in Occam.  The present
version is a static language in which resources must be determined at
compile time, so new processes cannot be added at run time on remote
processors, and each channel must be explicitly allocated to a physical
link at compile time.  The H1 and the C104 allow Full Occam to be
implemented.  It looks the same, but PLACE need not be used to allocate
channels to links, since the routing system handles this, and the compiler
will be able to PLACE processes on processors itself.  Programs become
portable to any routed array of transputers, though efficiency will still
be determined by the topology.  The next step is Dynamic Full Occam, in
which the compile-time allocation restrictions are removed, allowing
recursive processes, replicators with run-time indices, and reallocation
of processors on the fly.
 
The H1 will also have an on-chip floating-point unit and four communication
links.  The core processor will use on-chip caching and a DRAM controller
supporting static-column addressing.  Peak performance should be 100 MIPS
or 20 MFLOPS.  A memory protection scheme with separate code, data, stack,
and heap regions will help in implementing operating systems such as Unix.
A many-to-one channel instruction will let many processes share a common
resource, such as a file server or graphics engine, more efficiently than
ALT does.  To ease the implementation of semaphores, the instructions
SIGNAL and WAIT will be introduced.  Hardware support for exception
handling in accordance with the IEEE floating-point standard is also
added.
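
To see why a many-to-one channel helps, here is a hedged sketch of the
usage pattern it supports, with ordinary Python threads and a queue
standing in for transputer processes and channels; nothing here reflects
the actual H1 instructions.

import queue
import threading

requests = queue.Queue()        # the shared many-to-one "channel"

def server():
    # One server handles requests from any number of clients without
    # ALTing over a separate channel per client.
    while True:
        client, reply, data = requests.get()
        if data is None:        # shutdown marker, for the sketch only
            break
        reply.put(f"handled {data!r} for client {client}")

def client(ident):
    reply = queue.Queue()       # private reply channel for this client
    requests.put((ident, reply, f"request-{ident}"))
    print(reply.get())

threading.Thread(target=server).start()
clients = [threading.Thread(target=client, args=(i,)) for i in range(4)]
for c in clients:
    c.start()
for c in clients:
    c.join()
requests.put((None, None, None))  # stop the server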
 
Simulations suggest the H1/C104 pair will perform close to the theoretical
maximum.  The average message delay rises from 12 to 27 microseconds as a
hypercube grows from 64 to 16,384 processors.
 
The author, Dick Pountain, lives in London.  He can be contacted on BIX as
"dickp."
 
-----
 
Billy Harvey          wxh@lanl.gov
Phone: +49 6565 4342 (W. Germany)