greeny@wotan.top.cis.syr.edu (Jonathan Greenfield) (03/20/91)
Having now received several papers about, information concerning, and
references to implemented, general-purpose routing systems, I am prepared
to summarize that sampling for those of you who requested that I share the
information.
At least two of the systems are currently available for use on transputer-
based computers. The information that follows has been provided to me
via published papers and private communications. I have not independently
verified any of the information. The information in the summaries is, to
the best of my knowledge, accurate. Please post additional information or
corrections to comp.parallel and comp.sys.transputer.
Many thanks to all who responded to my requests for information.
--------------------------------------------------
TROLLIUS 2.0 offers a variety of different methods for implementing
interprocess communication. Since our interest here is in general-purpose
routers (which give the programmer the illusion of a completely connected
network), we consider only the "Nsend" method of Trollius, which provides
such general-purpose routing. (Note that Trollius provides two variations
for each message passing method: Regular and Virtual Circuit. I do not
have information on what these variations entail.)
The following information was provided for Trollius 2.0 operating on
20 MHz T800 transputers (20 Mbit/s links?).
T, the time per message, per hop is:
Ts + Tc*<message length> + Tp*<number of extra packets>
where,
packetization is provided with a statically configurable packet size
(default is 4096 bytes),
Ts, the set-up charge, is 480 us for Regular message passing
168 us for Virtual Circuit msg. passing,
Tc, the communication time, is .56 us/byte for Reg. or VC,
Tp, the packetization cost, is 552 us for Reg.
140 us for VC.
Since no information on loading conditions was provided, I would assume that
these figures exclude any contention for routing services. I also have no
information regarding the routing scheme, or cpu utilization. Sorry, I don't
have references for further information either, though I understand that there
is a Trollius mailing list for information (possibly originating from Ohio
State University)
-----------------------------------------------------
THE VIRTUAL CHANNEL ROUTER runs on T-series transputers and allows
distributed programs to be written in a topology-independent format, and
then be bound to a topology-dependent routing kernel at run-time. The system
is based on an occam compiler/configurer for H1 simulation, but can be
interfaced with C, Pascal, and FORTRAN programs. The system supports
deadlock free routing.
The time per message, per hop, T, is reported to be
Ts + Tc*<message length> + Tp*<number of extra packets>
where,
Ts (startup time) is 80 us,
Tc (communication time) is determined by link bandwidth
(approximately .6 us/byte),
Tp (packetization time) is 45 us.
The default packet size is 160 bytes. The switch time and packet size give
a node output bandwidth of 3 Mbytes/s (about 50% of raw bandwidth) if all
processing cycles are dedicated to through-routing. This figure can be traded
off against latency by varying the packet size. (I would assume that these
figures are for T800 transputers.)
VCR and accompanying information are available from
Transputer Technology Solutions
2 Venture Road
Chilworth Research Cntre
Southampton
SO1 7NP
United Kingdom
Tel.: 0703-760834
Fax.: 0703-760833
E-mail: manager@uk.ac.soton.tts
--------------------------------------------
A routing system based upon the deadlock elimination technique of Dally and
Seitz [1], is described in
M. Shumway, "Deadlock-Free Packet Networks," Transputer Research and
Applications 2 (Proc. of 2nd North American Transputer Conference),
IOS Press, 139-177 (1989).
The paper considers a technique for designing deadlock-free routing systems
for arbitrary topologies, and presents performance results for a two-
dimensional grid implementation. (Note that according to a private
communication, the author has found that grids outperform toroid meshes
(due to simplified deadlock avoidance) for networks up to thousands of
processors.)
Mean internode latencies (hop time) for a 64-byte packet ranges from
90 us to 280 us, depending upon demand (100 msg/s to 10,000 msg/s). With
demand at saturation (approximately 3000 to 4000 msg/s for 64-byte packets),
latencies vary from under 200 us to 1800 us for packet sizes from 16 bytes
to 1024 bytes. (All figures for 20 MHz T800 transputers with 20 Mbit/s links.)
Additional performance figures, including processor utilization and
network scalability is provided in the paper (but is too extensive to
provide here).
---------------------------------------------
A routing system based upon a less-restrictive variation of the deadlock
avoidance technique of [1] is presented in
J. Yantchev and C. Jesshope, "Adaptive, low-latency, deadlock-free
packet routing for networks of processors," IEE Proceedings, Vol. 136,
Pt. E, 178-186 (1989).
The routing technique utilizes a partial ordering of virtual channels that
preserves some adaptive capacity (as much as is possible within the framework
of an optimal-distance router). The technique is applicable to arbitrary
networks.
The paper then introduces a low-latency routing strategy, called "the
mad postman," intended to outperform wormhole and virtual cut-through
routing. The technique is primarily aimed at a hardware implementation,
which is being implemented (or has been implemented?) on transputer
networks at Southampton University.
Details of the technique, including an estimation of latencies, may be
found in the paper.
-------------------------------------------
I have also been informed of two additional routing systems, both implemented
on transputer networks:
1) TINY, a general-purpose transputer router developed at the
Edinburgh Parallel Computing Centre (University of Edinburgh).
(A paper on this system will be appearing in Concurrency: Practice
and Experience some time soon.)
and 2) An adaptive algorithm for a toroid mesh which provides deadlock-free
routing along with automatic local congestion reduction, developed
at CRAI. (A paper on this system is apparently being prepared for
publication. Perhaps someone can post a further reference in the
future.)
-----------------------------------------------
Reference:
1. W. Dally and C. Seitz, "Deadlock-Free Message Routing in Multiprocessor
Interconnection Networks," IEEE Transactions on Computers, Vol. C-36,
547-553 (1987).
Hope this is helpful!
Jonathan