[comp.parallel] General-purpose router summary

greeny@wotan.top.cis.syr.edu (Jonathan Greenfield) (03/20/91)
  Having now received several papers about, information concerning, and 
references to implemented, general-purpose routing systems, I am prepared 
to summarize that sampling for those of you who requested that I share the 
information.

  At least two of the systems are currently available for use on transputer-
based computers.  The information that follows has been provided to me
via published papers and private communications.  I have not independently
verified any of the information.  The information in the summaries is, to
the best of my knowledge, accurate.  Please post additional information or
corrections to comp.parallel and comp.sys.transputer.

Many thanks to all who responded to my requests for information.

--------------------------------------------------

  TROLLIUS 2.0 offers a variety of different methods for implementing
interprocess communication.  Since our interest here is in general-purpose
routers (which give the programmer the illusion of a completely connected
network), we consider only the "Nsend" method of Trollius, which provides
such general-purpose routing.  (Note that Trollius provides two variations
for each message passing method: Regular and Virtual Circuit.  I do not
have information on what these variations entail.)

  The following information was provided for Trollius 2.0 operating on
20 MHz T800 transputers (20 Mbit/s links?).

  T, the time per message, per hop is:

    Ts + Tc*<message length> + Tp*<number of extra packets>

where,

    packetization is provided with a statically configurable packet size
    (default is 4096 bytes),

    Ts, the set-up charge, is 480 us for Regular message passing
                              168 us for Virtual Circuit msg. passing,

    Tc, the communication time, is .56 us/byte for Reg. or VC,

    Tp, the packetization cost, is 552 us for Reg.
                                   140 us for VC.

Since no information on loading conditions was provided, I would assume that
these figures exclude any contention for routing services.  I also have no
information regarding the routing scheme, or cpu utilization.  Sorry, I don't 
have references for further information either, though I understand that there 
is a Trollius mailing list for information (possibly originating from Ohio 
State University)

-----------------------------------------------------

THE VIRTUAL CHANNEL ROUTER runs on T-series transputers and allows
distributed programs to be written in a topology-independent format, and
then be bound to a topology-dependent routing kernel at run-time.  The system
is based on an occam compiler/configurer for H1 simulation, but can be 
interfaced with C, Pascal, and FORTRAN programs.  The system supports
deadlock free routing.

   The time per message, per hop, T, is reported to be

      Ts + Tc*<message length> + Tp*<number of extra packets>

where,

      Ts (startup time) is 80 us,
      Tc (communication time) is determined by link bandwidth
        (approximately .6 us/byte),
      Tp (packetization time) is 45 us.

The default packet size is 160 bytes.  The switch time and packet size give
a node output bandwidth of 3 Mbytes/s (about 50% of raw bandwidth) if all
processing cycles are dedicated to through-routing.  This figure can be traded
off against latency by varying the packet size.  (I would assume that these
figures are for T800 transputers.)

VCR and accompanying information are available from

   Transputer Technology Solutions
   2 Venture Road
   Chilworth Research Cntre
   Southampton
   SO1 7NP
   United Kingdom

   Tel.: 0703-760834
   Fax.: 0703-760833
   E-mail: manager@uk.ac.soton.tts

--------------------------------------------

A routing system based upon the deadlock elimination technique of Dally and
Seitz [1], is described in

   M. Shumway, "Deadlock-Free Packet Networks," Transputer Research and
   Applications 2 (Proc. of 2nd North American Transputer Conference),
   IOS Press, 139-177 (1989).

The paper considers a technique for designing deadlock-free routing systems
for arbitrary topologies, and presents performance results for a two-
dimensional grid implementation.  (Note that according to a private
communication, the author has found that grids outperform toroid meshes
(due to simplified deadlock avoidance) for networks up to thousands of 
processors.)

   Mean internode latencies (hop time) for a 64-byte packet ranges from
90 us to 280 us, depending upon demand (100 msg/s to 10,000 msg/s).  With
demand at saturation (approximately 3000 to 4000 msg/s for 64-byte packets), 
latencies vary from under 200 us to 1800 us for packet sizes from 16 bytes 
to 1024 bytes.  (All figures for 20 MHz T800 transputers with 20 Mbit/s links.)

   Additional performance figures, including processor utilization and
network scalability is provided in the paper (but is too extensive to
provide here).

---------------------------------------------

A routing system based upon a less-restrictive variation of the deadlock
avoidance technique of [1] is presented in

   J. Yantchev and C. Jesshope, "Adaptive, low-latency, deadlock-free
   packet routing for networks of processors," IEE Proceedings, Vol. 136,
   Pt. E, 178-186 (1989).

The routing technique utilizes a partial ordering of virtual channels that
preserves some adaptive capacity (as much as is possible within the framework
of an optimal-distance router).  The technique is applicable to arbitrary
networks.

  The paper then introduces a low-latency routing strategy, called "the
mad postman," intended to outperform wormhole and virtual cut-through
routing.  The technique is primarily aimed at a hardware implementation,
which is being implemented (or has been implemented?) on transputer
networks at Southampton University.

   Details of the technique, including an estimation of latencies, may be
found in the paper.

-------------------------------------------

I have also been informed of two additional routing systems, both implemented
on transputer networks:

    1) TINY, a general-purpose transputer router developed at the
       Edinburgh Parallel Computing Centre (University of Edinburgh).
       (A paper on this system will be appearing in Concurrency: Practice
       and Experience some time soon.)

and 2) An adaptive algorithm for a toroid mesh which provides deadlock-free
       routing along with automatic local congestion reduction, developed
       at CRAI.  (A paper on this system is apparently being prepared for
       publication.  Perhaps someone can post a further reference in the
       future.)

-----------------------------------------------

Reference:

  1.  W. Dally and C. Seitz, "Deadlock-Free Message Routing in Multiprocessor
      Interconnection Networks," IEEE Transactions on Computers, Vol. C-36,
      547-553 (1987).


Hope this is helpful!

Jonathan



-- 
=========================== MODERATOR ==============================
Steve Stevenson                            {steve,fpst}@hubcap.clemson.edu
Department of Computer Science,            comp.parallel
Clemson University, Clemson, SC 29634-1906 (803)656-5880.mabell