greeny@wotan.top.cis.syr.edu (Jonathan Greenfield) (03/20/91)
Having now received several papers about, information concerning, and references to implemented, general-purpose routing systems, I am prepared to summarize that sampling for those of you who requested that I share the information. At least two of the systems are currently available for use on transputer- based computers. The information that follows has been provided to me via published papers and private communications. I have not independently verified any of the information. The information in the summaries is, to the best of my knowledge, accurate. Please post additional information or corrections to comp.parallel and comp.sys.transputer. Many thanks to all who responded to my requests for information. -------------------------------------------------- TROLLIUS 2.0 offers a variety of different methods for implementing interprocess communication. Since our interest here is in general-purpose routers (which give the programmer the illusion of a completely connected network), we consider only the "Nsend" method of Trollius, which provides such general-purpose routing. (Note that Trollius provides two variations for each message passing method: Regular and Virtual Circuit. I do not have information on what these variations entail.) The following information was provided for Trollius 2.0 operating on 20 MHz T800 transputers (20 Mbit/s links?). T, the time per message, per hop is: Ts + Tc*<message length> + Tp*<number of extra packets> where, packetization is provided with a statically configurable packet size (default is 4096 bytes), Ts, the set-up charge, is 480 us for Regular message passing 168 us for Virtual Circuit msg. passing, Tc, the communication time, is .56 us/byte for Reg. or VC, Tp, the packetization cost, is 552 us for Reg. 140 us for VC. Since no information on loading conditions was provided, I would assume that these figures exclude any contention for routing services. I also have no information regarding the routing scheme, or cpu utilization. Sorry, I don't have references for further information either, though I understand that there is a Trollius mailing list for information (possibly originating from Ohio State University) ----------------------------------------------------- THE VIRTUAL CHANNEL ROUTER runs on T-series transputers and allows distributed programs to be written in a topology-independent format, and then be bound to a topology-dependent routing kernel at run-time. The system is based on an occam compiler/configurer for H1 simulation, but can be interfaced with C, Pascal, and FORTRAN programs. The system supports deadlock free routing. The time per message, per hop, T, is reported to be Ts + Tc*<message length> + Tp*<number of extra packets> where, Ts (startup time) is 80 us, Tc (communication time) is determined by link bandwidth (approximately .6 us/byte), Tp (packetization time) is 45 us. The default packet size is 160 bytes. The switch time and packet size give a node output bandwidth of 3 Mbytes/s (about 50% of raw bandwidth) if all processing cycles are dedicated to through-routing. This figure can be traded off against latency by varying the packet size. (I would assume that these figures are for T800 transputers.) VCR and accompanying information are available from Transputer Technology Solutions 2 Venture Road Chilworth Research Cntre Southampton SO1 7NP United Kingdom Tel.: 0703-760834 Fax.: 0703-760833 E-mail: manager@uk.ac.soton.tts -------------------------------------------- A routing system based upon the deadlock elimination technique of Dally and Seitz [1], is described in M. Shumway, "Deadlock-Free Packet Networks," Transputer Research and Applications 2 (Proc. of 2nd North American Transputer Conference), IOS Press, 139-177 (1989). The paper considers a technique for designing deadlock-free routing systems for arbitrary topologies, and presents performance results for a two- dimensional grid implementation. (Note that according to a private communication, the author has found that grids outperform toroid meshes (due to simplified deadlock avoidance) for networks up to thousands of processors.) Mean internode latencies (hop time) for a 64-byte packet ranges from 90 us to 280 us, depending upon demand (100 msg/s to 10,000 msg/s). With demand at saturation (approximately 3000 to 4000 msg/s for 64-byte packets), latencies vary from under 200 us to 1800 us for packet sizes from 16 bytes to 1024 bytes. (All figures for 20 MHz T800 transputers with 20 Mbit/s links.) Additional performance figures, including processor utilization and network scalability is provided in the paper (but is too extensive to provide here). --------------------------------------------- A routing system based upon a less-restrictive variation of the deadlock avoidance technique of [1] is presented in J. Yantchev and C. Jesshope, "Adaptive, low-latency, deadlock-free packet routing for networks of processors," IEE Proceedings, Vol. 136, Pt. E, 178-186 (1989). The routing technique utilizes a partial ordering of virtual channels that preserves some adaptive capacity (as much as is possible within the framework of an optimal-distance router). The technique is applicable to arbitrary networks. The paper then introduces a low-latency routing strategy, called "the mad postman," intended to outperform wormhole and virtual cut-through routing. The technique is primarily aimed at a hardware implementation, which is being implemented (or has been implemented?) on transputer networks at Southampton University. Details of the technique, including an estimation of latencies, may be found in the paper. ------------------------------------------- I have also been informed of two additional routing systems, both implemented on transputer networks: 1) TINY, a general-purpose transputer router developed at the Edinburgh Parallel Computing Centre (University of Edinburgh). (A paper on this system will be appearing in Concurrency: Practice and Experience some time soon.) and 2) An adaptive algorithm for a toroid mesh which provides deadlock-free routing along with automatic local congestion reduction, developed at CRAI. (A paper on this system is apparently being prepared for publication. Perhaps someone can post a further reference in the future.) ----------------------------------------------- Reference: 1. W. Dally and C. Seitz, "Deadlock-Free Message Routing in Multiprocessor Interconnection Networks," IEEE Transactions on Computers, Vol. C-36, 547-553 (1987). Hope this is helpful! Jonathan -- =========================== MODERATOR ============================== Steve Stevenson {steve,fpst}@hubcap.clemson.edu Department of Computer Science, comp.parallel Clemson University, Clemson, SC 29634-1906 (803)656-5880.mabell