koontz%capvax.decnet@CAPSRV.JHUAPL.EDU ("CAPVAX::KOONTZ") (07/27/90)
Vandana,

   You have just experienced the problem with general purpose I/O on a
transputer. The problem is not in the transputer itself but in the fact that
it does not have a hard disk attached directly to it. Instead, it must perform
remote procedure calls through a transputer link to a suitable interface on
the Sun, which is executing a driver to control the link adapter, which must
then respond back to a server program time-sliced with other stuff executing
on the Sun...yuch, what a mess.

   The physical interface is one of the main bottlenecks. On a B014, the
VMEbus to transputer interface uses a simple Link Adapter. The interface
allows one byte to be transmitted or received to/from the transputer at a
time. The server program running in the Sun must either poll the link adapter
or set up an interrupt. The interrupt won't improve I/O speed since no extra
buffers are provided; it just allows the Sun to do other things while this
slow interface is sending or receiving a byte. While a transputer link can
transfer around 1.75MB/sec (half-duplex @ 20Mbps), Inmos 'sez that the B014
VMEbus interface will only handle around 150KB/sec. If the interface could
handle the link speed, we might see up to an 11.67 times improvement. Thus,
your T800 time could become something around 2.7 secs. This is still 20 times
slower than the Sun-3 and 40 times slower than the Sun-4. So this isn't the
complete answer. Other things probably influence the speed, such as the driver
for the link adapter and the time-slicing of other processes under Unix.

   I started thinking maybe the speed of the SCSI disk vs. the transputer link
could be a factor - but SCSI disks aren't that much faster (and with the seek
times they are much slower). You said that the benchmark doesn't actually
perform any disk activity. Ah...maybe that's the clue. The times for the Suns
are so quick because the disk buffering is direct (and no disk activity);
you're seeing the speed of the Sun's memory and CPU interface. If the T800 did
disk buffering in its memory, you might see a time somewhere between the Sun-3
and Sun-4. However, the Sun is performing the disk buffering in its memory for
the T800.

   So how to speed things up? You can either get a better interface between
the Sun and the T800 (get rid of that link adapter!) or you can connect a disk
directly to the T800. Or maybe you can increase the priority of the server and
driver under Unix. Or maybe you can get a better C compiler...

   Just for fun, I tried your benchmark on one of our PC systems. The
transputer hardware consists of an Inmos B008 with a B404-3 (T800-20) on
Slot 0. The PC is a Compaq Port 386/20. I compiled the benchmark using Logical
Systems C v88.4. The LSC server (cio) allows you to enable or disable the DMA
capability on the B008. This can speed things up a lot when using a 286-style
PC but not as much with a 386 (the 386 can poll almost as fast as the DMA can
transfer - it's not such a great DMA design...). I tried it with and without
the DMA; here's what I got:

     Trial 1: DMA disabled (polling)   3.798 sec
     Trial 2: DMA enabled              4.231 sec

Wow! Did I misplace a decimal point? No, I ran each 3 times and verified
against a wall clock. I can't believe that the B008 and B014 interfaces are
that much different. I suspect we're seeing the difference between a server
running on a dedicated CPU (the PC) vs. one running under a multi-user OS like
Unix. Or are the drivers for the B014 to blame?

   OK netters...the source code for the benchmark is provided.
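
Roughly, the kind of host-filer benchmark being timed here boils down to a
stdio write-then-read loop like the sketch below. This is not the posted
source: the 1 MB total, the 1 KB record size, and the file name are
assumptions for illustration only. Every fwrite/fread becomes host filer
traffic, so on a B014 each request crawls through the VMEbus link adapter one
byte at a time.

    /* Illustrative sketch only -- NOT the posted benchmark source.
     * A simple stdio write-then-read loop of the sort being timed;
     * the 1 MB total, 1 KB record size and file name are assumptions. */
    #include <stdio.h>
    #include <string.h>

    #define RECORD   1024
    #define RECORDS  1024              /* 1024 x 1 KB = 1 MB total */

    int main(void)
    {
        static char buf[RECORD];
        FILE *fp;
        int i;

        memset(buf, 'x', sizeof buf);

        /* Each fwrite/fread below is a request to the host filer; on a
         * B014 it crosses the link adapter a byte at a time. */
        if ((fp = fopen("bench.dat", "w+b")) == NULL) {
            perror("bench.dat");
            return 1;
        }
        for (i = 0; i < RECORDS; i++)
            fwrite(buf, 1, RECORD, fp);

        rewind(fp);
        for (i = 0; i < RECORDS; i++)
            fread(buf, 1, RECORD, fp);

        fclose(fp);
        remove("bench.dat");
        return 0;
    }

On the Suns the data never leaves the buffer cache, which is why their times
reflect memory and CPU speed rather than the disk.
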
   Anyone have a Unix system that uses a faster interface (e.g. FIFOs or
dual-ported memory) from someone like Transtech, Meiko, Topologix, et al.?
And does anyone have a T800 with a SCSI disk connected directly and
appropriate software? Helios has a standalone filing system for the Inmos
B422 SCSI TRAM (OUG News #13). How about hearing from anyone with a Yarc
transputer board or another PC-type board that has a faster interface to the
PC? Can our friendly neighborhood hardware designed at Inmos (yes Dave
Boreham, that's you...) contribute any info about the B008 and B014 link
adapter designs or the driver designs? Do you have a driver for the B016 yet
that uses dual-ported memory, and can you run a benchmark on this setup?

   Sorry, I'm bored writing documentation, I like playing with benchmarks,
and this looks like an interesting one for figuring out filer performance.
This looks like a fun project to get lots of times from different HW/SW
configurations. It'll take each person about 15 minutes to get it from your
email file into your system, compile it, and read the numbers. I'll try to
keep track of the scores. So let's all go play in our sandboxes and compare
results.

Ken Koontz
The Johns Hopkins University Applied Physics Laboratory
Laurel, MD  USA
email: koontz@capsrv.jhuapl.edu
davidb@brac.inmos.co.uk (David Boreham) (07/29/90)
In article <9007271900.AA09549@tcgould.TN.CORNELL.EDU> koontz%capvax.decnet@CAPSRV.JHUAPL.EDU ("CAPVAX::KOONTZ") writes:
>Vandana,
>
>   You have just experienced the problem with general purpose I/O on a
>transputer. The problem is not in the transputer itself but in the fact that it
...
>20Mbps), Inmos 'sez that the B014 VMEbus interface will only handle around
>150KB/sec. If the interface could handle the link speed, we might see up to an
 ^^^^^^^
Well actually the link adaptor will do ~800K. The link adaptor in any circuit
you can design and put into a computer will do ~600K on blind reads and ~300K
polled. The 150K is half what should be realised, probably due to SUN's slow
VME and allowing for another user process.

>11.67 times improvement. Thus, your T800 time could become something around
>2.7secs. This is still 20 times slower than the Sun-3 and 40 times slower than
>the Sun-4. So this isn't the complete answer.
>
...
>
>   So how to speed things up? You can either get a better interface between
>the Sun and the T800 (get rid of that link adapter!) or you can connect a disk
>directly to the T800. Or maybe you can increase the priority of the server and
>driver under Unix. Or maybe you can get a better C compiler...
>
>   Just for fun, I tried your benchmark on one of our PC systems. The
>transputer hardware consists of an Inmos B008 with a B404-3 (T800-20) on Slot 0.
>The PC is a Compaq Port 386/20. I compiled the benchmark using Logical Systems
>C v88.4. The LSC server (cio) allows you to enable or disable the DMA
>capability on the B008. This can speed things up a lot when using an 286-style
>PC but not as much with a 386 (the 386 can poll almost as fast as the DMA can
>transfer - it's not such a great DMA design...). I tried it with and without
                            ^^^^^^^^^^^^^^^^
I guess you mean the PC's DMA design?? Yes, I thought so.

>the DMA: here's what I got:
>
>     Trial 1: DMA disabled (polling)   3.798 sec
>     Trial 2: DMA enabled              4.231 sec
>
>Wow! Did I misplace a decimal point? No, I ran each 3 times and verified
>against a wall clock. I can't believe that the B008 and B014 interfaces are
>that much different. I suspect we're seeing the difference between a server
           ^^^^^^^^^^
Nope, fundamentally identical.

>running on a dedicated CPU (the PC) vs. one running under a multi-user OS like
>Unix. Or are the drivers for the B014 to blame.

Wait a minute, you're running a different server, right? Run that sucker on a
B014 and see what you get there, I'd be quite interested.

>
>   OK netters...the source code for the benchmark is provided. Anyone have a
>Unix system that uses a faster interface (e.g. FIFOs or dual-ported memory) from
>someone like Transtech, Meiko, Topologix, et al. And does anyone have a T800
>with a SCSI disk connected directly and appropriate software? Helios has a
>standalone filing system for the Inmos B422 SCSI TRAM (OUG News #13). How about
                                        ^^^^^^^^^^^^^^
Yes, works rather well I'm told. Maybe I can get the benchmark run on that
system, although it's off the premises at the moment.

>hereing from anyone with a Yarc transputer board or another PC-type board that
>has a faster interface to the PC. Can our friendly neighborhood hardware
>designed at Inmos (yes Dave Boreham, that's you...) contribute any info about
 ^^^^^^^^^^^^^^^^^
Hmm, problems with the keyboard Ken? My Mother says I was designed in
Edinburgh :)

>the B008 and B014 link adapter designs or the driver designs. Do you have a
>driver for the B016 yet that uses dual-ported memory and can you run a benchmark
>on this setup?
RSN. B016 goes about five times as fast as B014 in a SUN3/140, and will do
full link speed.

>
>   Sorry, I'm bored writing documentation, I like playing with benchmarks, and
>this looks like an interesting one for figuring out filer performance. This
>looks like a fun project to get lots of times from different HW/SW
>configurations. It'll take each person about 15 minutes to get it from your
>email file into your system, compile it, and read the numbers. I'll try to keep
>track of the scores.
>

Problems interfacing transputers to general purpose computers:

1. All known software expects to talk to a LINK and to treat it as a BYTE
   stream. This immediately screws things up, because it means link adaptors
   unless major hacking is done, and also some degree of playing with
   individual bytes as they go through.

2. Link adaptors (the present ones, in C011 and C012 guise) are not suitable
   for interfacing to computers. They are basically a toy, useful for talking
   to slow dumb peripherals. The big lacks are BUFFERING and their 8-bit-ness.
   Every link adaptor interface is limited to a few hundred Kbytes/s. If the
   computer is a multitasking one, the device driver will steal CPU time from
   everything else because it is polling the link adaptor.

Obviously other computer-to-link interface schemes are possible, all of which
promise full link bandwidth capability. I designed my first in 1985 and
invented quite a neat one a couple of years ago. The goal was full link speed
with minimum CPU loading. The reasons why these schemes are not universal, and
link adaptors not consigned to the trash, are as follows:

1. Expense.
2. Software incompatibility.
3. No standard approach to follow.
4. Perceived lack of need for fast I/O!!!

That last one can be partly explained by noting that the iserver, as
implemented, and the iserver protocol impose a much lower limit on
communication speed than does the link adaptor.

Expect to see a new link adaptor sometime which is designed properly. Start
thinking about how to solve the interface problems and software with H1 DS
links @ tens of megabytes/s :)

David Boreham, INMOS Limited | mail(uk): davidb@inmos.co.uk or ukc!inmos!davidb
Bristol, England             |     (us): uunet!inmos.com!davidb
+44 454 616616 ex 547        |  Internet: davidb@inmos.com
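
A rough sketch of what point (2) in the list above means in practice: a
C011/C012-style interface forces a status poll plus a data read across the
host bus for every single byte. The register offsets and the inb() routine
below are placeholders, not the real B008/B014 programming model; the point is
the per-byte polling that caps throughput at a few hundred Kbytes/s and soaks
up the host CPU while it waits.

    /* Sketch only: per-byte cost of a polled C011/C012-style link adaptor.
     * The I/O addresses and inb() are placeholders, not the actual
     * B008/B014 register map. */

    #define LA_INPUT_DATA    0x150     /* placeholder I/O addresses */
    #define LA_INPUT_STATUS  0x151
    #define LA_DATA_PRESENT  0x01

    extern unsigned char inb(unsigned int port);   /* host-specific port read */

    static int la_read_block(unsigned char *buf, int len)
    {
        int i;

        for (i = 0; i < len; i++) {
            /* spin until the adaptor has a byte for us -- the host CPU
             * does nothing useful while it waits */
            while ((inb(LA_INPUT_STATUS) & LA_DATA_PRESENT) == 0)
                ;
            buf[i] = inb(LA_INPUT_DATA);   /* then one more bus cycle per byte */
        }
        return len;
    }
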
jeremy@cs.ua.oz.au (Jeremy Webber) (07/31/90)
In article <8932@ganymede.inmos.co.uk> davidb@brac.inmos.co.uk (David Boreham) writes:

   In article <9007271900.AA09549@tcgould.TN.CORNELL.EDU>
   koontz%capvax.decnet@CAPSRV.JHUAPL.EDU ("CAPVAX::KOONTZ") writes:
   ...
   >20Mbps), Inmos 'sez that the B014 VMEbus interface will only handle around
   >150KB/sec. If the interface could handle the link speed, we might see up to an
    ^^^^^^^
   Well actually the link adaptor will do ~800K.
   ...
   Problems interfacing transputers to general purpose computers:
   1. All software known expects to talk to a LINK and be a BYTE stream.
   2. Link adaptors (the present one, in C011 and C012 guise) are not suitable
      for interfacing to computers. They are basically a toy usefull for
      talking to slow dumb peripherals. The big lack is BUFFERING and their
      8-bit ness.

I was faced with the task of interfacing a B014 to an SGI Personal Iris,
including writing device drivers from scratch. Fundamentally, the reason the
B014 is so slow is that it does 8-bit data transfers and has no support for
DMA. On our 32-bit R2000-based machine this is a _real_ lose, both in link
speed and in bus bandwidth consumed. The fact that the link semantics are a
byte stream has nothing to do with it. RS232 connections are also byte
streams, but serial controllers which do word transfers over the bus, with DMA
and siloing, have been around for donkey's years. Whoever designed the B014
was either designing a toy or hadn't read the literature.

As an aside, the device driver was easy to write, and does polling rather than
interrupts because fewer instructions are needed per byte transferred (when
there is data available) this way. This does, however, make the device driver
unusable on a multi-user computer because the CPU spins when there is no data
available (yes, the driver does relinquish the CPU after a certain number of
spins).

	-jeremy webber
--
Jeremy Webber                            ACSnet:   jeremy@chook.ua.oz
Digital Arts Film and Television,        Internet: jeremy@chook.ua.oz.au
60 Hutt St, Adelaide 5001,               Voicenet: +61 8 223 2430
Australia                                Papernet: +61 8 272 2774 (FAX)
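
The polling-with-relinquish approach Jeremy describes amounts to something
like the sketch below. This is not his IRIX driver: la_status(), la_data(),
relinquish_cpu() and MAX_SPINS are hypothetical names. It shows the trade-off
he makes: polling costs only a few instructions per byte while data is
flowing, but the driver has to hand the CPU back once it has spun for a while
with nothing to read.

    /* Sketch of a polling driver loop with a spin budget -- hypothetical
     * helpers, not the actual SGI driver. */

    #define MAX_SPINS  1000                 /* assumed spin budget before yielding */

    extern int           la_status(void);       /* nonzero when a byte is waiting */
    extern unsigned char la_data(void);         /* read one byte from the adaptor */
    extern void          relinquish_cpu(void);  /* let other processes run */

    static int poll_read(unsigned char *buf, int len)
    {
        int got = 0;
        int spins = 0;

        while (got < len) {
            if (la_status()) {
                buf[got++] = la_data();     /* cheap per byte when data flows */
                spins = 0;
            } else if (++spins >= MAX_SPINS) {
                relinquish_cpu();           /* stop hogging a multi-user CPU */
                spins = 0;
            }
        }
        return got;
    }
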