[comp.sys.transputer] I/O Calls on the Transputer and General Call for Benchmarks

koontz%capvax.decnet@CAPSRV.JHUAPL.EDU ("CAPVAX::KOONTZ") (07/27/90)

Vandana,

     You have just experienced the problem with general purpose I/O on a 
transputer.  The problem is not in the transputer itself but in the fact that it 
does not have a hard-disk attached directly to it.  Instead, it must perform 
remote procedure calls through a transputer link to a suitable interface on the 
Sun which is executing a driver to control the link adapter which must then 
respond back to a server program time-sliced with other stuff executing on the 
Sun...yuch, what a mess.

     The physical interface is one of the main bottlenecks.  On a B014, the 
VMEbus to transputer interface uses a simple Link Adapter.  The interface allows 
one byte to be transmitted or received to/from the transputer at a time.  The 
server program running in the Sun must either poll the link adapter or set up an 
interrupt.  The interrupt won't improve I/O speed since no extra buffers are 
provided; it just allows the Sun to do other things while this slow interface is 
sending or receiving a byte.

     While a transputer link can transfer around 1.75MB/sec (half-duplex @ 
20Mbps), Inmos 'sez that the B014 VMEbus interface will only handle around 
150KB/sec.  If the interface could handle the link speed, we might see up to an 
11.67 times improvement.  Thus, your T800 time could become something around 
2.7secs.  This is still 20 times slower than the Sun-3 and 40 times slower than 
the Sun-4.  So this isn't the complete answer.
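
     The arithmetic above can be checked with a short sketch.  It uses only the 
two rates quoted in this post; treating 1 MB as 1000 KB, and assuming the 
benchmark is entirely limited by the interface, are simplifications.  The 
function names are made up for illustration.

```python
# Rates quoted in the post. Treating 1 MB as 1000 KB is an assumption.
LINK_KB_PER_SEC = 1750.0   # transputer link, ~1.75 MB/sec at 20 Mbps
B014_KB_PER_SEC = 150.0    # B014 VMEbus interface, per Inmos


def interface_speedup(link_rate, interface_rate):
    """Best-case speedup if the interface could run at full link speed."""
    return link_rate / interface_rate


def projected_time(measured_secs, speedup):
    """Projected run time, assuming the run is entirely interface-bound."""
    return measured_secs / speedup


speedup = interface_speedup(LINK_KB_PER_SEC, B014_KB_PER_SEC)
print(f"best-case speedup: {speedup:.2f}x")

# 31.5 sec is back-calculated from the ~2.7 sec projection in the post;
# it is not a measurement quoted in this thread.
print(f"projected time: {projected_time(31.5, speedup):.1f} sec")
```

Any compute time in the benchmark would not shrink with a faster interface, so 
a real run would likely land above this projection.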

     Other things probably influence the speed such as the driver for the link 
adapter and the time-slicing of other processes under Unix.  I started thinking 
maybe the speed of the SCSI disk vs. the transputer link could be a factor - but 
SCSI disks aren't that much faster (and with the seek times they are much 
slower).  You said that the benchmark doesn't actually perform any disk 
activity.  Ah...maybe that's the clue.  The times for the Suns are so quick 
because the disk buffering is direct (and no disk activity); you're seeing the 
speed of the Sun's memory and CPU interface.  If the T800 did disk buffering in 
its memory, you might see a time somewhere between the Sun-3 and Sun-4.  
However, the Sun is performing the disk buffering in its memory for the T800.

     So how to speed things up?  You can either get a better interface between 
the Sun and the T800 (get rid of that link adapter!) or you can connect a disk 
directly to the T800.  Or maybe you can increase the priority of the server and 
driver under Unix.  Or maybe you can get a better C compiler...

     Just for fun, I tried your benchmark on one of our PC systems.  The 
transputer hardware consists of an Inmos B008 with a B404-3 (T800-20) on Slot 0. 
The PC is a Compaq Port 386/20.  I compiled the benchmark using Logical Systems 
C v88.4.  The LSC server (cio) allows you to enable or disable the DMA 
capability on the B008.  This can speed things up a lot when using a 286-style 
PC but not as much with a 386 (the 386 can poll almost as fast as the DMA can 
transfer - it's not such a great DMA design...).  I tried it with and without 
the DMA: here's what I got:

     Trial 1: DMA disabled (polling)     3.798 sec
     Trial 2: DMA enabled                4.231 sec

Wow!  Did I misplace a decimal point?  No, I ran each 3 times and verified 
against a wall clock.  I can't believe that the B008 and B014 interfaces are 
that much different.  I suspect we're seeing the difference between a server 
running on a dedicated CPU (the PC) vs. one running under a multi-user OS like 
Unix.  Or are the drivers for the B014 to blame?

     OK netters...the source code for the benchmark is provided.  Anyone have a 
Unix system that uses a faster interface (e.g. FIFOs or dual-ported memory) from 
someone like Transtech, Meiko, Topologix, et al.?  And does anyone have a T800 
with a SCSI disk connected directly and appropriate software?  Helios has a 
standalone filing system for the Inmos B422 SCSI TRAM (OUG News #13).  How about 
hearing from anyone with a Yarc transputer board or another PC-type board that 
has a faster interface to the PC?  Can our friendly neighborhood hardware 
designed at Inmos (yes Dave Boreham, that's you...) contribute any info about 
the B008 and B014 link adapter designs or the driver designs?  Do you have a 
driver for the B016 yet that uses dual-ported memory and can you run a benchmark 
on this setup?

     Sorry, I'm bored writing documentation, I like playing with benchmarks, and 
this looks like an interesting one for figuring out filer performance.  This 
looks like a fun project to get lots of times from different HW/SW 
configurations.  It'll take each person about 15 minutes to get it from your 
email file into your system, compile it, and read the numbers.  I'll try to keep 
track of the scores.

     So let's all go play in our sandboxes and compare results.


Ken Koontz
The Johns Hopkins University
  Applied Physics Laboratory
Laurel, MD USA 
email: koontz@capsrv.jhuapl.edu

davidb@brac.inmos.co.uk (David Boreham) (07/29/90)

In article <9007271900.AA09549@tcgould.TN.CORNELL.EDU> koontz%capvax.decnet@CAPSRV.JHUAPL.EDU ("CAPVAX::KOONTZ") writes:
>Vandana,
>
>     You have just experienced the problem with general purpose I/O on a 
>transputer.  The problem is not in the transputer itself but in the fact that it 
...
>20Mbps), Inmos 'sez that the B014 VMEbus interface will only handle around 
>150KB/sec.  If the interface could handle the link speed, we might see up to an 
 ^^^^^^^

  Well actually the link adaptor will do ~800K.
  The link adaptor in any circuit you can design
  and put into a computer will do ~600K on blind reads
  and ~300K polled. 
  The 150K is half what should be realised, probably
  due to SUN's slow VME and allowing for another user process.
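
  To put those rates in terms a driver writer feels, here is a rough
  sketch that converts each figure into the time available per byte.
  Reading "K" as 1000 bytes/sec is an assumption, and the labels are
  only shorthand for the cases listed above.

```python
# Rates from the reply above. "K" is read as 1000 bytes/sec (an assumption).
RATES_K_PER_SEC = {
    "link adaptor peak": 800,
    "blind reads": 600,
    "polled": 300,
    "B014 as reported": 150,
}


def microseconds_per_byte(k_per_sec):
    """Time budget per byte, in microseconds, at the given rate."""
    return 1_000_000 / (k_per_sec * 1000)


for name, rate in RATES_K_PER_SEC.items():
    budget = microseconds_per_byte(rate)
    print(f"{name:20s} {rate:4d} K/sec  {budget:5.2f} us/byte")
```

  Everything the host does per byte (status read, data read, loop
  overhead, bus cycles) has to fit inside that budget, which is one way
  to see why a slow bus or a shared CPU halves the figure so easily.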

>11.67 times improvement.  Thus, your T800 time could become something around 
>2.7secs.  This is still 20 times slower than the Sun-3 and 40 times slower than 
>the Sun-4.  So this isn't the complete answer.
>
...
>
>     So how to speed things up?  You can either get a better interface between 
>the Sun and the T800 (get rid of that link adapter!) or you can connect a disk 
>directly to the T800.  Or maybe you can increase the priority of the server and 
>driver under Unix.  Or maybe you can get a better C compiler...
>
>     Just for fun, I tried your benchmark on one of our PC systems.  The 
>transputer hardware consists of an Inmos B008 with a B404-3 (T800-20) on Slot 0. 
>The PC is a Compaq Port 386/20.  I compiled the benchmark using Logical Systems 
>C v88.4.  The LSC server (cio) allows you to enable or disable the DMA 
>capability on the B008.  This can speed things up a lot when using a 286-style 
>PC but not as much with a 386 (the 386 can poll almost as fast as the DMA can 
>transfer - it's not such a great DMA design...).  I tried it with and without 
                            ^^^^^^^^^^^^^^^^
   I guess you mean the PC's DMA design ??  Yes, I thought so.

>the DMA: here's what I got:
>
>     Trial 1: DMA disabled (polling)     3.798 sec
>     Trial 2: DMA enabled                4.231 sec
>
>Wow!  Did I misplace a decimal point?  No, I ran each 3 times and verified 
>against a wall clock.  I can't believe that the B008 and B014 interfaces are 
>that much different.  I suspect we're seeing the difference between a server
           ^^^^^^^^^^
   Nope, fundamentally identical.
 
>running on a dedicated CPU (the PC) vs. one running under a multi-user OS like 
>Unix.  Or are the drivers for the B014 to blame?

   Wait a minute, you're running a different server, right ?
   Run that sucker on a B014 and see what you get there, I'd be
   quite interested. 

>
>     OK netters...the source code for the benchmark is provided.  Anyone have a 
>Unix system that uses a faster interface (e.g. FIFOs or dual-ported memory) from 
>someone like Transtech, Meiko, Topologix, et al.?  And does anyone have a T800 
>with a SCSI disk connected directly and appropriate software?  Helios has a 
>standalone filing system for the Inmos B422 SCSI TRAM (OUG News #13).  How about 
                                        ^^^^^^^^^^^^^^
    Yes, works rather well I'm told. Maybe I can get the benchmark
    run on that system, although it's off the premises at the moment.

>hearing from anyone with a Yarc transputer board or another PC-type board that 
>has a faster interface to the PC?  Can our friendly neighborhood hardware 
>designed at Inmos (yes Dave Boreham, that's you...) contribute any info about 
 ^^^^^^^^^^^^^^^^^
  Hmm, problems with the keyboard Ken ?, my Mother says I was designed
  in Edinburgh :) 
   
>the B008 and B014 link adapter designs or the driver designs?  Do you have a 
>driver for the B016 yet that uses dual-ported memory and can you run a benchmark 
>on this setup? 

  RSN. B016 goes about five times as fast as B014 in a SUN3/140,
  will do full link speed.  

>
>     Sorry, I'm bored writing documentation, I like playing with benchmarks, and 
>this looks like an interesting one for figuring out filer performance.  This 
>looks like a fun project to get lots of times from different HW/SW 
>configurations.  It'll take each person about 15 minutes to get it from your 
>email file into your system, compile it, and read the numbers.  I'll try to keep 
>track of the scores.
>

Problems interfacing transputers to general purpose computers:

1. All known software expects to talk to a LINK and be a BYTE stream.
   This immediately screws things up because this means link adaptors
   unless major hacking is done, and also some degree of playing with
   individual bytes as they go through.

2. Link adaptors (the present one, in C011 and C012 guise) are not suitable
   for interfacing to computers. They are basically a toy useful for 
   talking to slow dumb peripherals. The big lack is BUFFERING and their
   8-bit ness.

Every link adaptor interface is limited to a few hundred Kbytes/s.
If the computer is a multitasking one, the device driver will steal
CPU time from everything else because it is polling the link adaptor.

Obviously other computer-to-link interface schemes are possible, all 
of which promise full link bandwidth capability. I designed my first 
in 1985 and invented quite a neat one a couple of years ago, the goal
being full link speed with minimum CPU loading.
The reasons these schemes are not universal, with link adaptors
consigned to the trash, are as follows:

1. Expense.
2. Software incompatibility.   
3. No standard approach to follow.
4. Perceived lack of need for fast I/O !!!

That last one can be partly explained by noting that the iserver,
as implemented, and the iserver protocol impose a much lower limit
on communication speed than does the link adaptor. 

Expect to see a new link adaptor sometime which is designed properly.

Start thinking about how to solve the interface problems and software
with H1 DS links @ tens of megabytes/s :)  


David Boreham, INMOS Limited | mail(uk): davidb@inmos.co.uk or ukc!inmos!davidb
Bristol,  England            |     (us): uunet!inmos.com!davidb
+44 454 616616 ex 547        | Internet: davidb@inmos.com

jeremy@cs.ua.oz.au (Jeremy Webber) (07/31/90)

In article <8932@ganymede.inmos.co.uk> davidb@brac.inmos.co.uk (David Boreham) writes:

   In article <9007271900.AA09549@tcgould.TN.CORNELL.EDU> koontz%capvax.decnet@CAPSRV.JHUAPL.EDU ("CAPVAX::KOONTZ") writes:
    ...
   >20Mbps), Inmos 'sez that the B014 VMEbus interface will only handle around 
   >150KB/sec.  If the interface could handle the link speed, we might see up to an 
    ^^^^^^^

     Well actually the link adaptor will do ~800K.
   ...
   Problems interfacing transputers to general purpose computers:

   1. All software known expects to talk to a LINK and be a BYTE stream.

   2. Link adaptors (the present one, in C011 and C012 guise) are not suitable
      for interfacing to computers. They are basically a toy useful for 
      talking to slow dumb peripherals. The big lack is BUFFERING and their
      8-bit ness.

I was faced with the task of interfacing a B014 to an SGI Personal Iris,
including writing device drivers from scratch.  Fundamentally, the reason the
B014 is so slow is because it does 8bit data transfers, and has no support for
DMA.  On our 32 bit R2000 based machine this is a _real_ lose, both in link
speed and bus bandwidth consumed.

The fact that the link semantics are a byte stream has nothing to do with it.
RS232 connections are also byte streams, but serial controllers which do word
transfers over the bus, with DMA and siloing, have been around for donkey's
years.  Whoever designed the B014 was either designing a toy or hadn't read the
literature.

As an aside, the device driver was easy to write, and does polling rather than
interrupts because there are fewer instructions needed per byte transferred
(when there is data available) this way.  This does, however, make the device
driver unusable on a multi-user computer because the CPU spins when there is no
data available (yes, the driver does relinquish the CPU after a certain number
of spins).
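
The polling scheme described above can be sketched roughly as follows.  This is
a simulation, not the actual driver: FakeLinkAdaptor, polled_read, and every
parameter name are hypothetical stand-ins for a one-byte status/data register
pair, and time.sleep(0) stands in for whatever call relinquishes the CPU.

```python
import collections
import time


class FakeLinkAdaptor:
    """Hypothetical stand-in for a one-byte link adaptor (status + data)."""

    def __init__(self, data, ready_every=3):
        self._pending = collections.deque(data)
        self._ready_every = ready_every
        self._polls = 0

    def data_ready(self):
        # Simulated status register: ready only on every Nth poll.
        self._polls += 1
        return bool(self._pending) and self._polls % self._ready_every == 0

    def read_byte(self):
        # Simulated data register: hands over exactly one byte.
        return self._pending.popleft()


def polled_read(adaptor, count, max_spins=100, max_yields=1000,
                yield_cpu=lambda: time.sleep(0)):
    """Read `count` bytes by polling; give up the CPU after max_spins misses."""
    out = bytearray()
    spins = 0
    yields = 0
    while len(out) < count:
        if adaptor.data_ready():
            out.append(adaptor.read_byte())
            spins = 0
        else:
            spins += 1
            if spins >= max_spins:
                if yields >= max_yields:
                    raise TimeoutError("no data from link adaptor")
                yield_cpu()       # let other processes run, then resume
                yields += 1
                spins = 0
    return bytes(out), yields


data, yields = polled_read(FakeLinkAdaptor(b"hello"), 5, max_spins=2)
print(data, "yields:", yields)
```

The spin limit is the tuning knob: a low limit is kinder to other users but adds
scheduler latency to every byte that arrives late, while a high limit keeps the
per-byte cost down at the price of burning CPU while the link is idle.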

		-jeremy webber
--
Jeremy Webber			   ACSnet: jeremy@chook.ua.oz
Digital Arts Film and Television,  Internet: jeremy@chook.ua.oz.au
60 Hutt St, Adelaide 5001,	   Voicenet: +61 8 223 2430
Australia			   Papernet: +61 8 272 2774 (FAX)