[net.unix-wizards] DZ-11 vs DH-11

bcw (09/14/82)

From:	Bruce C. Wright @ Duke University
Re:	DZ-11 vs DH-11

It is entirely reasonable that there are tradeoffs between the DZ-11
and the DH-11.  The DH-11 requires acquiring resources such as
buffered data paths (on the VAX) and freeing them when done;  it
also incurs extra overhead for setting up the DMA transfer
registers.  The considerations would look something like this:

	DZ-11 time = per_io_setup_time +
			length_of_transfer *
			  (character_translation_time + interrupt_time)

	DH-11 time = per_io_setup_time +
			(length_of_transfer * character_translation_time) +
			interrupt_time +
			dma_setup_time

where:	per_io_setup_time is the time to set up the I/O operation
			(read and check parameters, etc.)
	length_of_transfer is the transfer size in bytes
	character_translation_time is the per-character time to
			translate things like tabs (you will have
			to scan the string and move the translated
			characters to a new buffer)
	interrupt_time is the time to service one interrupt
	dma_setup_time is the time required to set up the DMA transfer
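The two formulas above can be written out as small functions so the per-character and per-transfer terms are easy to compare.  This is only a sketch: the function names are hypothetical, and the microsecond values are made-up placeholders, not measurements of any real DZ-11 or DH-11.

```python
def dz11_time(n, per_io_setup, char_translation, interrupt):
    """Programmed I/O: every character costs a translation and an interrupt."""
    return per_io_setup + n * (char_translation + interrupt)


def dh11_time(n, per_io_setup, char_translation, interrupt, dma_setup):
    """DMA: every character is translated, but the whole transfer costs
    only one interrupt plus one DMA setup."""
    return per_io_setup + n * char_translation + interrupt + dma_setup


# Hypothetical timings in microseconds, for illustration only.
T = dict(per_io_setup=100.0, char_translation=5.0, interrupt=50.0)
DMA_SETUP = 500.0

for n in (1, 2, 10, 20):
    dz = dz11_time(n, **T)
    dh = dh11_time(n, **T, dma_setup=DMA_SETUP)
    print(n, dz, dh, "DZ wins" if dz < dh else "DH wins")
```

With these invented numbers the two costs meet at 11 characters; different hardware or a different operating system would move that point.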

If the transfer is short enough (one or two bytes), the DZ-11 will
clearly win, since it avoids the DMA setup time and takes about the
same number of interrupts as the DH-11.  If the transfer is long
enough, the DH-11 will win because it avoids the per-character
interrupts the DZ-11 takes.  The only question is where the tradeoff
point is, which is going to be hardware and operating-system
dependent.  I have heard that it falls around 10-15 characters, but
have not done any timing myself;  the claim was made by someone with
an ax to grind (DEC maybe, or maybe Able), so I'm not sure that the
numbers haven't been fiddled.
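Subtracting the two formulas shows where the tradeoff point falls under this simple model.  Writing n for length_of_transfer, t_io for per_io_setup_time, t_tr for character_translation_time, t_int for interrupt_time and t_dma for dma_setup_time, the setup and translation terms cancel:

```latex
\begin{aligned}
T_{DZ} &= t_{io} + n\,(t_{tr} + t_{int}) \\
T_{DH} &= t_{io} + n\,t_{tr} + t_{int} + t_{dma} \\
T_{DZ} - T_{DH} &= (n - 1)\,t_{int} - t_{dma} \\
T_{DH} < T_{DZ} &\iff n > 1 + \frac{t_{dma}}{t_{int}}
\end{aligned}
```

Read this way, a tradeoff point of 10-15 characters would mean the DMA setup costs roughly 9 to 14 times as much as servicing one interrupt.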

For many systems (especially those that mostly do I/O of one or a
few characters at a time), the DH-11 will clearly not be worth it.
The DMF-32 has both programmed I/O (like the DZ) and DMA I/O (like
the DH), so the device driver can choose the more efficient method;
this looks like a better choice in the long run (though I think the
DMF-32 is still not intelligent enough...).

			Bruce C. Wright @ Duke University

thomas (09/14/82)

Ken O's comments about DH inefficiencies on Vaxen do not apply to the 4.1bsd
operating system.  Why?  Because the DH driver PERMANENTLY allocates a piece
of Unibus map which points to the clist area.  Thus, there is NO overhead
of setting up the Unibus map for DH DMA output.  There is still the normal
overhead of setting up the DMA transfer, but this is small.

=Spencer

swatt (09/14/82)

I can contribute a LITTLE information on the programmed I/O vs.
DMA tradeoff point.

We have a Megatek 7200 graphics system that can do both.  The DMA
interface is quite conventional.  The PIO operation never interrupts
unless you specify a bad address (even then the interrupt can be
disabled, and the error is still available in the status register).
I modified a driver I got from Purdue that had provisions for
choosing which method to use based on transfer size.  This driver
was for an 11/70, and the threshold was set at 20 bytes.

I/O to Megatek graphics memory is always in terms of 32-bit words,
so the Purdue driver would use DMA for transfers of 5 Megatek words
or more.
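The size-based choice amounts to a simple threshold test.  A minimal sketch, assuming a hypothetical `choose_method` helper (not the Purdue driver's actual code); it also shows where the 5-word figure comes from, since 20 bytes at 4 bytes per 32-bit word is 5 words.

```python
MEGATEK_WORD_BYTES = 4     # Megatek graphics memory is addressed in 32-bit words
DMA_THRESHOLD_BYTES = 20   # threshold used in the Purdue 11/70 driver


def choose_method(nbytes, threshold_bytes=DMA_THRESHOLD_BYTES):
    """Return 'dma' or 'pio' for a transfer of nbytes (hypothetical helper)."""
    if nbytes % MEGATEK_WORD_BYTES:
        raise ValueError("Megatek transfers are whole 32-bit words")
    return "dma" if nbytes >= threshold_bytes else "pio"


print(DMA_THRESHOLD_BYTES // MEGATEK_WORD_BYTES)  # 5 words
print(choose_method(16), choose_method(20))       # pio dma
```

Making `threshold_bytes` a parameter is what allows a user-settable threshold like the one described below.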

The PIO operation is such that if you're transferring several words
to sequential addresses, you only load the address once.
Thereafter each transfer is just:

	load most significant 16-bit half-word
	load least significant 16-bit half-word
	check for error

The transfer goes as fast as the VAX can run that loop.  I have
added code that allows a user-settable threshold and have done some
crude experiments comparing PIO and DMA overhead.  The default
threshold is now 64 Megatek words (256 bytes), and it seems that
even for transfers of that size PIO has less overhead than DMA.  I
haven't looked closely at the Unibus map allocation operation, but
it must be fairly involved.

Now for devices like DZs, where you can't transfer characters as
fast as the CPU can stuff them, the tradeoff point obviously depends
on how fast you can service an interrupt.  Berkeley 4.1 has a special
assembly-language transmit interrupt routine for DZs that takes
characters out of a buffer, stuffs them into the DZ data buffer, and
calls the C interrupt routine only when the buffer is empty.  The
Berkeley documents say that one DZ line doing continuous output at
9600 baud consumes 5% of a 780 CPU, versus 3% for the same output
from a DH line.
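Those percentages can be turned into an approximate CPU cost per character.  Assuming 10 bits per character on the wire (start bit, 8 data bits, stop bit; an assumption, since the Berkeley figures don't say), 9600 baud is 960 characters per second, so 5% of the CPU works out to about 52 microseconds per character for the DZ, versus about 31 for the DH.  The helper name is hypothetical.

```python
def cpu_us_per_char(cpu_fraction, baud, bits_per_char=10):
    """Microseconds of CPU spent per character, given the fraction of the
    CPU consumed by continuous output at the given baud rate."""
    chars_per_sec = baud / bits_per_char
    return cpu_fraction * 1_000_000 / chars_per_sec


dz = cpu_us_per_char(0.05, 9600)  # about 52 microseconds per character
dh = cpu_us_per_char(0.03, 9600)  # about 31 microseconds per character
print(f"DZ {dz:.1f} us/char, DH {dh:.1f} us/char")
```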

If you had a DZ device with an internal buffer of, say, 64
characters per line, and you could stuff characters into that
buffer as fast as the CPU could loop and only get an interrupt
when the buffer was empty, then I bet such a device would have less
overhead than a DH in all cases (for the VAX anyway; the PDP-11
might be different).  UNIX won't do DMA to DH devices in chunks
larger than a cblock structure can hold anyway (28 characters on
4.1bsd; 14 characters on standard V7).  I'm SURE you could stuff
28 bytes in a loop in a lot less time than it takes to allocate and
free a Unibus map.
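The cblock limit sets how many transfers a given amount of output needs.  A small sketch, assuming each DH DMA transfer covers at most one cblock (as stated above) and treating the 64-character buffered DZ as the hypothetical device just described; the 80-character line and the helper name are illustrative choices.

```python
def transfers_needed(nchars, chunk):
    """Ceiling of nchars / chunk: how many transfers (or interrupts) it
    takes to move nchars when each one handles at most chunk characters."""
    return -(-nchars // chunk)


line = 80  # an 80-column line of output, chosen for illustration
print("plain DZ, 1 char per interrupt: ", transfers_needed(line, 1))   # 80
print("DH on V7, 14-char cblocks:      ", transfers_needed(line, 14))  # 6
print("DH on 4.1bsd, 28-char cblocks:  ", transfers_needed(line, 28))  # 3
print("hypothetical 64-char DZ buffer: ", transfers_needed(line, 64))  # 2
```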

	- Alan S. Watt
	([decvax!]ittvax!swatt)