bcw (09/14/82)
From: Bruce C. Wright @ Duke University
Re: DZ-11 vs DH-11

It is entirely reasonable that there are tradeoffs between the DZ-11 and the DH-11. The DH-11 requires acquiring things like buffered data paths (on the VAX) and freeing them when done; it also requires some extra overhead for setting up the DMA transfer registers. The considerations would look something like:

    DZ-11 time = per_io_setup_time
                 + length_of_transfer * (character_translation_time + interrupt_time)

    DH-11 time = per_io_setup_time
                 + length_of_transfer * character_translation_time
                 + interrupt_time + dma_setup_time

where:

    per_io_setup_time           is the time to set up the I/O operation (read and check parameters, etc.)
    length_of_transfer          is the transfer size in bytes
    character_translation_time  is the per-character time to translate things like tabs and so forth (you have to scan the string and move the translated characters to a new buffer)
    interrupt_time              is the length of one interrupt
    dma_setup_time              is the time required to set up the DMA transfer

If the transfer is short enough (one or two bytes), the DZ-11 will clearly win: it has no DMA setup time, and it takes the same number of interrupts as the DH-11 (or just one more). If the transfer is long enough, the DH-11 will win, because it avoids all those per-character interrupts. The only question is where the tradeoff point is, and that is going to be hardware- and operating-system-dependent. I have heard that it happens around 10-15 characters, but I have not done any timing myself; the claim was made by someone with an ax to grind (DEC maybe, or maybe Able), so I'm not sure the numbers haven't been fiddled. For many systems (especially those that tend to do single-character or few-character I/O), the DH-11 will clearly not be worth it.
The DMF-32 has both programmed I/O (like the DZ) and DMA I/O (like the DH), so the device driver can choose the most efficient method; that makes it look like a better choice in the long run (though I think the DMF-32 is still not intelligent enough...).
Bruce C. Wright @ Duke University
thomas (09/14/82)
Ken O's comments about DH inefficiencies on Vaxen do not apply to the 4.1bsd operating system. Why? Because the DH driver PERMANENTLY allocates a piece of Unibus map which points to the clist area. Thus, there is NO overhead of setting up the Unibus map for DH DMA output. There is still the normal overhead of setting up the DMA transfer, but this is small.
=Spencer
swatt (09/14/82)
I can contribute a LITTLE information on the programmed I/O (PIO) vs. DMA tradeoff point. We have a Megatek 7200 graphics system that can do both. The DMA interface is quite conventional. The PIO operation never interrupts unless you specify a bad address (and even then the interrupt can be disabled; the error is still available in the status register).

I modified a driver I got from Purdue that could choose which method to use based on transfer size. This driver was for an 11/70, and the threshold was set at 20 bytes. I/O to Megatek graphics memory is always in 32-bit words, so the Purdue driver would use DMA for transfers of 5 Megatek words or more.

With PIO, if you're transferring several words to sequential addresses, you load the address only once. Thereafter each word transferred is just:

    load most significant 16-bit half-word
    load least significant 16-bit half-word
    check for error

The transfer goes as fast as the VAX can run that loop. I have added code that allows a user-settable threshold and have done some crude experiments on PIO vs. DMA overhead. The default threshold is now 64 Megatek words (256 bytes), and it seems that even for transfers of that size PIO has less overhead than DMA. I haven't looked closely at the Unibus map allocation operation, but it must be fairly involved.

Now for devices like DZs, where you can't transfer characters as fast as the CPU can stuff them, the tradeoff point obviously depends on how fast you can service an interrupt. Berkeley 4.1 has a special assembly-language transmit interrupt routine for DZs that takes characters out of a buffer, stuffs them into the DZ data buffer, and calls the C interrupt routine only when the buffer is empty. The Berkeley documents say that one DZ line doing continuous output at 9600 baud consumes 5% of a 780 CPU, versus 3% for the same output from a DH line.
If you had a DZ device with an internal buffer of, say, 64 characters per line, and you could stuff characters into that buffer as fast as the CPU could loop, and you got an interrupt only when the buffer was empty, then I bet such a device would have less overhead than a DH in all cases (for the VAX anyway; the PDP-11 might be different). UNIX won't do DMA to DH devices in hunks larger than a cblock structure can hold anyway (28 characters on 4.1bsd; 14 characters on standard V7). I'm SURE you could stuff 28 bytes in a loop in a lot less time than it takes to allocate and free a Unibus map.
- Alan S. Watt ([decvax!]ittvax!swatt)