[comp.os.minix] DMA on the AT

dpi@loft386.UUCP (Doug Ingraham) (09/21/89)

In article <397@crash.cts.com>, jca@pnet01.cts.com (John C. Archambeau) writes:
> dpi@loft386.UUCP (Doug Ingraham) writes:

>> <earlier analysis deleted> 

> For a 6 MHz AT, you are right, but how many of us out there have a 6 MHz AT?
> I don't.  I have a 16 MHz 286 box.  Also, the higher speed '286 chips have to

It really doesn't matter what speed machine you have, as the ratio of
CPU transfer time to DMA transfer time will remain constant.

<much deleted>

> >If we are talking about a good implementation of DMA I agree.  Unfortunatly
> >on the AT DMA should be used only for the floppy.
>  
> I do agree that I wouldn't implement DMA on a 6 MHz 286 (if you can find them
> anymore).  However, if the later machines can handle it better as I suspect,
> then why not use it?  Also, what about DMA on machine equipped with an EISA
> bus or MCA?  Would it work better than the classic 6 MHz AT bus?  Most likely,
> but then again, how many of us can afford the bus specs for MCA from IBM?

I didn't mean to start an argument about what could be or might have been.
I was simply providing a little analysis of the AT Bus.  And as Henry
Spencer pointed out in a related note there are cases where DMA is not
appropriate.  I believe that the hard disk on an 80x86 is one of those 
cases.  Let me detail the sequence of events that make up a hard disk
read data transfer.

1)  Select the cylinder, head, sector and read operation and issue
    the command.  

2)  Disk process goes to sleep waiting for an interrupt.

3)  Controller causes drive to seek to the requested cylinder.

4)  Controller selects the requested head and reads the requested
    sector into its buffer.

5)  Controller generates an interrupt which wakes up the sleeping
    disk process.

6)  The disk process moves the data from the controller's buffer into
    memory.  This could be DMA or programmed I/O whichever takes the
    least amount of real time.
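
The six steps above can be sketched in C against a stand-in controller.
This is only an illustration: struct fake_ctlr, fake_issue_read,
fake_data_in and read_sector are hypothetical names, the data pattern is
made up, and a real driver would sleep on the interrupt and read an I/O
port rather than call a function.

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_WORDS 256           /* 512-byte sector as 16-bit words */

/* Stand-in for the controller; a real driver would talk to I/O ports. */
struct fake_ctlr {
    uint16_t buf[SECTOR_WORDS];    /* the controller's sector buffer */
    unsigned next;                 /* next word the data register returns */
    int irq_pending;               /* set once the buffer has been filled */
};

/* Steps 1, 3, 4, 5: accept the command, "seek", fill the buffer, interrupt. */
void fake_issue_read(struct fake_ctlr *c, unsigned cyl, unsigned head,
                     unsigned sec)
{
    unsigned i;
    for (i = 0; i < SECTOR_WORDS; i++)
        c->buf[i] = (uint16_t)(cyl + head + sec + i);  /* made-up pattern */
    c->next = 0;
    c->irq_pending = 1;
}

/* One read of the data register: what a single IN would return. */
uint16_t fake_data_in(struct fake_ctlr *c)
{
    assert(c->next < SECTOR_WORDS);
    return c->buf[c->next++];
}

/* Driver side. Returns 0 on success, -1 if no interrupt arrived. */
int read_sector(struct fake_ctlr *c, unsigned cyl, unsigned head,
                unsigned sec, uint16_t *dst)
{
    unsigned i;
    fake_issue_read(c, cyl, head, sec);   /* step 1 */
    if (!c->irq_pending)                  /* step 2: a real driver sleeps */
        return -1;
    c->irq_pending = 0;
    for (i = 0; i < SECTOR_WORDS; i++)    /* step 6: programmed I/O copy */
        dst[i] = fake_data_in(c);
    return 0;
}
```

The loop at the end of read_sector is the only part that a repeated
string input instruction or a DMA transfer would replace; everything
before it is the same either way.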

It is in step 6 that the discussion about DMA vs. programmed I/O comes
into play.  It is actually in step 2 that all the time in a hard disk
transfer is saved.  Remember that when step 6 starts the data has
already been read.  It's just in the wrong place.  The most important
thing is to get it to where it's supposed to be.  With programmed I/O
in step 6, the CPU instruction REP INSW will transfer all the data
to the correct address in memory at the maximum possible speed.  The
setup time for the DMA controller alone is probably close to the transfer
time using programmed I/O.  And as I stated in a previous posting, due
to the design of the DMA logic the data will probably still have to be
moved to the correct place in memory by the CPU, wasting still more time.
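
The comparison in step 6 comes down to one inequality: DMA is worth using
only if its setup time, plus its transfer time, plus any extra copy, is
less than the programmed I/O time.  A rough sketch in C follows.  The
struct and function names are hypothetical, and any numbers fed to it are
the caller's own estimates; none come from measurements in this thread.

```c
#include <assert.h>

/* All times in microseconds; every field is an estimate from the caller. */
struct xfer_cost {
    double pio_us;        /* programmed I/O straight to the destination */
    double dma_setup_us;  /* programming the DMA controller */
    double dma_xfer_us;   /* the DMA transfer itself */
    double dma_copy_us;   /* extra CPU copy if DMA can't reach the buffer */
};

/* Total real time for the DMA path. */
double dma_total_us(const struct xfer_cost *c)
{
    assert(c != 0);
    return c->dma_setup_us + c->dma_xfer_us + c->dma_copy_us;
}

/* Nonzero if DMA gets the sector to its destination sooner than PIO. */
int dma_wins(const struct xfer_cost *c)
{
    return dma_total_us(c) < c->pio_us;
}
```

This only compares real time to completion, which is the criterion used
in step 6.  It deliberately ignores whether the CPU could do other work
during the DMA transfer.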

On the other hand, the floppy disk doesn't have a sector buffer.  This
means that either the CPU or the DMA controller must take each byte as
it becomes available, or the byte is lost.  This is a perfect use for
DMA since the data will be available to the requesting process in almost
the same time no matter which one does the transfer.  Plus the CPU is
free to work on other tasks while the DMA is taking place.  DMA works
well for unbuffered or asynchronous transfers.  I have seen DMA chips
that were able to do block moves.  This would be another good use for
DMA, assuming that the CPU was still able to execute or that the move
via DMA was much faster than could be done by the CPU.
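
To put a number on "become available or be lost", here is a small C
sketch of the time window per byte on an unbuffered device.  The function
name is made up, and the data rates in the note below (250 and 500
kbit/s, the usual double and high density floppy rates) are assumptions
brought in for illustration rather than figures from this thread.

```c
#include <assert.h>

/* Microseconds between successive bytes at a raw data rate in kbit/s. */
double byte_window_us(double kbits_per_sec)
{
    assert(kbits_per_sec > 0.0);
    return 8.0 * 1000.0 / kbits_per_sec;
}
```

At 250 kbit/s this gives 32 microseconds per byte, and at 500 kbit/s it
gives 16.  Whoever does the transfer has to pick up every byte inside
that window for the whole sector, which is why handing the job to the DMA
controller frees the CPU without making the data arrive any later.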

Conclusion:

    DMA should not be used in the particular case of transferring the
sector (or track) buffer unless the transfer would take less real time.
In the case of the Intel 80286 (and 80386) the repeated string I/O
instructions are better able to utilize bus bandwidth than the DMA.
Although I haven't actually cranked through the instruction sequence
I suspect that DMA would beat the CPU on an XT machine.  The reason
is that the REP INS and OUTS instructions don't exist on the 8088.
Here is the loop that would replace the REP INSB sequence used in an AT:

	cld			; string ops must increment di
	mov	cx,512		; byte count
	mov	dx,1F0H		; data register
	mov	di,offset buff	; data buffer pointer (es must address buff)
rdloop:	in	al,dx		; read a data byte (8 clocks)
	stosb			; es:[di++] = al; (11 clocks)
	loop	rdloop		; if (--cx) goto rdloop; (17 clocks)


That's 36 clocks/byte, and even assuming a 10 MHz clock that works out to
1843 microseconds for a 512-byte sector.  This is only 271k/second.  The
DMA almost has to be better than this on an XT.
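
The arithmetic above can be checked with a few lines of C.  This is only
a sketch: the function names are made up for illustration, and the clock
counts are the ones quoted in the loop comments.

```c
#include <assert.h>

/* Microseconds to move nbytes at clocks_per_byte on a CPU running at mhz. */
double pio_time_us(unsigned clocks_per_byte, unsigned nbytes, double mhz)
{
    assert(mhz > 0.0);
    return (double)clocks_per_byte * nbytes / mhz;
}

/* Throughput in units of 1024 bytes per second. */
double pio_rate_kbytes(unsigned clocks_per_byte, double mhz)
{
    assert(clocks_per_byte > 0);
    return mhz * 1e6 / clocks_per_byte / 1024.0;
}
```

With 8 + 11 + 17 = 36 clocks per byte, a 512-byte sector and a 10 MHz
clock, pio_time_us gives about 1843 microseconds and pio_rate_kbytes
about 271, which matches the figures above when k is taken as 1024.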

Since MCA and EISA were mentioned, I thought I would look for DMA info.
From an article on the EISA bus there are four defined DMA modes:

* Standard: 4.1 Mbytes/sec (8 clocks/transfer)
* Type A:   5.3 Mbytes/sec (6 clocks/transfer)
* Type B:   8.3 Mbytes/sec (4 clocks/transfer)
* Type C:    33 Mbytes/sec (1 clock /transfer)

The only data I could find on MCA DMA did not include any specific
timings.  The article on the EISA bus implied that the above timings
were better than MCA.  Of the DMA speeds above, it isn't until we get
to Type B and Type C that we would gain an advantage using DMA.

I think that I have beaten this topic to death.

-- 
Doug Ingraham (SysAdmin)
Lofty Pursuits (Public Access for Rapid City SD USA)
uunet!loft386!dpi