[comp.dcom.lans] IBM PC/AT DMA loses

romkey@kaos.UUCP (John Romkey) (11/21/87)

In article <155@tic.UUCP> ruiu@tic.UUCP (Dragos Ruiu) writes:
>   However, I had great difficulty believing their hardware section, which
>repeatedly says that DMA interface boards are slower than others. Unless there 
>is something I don't know, DMA is much faster than interrupt driven hardware.
>I have not had the time or the resources to research their claims, but
>common sense would indicate that this claim of theirs is way off base.

First off, interrupts and DMA are orthogonal. It's really a choice of DMA
versus programmed I/O. On the PC/AT, the DMA controller runs about 10%
slower than the standard PC DMA controller. On top of this, the 80286 has
string IN and OUT instructions in addition to the 8086 string MOV instruction.
Both string IN/OUT and string MOV instructions push data around a lot faster
than the DMA controller on that system can. You basically end up with the
processor sitting in a tight loop shoveling bytes across the bus.

Under MS-DOS this is fine, since MS-DOS is single tasking. You'll never have
another process waiting to use computes that are being used up by having
the processor move data that other hardware could move (more slowly). Under
a multitasking system you'd want to examine the tradeoff; in some situations
you'd get more aggregate computes out of the system if the I/O was performed
by the (slow) DMA controller than the processor, because other processes could
run while the system waited for the DMA to complete.
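
To make the tradeoff concrete, here is a back-of-the-envelope model in Python.
It is only a sketch: the two transfer rates are invented placeholders (not
measurements of any PC or AT), the function name is hypothetical, and it
assumes DMA costs the CPU nothing, ignoring setup time and bus contention.

```python
# Hypothetical rates in bytes per second; placeholders, not measurements.
PIO_RATE = 1_000_000   # CPU string instructions (assumed to be the faster path)
DMA_RATE = 500_000     # DMA controller (assumed to be the slower path)


def transfer_cost(nbytes, rate_bytes_per_s, cpu_busy):
    """Return (elapsed_seconds, cpu_seconds_consumed) for one transfer.

    cpu_busy=True models programmed I/O, where the CPU does the copying.
    cpu_busy=False models DMA, where the CPU is assumed to be left free.
    """
    elapsed = nbytes / rate_bytes_per_s
    cpu_used = elapsed if cpu_busy else 0.0
    return elapsed, cpu_used


pio_elapsed, pio_cpu = transfer_cost(64 * 1024, PIO_RATE, cpu_busy=True)
dma_elapsed, dma_cpu = transfer_cost(64 * 1024, DMA_RATE, cpu_busy=False)

print(f"PIO: {pio_elapsed:.4f} s elapsed, {pio_cpu:.4f} s of CPU")
print(f"DMA: {dma_elapsed:.4f} s elapsed, {dma_cpu:.4f} s of CPU")
```

Under a single-tasking system only the elapsed time matters, so the faster
path wins. Under a multitasking system the CPU seconds consumed are what the
other processes lose, which is where the slower DMA path can come out ahead.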
-- 
			- john romkey
		...mit-eddie!blblbl!kaos!romkey
		    romkey@xx.lcs.mit.edu

farren@gethen.UUCP (Michael J. Farren) (11/22/87)

In article <261@kaos.UUCP> romkey@kaos.UUCP (John Romkey) writes:
>
>First off, interrupts and DMA are orthogonal. It's really a choice of DMA
>versus programmed I/O. On the PC/AT, the DMA controller runs about 10%
>slower than the standard PC DMA controller. On top of this, the 80286 has
>string IN and OUT instructions in addition to the 8086 string MOV instruction.
>Both string IN/OUT and string MOV instructions push data around a lot faster
>than the DMA controller on that system can. You basically end up with the
>processor sitting in a tight loop shoveling bytes across the bus.

DMA and interrupts are NOT orthogonal.  In any interrupt-driven scheme,
there will be system overhead required for each and every byte of data
transferred.  In most systems I am aware of, for any transfer over a very
few bytes, this interrupt overhead overwhelms both the one-time overhead
of setting up DMA and any processor slowing associated with DMA, so
considering interrupts instead of DMA ensures great inefficiency.  (Many
systems are designed such that DMA does not affect the processor in any
meaningful way.  Most 68000 systems, for example, take full advantage of
the fact that the processor only requires the bus every other cycle, more
or less.)
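
The argument above is a break-even calculation, and it can be sketched in a
few lines of Python. The function names and the cost figures are hypothetical;
costs are in arbitrary CPU time units and are not taken from any real system.

```python
import math


def interrupt_cost(nbytes, per_byte_overhead):
    """CPU cost when every byte transferred raises its own interrupt."""
    return nbytes * per_byte_overhead


def dma_cost(nbytes, setup_overhead, per_byte_slowdown=0):
    """CPU cost of DMA: one-time setup plus any per-byte processor slowing."""
    return setup_overhead + nbytes * per_byte_slowdown


def break_even_bytes(per_byte_overhead, setup_overhead, per_byte_slowdown=0):
    """Smallest transfer size at which DMA costs strictly less than interrupts.

    Returns None if DMA never wins, i.e. its per-byte slowdown is at least
    as large as the per-byte interrupt overhead.
    """
    saving = per_byte_overhead - per_byte_slowdown
    if saving <= 0:
        return None
    return math.floor(setup_overhead / saving) + 1


# Hypothetical figures: 50 units per interrupt, 200 units to set up DMA.
print(break_even_bytes(per_byte_overhead=50, setup_overhead=200))
```

With these made-up figures DMA wins for any transfer of five bytes or more,
which is the "over a very few bytes" situation described above.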

On the choice between DMA and programmed I/O, much depends on the system
design.  The IBM architecture may not allow a distinct advantage for DMA
vs. programmed I/O, but many other systems do.  To make a general
statement that DMA is not as efficient as programmed I/O is wrong.

-- 
----------------
Michael J. Farren      "... if the church put in half the time on covetousness
unisoft!gethen!farren   that it does on lust, this would be a better world ..."
gethen!farren@lll-winken.arpa             Garrison Keillor, "Lake Wobegon Days"

truett@cup.portal.com (11/24/87)

On the IBM bus, the DMA loses yet again!  Remember that on that bus, the
CPU unconditionally grants the bus to any DMA request (it almost has to,
that's how the PC and XT do their dynamic memory refresh).  Thus, several
DMA devices can capture the bus and lock the CPU out.  If PIO is used,
though, the CPU has a choice.

On a bus that allows a DMA request to be blocked by a higher priority
compute task this problem probably does not occur.  Also, note that the
assumption that a CPU only needs the bus every n-th cycle is very dependent
on the design of the particular system being considered.  There are, I
believe, processors out there that can do a fetch and an operation on every
cycle.  I know some DSPs do and some RISCs probably do, not to mention
highly pipelined microprocessors of a more traditional type.

Truett Smith, Sunnyvale, CA
UUCP:  truett@cup.portal.com

romkey@kaos.UUCP (John Romkey) (11/24/87)

In article <372@gethen.UUCP> farren@gethen.UUCP (Michael J. Farren) writes:
>In article <261@kaos.UUCP> romkey@kaos.UUCP (John Romkey) writes:
>>
>>First off, interrupts and DMA are orthogonal. It's really a choice of DMA
>>versus programmed I/O.
>
>DMA and interrupts are NOT orthogonal.  In any interrupt-driven scheme,
>there will be system overhead required for each and every byte of data
>transferred.

It was my impression that the original article meant to say "programmed I/O"
instead of "interrupts", which is why I launched off into my discussion of
programmed I/O.

It isn't rational to discuss DMA vs. interrupts for any PC or PC/AT
bus network interface that I've encountered, and I've written or seen
drivers for most of them (the list would double the length of this
message). None of them give you an option of taking an interrupt per
byte while transferring data. You either do or do not take one
interrupt on receive or transmit completion (or DMA completion if you
use DMA), and you either use DMA to transfer data or you use
programmed I/O. The two are independent and therefore *orthogonal*. If
you want any responsiveness out of your network code (at least in a
TCP/IP implementation), you'll want to use interrupts regardless of
whether or not you use DMA. You'll decide whether to use DMA based on
the network interface's architecture (many memory-mapped interfaces don't
support it) and your bus. 
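
One way to see the independence is to enumerate the driver designs it allows.
This is a small Python sketch with hypothetical names; the two option lists
simply restate the choices described above.

```python
from itertools import product

# How the driver learns that a receive or transmit has finished.
NOTIFY = ("poll for completion", "interrupt on completion")

# How the packet data is moved between the interface and main memory.
MOVE = ("programmed I/O", "DMA")


def driver_designs():
    """Every pairing of a notification method with a data-movement method."""
    return list(product(NOTIFY, MOVE))


for notify, move in driver_designs():
    print(f"{notify:25s} + {move}")
```

All four pairings are meaningful designs, and picking an entry from one list
places no constraint on the other, which is the sense of "orthogonal" here.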

-- 
			- john romkey
		...mit-eddie!blblbl!kaos!romkey
		    romkey@xx.lcs.mit.edu

ruiu@tic.UUCP (11/25/87)

[The discussion is about why DMA would be slower on an AT than polled I/O]

In light of the facts pointed out by everyone, it seems the AT's implementation
of DMA is a poor one. Still, a certain traditionalist streak in me refuses
to accept a tight loop as the highest-performance way to transfer data.

So if DMA is no good, then what is the high performance approach needed to
'squeeze' every ounce of performance out of an AT ?

An acquaintance who is designing a major PC-based hardware project has chosen
to use double-ported memory. Truett Smith has already suggested this as the 
solution.

So, in light of the dropping cost of such devices, they are the preferred way
to go. Right ?

Does anyone care to comment? Does anyone know of any products that use this
approach to data transfers ?

(What did I start with that innocuous first posting ??!!? :-)

-- 
Dragos Ruiu          Disclaimer: My opinions are my employer's, I'm unemployed!
            UUCP:{ubc-vision,mnetor,vax135,ihnp4}!alberta!edson!tic!dragos!work
(403) 432-0090         #1705, 8515 112th Street, Edmonton, Alta. Canada T6G 1K7 
Never play leapfrog with Unicorns...

phil@amdcad.UUCP (11/26/87)

In article <162@tic.UUCP> ruiu@tic.UUCP (Dragos Ruiu) writes:
>An acquaintance who is designing a major PC-based hardware project has chosen
>to use double-ported memory. Truett Smith has already suggested this as the 
>solution.
>
>So, in light of the dropping cost of such devices, they are the preferred way
>to go. Right ?

Many dual ported systems are not made of dual ported memory devices.
Certainly none of mine are. It's hard to get 2 megabytes of dual
ported memory using dinky 2 kilobyte devices. 

-- 
I speak for myself, not the company.

Phil Ngai, {ucbvax,decwrl,allegra}!amdcad!phil or amdcad!phil@decwrl.dec.com

romkey@kaos.UUCP (John Romkey) (11/27/87)

In article <162@tic.UUCP> ruiu@tic.UUCP (Dragos Ruiu) writes:
>An acquaintance who is designing a major PC-based hardware project has chosen
>to use double-ported memory. Truett Smith has already suggested this as the 
>solution.
>
>So, in light of the dropping cost of such devices, they are the preferred way
>to go. Right ?

Right. Many of the recent network interfaces for the PC and AT use
dual-ported memory, with the LAN controller hardware on one side and the
PC or AT bus on the other. In fact, the best network interfaces on the market
right now all take this approach.

But there's still a catch. Most of these network interfaces only provide
8K or 16K bytes of RAM. To get really good performance out of them, you
want their memory available to receive data from the net as soon as
possible. So you end up copying the data into the PC's main memory. You can
actually program the DMA controller to do that, but who'd want to? Using an
8086 MOVS instruction is so much faster...it should be faster even on the PC,
but I don't have the books here to check it out and make sure.

So you still end up copying, rather than using the data in place.
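
A rough calculation shows why the on-board memory has to be drained quickly.
This Python sketch assumes a 10 Mbit/s Ethernet running flat out and
maximum-size 1514-byte frames (14-byte header plus 1500 data bytes); it
ignores the preamble, inter-frame gaps, and any buffer space the board
reserves for its own use. The function names are hypothetical.

```python
ETHERNET_BITS_PER_S = 10_000_000   # assumption: 10 Mbit/s Ethernet
MAX_FRAME_BYTES = 1514             # 14-byte header + 1500 data bytes, no FCS


def fill_time_ms(buffer_bytes, bits_per_s=ETHERNET_BITS_PER_S):
    """Milliseconds for back-to-back traffic to fill the on-board buffer."""
    return buffer_bytes * 8 / bits_per_s * 1000


def max_frames(buffer_bytes, frame_bytes=MAX_FRAME_BYTES):
    """How many whole maximum-size frames fit in the buffer."""
    return buffer_bytes // frame_bytes


for kbytes in (8, 16):
    size = kbytes * 1024
    print(f"{kbytes}K: {max_frames(size)} full frames, "
          f"full in about {fill_time_ms(size):.1f} ms")
```

Under these assumptions an 8K board holds only a handful of full-size frames,
so the driver has a few milliseconds to copy data out before the board has
nowhere to put the next packet.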

You could put lots of memory on the network interface, like 256Kbytes, but
then you'd have two problems. The hardware would have a hard time mapping
all that memory into the PC address space, so it would probably have to be
bank-switched. The software would have problems managing it and figuring out
who had buffers where and then trying to reclaim them later on.
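
For illustration, this is roughly the bookkeeping that bank switching imposes
on the software. The Python sketch below assumes a 16K window onto 256K of
board memory; the window size and the function names are hypothetical and do
not describe any particular board.

```python
BOARD_BYTES = 256 * 1024    # total memory on the hypothetical interface
WINDOW_BYTES = 16 * 1024    # assumed size of the window in PC address space


def bank_count(board_bytes=BOARD_BYTES, window_bytes=WINDOW_BYTES):
    """Number of banks the board memory is divided into."""
    return board_bytes // window_bytes


def locate(board_offset, board_bytes=BOARD_BYTES, window_bytes=WINDOW_BYTES):
    """Translate an offset in board memory to (bank, offset within window).

    The driver would have to select `bank` on the board before touching
    the byte at that offset inside the mapped window.
    """
    if not 0 <= board_offset < board_bytes:
        raise ValueError("offset outside board memory")
    return divmod(board_offset, window_bytes)


print(bank_count())          # banks needed to cover the whole board
print(locate(20 * 1024))     # a buffer 20K into board memory
```

Any buffer that straddles a bank boundary needs two bank selections to read,
which is part of why managing and reclaiming buffers gets awkward.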

The boards which are memory mapped include the Micom-Interlan NI5210,
the Western Digital WD8003, the Univation NIC and the Excelan EXOS205, all
of which are Ethernet interfaces. Proteon also sells a memory-mapped IEEE 802.5
token ring card, which is either the P1340 or the P1344.

I'm sure I've left out a couple, but I just woke up...

-- 
			- john romkey
		...mit-eddie!blblbl!kaos!romkey
		    romkey@xx.lcs.mit.edu

truett@cup.portal.com (11/28/87)

ruiu@tic.UUCP (Dragos Ruiu) asks if there are any examples of commercial
hardware products that use the dual-ported memory approach to bulk data
movement into and out of an IBM PC/AT.  There are many, but let me illustrate
the preponderance of this method by looking at one situation where bulk data
must be moved quite quickly -- laser printer interfaces.

I know of at least four such interfaces that use some form of memory
dual-porting to achieve high transfer rates:  1) the Tall Tree JLaser, 2) the
Advanced Vision Research MegaBuffer, 3) the Cordata LBP interface, and
4) the AST Turbolaser interface.  I believe the Laser Master also does this.
Unfortunately, I am not very familiar with the Hewlett-Packard LaserJet
interfaces.  This problem of moving data to a fast printing engine quickly
becomes most acute when bit-mapped graphics are involved.

Why all of the laser printer manufacturers haven't come out with a SCSI
interface is beyond me!  It has the throughput and would allow the printer
interface to share a slot with other peripherals.  Similar considerations
would apply to input from image scanners.  This surfeit of proprietary
interfaces cannot be any good, in the long run, for the industry.  I would
note that most SCSI interfaces for the PC standard bus give the programmer
the choice of DMA or PIO control of the transfer.

Truett Smith, Sunnyvale, CA
UUCP:  truett@cup.portal.com