dupuy@amsterdam.columbia.edu.UUCP (02/18/87)
While reading Tanenbaum's new OS book (the Minix book) a sort of half baked
idea came to me. While sophisticated OS design strives to minimize the number
of memory to memory copies, there is some irreducible minimum in a system with
distinct user and kernel spaces.

Since the DMA chip on your favorite disk/tape controller works by stealing
bus cycles when the CPU is busy with other things (like arithmetic), would
there be any advantage in having a DMA chip which would simply be used for
memory to memory copies (from user to kernel space, or from one user space to
another)?

At some point the various DMA chips may start bumping into each other, and it
may not be worth the effort (in more complex bus access/arbitration logic) to
add this memory to memory DMA chip. But given sufficient bus bandwidth, if the
CPU spends a significant amount of time without accessing the bus, there could
be a significant performance boost for current operating systems.

So what do you hardware types think? Is there anything to this idea? Does
this sort of thing already exist, and I just don't know about it? Or is there
some problem which I have missed?

@alex
----
arpa: dupuy@columbia.edu
uucp: ...!seismo!columbia!dupuy
ron@brl-sem.UUCP (02/18/87)
In article <4343@columbia.UUCP>, dupuy@amsterdam.columbia.edu (Alexander Dupuy) writes:
> Since the DMA chip on your favorite disk/tape controller works by stealing
> bus cycles when the CPU is busy with other things (like arithmetic), would
> there be any advantage in having a DMA chip which would simply be used for
> memory to memory copies (from user to kernel space, or from one user space to
> another)?

It's a good idea. I'm glad I've used computers whose designers had thought
of it. This is used most commonly in certain graphics displays to make the
block memory moves on the display for things like windows happen faster. The
Denelcor HEP super computer had a block transfer hardware device, but we never
got around to making use of it before we scrapped the thing to make room for
the CRAY.

UNIX could probably see a pretty good speed up from this thing. Certain
performance studies show that UNIX spends a majority of its kernel time
shuffling data between the buffer cache and user data space.

-Ron
grr@cbmvax.UUCP (02/18/87)
In article <4343@columbia.UUCP> dupuy@amsterdam.columbia.edu (Alexander Dupuy) writes:
>
> Since the DMA chip on your favorite disk/tape controller works by stealing
>bus cycles when the CPU is busy with other things (like arithmetic), would
>there be any advantage in having a DMA chip which would simply be used for
>memory to memory copies (from user to kernel space, or from one user space to
>another)?

Many general purpose DMA controller chips can already do this sort of thing.
Other, fancier things like BLIT chips can also do high-speed memory to memory
DMA as a degenerate case.

--
George Robbins - now working for,      uucp: {ihnp4|seismo|rutgers}!cbmvax!grr
but no way officially representing     arpa: cbmvax!grr@seismo.css.GOV
Commodore, Engineering Department      fone: 215-431-9255 (only by moonlite)
farren@hoptoad.UUCP (02/18/87)
In article <4343@columbia.UUCP> dupuy@amsterdam.columbia.edu (Alexander Dupuy) writes:
>
>While reading Tanenbaum's new OS book (the Minix book) a sort of half baked
>idea came to me.

No, your idea seems fully baked :-)

> Since the DMA chip on your favorite disk/tape controller works by stealing
>bus cycles when the CPU is busy with other things (like arithmetic), would
>there be any advantage in having a DMA chip which would simply be used for
>memory to memory copies (from user to kernel space, or from one user space to
>another)?

Many DMA circuits that are not designed for a single purpose (i.e., disk
controller to memory transfer) can be used like this, and it IS a good idea,
as long as the system is not DMA intensive. In particular, a number of micros
(specifically, the Amiga) have this capability, and use it. Particularly good
if you are copying large blocks of memory to other memory spaces, a task often
associated with graphics, but otherwise very useful. DMA can usually do the
move from two to four times faster than even a tightly-coded loop.

--
----------------   "... if the church put in half the time on covetousness
Mike Farren         that it does on lust, this would be a better world ..."
hoptoad!farren       Garrison Keillor, "Lake Wobegon Days"
elh@vu-vlsi.UUCP (02/18/87)
In article <4343@columbia.UUCP>, dupuy@amsterdam.columbia.edu (Alexander Dupuy) writes:
> ....would
> there be any advantage in having a DMA chip which would simply be used for
> memory to memory copies (from user to kernel space, or from one user space to
> another)?
>

I believe this is currently a feature on many commercially available
DMA controllers. In particular, while I was working on the architecture/
partitioning of some of the peripheral chips for the ATT WE32XXX family,
we decided to include this feature in the DMA member of that family.
This has shown increased performance in memory-to-memory copies in the
operating system (as reported in a paper in the 1986 International
Conference on Computer Design... I forget the exact reference).

This part also has a number of other interesting features including
a *separate* byte wide bus which services commercially available
byte wide peripherals (disk, lan, etc. controllers), byte to word
packing, word buffering and burst mode bus transactions.... The
peripherals on the byte wide bus lie in the address space of the
DMA peripheral which lies somewhere in the address space of the system
(obviously...).

Dr. Ed Hepler, Adjunct Prof. Villanova University
 Staff Engineer, GE Astro Space, Valley Forge
 (Formerly MTS, Bell Labs, Naperville, Ill.)
kds@mipos3.UUCP (02/19/87)
As has been said before, most DMA chips can be set up to do this, but...

...whether what you are suggesting is effective depends on the system. I
believe that the 680[12]0 and the [23]86 processors are capable of moving data
across the entire width of the data bus at the maximum bus bandwidth, so to
move your data around as quickly you'd have to have a 32-bit DMA chip around
that can also run at the maximum processor bandwidth. Also, the setup time at
the beginning of the transfer is probably going to be longer, since it usually
takes longer just to set one of these things up, and if it is sitting on the
other side of the memory management, you have to take that into account.

Also, whether it is going to really be effective is dependent on whether the
processor can really do something useful while DMA is going on, since if it
cannot gain access to the bus during the transfer, it will just be sitting
there anyway if it needs to get something from memory. Some DMA controllers
have a "throttle" which limits their maximum bus utilization so that the
processor can get a transfer in edgewise.

And another novel use of a DMA controller? I believe the original IBM PC uses
a DMA controller to do DRAM refresh.

--
The above views are personal.

The primary reason innumeracy is so pernicious is the ease with which numbers
are invoked to bludgeon the innumerate into dumb acquiescence.
	- John Allen Paulos

Ken Shoemaker, Microprocessor Design, Intel Corp., Santa Clara, California
uucp: ...{hplabs|decwrl|amdcad|qantel|pur-ee|scgvaxd|oliveb}!intelca!mipos3!kds
csnet/arpanet: kds@mipos3.intel.com
rdt@houxv.UUCP (02/19/87)
simple answer. There is a distinct tradeoff between the size of the block
transfer (how large a chunk) and the amount of indirection overhead in:

1) making a system call,
2) being authorized to use the DMA in the fashion you want (permission
   checks via probes and address translations; recall that the DMA works
   in physical space whereas the CPU tends to work in virtual spaces),
3) and then setting up the DMA context registers with the necessary
   pointers, sizes and configuration information.

CONCLUSION: for transfers under 32-64 words, the overhead time to set up the
DMA may swamp the more efficient block move capabilities of the specialized
DMA hardware. Mileage (breakeven point) may vary with OS system call
structure and the manufacturer's MMU and DMA hardware.

Richard Trauben
ATTIS, Holmdel, New Jersey
WE32x00 Processor Development
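The tradeoff described above can be put into a small model. This is only a sketch: the function names are made up for illustration, and the cycle counts in the usage note are invented numbers chosen so that the break-even point falls inside the 32-64 word range quoted in the post. Real figures depend on the OS, MMU and DMA hardware.

```python
def cpu_copy_cycles(words, cpu_cycles_per_word):
    """Cost of a plain CPU copy loop: no setup, fixed cost per word."""
    return words * cpu_cycles_per_word


def dma_copy_cycles(words, setup_cycles, dma_cycles_per_word):
    """Cost of a DMA copy: fixed setup (system call, permission checks,
    address translation, register programming) plus a per-word cost."""
    return setup_cycles + words * dma_cycles_per_word


def breakeven_words(setup_cycles, cpu_cycles_per_word, dma_cycles_per_word):
    """Smallest transfer size (in words) at which DMA is strictly faster
    than the CPU loop, or None if DMA never wins."""
    saving_per_word = cpu_cycles_per_word - dma_cycles_per_word
    if saving_per_word <= 0:
        return None
    # DMA wins when setup + n*dma < n*cpu, i.e. n > setup / saving_per_word.
    return setup_cycles // saving_per_word + 1
```

With an assumed 96 cycles of setup, 4 cycles per word for the CPU loop and 2 cycles per word for the DMA, `breakeven_words(96, 4, 2)` gives 49 words; below that size the setup overhead dominates.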
andy@batcomputer.UUCP (02/19/87)
Every board in our FPS T-20 (18 Transputers, 16 vector coprocessors) has
a DMA chip that can be used for memory-to-memory copies. They also use
video RAMs (with a special address mode to get at more than 32 bits / fetch)
to pump things into and out of the vector processor. Nifty.

--
Andy Pfiffer                                    andy@tcgould.tn.cornell.edu
Cornell Theory Center / Cornell U.              cornell!batcomputer!andy
Home of the first usable T-Series               (607) 255-8686
"...that's the way a Transputer works, right?"  Systems Group
mahar@weitek.UUCP (02/19/87)
In article <4343@columbia.UUCP>, dupuy@amsterdam.columbia.edu (Alexander Dupuy) writes:
> Since the DMA chip on your favorite disk/tape controller works by stealing
> bus cycles when the CPU is busy with other things (like arithmetic), would
> there be any advantage in having a DMA chip which would simply be used for
> memory to memory copies (from user to kernel space, or from one user space to
> another)?
>

The memory management hardware could be used to send messages between the
OS and tasks if messages were multiples of the page size. This could reduce
message transfer time a lot.

--
Mike Mahar
UUCP: {turtlevax, cae780}!weitek!mahar

Disclaimer: The above opinions are, in fact, not opinions. They are facts.
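The page-remapping idea can be illustrated with a toy model in which "sending" a message moves page-table entries instead of copying bytes. Everything here is hypothetical: `AddressSpace`, `send_by_remap` and the 4096-byte `PAGE_SIZE` are invented for the sketch, and a real MMU would also need TLB invalidation and protection handling, which are not modelled.

```python
PAGE_SIZE = 4096  # assumed page size in bytes; real hardware varies


class AddressSpace:
    """Toy address space: virtual page number -> physical frame number."""

    def __init__(self):
        self.page_table = {}


def send_by_remap(sender, receiver, src_vpage, dst_vpage, length):
    """Transfer `length` bytes by remapping whole pages. No data is copied;
    only page-table entries move. Returns the number of pages moved."""
    if length <= 0 or length % PAGE_SIZE != 0:
        raise ValueError("message length must be a positive multiple of the page size")
    npages = length // PAGE_SIZE
    # Validate first so a bad request leaves both page tables untouched.
    for i in range(npages):
        if src_vpage + i not in sender.page_table:
            raise KeyError("source page %d is not mapped" % (src_vpage + i))
        if dst_vpage + i in receiver.page_table:
            raise KeyError("destination page %d is already mapped" % (dst_vpage + i))
    for i in range(npages):
        receiver.page_table[dst_vpage + i] = sender.page_table.pop(src_vpage + i)
    return npages
```

The cost is proportional to the number of pages rather than the number of bytes, which is where the saving would come from; the price is that the sender loses access to the pages it sent.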
paul@unisoft.UUCP (02/21/87)
In article <630@vu-vlsi.UUCP> elh@vu-vlsi.UUCP (Edward L. Hepler) writes:
>DMA controllers. In particular, while I was working on the architecture/
>partitioning of some of the peripheral chips for the ATT WE32XXX family,
....
>This part also has a number of other interesting features including
>a *separate* byte wide bus which services commercially available
>byte wide peripherals (disk, lan, etc. controllers), byte to word
>packing, word buffering and burst mode bus transactions.... The
>peripherals on the byte wide bus lie in the address space of the
>DMA peripheral which lies somewhere in the address space of the system
>(obviously...).
>
>Dr. Ed Hepler, Adjunct Prof. Villanova University
> Staff Engineer, GE Astro Space, Valley Forge
> (Formerly MTS, Bell Labs, Naperville, Ill.)

Everyone doing serious peripheral design should look at this chip (WE32106 I
think). It must be the best DMA chip on the market ... only one catch - the
price: $250 apiece in 100 quantities. Still, if you want to see how to design
a DMA chip right (esp. if you are going to design one), look at this one.

Paul Campbell
..!ucbvax!unisoft!paul
mac@uvacs.UUCP (02/23/87)
> Since the DMA chip on your favorite disk/tape controller works by stealing > bus cycles when the CPU is busy with other things (like arithmetic), would > there be any advantage in having a DMA chip which would simply be used for > memory to memory copies (from user to kernel space, or from one user space to > another)? Been done. I believe it was on the Univac 1100s. Used for plated-wire to core memory transfers, among other things.
bzs@bu-cs.UUCP (03/04/87)
> Since the DMA chip on your favorite disk/tape controller works by stealing >bus cycles when the CPU is busy with other things (like arithmetic), would >there be any advantage in having a DMA chip which would simply be used for >memory to memory copies (from user to kernel space, or from one user space to >another)? I remember suggesting this on our LSI-11 systems using the DMA buffer in the RX02 floppy drive as you could write/read it w/o any I/O going to the disk (load/unload buffer [DMA] and xfer to/from disk were different operations.) I figured we could buy an extra RX02 controller if this brilliant idea worked (this was around '78-'79 I guess.) Just an anecdote, we never tried it because the Mini-Unix was too busy swapping to the RX02 to be used for this...I think we figured out it wouldn't be very fast either. Anyhow, maybe you already have a device to try it with, sure this wouldn't work on your disk controller or some such (hey, no warranties expressed or implied!) -Barry Shein, Boston University
greg@utcsri.UUCP (Gregory Smith) (03/04/87)
Somebody writes:
> Since the DMA chip on your favorite disk/tape controller works by stealing
> bus cycles when the CPU is busy with other things (like arithmetic), would
> there be any advantage in having a DMA chip which would simply be used for
> memory to memory copies (from user to kernel space, or from one user space to
> another)?

The Z80 DMA chip does this. It can copy from a moving memory address to a
moving memory address, or from a moving memory address to a constant I/O
address, or the reverse of the latter. Actually, in most configurations, the
DMA chip steals the bus whether the CPU wants it or not - you still win since
a DMA copy operation can be done with two memory cycles per byte, and most
CPUs don't allow this.

The Z80 chip was the first DMA chip I ran into, and I was surprised to find
out that others were not like that (most others generate moving memory
addresses and control signals, and the I/O device must be wired to place the
data on the bus (or read it) during the memory cycle). With the Z80 setup, the
DMA chip addresses the I/O device, so if this device is always ready, it needs
no special hardware and doesn't 'know' it is being addressed by the DMA chip
rather than the CPU. If it is not always ready, there is a control signal to
throttle the DMA. The disadvantage is that I/O transfers require two memory
cycles per byte, whereas only one is required in the usual setup.

--
----------------------------------------------------------------------
Greg Smith     University of Toronto      UUCP: ..utzoo!utcsri!greg
Have vAX, will hack...
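The cycle counts in this post can be tabulated with a few lines of code. This is just the arithmetic from the post restated; the function name and the labels "z80" and "flyby" (for "the usual setup", where the device drives the bus during the memory cycle) are my own, not terms from any data sheet.

```python
def dma_memory_cycles(nbytes, transfer, io_style="z80"):
    """Memory cycles needed to move `nbytes` bytes by DMA.

    transfer: "mem-to-mem" or "io"
    io_style: "z80"   - the DMA chip addresses the I/O device itself
              "flyby" - the device is wired to the bus and transfers
                        data during the memory cycle
    """
    if nbytes < 0:
        raise ValueError("nbytes must be non-negative")
    if transfer == "mem-to-mem":
        return 2 * nbytes  # one read cycle plus one write cycle per byte
    if transfer == "io":
        if io_style == "z80":
            return 2 * nbytes  # separate device access and memory access
        if io_style == "flyby":
            return nbytes  # device and memory share a single cycle
        raise ValueError("unknown io_style: %r" % (io_style,))
    raise ValueError("unknown transfer: %r" % (transfer,))
```

For a 512-byte sector, the Z80-style arrangement spends 1024 cycles on an I/O transfer against 512 for the wired-in arrangement, while a memory to memory copy costs 1024 cycles either way.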
adam@gec-mi-at.co.uk (Adam Quantrill) (03/13/87)
In article <4343@columbia.UUCP>, dupuy@amsterdam.columbia.edu (Alexander Dupuy) writes:
>> [] would
>> there be any advantage in having a DMA chip which would simply be used for
>> memory to memory copies (from user to kernel space, or from one user space to
>> another)?

Yup. You don't wear out the cpu so much.

--
       -Adam.

/* If at first it don't compile, kludge, kludge again.*/
njh@root44.UUCP (03/23/87)
In article <518@gec-mi-at.co.uk> you write:
>In article <4343@columbia.UUCP>, dupuy@amsterdam.columbia.edu (Alexander Dupuy) writes:
>>> [] would
>>> there be any advantage in having a DMA chip which would simply be used for
>>> memory to memory copies (from user to kernel space, or from one user space to
>>> another)?
>
>Yup. You don't wear out the cpu so much.

When doing a port of UniPlus+ on a machine with a spare channel on its
68450 I tried a few benchmarks. I'm afraid (with a 68k at least) it was
slower using the 68450 than the 68010 (dbra's, moveml's etc.) for memory
to memory copies. I put this down to overhead of setting it up in C, the
CPU having to wait for the DMAC to finish (as this gives rise to bus
contention - the 68010 has to do *something* while the 68450 is copying,
a wait till finish means it has to sit in a loop reading instructions, or
you run the 68450 in interrupt mode, in which case you have all that nasty
interrupt goo after it's finished) etc.

Sorry about the double subclauses in the above paragraph - I never was any
good at English.

Anyhow, moral is, leave the CPU to do memory copies, it's not worth the hassle.

--
--
Nigel Horne, Divisional Director, Root Technical Systems.
<njh@root.co.uk>    G1ITH    Fax:   (01) 726 8158    Phone:   +44 1 606 7799
                             Telex: 885995 ROOT G    BT Gold: CQQ173
chris@mimsy.UUCP (03/25/87)
In article <246@root44.root.co.uk> njh@root.co.uk (Nigel Horne) writes:
>When doing a port of UniPlus+ on a machine with a spare channel on its
>68450 I tried a few benchmarks. I'm afraid (with a 68k at least) it was
>slower using the 68450 than the 68010 (dbra's, moveml's etc.) for memory
>to memory copies. ...

(I assume you mean `movl's; moveml will not run in loop mode.)

>... leave the CPU to do memory copies, it's not worth the hassle.

Well, now, that depends on the machine architecture. (Must be why
this is in comp.arch :-).) We have some Heurikon 68010 based boards
that, when the MMU is enabled, suffer a wait state per CPU memory
access. The DMA chip does not go through the MMU, and copies over
a certain size (we have not yet calculated or measured just *what*
size) will run faster when done via the DMA chip in spite of the
setup overhead.

But as yet we are not worried about this sort of (small) performance
improvement. McMob needs first an O/S....

--
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7690)
UUCP: seismo!mimsy!chris	ARPA/CSNet: chris@mimsy.umd.edu
elh@vu-vlsi.UUCP (03/26/87)
In article <246@root44.root.co.uk>, njh@root.co.uk (Nigel Horne) writes:
> In article <518@gec-mi-at.co.uk> you write:
> >In article <4343@columbia.UUCP>, dupuy@amsterdam.columbia.edu (Alexander Dupuy) writes:
> >>> [] would
> >>> there be any advantage in having a DMA chip which would simply be used for
> >>> memory to memory copies (from user to kernel space, or from one user space to
> >>> another)?
> >
> >Yup. You don't wear out the cpu so much.
>
> When doing a port of UniPlus+ on a machine with a spare channel on its
> 68450 I tried a few benchmarks. I'm afraid (with a 68k at least) it was
> slower using the 68450 than the 68010 (dbra's, moveml's etc.) for memory
> to memory copies......
>
> Nigel Horne, Divisional Director, Root Technical Systems.

This is probably due in part to the fact that the 68450 does not (I believe)
have the capability of doing "burst" transfers (issue one address followed
by 2 or 4 words of data without the overhead of the entire bus cycle). This,
along with (if I remember correctly) the fact that the part had a multiplexed
address/data bus (at the part), hurt its performance.

Notice that using moveml instructions to effect a block move (as suggested
in the article) really emulates a "burst" (in a manner). Up to 16 words
(every register) can be copied in using contiguous bus cycles
(non-multiplexed) and then copied back out...

The ATT (WE32xxx) part (which I am familiar with) provides the capability
to perform such burst mode transfers. Of course the memory system must
be capable of servicing such requests.

Ed Hepler
Villanova University
davidsen@steinmetz.UUCP (03/27/87)
In article <246@root44.root.co.uk> njh@root44.UUCP (Nigel Horne) writes:
>In article <518@gec-mi-at.co.uk> you write:
>>In article <4343@columbia.UUCP>, dupuy@amsterdam.columbia.edu (Alexander Dupuy) writes:
>>>> [] would
>>>> there be any advantage in having a DMA chip which would simply be used for
>When doing a port of UniPlus+ on a machine with a spare channel on its
>68450 I tried a few benchmarks. I'm afraid (with a 68k at least) it was
>slower using the 68450 than the 68010 (dbra's, moveml's etc.) for memory
>to memory copies.

What I think you mean is "takes less real time" using the CPU. If you
are doing an operation which requires waiting until the memory has been
moved this is correct. If you can make other good use of the CPU to run
another process, the system will probably run faster using DMA. For
instance, moving a process in memory to garbage collect might have
enough overhead with pointer fiddling in tables to make the total real
time less using DMA.

The overhead of handling the interrupt is trivial: set a flag in the
interrupt handler and return. The CPU can loop on the flag when the rest
of the tasks are done.

--
bill davidsen                       sixhub \
      ihnp4!seismo!rochester!steinmetz ->  crdos1!davidsen
                                    chinet /
ARPA: davidsen%crdos1.uucp@ge-crd.ARPA (or davidsen@ge-crd.ARPA)
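The "set a flag in the interrupt handler and loop on it later" scheme can be mimicked in a few lines. This is a simulation only, not driver code: a background thread stands in for the DMA controller, a `threading.Event` stands in for the flag, and `ToyDmaChannel` and `copy_with_overlap` are names invented for the sketch.

```python
import threading


class ToyDmaChannel:
    """Stand-in for a DMA channel: copies in the background, then 'interrupts'."""

    def __init__(self):
        self.done = threading.Event()  # the flag the interrupt handler sets
        self._worker = None

    def _completion_interrupt(self):
        # The whole "handler": set a flag and return.
        self.done.set()

    def start_copy(self, src, dst):
        if len(dst) < len(src):
            raise ValueError("destination buffer is too small")
        self.done.clear()

        def run():
            dst[:len(src)] = src  # the block move itself
            self._completion_interrupt()

        self._worker = threading.Thread(target=run)
        self._worker.start()

    def wait(self):
        # Loop on the flag; the short timeout keeps the loop from hogging the CPU.
        while not self.done.wait(timeout=0.001):
            pass
        self._worker.join()


def copy_with_overlap(src, dst, other_work):
    """Start the copy, run `other_work` meanwhile, then wait for completion."""
    channel = ToyDmaChannel()
    channel.start_copy(src, dst)
    result = other_work()  # useful CPU work overlapped with the transfer
    channel.wait()         # only now does the CPU loop on the flag
    return result
```

For example, `copy_with_overlap(src, dst, lambda: sum(range(1000)))` returns the sum once both the computation and the copy have finished. Whether the overlap pays off on real hardware depends on the bus contention raised earlier in the thread, which this model ignores.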
mats@forbrk.UUCP (03/30/87)
In article <668@vu-vlsi.UUCP> elh@vu-vlsi.UUCP (Edward L. Hepler) writes:
>The ATT (WE32xxx) part (which I am familiar with) provides the capability
>to perform such burst mode transfers. Of course the memory system must
>be capable of servicing such requests.

This is a lovely chip, agreed. Now, if only the price would come down to
where one could afford to use it in a moderately priced system.... Our
hardware guys had to reject it right away because of its high cost, and
because the AT&T rep didn't see any prospects of it coming down at all.
We were facing having this part be the most expensive chip in the system,
since the 68020 CPU and 68851 MMU are clearly coming down quickly. Sigh.

Mats Wichmann