GORRIEDE@UREGINA1.BITNET (Dennis Robert Gorrie) (11/30/89)
I can understand your predicament well. The advertising really does confuse the whole matter of DMA vs. non-DMA, and supposed benchmarks confuse it even more. No member of this newsgroup has yet submitted enough information about this matter for anyone to gain a clear understanding of it; there has not been an overwhelming amount of factual info about it at any rate. I've posted this twice before, but maybe this time there will be some response.

How about a step-by-step explanation of DMA vs. non-DMA transfer, showing:

1) Under what conditions is there contention for Chip RAM, for both DMA
   and non-DMA?
2) Why is there contention for Chip RAM? Why is access to Chip RAM
   necessary?
3) What are the solutions used in the A209x, HardFrame, and C Ltd
   controllers to get around the problem of Chip RAM contention?
   What are the resulting speeds (using the same drive/mountlist)?
4) Where are the device registers and buffers for DMA and non-DMA
   devices? Where exactly are they located? What restricts them from
   being accessed on every cycle? What limits do they have on the size
   and speed of their transfers?

+-----------------------------------------------------------------------+
|Dennis Gorrie                                    'Chain-Saw Tag...     |
|GORRIEDE AT UREGINA1.BITNET                    Try It, You'll Like It!'|
+-----------------------------------------------------------------------+
ccplumb@rose.waterloo.edu (Colin Plumb) (12/01/89)
In article <8911300328.AA02859@jade.berkeley.edu> GORRIEDE@UREGINA1.BITNET (Dennis Robert Gorrie) writes:
>No member of this newsgroup has yet submitted enough information about
>this matter for anyone to gain a clear understanding of it.  There has
>not been an overwhelming amount of factual info about it at any rate.

Okay, I thought it had been hashed to death already, but here's another go-around.

>How about a step-by-step explanation of DMA vs. non-DMA transfer, showing:
>1) Under what conditions is there contention for Chip RAM, for both DMA
>   and non-DMA?

When accessing Fast RAM, the 68000 has the lowest priority for the bus; any other master (like a DMA device) will get it before the 68000. On Chip RAM, the blitter takes priority over any of the above (modulo the BLITTER_NASTY bit, which I won't go into here), the copper can lock out the blitter, and video/audio/disk/refresh DMA can lock out any of the above.

Non-DMA means the 68000. DMA means an alternate Fast-memory bus master. They are both locked out by exactly the same things in Chip memory.

>2) Why is there contention for Chip RAM? Why is access to Chip RAM
>   necessary?

Because 4-bitplane high-res needs to read bytes out of Chip memory as fast as the memory can deliver them. The memory is so busy it has no time for the processor or other DMA.

Access to Chip RAM is necessary if the device driver gets a request to read a block from the disk into a buffer which happens to be located in Chip RAM. On my machine, for example, a 512K A1000, there is no non-Chip RAM. Even if you have Fast RAM, it's legal to ask the device driver to use a buffer located in Chip.

>3) What are the solutions used in the A209x, HardFrame, and C Ltd
>   controllers to get around the problem of Chip RAM contention?
>   What are the resulting speeds (using the same drive/mountlist)?

Adequate buffering. If you get bytes delivered from the hard drive faster than you can put them somewhere, you have no choice but to drop them.
This overflow problem affects all controllers, whether DMA, non-DMA, or some other technique that relies on feeding the bytes LSD and having them fly to the right place.

The problem was with the 2090 (and 2090A) *only*, and was caused by what I think Dave Haynie will admit was Bad Design. The SCSI bus lets the controller say "not so fast!" to the drive, but they provided a small buffer (64 bytes, I think) on board and *assumed* the controller could write it to memory fast enough: they did not connect the "please slow down" signal to anything. If, for some reason, the DMA engine could not empty the buffer as fast as it was being filled, Bad Things would happen.

Non-DMA controllers usually have a large (512 bytes and up and up and up) buffer on board, which the controller reads into (by something very much like DMA, just simpler than having to go through the system bus), and it never asks the drive for more than a buffer's worth of data at a time. When the transfer is finished, the controller interrupts the 68000, which runs some code to copy the bytes to their final destination. If that happens to be in Chip memory with 4-bitplane high-res running, it will run Very Slowly, but it will get there. Because the controller has exclusive access to the buffer, you can be sure it will always be available for more data. This makes for a simpler design.

The HardFrame, A590, 2091, and probably all other DMA controllers use the obvious solution: if their on-board buffers start backing up, they tell the drive to slow down. With SCSI drives, you can do this. Bytes will still be copied at the fastest speed possible; it just isn't the drive which is the bottleneck. I do not have speeds available, sorry.

If the 2090 overflows, it retries ad nauseam until it finally, more by luck than good intentions, succeeds. (I am exaggerating the case a little.) This results in an amazing slowdown. Except for this stupid problem, DMA controllers are pretty much universally faster than non-DMA.
>4) Where are the device registers and buffers for DMA and non-DMA
>   devices? Where exactly are they located? What restricts them from
>   being accessed on every cycle? What limits do they have on the size
>   and speed of their transfers?

They are located somewhere in memory determined by the AutoConfig process. The spaces open are 200000-9FFFFF (usually used for memory) and E80000-EFFFFF (usually used for devices). So the controller usually has an address somewhere in the Exxxxx range, but where exactly is determined only when you boot your Amiga. Note that DMA controllers probably don't have buffers visible in this space: it's their job to write out the buffers; nobody needs to come in and read them.

As for what stops the DMA device from writing every cycle: nothing, really. Obviously, it's possible for a designer to build some horribly slow thing, but these days 7.14 MHz (/2 for memory cycles) is loafing along and I could build something which could go at full speed. And, of course, if they don't have data to write, writing is a pretty silly idea. Thus "the disk drive" is a possible answer.

For non-DMA devices, the processor has to spend as many cycles writing the bytes to the desired memory location as it does reading them from the controller. And since it's executing instructions while it's doing this, we need some extra cycles to fetch the instructions. So the operation of copying bytes to memory goes at less than half the speed. And don't forget the context-switching overhead to start the data-copying code, etc.

Non-DMA drives are limited by the size of their buffers. You can't make a transfer larger than one buffer without running into the buffer-overflow problem that the 2090 has problems with. It can be solved by adding hardware, but simplicity is the reason for avoiding DMA, so it's unlikely to happen. DMA devices can transfer essentially unlimited amounts of data in one burst, because their buffer is main memory.
Perhaps someone uses a 16-bit DMA chip which has 64K limits, but that's still lots bigger than the buffer a non-DMA device is likely to have.

The fundamental limits are various bandwidth restrictions. The drive is obvious, but aside from that, the DMA device needs to be able to access the memory being written to. You can't do better than that. A non-DMA device needs to read the instructions to do the copy (probably from Chip memory if it's autoboot, as device drivers are loaded before Fast memory is configured), read the controller's buffer (a Fast-memory access), and write to the destination (the same cost as the DMA controller). So it *can't* go any faster than a DMA controller.

Now, there is one thing people pointed out: given that the drive is the bottleneck (an assertion the 2090 proved wrong), even a processor copy can be faster than the drive can deliver data. With some cleverness (double-buffering, probably) so the drive is kept busy during the processor copy, the *throughput* can be the same as a DMA controller's, although the latency (time of request to time it's finished) will always be greater.

Now, I don't have DiskPerf numbers, but aside from that, is there any remaining confusion? There was a Stupid Mistake in the 2090, which causes it to be abysmally slow in certain well-understood cases. No other DMA controllers make this mistake. Because non-DMA controllers do a stop-and-wait operation, filling a bucket and then pouring it somewhere, the problem doesn't tend to arise. A DMA device is like a hose, and putting more water in one end than you take out at the other is a problem requiring a bit more care. But it's a lot faster than using a bucket.
--
	-Colin
waggoner@dtg.nsc.com (Mark Waggoner) (12/02/89)
In article <18870@watdragon.waterloo.edu> ccplumb@rose.waterloo.edu (Colin Plumb) writes:
>Non-DMA drives are limited by the size of their buffers. You can't make
>a transfer larger than one buffer without running into the buffer-overflow
>problem that the 2090 has problems with. It can be solved by adding
>hardware, but simplicity is the reason for avoiding DMA, so it's
>unlikely to happen.

So are DMA disk devices. It's just that the buffer is in the disk drive instead of on the controller board. You can make transfers larger than the buffer size if you break them up into blocks, the way a DMA controller essentially does.

>DMA devices can transfer essentially unlimited amounts of data in one
>burst, because their buffer is main memory. Perhaps someone uses a
>16-bit DMA chip which has 64K limits, but that's still lots bigger
>than the buffer a non-DMA device is likely to have.

A DMA device's REAL buffer is the buffer in the SCSI drive. The reason the SCSI drive can "slow down" is that it isn't feeding you the data directly from the disk. If you talk about an ST-506 type interface, for instance, there is no way a DMA controller can solve the overflow problem unless it contains an on-board buffer for at least a full sector. This means you have to have either a very large FIFO or a double transfer: from the disk interface to a buffer memory on the disk controller board, and then from the buffer memory to main memory. The second transfer could be done either by DMA or by the Amiga CPU. Letting the CPU do it is cheaper, but slower.

>...
>
>Now, I don't have DiskPerf numbers, but aside from that, is there any
>remaining confusion? There was a Stupid Mistake in the 2090, which
>causes it to be abysmally slow in certain well-understood cases.
>No other DMA controllers make this mistake. Because non-DMA
>controllers do a stop-and-wait operation, filling a bucket and then
>pouring it somewhere, the problem doesn't tend to arise.
>A DMA device is like a hose, and putting more water in one end than
>you take out at the other is a problem requiring a bit more care.
>But it's a lot faster than using a bucket.

The DMA controllers you speak of also have a bucket, but it is in the disk drive. The non-DMA controllers could become DMA controllers by the addition of a DMA machine to copy from the controller buffer to the system memory. Cost and design complexity are the difficulties.

>--
> -Colin
--
 ,------------------------------------------------------------------.
 | Mark Waggoner   (408) 721-6306            waggoner@dtg.nsc.com   |
 `------------------------------------------------------------------'