[comp.sys.amiga.tech] Questions about Harddisks

GORRIEDE@UREGINA1.BITNET (Dennis Robert Gorrie) (11/30/89)

I can understand your predicament well.  The advertising really does confuse
the whole matter of DMA vs Non-DMA.  Supposed benchmarks confuse the matter
even more.

No member of this newsgroup has so far submitted enough information about
this matter for anyone to gain a clear understanding of it.  There has not
been an overwhelming amount of factual info about it at any rate.

I've posted this twice before, but maybe this time there will be some response.

How about a step by step explanation of DMA vs NON-DMA transfer, showing:
1)Under what conditions is there contention for chip RAM, for both DMA and
  Non-DMA.
2)Why is there contention for chip RAM? Why is access to chip RAM necessary?
3)What are the solutions used in a209x, hardframe, and cltd controllers
  to get around the problem of chip ram contention?
  What are the resulting speeds (using same drive/mountlist)?
4) Where are the device registers and buffers of DMA and NON-DMA devices?
   Where exactly are they located?  What restricts them from being accessed
   on every cycle?  What limits do they have in the size and speed of their
   transfers?


+-----------------------------------------------------------------------+
|Dennis Gorrie                 'Chain-Saw Tag...                        |
|GORRIEDE AT UREGINA1.BITNET                    Try It, You'll Like It!'|
+-----------------------------------------------------------------------+

ccplumb@rose.waterloo.edu (Colin Plumb) (12/01/89)

In article <8911300328.AA02859@jade.berkeley.edu> GORRIEDE@UREGINA1.BITNET (Dennis Robert Gorrie) writes:
>No member of this newsgroup has so far submitted enough information about
>this matter for anyone to gain a clear understanding of it.  There has not
>been an overwhelming amount of factual info about it at any rate.

Okay, I thought it had been hashed to death already, but here's another
go-around.

>How about a step by step explanation of DMA vs NON-DMA transfer, showing:
>1)Under what conditions is there contention for chip RAM, for both DMA and
>  Non-DMA.

When accessing Fast RAM, the 68000 has the lowest priority for the bus;
any other master (like a DMA device) will get it before the 68000.

On Chip RAM, the blitter takes priority over any of the above (modulo the
BLITTER_NASTY bit which I won't go into here), the copper can lock out
the blitter, and video/audio/disk/refresh DMA can lock out any of the
above.

Non-DMA means 68000.  DMA means alternate Fast memory bus master.  They
are both locked out by exactly the same things in Chip memory.

>2)Why is there contention for chip RAM? Why is access to chip RAM necessary?

Because 4 bitplanes high-res needs to read bytes out of Chip memory as fast
as the memory can deliver them.  The memory is so busy it has no time for
the processor or other DMA.  Access to chip RAM is necessary if the device
driver gets a request to read a block from the disk into a buffer which
happens to be located in Chip RAM.  On my machine, for example, a 512K
A1000, there is no non-chip RAM.  Even if you have Fast RAM, it's legal to
ask the device driver to use a buffer located in Chip.

>3)What are the solutions used in a209x, hardframe, and cltd controllers
>  to get around the problem of chip ram contention?
>  What are the resulting speeds (using same drive/mountlist)?

Adequate buffering.  If you get bytes delivered from the hard drive
faster than you can put them somewhere, you have no choice but to
drop them.  This affects all controllers, whether they use DMA, non-DMA,
or some other technique that relies on feeding the bytes LSD and having
them fly to the right place.

The problem was with the 2090 (and 2090A) *only*, and was caused by what
I think Dave Haynie will admit was Bad Design.  The SCSI bus lets the
controller say "not so fast!" to the drive, but they provided a small
buffer (64 bytes, I think) on board and *assumed* the controller could
write them to memory fast enough: they did not connect the "please slow
down" signal to anything.  If, for some reason, the DMA engine could
not empty the buffer as fast as it was being filled, Bad Things would
happen.

Non-DMA controllers usually have a large (512 bytes and up and up and up)
buffer on board, which the controller reads into (by something very
much like DMA, just simpler than having to go through the system bus),
and it never asks the drive for more than a buffer's worth of data at
a time.  When the transfer is finished, the controller interrupts the
68000, which runs some code to copy the bytes to their final destination.
If that happens to be in Chip memory with 4 bitplanes high-res running,
it will run Very Slowly, but it will get there.  Because the controller
has exclusive access to the buffer, you can be sure it will always be
available for more data.  This makes for a simpler design.

The HardFrame, A590, 2091, and probably all other DMA controllers use
the obvious solution: if their on-chip buffers start backing up, they
tell the drive to slow down.  With SCSI drives, you can do this.
Bytes will still be copied at the fastest speed possible, it just isn't
the drive which is the bottleneck.

I do not have speeds available, sorry.  If the 2090 overflows, it
retries ad nauseam until it finally, more by luck than good intentions,
succeeds.  (I am exaggerating the case a little.) This results in an
amazing slowdown.  Except for this stupid problem, DMA controllers are
pretty much universally faster than non-DMA.

>4) Where are the device registers and buffers of DMA and NON-DMA devices?
>   Where exactly are they located?  What restricts them from being accessed
>   on every cycle?  What limits do they have in the size and speed of their
>   transfers?

They are located somewhere in memory determined by the AutoConfig
process.  The spaces open are 200000-9FFFFF (usually used for memory)
and E80000-EFFFFF (usually used for devices).  So the controller
usually has an address somewhere in the Exxxxx range, but where exactly
is determined only when you boot your Amiga.

Note that DMA controllers probably don't have buffers visible in this
space: it's their job to write out the buffers, nobody needs to come
in and read them.  As for what stops the DMA device from writing every
cycle: nothing, really.  Obviously, it's possible for a designer to
build some horribly slow thing, but these days 7.14 MHz (/2 for memory
cycles) is loafing along and I could build something which could go
at full speed.  And, of course, if they don't have data to write, writing
is a pretty silly idea.  Thus "the disk drive" is a possible answer.

For non-DMA devices, the processor has to spend as many cycles writing the
bytes to the desired memory location as it does reading them from the
controller.  And since it's executing instructions while it's doing
this, we need some extra cycles to fetch the instructions.

So the operation of copying bytes to memory goes at less than half the
speed.  And don't forget the context-switching overhead, to start the
data-copying code, etc.

Non-DMA drives are limited by the size of their buffers.  You can't make
a transfer larger than one buffer without running into the buffer-overflow
problem that the 2090 has.  It can be solved by adding
hardware, but simplicity is the reason for avoiding DMA, so it's
unlikely to happen.

DMA devices can transfer essentially unlimited amounts of data in one
burst, because their buffer is main memory.  Perhaps someone uses a
16-bit DMA chip which has 64K limits, but that's still lots bigger
than the buffer a non-DMA device is likely to have.

The fundamental limits are various bandwidth restrictions.  The drive
is obvious, but aside from that, the DMA device needs to be able to
access the memory being written to.  You can't do better than that.
A non-DMA device needs to read the instructions to do the copy,
(probably Chip memory if it's autoboot, as device drivers are loaded
before Fast memory is configured), the controller's buffer (a Fast
memory access), and write to the destination (the same time as the
DMA controller).  So it *can't* go any faster than a DMA controller.

Now, there is one thing people pointed out: given that the drive is
the bottleneck (an assertion the 2090 proved wrong), even a processor
copy can be faster than the drive can deliver data.  With some cleverness
(double-buffering, probably) so the drive is kept busy during the
processor copy, the *throughput* can be the same as a DMA controller,
although the latency (time of request to time it's finished) will
always be greater.

Now, I don't have DiskPerf numbers, but aside from that, is there any
remaining confusion?  There was a Stupid Mistake in the 2090, which
causes it to be abysmally slow in certain well-understood cases.
No other DMA controllers make this mistake.  Because non-DMA
controllers do a stop-and-wait operation, filling a bucket and then
pouring it somewhere, the problem doesn't tend to arise.  A DMA
device is like a hose, and putting more water in one end than
you take out at the other is a problem requiring a bit more care.
But it's a lot faster than using a bucket.
-- 
	-Colin

waggoner@dtg.nsc.com (Mark Waggoner) (12/02/89)

In article <18870@watdragon.waterloo.edu> ccplumb@rose.waterloo.edu (Colin Plumb) writes:
>
>Non-DMA drives are limited by the size of their buffers.  You can't make
>a transfer larger than one buffer without running into the buffer-overflow
>problem that the 2090 has.  It can be solved by adding
>hardware, but simplicity is the reason for avoiding DMA, so it's
>unlikely to happen.

So are DMA disk devices.  It's just that the buffer is in the disk
drive instead of on the controller board.  You can make transfers
larger than the buffer size if you break them up into blocks, the way
a DMA controller essentially does.

>
>DMA devices can transfer essentially unlimited amounts of data in one
>burst, because their buffer is main memory.  Perhaps someone uses a
>16-bit DMA chip which has 64K limits, but that's still lots bigger
>than the buffer a non-DMA device is likely to have.

A DMA device's REAL buffer is the buffer in the SCSI drive.  The reason 
the SCSI drive can "slow down" is that it isn't feeding you the data
directly from the disk.   If you talk about an ST-506 type interface,
for instance, there is no way a DMA controller can solve the overflow
problem unless it contains an on board buffer for at least a full
sector.  This means you have to have either a very large FIFO or a
double transfer: from the disk interface to a buffer memory on the
disk controller board and then from the buffer memory to main memory.
The second transfer could be done by either DMA or by the Amiga CPU.
Letting the CPU do it is cheaper, but slower.

>
>...
>
>Now, I don't have DiskPerf numbers, but aside from that, is there any
>remaining confusion?  There was a Stupid Mistake in the 2090, which
>causes it to be abysmally slow in certain well-understood cases.
>No other DMA controllers make this mistake.  Because non-DMA
>controllers do a stop-and-wait operation, filling a bucket and then
>pouring it somewhere, the problem doesn't tend to arise.  A DMA
>device is like a hose, and putting more water in one end than
>you take out at the other is a problem requiring a bit more care.
>But it's a lot faster than using a bucket.

The DMA controllers you speak of also have a bucket, but it is in the disk
drive.  The non-DMA controllers could become DMA controllers by the
addition of a DMA machine to copy from the controller buffer to the
system memory.  Cost and design complexity are the difficulties.

>-- 
>	-Colin


-- 
 ,------------------------------------------------------------------.
|  Mark Waggoner   (408) 721-6306           waggoner@dtg.nsc.com     |
 `------------------------------------------------------------------'