[comp.os.cpm] How to speed up Ampro LB+ SCSI?

was@hp-lsd.COS.HP.COM (Bill Stubblebine) (07/31/90)

I have a question for any Ampro Little Board Z80+ BIOS hackers still left
out there.

My configuration:

	Ampro LB Z80+ (w/built-in SCSI interface)
	Adaptec ACB4000 (not 4000A) SCSI hard disk controller
	Seagate ST-125 20 MB 40 ms hard disk drive
	3M MCD-403 40 MB QIC SCSI tape drive 
	NZ-COM/Z-System

I've used this system for several years.  Until recently, I've never had
reason to complain about the speed of the LB+ BIOS SCSI routines talking to
my hard disk because most programs and editor text load within human
tolerance limits, i.e., < ~1-3 seconds.

Recently, I purchased the 3M MCD-403 SCSI tape drive to support backups.
It was a good deal for $129 surplus at Halted Electronics in Sunnyvale.
The tape drive works great, and the Ampro BIOS provides a convenient
virtual machine for accessing the SCSI bus.  Within a very short time I was
able to exercise the tape drive's basic features via SCSI.

As I started transferring real data between the hard disk and the tape
drive, I discovered that I could not source or sink data from the hard disk
fast enough to keep the tape drive streaming.  (Streaming means keeping the
tape drive motor continuously running during data transfers.)  Without
maintaining streaming operation, the tape transport stops, repositions the
tape and starts up again to read or write each physical block on the tape.
Because this extra positioning activity will probably reduce the life of
the tape transport, it looks like I need to speed up the hard disk accesses
slightly.

A few more details on the tape drive.  The tape drive reads and writes 8k
byte physical blocks.  A single SCSI command can transfer multiple 8k
blocks to or from the tape, but never less than one block.  To keep the
tape drive streaming the host needs to request a read, write or seek
operation from the tape drive within 250ms of a prior read, write or seek
operation, otherwise the tape drive motor shuts down automatically.

A few more details on the disk drive and controller.  The Adaptec ACB4000
controller formats the ST-125 using 18 512-byte physical sectors (or
logical blocks as the controller manual refers to them) per physical track.
Thus, one physical track on the disk contains 72 logical (128-byte) CP/M
sectors, with four 128-byte CP/M sectors per each 512-byte SCSI logical
block.  The Ampro BIOS computes CP/M sector and track numbers based on 64
128-byte sectors per track, and converts the CP/M track/sector numbers into
SCSI Logical Block Addresses (LBAs) as part of processing BIOS read, write
and seek requests.  I mention this so that in the following discussion when
I refer to logical sectors, you will know that I am not talking about CP/M
sectors and tracks, but logical 512-byte SCSI logical blocks.
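
The address conversion described above can be sketched in a few lines of
Python.  This is only an illustration of the arithmetic, assuming zero-based
track and sector numbers and a simple linear mapping; the `base_lba`
parameter is a hypothetical partition offset, and the real Ampro BIOS may
apply offsets or skews not described in this post.

```python
CPM_SECTOR_BYTES = 128
SCSI_BLOCK_BYTES = 512
CPM_SECTORS_PER_TRACK = 64   # as the BIOS counts them, per the post
CPM_PER_SCSI = SCSI_BLOCK_BYTES // CPM_SECTOR_BYTES   # 4 CP/M sectors per block


def cpm_to_scsi(track, sector, base_lba=0):
    """Map a zero-based CP/M (track, sector) to (SCSI LBA, byte offset).

    base_lba is a hypothetical partition offset, not taken from the BIOS.
    """
    linear = track * CPM_SECTORS_PER_TRACK + sector      # linear CP/M sector
    lba, index = divmod(linear, CPM_PER_SCSI)            # which block, which quarter
    return base_lba + lba, index * CPM_SECTOR_BYTES


print(cpm_to_scsi(0, 0))   # first CP/M sector of the first SCSI block
print(cpm_to_scsi(1, 5))   # second quarter of a later block
```

Under this model a CP/M sector whose linear number is a multiple of four
starts a new SCSI logical block, which is the alignment the timing
experiments below depend on.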

The SCSI logical blocks are physically positioned in relation to each other
on the track based on the interleave factor specified to the Adaptec
controller at format time.  The Adaptec controller supports interleave
factors from 1:1 to 9:1, i.e., the fastest interleave (1:1) is when
sequential logical sectors occupy adjacent physical locations on the track,
while the slowest interleave (9:1) has eight physical SCSI sectors between
each logical SCSI sector.

The ST-125 spins at 3600 RPM = 60 RPS => 16.67 ms/ revolution.  Thus, the
drive has a basic latency of 16.67/2=8.33 ms, i.e., the average time you
need to wait before the desired physical block arrives under the head,
assuming, of course that the head is positioned over the desired track.

I've spent some time characterizing the hard disk operation.  To my
surprise, even with the ST-125 formatted at the slowest interleave (9:1),
the BIOS cannot transfer the contents of a 512-byte SCSI logical sector in
time to read the next SCSI logical sector on the same track nine sectors
away.  In fact, careful measurement revealed that after reading a SCSI
sector, at 9:1 interleave the BIOS **just misses** the next available
logical sector, and has to wait for the next revolution.

For example, after reading physical sector 1, the nearest physical sector
that the BIOS can read on the same track during the same rotation is
physical sector 11 as illustrated below:

    One track:   <--------------------- 16.67 ms -------------------->
    Physical:    01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18
    Logical:     01 03 05 07 09 11 13 15 17 02 04 06 08 10 12 14 16 18
    One sector:  <---------- 8.33 ms ------->
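
The layout in the diagram can be reproduced with a short Python sketch.  The
placement rule used here (step ahead by the interleave factor, and take the
next free slot on a collision) is an assumption that happens to reproduce
the diagram; the Adaptec controller's actual algorithm is not documented in
this thread.

```python
def interleave_layout(sectors_per_track, interleave):
    """Return the logical sector number stored in each physical slot."""
    layout = [None] * sectors_per_track
    pos = 0
    for logical in range(1, sectors_per_track + 1):
        # On a collision, slide forward to the next free physical slot.
        while layout[pos] is not None:
            pos = (pos + 1) % sectors_per_track
        layout[pos] = logical
        pos = (pos + interleave) % sectors_per_track
    return layout


# 18 sectors per track at 9:1, as in the diagram above.
print(" ".join(f"{n:02d}" for n in interleave_layout(18, 9)))
```

With 18 sectors and a factor of 9, logical sector 2 lands nine slots (half a
revolution, 8.33 ms) after logical sector 1.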

I did these experiments using bona fide BIOS calls just as an application
program would.  I transferred each 512-byte SCSI block to memory using four
sequential CP/M sector requests starting with the CP/M sector that mapped
onto the first of the four CP/M sectors in the SCSI physical block.
(Believe me, that was an interesting exercise in integer programming.)  I
timed the SCSI block transfer starting from just before the first CP/M
sector request till just after the fourth sequential CP/M request.  These
were BIOS sector reads and writes - no BDOS overhead was involved.

I realize that reading 4 CP/M sectors per SCSI sector involves overhead in
the BIOS deblocking code.  I estimated the overhead of the deblocking code
by measuring the time to transfer a 128 byte CP/M sector I knew was already
buffered in the BIOS deblocker.  This took a little less than 1 ms per CP/M
sector - not fast, but also nowhere near the 8ms+ required for the entire
512-byte SCSI block.  The results indicate that the BIOS is taking > ~4ms to
read a measly 512 bytes per physical SCSI sector.

Overall, the net throughput of the Ampro SCSI HD interface seems lower than
it should be.  The best it can do is four 128 byte CP/M sectors per 16.6 ms
disk revolution, or 512 bytes/16.6ms.  Thus, even with a 1:1 interleave so
that logical sector 2 is right next to logical sector 1, transferring 8k
bytes requires:

	(8192/512 sectors)*16.67 ms/sector = 16*16.67 = 266.72 ms

	This equates to only 30,713 bytes per second net throughput from
	the hard disk - not too impressive in my opinion.
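
The same arithmetic as a small Python sketch, using the 16.67 ms revolution
time from above.  The function name and the `blocks_per_rev` parameter are
illustrative only.

```python
REV_MS = 16.67   # one revolution at 3600 RPM, rounded as in the post


def bios_throughput(transfer_bytes=8192, block_bytes=512,
                    blocks_per_rev=1, rev_ms=REV_MS):
    """Return (total time in ms, bytes per second) for a sequential read
    when the host manages only blocks_per_rev blocks per revolution."""
    blocks = transfer_bytes // block_bytes
    total_ms = blocks / blocks_per_rev * rev_ms
    return total_ms, transfer_bytes / (total_ms / 1000.0)


total_ms, rate = bios_throughput()
print(f"{total_ms:.2f} ms, {int(rate)} bytes/sec")
```

The `blocks_per_rev` parameter makes it easy to see how the figure scales if
the host could pick up more than one block per revolution.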

Add to this any randomness in a file's disk allocation involving head seek
time, and I'm out of the ball park for streaming.
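
A rough Python check of the streaming budget, under the simplifying
assumption that the whole 250 ms window is available for refilling one 8k
buffer from the disk (it ignores the time spent moving the data to the tape
drive itself, so the real budget is tighter):

```python
TAPE_DEADLINE_MS = 250.0   # tape motor shuts down after this much idle time
TAPE_BLOCK_BYTES = 8192
DISK_BLOCK_BYTES = 512


def per_block_budget_ms(deadline_ms=TAPE_DEADLINE_MS):
    """Time available per 512-byte disk block to refill one tape block."""
    return deadline_ms / (TAPE_BLOCK_BYTES // DISK_BLOCK_BYTES)


def keeps_streaming(disk_fill_ms, deadline_ms=TAPE_DEADLINE_MS):
    """True if the disk read finishes inside the tape drive's window."""
    return disk_fill_ms < deadline_ms


print(per_block_budget_ms())        # ms allowed per disk block
print(keeps_streaming(16 * 16.67))  # one block per revolution, as measured
```

At one block per revolution the disk needs 16.67 ms per block against a
budget of about 15.6 ms, so the BIOS path misses the window even before any
seek time is added.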

If I could speed up the processing of a SCSI logical sector by one or two
milliseconds, I could double the throughput at an interleave of 9:1,
because the BIOS could transfer two SCSI logical sectors per revolution
instead of 1 SCSI sector per revolution as it does now.

If you're still with me, I wonder if anyone has managed to get more than
30.7K bytes per second net throughput to/from the hard disk out of a
configuration similar to mine.  I've read the Ampro BIOS source and the
Adaptec technical manual several times without finding a clue to speeding
things up further.  What's the trick?

                                Bill Stubblebine
                                Hewlett-Packard Logic Systems Div.
                                was@hp-lsd.hp.com  (Internet)
                                (719) 590-5568

dbraun@cadev6.intel.com (Doug Braun ~) (08/07/90)

In article <8190004@hp-lsd.COS.HP.COM> was@hp-lsd.COS.HP.COM (Bill Stubblebine) writes:
>I have a question for any Ampro Little Board Z80+ BIOS hackers still left
>out there.
>
>My configuration:
>
>	Ampro LB Z80+ (w/built-in SCSI interface)
>	Adaptec ACB4000 (not 4000A) SCSI hard disk controller
>	Seagate ST-125 20 MB 40 ms hard disk drive
>	3M MCD-403 40 MB QIC SCSI tape drive 
>	NZ-COM/Z-System
>
.
.
.
>
>If you're still with me, I wonder if anyone has managed to get more than
>30.7K bytes per second net throughput to/from the hard disk out of a
>configuration similar to mine.  I've read the Ampro BIOS source and the
>Adaptec technical manual several times without finding a clue to speeding
>things up further.  What's the trick?
>

Since you are already directly accessing the SCSI bus to run the tape drive,
you should do the same to access the disk.  You could then read at least
32K at a time from the disk.  In my UZI system, I swapped 32K bytes at a time.
My hardware was a 4MHz Z80, a custom-built (simple) SCSI host adapter
that used a Z80-DMA chip, a Shugart SCSI to ST-506 controller, and a
hard disk with 8 heads.  I was able to use a 2:1 interleave.  With
this setup, it takes about 4 revolutions to read 32K, which is ~68 ms.
Allowing 2 ms for overhead, this gives you thruput of over 450K bytes/sec.
The DMA chip allows me to read data fast enough for this.  If you have to use
programmed I/O, you will not do as well, and have to use a bigger interleave.
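
As a quick check of that estimate, here is the arithmetic in Python, taking
the ~68 ms transfer time and 2 ms overhead at face value (the function name
is illustrative):

```python
def throughput_bytes_per_sec(nbytes, transfer_ms, overhead_ms=0.0):
    """Net throughput for one large read, including fixed overhead."""
    return nbytes / ((transfer_ms + overhead_ms) / 1000.0)


rate = throughput_bytes_per_sec(32 * 1024, transfer_ms=68.0, overhead_ms=2.0)
print(round(rate))   # bytes per second for a single 32K read
```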

With all these SCSI disk controllers, if you do many small reads instead
of one large one, the overhead time will dominate the transfer time.
I noticed on my CP/M BIOS, which uses 1K transfers (2 disk sectors at a time),
that the performance is essentially independent of the disk interleave.

With your tape setup, if you read 8k from disk, and write it to tape,
you might keep the drive streaming most of the time.  If not, you could
at least read 32K, and write 4 tape blocks per start/stop.
Beware. If you always let almost 250 ms go by between writing tape blocks,
you may have very large interrecord gaps, which will reduce your tape capacity.

I have dealt with most of these issues while interfacing a Memtec drive
to my system.

Doug Braun				Intel Corp CAD
					408 765-4279

 / decwrl \
 | hplabs |
-| oliveb |- !intelca!mipos3!cadev4!dbraun
 | amd    |
 \ qantel /

 or:

 dbraun@scdt.intel.com

wilker@descartes.math.purdue.edu (Clarence Wilkerson) (08/07/90)

Could you implement "scsi device to scsi device" transfer without having
to go through the CPU? This is possible under some circumstances ( e.g.
two disks on same controller ), but I'm not sure of the generality.

josef@nixpbe.UUCP (Moellers) (08/10/90)

In <12835@mentor.cc.purdue.edu> wilker@descartes.math.purdue.edu (Clarence Wilkerson) writes:

>Could you implement "scsi device to scsi device" transfer without having
>to go through the CPU? This is possible under some circumstances ( e.g.
>two disks on same controller ), but I'm not sure of the generality.

From what I know about SCSI, I'd say it depends. (This is standard
answer #75534)

SCSI distinguishes between initiator and target.
The initiator selects a target and then the target requests from the
initiator whatever information is needed (command block, data, message)
or sends to the initiator whatever information it holds (data, status,
message).
Usually, hosts are initiators and devices are targets.
So, in order to do a "device to device" transfer, You'll have to have
one device that can act as an initiator, communicating with another
device that continues to behave as a target.
Some tape drives can do this. You just tell'em to read n blocks of data
from target x and then leave it to do its task. If You were to look at
the SCSI bus, You'd see the tape drive selecting the disk, the drive
requesting command blocks from the tape, then sending data to the tape,
etc.
Probably one or the other controller can do a disk-to-disk-copy locally,
but that would be very controller specific.
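
The roles described above can be put into a toy Python model.  It only lists
the order of bus phases for one command and who plays which role; it is a
simplified sketch (arbitration, disconnects and error paths are left out),
and the device names are placeholders.

```python
from enum import Enum


class Phase(Enum):
    SELECTION = "selection"      # initiator picks a target
    COMMAND = "command"          # target requests the command block
    DATA_IN = "data in"          # target -> initiator
    DATA_OUT = "data out"        # initiator -> target
    STATUS = "status"            # target -> initiator
    MESSAGE_IN = "message in"    # target -> initiator (command complete)


def phases(direction):
    """Phase sequence for one simple read or write command."""
    data = Phase.DATA_IN if direction == "read" else Phase.DATA_OUT
    return [Phase.SELECTION, Phase.COMMAND, data, Phase.STATUS, Phase.MESSAGE_IN]


def device_to_device_copy(initiator="tape", target="disk"):
    """The tape drive acts as initiator and reads from the disk (target)."""
    return [(initiator, target, p) for p in phases("read")]


for who, whom, phase in device_to_device_copy():
    print(f"{who} -> {whom}: {phase.value}")
```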

--
| Josef Moellers		|	c/o Nixdorf Computer AG	|
|  USA: mollers.pad@nixbur.uucp	|	Abt. PXD-S14		|
| !USA: mollers.pad@nixpbe.uucp	|	Heinz-Nixdorf-Ring	|
| Phone: (+49) 5251 104662	|	D-4790 Paderborn	|

was@hp-lsd.COS.HP.COM (Bill Stubblebine) (08/21/90)

Several weeks ago, I asked for advice on how to improve throughput for bulk
data transfers from my SCSI hard disk to my SCSI QIC tape drive.  For those
who missed the original article, my configuration is:

	Ampro LB Z80+ (w/built-in SCSI interface)
	Adaptec ACB4000 (not 4000A) SCSI hard disk controller
	Seagate ST-125 20 MB 40 ms hard disk drive
	3M MCD-403 40 MB QIC SCSI tape drive 
	NZ-COM/Z-System

The 3M MCD-403 SCSI tape drive was added recently to support backups.  As I
started transferring data between the hard disk and the tape drive, I
discovered that although the SCSI disk performance was adequate for
interactive and disk-to-disk operations, the hard disk could not source or
sink data fast enough to keep the tape drive streaming during transfers.

Before I posted my original request, I had experimented with several disk
transfer strategies to try to increase throughput.  All of my tests
employed standard BIOS calls that transfer 128 bytes per BIOS call, based
on Ampro's BIOS deblocking algorithm that reads or writes 512-byte SCSI
logical blocks to the hard disk.  My experiments indicated that BIOS calls
could never achieve sufficient throughput to keep the cartridge tape drive
streaming, no matter what the interleave factor is on the tape drive or on
the disk drive.  With all the stopping, repositioning and restarting of the
cartridge drive, the overall throughput from disk to tape was under 3K
bytes per second, plus the agony of hearing the drive stop and start for
each 8K SCSI tape block transferred.

Having run out of ideas, I asked the net for advice, and was gratified by
the quantity and quality of the responses I received.  To make a long story
short, I have increased the overall throughput of disk to tape transfers
from under 3K bytes per second to 12.7K bytes per second, allowing 10
megabytes to be backed up in about 13 minutes unattended.  This is bliss
compared to the endless attended floppy disk backups I am accustomed to.
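
The backup-time figure follows directly from the throughput.  A small Python
check, assuming 10 megabytes means 10 x 1024 x 1024 bytes and taking
3000 bytes/sec as a stand-in for "under 3K":

```python
def backup_minutes(total_bytes, bytes_per_sec):
    """Minutes needed to move total_bytes at a steady net throughput."""
    return total_bytes / bytes_per_sec / 60.0


TEN_MB = 10 * 1024 * 1024
print(f"{backup_minutes(TEN_MB, 12787):.1f} min at 12787 bytes/sec")
print(f"{backup_minutes(TEN_MB, 3000):.1f} min at 3000 bytes/sec")
```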

To assist anyone who may be facing similar system integration problems, I
decided to keep a log of my experiments, which is summarized below.  The
quadrupling of throughput from 3K bytes/sec to 12.7K bytes/sec resulted
from three categories of improvements to my configuration:

1. Read or write as many bytes as possible in each SCSI command, both from
   the SCSI hard disk and the SCSI tape drive.

2. Use the Z80 high-speed INIR/OTIR I/O instructions instead of software
   controlled byte-by-byte handshaking to talk to the 5380 SCSI interface
   chip on the Ampro LB+.

3. Once #1 and #2 are implemented, select optimal interleave factors on
   both the hard disk and the tape drive to maximize overall throughput.


The biggest improvement came from #1.  Reading 8k from the disk in one SCSI
command more than doubled the overall throughput compared to normal BIOS
calls, providing streaming operation in the tape drive for tape interleave
factors of 6:1 or greater.

		HD interleave:		9:1
		HD transfer mode:	byte-by-byte
		HD transfer size:	8K x 1
		Tape interleave:	6:1
		Tape transfer mode:	byte-by-byte
		Tape transfer size:	8K x 1
		Net throughput:		6631 bytes/sec

Next, I modified the disk read routine to read 8K bytes in two 4K SCSI
commands, thereby simulating processing two distinct 4K CP/M disk
allocation groups.  The results were the same as for a single 8K SCSI
operation, i.e., the tape keeps streaming.  This experimental result
suggests that the disk-to-tape backup program should bypass the BIOS
altogether, and process CP/M allocation groups directly from the CP/M disk
directory entries, converting the (4K-byte) CP/M allocation group number
into a SCSI logical block number, then read all 4K of the allocation block
from the disk in one SCSI command.  This should be a robust strategy,
because (in the Ampro system) HD space cannot be allocated in chunks of
less than 4K bytes = 1 CP/M allocation group.

		HD interleave:		9:1
		HD transfer mode:	byte-by-byte
		HD transfer size:	4K x 2
		Tape interleave:	6:1
		Tape transfer mode:	byte-by-byte
		Tape transfer size:	8K x 1
		Net throughput:		6631 bytes/sec
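
The allocation-group strategy described above might look like the following
Python sketch.  The 6-byte READ command layout is the standard SCSI one; the
`data_area_lba` parameter (where allocation group 0 starts on the disk) is a
hypothetical value that would have to come from the Ampro partition and
reserved-track setup, which this post does not spell out.

```python
ALLOC_GROUP_BYTES = 4096
SCSI_BLOCK_BYTES = 512
BLOCKS_PER_GROUP = ALLOC_GROUP_BYTES // SCSI_BLOCK_BYTES   # 8


def group_to_lba(group, data_area_lba=0):
    """First SCSI logical block of a CP/M allocation group.

    data_area_lba is an assumed offset to allocation group 0.
    """
    return data_area_lba + group * BLOCKS_PER_GROUP


def read6_cdb(lba, blocks, lun=0):
    """Build a 6-byte SCSI READ command descriptor block."""
    if not 0 <= lba < (1 << 21):
        raise ValueError("READ(6) carries only a 21-bit block address")
    if not 1 <= blocks <= 256:
        raise ValueError("READ(6) transfers 1..256 blocks")
    return bytes([
        0x08,                                   # READ(6) opcode
        ((lun & 0x07) << 5) | ((lba >> 16) & 0x1F),
        (lba >> 8) & 0xFF,
        lba & 0xFF,
        blocks & 0xFF,                          # 0 encodes 256 blocks
        0x00,                                   # control byte
    ])


# Read all 4K of allocation group 5 in one command.
print(read6_cdb(group_to_lba(5), BLOCKS_PER_GROUP).hex(" "))
```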

Next, I changed the SCSI handshaking from byte-by-byte to INIR/OTIR burst
mode for both the hard disk and the MCD tape drive.  This increased the
burst transfer rate from 15us per byte to 5.25us per byte for both devices.
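
In Python terms, the effect of those per-byte times on one 8K buffer works
out as follows (a plain arithmetic sketch; the function names are
illustrative):

```python
def transfer_ms(nbytes, us_per_byte):
    """Time to move nbytes across the SCSI port at a given per-byte cost."""
    return nbytes * us_per_byte / 1000.0


def bytes_per_sec(us_per_byte):
    """Burst rate implied by a per-byte cost in microseconds."""
    return 1_000_000 / us_per_byte


for label, us in (("byte-by-byte", 15.0), ("INIR/OTIR burst", 5.25)):
    print(f"{label}: {transfer_ms(8192, us):.2f} ms per 8K, "
          f"{bytes_per_sec(us) / 1000:.1f}K bytes/sec")
```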

Using a scope to monitor the SCSI bus, I then experimented with bulk SCSI
transfers from hard disk at various disk interleave factors, obtaining the
following surprising results:

		Hard Disk	Time to transfer 
		Interleave	8192 bytes HD->memory
		----------	----------------
		   2:1		    165ms
		   3:1		     80ms
		   4:1		     95ms
		   5:1		    110ms
		   6:1		    120ms
		   7:1		    140ms
		   8:1		    120ms
		   9:1		    140ms

At an interleave of 3:1, the fastest for bulk SCSI transfers, the hard disk
supports a burst transfer rate of 5.25us per byte = 190.4K bytes/sec to the
Ampro host, and a sustained data transfer rate of 102.4K bytes/sec, not bad
for a lowly Z-80.
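
The sustained rates implied by the table can be computed with a few lines of
Python; the timings are the measured values from the table above, nothing
more.

```python
# Measured time in ms to move 8192 bytes from disk to memory, by interleave.
MEASURED_MS = {2: 165, 3: 80, 4: 95, 5: 110, 6: 120, 7: 140, 8: 120, 9: 140}


def sustained_rate(ms, nbytes=8192):
    """Sustained bytes per second for an nbytes transfer taking ms."""
    return nbytes / (ms / 1000.0)


best = min(MEASURED_MS, key=MEASURED_MS.get)
for factor, ms in sorted(MEASURED_MS.items()):
    mark = "  <- fastest" if factor == best else ""
    print(f"{factor}:1  {ms:4d} ms  {sustained_rate(ms):9.0f} bytes/sec{mark}")
```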

Note: The previous and new interleave factors of 2:1 and 3:1, respectively,
      have virtually identical throughput for 512-byte BIOS transfers to
      and from disk.  However, for multi-block transfers like the ones I
      intend to use for tape backups, an interleave of 3:1 produces a huge
      (i.e., >double) increase in disk throughput compared to an interleave
      factor of 2:1.

With the hard disk formatted with interleave factor 3:1 and with burst mode
data transfers in effect to both the hard disk and the tape drive, I then
experimented with various tape drive interleave factors.  The result is
that I now can keep the tape drive streaming at a tape interleave factor of
4:1, which is much better than I had originally hoped.  The overall disk to
tape throughput increased to 9716 bytes/sec in this configuration.

		HD interleave:		3:1
		HD transfer mode:	burst
		HD transfer size:	4K x 2
		Tape interleave:	4:1
		Tape transfer mode:	burst
		Tape transfer size:	8K x 1
		Net throughput:		9716 bytes/sec

Reading data from the hard disk in two 4K byte chunks takes about 80ms.  A
scope trace of SCSI bus activity indicated that a disk rotation was being
lost between reading sequential 4K chunks, even when the two chunks were
(logically) adjacent to one another on the same disk track, as is usually
the case in large sequential files.  When I repeated the experiments
reading 8K from the disk in one SCSI request, the time required to fill the
memory buffer from the disk dropped to around 60ms.  In this configuration,
the tape remained streaming at a tape interleave of 3:1, with overall
throughput from the disk to the tape increasing to 12787 bytes/sec.

		HD interleave:		3:1
		HD transfer mode:	burst
		HD transfer size:	8K x 1
		Tape interleave:	3:1
		Tape transfer mode:	burst
		Tape transfer size:	8K x 1
		Net throughput:		12787 bytes/sec

Getting writes to work to the tape was quite an adventure.  The same trick
that worked effectively for reads from the tape, namely setting the burst
mode for 256-byte transfers, caused writes to the tape to hang in mid SCSI
phase.  The curious thing was that the multi-block writes worked fine when
I stepped through them under manual control in the ZSID debugger, but hung
when running normally.  Figuring there was some race condition between the
disk reads and the tape writes, I fiddled around with delays everywhere to
no avail.  Because the multi-block transfers worked OK with byte-by-byte
handshaking, I finally concluded that 256 must be the wrong number of data
bytes to transfer to the tape controller in a burst during the SCSI
data-out phase.  But what was the right number?  I set the burst mode to 16
bytes per burst, which cut the byte-by-byte overhead by a factor of 16.
This worked fine, allowing writes to the tape to stream at a tape
interleave factor of 3:1, the same as for reads.

Note:  I still cannot explain why write transfers to the tape drive hang
       with 256 byte bursts and not with 16 byte bursts.  Reads and writes
       both transfer 8192 bytes from or to the tape controller.  This
       should loop the OTIR instruction exactly 32 times for 256 byte
       bursts and exactly 512 times for 16-byte bursts.  Moreover, the
       transfer rate in either case is only one third of the tape drive
       controller's 500Kb/sec rated SCSI burst throughput.  Maybe the
       discrepancy in the number of bytes transferred is on a 16-byte
       boundary, but I find this hard to believe.  My 16-byte burst
       solution works, but maybe I'll just RTFM one more time...
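
The burst counts in the note can be checked with a short Python sketch that
splits an 8192-byte buffer into fixed-size bursts.  It models only the
chunking, not the 5380 handshake or the hang itself; a single OTIR moves at
most 256 bytes because its counter is the 8-bit B register.

```python
def split_bursts(data, burst_size):
    """Split a buffer into equal bursts, one per OTIR loop."""
    if burst_size <= 0 or len(data) % burst_size:
        raise ValueError("buffer must be a whole number of bursts")
    return [data[i:i + burst_size] for i in range(0, len(data), burst_size)]


tape_block = bytes(8192)
for size in (256, 16):
    print(f"{size}-byte bursts: {len(split_bursts(tape_block, size))} loops")
```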

None of my experiments thus far involved frequent head seeks on the hard
disk, which are bound to add some overhead to the tape transfers, and could
cause loss of streaming.  To allow some overhead for head seeks, and still
keep the tape streaming, I relaxed the tape interleave factor from 3:1 to
4:1.

All in all, I'm quite happy with the results.  I know that I can do 12.7K
bytes/sec at 3:1 tape interleave, and nearly 10K bytes/sec at 4:1 tape
interleave.  Depending on the tape interleave I finally settle on, I have
either tripled or quadrupled the overall disk-to-tape throughput compared
to where I started, and learned a little about my disk drive, my tape drive
and the SCSI protocol in the process.

Now it's on to building a primitive file system to manage my backups on the
cartridge tape.  Since I envision the tape as just an archive of large
backups (.LBR or tar files), without a lot of random access going on, I'm
inclined toward using a simple directory structure similar to the one for
Novosielski .LBR files, but based on SCSI addressing instead of CP/M tracks
and sectors.  I'm flexible though, and I'd welcome any suggestions anyone
might have regarding a file system for the cartridge tape.

Lastly, a small personal note:  Over the years I've had to put up with no
end of criticism from associates regarding my ongoing interest in Z80
computers.  Still, I'm continually amazed at how far I can push the
envelope of this friendly little OS and CPU.

One of my other hobbies is sailing.  I get endless pleasure from trimming
the sails, reading the wind, pushing the last 1% out of the system.  I get
the same feeling when talking to one of those so-called DOS "power users"
as I do when some muscle boat goes tearing past me on the water.  I remark
to myself "very impressive - but what do you do after the first 10 minutes
when the novelty's worn off?"

Thanks again for all the help.  It's nice to know there is still a group
that shares some of my opinions.  Perhaps I can return the favor one day.

                                Bill Stubblebine
                                Hewlett-Packard Logic Systems Div.
                                Colorado Springs, CO
                                was@hp-lsd.hp.com  (Internet)
                                (719) 590-5568