was@hp-lsd.COS.HP.COM (Bill Stubblebine) (07/31/90)
I have a question for any Ampro Little Board Z80+ BIOS hackers still left out there.

My configuration:

    Ampro LB Z80+ (w/built-in SCSI interface)
    Adaptec ACB4000 (not 4000A) SCSI hard disk controller
    Seagate ST-125 20 MB 40 ms hard disk drive
    3M MCD-403 40 MB QIC SCSI tape drive
    NZ-COM/Z-System

I've used this system for several years. Until recently, I've never had reason to complain about the speed of the LB+ BIOS SCSI routines talking to my hard disk because most programs and editor text load within human tolerance limits, i.e., < ~1-3 seconds.

Recently, I purchased the 3M MCD-403 SCSI tape drive to support backups. It was a good deal for $129 surplus at Halted Electronics in Sunnyvale. The tape drive works great, and the Ampro BIOS provides a convenient virtual machine for accessing the SCSI bus. Within a very short time I was able to exercise the tape drive's basic features via SCSI.

As I started transferring real data between the hard disk and the tape drive, I discovered that I could not source or sink data from the hard disk fast enough to keep the tape drive streaming. (Streaming means keeping the tape drive motor continuously running during data transfers.) Without maintaining streaming operation, the tape transport stops, repositions the tape and starts up again to read or write each physical block on the tape. Because this extra positioning activity will probably reduce the life of the tape transport, it looks like I need to speed up the hard disk accesses slightly.

A few more details on the tape drive. The tape drive reads and writes 8k byte physical blocks. A single SCSI command can transfer multiple 8k blocks to or from the tape, but never less than one block. To keep the tape drive streaming the host needs to request a read, write or seek operation from the tape drive within 250ms of a prior read, write or seek operation, otherwise the tape drive motor shuts down automatically.

A few more details on the disk drive and controller.
The Adaptec ACB4000 controller formats the ST-125 using 18 512-byte physical sectors (or logical blocks as the controller manual refers to them) per physical track. Thus, one physical track on the disk contains 72 logical (128-byte) CP/M sectors, with four 128-byte CP/M sectors per each 512-byte SCSI logical block. The Ampro BIOS computes CP/M sector and track numbers based on 64 128-byte sectors per track, and converts the CP/M track/sector numbers into SCSI Logical Block Addresses (LBAs) as part of processing BIOS read, write and seek requests. I mention this so that in the following discussion when I refer to logical sectors, you will know that I am not talking about CP/M sectors and tracks, but about 512-byte SCSI logical blocks.

The SCSI logical blocks are physically positioned in relation to each other on the track based on the interleave factor specified to the Adaptec controller at format time. The Adaptec controller supports interleave factors from 1:1 to 9:1, i.e., the fastest interleave (1:1) is when sequential logical sectors occupy adjacent physical locations on the track, while the slowest interleave (9:1) has eight physical SCSI sectors between each logical SCSI sector.

The ST-125 spins at 3600 RPM = 60 RPS => 16.67 ms/revolution. Thus, the drive has a basic latency of 16.67/2 = 8.33 ms, i.e., the average time you need to wait before the desired physical block arrives under the head, assuming, of course, that the head is positioned over the desired track.

I've spent some time characterizing the hard disk operation. To my surprise, even with the ST-125 formatted at the slowest interleave (9:1), the BIOS cannot transfer the contents of a 512-byte SCSI logical sector in time to read the next SCSI logical sector on the same track nine sectors away. In fact, careful measurement revealed that after reading a SCSI sector, at 9:1 interleave the BIOS **just misses** the next available logical sector, and has to wait for the next revolution.
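
To make the interleave description concrete, here is a minimal Python sketch that lays out the logical sectors of one track for a given interleave factor and reports how long the host has between the start of logical sector 1 and the start of logical sector 2. The function names are made up for illustration, and the rule for handling an already-filled slot (step forward to the next free one) is an assumption, not something taken from the Adaptec manual; it does reproduce the 9:1 layout shown in the post's track diagram.

```python
def interleave_layout(sectors_per_track, factor):
    """Return the logical sector number stored in each physical slot."""
    layout = [None] * sectors_per_track
    slot = 0
    for logical in range(1, sectors_per_track + 1):
        # Assumed collision rule: step forward to the next free slot.
        while layout[slot] is not None:
            slot = (slot + 1) % sectors_per_track
        layout[slot] = logical
        slot = (slot + factor) % sectors_per_track
    return layout


def gap_ms(layout, rpm=3600):
    """Time from the start of logical sector 1 to the start of logical sector 2."""
    sectors = len(layout)
    rev_ms = 60_000 / rpm
    slots = (layout.index(2) - layout.index(1)) % sectors
    return slots * rev_ms / sectors


layout = interleave_layout(18, 9)
print(layout)                      # 1 3 5 ... 17 2 4 6 ... 18
print(round(gap_ms(layout), 2))    # 8.33
```

Under these assumptions, 9:1 interleave gives the host about 8.33 ms per 512-byte block, so a BIOS that "just misses" the next logical sector is spending slightly more than that on each block.
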
For example, after reading physical sector 1, the nearest physical sector that the BIOS can read on the same track during the same rotation is physical sector 11 as illustrated below:

    One track:   <--------------------- 16.67 ms -------------------->
    Physical:    01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18
    Logical:     01 03 05 07 09 11 13 15 17 02 04 06 08 10 12 14 16 18
    One sector:  <---------- 8.33 ms ------->

I did these experiments using bona fide BIOS calls just as an application program would. I transferred each 512-byte SCSI block to memory using four sequential CP/M sector requests starting with the CP/M sector that mapped onto the first of the four CP/M sectors in the SCSI physical block. (Believe me, that was an interesting exercise in integer programming.) I timed the SCSI block transfer starting from just before the first CP/M sector request till just after the fourth sequential CP/M request. These were BIOS sector reads and writes - no BDOS overhead was involved.

I realize that reading 4 CP/M sectors per SCSI sector involves overhead in the BIOS deblocking code. I estimated the overhead of the deblocking code by measuring the time to transfer a 128 byte CP/M sector I knew was already buffered in the BIOS deblocker. This took a little less than 1 ms per CP/M sector - not fast, but also nowhere near the 8ms+ required for the entire 512-byte SCSI block. The results indicate that the BIOS is taking > ~4ms to read a measly 512 bytes per physical SCSI sector.

Overall, the net throughput of the Ampro SCSI HD interface seems lower than it should be. The best it can do is four 128 byte CP/M sectors per 16.6 ms disk revolution, or 512 bytes/16.6ms. Thus, even with a 1:1 interleave so that logical sector 2 is right next to logical sector 1, transferring 8k bytes requires:

    (8192/512 sectors)*16.67 ms/sector = 16*16.67 = 266.72 ms

This equates to only 30,713 bytes per second net throughput from the hard disk - not too impressive in my opinion.
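
The arithmetic above can be checked with a short Python sketch. It only restates the post's figures (3600 RPM, 512-byte blocks, 8192-byte tape blocks, 250 ms streaming window); the function names and the `blocks_per_rev` parameter are illustrative. Using the unrounded revolution time gives 266.67 ms and 30,720 bytes/sec, marginally different from the rounded 266.72 ms and 30,713 bytes/sec quoted in the post.

```python
REV_MS = 60_000 / 3600      # one revolution at 3600 RPM, about 16.67 ms
BLOCK_BYTES = 512           # SCSI logical block size
TRANSFER_BYTES = 8192       # one physical tape block
STREAM_WINDOW_MS = 250      # tape drive's deadline between commands


def transfer_ms(blocks_per_rev):
    """Time to move one 8K tape block's worth of data off the disk."""
    blocks = TRANSFER_BYTES // BLOCK_BYTES
    return blocks / blocks_per_rev * REV_MS


def throughput(blocks_per_rev):
    """Net bytes per second at the given number of blocks per revolution."""
    return TRANSFER_BYTES * 1000 / transfer_ms(blocks_per_rev)


for blocks_per_rev in (1, 2):
    ms = transfer_ms(blocks_per_rev)
    print(blocks_per_rev, round(ms, 1), round(throughput(blocks_per_rev)),
          "streams" if ms < STREAM_WINDOW_MS else "misses the 250 ms window")
```

At one block per revolution the disk alone needs longer than the 250 ms window, before any time is spent writing to the tape. Two blocks per revolution would halve that figure.
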
Add to this any randomness in a file's disk allocation involving head seek time, and I'm out of the ball park for streaming.

If I could speed up the processing of a SCSI logical sector by one or two milliseconds, I could double the throughput at an interleave of 9:1, because the BIOS could transfer two SCSI logical sectors per revolution instead of 1 SCSI sector per revolution as it does now.

If you're still with me, I wonder if anyone has managed to get more than 30.7K bytes per second net throughput to/from the hard disk out of a configuration similar to mine. I've read the Ampro BIOS source and the Adaptec technical manual several times without finding a clue to speeding things up further. What's the trick?

Bill Stubblebine
Hewlett-Packard Logic Systems Div.
was@hp-lsd.hp.com (Internet)
(719) 590-5568
dbraun@cadev6.intel.com (Doug Braun ~) (08/07/90)
In article <8190004@hp-lsd.COS.HP.COM> was@hp-lsd.COS.HP.COM (Bill Stubblebine) writes:
>I have a question for any Ampro Little Board Z80+ BIOS hackers still left
>out there.
>
>My configuration:
>
>    Ampro LB Z80+ (w/built-in SCSI interface)
>    Adaptec ACB4000 (not 4000A) SCSI hard disk controller
>    Seagate ST-125 20 MB 40 ms hard disk drive
>    3M MCD-403 40 MB QIC SCSI tape drive
>    NZ-COM/Z-System
> . . .
>
>If you're still with me, I wonder if anyone has managed to get more than
>30.7K bytes per second net throughput to/from the hard disk out of a
>configuration similar to mine. I've read the Ampro BIOS source and the
>Adaptec technical manual several times without finding a clue to speeding
>things up further. What's the trick?
>

Since you are already directly accessing the SCSI bus to run the tape drive, you should do the same to access the disk. You could then read at least 32K at a time from the disk.

In my UZI system, I swapped 32K bytes at a time. My hardware was a 4MHz Z80, a custom-built (simple) SCSI host adapter that used a Z80-DMA chip, a Shugart SCSI to ST-506 controller, and a hard disk with 8 heads. I was able to use a 2:1 interleave. With this setup, it takes about 4 revolutions to read 32K, which is ~68 ms. Allowing 2 ms for overhead, this gives you thruput of over 450K bytes/sec. The DMA chip allows me to read data fast enough for this. If you have to use programmed I/O, you will not do as well, and will have to use a bigger interleave.

With all these SCSI disk controllers, if you do many small reads instead of one large one, the overhead time will dominate the transfer time. I noticed on my CP/M BIOS, which uses 1K transfers (2 disk sectors at a time), that the performance is essentially independent of the disk interleave.

With your tape setup, if you read 8k from disk, and write it to tape, you might keep the drive streaming most of the time. If not, you could at least read 32K, and write 4 tape blocks per start/stop.

Beware. If you always let almost 250 ms go by between writing tape blocks, you may have very large interrecord gaps, which will reduce your tape capacity.

I have dealt with most of these issues while interfacing a Memtec drive to my system.

Doug Braun                Intel Corp CAD
                          408 765-4279

 / decwrl \
 | hplabs |
-| oliveb |- !intelca!mipos3!cadev4!dbraun
 |  amd   |
 \ qantel /

or: dbraun@scdt.intel.com
wilker@descartes.math.purdue.edu (Clarence Wilkerson) (08/07/90)
Could you implement "scsi device to scsi device" transfer without having to go through the CPU? This is possible under some circumstances ( e.g. two disks on same controller ), but I'm not sure of the generality.
josef@nixpbe.UUCP (Moellers) (08/10/90)
In <12835@mentor.cc.purdue.edu> wilker@descartes.math.purdue.edu (Clarence Wilkerson) writes:
>Could you implement "scsi device to scsi device" transfer without having
>to go through
>the CPU? This is possible under some circumstances ( e.g. two disks on
>same controller ),
>but I'm not sure of the generality.

From what I know about SCSI, I'd say it depends. (This is standard answer #75534)

SCSI distinguishes between initiator and target. The initiator selects a target and then the target requests from the initiator whatever information is needed (command block, data, message) or sends to the initiator whatever information it holds (data, status, message). Usually, hosts are initiators and devices are targets.

So, in order to do a "device to device" transfer, you'll have to have one device that can act as an initiator, communicating with another device that continues to behave as a target. Some tape drives can do this. You just tell 'em to read n blocks of data from target x and then leave it to do its task. If you were to look at the SCSI bus, you'd see the tape drive selecting the disk, the drive requesting command blocks from the tape, then sending data to the tape, etc.

Probably one or the other controller can do a disk-to-disk-copy locally, but that would be very controller specific.

--
| Josef Moellers                | c/o Nixdorf Computer AG |
|  USA: mollers.pad@nixbur.uucp | Abt. PXD-S14            |
| !USA: mollers.pad@nixpbe.uucp | Heinz-Nixdorf-Ring      |
| Phone: (+49) 5251 104662      | D-4790 Paderborn        |
was@hp-lsd.COS.HP.COM (Bill Stubblebine) (08/21/90)
Several weeks ago, I asked for advice on how to improve throughput for bulk data transfers from my SCSI hard disk to my SCSI QIC tape drive. For those who missed the original article, my configuration is:

    Ampro LB Z80+ (w/built-in SCSI interface)
    Adaptec ACB4000 (not 4000A) SCSI hard disk controller
    Seagate ST-125 20 MB 40 ms hard disk drive
    3M MCD-403 40 MB QIC SCSI tape drive
    NZ-COM/Z-System

The 3M MCD-403 SCSI tape drive was added recently to support backups. As I started transferring data between the hard disk and the tape drive, I discovered that although the SCSI disk performance was adequate for interactive and disk-to-disk operations, the hard disk could not source or sink data fast enough to keep the tape drive streaming during transfers.

Before I posted my original request, I had experimented with several disk transfer strategies to try to increase throughput. All of my tests employed standard BIOS calls that transfer 128 bytes per BIOS call, based on Ampro's BIOS deblocking algorithm that reads or writes 512-byte SCSI logical blocks to the hard disk. My experiments indicated that BIOS calls could never achieve sufficient throughput to keep the cartridge tape drive streaming, no matter what the interleave factor is on the tape drive or on the disk drive. With all the stopping, repositioning and restarting of the cartridge drive, the overall throughput from disk to tape was under 3K bytes per second, plus the agony of hearing the drive stop and start for each 8K SCSI tape block transferred.

Having run out of ideas, I asked the net for advice, and was gratified by the quantity and quality of the responses I received.

To make a long story short, I have increased the overall throughput of disk to tape transfers from under 3K bytes per second to 12.7K bytes per second, allowing 10 megabytes to be backed up in about 13 minutes unattended. This is bliss compared to the endless attended floppy disk backups I am accustomed to.
To assist anyone who may be facing similar system integration problems, I decided to keep a log of my experiments, which is summarized below.

The quadrupling of throughput from 3K bytes/sec to 12.7K bytes/sec resulted from three categories of improvements to my configuration:

1. Read or write as many bytes as possible in each SCSI command, both from the SCSI hard disk and the SCSI tape drive.

2. Use the Z80 high-speed INIR/OTIR I/O instructions instead of software controlled byte-by-byte handshaking to talk to the 5380 SCSI interface chip on the Ampro LB+.

3. Once #1 and #2 are implemented, select optimal interleave factors on both the hard disk and the tape drive to maximize overall throughput.

The biggest improvement came from #1. Reading 8k from the disk in one SCSI command more than doubled the overall throughput compared to normal BIOS calls, providing streaming operation in the tape drive for tape interleave factors of 6:1 or greater.

    HD interleave:        9:1
    HD transfer mode:     byte-by-byte
    HD transfer size:     8K x 1
    Tape interleave:      6:1
    Tape transfer mode:   byte-by-byte
    Tape transfer size:   8K x 1
    Net throughput:       6631 bytes/sec

Next, I modified the disk read routine to read 8K bytes in two 4K SCSI commands, thereby simulating processing two distinct 4K CP/M disk allocation groups. The results were the same as for a single 8K SCSI operation, i.e., the tape keeps streaming.

This experimental result suggests that the disk-to-tape backup program should bypass the BIOS altogether, and process CP/M allocation groups directly from the CP/M disk directory entries, converting the (4K-byte) CP/M allocation group number into a SCSI logical block number, then read all 4K of the allocation block from the disk in one SCSI command. This should be a robust strategy, because (in the Ampro system) HD space cannot be allocated in chunks of less than 4K bytes = 1 CP/M allocation group.
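
As a rough illustration of that strategy, the Python sketch below converts a CP/M allocation group number into a SCSI logical block address and builds the 6-byte READ(6) command that would fetch the whole 4K group in one request. This is not the author's code. The `data_start_lba` parameter is hypothetical: it stands for wherever allocation group 0 of the CP/M drive begins on the disk (after reserved tracks and any partition offset), which the post does not specify. The READ(6) layout itself (opcode 08h, 21-bit LBA, one-byte block count where 0 means 256) is the standard SCSI one.

```python
BLOCK_BYTES = 512                                # SCSI logical block size
ALLOC_BYTES = 4096                               # one CP/M allocation group
BLOCKS_PER_ALLOC = ALLOC_BYTES // BLOCK_BYTES    # 8 blocks per group


def alloc_to_lba(alloc_number, data_start_lba):
    """First SCSI logical block of a CP/M allocation group.

    data_start_lba is a hypothetical, system-specific value: the LBA at
    which allocation group 0 of this CP/M drive starts.
    """
    return data_start_lba + alloc_number * BLOCKS_PER_ALLOC


def read6_cdb(lba, block_count, lun=0):
    """Build a SCSI READ(6) command descriptor block."""
    if not 0 <= lba < (1 << 21):
        raise ValueError("READ(6) carries only a 21-bit LBA")
    if not 1 <= block_count <= 256:
        raise ValueError("READ(6) transfers 1 to 256 blocks")
    if not 0 <= lun <= 7:
        raise ValueError("LUN must be 0 to 7")
    return bytes([
        0x08,                        # READ(6) opcode
        (lun << 5) | (lba >> 16),    # LUN plus LBA bits 20..16
        (lba >> 8) & 0xFF,           # LBA bits 15..8
        lba & 0xFF,                  # LBA bits 7..0
        block_count & 0xFF,          # 256 wraps to 0, which encodes 256
        0x00,                        # control byte
    ])


# Example with made-up numbers: group 5 on a drive whose data area starts at LBA 32.
cdb = read6_cdb(alloc_to_lba(5, 32), BLOCKS_PER_ALLOC)
print(cdb.hex(" "))    # 08 00 00 48 08 00
```

Two consecutive allocation groups of a file could be fetched with a single command by passing a block count of 16, which is the 8K case measured in the post.
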
    HD interleave:        9:1
    HD transfer mode:     byte-by-byte
    HD transfer size:     4K x 2
    Tape interleave:      6:1
    Tape transfer mode:   byte-by-byte
    Tape transfer size:   8K x 1
    Net throughput:       6631 bytes/sec

Next, I changed the SCSI handshaking from byte-by-byte to INIR/OTIR burst mode for both the hard disk and the MCD tape drive. This increased the burst transfer rate from 15us per byte to 5.25us per byte for both devices.

Using a scope to monitor the SCSI bus, I then experimented with bulk SCSI transfers from hard disk at various disk interleave factors, obtaining the following surprising results:

    Hard Disk     Time to transfer
    Interleave    8192 bytes HD->memory
    ----------    ---------------------
       2:1              165ms
       3:1               80ms
       4:1               95ms
       5:1              110ms
       6:1              120ms
       7:1              140ms
       8:1              120ms
       9:1              140ms

At an interleave of 3:1, the fastest for bulk SCSI transfers, the hard disk supports a burst transfer rate of 5.25us per byte = 190.4K bytes/sec to the Ampro host, and a sustained data transfer rate of 102.4K bytes/sec, not bad for a lowly Z-80.

Note: The previous and new interleave factors of 2:1 and 3:1, respectively, have virtually identical throughput for 512-byte BIOS transfers to and from disk. However, for multi-block transfers like the ones I intend to use for tape backups, an interleave of 3:1 produces a huge (i.e., >double) increase in disk throughput compared to an interleave factor of 2:1.

With the hard disk formatted with interleave factor 3:1 and with burst mode data transfers in effect to both the hard disk and the tape drive, I then experimented with various tape drive interleave factors. The result is that I now can keep the tape drive streaming at a tape interleave factor of 4:1, which is much better than I had originally hoped. The overall disk to tape throughput increased to 9716 bytes/sec in this configuration.
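
The rates quoted above follow directly from the measurements. The short Python sketch below recomputes them from the post's table and per-byte timings; the function and variable names are illustrative, and no data beyond the figures in the post is used.

```python
TRANSFER_BYTES = 8192

# Measured time (ms) to move 8192 bytes from disk to memory, by disk interleave.
measured_ms = {2: 165, 3: 80, 4: 95, 5: 110, 6: 120, 7: 140, 8: 120, 9: 140}


def sustained_rate(ms):
    """Bytes per second for an 8192-byte transfer taking ms milliseconds."""
    return TRANSFER_BYTES * 1000 / ms


def burst_rate(us_per_byte):
    """Bytes per second while bytes are actually moving on the bus."""
    return 1_000_000 / us_per_byte


best = min(measured_ms, key=measured_ms.get)
for interleave, ms in measured_ms.items():
    print(f"{interleave}:1  {ms:4d} ms  {sustained_rate(ms):9.0f} bytes/sec")

print("fastest interleave:", best)                              # 3
print("burst, INIR/OTIR:", round(burst_rate(5.25)))             # 190476
print("burst, byte-by-byte:", round(burst_rate(15)))            # 66667
print("3:1 versus 2:1:", round(measured_ms[2] / measured_ms[3], 2))   # 2.06
```

The 3:1 row works out to exactly 102,400 bytes/sec, and the 2:1 row to a little under half of that, which matches the ">double" remark in the note.
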
    HD interleave:        3:1
    HD transfer mode:     burst
    HD transfer size:     4K x 2
    Tape interleave:      4:1
    Tape transfer mode:   burst
    Tape transfer size:   8K x 1
    Net throughput:       9716 bytes/sec

Reading data from the hard disk in two 4K byte chunks takes about 80ms. A scope trace of SCSI bus activity indicated that a disk rotation was being lost between reading sequential 4K chunks, even when the two chunks were (logically) adjacent to one another on the same disk track, as is usually the case in large sequential files.

When I repeated the experiments reading 8K from the disk in one SCSI request, the time required to fill the memory buffer from the disk dropped to around 60ms. In this configuration, the tape remained streaming at a tape interleave of 3:1, with overall throughput from the disk to the tape increasing to 12787 bytes/sec.

    HD interleave:        3:1
    HD transfer mode:     burst
    HD transfer size:     8K x 1
    Tape interleave:      3:1
    Tape transfer mode:   burst
    Tape transfer size:   8K x 1
    Net throughput:       12787 bytes/sec

Getting writes to work to the tape was quite an adventure. The same trick that worked effectively for reads from the tape, namely setting the burst mode for 256-byte transfers, caused writes to the tape to hang in mid SCSI phase. The curious thing was that the multi-block writes worked fine when I stepped through them under manual control in the ZSID debugger, but hung when running normally.

Figuring there was some race condition between the disk reads and the tape writes, I fiddled around with delays everywhere to no avail. Because the multi-block transfers worked OK with byte-by-byte handshaking, I finally concluded that 256 must be the wrong number of data bytes to transfer to the tape controller in a burst during the SCSI data-out phase. But what was the right number?

I set the burst mode to 16 bytes per burst, which cut the byte-by-byte overhead by a factor of 16. This worked fine, allowing writes to the tape to stream at a tape interleave factor of 3:1, the same as for reads.
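
For readers unfamiliar with the burst scheme, the Python sketch below models only the bookkeeping: how an 8192-byte tape block divides into bursts of a given size, and what the Z80 B register would be loaded with for each INIR/OTIR. It is an illustration under stated assumptions, not the author's routine. It does not touch the 5380, and whatever handshake check happens between bursts is not described in the post and is not modelled. The one hardware fact relied on is that B is an 8-bit counter, so an initial value of 0 makes INIR/OTIR repeat 256 times.

```python
def bursts(total_bytes, burst_size):
    """Yield (offset, length) pairs covering total_bytes in burst_size pieces."""
    for offset in range(0, total_bytes, burst_size):
        yield offset, min(burst_size, total_bytes - offset)


def z80_b_register(length):
    """Initial B value for one INIR/OTIR burst of the given length.

    B is 8 bits wide and is decremented before the zero test, so 0 means
    256 repeats.
    """
    if not 1 <= length <= 256:
        raise ValueError("one INIR/OTIR moves 1 to 256 bytes")
    return length & 0xFF


for size in (256, 16):
    plan = list(bursts(8192, size))
    print(size, "bytes per burst:", len(plan), "bursts, B =", z80_b_register(size))
```

With 256-byte bursts the per-burst check runs 32 times per tape block; with 16-byte bursts it runs 512 times, still far fewer than the 8192 checks of byte-by-byte handshaking.
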
(Note: I still cannot explain why write transfers to the tape drive hang with 256 byte bursts and not with 16 byte bursts. Reads and writes both transfer 8192 bytes from or to the tape controller. This should loop the OTIR instruction exactly 32 times for 256 byte bursts and exactly 512 times for 16-byte bursts. Moreover, the transfer rate in either case is only one third of the tape drive controller's 500Kb/sec rated SCSI burst throughput. Maybe the discrepancy in the number of bytes transferred is on a 16-byte boundary, but I find this hard to believe. My 16-byte burst solution works, but maybe I'll just RTFM one more time...)

None of my experiments thus far involved frequent head seeks on the hard disk, which are bound to add some overhead to the tape transfers, and could cause loss of streaming. To allow some overhead for head seeks, and still keep the tape streaming, I relaxed the tape interleave factor from 3:1 to 4:1.

All in all, I'm quite happy with the results. I know that I can do 12.7K bytes/sec at 3:1 tape interleave, and nearly 10K bytes/sec at 4:1 tape interleave. Depending on the tape interleave I finally settle on, I have either tripled or quadrupled the overall disk-to-tape throughput compared to where I started, and learned a little about my disk drive, my tape drive and the SCSI protocol in the process.

Now it's on to building a primitive file system to manage my backups on the cartridge tape. Since I envision the tape as just an archive of large backups (.LBR or tar files), without a lot of random access going on, I'm inclined toward using a simple directory structure similar to the one for Novosielski .LBR files, but based on SCSI addressing instead of CP/M tracks and sectors. I'm flexible though, and I'd welcome any suggestions anyone might have regarding a file system for the cartridge tape.

Lastly, a small personal note: Over the years I've had to put up with no end of criticism from associates regarding my ongoing interest in Z80 computers. Still, I'm continually amazed at how far I can push the envelope of this friendly little OS and CPU. One of my other hobbies is sailing. I get endless pleasure from trimming the sails, reading the wind, pushing the last 1% out of the system. I get the same feeling when talking to one of those so-called DOS "power users" as I do when some muscle boat goes tearing past me on the water. I remark to myself "very impressive - but what do you do after the first 10 minutes when the novelty's worn off?"

Thanks again for all the help. It's nice to know there is still a group that shares some of my opinions. Perhaps I can return the favor one day.

Bill Stubblebine
Hewlett-Packard Logic Systems Div.
Colorado Springs, CO
was@hp-lsd.hp.com (Internet)
(719) 590-5568