daveb@rtech.rtech.com (Dave Brower) (10/03/89)
some people wrote:
>>Cray DD-40 disk drives can support >10MB/sec through the operating
>>system (at least COS; I assume the case is also true for UNICOS).
>
>This brings up a point: in what processing regimes is total
>sustained disk transfer rate the performance-limiting factor?

In many tp/database/business applications, the CPU is fast enough that
disk bandwidth will soon be the limiting factor.  Some airline
reservation systems are said to have huge farms of disks where only one
or two tracks are used on the whole pack, to avoid seeks.  A 1000 tp/s
database benchmark might easily require 10MB/sec of i/o throughput.

Maybe Cray should change markets...

-dB
--
"Did you know that 'gullible' isn't in the dictionary?"
{amdahl, cbosgd, mtxinu, ptsfa, sun}!rtech!daveb    daveb@rtech.uucp
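[A rough back-of-the-envelope check of the 1000 tp/s figure above.  The
~10KB-of-I/O-per-transaction number is an assumption for illustration,
not something stated in the posting; real workloads vary widely.]

```python
# Back-of-the-envelope: sustained disk bandwidth needed by a TP benchmark.
# The per-transaction I/O volume is an assumed illustrative figure.

def required_bandwidth_mb(tps, bytes_per_txn):
    """Return the sustained disk throughput needed, in MB/sec."""
    return tps * bytes_per_txn / 1_000_000

# 1000 tp/s at an assumed ~10KB of disk I/O per transaction:
print(required_bandwidth_mb(1000, 10_000))  # -> 10.0 (MB/sec)
```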
philf@xymox.metaphor.com (Phil Fernandez) (10/06/89)
In article <3752@rtech.rtech.com> daveb@rtech.UUCP (Dave Brower) writes:
>... Some airline reservation systems are said to have huge farms of
>disk where only one or two tracks are used on the whole pack to avoid
>seeks, for instance.

No, I don't think so.  I did a consulting job for United Airlines'
Apollo system a couple of years ago, looking for architectures to break
the 1000 t/s limit.  We looked at distributing transactions across many
processors and disks, etc., etc., but at nothing quite so profligate as
using only a couple of tracks (or cylinders) on a 1GB disk pack to
minimize seeks.

On the *big iron* that UAL and the other reservation systems use, the
operating systems (TPFII and MVS/ESA) implement very sophisticated disk
management algorithms, and in particular implement elevator seeking.
With elevator seeking, the disk I/Os in the queue are ordered so as to
minimize seek latency between I/O operations.  In an I/O-intensive tp
application with I/Os spread across multiple disk packs, a good
elevator scheduling scheme is all that's needed to get the appropriate
disk I/O bandwidth.

Makes for a good story, tho!

phil
+-----------------------------+----------------------------------------------+
| Phil Fernandez              | philf@metaphor.com                           |
|                             | ...!{apple|decwrl}!metaphor!philf            |
| Metaphor Computer Systems   |"Does the body rule the mind, or does the mind|
| Mountain View, CA           | rule the body?  I dunno..." - Morrissey      |
+-----------------------------+----------------------------------------------+
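[The elevator-seeking idea described above can be sketched as a simple
reordering of the request queue.  This is a minimal one-sweep SCAN for
illustration, not the scheduler TPF or MVS actually implements; the
cylinder numbers and starting head position are made up.]

```python
# Minimal sketch of elevator (SCAN) scheduling: service queued requests
# in one sweep of increasing cylinder number from the current head
# position, then sweep back down through the remainder, instead of
# seeking back and forth in arrival order.

def elevator_order(head, requests):
    """Reorder queued cylinder requests to minimize seek travel."""
    up = sorted(c for c in requests if c >= head)
    down = sorted((c for c in requests if c < head), reverse=True)
    return up + down

print(elevator_order(50, [95, 10, 60, 20, 80]))  # -> [60, 80, 95, 20, 10]
```

One upward sweep picks up 60, 80 and 95 in passing; the return sweep
collects 20 and 10, rather than paying a long seek for each request in
arrival order.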
news@rtech.rtech.com (USENET News System) (10/10/89)
In article <829@metaphor.Metaphor.COM> philf@xymox.metaphor.com (Phil Fernandez) writes:
>In article <3752@rtech.rtech.com> daveb@rtech.UUCP (Dave Brower) writes:
>> ... Some airline reservation systems are said to have huge farms of
>>disk where only one or two tracks are used on the whole pack to avoid
>>seeks, for instance.
>
>With elevator seeking, disk I/O's in the queue are ordered in such a
>way to minimize seek latency between I/O operations.

A number of techniques we used on a VAX-based TP exec called the
Transaction Management eXecutive-32 (TMX-32) were:

- per-disk seek ordering - as stated above

- which-disk seek ordering - with mirrored disks, choose the disk whose
  heads are closest to the part of the disk you're gonna read.
  (Sometimes just flip-flopping between the two is enough.)

- coalesced transfers - for instance, if you need to read tracks N, N+3
  and N+7, it's sometimes faster to read tracks N through N+7 in one
  transfer and sort things out in memory.

- single-read-per-spindle-per-transaction - split heavily accessed
  files over N spindles, mapping logical record M to disk (M mod N),
  physical record (M div N), such that on average only one disk seek
  needs to be made per transaction (in parallel, of course).  This is
  worthwhile when the transactions are well defined.

This task became considerably more difficult when DEC introduced the
HSC-50 super-smart caching disk controller for the VAXcluster and the
RA-style disks:

1) It was impossible to know the PHYSICAL location of a disk block,
   due to dynamic, transparent bad-block revectoring and the lack of
   on-line information about the disk geometry.  We placed the files
   carefully on the disk so that they started on a cylinder boundary,
   adjacent to other files, and assumed that they were "one
   dimensional."

2) Some of the optimizations (seek ordering and command ordering) were
   done in the HSC itself, so we didn't do them on HSC disks.

3) HSC volume shadowing made the optimizations in our home-grown
   shadowing obsolete.
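[The single-read-per-spindle mapping above can be sketched like this,
assuming the intended mapping is logical record M -> spindle (M mod N),
physical record (M div N); N = 4 spindles is an arbitrary choice for
illustration.]

```python
# Sketch of TMX-32-style record striping: logical record M lands on
# spindle (M mod N) at physical record (M div N), so a transaction
# touching a run of consecutive records hits N spindles in parallel,
# one seek each.

def place(logical_record, n_spindles):
    """Map a logical record number to (spindle, physical_record)."""
    return (logical_record % n_spindles, logical_record // n_spindles)

N = 4
for m in range(8):
    print(m, place(m, N))
# Records 0..3 land on spindles 0..3 at physical record 0;
# records 4..7 land on spindles 0..3 at physical record 1.
```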
We kept our shadowing for use in non-HSC environments, like uVAXes and
locally connected disks, and because it was per-file based, not
per-volume.

Using these techniques, I ran the million-customer TP benchmark at 76
TPS on a VAX 8600 (~4 MIPS).  I don't remember the $/TPS (of course),
but it might have been pretty high because there were a LOT of disk
drives.  We might have eked out a few more TPS if we had physical
control over the placement of the disk blocks, but probably not more
than a few.  I also felt that I never knew what the disk was 'really
doing' because so much was hidden in the HSC; being the computer
programmer that I am, I wanted to know where each head was at each
millisecond :->.

(The 76 TPS bottleneck was the mirrored journal disk: although it was
written sequentially, it was still necessary to write to it at the
close of each transaction.  The next step would have been to allow
multiple journal files, but since the runner-up was at about 30 TPS,
we never got around to it :->.)

As an aside, for you HSC fans building this kind of stuff: it is
possible that large write I/Os to an HSC-served disk will be broken up
into multiple physical I/O operations to the disk.  This means that if
you are just checking headers and trailers for transaction checkpoint
consistency, you may have bogus stuff in the middle with perfectly
valid header and trailer information if the HSC crashed during the I/O.

- bob

+-------------------------+------------------------------+--------------------+
! Bob Pasker              ! Relational Technology        !                    !
! pasker@rtech.com        ! 1080 Marina Village Parkway  ! INGRES/Net         !
! <use this address>      ! Alameda, California 94501    !                    !
! <replies will fail>     ! (415) 748-2434               !                    !
+-------------------------+------------------------------+--------------------+
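[The coalesced-transfer tradeoff from the TMX-32 list can be roughed
out numerically.  The per-seek and per-track timing constants below are
invented illustrative numbers, not measurements of any real drive.]

```python
# Rough model of coalesced transfers: is it faster to issue three
# separate reads for tracks N, N+3 and N+7, or one sweep over tracks
# N..N+7 that discards the unwanted tracks in memory?  Timing constants
# are assumed values for illustration only.

SEEK_MS = 20.0        # assumed average seek + rotational delay
TRACK_READ_MS = 5.0   # assumed time to transfer one track

def separate_reads_ms(tracks):
    """Each wanted track pays its own seek plus one track transfer."""
    return len(tracks) * (SEEK_MS + TRACK_READ_MS)

def coalesced_read_ms(tracks):
    """One seek, then transfer the whole span covering the wanted tracks."""
    span = max(tracks) - min(tracks) + 1
    return SEEK_MS + span * TRACK_READ_MS

wanted = [0, 3, 7]
print(separate_reads_ms(wanted))  # -> 75.0 ms
print(coalesced_read_ms(wanted))  # -> 60.0 ms: the single big read wins
```

With these (assumed) numbers the coalesced read wins whenever the extra
tracks transferred cost less than the seeks they save, which is the
"sometimes faster" Bob describes.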