[comp.arch] Disk rotational speed vs. striping vs. parallel heads

earl@mips.UUCP (Earl Killian) (08/16/87)

In article <653@ima.ISC.COM>, johnl@ima.ISC.COM (John R. Levine) writes:

> In article <5557@prls.UUCP> weaver@prls.UUCP (Michael Gordon Weaver) writes:
> > If higher transfer rates were required by a large part of the
> > market, the rotation speed of the drives could be increased from
> > the current typical 3600 rpm to say 10,000 rpm. This would be
> > expensive, but much cheaper than using a vacuum.

> If your computer had the I/O speed, there are all sorts of
> straightforward tricks to speed up the disk data rate. For example,
> you could run all of the disk heads at once so that a disk cylinder
> rather than appearing as N tracks each M bytes long looks like one
> track N*M bytes long, but with the same time to read or write a
> track.

I was wondering when this would come up.  Such drives are made.  For
example, I believe I've seen an ad for a parallel head version of the
nifty little Fujitsu 8in drives.  Unfortunately, it was a lot more
expensive than the regular version.  I can't imagine why, other than
that it is a specialty item not produced in high volume.  Instead of
transferring at 2.4 Mbyte/s, it probably transfers at something like
14.4 Mbyte/s (I'm guessing that it has 6 surfaces).  I don't know what
they did to the SMD interface to make it work at these speeds.

The problem is that with today's software, this may not make much
difference.  Suppose you do disk i/o in 8K byte chunks.  At 2.4
Mbyte/s it takes 3.4ms to transfer your chunk.  The average seek time
is 20ms, so shrinking the transfer time to .56ms saves you very
little.  To take advantage of these drives, you need software (either
in the os, or in the disk controller) that reads a cylinder at a time.
Since some disk controllers do read and cache a full track at a time,
this is not implausible.  Doing caching in the controller also limits
the need for high bandwidth i/o buses, since only the controller would
see the 14.4 Mbyte/s; transfers of individual blocks from its cache
would go at the maximum i/o bus rate, which is probably less than 14.4
Mbyte/s with today's i/o buses.
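
To put numbers on that: a quick C sketch of the arithmetic, using the
figures assumed above (8K byte chunks, 20ms average seek, with any
rotational latency lumped into the seek figure):

    #include <stdio.h>

    int main(void)
    {
        double chunk = 8192.0;               /* bytes per i/o */
        double seek = 0.020;                 /* average seek, seconds */
        double rates[] = { 2.4e6, 14.4e6 };  /* bytes/second */
        int i;

        for (i = 0; i < 2; i++) {
            double xfer = chunk / rates[i];
            printf("%4.1f Mbyte/s: transfer %.2f ms, seek+transfer %.2f ms\n",
                   rates[i] / 1e6, xfer * 1e3, (seek + xfer) * 1e3);
        }
        return 0;
    }

The total time per chunk only drops from 23.4ms to 20.6ms, which is the
point: the seek dominates.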

To get back to the issue that started this all, it seems to me that there
are three reasons to do disk striping.  (1) you're already using
parallel head transfers and you need still more bandwidth for a single
task.  (2) you need more bandwidth for a single task and parallel head
drives are too expensive.  (3) you can't use parallel head transfers
because your i/o bus can't hack the bandwidth.  Striping can increase
the disk thruput within certain limits without increasing the peak
bandwidth requirement for randomly scattered (i.e. non-contiguous)
files.

To see (3), consider reading a file on (a) one drive, and (b) many
drives.

(a) The thruput is blocksize / (seek + rotate + blocksize / trate).
The peak transfer rate is trate.  If blocksize / trate << seek +
rotate, then this is seek time limited.

(b) You simultaneously seek on all the drives, and transfer blocks
one at a time.  The peak transfer rate is still trate, because you serialize
the actual transfers.  The thruput is
	N * blocksize / (seek + rotate + queue + blocksize / trate)
where queue is the time waiting to serialize.  If N * blocksize / trate
<< seek + rotate, then the queuing delay is small and can be ignored,
so you've achieved N times the thruput.  (My queuing theory text
doesn't have a formula for an M/D/1 system with finite population, and
I'm not going to grind through the Markov model to figure it out, so I'll
leave it as "small" if <<.)
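
A small C sketch of that expression with the queue term dropped (same
assumed parameters as before: 8K blocks, 20ms seek+rotate, 2.4 Mbyte/s
per drive; the comment marks where the approximation breaks down):

    #include <stdio.h>

    int main(void)
    {
        double blk = 8192.0;   /* bytes per block */
        double seek = 0.020;   /* seek + rotate, seconds */
        double trate = 2.4e6;  /* bytes/second per drive */
        int n;

        for (n = 1; n <= 8; n *= 2) {
            /* queue term dropped: only valid while the n serialized
               transfers still fit inside the seek + rotate window */
            double thruput = n * blk / (seek + blk / trate);
            printf("%d drives: %5.0f Kbyte/s (%4.1f ms transfer per %4.1f ms window)\n",
                   n, thruput / 1e3, n * blk / trate * 1e3, seek * 1e3);
        }
        return 0;
    }

By n = 8 the serialized transfers (27.3ms) no longer hide inside the
20ms window, so the queue term is no longer small and the printed
thruput is optimistic.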

This is of course the effect of overlapped seeks in a time sharing
system made to work for a single task.  However, you'd probably have
been better off simply using contiguous allocation instead of
striping, to get blocksize / trate > seek + rotate.  For example,
going from 8K byte to 64K byte chunks raises your overall thruput from
350 Kbyte/s to 1385 Kbyte/s, which is almost a 4x improvement.  With
64K byte transfers the disk is now spending 58% of the time actually
doing i/o instead of just 15%.  Maybe fragmentation makes striping
attractive compared to contiguous allocation?
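
The 8K-versus-64K figures above fall straight out of the thruput formula;
here's a C check, again treating the 20ms as seek plus rotate:

    #include <stdio.h>

    int main(void)
    {
        double seek = 0.020;   /* average seek (+ rotate), seconds */
        double trate = 2.4e6;  /* bytes/second */
        double sizes[] = { 8192.0, 65536.0 };
        int i;

        for (i = 0; i < 2; i++) {
            double xfer = sizes[i] / trate;
            printf("%2.0fK chunks: %4.0f Kbyte/s, disk busy %2.0f%% of the time\n",
                   sizes[i] / 1024, sizes[i] / (seek + xfer) / 1e3,
                   100 * xfer / (seek + xfer));
        }
        return 0;
    }

which prints 350 Kbyte/s at 15% busy and 1385 Kbyte/s at 58% busy.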

If you allow simultaneous transfers in addition to simultaneous seeks,
then striping's peak bandwidth requirement is no longer the transfer
rate of a single disk, but N times that (but the queuing delay goes
away).  No advantage over a parallel head disk then.

Anyway, I guess this is a long-winded way of saying striping looks
interesting once you've gotten yourself a parallel head disk, done
contiguous disk allocation, and found you still don't have enough
bandwidth for a single task.  And you've got enough i/o channel
bandwidth to support it all.  And I guess that's exactly where some of
the small memory supercomputers are these days.

brian@casemo.UUCP (Brian Cuthie ) (08/18/87)

In article <600@gumby.UUCP>, earl@mips.UUCP (Earl Killian) writes:
> ... I believe I've seen an ad for a parallel head version of the
> nifty little Fujitsu 8in drives.  Unfortunately, it was a lot more
> expensive than the regular version.  I can't imagine why, other than
> that it is a specialty item not produced in high volume.  Instead of
> transferring at 2.4 Mbyte/s, it probably transfers at something like
> 14.4 Mbyte/s (I'm guessing that it has 6 surfaces).  I don't know what
> they did to the SMD interface to make it work at these speeds.
> 

More than likely it has six times the drive electronics.  That is, it
requires five more read/write head amplifiers and such.  In fact I would
bet good money that such a drive has six separate read/write data lines
between the SMD controller (not part of the drive) and the drive.  Thus
the use of such a beast requires a very expensive disk controller as well.

> The problem is that with today's software, this may not make much
> difference.  Suppose you do disk i/o in 8K byte chunks.  At 2.4
> Mbyte/s it takes 3.4ms to transfer your chunk.  The average seek time
> is 20ms, so shrinking the transfer time to .56ms saves you very
> little.  To take advantage of these drives, you need software (either

This is very untrue in a multi-drive environment.  Most systems have
several disk drives under the direction of one disk controller.  This
allows the overlapping of seeks, so that while one drive is seeking, data
may be transferred to a drive that has already completed its seek.  This
method of disk throughput enhancement is quite common in all but the
smallest, cheapest systems.  The faster a drive can transfer data, the
more drives you can put on one controller with no speed degradation.
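
A back-of-envelope C sketch of that last point, using the 8K blocks and
20ms seeks assumed earlier in the thread: a drive only needs the
controller for its transfer time, so roughly (seek + transfer) / transfer
drives can share one controller before they start colliding.

    #include <stdio.h>

    int main(void)
    {
        double blk = 8192.0;   /* bytes per transfer */
        double seek = 0.020;   /* average seek, seconds */
        double rates[] = { 2.4e6, 14.4e6 };
        int i;

        for (i = 0; i < 2; i++) {
            double xfer = blk / rates[i];
            printf("%4.1f Mbyte/s: about %2.0f drives per controller\n",
                   rates[i] / 1e6, (seek + xfer) / xfer);
        }
        return 0;
    }

That comes out to about 7 drives at 2.4 Mbyte/s and 36 at 14.4 Mbyte/s.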

> in the os, or in the disk controller) that reads a cylinder at a time.
> Since some disk controllers do read and cache a full track at a time,
> this is not implausible.  Doing caching in the controller also limits
> the need for high bandwidth i/o buses, since only the controller would
> see the 14.4 Mbyte/s; transfers of individual blocks from its cache
> would go at the maximum i/o bus rate, which is probably less than 14.4
> Mbyte/s with today's i/o buses.

This buys you NOTHING.  The real problem is the i/o bottleneck.  The faster
data can be transferred for task A, the less time task B has to wait for
its data to be transferred.

> 
> To get back to the issue that started this all, it seems to me that there
> are three reasons to do disk striping.  (1) you're already using
> parallel head transfers and you need still more bandwidth for a single
> task.  (2) you need more bandwidth for a single task and parallel head
> drives are too expensive.  (3) you can't use parallel head transfers
> because your i/o bus can't hack the bandwidth.  Striping can increase
> the disk thruput within certain limits without increasing the peak
> bandwidth requirement for randomly scattered (i.e. non-contiguous)
> files.
> 

In all the cases I've seen disk striping was not used to increase mbytes/sec
throughput at all.  Rather, striping reduces the number of seeks necessary
to access a given chunk of info.  In effect disk striping gives you a disk
that looks to have #drives*TRACKS/CYL tracks per cylinder.  This is the
*real* advantage to striping.  This technique may be totally implemented
in software (in the driver).  Any form of parallel data transfer requires
duplicate read/write electronics for each drive.  Few systems can afford
such exotic hardware.
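
A minimal C sketch of the kind of driver mapping Brian describes, with
made-up geometry (4 drives, 6 tracks per cylinder); the logical cylinder
looks 4 times deeper, and every logical track in it is reachable without
another seek once all the drives are on-cylinder:

    #include <stdio.h>

    #define NDRIVES 4          /* drives in the stripe set (assumed) */
    #define TRACKS_PER_CYL 6   /* tracks per cylinder per drive (assumed) */

    /* spread logical tracks round-robin across the drives */
    void map_track(int ltrack, int *drive, int *ptrack)
    {
        *drive  = ltrack % NDRIVES;
        *ptrack = ltrack / NDRIVES;
    }

    int main(void)
    {
        int lt, d, t;

        for (lt = 0; lt < NDRIVES * TRACKS_PER_CYL; lt++) {
            map_track(lt, &d, &t);
            printf("logical track %2d -> drive %d, track %d\n", lt, d, t);
        }
        return 0;
    }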


Cheers,
-Brian

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Brian Cuthie
CASE Communications
Columbia, Md. 21046
(301) 290 - 7443

UUCP:	...seismo!mimsy!aplcen!casemo!brian

earl@mips.UUCP (Earl Killian) (08/20/87)

In article <219@casemo.UUCP>, brian@casemo.UUCP (Brian Cuthie) writes:

> More than likely it has six times the drive electronics.  That is, it
> requires five more read/write head amplifiers and such.  In fact I would
> bet good money that such a drive has six separate read/write data lines
> between the SMD controller (not part of the drive) and the drive.  Thus
> the use of such a beast requires a very expensive disk controller as well.

I knew you needed multiple head amplifiers, but I didn't think head
amplifiers were the big cost item in a disk drive (I am a little naive
on these matters).  Are they really costly?  And six separate
read/write data lines to the controller don't sound like big ticket
items either.  (The extra rams required in the controller for
bandwidth and buffering probably add more to the cost.)  Yes, the
controller is probably expensive, but I suspect that's not inherent,
but rather an artifact of not being the volume product.  With the rate
that cpu performance is increasing, it may soon be that such
drives/controllers become the norm.  Let's hope so.  Can someone tell
us where the $ go in drives and controllers?

> > The problem is that with today's [transfer sizes, parallel head
> > disks] may not make much difference.
> This is very untrue in a multi-drive environment.  Most systems have
> several disk drives under the direction of one disk controller.  This
> allows the overlapping of seeks so that while a drive is seeking...

Yes, but the original comment was in the context of speeding up i/o
for a single application, which I thought to be the motivation behind
striping.  Is this incorrect?  Is striping supposed to speed up
multiprogramming i/o in some way?  The message you replied to even
said the point of striping appears to be to garner some of the
well-known multiprogramming advantages you refer to above for a single
application.

> In all the cases I've seen disk striping was not used to increase mbytes/sec
> throughput at all.  Rather, striping reduces the number of seeks necessary
> to access a given chunk of info.  In effect disk striping gives you a disk
> that looks to have #drives*TRACKS/CYL tracks per cylinder.  This is the
> *real* advantage to striping.

I think we're talking about two different mbyte/s here.  You seem to
be talking about mbyte/s during transfer.  I was talking about
effective data rates, which include the seek time.  The whole idea of
reducing the number of seeks and thus the average seek time is to make
disk i/o faster, is it not?  I.e. to increase the NET mbyte/s rate
seen by the application.

brian@casemo.UUCP (Brian Cuthie ) (08/24/87)

In article <611@gumby.UUCP>, earl@mips.UUCP (Earl Killian) writes:
> In article <219@casemo.UUCP>, brian@casemo.UUCP (Brian Cuthie) writes:
> 
> > More than likely it has six times the drive electronics.  That is, it
...	
> I knew you needed multiple head amplifiers, but I didn't think head
> amplifiers were the big cost item in a disk drive (I am a little naive
> on these matters).  Are they really costly?  And six separate

Yes, they are costly.  Also, because of the way data is written to most
drives (MFM), the data must be processed by a separate SERDES (SERializer/
DESerializer) for *each* channel in the controller.  This is a substantial
part of any disk controller and costs $$$ to duplicate.

> > > The problem is that with today's [transfer sizes, parallel head
> > > disks] may not make much difference.
> > This is very untrue in a multi-drive environment.  Most systems have
> > several disk drives under the direction of one disk controller.  This
> > allows the overlapping of seeks so that while a drive is seeking...
> 
> Yes, but the original comment was in the context of speeding up i/o
> for a single application, which I thought to be the motivation behind


I didn't realize we were talking about a single threaded machine.  I guess
it makes less of a difference when one task has to wait for the disks anyway.

> 
> > In all the cases I've seen disk striping was not used to increase mbytes/sec
> > throughput at all.  Rather, striping reduces the number of seeks necessary
> > to access a given chunk of info.  In effect disk striping gives you a disk
> > that looks to have #drives*TRACKS/CYL tracks per cylinder.  This is the
> > *real* advantage to striping.
> 
> I think we're talking about two different mbyte/s here.  You seem to
> be talking about mbyte/s during transfer.  I was talking about
> effective data rates, which includes the seek time.  The whole idea to
> reducing the number of seeks and thus the average seek time is to make
> disk i/o faster, is it not?  I.e. to increase the NET mbyte/s rate
> seen by the application.


Sorry again.  I misread the context of the statement.  I spent several
years designing disk controllers and just assumed we were talking about
bus bandwidth.  I often forget that most people are actually interested
in throughput.  Thus two different Mbytes/s.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Brian Cuthie
CASE Communications
Columbia, Md. 21046

(301) 290 - 7443

UUCP:	...seismo!mimsy!aplcen!casemo!brian

lamaster@pioneer.arpa (Hugh LaMaster) (08/26/87)

In article <222@casemo.UUCP> brian@casemo.UUCP (Brian Cuthie ) writes:

>I didn't realize we were talking about a single threaded machine.  I guess
>it makes less of a difference when one task has to wait for the disks anyway.
>

>
>Sorry again.  I misread the context of the statement.  I spent several
>years designing disk controllers and just assumed we were talking about
>bus bandwidth.  I often forget that most people are actually interested
>in throughput.  Thus two different Mbytes/s.

Big Iron users (Cray, Cyber 205, etc.) need speed.  You can multiprogram
up to a point (4 to 6 tasks in memory per processor, for example), but
beyond that point disk "throughput" doesn't help, because Big Iron jobs
usually contain some very large memory tasks that won't fit in memory with
each other.  The only way to make these jobs faster is raw speed.  Now, as was pointed
out, there are two kinds of speed:  speed on random accesses, and speed on
sequential accesses.  Fortunately, a lot of big jobs can be structured to
reduce the amount of randomness (and number of seeks).  So, in the big
machine world, the CDC 819 (four parallel heads, 36 Mbit/sec peak speed) was
popular.  The newer drives (CDC, Ibis) run at around 100 Mbit/sec peak speeds.
These drives are sometimes used WITH striping also, either with or without the
help of the operating system.  Because even these speeds are not enough, the
Cray X-MP comes with a solid state "disk" which runs at up to 10 Gigabit/sec
rates.  It is also much, much, much faster at random accesses than a magnetic
disk.  An effective aggregate transfer rate of 200 Mbit/sec is a reasonable
requirement for running a job at 100 MFLOPS.  Some newer machines are expected
to run at actual job speeds of 1 GFLOPS.
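
Hugh's rule of thumb works out to a quarter byte of i/o per flop; a
quick C check, and what the same ratio implies for the 1 GFLOPS machines
he mentions:

    #include <stdio.h>

    int main(void)
    {
        double io = 200e6 / 8;   /* 200 Mbit/sec as bytes/sec */
        double flops = 100e6;    /* 100 MFLOPS */
        double ratio = io / flops;

        printf("%.2f bytes of i/o per flop\n", ratio);
        printf("1 GFLOPS would want %.0f Mbit/sec\n", 1e9 * ratio * 8 / 1e6);
        return 0;
    }

which prints 0.25 bytes per flop and 2000 Mbit/sec.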





  Hugh LaMaster, m/s 233-9,  UUCP {seismo,topaz,lll-crg,ucbvax}!
  NASA Ames Research Center                ames!pioneer!lamaster
  Moffett Field, CA 94035    ARPA lamaster@ames-pioneer.arpa
  Phone:  (415)694-6117      ARPA lamaster@pioneer.arc.nasa.gov


                 "IBM will have it soon"


(Disclaimer: "All opinions solely the author's responsibility")