rayan@utegc.UUCP (09/07/87)
With the faster CPUs we've been seeing lately, we have noticed that too
many processes spend their wall clock time in disk waits. The disk i/o
subsystem indeed seems to be one of the top 2 bottlenecks (along with
memory starvation) for our systems. We've been looking for something better
than our trusty Xylogics 451s, which have been the default controllers
around here until now. To our knowledge, there are three companies in the
market with high-performance controllers that are/will be used in Suns,
namely Interphase, Ciprico, and Xylogics. The contending products are:
Interphase: 4200 "Cheetah"
Ciprico: 3200 "Rimfire"
Xylogics: 75* Sun default
At this time, I can release my evaluation of two of these boards.
What follows are my impressions of the products, along with a few numbers
for people who like to see the mundane stuff.
The boards compared are the Cheetah and the Rimfire. In summary, the
Cheetah is very fast on flat-out reads (at 2MB/sec, it approaches
the theoretical limit of 2.4MB/sec for this disk, especially considering
sector control data). The Cheetah firmware fell flat on its face when asked
to do I *and* O interleaved, and degraded to about Xy451 performance levels.
In comparison, the Rimfire did 1.8MB/sec on a flat-out read, but did much
better on mixed I/O. Based on this gross error in the Cheetah, and the
results of the benchmarks, I'd give the current overall performance edge to
the Ciprico board.
Here are my recollections of the products:
=====
Cheetah:
No problem with installation once I figured out what the bits meant...
This controller has a control byte which is apparently stored in the
sector header of each sector upon a format. The bits include flags like
Runt Sector Enable, Spare Sector Enable (sector slipping), Cache
Enable, STatus Change, Retry, etc. The status change bit is supposed to
send reports when the disk goes offline/online; unfortunately, it
doesn't work too well: almost any kind of disk access with this bit
turned on would crash the driver we got for the Cheetah. I figured out
the appropriate bits (RSE,SSE,CE,...) and started formatting the disk.
Disk maintenance is through a diag utility (modified Sun diag). While
the format was going on I read some more in the manual and discovered
that the board does sector slipping and bad track forwarding, but not
bad sector forwarding. If sector slipping is enabled, bad track
forwarding is done after the spare sector on the track has been used
up. Still, I think this is rather wasteful of the spare cylinders at
the end of the disk. In my limited experience, bad spots on the disk
don't usually render much of a track unusable. The format and 5 verify
passes finally finished after 75 minutes (cf. 6 hours for an
Xy451/SuperEagle). During format, the thing does bad track forwarding,
if I recall right. It found 2 bad spots: one around cylinder 8 and one
somewhere in the middle of the h partition. The forwarded tracks didn't
measurably affect the benchmarks, although degraded performance was
observable from diag/read. The disk was formatted using the same
parameters as the other board (5 sector skew, gap1/gap2 = 16/19).
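For concreteness, here is a rough C sketch of the control byte I ended
up with. The flag names are the ones from the manual as described
above, but the bit positions are my guesses for illustration, not
Interphase documentation:

    /*
     * Rough sketch of the Cheetah per-sector control byte.  The flag
     * names are from the manual; the bit positions are assumed, not
     * Interphase's actual layout.
     */
    #define RSE 0x01    /* Runt Sector Enable (assumed bit) */
    #define SSE 0x02    /* Spare Sector Enable, sector slipping (assumed bit) */
    #define CE  0x04    /* Cache Enable (assumed bit) */
    #define STC 0x08    /* STatus Change reporting (assumed bit) */
    #define RTY 0x10    /* Retry (assumed bit) */

    /* The combination used for the format described above: STC left
     * off, since any access with it on crashed the driver we had. */
    unsigned char cheetah_ctl = RSE | SSE | CE;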
Boy this controller screams! If you're into doing image copies of your
disk, this thing is Real Quick. I was therefore rather surprised when I
tried some mixed I/O and saw performance no better than an Xy451. We
called up Interphase to make sure our configuration and setup was
correct. It seems that if one gets 2 MB/sec reads, one's configuration
is correct. The problem seems to be silly firmware that flushes the
read cache on every write. The engineer we spoke to said they were
working actively with Sun on new firmware that would fix this ("improve
performance 30%"), unfortunately at this time the new PROMs aren't yet
available. Incidentally, we got a note from Interphase mentioning an
upgrade program for people with old firmware, so they do keep their
customers in mind. Most benchmarks reinforced this initial impression
(great at reading, lousy at reading *and* writing). One more thing I
noted was that performance seems to degrade excessively when one
accesses different partitions in e.g. a disk-to-disk copy or with
parallel activity on two filesystems. Since it happens in this latter
case too, I doubt it has much to do with the cache algorithm syndrome.
Addendum: we got the new firmware PROMs but couldn't get the disk
formatted properly. The Interphase engineer mentioned something about
the Runt sector possibly being too small due to a bug in the
experimental PROMs we got. Eventually we decided things (the entire
evaluation period for all boards) had dragged on long enough and we
sent the board back. I have heard they are now shipping 'final
production' 4200's. Even if the performance on some of the tests was
upped 30%, it wouldn't make much overall difference.
=====
Rimfire:
This one was real easy to set up the first time around, because one of
Ciprico's globetrotting engineers flew up to install the board and make
sure everything was ok. That first time, things went by in a bit of a
blur, but I then had to duplicate the installation on my own later.
Downtime is minimal when installing a Rimfire, because all disk
maintenance is done online with a special utility program. This is a
full-screen maintenance program from which one can do all the usual
diag stuff; in addition it has some cache twiddling and reporting
functions. To set up the disk, I chose a disk template from a list of a
score or so that included the popular Fujitsu, NEC, CDC and Toshiba
disks, and some miscellany. I then modified the format parameters
appropriately (interleave on, headskew=5, cylskew=21, hgskew=0,
recovery=0, idpre=16, datapre=19) to values recommended by the Ciprico
engineer. The idpre and datapre parameters are apparently the
synchronization gap sizes (a.k.a. gap1/gap2). The parameters not
present on the Cheetah are cylinder skew and 'hg' skew, and the
recovery options on the Rimfire are slightly better. This setup
information was then passed to the driver, and I could start
formatting. Nine (9) (sic!) minutes later, the format was done. The
five verify passes took another 30-odd minutes, but who is
complaining... When the Ciprico
fellow did this in front of me, everything worked fine. When I did it
by myself, the verify went through 7 cylinders and then happily stopped
as if it was all done. I tracked it down to a compiler bug (removing a
'register' in a declaration fixed things) that showed up with the pre-release
'rfutil' we got. There were a few other problems with that version, but a
later and more mature program we received worked beautifully with no bugs
and no surprises. As regards handling bad spots: sector slipping, bad
track mapping, and bad sector mapping are all supported.
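To give an idea of what got handed to the driver during the setup
described above, here is a rough C sketch. The field names mirror the
rfutil parameters; the struct layout itself is my own illustration,
not Ciprico's actual interface:

    /*
     * Illustrative sketch of the Rimfire format parameters described
     * above.  Field names follow the rfutil parameter names; the
     * struct is made up for illustration.
     */
    struct rf_format {
        int interleave;     /* 1 == interleaving on */
        int headskew;       /* sector skew between heads */
        int cylskew;        /* sector skew between cylinders */
        int hgskew;         /* head-group skew */
        int recovery;       /* error recovery options */
        int idpre;          /* ID preamble length (a.k.a. gap1) */
        int datapre;        /* data preamble length (a.k.a. gap2) */
    };

    /* Values recommended by the Ciprico engineer for this disk: */
    struct rf_format fmt_2344 = { 1, 5, 21, 0, 0, 16, 19 };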
The caching algorithm and parameters are specified by the driver at
disk access time, and the driver differentiates between three kinds of
accesses: raw disk I/O, file system I/O in multiples of the 4k block
size, and file system I/O not in 4k multiples. The caching scheme,
read-ahead size, and retry options are specified separately for each
kind of access. Read-ahead, when enabled, can be up to 244 sectors.
The control bits for caching are: search cache (SEA), save read data
in cache (CRD), read ahead priority (RAP), save write data in cache
(CWT), sort reads (SRD), sort writes (SWT), read ahead across head
(XHD), and read ahead across cylinder (XCY). The boot-time defaults
are hardwired into the kernel as an array in a header file; unless
otherwise mentioned, they are SEA|XHD for filesystem I/O and
SEA|XHD|XCY for raw I/O, both with 100-sector read ahead. The
parameters can be modified online, and cache statistics can be
examined (the best we got was a 99.97% hit rate on a long raw disk
read).
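For the curious, that kernel table might look roughly like the
following C sketch. The flag names are as above; the octal values are
inferred from the 301-10/101-100 shorthand I use later in this
article, and the struct layout is my illustration rather than
Ciprico's actual header:

    /*
     * Sketch of a per-access-type cache parameter table.  Flag names
     * are from the text; octal values are inferred (301 octal ==
     * SEA|XHD|XCY), not taken from Ciprico documentation.
     */
    #define SEA 0001    /* search cache */
    #define CRD 0002    /* save read data in cache (assumed value) */
    #define RAP 0004    /* read ahead priority (assumed value) */
    #define CWT 0010    /* save write data in cache (assumed value) */
    #define SRD 0020    /* sort reads (assumed value) */
    #define SWT 0040    /* sort writes (assumed value) */
    #define XHD 0100    /* read ahead across head */
    #define XCY 0200    /* read ahead across cylinder */

    struct rf_cache_params {
        int flags;      /* cache control bits, above */
        int readahead;  /* read-ahead, up to 244 sectors */
    };

    /* Boot-time defaults as described above: */
    struct rf_cache_params rf_cache_defaults[3] = {
        { SEA|XHD|XCY, 100 },   /* raw disk I/O */
        { SEA|XHD,     100 },   /* fs I/O, multiple of 4k */
        { SEA|XHD,     100 },   /* fs I/O, not a 4k multiple */
    };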
The Rimfire is slightly slower than the Cheetah in terms of maximum
throughput, coming in at 1.8MB/sec max. read speed. It has much better
behavior when doing mixed reads/writes (or rather, it has 'expected'
behavior whereas the Cheetah had broken firmware). In my tests it
degraded linearly when kept busy on different parts of the disk.
Essentially, its behavior is consistent with expectations of an
across-the-board improvement over an Xy451.
A pastime with this controller is fiddling with the cache control bits
to see what effect they have on things.
It turned out that several of the cache parameters were quite expensive
if you didn't actually need them. I found readahead a good thing on
file i/o, and pretty much irrelevant on long raw i/o. Since fsck is
short raw i/o, I ended up with the parameters 301-10/101-100 as pretty
much the optimal ones for my purposes (301-10 == XHD,XCY,SEA, 10
sectors readahead for raw i/o; 101-100 == XHD,SEA, 100 sectors
readahead for file i/o). There is of course overhead in the firmware,
depending on
the extra work you want the controller to do by setting the cache
options. One of my tuning experiments was with more-or-less-random-access
fsck, where the time taken ranged from 3/4 of the Cheetah's to twice it,
depending on the cache settings.
Note that the parameters that are optimal for fsck may be lousy for dd,
and vice versa (adaptive algorithms would be nice). Turning on
fancy options, for example cross-cylinder readahead or sorts for file i/o,
may be a mistake. After speaking with some engineers from various companies,
I get the impression the Ciprico board is the only one around that does NOT
flush the read cache on any write operation. To do this they do some data
structure manipulation in firmware, which is not easy in a hardware cache
design. I don't design these things for a living, but invalidating the
cache on every write operation is a *bad* thing to do.
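To make the 301-10/101-100 shorthand concrete, here is a trivial C
fragment that decodes my two settings. Note again that the octal bit
values for SEA, XHD and XCY are my inference from the 301/101
notation, not taken from Ciprico documentation:

    #include <stdio.h>

    #define SEA 0001    /* search cache */
    #define XHD 0100    /* read ahead across head */
    #define XCY 0200    /* read ahead across cylinder */

    /* Print a "flags-readahead" spec, e.g. 301-10, in symbolic form. */
    static void decode(int flags, int readahead)
    {
        printf("%03o-%d ==%s%s%s, %d sectors readahead\n",
            flags, readahead,
            (flags & SEA) ? " SEA" : "",
            (flags & XHD) ? " XHD" : "",
            (flags & XCY) ? " XCY" : "",
            readahead);
    }

    int main(void)
    {
        decode(0301, 10);       /* my raw i/o setting */
        decode(0101, 100);      /* my file i/o setting */
        return 0;
    }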
=====
Comparison:
The impression I get is that the Ciprico is about 1.9-2.0 times an
Xy451 on most things, across the board. The Cheetah is 2.1-2.2 times an
Xy451 on a certain set of activities, and degrades to 1.0-1.1 times an
Xy451 on heavy read/write I/O mixes. The software with the Ciprico is
nicer and gives finer control than the Interphase (having a Cache
Enable bit and nothing else doesn't give much choice). However, one
uses the setup program once, when the disk is installed, and then
forgets about it. Online maintenance is nice, of course. Improving the
Interphase by 30% or so, as promised by the new firmware, is not
likely to change the picture much apart from making the Interphase
more palatable. Setting up the disk and tuning the cache properly is
comparatively "involved" on the Rimfire, whereas the Cheetah is a
turn-key kind of product as far as configuration goes.
We have firsthand experience with another board, but are still under
a non-disclosure agreement.
In a nutshell, if you have a readonly disk, the Cheetah is for you;
otherwise, look into the Rimfire board. We're buying and recommending
Rimfires for the foreseeable future.
All companies we have dealt with have been very professional and attentive
to problems. We have the most (and best) experience with Ciprico (since they sent
someone up here to babysit the installation and listen to reactions/concerns).
Finally, neither I personally, nor the Computer Systems Research Institute
through which the evaluations were arranged, nor anyone in it, has any
connection with any of the companies mentioned except as potential (and in
some cases present) customers. Use the information in this message at your
own risk. You may redistribute as you see fit. (Of course, if you do cause
us to get some better service by referring to the UofToronto name and this
test here and there, we won't complain :-)
rayan
Rayan Zachariassen, AI group, University of Toronto
(rayan@ai.toronto.edu)
=====
Data:
First, the test configuration:
A Sun 3/180S (normally configured with two Eagles on an xy451) with a
Swallow-II (Fujitsu 2344 -- great disk btw, cutest thing you ever saw!)
on loan for the duration of the test. All timing results are the minima
of several runs in multiuser on an idle system (normal daemons running)
at -20 priority.
Interphase 4200 Cheetah (firmware revision X0E).
Ciprico 3200 Rimfire (firmware rev 14 of 87/04/13, engin. rev level 38)
Disk layout is illustrated by this partial dkinfo output (the 2344 was set
up for 69 sectors per track -- that's 67 data, 1 spare, 1 runt):
620 cylinders 27 heads 67 sectors/track
a: 18090 sectors (10 cyls)
        starting cylinder 0
b: 36180 sectors (20 cyls)
        starting cylinder 10
c: 1121580 sectors (620 cyls)
        starting cylinder 0
d: 280395 sectors (155 cyls)
        starting cylinder 30
f: 280395 sectors (155 cyls)
        starting cylinder 185
h: 506520 sectors (280 cyls)
        starting cylinder 340
The filesystems used for file I/O tests were set up with 'tunefs -a 40 -d 0'.
I mounted d, f, and h, on /u1, /u2, and /u3 respectively.
The 'bigfile' mentioned was approx. 8.5 megabytes (12 * /vmunix).
All dd file I/O was done on recently newfs'ed filesystems.
/tmp and /local are on the Xy451 of the Sun, on an Eagle (2351).
Throughput figures are in kilobytes per second, based on real time of course.
1 cylinder == 1809 sectors (27 heads * 67 data sectors/track).
NB! These measurements are not scientific or controlled to any degree, nor
claimed as such. They satisfy our purposes. If you have measurements for
other controllers, or can fill in the blanks, I'd be happy to act as a
collection point for such data.
Format                          format     5 verify   total
                                minutes    minutes    minutes
----------------------------------------------------------------------------
Ciprico 3200                    9          27         36
Interphase 4200                 15         60         75
Command                         blocksize
                                8k         1 track    1 cylinder
                                kb/s       kb/s       kb/s

dd if=/dev/rXX0a of=/dev/null
----------------------------------------------------------------------------
Ciprico 3200                    1116       1447       1781
Interphase 4200                 1304       ?          ?

dd if=/dev/rXX0f of=/dev/null
----------------------------------------------------------------------------
Ciprico 3200                    1129       1447       1824
Interphase 4200                 ?          ?          2005

dd if=/u2/bigfile of=/u3/bigfile
----------------------------------------------------------------------------
Ciprico 3200                    737        745        866
Interphase 4200                 260        ?          560

dd if=/u2/bigfile of=/u2/bigfile.new
----------------------------------------------------------------------------
Ciprico 3200                    733        775        889
Interphase 4200                 270        ?          530

dd if=/u2/bigfile of=/dev/null
----------------------------------------------------------------------------
Ciprico 3200                    1216       1212       1191
Interphase 4200                 1435       ?          1536

dd if=/u2/bigfile of=/tmp/bigfile
----------------------------------------------------------------------------
Ciprico 3200                    910        ?          ?
Interphase 4200                 796        ?          ?

dd if=/dev/rXX0a of=/dev/rXX0b
----------------------------------------------------------------------------
Ciprico 3200                    400        990        1447
Interphase 4200                 370        ?          1460
Miscellaneous:
Command                                         blocksize  Cheetah         Rimfire
----------------------------------------------------------------------------
dd if=/u2/bin/gnuemacs of=/dev/null             8k         2.2*Xy451       1.9*Xy451
cd /u2; dump 0f - /local | restore rf -                    1524.2 real<1>  1490.0 real<6>
cd /u1; dump 0f - /u2 | restore rf -                       1782.0 real<2>  1585.2 real<7>
fsck /dev/rXX0f                                            40.0 real<3>    29.1 real<8>
4 parallel fsck's, offset 5 secs each, of ip0f             159.4 real<4>   117.0 real<9>
2 parallel fsck's, of ip0f and ip0d                        100.7 real<5>   57.3 real<10>
footnotes:
1) 1524.2 real 60.7 user 275.4 sys
2) 1782.0 real 58.5 user 270.0 sys
3) 40.0 real 8.0 user 3.3 sys
4) 159.4 real 25.4 user 11.4 sys
5) 100.7 real 8.6 user 3.7 sys
Cheetah dump/restore/fsck test filesystem:
Filesystem kbytes used avail capacity Mounted on
/dev/xy0g 110631 94633 4934 95% /local
/dev/ip0f 137460 95010 28704 77% /u2
6) 1490.0 real 59.2 user 263.1 sys
7) 1585.2 real 60.3 user 258.9 sys
8)                                   Cache bits        Readahead
   48.4 real  7.9 user  3.3 sys     0                 60
   37.0 real  7.0 user  3.3 sys     SEA|CRD           60
   29.1 real  7.8 user  3.3 sys     SEA|CRD           10
   37.0 real  8.0 user  2.4 sys     SEA|CRD|RAP|SRD   60
   79.5 real  8.1 user  2.8 sys     SEA|XHD(|XCY)     60
9) 117.0 real 31.9 user 15.3 sys
10) 57.3 real 8.4 user 3.5 sys
Rimfire dump/restore/fsck test filesystem:
Filesystem kbytes used avail capacity Mounted on
/dev/xy0g 110631 90478 9089 91% /local
/dev/rf0f 137460 90838 32876 73% /u2
As you may have noticed, the test filesystems unfortunately aren't identical.
The table above should therefore really include a 5% or so adjustment in one
column, but this is not significant.

ron@topaz.rutgers.edu (Ron Natalie) (09/07/87)
I have the Ciprico Rim Fire 2220 (Multibus II) controller and it seems
to be pretty nice. One interesting aspect of the driver they provide
is the absence of any I/O queue in the driver. Since you can tag
Multibus II Requests/Completions with an arbitrary number (like the
buffer pointer), they just stuff them all into the controller as they
come in and then just iodone 'em when the completion comes back.
Amusing. Multibus II message passing is very handy for UNIX drivers.
Everything falls into place nicely.

-RON
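A rough C sketch of the queue-less pattern Ron describes; every name
in it (rfstrategy, rfintr, rf_submit, rf_next_completion) is
hypothetical, made up to illustrate the tag-and-iodone idea rather
than taken from Ciprico's actual driver:

    /*
     * Queue-less driver sketch: each request is tagged with its own
     * buf pointer and handed straight to the controller; the
     * completion interrupt hands it back.
     */
    struct buf;                             /* kernel buffer header */
    extern void iodone(struct buf *);
    extern void rf_submit(struct buf *, long);      /* hypothetical */
    extern long rf_next_completion(void);           /* hypothetical */

    void rfstrategy(struct buf *bp)
    {
        /* No driver queue: stuff the request into the controller,
         * using the buffer pointer itself as the completion tag. */
        rf_submit(bp, (long)bp);
    }

    void rfintr(void)
    {
        /* The controller returns the tag; just iodone the buffer. */
        struct buf *bp = (struct buf *)rf_next_completion();
        iodone(bp);
    }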
dan@rna.UUCP (Dan Ts'o) (09/08/87)
In article <8709062258.AA22818@ephemeral.ai.toronto.edu> rayan@utegc.UUCP writes:
>With the faster CPUs we've been seeing lately, we have noticed that too
>many processes spend their wall clock time in disk waits. [...]
>Based on this gross error in the Cheetah, and the
>results of the benchmarks, I'd give the current overall performance edge to
>the Ciprico board.

I went through a similar decision cycle for the Sun several months
ago. For better or worse, we decided on the Xylogics 752. Well, we've
had to wait a long time, something I might not have done, had I known.
But we just did get a 752 with the Sun driver and boot. Actually, the
hardware has been ready for at least 3 months, but they contracted Sun
consulting to do the software, and it has been slow in coming. In any
case, the installation has been nearly flawless (a very minor glitch
occurred, nothing substantial), and we have been running for a little
while. I'll try to digest your long posting to see if I can arrive at
comparable stats for the 752. If you have any canned shell scripts or
distilled benchmarks, I'd be happy to run them. Anyways, I think the
Xylogics (now) is very much a contender in the VME disk controller
race. Certainly Sun decided on them. We almost went with Interphase,
but the promise of continued Sun compatibility persuaded us to wait
(what we thought would be just a short while).

Cheers,
				Dan Ts'o
				Dept. Neurobiology	212-570-7671
				Rockefeller Univ.	...cmcl2!rna!dan
				1230 York Ave.		rna!dan@nyu.arpa
				NY, NY 10021