[comp.periphs] VME disk controller review

rayan@utegc.UUCP (09/07/87)

With the faster CPUs we've been seeing lately, we have noticed that too
many processes spend their wall clock time in disk waits. The disk i/o
subsystem indeed seems to be one of the top 2 bottlenecks (along with
memory starvation) for our systems. We've been looking for something better
than our trusty Xylogics 451s, which have been the default controller
around here until now. To our knowledge, there are three companies in the
market with high-performance controllers that are/will be used in Suns,
namely Interphase, Ciprico, and Xylogics. The contending products are:

Interphase:	4200 "Cheetah"
Ciprico:	3200 "Rimfire"
Xylogics:	75* Sun default

At this time, I can release my evaluation of two of these boards.
What follows are my impressions of the products, along with a few numbers
for people who like to see the mundane stuff.

The boards compared are the Cheetah and the Rimfire. In summary, the
Cheetah is very fast on flat out reads (at 2MB/sec read, it approaches
the theoretical limit of 2.4MB/sec for this disk, especially considering
sector control data). The Cheetah firmware fell flat on its face when asked
to do I *and* O interleaved, and degraded to about Xy451 performance levels.
In comparison, the Rimfire did 1.8MB/sec on a flat out read, but did much
better on mixed I/O. Based on this gross error in the Cheetah, and the
results of the benchmarks, I'd give the current overall performance edge to
the Ciprico board.

Here are my recollections of the products:

=====
Cheetah:

No problem with installation once I figured out what the bits meant...
This controller has a control byte which is apparently stored in each
sector's header when the disk is formatted. The bits include flags like
Runt Sector Enable, Spare Sector Enable (sector slipping), Cache
Enable, Status Change, Retry, etc. The status change bit is supposed to
trigger a report when the disk goes offline or online; unfortunately it
doesn't work too well: almost any kind of disk access with this bit
turned on would crash the driver we got for the Cheetah. I figured out
the appropriate bits (RSE,SSE,CE,...) and started formatting the disk.
Disk maintenance is through a diag utility (modified Sun diag). While
the format was going on I read some more in the manual and discovered
that the board does sector slipping and bad track forwarding, but not
bad sector forwarding.  If sector slipping is enabled, bad track
forwarding is done after the spare sector on the track has been used
up. Still, I think this is rather wasteful of the spare cylinders at
the end of the disk. In my limited experience, bad spots on the disk
don't usually render much of a track unusable.  The format and 5 verify
passes finally finished after 75 minutes (cf. 6 hours for an
xy451/SuperEagle). During format, the thing does bad track forwarding,
if I recall right. There were 2 bad spots: one around cylinder 8 and
one somewhere in the middle of the h partition. The forwarded tracks didn't
measurably affect the benchmarks, although degraded performance was
observable from diag/read. The disk was formatted using the same
parameters as the other board (5 sector skew, gap1/gap2 = 16/19).
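
Since the manual describes the control byte and the bad-spot policy
only in prose, here is a little C sketch of how I picture them; the bit
positions and the helper function are my own invention, and only the
flag names and the slip-then-forward behavior come from the
documentation:

	/* Hypothetical layout of the Cheetah's per-format control byte.
	 * Bit positions are guesses; only the flag names are real. */
	#define RSE	0x01	/* Runt Sector Enable */
	#define SSE	0x02	/* Spare Sector Enable (sector slipping) */
	#define CE	0x04	/* Cache Enable */
	#define SC	0x08	/* Status Change reporting */
	#define RTY	0x10	/* Retry */

	enum bad_spot_action { SLIP_SECTOR, FORWARD_TRACK };

	/* Bad-spot policy as described above: slip to the track's spare
	 * sector first; once it is used up, forward the whole track.
	 * There is no per-sector forwarding on this board. */
	enum bad_spot_action
	handle_bad_spot(unsigned char ctl, int spare_sectors_left)
	{
		if ((ctl & SSE) && spare_sectors_left > 0)
			return SLIP_SECTOR;	/* use the spare on this track */
		return FORWARD_TRACK;	/* burn a track in the spare cylinders */
	}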

Boy this controller screams! If you're into doing image copies of your
disk, this thing is Real Quick. I was therefore rather surprised when I
tried some mixed I/O and saw performance no better than an Xy451. We
called up Interphase to make sure our configuration and setup was
correct. It seems that if one gets 2 MB/sec reads, one's configuration
is correct. The problem seems to be silly firmware that flushes the
read cache on every write. The engineer we spoke to said they were
working actively with Sun on new firmware that would fix this ("improve
performance 30%"), unfortunately at this time the new PROMs aren't yet
available. Incidentally, we got a note from Interphase mentioning an
upgrade program for people with old firmware, so they do keep their
customers in mind. Most benchmarks reinforced this initial impression
(great at reading, lousy at reading *and* writing). One more thing I
noted was that performance seems to degrade excessively when one
accesses different partitions in e.g. a disk-to-disk copy or with
parallel activity on two filesystems. Since it happens in this latter
case too, I doubt it has much to do with the cache algorithm syndrome.

Addendum: we got the new firmware PROMs but couldn't get the disk
formatted properly. The Interphase engineer mentioned something about
the Runt sector possibly being too small due to a bug in the
experimental PROMs we got. Eventually we decided things (the entire
evaluation period for all boards) had dragged on long enough and we
sent the board back. I have heard they are now shipping 'final
production' 4200's. Even if the performance on some of the tests was
upped 30%, it wouldn't make much overall difference.

=====
Rimfire:

This one was real easy to set up the first time around, because one of
Ciprico's globetrotting engineers flew up to install the board and make
sure everything was ok. That first time, things went by in a bit of a
blur, but I then had to duplicate the installation on my own later.
Downtime is minimal when installing a Rimfire, because all disk
maintenance is done online with a special utility program. This is a
full-screen maintenance program that lets one do all the usual diag
stuff and in addition has some cache twiddling and reporting
functions. To set up the disk, I chose a disk template from a list of a
score or so that included the popular Fujitsu, NEC, CDC and Toshiba
disks, and some miscellany. I then modified the format parameters
appropriately (interleave on, headskew=5, cylskew=21, hgskew=0,
recovery=0, idpre=16, datapre=19) to the values recommended by the Ciprico
engineer. The idpre and datapre parameters are apparently the
synchronization gap sizes (a.k.a. gap1/gap2). The parameters not
present on the Cheetah are cylinder skew, 'hg' skew, and the recovery
options on the Rimfire are slightly better. This setup information is
then passed to the driver, and I could start formatting. Nine (yes,
nine!) minutes afterwards, the format was done. The five verify passes
took another 30-odd minutes, but who's complaining... When the Ciprico
fellow did this in front of me, everything worked fine. When I did it
by myself, the verify went through 7 cylinders and then happily stopped
as if it was all done. I tracked it down to a compiler bug (removing a
'register' in a declaration fixed things) that showed up with the pre-release
'rfutil' we got. There were a few other problems with that version, but a
later and more mature program we received worked beautifully with no bugs
and no surprises.  As regards handling bad spots: sector slipping, bad
track mapping, and bad sector mapping are all supported.

Caching algorithm or parameters are specified by the driver at disk
access time, and the driver can differentiate between three kinds of
access: raw disk I/O, file system I/O in 4k block-size multiples, and
file system I/O not in 4k block-size multiples. The caching scheme,
read-ahead size, and retry options are specified separately for each
kind of access.
Read-ahead when enabled can be up to 244 sectors. The control bits for
caching are: search cache (SEA), save read data in cache (CRD), read
ahead priority (RAP), save write data in cache (CWT), sort reads (SRD),
sort writes (SWT), read ahead across head (XHD), and read ahead across
cylinder (XCY). The boot-time parameters are hardwired into the kernel
in a table in a header file. The default parameters (unless otherwise
mentioned) are SEA|XHD for filesystem I/O, and SEA|XHD|XCY for raw I/O,
both with 100-sector read-ahead. These defaults come from an array in
the kernel; the parameters can be modified online and cache statistics
examined (the best we got was a 99.97% hit rate on a long raw disk read).
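
For concreteness, the boot-time table might look something like the
sketch below. The struct and array names are invented, and the octal
flag values are my inference from the "301-10"-style shorthand I use
further down, so take the exact bit assignments with a grain of salt:

	/* Hypothetical rendering of the per-access-type cache parameters.
	 * Flag values are inferred, not copied from Ciprico's header. */
	#define SEA	0001	/* search cache */
	#define CRD	0002	/* save read data in cache */
	#define RAP	0004	/* read ahead priority */
	#define CWT	0010	/* save write data in cache */
	#define SRD	0020	/* sort reads */
	#define SWT	0040	/* sort writes */
	#define XHD	0100	/* read ahead across head */
	#define XCY	0200	/* read ahead across cylinder */

	struct rf_cache_params {
		unsigned short	cp_flags;	/* cache control bits */
		unsigned short	cp_readahead;	/* sectors, up to 244 */
	};

	/* One entry per kind of access; defaults as described above (the
	 * third entry is assumed to match the filesystem default). */
	struct rf_cache_params rf_cache_defaults[] = {
		{ SEA|XHD|XCY,	100 },	/* raw disk i/o */
		{ SEA|XHD,	100 },	/* filesystem i/o, 4k multiples */
		{ SEA|XHD,	100 },	/* filesystem i/o, other sizes */
	};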

The Rimfire is slightly slower than the Cheetah in terms of maximum
throughput, coming in at 1.8MB/sec max. read speed. It has much better
behavior when doing mixed reads/writes (or rather, it has 'expected'
behavior whereas the Cheetah had broken firmware). It degrades linearly
when kept busy on different parts of the disk, in my tests.  Essentially,
it is consistent with expectations about the overall improvements relative
to an Xy451.

A pastime with this controller is fiddling with the cache control bits
to see what effect they have on things.

It turned out that several of the cache parameters were quite expensive
if you didn't actually need them. I found readahead a good thing on
file i/o, and pretty much irrelevant on long raw i/o. Since fsck is
short raw i/o, I ended up with the parameters 301-10/101-100 as pretty
much the optimal ones for my purposes (301-10 == XHD,XCY,SEA with 10
sectors readahead for raw i/o; 101-100 == XHD,SEA with 100 sectors
readahead for file i/o).  There is of course overhead in the firmware,
depending on the extra work you want the controller to do by setting
the cache options. One of my tuning experiments was with
more-or-less-random-access fsck, where the time taken ranged from
three-quarters to twice that of the Cheetah, depending on the settings.
Note that the parameters that are optimal for fsck may be lousy for dd,
and vice versa (adaptive algorithms would be nice). Turning on
fancy options, for example cross-cylinder readahead or sorts for file i/o,
may be a mistake. After speaking with some engineers from various companies,
I get the impression the Ciprico board is the only one around that does NOT
flush the read cache on any write operation. To do this they do some data
structure manipulation in firmware, which is not easy in a hardware cache
design. I don't design these things for a living, but invalidating the
cache on every write operation is a *bad* thing to do.
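
As a sanity check on the shorthand I've been using, here is a toy
decoder built on the same inferred flag values; both of my settings
decode to exactly the flag lists given above, which makes me fairly
confident in the bit assignments (still an inference, though):

	#include <stdio.h>

	/* Flag names in their (inferred) bit order; SEA == bit 0. */
	static const char *names[] =
	    { "SEA", "CRD", "RAP", "CWT", "SRD", "SWT", "XHD", "XCY" };

	static void
	decode(unsigned flags, int readahead)
	{
		int i;

		printf("%o-%d ==", flags, readahead);
		for (i = 0; i < 8; i++)
			if (flags & (1 << i))
				printf(" %s", names[i]);
		printf(", %d sectors readahead\n", readahead);
	}

	int
	main(void)
	{
		decode(0301, 10);	/* my raw i/o setting */
		decode(0101, 100);	/* my file i/o setting */
		return 0;
	}

This prints "301-10 == SEA XHD XCY, 10 sectors readahead", and the
corresponding line for 101-100.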

=====
Comparison:

The impression I get is that the Ciprico is about 1.9-2.0 times an
Xy451 on most things, across the board. The Cheetah is 2.1-2.2 times an
Xy451 on a certain set of activities, and degrades to 1.0-1.1 times an
Xy451 on heavy read/write I/O mixes. The software with the Ciprico is
nicer and gives finer control than the Interphase's (having a Cache
Enable bit and nothing else doesn't give much choice). However, one
uses the setup program once, when the disk is installed, and forgets
about it. Online maintenance is nice of course. Improving the
Interphase by 30% or so, as promised by the new firmware, is not likely
to change the picture too much apart from making the Interphase more
palatable. Setting up the disk and tuning the cache properly on the
Rimfire is comparatively "involved", whereas the Cheetah is a turn-key
kind of product as far as configuration goes.

We have firsthand experience with another board, but are still under
a non-disclosure agreement.

In a nutshell, if you have a readonly disk, the Cheetah is for you;
otherwise, look into the Rimfire board. We're buying and recommending
Rimfires for the foreseeable future.

All companies we have dealt with have been very professional and attentive
to problems. We have the most and best experience with Ciprico (since they sent
someone up here to babysit the installation and listen to reactions/concerns).

Finally, neither I personally, nor the Computer Systems Research Institute
through which the evaluations were arranged, nor anyone in it, has any
connection with any of the companies mentioned except as potential (and in
some cases present) customers. Use the information in this message at your
own risk. You may redistribute as you see fit. (Of course, if you do cause
us to get some better service by referring to the UofToronto name and this
test here and there, we won't complain :-)

rayan

Rayan Zachariassen, AI group, University of Toronto
(rayan@ai.toronto.edu)

=====
Data:

First, the test configuration:

A Sun 3/180S (normally configured with two Eagles on an xy451) with a
Swallow-II (Fujitsu 2344 -- great disk btw, cutest thing you ever saw!)
on loan for the duration of the test. All timing results are the minima
of several runs in multiuser on an idle system (normal daemons running)
at -20 priority.

Interphase 4200 Cheetah (firmware revision X0E).
Ciprico 3200 Rimfire (firmware rev 14 of 87/04/13, engin. rev level 38)

Disk layout is illustrated by this partial dkinfo output (the 2344 was set
up for 69 sectors per track -- that's 67 data, 1 spare, 1 runt):

	620 cylinders 27 heads 67 sectors/track
	a: 18090 sectors (10 cyls)
	   starting cylinder 0
	b: 36180 sectors (20 cyls)
	   starting cylinder 10
	c: 1121580 sectors (620 cyls)
	   starting cylinder 0
	d: 280395 sectors (155 cyls)
	   starting cylinder 30
	f: 280395 sectors (155 cyls)
	   starting cylinder 185
	h: 506520 sectors (280 cyls)
	   starting cylinder 340

The filesystems used for file I/O tests were set up with 'tunefs -a 40 -d 0'.
I mounted d, f, and h, on /u1, /u2, and /u3 respectively.
The 'bigfile' mentioned was approx 8.5 megabytes (12 * /vmunix).
All dd file I/O was done on a recently newfs'ed filesystem.
/tmp and /local are on the Xy451 of the Sun, on an Eagle (2351).
Throughput figures are in kilobytes per second, based on real time of course.
1 cylinder == 1809 sectors.

NB! These measurements are not scientific or controlled to any degree, nor
claimed as such. They satisfy our purposes. If you have measurements for
other controllers, or can fill in the blanks, I'd be happy to act as a
collection point for such data.

Format					format	5verify total
					minutes	minutes	minutes
----------------------------------------------------------------------------
Ciprico 3200				9	27	36
Interphase 4200				15	60	75

Command						blocksize
					8k	1 track	1 cylinder
					kb/s	kb/s	kb/s
dd if=/dev/rXX0a of=/dev/null
----------------------------------------------------------------------------
Ciprico 3200				1116	1447	1781
Interphase 4200				1304	?	?

dd if=/dev/rXX0f of=/dev/null
----------------------------------------------------------------------------
Ciprico 3200				1129	1447	1824
Interphase 4200				?	?	2005

dd if=/u2/bigfile of=/u3/bigfile
----------------------------------------------------------------------------
Ciprico 3200				737	745	866
Interphase 4200				260	?	560

dd if=/u2/bigfile of=/u2/bigfile.new
----------------------------------------------------------------------------
Ciprico 3200				733	775	889
Interphase 4200				270	?	530

dd if=/u2/bigfile of=/dev/null
----------------------------------------------------------------------------
Ciprico 3200				1216	1212	1191
Interphase 4200				1435	?	1536

dd if=/u2/bigfile of=/tmp/bigfile
----------------------------------------------------------------------------
Ciprico 3200				910	?	?
Interphase 4200				796	?	?

dd if=/dev/rXX0a of=/dev/rXX0b
----------------------------------------------------------------------------
Ciprico 3200				400	990	1447
Interphase 4200				370	?	1460


Miscellaneous:

Command				blocksize	Cheetah		Rimfire
----------------------------------------------------------------------------
dd if=/u2/bin/gnuemacs of=/dev/null	8k	2.2*Xy451	1.9*Xy451
cd /u2; dump 0f - /local | restore rf -		1524.2 real<1>	1490.0 real<6>
cd /u1; dump 0f - /u2 | restore rf -		1782.0 real<2>	1585.2 real<7>
fsck /dev/rXX0f					  40.0 real<3>	  29.1 real<8>
4 parallel fsck's, offset 5 secs each, of ip0f	 159.4 real<4>	 117.0 real<9>
2 parallel fsck's, of ip0f and ip0d		 100.7 real<5>	  57.3 real<10>

footnotes:

1)	1524.2 real        60.7 user       275.4 sys  
2)	1782.0 real        58.5 user       270.0 sys
3)	  40.0 real         8.0 user         3.3 sys  
4)	 159.4 real        25.4 user        11.4 sys
5)	 100.7 real         8.6 user         3.7 sys

Cheetah dump/restore/fsck test filesystem:
Filesystem            kbytes    used   avail capacity  Mounted on
/dev/xy0g             110631   94633    4934    95%    /local
/dev/ip0f             137460   95010   28704    77%    /u2

6)	1490.0 real        59.2 user       263.1 sys
7)	1585.2 real        60.3 user       258.9 sys
8)							Cache bits   Readahead
	  48.4 real         7.9 user         3.3 sys	0		60
	  37.0 real         7.0 user         3.3 sys	SEA|CRD		60
	  29.1 real         7.8 user         3.3 sys	SEA|CRD		10
	  37.0 real         8.0 user         2.4 sys	SEA|CRD|RAP|SRD	60
	  79.5 real         8.1 user         2.8 sys	SEA|XHD(|XCY)	60

9)	 117.0 real        31.9 user        15.3 sys
10)       57.3 real         8.4 user         3.5 sys

Rimfire dump/restore/fsck test filesystem:
Filesystem            kbytes    used   avail capacity  Mounted on
/dev/xy0g             110631   90478    9089    91%    /local
/dev/rf0f             137460   90838   32876    73%    /u2

As you may have noticed, the test filesystems unfortunately aren't identical.
The table above should therefore really include a 5% or so adjustment in one
column, but this is not significant.

ron@topaz.rutgers.edu (Ron Natalie) (09/07/87)

I have the Ciprico Rim Fire 2220 (Multibus II) controller and it seems
to be pretty nice.  One interesting aspect of the driver they provide
is the absence of any I/O queue in the driver.  Since you can tag Multibus
II Requests/Completions with an arbitrary number (like the buffer pointer),
they just stuff them all into the controller as they come in and then
iodone 'em when the completion comes back.  Amusing.  Multibus II
message passing is very handy for UNIX drivers.  Everything falls into
place nicely.
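
For the curious, a minimal sketch of what such a strategy/interrupt
pair looks like; all the names here (rf_post, rfstrategy, rfintr) are
invented for illustration and are not Ciprico's actual driver:

	/* The strategy routine tags each request with its own buf pointer
	 * and hands it straight to the board -- no driver-side queue. */
	struct buf;
	extern void rf_post(struct buf *bp, unsigned long tag);
	extern void iodone(struct buf *bp);

	void
	rfstrategy(struct buf *bp)
	{
		rf_post(bp, (unsigned long)bp);	/* tag == buffer pointer */
	}

	/* Completion interrupt: the tag comes back untouched, so we just
	 * turn it back into a buf pointer and finish the I/O. */
	void
	rfintr(unsigned long tag)
	{
		iodone((struct buf *)tag);
	}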

-RON

dan@rna.UUCP (Dan Ts'o) (09/08/87)

In article <8709062258.AA22818@ephemeral.ai.toronto.edu> rayan@utegc.UUCP writes:
>With the faster CPUs we've been seeing lately, we have noticed that too
>many processes spend their wall clock time in disk waits. [...]
>To our knowledge, there are three companies in the
>market with high-performance controllers that are/will be used in Suns,
>namely Interphase, Ciprico, and Xylogics. The contending products are:
>
>Interphase:	4200 "Cheetah"
>Ciprico:	3200 "Rimfire"
>Xylogics:	75* Sun default
>
>[...]

	I went through a similar decision cycle for the Sun several months
ago. For better or worse, we decided on the Xylogics 752. Well, we've had
to wait a long time, something I might not have done had I known. But we
just did get a 752 with the Sun driver and boot. Actually, the hardware
has been ready for at least 3 months, but they contracted Sun consulting
to do the software, and it has been slow in coming.
	In any case, the installation has been nearly flawless (a very minor
glitch occurred, nothing substantial), and we have been running for a little
while. I'll try to digest your long posting to see if I can arrive at
comparable stats for the 752. If you have any canned shell scripts or
distilled benchmarks, I'd be happy to run them.
	Anyway, I think the Xylogics is now very much a contender in the
VME disk controller race. Certainly Sun decided on them. We almost went with
Interphase, but the promise of continued Sun compatibility persuaded us to
wait (what we thought would be just a short while).

				Cheers,
				Dan Ts'o
				Dept. Neurobiology	212-570-7671
				Rockefeller Univ.	...cmcl2!rna!dan
				1230 York Ave.		rna!dan@nyu.arpa
				NY, NY 10021