[comp.sys.sun] Problems with format/mkfs on Wren V

dupuy@cs.columbia.edu (03/04/90)

Ever had one of those days when everything was going wrong, and in
desperation you start composing a message for the net, hoping someone will
tell you what you're doing wrong, and as you start to write up a detailed
description of the totally inexplicable thing which is screwing you over
inside and out, you start wondering about these other probably unrelated,
but possible sources of the problem, and then you think "maybe I should
just reboot the machine and try again", and you do and the problem goes
away, and you realize you have just wasted four hours racking your brain
because of a "feature" of somebody's disk driver?  I just did.  And so
that nobody else has to waste four hours over the same "feature"/design
assumption, I'll tell you two things: a short explanation of how to avoid
my problem, and the gory gory details of how I found it.

thing1	

A feature of the sd driver in SunOS 4.0.3: if you access a disk and then
reformat it with a different geometry (quite a reasonable thing to do for
some SCSI disks), REBOOT THE MACHINE before you try to use the newly
reformatted disk.  This may also be a feature of the xy, xd, or ip disk
drivers, but with those it's much less common to change the disk geometry
(unless you need more inodes, but that's another saga).

thing2

So I was upgrading a Sun-3/160 (#732E0964) with two Wren Vs attached to an
"sc" SCSI-2 controller, from 3.5 to 4.0.3, ya see, and having read all
this keen stuff on sun-managers about how the Wrens use zoned bit
recording and how to squeeze out the last bit of storage with the right
format.dat parameters and all, I thought, why not try it?  I booted
the machine as a diskless client running 4.0.3, backed up the disks, and
then ran format, without rebooting.

Well, there's a problem.  I used Barry Lustig's enhanced parameters that
he posted in sun-managers, and formatted the drive, no problem.  And I
newfs'd the 'c' partition (the whole disk) no problem too.  And his
parameters give me about 10 Meg more of storage than the parameters I got
on the drive before, courtesy of BoxHill (plug), which in turn gave me
about 15 Meg more than Sun's.  So that's great, and kudos to Barry, whose
excellent posting I'm appending to this message.

Here's the format.dat before:

disk_type = "CDC Wren V 94181-702" \
	: ctlr = MD21 : fmt_time = 4 \
	: cache = 0x11 : trks_zone = 1 : asect = 1 : atrks = 30 \
	: ncyl = 716 : acyl = 2 : pcyl = 718 : nhead = 15 : nsect = 109 \
	: rpm = 3600 : bpt = 20833

and after:

disk_type = "CDC Wren V 94181-702" \
	: ctlr = MD21 : fmt_time = 4 \
	: cache = 0x11 : trks_zone = 15 : asect = 1 : atrks = 30 \
	: ncyl = 1530 : acyl = 2 : pcyl = 1532 : nhead = 15 : nsect = 52 \
	: rpm = 3597 : bpt = 20833

Notice the completely different ncyl and nsect values.
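
To put numbers on the two entries, here is a minimal sketch that multiplies
out each geometry.  The 512-byte sector size is an assumption (the post does
not state it), and the function name is made up for illustration.  Under
that assumption the "after" entry comes to about 611 MB and the gain to
about 11.6 MB, in line with the figures quoted in this message.

```python
# Assumption: 512-byte sectors; the post does not state the sector size.
SECTOR_BYTES = 512

def data_sectors(ncyl, nhead, nsect):
    """Sectors in the data area that SunOS sees for a given geometry."""
    return ncyl * nhead * nsect

before = data_sectors(ncyl=716, nhead=15, nsect=109)   # "before" entry
after = data_sectors(ncyl=1530, nhead=15, nsect=52)    # "after" entry

for name, sectors in (("before", before), ("after", after)):
    print(f"{name}: {sectors} sectors, {sectors * SECTOR_BYTES / 1e6:.1f} MB")
print(f"gain: {(after - before) * SECTOR_BYTES / 1e6:.1f} MB")
```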

But although it always worked for the 'c' partition, it didn't work for
other partitions which weren't the whole disk, but were large (>225 cyl?),
and went out towards the end of the disk.  The limits on this were vague,
and it's hard to define the space of (start cyl, size) combinations which
failed, but I can provide a few datapoints:

success/failure         start cyl   size (cyls)

succeeds                0           1530
succeeds                1           1529
succeeds                2           1528
write error: 1191059    3           1527
succeeds                3           1000

write error: 64         1280        250
succeeds                1293        237
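
One piece of arithmetic on the first failure, offered as an observation
rather than a diagnosis: 1191059 is exactly the number of the final sector
of a 1527-cylinder partition under the new geometry.  That would fit mkfs
probing the last sector of the partition before building the filesystem,
but that reading is an assumption; the post does not say so, and the 64 in
the second error is not explained by it.  The helper name is hypothetical.

```python
NHEAD, NSECT = 15, 52          # from the "after" format.dat entry

def last_sector(size_cyls):
    """Partition-relative number of the final sector of a partition."""
    return size_cyls * NHEAD * NSECT - 1

print(last_sector(1527))   # 1191059
```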


At this point I thought it might be a bug in the sd driver having
something to do with the defect management zones - BoxHill used
trks_zone=1, Barry (and Sun) use trks_zone=15 (one real cylinder).  Or
perhaps a bug in mkfs. But it wasn't.

After I rebooted the machine and power-cycled the disks, everything worked
just fine.  No problems for any partitions, of any size, anywhere.

I'm a bit too tired of this to try again with just rebooting, or just
power-cycling, so I can't say which (or both) is the thing that fixed it.
But my guess at the problem is that when the sd driver goes out and
accesses a disk, it reads the label, so it knows what the partitioning is.
While it's getting the label, it stashes a copy of the disk geometry,
which is recorded in the label, into some secret memory locations, and
uses this information to generate SCSI requests.  If you reformat with a
different geometry, the sd driver doesn't notice this, and starts
calculating wrong.
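
The guess above can be sketched in a few lines.  This is not the sd driver's
code; it assumes, for illustration only, that a partition's start is kept as
a cylinder number and turned into a block number using whatever geometry the
driver has cached.  The names are hypothetical, and the sketch does not try
to reproduce the table of failures.

```python
def start_block(start_cyl, nhead, nsect):
    """Absolute block at which a partition starting on start_cyl begins."""
    return start_cyl * nhead * nsect

stale = {"nhead": 15, "nsect": 109}   # geometry cached before the reformat
fresh = {"nhead": 15, "nsect": 52}    # geometry actually on the disk now

# Same starting cylinder, two different answers once the geometry changes.
for cyl in (0, 1, 3, 1280):
    print(cyl, start_block(cyl, **stale), start_block(cyl, **fresh))
```

Under this model cylinder 0 maps to block 0 whichever geometry is used, while
every other starting cylinder lands on a different block.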

It would be nice if Sun could fix the sd driver and/or format so that when
you reformat, the driver reloads its copy of the disk geometry.  But if
they can't, they should document this "feature"/design assumption in the
BUGS sections of the man pages for sd and format (not that that would
have helped me, I know all this stuff, so I only looked at the man pages
after I figured this all out :-).

Alexander Dupuy
Computer Science Department
Columbia University
New York, NY  10027-6699
(212) 854-4290
<dupuy@cs.columbia.edu>

P.S. As I promised, here's Barry's original posting which inspired me:
_______________________________________________________________________________
X-From: Barry Lustig <barry@gdp.com>
X-Subject: Wren V
X-Date: Fri, 25 Aug 89 12:45:11 PST

I hope that the following message will clear up some confusion that many
people seem to be having with Imprimis Wren V disk drives running on their
Sun systems.  I learned most of this information from my disk drive
vendor, General Data Peripherals, from carefully studying the Wren V
product specification manual, and from using that information to
experiment with a drive.

I've included some responses to an article posted by Tom Leach.  I believe
that he had the right idea in a number of cases, but also spread a fair
amount of incorrect information.  The article from Dave Kemp@dockmaster is
right on the mark.

| From: leach@OCE.ORST.EDU (Tom Leach)
| Subject: Wren V formatting under SunOS 4.0.3
| Date: 23 Aug 89 23:42:47 GMT
|   
| Casual use of the format program can damage a brand new disk beyond
| repair!  ...

I have sent raw SCSI commands to my Wren using an SCSI editing tool on the
Mac.  I have changed all of the changeable parameters on the drive to many
different values.  Changing the values will not damage the drive in any
way.

The Wren V uses an intelligent embedded SCSI controller to manage the
disk.  The controller stores parameter information on spare cylinders at
the end of the drive.  These cylinders are *NOT* user accessible.  As hard
as you may try, you cannot accidentally overwrite them.  These parameters
can only be accessed through the use of the structured SCSI mode select
commands.  This is how you can enable features such as the read ahead
cache, change the drive's error correction policy, etc.  These same values
can be retrieved using the SCSI mode sense commands.

| If anyone has tried to format a Wren V (94181-702) under SunOS 4.0.3, you
| will find that the entry in /etc/format.dat is very poor.  (after
| formatting and partitioning, I got ~320M in the C partition).  ...

Very true.

| HOW TO GET THE VALUES:
|
| The first thing to look at when determining the format.dat entries is if
| the disk is already formatted.  If it is, you can use the format program
| to display the current format info.  The numbers that you're most
| interested in are the number of cylinders (ncyl), the number of alternate
| cylinders (acyl), the number of heads (nhead) and the number of sectors
| per track (nsect).  This is by far the safest way to lay a partition
| table onto the disk since the manufacturer (you hope!) should have
| formatted the disk without overwriting the geometry.  ...

I want to address a couple of points in the paragraph above.  First, the
company that you buy your drive from doesn't always use the best
parameters for the drive. Make sure that your vendor understands their
product. Some companies do not know how to make use of the advanced
features of the Wren drives.  Others may not know the best way to format
the drive.  The second point is that you cannot overwrite the geometry of
the drive.  The drive parameters are located on a non-user-accessible
portion of the drive.  They can only be accessed via a mode select
command.  A request to write a block past the end of the user-accessible
part of the drive will return an "illegal command" response.

The Wren V uses a data recording technique called Zoned Bit Recording
(ZBR).  The essence of ZBR, for those who do not know, is that there are
more sectors per track on the outer cylinders of the drive than on the
inner cylinders.  This allows Imprimis to pack more data onto a drive.
Since these are SCSI drives, the data is requested from the drive by block
number, not by requesting a particular cylinder, head, sector tuple.  If
the system had to determine the tuple it would have to know about the ZBR
zone size.  Unfortunately, BSD based UNIX systems (I don't know about
sysV) generally use the latter form of disk access rather than the former.
Systems with the fast filesystem try to lay out the filesystem in an
intelligent manner; making sure that filesystems end on cylinder
boundaries and other optimizations.  For this reason, we need to give the
Sun a geometry to use for the drive.  The format.dat entry below is what
we at GDP recommend.  It gives us approximately 611 MB of formatted disk
space.

    disk_type = "CDC Wren V 94181-702" \
	    : ctlr = MD21 : fmt_time = 4 : atrks = 30 \
	    : cache = 0x11 : trks_zone = 15 : asect = 1 \
	    : ncyl = 1530 : acyl = 2 : pcyl = 1532 : nhead = 15 : nsect = 52 \
	    : rpm = 3597 : bpt = 20833
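
Since the drive is addressed by block number and the geometry is only a
convention for the host, the mapping between the two is plain integer
arithmetic.  A minimal sketch using the nhead and nsect values above; the
function names and the zero-based numbering of heads and sectors are
assumptions made for the example.

```python
NHEAD, NSECT = 15, 52   # notional geometry from the entry above

def block_to_chs(block):
    """Split a block number into a (cylinder, head, sector) tuple."""
    cyl, rem = divmod(block, NHEAD * NSECT)
    head, sect = divmod(rem, NSECT)
    return cyl, head, sect

def chs_to_block(cyl, head, sect):
    """Inverse mapping: tuple back to the block number sent to the drive."""
    return (cyl * NHEAD + head) * NSECT + sect

last = 1530 * NHEAD * NSECT - 1        # final block of the 1530 data cylinders
print(block_to_chs(last))              # (1529, 14, 51)
print(chs_to_block(*block_to_chs(last)) == last)
```

None of these tuples need correspond to a physical location on a zoned
drive; they only tell the filesystem where it should consider cylinder
boundaries to be.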

I'll explain each of the important entries one by one.

1) atrks:     Number of alternate tracks per drive.  These tracks are
              used if there are no available sectors in a defect
	      management zone.  This leaves 2 spare cylinders.  (Note
	      these are different from the 2 spare cylinders that the
	      Sun requires.)

2) cache:     This turns the Wren V's read ahead cache on.  0x01 would
              turn it off.

3) trks_zone: This is the number of tracks in a defect management
              zone.  On a Wren V the only allowable numbers are 0, 1,
	      or 15.  A value of 0 says that there will be no defect
	      management zones.  This will cause all bad blocks to be
	      mapped to sectors on the alternate tracks at the end
	      of the drive.  A value of one says that a defect
	      management zone will consist of 1 track.  A value of 15
	      means that a zone will be 15 tracks, or 1 cylinder.

4) asect:     The number of alternate sectors per defect management zone.
              If trks_zone is 15, there will be one spare sector per
	      cylinder for bad blocks.  If trks_zone is 1, there will be
	      one per track (the same as slipping sectors on the SMD
	      drives).  If trks_zone is 0, this entry will be ignored.

5) ncyl:      This is the number of user available cylinders on the drive.
              On a Sun running SunOS 4.0 or greater, the OS wants to
	      reserve 2 cylinders for itself.  SunOS wants to manage
	      the drive defects by itself, so it reserves 2 cylinders
	      at the end of the drive for that purpose.  This is
	      unnecessary on the Wren, because the Wren manages bad
	      blocks in the drive.  The Sun also will store a backup
	      partition table and label here.

6) pcyl:      The total number of cylinders on the drive that SunOS can
              access: ncyl plus acyl (here 1530 + 2 = 1532).

7) nsect:     Number of sectors per track.  This number doesn't mean
              anything to the Wren.  It is for the Sun to figure out where
	      cylinder boundaries are.
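
The relationships described in the list can be checked mechanically.  The
parser below is a rough illustration written for this entry only; it is not
how the format program reads format.dat, and the names are made up.

```python
ENTRY = r'''
disk_type = "CDC Wren V 94181-702" \
    : ctlr = MD21 : fmt_time = 4 : atrks = 30 \
    : cache = 0x11 : trks_zone = 15 : asect = 1 \
    : ncyl = 1530 : acyl = 2 : pcyl = 1532 : nhead = 15 : nsect = 52 \
    : rpm = 3597 : bpt = 20833
'''

def parse_entry(text):
    """Parse one format.dat-style entry into a dict (illustrative only)."""
    joined = text.replace("\\\n", " ")          # fold continuation lines
    params = {}
    for field in joined.split(":"):
        key, _, value = field.partition("=")
        key, value = key.strip(), value.strip().strip('"')
        if not key:
            continue
        try:
            params[key] = int(value, 0)         # base 0 accepts 0x11 as hex
        except ValueError:
            params[key] = value                 # non-numeric, e.g. ctlr = MD21
    return params

p = parse_entry(ENTRY)
print(p["ncyl"] + p["acyl"] == p["pcyl"])       # data + alternate cylinders
print(p["atrks"] // p["nhead"])                 # alternate tracks, in cylinders
print(p["trks_zone"] == p["nhead"])             # one zone per cylinder
```

With the values above, the 30 alternate tracks amount to 2 cylinders of 15
heads each, and a 15-track defect management zone is exactly one cylinder.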

| WARNING:
|
| Be VERY careful on how you use the format program.  If you try to get too
| much out of your disk, you will overwrite the geometry which lives on the
| last cylinder of the disk.  This will render the disk useless and your
| only recourse is to buy a new disk.  The moral of this story is not to be
| TOO greedy on cylinder allocation.
|
| Tom Leach
| leach@oce.orst.edu


You cannot overwrite the "geometry" of your disk.  What you can do is to
format your drive with error management parameters that leave you with
less usable space than you thought.  All you have to do is to reformat
your drive again.