dupuy@cs.columbia.edu (03/04/90)
Ever had one of those days when everything was going wrong, and in desperation you start composing a message for the net, hoping someone will tell you what you're doing wrong, and as you start to write up a detailed description of the totally inexplicable thing which is screwing you over inside and out, you start wondering about these other probably unrelated, but possible sources of the problem, and then you think "maybe I should just reboot the machine and try again", and you do and the problem goes away, and you realize you have just wasted four hours wracking your brain because of a "feature" of somebody's disk driver?

I just did. And so that nobody else has to waste four hours over the same "feature"/design assumption, I'll tell you two things: a short explanation of how to avoid my problem, and the gory gory details of how I found it.

thing1

A feature of the sd driver in SunOS 4.0.3 is that if you access a disk and then reformat it with a different geometry (quite a reasonable thing to do for some SCSI disks), you must REBOOT THE MACHINE before you try to use the newly reformatted disk. This may also be a feature of the xy, xd, or ip disk drivers, but it's much less common to change the disk geometry there (unless you need more inodes, but that's another saga).

thing2

So I was upgrading a Sun-3/160 (#732E0964) with two Wren Vs attached to an "sc" SCSI-2 controller, from 3.5 to 4.0.3, ya see, and having read all this keen stuff on sun-managers about how the Wrens use zoned bit recording and how to squeeze out the last bit of storage with the right format.dat parameters and all, I thought, why not try it?

I booted the machine as a diskless client running 4.0.3, backed up the disks, and then ran format, without rebooting. Well, there's a problem. I used Barry Lustig's enhanced parameters that he posted in sun-managers, and formatted the drive, no problem. And I newfs'd the 'c' partition (the whole disk), no problem too.
And his parameters give me about 10 Meg more of storage than the parameters I got on the drive before, courtesy of BoxHill (plug), which in turn gave me about 15 Meg more than Sun's. So that's great, and kudos to Barry, whose excellent posting I'm appending to this message.

Here's the format.dat before:

    disk_type = "CDC Wren V 94181-702" \
            : ctlr = MD21 : fmt_time = 4 \
            : cache = 0x11 : trks_zone = 1 : asect = 1 : atrks = 30 \
            : ncyl = 716 : acyl = 2 : pcyl = 718 : nhead = 15 : nsect = 109 \
            : rpm = 3600 : bpt = 20833

and after:

    disk_type = "CDC Wren V 94181-702" \
            : ctlr = MD21 : fmt_time = 4 \
            : cache = 0x11 : trks_zone = 15 : asect = 1 : atrks = 30 \
            : ncyl = 1530 : acyl = 2 : pcyl = 1532 : nhead = 15 : nsect = 52 \
            : rpm = 3597 : bpt = 20833

Notice the completely different ncyl and nsect values.

But although it always worked for the 'c' partition, it didn't work for other partitions which weren't the whole disk, but were large (>225 cyl?), and go out towards the end of the disk. The limits on this were vague, and it's hard to define the space of (start cyl, size) combinations which failed, but I can provide a few datapoints:

    success/failure         start cyl    size (cyls)
    succeeds                0            1530
    succeeds                1            1529
    succeeds                2            1528
    write error: 1191059    3            1527
    succeeds                3            1000
    write error: 64         1280         250
    succeeds                1293         237

At this point I thought it might be a bug in the sd driver having something to do with the defect management zones - BoxHill used trks_zone=1, Barry (and Sun) use trks_zone=15 (one real cylinder). Or perhaps a bug in mkfs.

But it wasn't. After I rebooted the machine and power-cycled the disks, everything worked just fine. No problems for any partitions, of any size, anywhere.

I'm a bit too tired of this to try again with just rebooting, or just power-cycling, so I can't say which (or both) is the thing that fixed it. But my guess at the problem is that when the sd driver goes out and accesses a disk, it reads the label, so it knows what the partitioning is.
While it's getting the label, it stashes a copy of the disk geometry, which is recorded in the label, into some secret memory locations, and uses this information to generate SCSI requests. If you reformat with a different geometry, the sd driver doesn't notice this, and starts calculating wrong.

It would be nice if Sun could fix the sd driver and/or format so that when you reformat, the driver reloads its copy of the disk geometry. But if they can't, they should document this "feature"/design assumption in the BUGS sections of the man pages for sd and format (not that that would have helped me; I know all this stuff, so I only looked at the man pages after I figured this all out :-).

Alexander Dupuy
Computer Science Department
Columbia University
New York, NY 10027-6699
(212) 854-4290
<dupuy@cs.columbia.edu>

P.S. As I promised, here's Barry's original posting which inspired me:
_______________________________________________________________________________
X-From: Barry Lustig <barry@gdp.com>
X-Subject: Wren V
X-Date: Fri, 25 Aug 89 12:45:11 PST

I hope that the following message will clear up some confusion that many people seem to be having with Imprimis Wren V disk drives running on their Sun systems. I learned most of this information from my disk drive vendor, General Data Peripherals, from carefully studying the Wren V product specification manual, and from using that information to experiment with a drive.

I've included some responses to an article posted by Tom Leach. I believe that he had the right idea in a number of cases, but also spread a fair amount of incorrect information. The article from Dave Kemp@dockmaster is right on the mark.

| From: leach@OCE.ORST.EDU (Tom Leach)
| Subject: Wren V formatting under SunOS 4.0.3
| Date: 23 Aug 89 23:42:47 GMT
|
| Casual use of the format program can damage a brand new disk beyond
| repair!
...

I have sent raw SCSI commands to my Wren using a SCSI editing tool on the Mac.
I have changed all of the changeable parameters on the drive to many different values. Changing the values will not damage the drive in any way.

The Wren V uses an intelligent embedded SCSI controller to manage the disk. The controller stores parameter information on spare cylinders at the end of the drive. These cylinders are *NOT* user accessible. As hard as you may try, you cannot accidentally overwrite them. These parameters can only be accessed through the use of the structured SCSI mode select commands. This is how you can enable features such as the read ahead cache, change the drive's error correction policy, etc. These same values can be retrieved using the SCSI mode sense commands.

| If anyone has tried to format a Wren V (94181-702) under SunOS 4.0.3, you
| will find that the entry in /etc/format.dat is very poor. (after
| formatting and partitioning, I got ~320M in the C partition).
...

Very true.

| HOW TO GET THE VALUES:
|
| The first thing to look at when determining the format.dat entries is
| if the disk is already formatted. If it is, you can use the format
| program to display the current format info. The numbers that you're
| most interested in are the number of cylinders (ncyl), the number of
| alternate cylinders (acyl), the number of heads (nhead) and the number
| of sectors per track (nsect). This is by far the safest way to lay a
| partition table onto the disk since the manufacturer (you hope!) should
| have formatted the disk without overwriting the geometry.
...

I want to address a couple of points in the paragraph above. First, the company that you buy your drive from doesn't always use the best parameters for the drive. Make sure that your vendor understands their product. Some companies do not know how to make use of the advanced features of the Wren drives. Others may not know the best way to format the drive. The second point is that it is untrue that you can overwrite the geometry of the drive.
The drive parameters are located on a non-user accessible portion of the drive. They can only be accessed via a mode select command. A request to write a block past the end of the user accessible part of the drive will return an "illegal command" response.

The Wren V uses a data recording technique called Zoned Bit Recording (ZBR). The essence of ZBR, for those who do not know, is that there are more sectors per track on the outer cylinders of the drive than on the inner cylinders. This allows Imprimis to pack more data onto a drive. Since these are SCSI drives, the data is requested from the drive by block number, not by requesting a particular cylinder, head, sector tuple. If the system had to determine the tuple it would have to know about the ZBR zone size. Unfortunately, BSD based UNIX systems (I don't know about sysV) generally use the latter form of disk access rather than the former. Systems with the fast filesystem try to lay out the filesystem in an intelligent manner, making sure that filesystems end on cylinder boundaries and applying other optimizations. For this reason, we need to give the Sun a geometry to use for the drive.

The format.dat entry below is what we at GDP recommend. It gives us approximately 611 MB of formatted disk space.

    disk_type = "CDC Wren V 94181-702" \
            : ctlr = MD21 : fmt_time = 4 : atrks = 30 \
            : cache = 0x11 : trks_zone = 15 : asect = 1 \
            : ncyl = 1530 : acyl = 2 : pcyl = 1532 : nhead = 15 : nsect = 52 \
            : rpm = 3597 : bpt = 20833

I'll explain each of the important entries one by one.

1) atrks: Number of alternate tracks per drive. These tracks are used if there are no spare sectors available in a defect management zone. With 15 heads, 30 tracks leaves 2 spare cylinders. (Note that these are different from the 2 spare cylinders that the Sun requires.)

2) cache: This turns the Wren V's read ahead cache on. 0x01 would turn it off.

3) trks_zone: This is the number of tracks in a defect management zone. On a Wren V the only allowable numbers are 0, 1, or 15.
A value of 0 says that there will be no defect management zones. This will cause all bad blocks to be mapped to sectors on the alternate tracks at the end of the drive. A value of 1 says that a defect management zone will consist of 1 track. A value of 15 means that a zone will be 15 tracks, or 1 cylinder.

4) asect: The number of alternate sectors per defect management zone. If trks_zone is 15, there will be one spare sector per cylinder for bad blocks. If trks_zone is 1, there will be one per track (the same as slipping sectors on the SMD drives). If trks_zone is 0, this entry will be ignored.

5) ncyl: This is the number of user available cylinders on the drive. On a Sun running SunOS 4.0 or greater, the OS wants to reserve 2 cylinders for itself. SunOS wants to manage the drive defects by itself, so it reserves 2 cylinders at the end of the drive for that purpose. This is unnecessary on the Wren, because the Wren manages bad blocks in the drive. The Sun also will store a backup partition table and label here.

6) pcyl: The total number of cylinders that SunOS sees on the drive, that is, ncyl plus acyl (1530 + 2 = 1532 here). In essence, the number of cylinders that SunOS can access.

7) nsect: Number of sectors per track. This number doesn't mean anything to the Wren. It is for the Sun to figure out where cylinder boundaries are.

| WARNING: Be VERY careful on how you use the format program. If you try
| to get too much out of your disk, you will overwrite the geometry which
| lives on the last cylinder of the disk. This will render the disk
| useless and your only recourse is to buy a new disk. The moral of this
| story is to not be TOO greedy on cylinder allocation.
|
| Tom Leach
| leach@oce.orst.edu

You cannot overwrite the "geometry" of your disk. What you can do is format your drive with error management parameters that leave you with less usable space than you thought. If that happens, all you have to do is reformat the drive.
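
A quick sanity check on the numbers quoted in this thread, as a sketch and not part of either posting. It multiplies out ncyl * nhead * nsect for the two format.dat entries shown above. The 512-byte sector size is an assumption (neither posting states it), and the names `user_sectors`, `megabytes`, and `SECTOR_BYTES` are made up for illustration. The last line also assumes that the number in "write error: 1191059" is a partition-relative sector number, which the post does not say.

```python
# Illustrative arithmetic only. Assumes 512-byte sectors, which neither
# posting states explicitly.
SECTOR_BYTES = 512


def user_sectors(ncyl, nhead, nsect):
    """Sectors the host believes are usable, given a format.dat geometry."""
    return ncyl * nhead * nsect


def megabytes(sectors):
    """Decimal megabytes (10**6 bytes) for a sector count."""
    return sectors * SECTOR_BYTES / 1_000_000


# Geometry values copied from the two format.dat entries in the post.
before = user_sectors(ncyl=716, nhead=15, nsect=109)   # the "before" entry
after = user_sectors(ncyl=1530, nhead=15, nsect=52)    # Barry's entry

print("before:", before, "sectors,", round(megabytes(before), 1), "MB")
print("after: ", after, "sectors,", round(megabytes(after), 1), "MB")
print("gain:  ", round(megabytes(after - before), 1), "MB")

# Last sector of a 1527-cylinder partition under the new geometry,
# ASSUMING the reported error number is partition-relative.
last_block = user_sectors(ncyl=1527, nhead=15, nsect=52) - 1
print("last block of a 1527-cylinder partition:", last_block)
```

Under those assumptions the new entry works out to 1,193,400 sectors, or about 611 MB, which agrees with the "approximately 611 MB" figure, and the gain over the old entry is roughly 11.6 MB, in line with "about 10 Meg more". The last line comes out to 1191059, the same number as in the failing datapoint, which would suggest the failed write was to the very last sector of that partition.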