larry@nstar.UUCP (Larry Snyder) (02/28/90)
I recently installed a 1542 as a secondary controller along with a 2372. I'm booting off the 2372, mounting /usr off the 1542, and then mounting /usr2 off the 1542. I've been getting numerous errors on the SCSI drive. When installing the drive, I did a low-level format using the 1542 BIOS and ran the complete DMA check for 2 hours (which didn't produce a single error).
I tried adding the errors for the SCSI drive using mkpart -A <abs sector> and I get an error: "No root partition". I also tried "mkpart -A <sec> -f /etc/partitions", with the same error; and no, the bad blocks were not added to /etc/partitions. Finally, I added the bad blocks manually to /etc/partitions, then unmounted and remounted the device (I assume that /etc/partitions is read at mount time). The errors that I am still getting are "illegal or erroneous command, absolute sector 1364708".
My DMA was running at SYSCLK and I changed it to SYSCLK/2 (I have the AMI BIOS). The 1542 is jumpered in the default configuration, except that the floppies are disabled. Any ideas out there?
--
Larry Snyder, Northern Star Communications, Notre Dame, IN USA
uucp: larry@nstar -or- ...!iuvax!ndmath!nstar!larry
4 inbound dialup high speed line public access system
pcg@odin.cs.aber.ac.uk (Piercarlo Grandi) (03/04/90)
In article <511211@nstar.UUCP> larry@nstar.UUCP (Larry Snyder) writes:
> I tried adding the errors for the SCSI drive using mkpart -A <abs sector>
> and I get an error - No root partition. I also tried "mkpart -A <sec> -f
> /etc/partitions" - likewise the same error, no the bad blocks were not added
> to /etc/partitions.
This is completely wrong. You must first build the VTOC, with 'mkpart
-p', and then the 'rsrvd' and 'alts' partitionlets.
> Finally, I added the bad blocks manually to /etc/partitions, unmounted and
> remounted the device (I assume that /etc/partitions is read at mount time).
This is also completely wrong. Look at the last access time of
/etc/partitions using 'ls -ltur': the file is not read at mount time.
What happens with bad block handling is that the bad block table is an
array of block numbers in the 'rsrvd' partitionlet on the disc. When the
disc is mounted, the table is read into memory by the driver.
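As a rough illustration of that layout, here is a minimal Python sketch. The names (`Disc`, `rsrvd_table`, `Driver`, `map_block`) are hypothetical, and storing the table as (bad block, alternate block) pairs is an assumption; the real on-disc format is not described in this thread.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Disc:
    # Hypothetical bad block table in the 'rsrvd' partitionlet:
    # a list of (bad_block, alternate_block) pairs (format assumed).
    rsrvd_table: List[Tuple[int, int]] = field(default_factory=list)


class Driver:
    def __init__(self, disc: Disc) -> None:
        self.disc = disc
        self.incore: Dict[int, int] = {}

    def mount(self) -> None:
        # At mount time the driver copies the on-disc table into memory;
        # /etc/partitions plays no part here.
        self.incore = dict(self.disc.rsrvd_table)

    def map_block(self, block: int) -> int:
        # A known bad block is redirected to its alternate; others pass through.
        return self.incore.get(block, block)


if __name__ == "__main__":
    drv = Driver(Disc(rsrvd_table=[(1364708, 900000)]))  # alternate is made up
    drv.mount()
    print(drv.map_block(1364708), drv.map_block(12))  # 900000 12
```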
The on-disc table is *initialized* by mkpart, which reads the initial
map off the /etc/partitions file; after this, the contents of
/etc/partitions are ignored, unless you rebuild the VTOC.
You can print the current contents of the on-disc partition table
with 'mkpart -ta' by the way.
The kernel, on encountering an IO error, whether soft or hard, will
automatically find a spare block (from the 'alts' partitionlet), update
the in-core bad block table, and write it back to the 'rsrvd'
partitionlet.
If you want to manually add a bad block to the list, an ioctl, used
by 'mkpart -A', allows you to trigger the same mechanism yourself.
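A toy model of the mechanism as described in this post may help. It is a sketch only: the class and method names are hypothetical, spares are simply taken in order from a list, and `add_bad` merely stands in for the ioctl behind 'mkpart -A'.

```python
from typing import Dict, Iterable, List, Tuple


class AutoRevectorDriver:
    """Toy model of automatic revectoring as described above (names assumed)."""

    def __init__(self, rsrvd_table: Iterable[Tuple[int, int]], alts_free: Iterable[int]) -> None:
        self.rsrvd_table: List[Tuple[int, int]] = list(rsrvd_table)  # on-disc copy
        self.alts_free: List[int] = list(alts_free)                  # unused spares in 'alts'
        self.incore: Dict[int, int] = dict(self.rsrvd_table)         # loaded at mount

    def revector(self, block: int) -> int:
        if block in self.incore:  # already remapped: reuse its alternate
            return self.incore[block]
        if not self.alts_free:
            raise OSError("no spare blocks left in 'alts'")
        spare = self.alts_free.pop(0)
        self.incore[block] = spare
        # Write the updated table back to the 'rsrvd' partitionlet.
        self.rsrvd_table = sorted(self.incore.items())
        return spare

    def on_io_error(self, block: int, hard: bool) -> int:
        # As described, soft and hard errors are treated alike, so 'hard' is ignored.
        return self.revector(block)

    def add_bad(self, block: int) -> int:
        # Stand-in for the ioctl used by 'mkpart -A': same path, triggered by hand.
        return self.revector(block)
```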
The original System V.3.2 from AT&T had a catastrophic bug, documented
in the utilities release notes, which means that adding a bad block
number manually only updates the in-core kernel bad block table; the
on-disc bad block table is _not_ updated. This means that you will get
into big, big trouble, because the revectoring information will be lost
on every boot.
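The consequence of that bug can be shown with a few lines of Python. This is a hypothetical model, not the real driver: `BuggyV32Driver`, `boot` and `manual_add_bad` are illustrative names, and the alternate block number is made up.

```python
from typing import Dict


class BuggyV32Driver:
    """Toy model of the V.3.2 manual bad block bug; names are hypothetical."""

    def __init__(self) -> None:
        self.on_disc: Dict[int, int] = {}  # table in the 'rsrvd' partitionlet
        self.incore: Dict[int, int] = {}   # driver's in-memory copy

    def boot(self) -> None:
        # Every boot reloads the in-core table from disc.
        self.incore = dict(self.on_disc)

    def manual_add_bad(self, block: int, spare: int) -> None:
        # The bug: only the in-core table is updated; on_disc is never touched.
        self.incore[block] = spare


if __name__ == "__main__":
    d = BuggyV32Driver()
    d.boot()
    d.manual_add_bad(1364708, 900000)
    print(1364708 in d.incore)  # True: remapped for now
    d.boot()
    print(1364708 in d.incore)  # False: the remap is lost after reboot
```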
I seem to remember that ISC, like most other vendors, has not corrected
this bug; it may still be there in 2.01, but I don't know for sure, as
I am not familiar with ISC's latest releases.
This bug makes manual bad block assignment virtually impossible, or
dangerous (the only workaround is to keep a file with the bad block
information and reinvoke 'mkpart -A' for each of them at every boot;
woe befall you if you fail to keep this file up to date).
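The workaround could be scripted roughly as below. This is a sketch under stated assumptions: the list file and its one-sector-per-line format are hypothetical, the script only prints the 'mkpart -A <sector>' commands instead of running them, and any further arguments your mkpart needs (for example to select the disc) are not shown.

```python
import shlex
from typing import List


def parse_badblock_file(text: str) -> List[int]:
    """Hypothetical format: one absolute sector per line, '#' starts a comment."""
    sectors = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if line:
            sectors.append(int(line))
    return sectors


def mkpart_commands(sectors: List[int]) -> List[List[str]]:
    # One 'mkpart -A <sector>' per bad block, deduplicated and sorted.
    return [["mkpart", "-A", str(s)] for s in sorted(set(sectors))]


if __name__ == "__main__":
    example = "1364708   # from the console error message\n"
    for cmd in mkpart_commands(parse_badblock_file(example)):
        print(shlex.join(cmd))  # mkpart -A 1364708
```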
This bug, together with the automatic revectoring of soft errors, means
that System V.3.2 bad block handling is badly broken; most soft errors
should not be revectored, because they are transient (vibrations, etc.).
The best bad block handling system _logs_ IO errors, and then the system
administrator should revector blocks that either exhibit hard errors or
repeated soft errors.
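A minimal sketch of such a log-then-decide policy, in Python. The class name and the threshold of three soft errors for "repeated" are assumptions chosen for illustration; nothing here revectors anything, it only produces a list for the administrator.

```python
from collections import Counter
from typing import List, Set


class ErrorLog:
    """Log IO errors and suggest blocks for the administrator to revector."""

    def __init__(self, soft_threshold: int = 3) -> None:
        self.soft_threshold = soft_threshold  # assumed cut-off for "repeated"
        self.soft: Counter = Counter()
        self.hard: Set[int] = set()

    def record(self, block: int, hard: bool) -> None:
        # Only log; no block is revectored automatically.
        if hard:
            self.hard.add(block)
        else:
            self.soft[block] += 1

    def candidates(self) -> List[int]:
        # Hard errors always qualify; soft errors only once they repeat.
        repeated = {b for b, n in self.soft.items() if n >= self.soft_threshold}
        return sorted(self.hard | repeated)
```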
As to what is giving you all those errors on the disc, I cannot help
you a lot, as I am not familiar with your device. I will observe, however,
that many SCSI discs will, regrettably (but in this particular case
usefully), automatically revector bad blocks to give the illusion of
a defect-free volume. Your disc/controller will probably allow you to
format the disc reserving a sector per track as spare.
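To show what reserving a spare sector per track buys you, here is a sketch of one common drive-internal technique, sector slipping within a track. Whether a particular drive slips sectors or remaps to spares elsewhere varies; the function name and the geometry used in any example (2 heads, 17 sectors per track) are hypothetical.

```python
from typing import Dict, Optional, Set, Tuple


def logical_to_physical(
    lba: int,
    heads: int,
    sectors_per_track: int,
    spares_per_track: int = 1,
    defects: Optional[Dict[Tuple[int, int], Set[int]]] = None,
) -> Tuple[int, int, int]:
    """Map a logical sector to (cylinder, head, physical sector), slipping
    past defective sectors on a track that has spares reserved."""
    usable = sectors_per_track - spares_per_track
    track, offset = divmod(lba, usable)
    cylinder, head = divmod(track, heads)
    bad = (defects or {}).get((cylinder, head), set())
    # Logical sectors occupy the good physical sectors in order.
    good = [s for s in range(sectors_per_track) if s not in bad]
    if len(good) < usable:
        raise ValueError(f"track {(cylinder, head)} has more defects than spares")
    return cylinder, head, good[offset]
```

With a defect at physical sector 3 of the first track, logical sector 3 lands on physical sector 4 and the last logical sector of that track uses the spare.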
--
Piercarlo "Peter" Grandi | ARPA: pcg%cs.aber.ac.uk@nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth | UUCP: ...!mcvax!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk
darryl@ism780c.isc.com (Darryl Richman) (03/06/90)
There seems to be much confusion as to how mkpart, /etc/partitions, and bad block handling work on 386/ix. There is nothing magical about it, and the whole system is rather manual in nature, so that nothing really happens unless the user drives it.
When the disk driver first opens a disk, it reads in the pdinfo, the vtoc, and, if indicated by the alts_ptr field, an alternates table. The alternates indicated are then maintained in an in-core table by the driver until either a last close occurs or a V_REMOUNT ioctl succeeds. (V_REMOUNT can only succeed if no other partitions on the disk are open.) So in general, remaps do not take effect right away, but rather after the next reboot.
The contents of /etc/partitions are never used except by mkpart. Mkpart strictly manipulates the on-disk structures such as the pdinfo, vtoc, and alternates. It does attempt to perform a V_REMOUNT afterwards, but this will fail on a normally running system. (If this were not prevented, imagine what would happen to anything using the disk if things like the vtoc were to change at a random point: partitions might change size or location, or even disappear.)
By running mkpart -A, you update the on-disk structure. When you next perform a first open (e.g., reboot), these bad sectors are then remapped. This is a better approach than automatic remapping, because automatic remapping would prevent the user from attempting to reread the bad sector and possibly retrieving the data.
AT&T had originally expected to use the V_ADDBAD ioctl to update the driver's list of bad blocks in memory, but the V.3.2.0 version, on which 386/ix 2.0.2 is based, did not include this functionality. (I believe that they have added it in a later release.) The upcoming 2.2 release of 386/ix will have the V_ADDBAD capability as well as a new alternate sectoring scheme that does not rely on fixed table sizes (which have proven to be too small).
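The open/close and V_REMOUNT rules described above can be modelled in a few lines of Python. This is a hypothetical sketch, not the 386/ix driver interface: V_REMOUNT is a plain method here, the "whole-disk" partition name is made up, and the caller supplies the alternate sector, whereas the real mkpart picks it itself.

```python
from typing import Dict, Optional, Set


class DiskDriverModel:
    """Toy model of the described behaviour; V_REMOUNT is a plain method here."""

    def __init__(self) -> None:
        self.on_disc_alts: Dict[int, int] = {}             # written by mkpart
        self.incore_alts: Optional[Dict[int, int]] = None  # loaded at first open
        self.open_parts: Set[str] = set()

    def open(self, part: str) -> None:
        if not self.open_parts:  # first open: read the alternates table
            self.incore_alts = dict(self.on_disc_alts)
        self.open_parts.add(part)

    def close(self, part: str) -> None:
        self.open_parts.discard(part)
        if not self.open_parts:  # last close: drop the in-core table
            self.incore_alts = None

    def v_remount(self, caller: str) -> bool:
        # Succeeds only if no partition other than the caller's is open.
        if self.open_parts - {caller}:
            return False
        self.incore_alts = dict(self.on_disc_alts)
        return True

    def mkpart_add(self, bad: int, alt: int) -> bool:
        # 'mkpart -A': update the on-disk table, then attempt a V_REMOUNT.
        self.open("whole-disk")  # hypothetical partition name
        try:
            self.on_disc_alts[bad] = alt
            return self.v_remount("whole-disk")
        finally:
            self.close("whole-disk")
```

On a running system (root open) the remount fails and the new entry only becomes active at the next first open, which matches the "after the next reboot" behaviour described in the post.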
--
Darryl Richman
Copyright (c) 1990 Darryl Richman. The views expressed are the author's alone.
darryl@ism780c.isc.com, INTERACTIVE Systems Corp.-A Kodak Company
"For every problem, there is a solution that is simple, elegant, and wrong." -- H. L. Mencken
keithe@tekgvs.LABS.TEK.COM (Keith Ericson) (03/06/90)
In article <511211@nstar.UUCP> larry@nstar.UUCP (Larry Snyder) writes:
> I tried adding the errors for the SCSI drive using mkpart -A <abs sector>
> and I get an error - No root partition...
Bad blocks? On a _SCSI_ drive? The SCSI drives I've tried (all were various CDC/Imprimis) all handle the bad blocks internally and present an error-free appearance to the "outside world" (i.e., the SCSI bus).
Did I just get lucky, or did Larry get stuck with a less-than-wonderful drive of some kind?
kEITHe
cpcahil@virtech.uucp (Conor P. Cahill) (03/06/90)
In article <39560@ism780c.isc.com> darryl@ism780c.UUCP (Darryl Richman) writes:
>This is a better approach than automatically remapping because that
>prevents the user from attempting to reread the bad sector and possibly
>retrieving the data.
PLEASE do not add automatic remapping of bad blocks to 386/ix.
Once upon a time I happened to be using Bell Technologies System V Rel 3.2 (which had automatic remapping), and a static charge caused the disk controller to report a bad read for any blocks that were in the process of being read or written at the time. This wreaked havoc on the file system when portions of the root, etc, and usr directories were automatically remapped. Even portions of several executables were hit. This is a real pain in the a**.
The capability to add entries to the bad block table while running is a good one, but PLEASE DO NOT MAKE IT AN AUTOMATIC PART OF THE DISK DRIVER.
--
Conor P. Cahill (703)430-9247
Virtual Technologies, Inc., uunet!virtech!cpcahil
46030 Manekin Plaza, Suite 160, Sterling, VA 22170
root@maxed.amg.com (0000-Admin(0000)) (03/07/90)
In article <7027@tekgvs.LABS.TEK.COM> keithe@tekgvs.LABS.TEK.COM (Keith Ericson) writes:
>In article <511211@nstar.UUCP> larry@nstar.UUCP (Larry Snyder) writes:
>
>> I tried adding the errors for the SCSI drive using mkpart -A <abs sector>
>> and I get an error - No root partition...
>
>Bad blocks? On a _SCSI_ drive? The SCSI drives I've tried (all were various
>CDC/Imprimis) all handle the bad blocks internally and present an error-free
>appearance to the "outside world" (i.e., the SCSI bus).
>
>Did I just get lucky, or did Larry get stuck with a less-than-wonderful drive
>of some kind?
A SCSI drive is inherently bad-block-free as far as the operating system is concerned, so his problem is not what it appears to be.
--
Ed Whittemore, uunet!maxed!ed
American Micro Group, 201 944 3293
darryl@ism780c.isc.com (Darryl Richman) (03/10/90)
In article <1990Mar6.141232.668@virtech.uucp> cpcahil@virtech.UUCP (Conor P. Cahill) writes:
"In article <39560@ism780c.isc.com> darryl@ism780c.UUCP (Darryl Richman) writes:
">This is a better approach than automatically remapping because that
">prevents the user from attempting to reread the bad sector and possibly
">retrieving the data.
"
"PLEASE do not add automatic remapping of bad blocks to 386/ix. Once upon a
Clearly, I didn't make myself understood. I merely mentioned automatic remapping in passing; we have no intention of doing it. Sorry.
--
Darryl Richman
Copyright (c) 1990 Darryl Richman. The views expressed are the author's alone.
darryl@ism780c.isc.com, INTERACTIVE Systems Corp.-A Kodak Company
"For every problem, there is a solution that is simple, elegant, and wrong." -- H. L. Mencken