[comp.unix.i386] adding bad blocks using 386/ix

larry@nstar.UUCP (Larry Snyder) (02/28/90)

I recently installed a 1542 as a secondary controller along with
a 2372.  I'm booting off the 2372, mounting /usr off the 1542,
and then mounting /usr2 off the 1542.

I've been getting numerous errors on the SCSI drive.  When installing
the drive, I did a low level format using the 1542 BIOS and ran the
complete DMA check for 2 hours (which didn't produce a single error).

I tried adding the bad sectors for the SCSI drive using mkpart -A <abs sector>
and got the error "No root partition".  I also tried "mkpart -A <sec> -f
/etc/partitions" and got the same error; and no, the bad blocks were not added
to /etc/partitions.

Finally, I added the bad blocks manually to /etc/partitions, unmounted and
remounted the device (I assume that /etc/partitions is read at mount time).

The errors that I am still getting are "illegal or erroneous command, absolute
sector 1364708". 

My DMA was running at SYSCLK, and I changed it to SYSCLK/2 (I have the AMI
BIOS).  The 1542 is jumpered in the default configuration, except that the
floppy controller is disabled.

Any ideas out there?
 

 
-- 
          Larry Snyder, Northern Star Communications, Notre Dame, IN USA 
                uucp: larry@nstar -or- ...!iuvax!ndmath!nstar!larry
               4 inbound dialup high speed line public access system

pcg@odin.cs.aber.ac.uk (Piercarlo Grandi) (03/04/90)

In article <511211@nstar.UUCP> larry@nstar.UUCP (Larry Snyder) writes:

   I tried adding the errors for the SCSI drive using mkpart -A <abs sector>
   and I get an error - No root partition.  I also tried "mkpart -A <sec> -f
   /etc/partitions" - likewise the same error, no the bad blocks were not added
   to /etc/partitions.  

This is completely wrong. You must first build the VTOC, with 'mkpart
-p', and then the 'rsrvd' and 'alts' partitionlets.

   Finally, I added the bad blocks manually to /etc/partitions, unmounted and
   remounted the device (I assume that /etc/partitions is read at mount time).

This is also completely wrong: /etc/partitions is not read at mount time.
Look at its 'last accessed time' using 'ls -ltur'.

What happens with bad block handling is that the bad block table is an
array of block numbers in the 'rsrvd' partitionlet on the disc. When the
disc is mounted, the table is read into memory by the driver.
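To make that concrete, here is a minimal Python model of what the driver
does with the table once it is in memory. It assumes the table is a list of
(bad sector, alternate sector) pairs; the real on-disc layout in 'rsrvd' is
not described here, and the function names are hypothetical.

```python
# Hypothetical model of a driver's in-core bad block table.
# Assumption: the on-disc table in 'rsrvd' is a list of
# (bad_sector, alternate_sector) pairs; the real layout may differ.

def load_bad_block_table(pairs):
    """Build the in-core remap table, as the driver would at mount time."""
    table = {}
    for bad, alternate in pairs:
        table[bad] = alternate
    return table


def translate(table, sector):
    """Return the sector actually used for an I/O request.

    Sectors listed in the table are redirected to their alternate;
    every other sector passes through unchanged.
    """
    return table.get(sector, sector)
```

For example, with a made-up alternate, load_bad_block_table([(1364708,
2000000)]) makes translate() send requests for sector 1364708 to 2000000
while leaving sector 1364709 alone.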

The on-disc table is *initialized* by mkpart, which reads the initial
map off the /etc/partitions file; after this, the contents of
/etc/partitions are ignored, unless you rebuild the VTOC.

You can print the current contents of the on-disc partition table
with 'mkpart -ta' by the way.

The kernel, on encountering an IO error, whether soft or hard, will
automatically find a spare block (from the 'alts' partitionlet), update
the in-core bad block table, and write it back to the 'rsrvd'
partitionlet.
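A sketch of that automatic path, under stated assumptions: spares are handed
out from 'alts' in order, and a plain dictionary stands in for the copy kept
in 'rsrvd'. The class and method names are illustrative, not the System V.3.2
driver interface.

```python
class RevectoringDriver:
    """Hypothetical model of automatic revectoring on any I/O error."""

    def __init__(self, spares):
        self.free_spares = list(spares)  # unused sectors in 'alts'
        self.in_core = {}                # bad sector -> alternate sector
        self.on_disc = {}                # stand-in for the table in 'rsrvd'

    def on_io_error(self, sector):
        """Revector a sector after an error, whether soft or hard."""
        if sector in self.in_core:       # already revectored
            return self.in_core[sector]
        if not self.free_spares:
            raise RuntimeError("no spare sectors left in 'alts'")
        alternate = self.free_spares.pop(0)
        self.in_core[sector] = alternate
        self.on_disc = dict(self.in_core)  # write the table back to disc
        return alternate
```

Note that nothing here distinguishes a transient soft error from a real
defect: every error permanently consumes a spare.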

If you want to add a bad block to the list manually, an ioctl (used
by 'mkpart -A') allows you to trigger the mechanism yourself.

The original System V.3.2 from AT&T had a catastrophic bug, documented
in the utilities release notes: adding a bad block number manually
only updates the in-core kernel bad block table; the on-disc bad block
table is _not_ updated. This means that you will get into big, big
trouble, because the revectoring information will be lost on every
boot.

I seem to remember that ISC, like most other vendors, has not corrected
this bug; it may still be there in 2.01, but I don't know for sure, as
I am not familiar with ISC's latest releases.

This bug makes manual bad block assignment virtually impossible, or
dangerous (the only workaround is to keep a file with the bad block
information and reinvoke 'mkpart -A' for each of them at every boot up;
woe befall you if you fail to update this file correctly).
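A sketch of the bookkeeping half of that workaround, expressed in Python for
clarity (on a real 386/ix system it would be a few lines in an rc script).
The file format, one absolute sector number per line with '#' comments, is an
assumption, as is the idea that 'mkpart -A <sector>' needs no further
arguments on your system; check your own mkpart manual page.

```python
# Hypothetical boot-time helper for re-applying manual bad block entries.

def parse_bad_block_file(text):
    """Return sorted, de-duplicated sector numbers from the list file."""
    sectors = set()
    for lineno, line in enumerate(text.splitlines(), start=1):
        line = line.split("#", 1)[0].strip()  # drop comments and blanks
        if not line:
            continue
        if not line.isdigit():
            raise ValueError(f"line {lineno}: not a sector number: {line!r}")
        sectors.add(int(line))
    return sorted(sectors)


def mkpart_commands(sectors):
    """Build one 'mkpart -A <sector>' argument list per bad sector."""
    return [["mkpart", "-A", str(s)] for s in sectors]
```

Each argument list could then be run at boot with subprocess.run(cmd,
check=True); rejecting malformed lines and duplicates up front is one way to
reduce the risk of getting the file wrong.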

This bug, and the automatic revectoring of soft errors, mean that System
V.3.2 bad block handling is badly broken; most soft errors should not
be revectored, because they are transient (vibrations, etc...). The best
bad block handling system _logs_ IO errors and leaves it to the system
administrator to revector blocks that exhibit either hard errors or
repeated soft errors.
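A minimal sketch of such a log-then-decide policy. The threshold of three
soft errors is an arbitrary assumption, and the class is hypothetical; the
point is that logging and revectoring are separate steps, with the final
decision left to the administrator.

```python
from collections import Counter


class ErrorLog:
    """Record I/O errors and suggest, but never perform, revectoring."""

    def __init__(self, soft_threshold=3):
        self.soft_threshold = soft_threshold  # assumed cutoff for "repeated"
        self.soft_errors = Counter()          # sector -> soft error count
        self.hard_errors = set()

    def record(self, sector, hard):
        if hard:
            self.hard_errors.add(sector)
        else:
            self.soft_errors[sector] += 1

    def candidates(self):
        """Sectors with a hard error or repeated soft errors."""
        repeated = {s for s, n in self.soft_errors.items()
                    if n >= self.soft_threshold}
        return sorted(self.hard_errors | repeated)
```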


As to what is giving you all those errors on the disc, I cannot help
you a lot. I am not familiar with your device. I will observe however
that many SCSI discs will, regrettably (but in your case usefully),
automatically revector bad blocks to give the illusion of a defect-free
volume. Your disc/controller will probably allow you to format the disc
reserving a sector per track as spare.
--
Piercarlo "Peter" Grandi           | ARPA: pcg%cs.aber.ac.uk@nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth        | UUCP: ...!mcvax!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk

darryl@ism780c.isc.com (Darryl Richman) (03/06/90)

There seems to be much confusion about how mkpart, /etc/partitions, and
bad block handling work on 386/ix.  There is nothing magical about it:
the whole system is rather manual in nature, and nothing really
happens unless the user drives it.

When the disk driver first opens a disk, it reads in the pdinfo, vtoc,
and if indicated by the alts_ptr field, an alternates table.  The
alternates indicated are then maintained in an incore table by the
driver, until either a last close occurs or a V_REMOUNT ioctl is
successful.  (V_REMOUNT can only be successful if no other partitions
on the disk are open.) So in general, remaps do not occur right away,
but rather after the next reboot.

The contents of /etc/partitions are never used except by mkpart.

Mkpart strictly manipulates the on-disk structures such as the pdinfo,
vtoc, and alternates.  It does attempt to perform a V_REMOUNT
afterwards, but this will fail on a normally running system.  (If this
were not prevented, imagine what would happen to anything using the
disk if things like the vtoc were to change at a random point:
partitions might change size or location, or even disappear.)  By
running mkpart -A, you update the on-disk structure.  When you next
perform a first open (e.g., reboot), these bad sectors are then
remapped.
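The sequence described above can be modelled in a few lines of Python. This
is a hypothetical sketch: only V_REMOUNT and 'mkpart -A' come from the
description, the other names are invented, and partitions are reduced to
open counts.

```python
class DiskDriver:
    """Model: alternates are loaded only on first open or V_REMOUNT."""

    def __init__(self):
        self.on_disc_alts = {}     # alternates table stored on the disk
        self.in_core_alts = {}     # driver's copy, used for remapping
        self.open_partitions = {}  # partition -> open count

    def open(self, partition):
        if not any(self.open_partitions.values()):  # first open of the disk
            self.in_core_alts = dict(self.on_disc_alts)
        self.open_partitions[partition] = self.open_partitions.get(partition, 0) + 1

    def close(self, partition):
        # After the last close, the next open() reloads the table.
        self.open_partitions[partition] -= 1

    def v_remount(self, caller):
        """Succeed only if no partition other than the caller's is open."""
        if any(n for p, n in self.open_partitions.items() if p != caller):
            return False
        self.in_core_alts = dict(self.on_disc_alts)
        return True


def mkpart_add_bad(driver, sector, alternate, caller):
    """Update the on-disk table, then attempt V_REMOUNT."""
    driver.on_disc_alts[sector] = alternate
    return driver.v_remount(caller)
```

On a running system the root partition is open, so the V_REMOUNT attempt
fails and the new entry only takes effect at the next first open, which in
practice means the next reboot.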

This is a better approach than automatic remapping, because automatic
remapping prevents the user from attempting to reread the bad sector and
possibly retrieving the data.

AT&T had originally expected to use the V_ADDBAD ioctl to update the
driver's list of bad blocks in memory, but the V.3.2.0 version, on which
386/ix 2.0.2 is based, did not include this functionality.  (I believe
that they have added it in a later release.)  The upcoming 2.2 release
of 386/ix will have the V_ADDBAD capability as well as a new alternate
sectoring scheme that does not rely on fixed table sizes (that have
proven to be too small).

		--Darryl Richman

-- 
Copyright (c) 1990 Darryl Richman    The views expressed are the author's alone
darryl@ism780c.isc.com 		      INTERACTIVE Systems Corp.-A Kodak Company
 "For every problem, there is a solution that is simple, elegant, and wrong."
	-- H. L. Mencken

keithe@tekgvs.LABS.TEK.COM (Keith Ericson) (03/06/90)

In article <511211@nstar.UUCP> larry@nstar.UUCP (Larry Snyder) writes:

>   I tried adding the errors for the SCSI drive using mkpart -A <abs sector>
>   and I get an error - No root partition...

Bad blocks?  On a _SCSI_ drive?  The SCSI drives I've tried (all were various
CDC/Imprimis) all handle the bad blocks internally and present an error-free
appearance to the "outside world" (i.e., the SCSI bus).

Did I just get lucky, or did Larry get stuck with a less-than-wonderful drive
of some kind?

kEITHe

cpcahil@virtech.uucp (Conor P. Cahill) (03/06/90)

In article <39560@ism780c.isc.com> darryl@ism780c.UUCP (Darryl Richman) writes:
>This is a better approach than automatically remapping because that
>prevents the user from attempting to reread the bad sector and possibly
>retrieving the data.

PLEASE do not add automatic remapping of bad blocks to 386/ix.  Once upon a
time I happened to be using Bell Technologies System V Rel 3.2 (which had
automatic re-mapping) and a static charge caused the disk controller to 
report a bad read for any blocks that were currently in process of
being read/written.  This wreaked havoc on the file system when portions
of the root, /etc, and /usr directories were automatically remapped.
Even portions of several executables were hit.

This is a real pain in the a**.  The capability to add entries to the
bad block table while running is a good capability, but PLEASE DO NOT
MAKE IT AN AUTOMATIC PART OF THE DISK DRIVER.


-- 
Conor P. Cahill            (703)430-9247        Virtual Technologies, Inc.,
uunet!virtech!cpcahil                           46030 Manekin Plaza, Suite 160
                                                Sterling, VA 22170 

root@maxed.amg.com (0000-Admin(0000)) (03/07/90)

In article <7027@tekgvs.LABS.TEK.COM> keithe@tekgvs.LABS.TEK.COM (Keith Ericson) writes:
>In article <511211@nstar.UUCP> larry@nstar.UUCP (Larry Snyder) writes:
>
>>   I tried adding the errors for the SCSI drive using mkpart -A <abs sector>
>>   and I get an error - No root partition...
>
>Bad blocks?  On a _SCSI_ drive?  The SCSI drives I've tried (all were various
>CDC/Imprimis) all handle the bad blocks internally and present an error-free
>appearance to the "outside world" (i.e., the SCSI bus).
>
>Did I just get lucky, or did Larry get stuck with a less-than-wonderful drive
>of some kind?

A SCSI drive is inherently free of bad blocks as far as the operating
system is concerned, so his problem is not what it appears to be.
-- 
 Ed Whittemore 		uunet!maxed!ed
 American Micro Group 		201 944 3293

darryl@ism780c.isc.com (Darryl Richman) (03/10/90)

In article <1990Mar6.141232.668@virtech.uucp> cpcahil@virtech.UUCP (Conor P. Cahill) writes:
"In article <39560@ism780c.isc.com> darryl@ism780c.UUCP (Darryl Richman) writes:
">This is a better approach than automatically remapping because that
">prevents the user from attempting to reread the bad sector and possibly
">retrieving the data.
"
"PLEASE do not add automatic remapping of bad blocks to 386/ix.  Once upon a

Clearly, I didn't make myself understood.  I merely mentioned this possibility
in passing;  we have no intention of doing it.  Sorry.

		--Darryl Richman
-- 
Copyright (c) 1990 Darryl Richman    The views expressed are the author's alone
darryl@ism780c.isc.com 		      INTERACTIVE Systems Corp.-A Kodak Company
 "For every problem, there is a solution that is simple, elegant, and wrong."
	-- H. L. Mencken