[comp.sys.3b1] Hard Disk Repair Tricks on a Two-Drive System

vince@tc.fluke.COM (Craig Johnson) (03/26/91)

This posting covers several tricks I came up with while trying to repair
a 20M Miniscribe hard disk by connecting it to a two-drive 3B1 with the
bad disk set up as the second drive.  Most or all of these tricks can be
applied to hard disk repairs on a single-drive system if you boot from a
floppy file system and have the appropriate utilities available on it.

Remember to Set The Drive Select Jumpers
----------------------------------------
   When using a two-drive system for hard disk repairs, don't forget to
   change the drive select jumpers or DIP switches on the bad drive so
   that it is selected as the second drive.

Partition Assignments
---------------------
   Those of you with two-drive systems should already know this, but
   the bad drive will be seen residing at the /dev entries [r]fp010 -
   [r]fp012.  These devices can be accessed as follows,

	rfp010	- Use with iv commands to do formats, surface checks, etc.
	rfp011	- Can build tmp file system here, e.g. "mkfs /dev/rfp011"
	rfp012	- Use with fsck, fsdb, and ncheck
	fp010	- (not used)
	fp011	- Use to mount tmp file system, e.g. "mount /dev/fp011 /mnta"
	fp012	- Use to mount the broken file system and recover files

Mounting the Bad Disk to Access its Binaries
--------------------------------------------
   You may find you want to execute programs still on the bad hard disk,
   particularly if you are trying to repair a disk on a single-drive
   system by booting up from a floppy disk.  If you mount the disk and
   are careful to only read from it, you should not do any additional
   damage, and most of the files should still be readable unless the
   disk is really in bad shape.  You may want to execute something like,

	PATH=/mntb/usr/bin:$PATH

   to add the hard disk binary directories to your PATH.

Disk Error Logging
------------------
   When accessing the second drive through shell commands (and iv in
   particular) disk errors are logged to /usr/adm/unix.log.  Look for bit
   0x0008 in the "SDH:" value to see if errors are associated with drive 0
   or drive 1.  I found this information invaluable, and I know of no way
   to get the same information when exercising the disk using the
   diagnostic disk commands.
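
   As a minimal sketch of that check, the shell function below tests bit
   0x0008 of an SDH value copied by hand from a log entry.  The function
   name is made up for illustration, and the two sample values are not
   taken from a real log.  Reading a set bit as drive 1 is an assumption
   based on the drive-select field of the WD1010 SDH register, and the
   $(( )) arithmetic needs a POSIX shell, which the 3B1's own shell may
   not be.

```shell
# sdh_drive VALUE: print 1 if bit 0x0008 of the SDH value is set, else 0.
# Assumption: a set bit means the second drive (drive 1).
sdh_drive() {
    if [ $(( $1 & 0x0008 )) -ne 0 ]; then
        echo 1
    else
        echo 0
    fi
}

sdh_drive 0x28    # bit 0x0008 set: prints 1
sdh_drive 0x20    # bit 0x0008 clear: prints 0
```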

Formatting the Second Drive
---------------------------
   Iv knows how to format the second drive, and even says "formatting
   second drive" when doing it.  However, the disk drive descriptor file
   is required to have "HD2" specified on the "type" line.  For example,
   for the 20M Miniscribe 3425,

	type            HD2
	name            WINCHE
	cylinders       612
	heads           4
	sectors         17
	steprate        0
	$
	badblocktable   1
	loader          /usr/lib/iv/loader
	$
	$
	0
	4
	504
	$
	$
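
   As a quick sanity check on a descriptor like this one, the geometry
   lines can be multiplied out to confirm that they match the drive's
   rated size.  The sketch below assumes 512-byte sectors, which the
   descriptor itself does not state, and a POSIX shell for the $(( ))
   arithmetic; the function name is illustrative.

```shell
# disk_bytes CYLINDERS HEADS SECTORS: capacity implied by the geometry.
# Assumption: 512 bytes per sector.
disk_bytes() {
    echo $(( $1 * $2 * $3 * 512 ))
}

disk_bytes 612 4 17    # prints 21307392, i.e. roughly 20M
```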

Reformatting Just the Swap Partition
------------------------------------
   If you have a disk whose swap partition has gone bad, but is otherwise
   in good shape, you can choose to reformat just the swap partition.  This
   takes a couple of steps, because you are going to fool iv into thinking
   the disk is smaller than it really is while it is formatting.  Start by
   obtaining the disk descriptor using "iv -d".  You get a descriptor that
   looks similar to the one above, but perhaps with some bad block
   entries.  Next, figure out how many cylinders are actually used up
   through the end of the swap partition.  For the example above, 504
   tracks divided by 4 tracks per cylinder equals 126 cylinders.  Edit the
   descriptor so that the smaller number of cylinders is specified and so
   that only two partitions are defined.  I chose to decrement my cylinder
   count by one just in case there was a boundary condition bug in my (or
   iv's) reasoning.  This simply results in one cylinder not being
   reformatted (it keeps its original format), which shouldn't be a
   problem unless you customarily use up all your swap space.  Using the
   same disk example from above,

	type            HD2
	name            WINCHE
	cylinders       125
	heads           4
	sectors         17
	steprate        0
	$
	badblocktable   1
	loader          /usr/lib/iv/loader
	$
	$
	0
	4
	$
	$

   Now when you run format, specify "iv -i /dev/rfp010 fake_desc", where
   "fake_desc" is the one you just edited.  After formatting, iv will
   rewrite the VHB and loader tracks.  If you run iv -t, you will see that
   iv now thinks the disk is smaller.  To complete the process and regain
   use of the whole disk, you run iv -u with the original descriptor
   specified (the 612 cylinder descriptor in the example).  This simply
   causes the correct descriptor to be rewritten to the VHB and that
   redefines the disk size and partitions properly.
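
   The cylinder arithmetic for the fake descriptor can be sketched as a
   small shell function.  The function name and the safety-margin
   argument are illustrative, and the $(( )) syntax assumes a POSIX shell
   rather than the 3B1's original one.

```shell
# fake_cylinders END_TRACK HEADS MARGIN
#   END_TRACK: first track after the swap partition (504 in the example)
#   HEADS:     tracks per cylinder
#   MARGIN:    cylinders to hold back as a safety margin
# Integer division rounds down, so a partial cylinder is never included.
fake_cylinders() {
    echo $(( $1 / $2 - $3 ))
}

fake_cylinders 504 4 1    # prints 125, the value used in the fake descriptor
```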

   You can use the same method of formatting a small portion of the disk
   to reformat just the VHB and loader tracks too.  This might be useful
   if you want to change your loader to the verbose loader,
   /usr/lib/iv/s4load.verbose.

Building a Temporary File System on /dev/rfp011
-----------------------------------------------
   While the bad disk is connected as the second drive, its swap partition
   is unused.  You can reformat it as described above, then build a file
   system on it to gain several Meg. of temporary storage.  Simply run
   "mkfs /dev/rfp011", then "mount /dev/fp011 /mnta" to build the file
   system and mount it.  I found it very useful to copy junky, hard to
   read stuff from the bad partition to the temporary partition prior to
   writing it out to floppies.  Note that if cpio fails on a read while
   archiving a file, the archive is not usable beyond the point of the
   failing file.  Cpio apparently commits itself by writing a file header
   to the output stream before testing whether it can actually read the
   file, and then isn't smart enough to write out a null file body.
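
   One way to sidestep the cpio problem while staging files is to copy
   them one at a time and keep a list of the ones that fail, so that a
   single unreadable file costs only itself.  The following is a sketch
   for a modern POSIX shell (mkdir -p, read -r, and $( ) may not exist on
   a stock 3B1); the function name and the failure-list argument are
   hypothetical.

```shell
# copy_readable SRC DST LOG: copy each regular file under SRC to the same
# relative path under DST, one file at a time.  Files that cannot be read
# are removed from DST and their names are appended to LOG.
copy_readable() {
    src=$1; dst=$2; log=$3
    : > "$log"
    ( cd "$src" && find . -type f -print ) | while IFS= read -r f; do
        mkdir -p "$dst/$(dirname "$f")"
        if ! cp "$src/$f" "$dst/$f" 2>/dev/null; then
            rm -f "$dst/$f"          # drop any partial copy
            echo "$f" >> "$log"
        fi
    done
}
```

   With the mount points used above, a call might look like

	copy_readable /mntb/usr /mnta/usr /mnta/failed.list

   after which the staged copy on /mnta can be archived to floppies
   without cpio tripping over the unreadable files.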

Surface Checks
--------------
   Once you've copied all the files you want (or can get) off the bad disk
   you can reformat the whole disk and run surface tests on it.  The
   iv command "iv -sw[l]" works great, and again any disk errors occurring
   get logged to /usr/adm/unix.log.  The -l (long) option causes the test
   to repeat 10 times.  It takes about 3 hours on a 20M disk.
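
   To see only the log entries produced by one surface check, note the
   length of the log beforehand and print whatever was appended
   afterwards.  This is a sketch: the helper names are made up, and
   "awk -v" assumes a reasonably modern awk.

```shell
# log_lines LOG: number of lines currently in LOG (0 if it does not exist).
log_lines() {
    if [ -f "$1" ]; then
        wc -l < "$1" | tr -d ' '
    else
        echo 0
    fi
}

# new_log_entries LOG BEFORE: print the lines of LOG that follow line BEFORE.
new_log_entries() {
    awk -v n="$2" 'NR > n' "$1"
}
```

   For example,

	before=`log_lines /usr/adm/unix.log`
	(run the surface check here)
	new_log_entries /usr/adm/unix.log $before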

DRUN Patch
----------
   As I mentioned in a previous posting, if you are getting disk errors
   appearing in unix.log after formatting a disk and running surface
   checks on it, you may need to install the DRUN rework on your system.
   This applies regardless of whether you have one or two drives, and a
   WD1010 or a WD2010 disk controller chip.  I was getting about a dozen
   disk errors each time I ran a surface check until I installed the DRUN
   rework.  Since then I haven't seen a single disk error appear in the
   log.


I hope you find this information helpful and can find it to refer to when
you need it.  I know it would have saved me some time.  Also, I'd like to
see someone post a decent tutorial on the use of fsck and fsdb to repair
bad disks.  I mean a disk with unreadable blocks, some of which might be
directory blocks.  I've found a way that didn't work.  I'd like to see
if someone knows a better way.

---
	Craig V. Johnson		...!fluke!vince
	John Fluke Mfg. Co.			or
	Everett, WA			vince@tc.fluke.com

DISCLAIMER (I suppose it's necessary): Muck with your hard disks at your
own risk.  Don't believe anything I said, and legally we will both be
happy.  Nothing stated in this posting has anything to do with John Fluke
Mfg. Co., so leave them out of it.

floyd@ims.alaska.edu (Floyd Davidson) (03/26/91)

In article <1991Mar26.074933.868@tc.fluke.COM> vince@tc.fluke.COM (Craig Johnson) writes:
>This posting covers several tricks I came up with while trying to repair
>a 20M Miniscribe hard disk by connecting it to a two-drive 3B1 with the
>bad disk set up as the second drive.  Most or all of these tricks can be
>applied to hard disk repairs on a single-drive system if you boot from a
>floppy file system and have the appropriate utilities available on it.

A very interesting article.  It should be archived on osu-cis.

I don't think Craig had that in mind when he wrote it, but maybe
he could be encouraged to give it a once over edit for just that
purpose (Craig?).

Floyd

-- 
Floyd L. Davidson  |  floyd@ims.alaska.edu   |  Alascom, Inc. pays me
Salcha, AK 99714   |    Univ. of Alaska      |  but not for opinions.

dnichols@ceilidh.beartrack.com (DoN Nichols) (03/27/91)

In article <1991Mar26.074933.868@tc.fluke.COM> vince@tc.fluke.COM (Craig Johnson) writes:
>This posting covers several tricks I came up with while trying to repair

	[ ... ]

>Mounting the Bad Disk to Access its Binaries
>--------------------------------------------
>   You may find you want to execute programs still on the bad hard disk,
>   particularly if you are trying to repair a disk on a single-drive
>   system by booting up from a floppy disk.  If you mount the disk and
>   are careful to only read from it, you should not do any additional
>   damage, and most of the files should still be readable unless the
>   disk is really in bad shape.  You may want to execute something like,

	You can absolutely prevent writing on the bad disk by mounting it
read-only:

/etc/mount /dev/fp012 /mntb -r
                            ^^
Then you don't have to worry about the system modifying the contents of the
hard disk when it tries to update last-accessed and/or last-modified times.

>
>	PATH=/mntb/usr/bin:$PATH
>
>   to add the hard disk binary directories to your PATH.
>

	[ ... ]

>   You can use the same method of formatting a small portion of the disk
>   to just reformat the VHB and loader tracks too.  This might be useful
>   if you want to change your loader to be the verbose loader,
>   /usr/lib/iv/s4load.verbose.

	You can do this more simply with /etc/ldrcpy


>Building a Temporary File System on /dev/rfp011
>-----------------------------------------------
>   While the bad disk is connected as the second drive, its swap partition
>   is unused.  You can reformat it as described above, then build a file
>   system on it to gain several Meg. of temporary storage.  Simply run
>   "mkfs /dev/rfp011", then "mount /dev/fp011 /mnta" to build the file
>   system and mount it.  I found it very useful to copy junky, hard to

	Yes, I use this partition mounted as /tmp, which keeps a greedy
compile or other operation from running the root file system out of space.
(I have also formatted the drive to assign more than the default to these
partitions on both disks - I like more swap space for those BIG compiles
like groff, which still runs me out of virtual memory on one module unless I
turn off the -o option to the compiler.)

>   read stuff from the bad partition to the temporary partition prior to
>   writing it out to floppies.  Note, cpio archives beyond the point of
>   the failing file are not usable if cpio fails on a read while trying
>   to archive a file.  Cpio apparently commits itself with a file header
>   on the output stream prior to testing to see if it can actually read
>   the file, and then isn't smart enough to write out a null file body.

	Gnu tar is smart enough to seek another header entry if the current
header/file are corrupted.  You can't recover the file, but you can get at
what is past it.  I think that afio will do the same for you.

>Surface Checks
>--------------
>   Once you've copied all the files you want (or can get) off the bad disk
>   you can reformat the whole disk and run surface tests on it.  The
>   iv command "iv -sw[l]" works great, and again any disk errors occurring
>   get logged to /usr/adm/unix.log.  The -l (long) option causes the test
>   to repeat 10 times.  It takes about 3 hours on a 20M disk.

	NICE!  I've always used the diagnostic floppy (improved) for
formatting hard disks.

>DRUN Patch
>----------
>   As I mentioned in a previous posting, if you are getting disk errors
>   appearing in unix.log after formatting a disk and running surface
>   checks on it, you may need to install the DRUN rework on your system.
>   This applies regardless of whether you have one or two drives, and a
>   WD1010 or a WD2010 disk controller chip.  I was getting about a dozen
>   disk errors each time I ran a surface check until I installed the DRUN
>   rework.  Since then I haven't seen a single disk error appear in the
>   log.

	Another possible source of errors of this sort, if you have
installed the ICUS mods, is that SOME 74LS02 chips don't have adequate drive
for systems, even with the DRUN patch installed.  A 74S02 or 74F02 chip will
be sure to have adequate drive.

>
>I hope you find this information helpful and can find it to refer to when
>you need it.  I know it would have saved me some time.  Also, I'd like to
>see someone post a decent tutorial on the use of fsck and fsdb to repair
>bad disks.  I mean a disk with unreadable blocks, some of which might be
>directory blocks.  I've found a way that didn't work.  I'd like to see
>if someone knows a better way.

	Thanks!  Now, if someone could tell me how to make the system do a
format on a drive which it thinks won't re-cal because track 000 is wiped
(not physically destroyed, just erased, so the system can't find sector
headers to verify that it has reached track 000 after a re-cal request). :-)

	Good luck to all
		DoN.
-- 
Donald Nichols (DoN.)		| Voice (Days):	(703) 664-1585
D&D Data			| Voice (Eves):	(703) 938-4564
Disclaimer: from here - None	| Email:     <dnichols@ceilidh.beartrack.com>
	--- Black Holes are where God is dividing by zero ---