vince@tc.fluke.COM (Craig Johnson) (03/26/91)
This posting covers several tricks I came up with while trying to repair
a 20M Miniscribe hard disk by connecting it to a two-drive 3B1 with the
bad disk set up as the second drive. Most or all of these tricks can be
applied to hard disk repairs on a single-drive system if you boot from a
floppy file system and have the appropriate utilities available on it.

Remember to Set The Drive Select Jumpers
----------------------------------------
    When using a two-drive system for hard disk repairs, don't forget to
    change the drive select jumpers or DIP switches to specify the bad
    drive as the second drive.

Partition Assignments
---------------------
    Those of you with two-drive systems should already know this, but the
    bad drive will be seen residing at the /dev entries [r]fp010 -
    [r]fp012. These devices can be accessed as follows,

        rfp010 - Use with iv commands to do formats, surface checks, etc.
        rfp011 - Can build tmp file system here, e.g. "mkfs /dev/rfp011"
        rfp012 - Use with fsck, fsdb, and ncheck
        fp010  - (not used)
        fp011  - Use to mount tmp file system, e.g. "mount /dev/fp011 /mnta"
        fp012  - Use to mount the broken file system and recover files

Mounting the Bad Disk to Access its Binaries
--------------------------------------------
    You may find you want to execute programs still on the bad hard disk,
    particularly if you are trying to repair a disk on a single-drive
    system by booting up from a floppy disk. If you mount the disk and
    are careful to only read from it, you should not do any additional
    damage, and most of the files should still be readable unless the
    disk is really in bad shape. You may want to execute something like,

        PATH=/mntb/usr/bin:$PATH

    to add the hard disk binary directories to your PATH (here the bad
    file system is mounted on /mntb).

Disk Error Logging
------------------
    When accessing the second drive through shell commands (and iv in
    particular) disk errors are logged to /usr/adm/unix.log. Look for
    bit 0x0008 in the "SDH:" value to see if errors are associated with
    drive 0 or drive 1.
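As a rough illustration of the bit test described above, here is a minimal
Python sketch. It assumes that a set 0x0008 bit means drive 1 (the second
drive) and a clear bit means drive 0; the post only says which bit to look
at, so that polarity is an assumption. The function names are hypothetical,
and the exact layout of a unix.log entry is not shown in the post, so the
sketch works on the "SDH:" value alone.

```python
# Sketch only: decode the drive number from an "SDH:" value taken from
# /usr/adm/unix.log. Assumption: bit 0x0008 set => drive 1, clear => drive 0.

SDH_DRIVE_BIT = 0x0008  # the bit named in the post


def parse_sdh(text: str) -> int:
    """Convert a hex string such as '0x28' or '28' to an integer."""
    return int(text, 16)


def drive_from_sdh(sdh: int) -> int:
    """Return 1 if the drive-select bit is set, otherwise 0."""
    return 1 if sdh & SDH_DRIVE_BIT else 0


if __name__ == "__main__":
    # Hypothetical values, not taken from a real log.
    for value in ("0x20", "0x28"):
        print(value, "-> drive", drive_from_sdh(parse_sdh(value)))
```

Masking rather than comparing the whole value matters because the other
bits of the SDH value vary from one error to the next.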
    I found this information invaluable, and I know of no way to get the
    same information when exercising the disk using the diagnostic disk
    commands.

Formatting the Second Drive
---------------------------
    Iv knows how to format the second drive, and even says "formatting
    second drive" when doing it. However, the disk drive descriptor
    file is required to have "HD2" specified on the "type" line. For
    example, for the 20M Miniscribe 3425,

        type            HD2
        name            WINCHE
        cylinders       612
        heads           4
        sectors         17
        steprate        0
        $
        badblocktable   1
        loader          /usr/lib/iv/loader
        $
        $
        0
        4
        504
        $
        $

Reformatting Just the Swap Partition
------------------------------------
    If you have a disk whose swap partition has gone bad, but is otherwise
    in good shape, you can choose to just reformat the swap partition.
    This takes a couple of steps, because you are going to fool iv into
    thinking the disk is smaller than it really is while it is formatting.

    Start by obtaining the disk descriptor using "iv -d". You get a
    descriptor that looks similar to the one above, but perhaps with some
    bad block entries. Next, figure out how many cylinders are actually
    used up through the end of the swap partition. For the example above,
    504 tracks divided by 4 tracks per cylinder equals 126 cylinders.
    Edit the descriptor so that the smaller number of cylinders is
    specified and so that only two partitions are defined. I chose to
    decrement my cylinder count by one just in case there was a boundary
    condition bug in my (or iv's) reasoning. This simply results in one
    cylinder not being reformatted (it still has its original format),
    which shouldn't be a problem unless you customarily use up all your
    swap space. Using the same disk example from above,

        type            HD2
        name            WINCHE
        cylinders       125
        heads           4
        sectors         17
        steprate        0
        $
        badblocktable   1
        loader          /usr/lib/iv/loader
        $
        $
        0
        4
        $
        $

    Now when you run the format, specify "iv -i /dev/rfp010 fake_desc",
    where "fake_desc" is the descriptor you just edited. After
    formatting, iv will rewrite the VHB and loader tracks.
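The cylinder arithmetic used for the shrunken descriptor can be sketched in
Python as follows. The function names are hypothetical, and rounding a
partial cylinder up is an assumption (the example in the post divides
evenly, so the post does not say what to do otherwise). The one-cylinder
safety margin is the one described in the post.

```python
# Sketch only: compute the "cylinders" value for the temporary descriptor.
# Assumption: if the swap partition ends partway through a cylinder, the
# count is rounded up before the safety margin is subtracted.


def cylinders_through_swap(end_track: int, heads: int) -> int:
    """Cylinders used from track 0 up to the track where swap ends."""
    cylinders, remainder = divmod(end_track, heads)
    if remainder:
        cylinders += 1  # partial cylinder counts as a whole one
    return cylinders


def fake_cylinder_count(end_track: int, heads: int, margin: int = 1) -> int:
    """Cylinder count to put in the edited descriptor."""
    return cylinders_through_swap(end_track, heads) - margin


if __name__ == "__main__":
    # Values from the Miniscribe 3425 example: the third partition starts
    # at track 504 and the drive has 4 heads (tracks per cylinder).
    print(cylinders_through_swap(504, 4))  # 126
    print(fake_cylinder_count(504, 4))     # 125
```

Passing margin=0 gives the exact count for anyone who would rather not
leave the last swap cylinder with its original format.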
    If you run iv -t, you will see that iv now thinks the disk is smaller.
    To complete the process and regain use of the whole disk, run iv -u
    with the original descriptor specified (the 612 cylinder descriptor
    in the example). This simply causes the correct descriptor to be
    rewritten to the VHB, which redefines the disk size and partitions
    properly.

    You can use the same method of formatting a small portion of the disk
    to just reformat the VHB and loader tracks too. This might be useful
    if you want to change your loader to be the verbose loader,
    /usr/lib/iv/s4load.verbose.

Building a Temporary File System on /dev/rfp011
-----------------------------------------------
    While the bad disk is connected as the second drive, its swap partition
    is unused. You can reformat it as described above, then build a file
    system on it to gain several Meg. of temporary storage. Simply run
    "mkfs /dev/rfp011", then "mount /dev/fp011 /mnta" to build the file
    system and mount it. I found it very useful to copy junky, hard to
    read stuff from the bad partition to the temporary partition prior to
    writing it out to floppies. Note that a cpio archive is not usable
    beyond the point of the failing file if cpio fails on a read while
    trying to archive a file. Cpio apparently commits itself with a file
    header on the output stream prior to testing to see if it can actually
    read the file, and then isn't smart enough to write out a null file
    body.

Surface Checks
--------------
    Once you've copied all the files you want (or can get) off the bad disk
    you can reformat the whole disk and run surface tests on it. The
    iv command "iv -sw[l]" works great, and again any disk errors occurring
    get logged to /usr/adm/unix.log. The -l (long) option causes the test
    to repeat 10 times. It takes about 3 hours on a 20M disk.

DRUN Patch
----------
    As I mentioned in a previous posting, if you are getting disk errors
    appearing in unix.log after formatting a disk and running surface
    checks on it, you may need to install the DRUN rework on your system.
    This applies regardless of whether you have one or two drives, and a
    WD1010 or a WD2010 disk controller chip. I was getting about a dozen
    disk errors each time I ran a surface check until I installed the DRUN
    rework. Since then I haven't seen a single disk error appear in the
    log.

I hope you find this information helpful and can find it to refer to when
you need it. I know it would have saved me some time. Also, I'd like to
see someone post a decent tutorial on the use of fsck and fsdb to repair
bad disks. I mean a disk with unreadable blocks, some of which might be
directory blocks. I've found a way that didn't work. I'd like to see
if someone knows a better way.

---
Craig V. Johnson                ...!fluke!vince
John Fluke Mfg. Co.                   or
Everett, WA                     vince@tc.fluke.com

DISCLAIMER (I suppose it's necessary):
    Muck with your hard disks at your own risk. Don't believe anything
    I said, and legally we will both be happy. Nothing stated in this
    posting has anything to do with John Fluke Mfg. Co., so leave them
    out of it.
floyd@ims.alaska.edu (Floyd Davidson) (03/26/91)
In article <1991Mar26.074933.868@tc.fluke.COM> vince@tc.fluke.COM (Craig Johnson) writes:
>This posting covers several tricks I came up with while trying to repair
>a 20M Miniscribe hard disk by connecting it to a two-drive 3B1 with the
>bad disk set up as the second drive. Most or all of these tricks can be
>applied to hard disk repairs on a single-drive system if you boot from a
>floppy file system and have the appropriate utilities available on it.

A very interesting article. It should be archived on osu-cis.
I don't think Craig had that in mind when he wrote it, but maybe
he could be encouraged to give it a once-over edit for just that
purpose (Craig?).

Floyd
--
Floyd L. Davidson   |  floyd@ims.alaska.edu  |  Alascom, Inc. pays me
Salcha, AK 99714    |  Univ. of Alaska       |  but not for opinions.
dnichols@ceilidh.beartrack.com (DoN Nichols) (03/27/91)
In article <1991Mar26.074933.868@tc.fluke.COM> vince@tc.fluke.COM (Craig Johnson) writes:
>This posting covers several tricks I came up with while trying to repair

[ ... ]

>Mounting the Bad Disk to Access its Binaries
>--------------------------------------------
>    You may find you want to execute programs still on the bad hard disk,
>    particularly if you are trying to repair a disk on a single-drive
>    system by booting up from a floppy disk. If you mount the disk and
>    are careful to only read from it, you should not do any additional
>    damage, and most of the files should still be readable unless the
>    disk is really in bad shape. You may want to execute something like,

        You can absolutely prevent writing on the bad disk by mounting it
        read-only:

        /etc/mount /dev/fp012 /mntb -r
                                    ^^
        Then you don't have to worry about the system modifying the
        contents of the hard disk when it tries to update last-accessed
        and/or last-modified times.

>
>        PATH=/mntb/usr/bin:$PATH
>
>    to add the hard disk binary directories to your PATH.
>

[ ... ]

>    You can use the same method of formatting a small portion of the disk
>    to just reformat the VHB and loader tracks too. This might be useful
>    if you want to change your loader to be the verbose loader,
>    /usr/lib/iv/s4load.verbose.

        You can do this more simply with /etc/ldrcpy

>Building a Temporary File System on /dev/rfp011
>-----------------------------------------------
>    While the bad disk is connected as the second drive, its swap partition
>    is unused. You can reformat it as described above, then build a file
>    system on it to gain several Meg. of temporary storage. Simply run
>    "mkfs /dev/rfp011", then "mount /dev/fp011 /mnta" to build the file
>    system and mount it. I found it very useful to copy junky, hard to

        Yes, I use this partition mounted as /tmp, which keeps a greedy
        compile or other operation from running the root file system out
        of space.
        (I have also formatted the drive to assign more than the default
        to these partitions on both disks. I like more swap space for
        those BIG compiles like groff, which still runs me out of virtual
        memory on one module unless I turn off the -o option to the
        compiler.)

>    read stuff from the bad partition to the temporary partition prior to
>    writing it out to floppies. Note, cpio archives beyond the point of
>    the failing file are not usable if cpio fails on a read while trying
>    to archive a file. Cpio apparently commits itself with a file header
>    on the output stream prior to testing to see if it can actually read
>    the file, and then isn't smart enough to write out a null file body.

        GNU tar is smart enough to seek another header entry if the
        current header/file are corrupted. You can't recover the file,
        but you can get at what is past it. I think that afio will do
        the same for you.

>Surface Checks
>--------------
>    Once you've copied all the files you want (or can get) off the bad disk
>    you can reformat the whole disk and run surface tests on it. The
>    iv command "iv -sw[l]" works great, and again any disk errors occurring
>    get logged to /usr/adm/unix.log. The -l (long) option causes the test
>    to repeat 10 times. It takes about 3 hours on a 20M disk.

        NICE! I've always used the diagnostic floppy (improved) for
        formatting hard disks.

>DRUN Patch
>----------
>    As I mentioned in a previous posting, if you are getting disk errors
>    appearing in unix.log after formatting a disk and running surface
>    checks on it, you may need to install the DRUN rework on your system.
>    This applies regardless of whether you have one or two drives, and a
>    WD1010 or a WD2010 disk controller chip. I was getting about a dozen
>    disk errors each time I ran a surface check until I installed the DRUN
>    rework. Since then I haven't seen a single disk error appear in the
>    log.

        Another possible source of errors of this sort, if you have
        installed the ICUS mods, is that SOME 74LS02 chips don't have
        adequate drive for some systems, even with the DRUN patch
        installed. A 74S02 or 74F02 chip will be sure to have adequate
        drive.

>
>I hope you find this information helpful and can find it to refer to when
>you need it. I know it would have saved me some time. Also, I'd like to
>see someone post a decent tutorial on the use of fsck and fsdb to repair
>bad disks. I mean a disk with unreadable blocks, some of which might be
>directory blocks. I've found a way that didn't work. I'd like to see
>if someone knows a better way.

        Thanks! Now, if someone could tell me how to make the system do
        a format on a drive which it thinks won't re-cal because track 000
        is wiped (not physically destroyed, just erased, so the system
        can't find sector headers to verify that it has reached track 000
        after a re-cal request). :-)

        Good luck to all
                DoN.

--
Donald Nichols (DoN.)           | Voice (Days): (703) 664-1585
D&D Data                        | Voice (Eves): (703) 938-4564
Disclaimer: from here - None    | Email: <dnichols@ceilidh.beartrack.com>
        --- Black Holes are where God is dividing by zero ---