john@spirit.UUCP (John F. Godfrey) (02/06/90)
Early last week I installed Fixdisk 2.0 on spirit (3.5mb RAM and a 67mb ST-4096). I have a DOS-73 installed as well. Shortly after installation I received the panic message shown below. It took three reboots to get me up and running again, and shortly thereafter it panicked again. I removed the fixdisk and everything has been running fine since.

Here is the panic message:
----------------------------------------------------------------------
#WD1010 ST=/Sekg/Err/ EF=/Id?/ cy=710. sc=14. hd=7. dr#=0. MCR2:0x0
#HDERR ST:51 EF:10 CL:C6 CH:2 SN:E SC:2 SDH:27 DMACNT:FFFF DCRREG:9F MCCREG:8300

panic: Hard disk timeout
----------------------------------------------------------------------

Anyone have any idea what this is from and what, if anything, I can do to correct it?

Thanks,
John
--
John F. Godfrey, Pastor
"Jesus said to him, 'I am the way and the truth and the life. No man comes to the Father except through Me'" (John 14:6).
PHONE: 1-(616)-896-8309
NET ADDRESS: john@spirit.UUCP ..sharkey!spirit!john
ATTMAIL: attmail!spirit!john
US DOMAIN spirit.grle.mi.us
jcm@mtune.ATT.COM (John McMillan) (02/07/90)
In article <111@spirit.UUCP> john@spirit.UUCP (John F. Godfrey) writes:
>Early last week I installed Fixdisk 2.0 on spirit (3.5mb RAM and a
>67mb ST-4096). I have a DOS-73 installed as well. Shortly after
>installation I received the panic message which will follow. It took
>three reboots to get me up and running again, and shortly thereafter,
>it panicked again. I removed the fixdisk and everything ran (and has
>been running) fine, since.
>
>Here is the panic message:
>----------------------------------------------------------------------
>#WD1010 ST=/Sekg/Err/ EF=/Id?/ cy=710. sc=14. hd=7. dr#=0. MCR2:0x0
>#HDERR ST:51 EF:10 CL:C6 CH:2 SN:E SC:2 SDH:27 DMACNT:FFFF DCRREG:9F MCCREG:8300
>
>panic: Hard disk timeout
>----------------------------------------------------------------------
>
>Anyone have any idea what this is from and what, if anything, I can do to
>correct it?

It could not find a sector-id on the disk. The only ways this can be "fixed" are to re-write the sector-id -- aka re-FORMAT the disk -- or to enter the sector in the bad-block list.

It looks like bad karma: in loading the FixDisk you finally USED this bad sector. If you test this sector with the OLD kernel you should get some comparable message/warning:

	dd < /dev/rfp002 > /dev/null

	ST=/Sekg/Err/ -- While Seeking [a sector] an Error occurred.
	EF=/Id?/      -- An Error occurred: Missing [sector] Id.

There's no reason the new kernel should exacerbate such problems; it is just a coincidence -- I hope 8-]

john mcmillan -- att!mtune!jcm
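The hex fields of the #HDERR line above can be cross-checked against the decoded #WD1010 line with a little shell arithmetic. The following is only a sketch: the bit positions are assumed from the usual WD1010 task-file layout (cylinder high/low bytes, head and drive packed into SDH, one bit per status and error condition), and the function names and flag labels are made up for illustration. It is written for a POSIX shell, which is not necessarily the shell shipped with 3.51.

```sh
#!/bin/sh
# Sketch: decode the hex fields of a "#HDERR" console line.
# Assumption: register layout follows the WD1010 task file; none of these
# function names or flag labels come from the kernel itself.

# flags VALUE MASK:NAME...  print the NAMEs whose MASK bit is set in VALUE
flags() {
    val=$1
    shift
    out=""
    for pair in "$@"; do
        mask=${pair%%:*}
        name=${pair#*:}
        if [ $(( $val & $mask )) -ne 0 ]; then
            out="$out $name"
        fi
    done
    echo "${out# }"
}

# decode_chs CL CH SN SDH  (hex digits, no 0x prefix)
decode_chs() {
    cyl=$(( (0x$2 << 8) | 0x$1 ))     # CH is the high byte, CL the low byte
    sec=$(( 0x$3 ))                   # sector number
    head=$(( 0x$4 & 0x07 ))           # SDH bits 0-2: head
    drive=$(( (0x$4 >> 3) & 0x03 ))   # SDH bits 3-4: drive
    echo "cy=$cyl sc=$sec hd=$head dr=$drive"
}

# decode_status ST  (status register)
decode_status() {
    flags $(( 0x$1 )) 0x80:Busy 0x40:Ready 0x20:WriteFault \
        0x10:Seek 0x08:DataRequest 0x01:Error
}

# decode_error EF  (error register)
decode_error() {
    flags $(( 0x$1 )) 0x80:BadBlock 0x40:DataCRC 0x10:IdNotFound \
        0x04:Aborted 0x02:Track0 0x01:DAMNotFound
}
```

With the values from the panic, `decode_chs C6 2 E 27` prints `cy=710 sc=14 hd=7 dr=0`, `decode_status 51` prints `Ready Seek Error`, and `decode_error 10` prints `IdNotFound`, which agrees with the cy/sc/hd/dr# values and the /Id?/ flag in the #WD1010 line.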
pat@rwing.UUCP (Pat Myrto) (02/08/90)
In article <111@spirit.UUCP>, john@spirit.UUCP (John F. Godfrey) writes:
> ... [ edited to reduce length ] ...
> Early last week I installed Fixdisk 2.0 on spirit ... [with a] 67mb
> ST-4096 and DOS-73... Shortly after installation I received the panic
> message which will follow... [after reboot] ... it panicked again.
>
> Here is the panic message:
> ----------------------------------------------------------------------
> #WD1010 ST=/Sekg/Err/ EF=/Id?/ cy=710. sc=14. hd=7. dr#=0. MCR2:0x0
> #HDERR ST:51 EF:10 CL:C6 CH:2 SN:E SC:2 SDH:27 DMACNT:FFFF DCRREG:9F
> MCCREG:8300
>
> panic: Hard disk timeout
> ----------------------------------------------------------------------

It's hard to say - I have seen that sort of panic before, but only once. It sounds like the drive wasn't seeking - like the seek mech was jammed, or something. I had it happen with a ST 251, and rebooting didn't help until the power was cycled - from the sounds it made, that sort of "kicked" it loose. It is possible your problems are of a similar nature. Even though changing back to the old kernel appeared to fix things, it's still possible that this was a coincidence, and that the operations, reboot cycles, etc. done while restoring the old kernel were what restored sanity.

I have also installed the new fixdisk, and it had been running fine for over a week until today, when I got a "kernel parity" panic. I didn't copy the message down, but it mentioned a disk parity error (though nothing was in unix.log). I am convinced that occasionally things such as this do happen. If it happens again, with the same problem, then I will be concerned. Obviously things are running fine now, as the involved system is the one I am typing this prose on.

Once I had an entry appear in unix.log where it couldn't read head 0, sector 0, cylinder 0, and bailed out with a "drive not ready" error - if for real, a very grave symptom.
However, this was months ago, and after rebooting, it hasn't happened since.

I did install the fixdisk selectively, instead of using the provided Install script (because some stuff in the FIXDISK pkg I don't use anymore, and because I tend to be leery of Install scripts in general, especially ones that do such sweeping things as the FIXDISK one must do).

Following is what *I* would do, if I were in the same situation. I probably am going into excess detail, but in this case that might be preferable to assuming too much. The procedure I used for installing the FIXDISK worked for me, and this is being written in good faith, but since I have no control over how this will be read or interpreted, *YOU ARE ON YOUR OWN*. NO CLAIMS ARE MADE AS TO THIS BEING CORRECT OR BEING FREE OF LOGICAL OR TYPOGRAPHICAL ERRORS, OR BEING WITHOUT CRITICAL OMISSIONS.

Before writing off the FIXDISK2.0, I would suggest re-trying the FIXDISK (a different copy of it, if it was a downloaded copy), and installing it BY HAND, rather than using the Install script - this allows one to selectively install fixes, and to do it in stages, as I suggest below, starting with the kernel, which provides most of the major fixes, other than the uucico (uucico not being relevant if HDB is installed), and the fix for the occasional corrupted /etc/utmp file.

I suggest you try unarchiving the fixdisk into a work subdir (it's a cpio archive, and assuming FIXDISK2.0+IN is in the parent subdir, the command ``cpio -iBcdm <../FIXDISK2.0+IN'' run as root in an empty subdir will extract the contents, preserving the original dates, perms, and ownership of the files). If it's on the floppies, replace the "../FIXDISK2.0+IN" with "/dev/rfp021".

In the subdir 'kernel', unpack the kernel file (``unpack UNIX3.51m'') and then copy the new kernel to /UNIX3.51m. Verify the permissions are at least 754, owner/group root/sys (depending on how things are set up, you may need to have world read perms on the kernel).
Follow with ``mv /unix /unix.old'' (to preserve the old kernel, in case the UNIX3.5? link isn't there) and then do ``ln /UNIX3.51m /unix''. Once the above steps are done and checked for correctness, do a normal shutdown and reboot.

If the system comes up OK, and gets past the time interval where you originally experienced the problems, then I would try replacing /etc/lddrv/wind.o, /etc/init, /bin/login, /bin/getty, etc., MANUALLY, BY HAND, with the files provided in the kernel, utmp, etc. subdirs, preserving the original versions as /bin/login.old, /etc/lddrv/wind.o.old, etc. You can inspect the Install script for the proper permissions and owner/group to use on each file (most will be owner=bin, group=bin). Be sure, after the new init is copied in, to rm /bin/telinit and then do ``ln /etc/init /bin/telinit'' (some stuff does look for /bin/telinit, possibly even during the reboot sequence). After verifying everything is right, again do the shutdown and reboot.

If the panics happen again, I have no suggestions. Perhaps someone can answer - does 3.51 require a new format on a drive that had previously been formatted with, say, 3.0 or 3.5?

As I said, your mileage may vary, but good luck - just proceed slowly and carefully.
--
pat@rwing (Pat Myrto), Seattle, WA
...!uunet!pilchuck!rwing!pat
...!uw-beaver!uw-entropy!dataio!/
WISDOM: "Travelling unarmed is like boating without a life jacket"
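The by-hand steps described in the post above (copy the new kernel in, keep the old one as unix.old, link the new one to /unix, then swap individual files while keeping .old copies) can be sketched as a few shell functions. This is an illustration under stated assumptions, not the FIXDISK Install script: the function names and the ROOT parameter are hypothetical, ownership changes are left out, the kernel is assumed to be unpacked already, and it is written for a POSIX shell. Pointing ROOT at a scratch directory allows a dry run. Note that the telinit link goes from /etc/init to /bin/telinit, since ln takes the existing file first and the new name second.

```sh
#!/bin/sh
# Sketch of a staged, by-hand install. ROOT is a hypothetical parameter:
# a scratch directory for a dry run, or the empty string for the real
# root directory. Nothing here sets owner/group (root/sys, bin/bin).

# install_kernel ROOT NEWKERNEL
install_kernel() {
    root=$1
    new=$2
    name=$(basename "$new")
    cp "$new" "$root/$name" || return 1
    chmod 754 "$root/$name" || return 1
    # Preserve the old kernel before re-pointing /unix at the new one.
    if [ -f "$root/unix" ]; then
        mv "$root/unix" "$root/unix.old" || return 1
    fi
    ln "$root/$name" "$root/unix"
}

# replace_keep_old TARGET NEWFILE
# Rename the installed file to TARGET.old, then copy the new one in.
# Renaming (rather than overwriting) leaves a running binary untouched.
replace_keep_old() {
    target=$1
    new=$2
    if [ -f "$target" ]; then
        mv "$target" "$target.old" || return 1
    fi
    cp "$new" "$target"
}

# relink_telinit ROOT
# After init is replaced, the old /bin/telinit link still names the old
# file, so remove it and link it to the new init.
relink_telinit() {
    root=$1
    rm -f "$root/bin/telinit"
    ln "$root/etc/init" "$root/bin/telinit"
}
```

A dry run might look like `install_kernel /tmp/scratch kernel/UNIX3.51m` followed by an inspection of /tmp/scratch, with a shutdown and reboot between the kernel stage and the later file replacements on the real system.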
wtm@neoucom.UUCP (Bill Mayhew) (02/10/90)
After having read Frank's article, I thought I'd try dd </dev/fp002 >/dev/null to see if I might be able to catch some bad blocks on my disk (yes, I know, I could use the diagnostics disk). I ran the command in the background as root, and then returned to ksh to do some other work such as reading mail. A few minutes later, I got a page fault from the 3.51m kernel.

I wasn't doing anything extraordinary that I haven't done many times before, other than the dd command. I have a very plain machine with 2 meg RAM, miniscribe 6085, and the stock disk controller. No 2010s or hardware modifications.

I was in ksh, and the metermaid display was on. Just before the crash, metermaid looked normal. The %serial buffers was ~100%, %clists ~90%, %ram pages ~50%, %CPU idle ~0%, %CPU user ~40%, %CPU kernel ~60%, %CPU wait ~0%, printer selected and no errors.

This is what the panic message looked like. The panic scribbled over the metermaid and some of the other stuff on the screen, so it was a bit tough to read. It seems somewhat elusive, because I tried to duplicate the conditions again and have not been able to get the crash.

type = 0x02, pid = 17144, pc = 0x6C09, rps = 0x2000, .... 0x4BB5C
GSR = 8D00, BSR0 = 7C07, BSR1 = 2400 PHYSPF = 0
D0 = ff, D1 = 6030, D2 = 301, D3 = 5
D4 = 52, D5 = 400, D6 = C000, D7 = 400
A0 = 30300, A1 = 72400, A2 = 4BB5C, A3 = 70E08
A4 = 70884, A5 = 413B8, A6 = 70820, userA7=2FF138/kernA7=707C4
KI-RAM@6BC:000422D8 51C8FFFC 4E75227C 004000E0
KS-RAM@707C4:
000101C6 000302FC 00072400 00000400 000270EC 00000000 00000034 08000007
0B560000 13312002 00051052 000001F4 00400602 00000000 00000001 FFFB7C30
panic: page fault in kernel

I'm not sure what goes in the space where the .... is above. The display was too cluttered to make it out. Also, the last number on the first line might be 0x4BB50 instead of 0x4BB5C.

Anybody have any ideas?

Bill Mayhew (wtm@neoucom.edu)
North Eastern Ohio Universities College of Medicine
Rootstown, OH 44272-9995
ph: 216-235-2511
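A whole-device dd read like the one used in this thread gives no block numbers of its own, since dd normally stops at the first read error. One way to narrow things down is to read one block at a time and note which reads fail. The function below is a rough sketch of that idea; the name scan_blocks, the 512-byte block size, and the block range arguments are assumptions for illustration, and starting one dd per block is slow. It will not help if a bad read makes the kernel panic, as happened above.

```sh
#!/bin/sh
# Sketch: report which 512-byte blocks of a device cannot be read.
# scan_blocks is a hypothetical helper, not part of any diagnostics disk.

# scan_blocks DEVICE FIRST LAST
# Reads blocks FIRST..LAST one at a time and prints the number of each
# block whose read fails.
scan_blocks() {
    dev=$1
    n=$2
    last=$3
    while [ "$n" -le "$last" ]; do
        if ! dd if="$dev" of=/dev/null bs=512 skip="$n" count=1 2>/dev/null; then
            echo "$n"
        fi
        n=$((n + 1))
    done
}
```

For example, `scan_blocks /dev/rfp002 0 999` would check the first thousand blocks of the raw device named earlier in the thread and print nothing if every read succeeds.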