elsen@esat.kuleuven.ac.be (02/25/90)
Recently I have been getting the following errorlog entries
(it happens a couple of times each day):

uerf version 3.1-003 (113)

********************************* ENTRY    1. *********************************

----- EVENT INFORMATION -----

EVENT CLASS                     ERROR EVENT
OS EVENT TYPE            102.   DISK ERROR
SEQUENCE NUMBER           12.
OPERATING SYSTEM                ULTRIX 32
OCCURRED/LOGGED ON              Fri Feb 24 05:37:53 1989 WET
OCCURRED ON SYSTEM              khvive1
SYSTEM ID           x08000000
SYSTYPE REG.        x01010000
                                FIRMWARE REV = 1.
PROCESSOR TYPE                  KA630

----- UNIT INFORMATION -----

UNIT CLASS                      DSA DISK
UNIT TYPE                       RD54
CONTROLLER NO.             0.
UNIT NO.                   0.
ERROR SYNDROME                  SMALL DISK ERROR
CYLINDER                 574.

--------------------------

What kind of action should I undertake with respect to this?
(I am kind of worried since this beasty is the system disk of
my VAXstation II/GPX (it contains / and /usr).) I have also noted that
the same CYLINDER is always involved.

What does 'SMALL DISK ERROR' mean? Does it mean that a correctable
ECC error has occurred?
Or is it possible that an uncorrectable ECC with Bad Block Replacement
has occurred? (I prefer the first one...)

This leads me to further questions (since I am relatively new with
respect to Ultrix):
Does Ultrix (3.1) support online Bad Block Replacement?
If so, then why is the program 'radisk' still needed?

Thanks for all your clarifications,

--
Marc Elsen (System Manager/Software Engineer)
Katholieke Universiteit Leuven
Dep. E.S.A.T.
Kard. Mercierlaan 94
3030 HEVERLEE
Belgium
tel. 32(0)16220931 (ext. 1080)
EMAIL : elsen@esat.kuleuven.ac.be
        ...!kulcs!kulesat!elsen (UUCP)
        elsen%kulesat.uucp@blekul60 (BITNET)
        psi%02062166012::elsen (VMS PSI MAIL)
grr@cbmvax.commodore.com (George Robbins) (03/01/90)
In article <2490.25e7e086@esat.kuleuven.ac.be> elsen@esat.kuleuven.ac.be writes:
>
> Recently I have been getting the following errorlog entries :
...
>
> What kind of action should I undertake with respect to this ?
> (I am kind of worried since this beasty is the system disk of
> my VAXstation II/GPX (it contains / and /usr)).I have also noted that
> the same CYLINDER is always involved.
>
> What does 'SMALL DISK ERROR' mean ? Does it mean that a correctable
> ECC error has occurred ?

No, it means that you have a "SMALL DISK" and that you have some kind of
"ERROR" (not obvious from the printout) that probably needs to be addressed.

Try "/etc/uerf -o full -D" to see if you get more information on exactly
what kind of error condition is occurring...

--
George Robbins - now working for,     uucp:   {uunet|pyramid|rutgers}!cbmvax!grr
but no way officially representing:   domain: grr@cbmvax.commodore.com
Commodore, Engineering Department     phone:  215-431-9349 (only by moonlite)
alan@shodha.dec.com ( Alan's Home for Wayward Notes File.) (03/02/90)
In article <2490.25e7e086@esat.kuleuven.ac.be>, elsen@esat.kuleuven.ac.be writes:
>
> Recently I have been getting the following errorlog entries :
> ( It happens a couple of times each day )

	[ A "Small Disk Error" follows. ]
>
> What kind of action should I undertake with respect to this ?

	Step 1: Get the "full" listing of the error from uerf(8).
	The option to do this is:

		uerf -o full [ other options you might want ]

> What does 'SMALL DISK ERROR' mean ? Does it mean that a correctable
> ECC error has occurred ?

	Once upon a time, somebody told me the difference between a
	"small disk" error and a normal disk error. It may be
	something as simple as the error coming from a small disk.
	The full uerf(8) listing should tell you exactly what the
	error was.

> Or is it possible that an uncorrectable ECC with Bad Block Replacement
> has occurred ? (I prefer the first one...)

	If this were the case you'd either be seeing Forced Errors on
	the LBN, or you would have a good block and the error would
	go away.
>
> This leads me to further questions (since I am relatively new with
> respect to Ultrix) :
> Does Ultrix (3.1) support online Bad Block Replacement ?
> If so then why is the program 'radisk' still needed ?

	Yes, ULTRIX V3.1 (and every version since V2.0) has had
	dynamic BBR. The operations performed by radisk are:

		-s	scan
		-c	clear forced errors
		-r	replace

	The scan could be done just by using dd(1); if the bad block
	is encountered, the BBR code of the host or controller will be
	invoked (the RQDX3 does BBR itself). Of course dd(1) will
	probably fail and you'll have to start it over. The
	conv=noerror switch may be enough to avoid this. The "scan"
	switch of radisk, on the other hand, merely tells the
	controller to read the entire disk without transferring the
	data back to the host. This ends up being faster than dd(1).

	You can clear the forced errors just by writing over the
	block. Of course, you can't read the block first, because the
	forced error will cause an input error. radisk(8) will at
	least preserve the corrupted contents, which may be enough to
	reconstruct the rest of the block.

	The replace could be done with a simple program that reads the
	block over and over until it fails or gives up (a built-in
	threshold).

	All of the functions of radisk(8) could be provided by other
	programs, but it has the convenience of putting them all in
	one place.
>
> --
> Marc Elsen (System Manager/Software Engineer)

--
Alan Rollow				alan@nabeth.enet.dec.com
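
For reference, the dd(1) scan described in Alan's reply might look roughly like the sketch below. The device name /dev/rra0c (taken here to be the raw whole-disk partition of unit 0) and the 64k block size are assumptions for illustration and do not come from the thread; only conv=noerror is mentioned in the reply, and whether it is enough to get past a bad block is not confirmed there.

```shell
#!/bin/sh
# Read every block of a device and throw the data away, so that any bad
# block is actually touched and the host or controller BBR code can run.
# conv=noerror asks dd to continue after a read error instead of stopping.
scan_disk() {
    dev=${1:-/dev/rra0c}    # assumption: raw whole-disk partition, unit 0
    dd if="$dev" of=/dev/null bs=64k conv=noerror
}
```

Calling `scan_disk /dev/rra0c` (or with another raw device name) runs the scan. Unlike "radisk -s", this transfers all the data back to the host, so it will be slower.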