[comp.unix.ultrix] What is 'SMALL DISK ERROR' ?

elsen@esat.kuleuven.ac.be (02/25/90)

            Recently I have been getting the following errorlog entries :
            ( It happens a couple of times each day )

						  uerf version 3.1-003 (113)


********************************* ENTRY     1. *********************************

----- EVENT INFORMATION -----

EVENT CLASS                             ERROR EVENT 
OS EVENT TYPE                  102.     DISK ERROR 
SEQUENCE NUMBER                 12.
OPERATING SYSTEM                        ULTRIX 32 
OCCURRED/LOGGED ON                      Fri Feb 24 05:37:53 1989 WET
OCCURRED ON SYSTEM                      khvive1
SYSTEM ID                 x08000000
SYSTYPE REG.              x01010000
                                        FIRMWARE REV = 1.
PROCESSOR TYPE                          KA630 

----- UNIT INFORMATION -----

UNIT CLASS                              DSA DISK 
UNIT TYPE                               RD54 
CONTROLLER NO.                   0.
UNIT NO.                         0.
ERROR SYNDROME                          SMALL DISK ERROR 
CYLINDER                       574.

  -------------------------- 
   
    What kind of action should I undertake with respect to this ?
    (I am kind of worried since this beasty is the system disk of
my VAXstation II/GPX (it contains / and /usr)). I have also noted that
the same CYLINDER is always involved.

    What does 'SMALL DISK ERROR' mean ? Does it mean that a correctable
    ECC error has occurred ?
    Or is it possible that an uncorrectable ECC with Bad Block Replacement
    has occurred ? (I prefer the first one...)

    This leads me to further questions (since I am relatively new with
    respect to Ultrix) :
    Does Ultrix (3.1) support online Bad Block Replacement ?
    If so then why is the program 'radisk' still needed ?


                          Thanks for all your clarifications,

-- 


  Marc Elsen (System Manager/Software Engineer)
  Katholieke Universiteit Leuven
  Dep. E.S.A.T.
  Kard. Mercierlaan 94
  3030 HEVERLEE
  Belgium
              tel. 32(0)16220931(ext. 1080)

               EMAIL : elsen@esat.kuleuven.ac.be

                       ...!kulcs!kulesat!elsen (UUCP)
                       elsen%kulesat.uucp@blekul60 (BITNET)
                       psi%02062166012::elsen  (VMS PSI MAIL)

grr@cbmvax.commodore.com (George Robbins) (03/01/90)

In article <2490.25e7e086@esat.kuleuven.ac.be> elsen@esat.kuleuven.ac.be writes:
> 
>             Recently I have been getting the following errorlog entries :
...
> 
>     What kind of action should I undertake with respect to this ?
>     (I am kind of worried since this beasty is the system disk of
> my VAXstation II/GPX (it contains / and /usr)). I have also noted that
> the same CYLINDER is always involved.
> 
>     What does 'SMALL DISK ERROR' mean ? Does it mean that a correctable
>     ECC error has occurred ?

No, it means that you have a "SMALL DISK" and that you have some kind of
"ERROR" (not obvious from the printout) that probably needs to be addressed.

Try "/etc/uerf -o full -D" to see if you get more information on exactly
what kind of error condition is occurring...
-- 
George Robbins - now working for,     uucp:   {uunet|pyramid|rutgers}!cbmvax!grr
but no way officially representing:   domain: grr@cbmvax.commodore.com
Commodore, Engineering Department     phone:  215-431-9349 (only by moonlite)

alan@shodha.dec.com ( Alan's Home for Wayward Notes File.) (03/02/90)

In article <2490.25e7e086@esat.kuleuven.ac.be>, elsen@esat.kuleuven.ac.be writes:
> 
>             Recently I have been getting the following errorlog entries :
>             ( It happens a couple of times each day )

	[ A "Small Disk Error" follows. ]
> 
>     What kind of action should I undertake with respect to this ?

	Step 1: Get the "full" listing of the error from uerf(8).
	The option to do this is:

		uerf -o full [ other options you might want ]

>     What does 'SMALL DISK ERROR' mean ? Does it mean that a correctable
>     ECC error has occurred ?

	Once upon a time, somebody told me the difference between
	a "small disk" error and a normal disk error.  It may have
	been something as simple as the error being from a small
	disk.  The full uerf(8) listing should tell you exactly what
	the error was.

>     Or is it possible that an uncorrectable ECC with Bad Block Replacement
>     has occurred ? (I prefer the first one...)

	If this were the case you'd either be seeing Forced Errors
	on the LBN or you'd have a good block and the error
	would go away.
> 
>     This leads me to further questions (since I am relatively new with
>     respect to Ultrix) :
>     Does Ultrix (3.1) support online Bad Block Replacement ?
>     If so then why is the program 'radisk' still needed ?

	Yes, ULTRIX V3.1 (and every version since V2.0) has
	dynamic BBR.  The operations performed by radisk are:

		-s	scan
		-c	clear forced errors
		-r	replace

	The scan could be done just by using dd(1), and if a
	bad block is encountered the BBR code of the host or
	controller will be invoked (the RQDX3 does BBR itself).
	Of course dd(1) will probably fail and you'll have to
	start it over.  The conv=noerror switch may be enough
	to avoid this.  The "scan" switch of radisk, on the other
	hand, merely tells the controller to read the entire disk
	without transferring the data back to the host.  This ends
	up being faster than dd(1).
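
	A rough sketch of the dd(1) version of the scan, for
	illustration only.  The function name scan_disk is made up,
	and the device you pass in (whichever raw device covers the
	whole disk on your system) is an assumption, not something
	taken from the error log above.

```shell
# scan_disk DEVICE [BLOCK_SIZE]
# Hypothetical helper: read DEVICE from start to end and throw the data
# away, so that every block gets touched once.
# conv=noerror asks dd to keep going after a read error instead of
# stopping at the first bad block.
scan_disk() {
    dev=$1
    bs=${2:-512}    # default to 512-byte blocks if no size is given
    dd if="$dev" of=/dev/null bs="$bs" conv=noerror
}
```

	Usage would be along the lines of "scan_disk <raw device>",
	run as root.  Unlike the radisk scan, every block is copied
	back to the host, so expect it to be slower.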

	You can clear the forced errors just by writing over the
	block.  Of course you can't read the block first, because
	the forced error will cause an input error.  radisk(8)
	will at least preserve the corrupted contents, which may be
	enough to reconstruct the rest of the block.
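
	For illustration, overwriting a single block by hand might
	look like the sketch below.  The name overwrite_block is
	hypothetical, it assumes 512-byte blocks and a dd(1) and
	/dev/zero as found on current systems, and it destroys
	whatever was in the block, which is exactly what radisk(8)
	avoids.

```shell
# overwrite_block DEVICE LBN
# Hypothetical helper: write one 512-byte block of zeros at logical
# block number LBN. The previous contents of that block are lost.
# conv=notrunc keeps dd from truncating the target when it is a
# regular file; on a disk device it makes no difference.
overwrite_block() {
    dev=$1
    lbn=$2
    dd if=/dev/zero of="$dev" bs=512 seek="$lbn" count=1 conv=notrunc 2>/dev/null
}
```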

	The replace could be done with a simple program that reads
	the block over and over until it fails or gives up (a
	built-in threshold).  All of the functions of radisk(8)
	could be provided by other programs, but it has the
	convenience of putting them all in one place.
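
	The "read it over and over" program could be as small as
	the sketch below.  The name reread_block and the default
	threshold of 100 tries are made up for the example, and
	512-byte blocks are assumed.  Whether a failed read actually
	ends in a replacement is up to the BBR code in the host or
	controller, not this loop.

```shell
# reread_block DEVICE LBN [MAX_TRIES]
# Hypothetical helper: read one 512-byte block repeatedly.
# Returns 1 as soon as a read fails, 0 if every try succeeds
# (the "gives up" case).
reread_block() {
    dev=$1
    lbn=$2
    max=${3:-100}    # threshold: stop after this many clean reads
    i=0
    while [ "$i" -lt "$max" ]; do
        if ! dd if="$dev" of=/dev/null bs=512 skip="$lbn" count=1 2>/dev/null; then
            echo "read of block $lbn failed on try $((i + 1))"
            return 1
        fi
        i=$((i + 1))
    done
    echo "block $lbn read cleanly $max times"
    return 0
}
```

	Point it at the raw device rather than the block device,
	otherwise the repeated reads may be satisfied from the
	buffer cache and never reach the disk.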
> 
> -- 
>   Marc Elsen (System Manager/Software Engineer)
-- 
Alan Rollow				alan@nabeth.enet.dec.com