[comp.unix.ultrix] fsck hanging

jwe@che.utexas.edu (John W. Eaton) (03/03/91)

I'm running Ultrix 3.1 on a Vaxstation 3200 and have run into a
problem with a new Hitachi DK515C disk and CMD-220/TM controller.

The installation of the controller and the disk seemed to go smoothly.
I can create filesystems on the disk with newfs, mount them, and even
read and write files.  Unfortunately, when I try to run fsck on any
partition, it almost always hangs after displaying the pass 1 message.
When it hangs, I can't kill it and ps displays something like

  1313 p0 D   00:00 fsck /dev/rra2a ...

What could cause fsck to hang, and what makes this (or any) process
impossible to kill?

I set up the kernel configuration file so that ra2 is drive 0 for the
CMD controller.  At boot time, a message is displayed to the console
which indicates that somewhere, something thinks ra2 is an RA82 disk.
Is it possible to partition this disk as an RA82?  If not, does anyone
have a correct disktab entry for a DK515C?

Does anyone else out there have this hardware configuration?  If so,
how did you make it work?

Any help would be greatly appreciated.


John W. Eaton                          `Questions, questions, questions...'
jwe@che.utexas.edu
Department of Chemical Engineering
The University of Texas at Austin

torek@elf.ee.lbl.gov (Chris Torek) (03/04/91)

In article <45009@ut-emx.uucp> jwe@che.utexas.edu (John W. Eaton) writes:
>I'm running Ultrix 3.1 on a Vaxstation 3200 and have run into a
>problem with a new Hitachi DK515C disk and CMD-220/TM controller.
>
>The installation of the controller and the disk seemed to go smoothly.

`Seemed' is the key word.

>I can create filesystems on the disk with newfs, mount them, and even
>read and write files.  Unfortunately, when I try to run fsck on any
>partition, it almost always hangs after displaying the pass 1 message.
>When it hangs, I can't kill it and ps displays something like
>
>  1313 p0 D   00:00 fsck /dev/rra2a ...
>
>What could cause fsck to hang, and what makes this (or any) process
>impossible to kill?

The process is sound asleep in the kernel, waiting for some event
to happen which is certain to happen very soon (e.g., the disk controller
interrupts, saying it has completed its I/O request), but which has
never happened and probably never will, since Something Is Broken.
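
As a toy illustration of why the signal has no effect (a model only, not
Ultrix source): BSD-style kernels sleep at a priority, and a sleep at or
below PZERO is not interruptible by signals.  The constants below follow
4.3BSD's param.h as best I recall; the Proc class and its methods are
hypothetical names invented for this sketch.

```python
# Toy model of BSD sleep priorities; not kernel code.
PRIBIO = 20   # priority used while waiting for block I/O (assumed, per 4.3BSD)
PZERO = 25    # sleeps at or below this priority ignore signals (assumed)
PWAIT = 30    # an example of an interruptible sleep priority (assumed)


class Proc:
    """Hypothetical stand-in for a process table entry."""

    def __init__(self, name):
        self.name = name
        self.state = "R"        # runnable
        self.pending = set()    # signals posted but not yet acted on
        self.alive = True

    def sleep(self, pri):
        # "D" is what ps shows for an uninterruptible (disk) wait.
        self.state = "S" if pri > PZERO else "D"

    def post_signal(self, sig):
        self.pending.add(sig)
        if self.state == "S":   # only interruptible sleeps are cut short
            self._deliver()

    def wakeup(self):
        # Models the controller interrupt that the driver is waiting for.
        self.state = "R"
        if self.pending:
            self._deliver()

    def _deliver(self):
        self.alive = False
        self.state = "Z"


fsck = Proc("fsck")
fsck.sleep(PRIBIO)          # waiting for the disk I/O to complete
fsck.post_signal("KILL")    # kill -9: the signal is recorded, nothing more
```

In this model the process stays in state D with the signal pending until
wakeup() is called; if the interrupt never arrives, neither does the kill.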

There is not enough information here to tell what it is that is broken.

>I set up the kernel configuration file so that ra2 is drive 0 for the
>CMD controller.  At boot time, a message is displayed to the console
>which indicates that somewhere, something thinks ra2 is an RA82 disk.

The MSCP protocol defines a `drive type' field (actually two such
fields, but one appears to be unreliable) in which the names of the
controller and drive are encoded in little 5-bit fields with a trailing
7-bit number.  Since some drivers refuse to work at all with drives
they do not recognise, CMD and Emulex and anyone else who makes an MSCP
controller substitute their favourite drive names for whatever it is
you really have.  Your controller is lying, in other words.
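
A minimal sketch of unpacking such a field, assuming the layout I recall
from the 4.3BSD mscp.h comments: a 32-bit word holding five 5-bit letter
codes (two for the controller-side name, three for the drive name, with
1 = A through 26 = Z and 0 meaning "no letter") followed by the 7-bit
number.  The function name decode_media_id is made up for illustration,
and the sample value is simply what that layout gives for "DU"/"RA"/82.

```python
def decode_media_id(media_id):
    """Split a 32-bit media identifier into (controller, drive, number).

    Assumed layout, high bits first: 5+5 bits of controller name,
    5+5+5 bits of drive name, 7 bits of number.
    """

    def letters(shift, count):
        out = ""
        for i in range(count):
            code = (media_id >> (shift - 5 * i)) & 0x1F
            if code:                      # 0 means the slot is unused
                out += chr(ord("A") + code - 1)
        return out

    controller = letters(27, 2)   # bits 31..22
    drive = letters(17, 3)        # bits 21..7
    number = media_id & 0x7F      # bits 6..0
    return controller, drive, number


print(decode_media_id(0x25641052))   # ('DU', 'RA', 82)
```

A decoder like this only tells you what the controller claims the drive
is; it says nothing about what is actually attached.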

Fortunately, MSCP defines a `size' field giving the size of the drive
in sectors (as well as geometry fields giving the layout), so it is
possible to work around (or ignore) the lie.  You should be able to
put a partition table on the drive that differs from the usual RA82
table, and which uses all (and only) the sectors that actually exist.
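
One way to sanity-check a candidate table against the size the controller
reports is sketched below.  Everything here is hypothetical: the function
name check_table, the partition names, and above all the sector counts,
which are made-up round numbers and not real DK515C or RA82 figures.

```python
def check_table(total_sectors, table):
    """Check a partition table against the unit size in sectors.

    table maps a partition name to (offset, size), both in sectors.
    Returns (names that run past the end, count of sectors no partition covers).
    """
    too_big = sorted(
        name for name, (off, size) in table.items()
        if off + size > total_sectors
    )

    # Partitions may overlap (BSD's "c" usually spans the whole disk),
    # so merge the ranges before counting what is covered.
    covered = 0
    end = 0
    for off, size in sorted(table.values()):
        stop = min(off + size, total_sectors)
        start = max(off, end)
        if stop > start:
            covered += stop - start
            end = stop
    return too_big, total_sectors - covered


# Made-up numbers, purely for illustration.
reported_size = 1000
table = {"a": (0, 400), "b": (400, 700)}
print(check_table(reported_size, table))   # (['b'], 0)
```

Here "b" would run 100 sectors past the end of the unit, which is the sort
of mismatch to expect if a larger drive's table is applied unchanged.
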
-- 
In-Real-Life: Chris Torek, Lawrence Berkeley Lab EE div (+1 415 486 5427)
Berkeley, CA		Domain:	torek@ee.lbl.gov