[comp.sys.hp] bad m/o disks?

paul@eye.com (Paul B. Booth) (01/07/91)

OK, I'm baffled.  Maybe someone out there's seen this before:
I'm using an hp 6300.650a optical drive to do nightly backups for a group
of about 20 hp9000's (300's/800's).  The o/m drive is mounted on a 350 running
hpux 7.05 (straight, no patches).  Starting a week ago, I come in to find this
node halted due to a bus error panic that occurred while the backup was writing
to the o/m.
I bring everything back up, but the o/m disk won't fsck because it has an
unreadable block.  OK, I grab what I can off this disk and mediainit it
(should spare bad blocks, right?).  Put a new filesystem on it, and... makes
no difference.  The node panics again when it tries to write to the o/m, and
when it comes back up, the o/m again has an unreadable block.

Now I guess I can replace the o/m disk, but the thought of chucking a $250 disk
because of a bad block is upsetting.  Shouldn't the scsi driver be able to
handle this kind of thing more gracefully?  Is there a better way (than
mediainit) to find and spare bad blocks on an o/m disk?  Is there a scsi driver
patch that I need?  Thanks in advance for any wisdom y'all can lend.
--
Paul B. Booth  (paul@eye.com) (...!hplabs!hpfcla!eye!paul)
-------------------------------------------------------------------------------
3D/EYE, Inc., 2359 N. Triphammer Rd., Ithaca, NY  14850    voice: (607)257-1381
                                                             fax: (607)257-7335

glen@hpfcmgw.HP.COM (Glen Robinson) (01/09/91)

/ paul@eye.com (Paul B. Booth) / writes -
>I'm using an hp 6300.650a optical drive to do nightly backups for a group
>of about 20 hp9000's (300's/800's).  The o/m drive is mounted on a 350 running
>hpux 7.05 (straight, no patches).  Starting a week ago, I come in to find this
>node halted due to a bus error panic that occurred while the backup was writing
>to the o/m.
 
etc.
----------

I won't comment on the m/o drive.  However, how did you get 7.05, and what
date is it?  Unless it is very late, i.e., close to production bits (and
as far as I know it is not there yet), you may have a version that does
NOT have the appropriate patches in it.

Otherwise, if it is in fact 7.0 or 7.03 you really need to get p107 aka
the August bits.

--Usual drivel about this not being an official position, etc.

Glen R.

goossens@prl.philips.nl (goossens lmc) (01/09/91)

In article <1991Jan07.150748.415@eye.com> paul@eye.com (Paul B. Booth) writes:
>
>The o/m drive is mounted on a 350 running hpux 7.05 (straight, no patches).
>Starting a week ago, I come in to find this node halted due to a bus error panic
>
Some patches will probably solve your panics.  I installed:

PATCH_7.0:$Header: scsi_if.c,v 1.2.17.11 90/08/07 16:17:02 paul Exp $
PATCH_7.0:$Header: scsi.c,v 1.2.17.12 90/08/07 16:16:42 paul Exp $
PATCH_7.0:$Header: scsi_ccs.c,v 1.2.17.6 90/08/07 16:17:18 paul Exp $
PATCH_7.0:$Header: ac.c,v 1.2.17.7 90/01/23 12:56:27 prem Exp $


>
>but the o/m disk won't fsck because it has an unreadable block.
>
I've had the same problems with the 20GB/A and a 650A on my HP9000/370
(HP-UX 7.0 with the latest SCSI patches, see above).  Sometimes bad
blocks were introduced (fsck: cannot read blocks).  After HP swapped
the PROM on the controller board of each drive, my problems were solved.
The old EPROM was numbered: C1700-89000 CCP 2.13, the new one is
numbered: C1700-89601 CCP 3.02 (probably Sony numbers).  I do not know
the HP part-number. They told me that some error-recovering routine in
the firmware caused the problem.


  			   			    Louis Goossens

glad@daimi.aau.dk (Michael Glad) (01/10/91)

In article <2448@prles2.prl.philips.nl> goossens@prl.philips.nl writes:

>..................................... After HP swapped 
>the PROM on the controller board of each drive, my problems were solved.
>The old EPROM was numbered: C1700-89000 CCP 2.13, the new one is
>numbered: C1700-89601 CCP 3.02 (probably sony numbers). I do not know  
>the HP part-number. They told me that some error-recovering routine in
>the firmware caused the problem.

Can anyone tell me how to check these numbers?  I suppose they're not the
numbers in Information Log 1 (Firmware Version Number).  My numbers
are '4 76'.

I'm in the process of moving a tape based archival system onto our
newly acquired autochanger.

The only problem I've had so far was my little mediainit script failing
on the last 3 or 4 MO surfaces due to hardware errors.  It had run for
many hours, so perhaps everything got a bit too hot...
When I later initialized the surfaces one at a time, there were no problems.


Michael Glad,
Computer Science Department,
Aarhus University,
Aarhus,
Denmark

email: glad@daimi.aau.dk