adam@hyper.lap.upenn.edu (Adam Feigin) (02/14/88)
I've got a DSP90 with an MSD-500 that is exhibiting the following problem. I'm getting lots of "disk block header errors" at the first address on the disk. Has anybody out there seen this behavior?? Is it a known bug/problem???

We've had this problem for over a year, and everything has been replaced, including the MSD-500 itself, but the problem does not go away. Our local SSE thought that it could be a grounding problem, but everything checked out okay. The DSP doesn't crash or anything, but I've got a bad feeling that eventually nasty things are going to start happening. An excerpt from the system error log is included below.

Wednesday, January 13, 1988
    2:12:54 pm (EST)   disk error storage module, volx=1, daddr=1:
                       disk block header error (OS/disk manager)
Sunday, January 24, 1988
    12:50:42 pm (EST)  disk error storage module, volx=1, daddr=1:
                       disk block header error (OS/disk manager)
Wednesday, February 3, 1988
    2:52:13 pm (EST)   disk error storage module, volx=1, daddr=1:
                       disk block header error (OS/disk manager)
    2:54:44 pm (EST)   disk error storage module, volx=1, daddr=1:
                       disk block header error (OS/disk manager)
Friday, February 5, 1988
    9:31:36 am (EST)   disk error storage module, volx=1, daddr=1:
                       disk block header error (OS/disk manager)
Monday, February 8, 1988
    10:16:45 am (EST)  disk error storage module, volx=1, daddr=1:
                       disk block header error (OS/disk manager)

If anybody has a solution to this problem, I'd sure appreciate hearing about it!!!!

Adam

------------------------------------------------------------------------------
ARPAnet: {root,adam}@{hyper,apollo}.lap.upenn.edu
UUCP: {harvard,decwrl,rutgers,ihnp4}!super.upenn.edu!hyper.lap.upenn.edu!adam

Adam Feigin
Network Administrator
Language Analysis Project
University of Pennsylvania
------------------------------------------------------------------------------
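For anyone who wants to track how often these errors recur, here is a minimal Python sketch that tallies entries by volume and disk address. It assumes the log layout is exactly what the excerpt above shows (a date line followed by one or more timed entries); the regular expressions and the function names `parse_disk_errors` and `tally` are hypothetical, inferred from this excerpt rather than from any Apollo documentation.

```python
import re
from collections import Counter

# Assumed layout, inferred only from the excerpt in this post:
#   <Weekday>, <Month> <day>, <year>
#   <h:mm:ss> am|pm (<TZ>) disk error <device>, volx=<n>, daddr=<n>: <message> (OS/disk manager)
DATE_RE = (r"(?P<date>(?:Mon|Tues|Wednes|Thurs|Fri|Satur|Sun)day, "
           r"[A-Z][a-z]+ \d{1,2}, \d{4})")
ENTRY_RE = (r"(?P<time>\d{1,2}:\d{2}:\d{2} [ap]m) \((?P<tz>[A-Z]+)\) disk error "
            r"(?P<device>.+?), volx=(?P<volx>\d+), daddr=(?P<daddr>\d+): "
            r"(?P<msg>.+?) \(OS/disk manager\)")
TOKEN_RE = re.compile(DATE_RE + "|" + ENTRY_RE)


def parse_disk_errors(text):
    """Return one dict per disk error entry, carrying the most recent date line."""
    text = " ".join(text.split())  # collapse line breaks and indentation
    current_date = None
    entries = []
    for m in TOKEN_RE.finditer(text):
        if m.group("date"):
            current_date = m.group("date")
        else:
            entries.append({
                "date": current_date,
                "time": m.group("time"),
                "device": m.group("device"),
                "volx": int(m.group("volx")),
                "daddr": int(m.group("daddr")),
                "msg": m.group("msg"),
            })
    return entries


def tally(entries):
    """Count errors per (volx, daddr) pair."""
    return Counter((e["volx"], e["daddr"]) for e in entries)


sample = """
Wednesday, February 3, 1988
    2:52:13 pm (EST)   disk error storage module, volx=1, daddr=1:
                       disk block header error (OS/disk manager)
    2:54:44 pm (EST)   disk error storage module, volx=1, daddr=1:
                       disk block header error (OS/disk manager)
Friday, February 5, 1988
    9:31:36 am (EST)   disk error storage module, volx=1, daddr=1:
                       disk block header error (OS/disk manager)
"""

if __name__ == "__main__":
    for (volx, daddr), count in tally(parse_disk_errors(sample)).items():
        print(f"volx={volx} daddr={daddr}: {count} error(s)")
```

Collapsing whitespace first means the same parser handles the log whether an entry is wrapped across lines or not. A tally like this would also show whether errors cluster at daddr=1 or are spread across the disk.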
krowitz@mit-richter.UUCP (David Krowitz) (02/16/88)
Hmm ... most of our DN460/600 disk drives, our DN560 MSD-190, and our DSP80 with two MSD-500's get these errors from time to time. We have seen no long term problem with this, but if you wanted to you could back up your disk, shut down the node, run FBS (i.e. EX FBS) to find the bad spots on the disk (this will take a *long* time), run INVOL and make certain the bad blocks are marked in the system list, reformat the disk with INVOL, and restore your file system. You will spend a lot of time doing this, and I'm not certain it's worth it, but it might make you feel more secure.

-- David Krowitz

krowitz@richter.mit.edu (18.83.0.109)
krowitz%richter@eddie.mit.edu
mit-erl!mit-richter!krowitz@eddie.mit.edu
mit-erl!mit-richter!krowitz@mit-eddie.arpa
krowitz@mit-mc.arpa
(in order of decreasing preference)
johnm@CAEN.ENGIN.UMICH.EDU (John Muckler) (02/16/88)
We're experiencing the same problems at the University of Michigan, especially on DN3000's & 4000's with 348 meg disks. A UCR has been submitted but no response has been received yet. It is not very reassuring... Some assistance would be greatly appreciated!

--------------------------
John E. Muckler
Mgr. Computer Oper.
University of Michigan
College of Engineering/CAEN
rye@CAEN.ENGIN.UMICH.EDU (Ryland S. Marshall) (02/16/88)
We were having the same problem. It is a bogus error. If data address 1 was bad you would not be able to read the disk. Apollo sent us a patch tape that contains a new version of invol. It seems to have taken care of the problem (so far). Ask your sales rep for the version of invol that was last modified 11-30-87. Best of luck, rye.
adam@hyper.lap.upenn.edu (Adam Feigin) (02/16/88)
Well, I'm starting to suspect that there's something wrong with the disk controller (either the multibus SMD board or the drive electronics itself), as the MSD-500 was replaced in June with a BRAND NEW CDC drive, and yet these errors keep occurring. (I'm thankful that Apollo ISN'T vanilla U**X, as errors on the first disk address would be disastrous!!)

When you get these errors, are they at the first address on the disk, or are they at various places on the media?? (We also get errors at assorted disk addresses, but the errors at the first disk address happen at least once or twice a week.)

Adam
kwongj@caldwr.caldwr.gov (James Kwong) (02/18/88)
In article <3392@super.upenn.edu>, adam@hyper.lap.upenn.edu (Adam Feigin) writes:
> Well, I'm starting to suspect that there's something wrong with the
> disk controller (either the multibus SMD board or the drive electronics
> itself) as the MSD-500 was replaced in June with a BRAND NEW CDC drive,
> and yet these errors keep occurring (I'm thankful that Apollo ISN'T
> vanilla U**X, as errors on the first disk address would be disastrous !!)
> When you get these errors, are they at the first address on the disk, or
> are they at various places on the media (We also get errors at assorted
> disk addresses, but the errors at the first disk address happen at least
> once or twice a week) ??
>
> Adam
>
> ------------------------------------------------------------------------------

We're having the same problem here with our DSP90/Control Data Corp. 500 mb. storage module, except the header errors always occur on vol. 1 addr 1. A service rep. came out and moved the label to another spot on the storage module (ran some kind of fix_vol or something like that) and ran salvol on the disk, to no avail. We still get the disk block header error now and then. I noticed that if I partnered the DSP 90 to a diskless node, the frequency of the error messages increased from once every several months to once every few days.

I was told basically the same thing: the problem was probably caused by a grounding problem somewhere, it was common among the DSP 90s, and the errors probably occurred when the disk was in reading mode rather than writing mode, so I should not have to worry too much about the integrity of the data. Still, I would feel less paranoid if these errors didn't crop up now and then.

In your case, with the errors occurring at different spots, your assessment of the cause sounds reasonable. I take it that these header messages (other than vol. 1 addr 1) also refer to the storage module, and are not the error messages caused by specifying non-existent devices on the DSP 90 such as a cartridge drive or floppy drive.

Excerpt from our 'lsyserr':

Friday, December 18, 1987
    3:45:19 pm (PST)   disk error storage module, volx=1, daddr=1:
                       disk block header error (OS/disk manager)
Monday, December 21, 1987
    3:14:54 pm (PST)   disk error storage module, volx=1, daddr=1:
                       disk block header error (OS/disk manager)

--
James Kwong
Calif. Depart. of H2O Resources, Sacramento, CA 95802
ucdavis.edu!caldwr!kwongj (Internet)
...!ucbvax!ucdavis!caldwr!kwongj (UUCP)

"Our program who art in memory, HELLO be thy name.. "

The opinions expressed above are mine, not those of the State of California or the California Department of Water Resources.
rees@apollo.uucp (Jim Rees) (02/19/88)
Note that "disk block header error" usually means that the block was read OK, but had funky stuff in the block header. This does not imply a disk read error, and any attempt to put the block in the badspot list will only make things worse.

The block header is an extra thing stuck in front of the data that contains useful stuff like the uid of the object that the data came from. In theory this makes it easier to salvage the disk if something goes wrong. I think this idea came from PARC, where it was used in the Alto file systems.

There are various programs in /systest/ssr_util (rwvol, fixvol) that can read and display block headers. But take the warnings seriously -- you can screw up your disk with these if you don't know what you're doing.
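To make the distinction between a header error and a read error concrete, here is a rough Python sketch of the idea described above. It is not the actual Apollo on-disk format: the header layout (an 8-byte object uid plus a 4-byte page number), the 1024-byte data size, and every name in the code are assumptions made purely for illustration. Only the notion of a header carrying the owning object's uid comes from the description above.

```python
import struct

# Hypothetical layout, NOT the real Apollo format:
# 8-byte object uid followed by a 4-byte page number, big-endian.
HEADER = struct.Struct(">QI")
DATA_SIZE = 1024  # assumed data size per block, for illustration only


class BlockHeaderError(Exception):
    """The block was read fine, but its header is not what the caller expected."""


def make_block(uid, page, data):
    """Build a raw block: header in front, data padded to DATA_SIZE."""
    return HEADER.pack(uid, page) + data.ljust(DATA_SIZE, b"\0")


def read_block(raw, expected_uid, expected_page):
    """Return the data portion, or raise if the header names a different owner."""
    uid, page = HEADER.unpack_from(raw)
    if (uid, page) != (expected_uid, expected_page):
        raise BlockHeaderError(
            f"expected uid={expected_uid:#x} page={expected_page}, "
            f"found uid={uid:#x} page={page}"
        )
    return raw[HEADER.size:]


def salvage(blocks):
    """Rebuild a {uid: {page: data}} map using nothing but the block headers."""
    objects = {}
    for raw in blocks:
        uid, page = HEADER.unpack_from(raw)
        objects.setdefault(uid, {})[page] = raw[HEADER.size:]
    return objects


if __name__ == "__main__":
    block = make_block(0x1234, 0, b"hello")
    print(read_block(block, 0x1234, 0).rstrip(b"\0"))
    try:
        read_block(block, 0x9999, 0)  # media is fine, header mismatches
    except BlockHeaderError as err:
        print("header error:", err)
```

In this sketch the mismatch is detected after a successful read, which is why marking such a block as a bad spot would not address the cause. The `salvage` function shows why the header helps recovery: each block says which object it belongs to, so objects can be regrouped even if other metadata is lost.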