[comp.sys.amiga] Hard Drive Grumbling

hrlaser@pnet02.cts.com (Harv Laser) (01/08/88)

The hardware: A1000, Starboard2 (2 megs ram), Supra 4x4 interface to
Supra 20 meg hard drive.

The history: A1000 stone reliable, used for over 2 years, no problems.
             Starboard2, a year old, ditto.
             Supra 20 meg, had for 3 weeks, bought used, everything
                 worked like gangbusters for 3 weeks...till today...

The disaster: I'm sitting here merrily shuffling some icons around
 in many windows, doing 'cleanups' and then 'snapshots' to the hard drive.
I've done this hundreds upon hundreds of times with no ill effects... so
I open up a window, shuffle some icons around (I use CLI and Workbench
both, depends on what I'm doing, but I'm no beginner), clean them up,
extended-select them all and the drawer icon they live in, hit snapshot
from the menu, some hard disk grinking (rather than floppy disk gronking)
and alluvasudden, up pops a requester "Error Validating Volume Supradrive,
Key 7553 already set"... and a couple more grinks, and then another one
"Disk Structure Corrupt! Use DISKDOCTOR to repair"  <<<<sigh>>>>

I go back and look at the icons I had been cleaning and shuffling and 
notice that a couple of them had long names and the names were overlapping.
Normally I didn't think this was a problem (the names of two icons
overlapping each other)... the icons themselves weren't overlapping, but
their names were. Anyway, no dice. I reboot 4 or 5 times, and get a
"Can't Validate Volume Supradrive" requester every single time. 
DH0: (my 10 meg partition) doesn't show up on an INFO command (says
"validating"), and as the tears stream down my face, I begin the laborious
process of backing up, reformatting, and restoring. 

Now I suppose this question should go to CATS, but I'll take all answers:
what gives here? Can overlapping icon names cause this disaster to happen?
Is this a known Intuition bug? Is there ANY way short of a hard disk
reformat to get the sucker to validate when that "Key nnnn already set"
requester comes up? 

Parenthetically, some other comments and observations:
- I phoned Supra in Oregon and told all of this to them, and their answer
  was "reformat it and restore your files from your backup..forget about
  Diskdoctor, that'll just make it worse" (he was right, by the way...
  before reformatting, I tried Diskdoctor just for the hell of it...
  it made it worse. It reported a bunch of corrupt files, I got even
  more "Key nnn already set" requesters with some VERY long numbers
  for the "key", and it didn't solve the validation problem).
- I used SDBackup to backup all the files (good program although slow)
  but since I had backed up a couple weeks ago, the 'archive bit' on most
  of the files on the hard drive had been set. Well guess what happens
  when a hard drive can't validate? You got it, you can't re-set the
  archive bit, so every single file that I backed up with SDBackup
  popped up another requester that said "Volume Supradrive is not
  Validated!" which I had to answer with a mouse click. Figuring a
  backup with a mouse click after every file would take days, since I
  had HUNDREDS of files on DH0:, I got yet another mutated inspiration
  to use Bryce's "cancel" program to cancel all the requesters automatically.

  That worked for a while, and SDBackup merrily did its thing, till it
  hit a file and reported "Volume SupraDrive has a read/write error"...
  well since I didn't get a requester, since 'cancel' was cancelling it,
  everything hung up... the mouse pointer wouldn't move, and I had to
  reboot, thus messing up my backup. 

So the net result is, I now have a reformatted hard drive using the new
Supra harddisk.device, and new Supramount, done with the new SupraFormat
program, and things seem a bit faster and generally more solid... but I
still want to know how I ended up in Dante's Harddrive Inferno to
begin with?  And... has anyone come up with a program that can overcome
the "Key nnnn already set" and "Volume xxx is not Validated" problem
that could possibly save me, or others, in the future should this 
unfortunate occurrence happen again?  And what about icons with overlapping
long names? Is this a problem? 


UUCP: {ihnp4!scgvaxd!cadovax, rutgers!marque}!gryphon!pnet02!hrlaser
INET: hrlaser@pnet02.cts.com

richard@gryphon.CTS.COM (Richard Sexton) (01/08/88)

(this hasnt been one of your all time great king hell weeks around here)

as long as we're talking about filesystem weirdnesses, here's one:

I have a corrupted floppy. I have been able to DiskSalv the files
off it, but if I try to CD to it or even DiskCopy it, the machine
Gurus.

Now, I don't know a lot about this stuff, but this doesn't seem right.

Comments ?


-- 
   It's too far to put Santa Fe in my ignition, or something like that. 
              richard@gryphon.CTS.COM    crash!gryphon!richard

page@swan.ulowell.edu (Bob Page) (01/09/88)

[hey, my From: line is finally correct!]

hrlaser@pnet02.cts.com (Harv Laser) wrote:
>alluvasudden, up pops a requester "Error Validating Volume Supradrive,
>Key 7553 already set"... and a couple more grinks, and then another one
>"Disk Structure Corrupt! Use DISKDOCTOR to repair"  <<<<sigh>>>>

First of all, I'd like to see CBM change the requester - advocating
the use of DISKDOCTOR is just asking for trouble.

The 'Key' in AmigaDOS is just a block number.  The current file system
has 'information blocks' that have, among other things, maps of blocks
that point to other things.  The three main 'things' are:

1. Root block: has a list of the bitmap, top-level directories & files
2. Directory block: has a list of the child directories and files
3. File block: has a list of the data blocks.

But you knew that.  I've oversimplified somewhat, but you get the
idea.  The root block is just a modified directory block.

The "Key 7553 already set" means that somebody tried to use block 7553
when it was already holding something else, like somehow you had two
entries for the same file in a directory, and it got noticed.  DOS
then marks the bitmap as BAD & tells you about it.
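
A minimal sketch of that check, in Python, assuming a toy model in which
each header block simply lists the blocks it owns. The block numbers and
the validate() helper are made up for illustration; this is not the real
validator or the real on-disk layout.

```python
# Toy model: block number -> list of blocks it claims to own.
def validate(blocks, root):
    """Walk the tree from root, marking each block used; collect double claims."""
    used = {root: None}            # block number -> block that first claimed it
    errors = []                    # (key, first_claimant, second_claimant)
    stack = [root]
    while stack:
        parent = stack.pop()
        for child in blocks.get(parent, []):
            if child in used:
                # Someone already owns this block: the "already set" case.
                errors.append((child, used[child], parent))
                continue
            used[child] = parent
            stack.append(child)
    return used, errors

disk = {
    880: [881, 900],               # root: one directory, one file header
    881: [7550],                   # directory holding one file header
    7550: [7551, 7552, 7553],      # file header and its data blocks
    900: [7553, 7554],             # second file header wrongly claims 7553
}

used, errors = validate(disk, 880)
for key, first, second in errors:
    print(f"Key {key} already set (claimed by {first} and {second})")
```

In this model the second claim is what gets noticed, which is why the
requester can name the key but not tell you which owner is the real one.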

You probably can't find out what caused it, but you can probably
recover.  Using something like sectorama, you can search for
occurrences of that block number across the disk.  First order of
business is to read that block and see what's in it.  If it's data,
look at the file the data belongs to (by looking at the parent of the
data block).  Find the parent of the file, that's the directory it's
in, etc.  Now that you (possibly) know what 7553 is being used for,
you then have the task of searching for some block somewhere that
falsely advertises IT is the parent of that file.
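
The search step can be sketched in Python as a scan over a raw copy of
the disk, assuming 512-byte blocks and big-endian 32-bit block pointers.
The find_references() helper and the fake four-block image are
hypothetical; this is not how Sectorama itself is driven.

```python
import struct

BLOCK_SIZE = 512  # assumed block size

def find_references(image, key, block_size=BLOCK_SIZE):
    """Return (block_number, byte_offset) for every aligned longword equal to key."""
    hits = []
    for block_no in range(len(image) // block_size):
        block = image[block_no * block_size:(block_no + 1) * block_size]
        words = struct.unpack(f">{block_size // 4}L", block)
        for index, word in enumerate(words):
            if word == key:
                hits.append((block_no, index * 4))
    return hits

# Fake image of 4 blocks: blocks 1 and 3 each hold a pointer to block 2.
image = bytearray(4 * BLOCK_SIZE)
struct.pack_into(">L", image, 1 * BLOCK_SIZE + 24, 2)
struct.pack_into(">L", image, 3 * BLOCK_SIZE + 308, 2)

hits = find_references(bytes(image), 2)
print(hits)
```

A scan like this also reports ordinary file data that happens to contain
the same number, so each hit still has to be inspected by hand.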

Of course, this is a pain in the ass, and I've probably oversimplified
too much to be of any use.

>Can overlapping icon names cause this disaster to happen?

No.  There is a known bug in Workbench where if you move a drawer into
a drawer that was inside the first drawer, it lets you do it and then
gets horribly confused.  But I don't think this is what you are up
against.

>Is there ANY way short of a hard disk reformat to get the sucker to
>validate when that "Key nnnn already set" requester comes up?

Not easily.  You can cheat and set the 'bitmap valid' word in the root
block, but you're asking for a LOT of trouble if you then write to the
disk, since your bitmap will be all messed up.  You can use Sectorama,
which (for the person who doesn't want to know about the internals of
the FS) is so much of a hassle that it is better just to backup the
disk (since you can read a disk that's not validated) and reformat it.
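
A sketch of what setting the 'bitmap valid' word could look like on a
copy of the root block held in memory. It assumes a 512-byte block with
the checksum longword at byte offset 20 and the flag at byte offset 312,
as given in commonly circulated descriptions of the old file system
layout. Those offsets and both function names are assumptions for
illustration, so check them against your own disk first. This only
quiets the validator; it does nothing to repair the bitmap.

```python
import struct

BLOCK_SIZE = 512
CHECKSUM_OFFSET = 20     # assumed position of the block checksum longword
BM_FLAG_OFFSET = 312     # assumed position of the 'bitmap valid' longword

def block_checksum(block):
    """Value for the checksum field that makes all longwords sum to zero."""
    words = list(struct.unpack(f">{len(block) // 4}L", block))
    words[CHECKSUM_OFFSET // 4] = 0          # ignore the old checksum
    return (-sum(words)) & 0xFFFFFFFF

def mark_bitmap_valid(root):
    """Return a patched copy of the root block; the input is left untouched."""
    patched = bytearray(root)
    struct.pack_into(">L", patched, BM_FLAG_OFFSET, 0xFFFFFFFF)
    struct.pack_into(">L", patched, CHECKSUM_OFFSET, block_checksum(bytes(patched)))
    return bytes(patched)

# Stand-in root block: all zeros except a type word, purely for the demo.
root = bytearray(BLOCK_SIZE)
struct.pack_into(">L", root, 0, 2)

patched = mark_bitmap_valid(bytes(root))
total = sum(struct.unpack(f">{BLOCK_SIZE // 4}L", patched)) & 0xFFFFFFFF
print("longword sum after patch:", total)
```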

>- I used SDBackup to backup all the files (good program although slow)

You can turn off the compression, it speeds it up quite a bit.

>every single file that I backed up with SDBackup popped up another
>requester that said "Volume Supradrive is not Validated!"

Sometimes you just gotta turn that archive-bit-setting feature off!
[You can't, by the way]

>has anyone come up with a program that can overcome
>the "Key nnnn already set" and "Volume xxx is not Validated" problem
>that could possibly save me, or others, in the future

Unfortunately, there are STILL no utilities for hacking the FS that
are really useful.  I'm working on a suite of special-purpose programs
for error detection & recovery (for regular ol' users, not FS
hackers); and I'll certainly have one that fixes the "Key nnnn already
set".  The other thing I'm doing is committing all this info to bits &
bytes so other people learn the ins and outs (and dos and donts) of
the file systems.

Hmmm, maybe I'll patch KS to advertise MY programs instead of
DiskDoctor.  Yeah, and I'll call it "Get Outta My File System"!  Yeah
yeah, that's the ticket...	:-)

..Bob

PS People have already started asking me -- when?  Ah, sometime before
Christmas?  I just upgraded my disk controller, disk driver, and put
in a new file system, and the three don't seem to want to get along
the way I think they should.  Film at eleven.
-- 
Bob Page, U of Lowell CS Dept.  page@ulowell.edu  ulowell!page
"I've never liked reality all that much, but I haven't found a
better solution."		--Dave Haynie, Commodore-Amiga

page@swan.ulowell.edu (Bob Page) (01/09/88)

richard@gryphon.CTS.COM (Richard Sexton) wrote:
>(this hasnt been one of your all time great king hell weeks around here)
          ^^                                      ^^^^
Does that mean you're having a good week?  Happy New Year.

>I have a corrupted floppy. I have been able to DiskSalv the files off it,
>but if I try to CD to it or even DiskCopy it, the machine Gurus.

The bug's in the filesystem handler (in ROM/WCS).  It's pickier than
it thinks it is.  One surefire way to tickle it is to zap a file list
block without telling the file header about it.

DiskSalv doesn't use the filesystem handler, it talks directly to the
device.

..Bob
-- 
Bob Page, U of Lowell CS Dept.  page@ulowell.edu  ulowell!page
"I've never liked reality all that much, but I haven't found a
better solution."		--Dave Haynie, Commodore-Amiga

daveh@cbmvax.UUCP (Dave Haynie) (01/12/88)

in article <2060@gryphon.CTS.COM>, richard@gryphon.CTS.COM (Richard Sexton) says:
> Summary: floppies too

> as long as we're talking about filesystem weirdnesses, here's one:

> I have a corrupted floppy. I have been able to DiskSalv the files
> off it, but if I try to CD to it or even DiskCopy it, the machine
> Gurus.

> Now, I don't know a lot about this stuff, but this doesn't seem right.

Not sure what is corrupted, but it's certainly not all that weird.  DiskSalv
doesn't assume any more than it has to about the state of a disk, and as
such isn't going to crash on anything I've seen so far in the way of disk
corruption.  The DOS, however, does make certain assumptions, I'm not all
that sure just what, but if something is completely out of whack, there's a
definite possibility of the DOS crashing on it.   I suppose the DOS could
have been built to avoid such crashes by checking all the important data it
looks at, but that may have been too damaging to DOS speed to implement.

>    It's too far to put Santa Fe in my ignition, or something like that. 
>               richard@gryphon.CTS.COM    crash!gryphon!richard

-- 
Dave Haynie  "The B2000 Guy"     Commodore-Amiga  "The Crew That Never Rests"
   {ihnp4|uunet|rutgers}!cbmvax!daveh      PLINK: D-DAVE H     BIX: hazy
		"I can't relax, 'cause I'm a Boinger!"

richard@gryphon.CTS.COM (Richard Sexton) (01/14/88)

In article <2393@swan.ulowell.edu> page@swan.ulowell.edu (Bob Page) writes:
I wrote:
>>I have a corrupted floppy. I have been able to DiskSalv the files off it,
>>but if I try to CD to it or even DiskCopy it, the machine Gurus.
>
>The bug's in the filesystem handler (in ROM/WCS).  It's pickier than
>it thinks it is.  One surefire way to tickle it is to zap a file list
>block without telling the file header about it.

This is fixed in the next Kickstart, right?

CATS ?

-- 
      It's too dark in Santa Fe in my ignition, or something like that.
                          richard@gryphon.CTS.COM 
   {ihnp4!scgvaxd!cadovax, philabs!cadovax, codas!ddsw1} gryphon!richard

richard@gryphon.CTS.COM (Richard Sexton) (01/14/88)

In article <3128@cbmvax.UUCP> daveh@cbmvax.UUCP (Dave Haynie) writes:
I write:
>> Summary: floppies too
>
>> as long as we're talking about filesystem weirdnesses, here's one:
>
>> I have a corrupted floppy. I have been able to DiskSalv the files
>> off it, but if I try to CD to it or even DiskCopy it, the machine
>> Gurus.
>
>
>Not sure what is corrupted, but it's certainly not all that weird.  DiskSalv
>doesn't assume any more than it has to about the state of a disk, and as
>such isn't going to crash on anything I've seen so far in the way of disk
>corruption.  The DOS, however, does make certain assumptions, I'm not all
>that sure just what, but if something is completely out of whack, there's a
>definite possibility of the DOS crashing on it.   I suppose the DOS could
>have been built to avoid such crashes by checking all the important data it
>looks at, but that may have been too damaging to DOS speed to implement.

Say what ? Couldn't we like, make it an option ? Like, -s for secure,
for those people willing to give up 35 milliseconds per disk access 
all so it doesn't guru (which IS annoying on a multi tasking system).

Reminds me of a job I had once:

"Hey boss, look how much faster DiskCopy is"

"What did you do"

"Took out the verify pass"

"Very good Richard. Your last official act here will be to put it back
in. Our customers want RIGHT not FAST."



-- 
      It's too dark in Santa Fe in my ignition, or something like that.
                          richard@gryphon.CTS.COM 
   {ihnp4!scgvaxd!cadovax, philabs!cadovax, codas!ddsw1} gryphon!richard

andy@cbmvax.UUCP (Andy Finkel) (01/15/88)

In article <2137@gryphon.CTS.COM> richard@gryphon.CTS.COM (Richard Sexton) writes:
>In article <3128@cbmvax.UUCP> daveh@cbmvax.UUCP (Dave Haynie) writes:
>I write:
>>> Summary: floppies too
>>
>>> as long as we're talking about filesystem weirdnesses, here's one:
>>
>>> I have a corrupted floppy. I have been able to DiskSalv the files
>>> off it, but if I try to CD to it or even DiskCopy it, the machine
>>> Gurus.
>>
>>
>>Not sure what is corrupted, but it's certainly not all that weird.  DiskSalv
>>doesn't assume any more than it has to about the state of a disk, and as
>>such isn't going to crash on anything I've seen so far in the way of disk
>>corruption.  The DOS, however, does make certain assumptions, I'm not all

>Say what ? Couldn't we like, make it an option ? Like, -s for secure,
>for those people willing to give up 35 milliseconds per disk access 
>all so it doesn't guru (which IS annoying on a multi tasking system).
>

Most of those GURUs/Software Error requesters *are* AmigaDOS
checking the validity of those lists, blocks, and variables
it considers important, and giving up if it thinks things are
too far gone.  Bringing up the Software Error requester doesn't
halt the machine...on a 2 drive system you can even save to the
other drive (after AmigaDOS checks that one, of course)

Granted, it's hard to tell the difference sometimes; the Software
Error requester is a good indication that this is a controlled
crash situation, where AmigaDOS does not think it's safe for you
to continue, and you should clean up and reboot.
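
The idea of checking a list before trusting it, and giving up in a
controlled way, can be sketched in Python as below. The chain model, the
walk_chain() helper and the DiskStructureError name are all hypothetical;
the real checks inside AmigaDOS are not described in this thread.

```python
class DiskStructureError(Exception):
    """Raised when a block chain cannot be trusted."""

def walk_chain(next_block, start, total_blocks):
    """Follow a chain of block numbers, refusing out-of-range or looping links."""
    seen = []
    current = start
    while current != 0:                      # 0 marks the end of a chain here
        if not 0 < current < total_blocks:
            raise DiskStructureError(f"block {current} is off the disk")
        if current in seen:
            raise DiskStructureError(f"chain loops back to block {current}")
        seen.append(current)
        current = next_block.get(current, 0)
    return seen

good = {10: 11, 11: 12, 12: 0}               # a clean three-block chain
bad = {10: 11, 11: 10}                       # a corrupted chain that loops

print(walk_chain(good, 10, 1760))
try:
    walk_chain(bad, 10, 1760)
except DiskStructureError as err:
    print("controlled failure:", err)
```

Without the two checks, the loop in the second chain would never end,
which is the kind of hang a controlled error is meant to replace.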

			andy
-- 
andy finkel		{ihnp4|seismo|allegra}!cbmvax!andy 
Commodore-Amiga, Inc.

"Any sufficiently advanced technology is indistinguishable from
 a rigged demo."

Any expressed opinions are mine; but feel free to share.
I disclaim all responsibilities, all shapes, all sizes, all colors.

cks@radio.toronto.edu (Chris Siebenmann) (01/24/88)

In article <2058@gryphon.CTS.COM> hrlaser@pnet02.cts.com (Harv Laser) writes:
...
>Is this a known Intuition bug? Is there ANY way short of a hard disk
>reformat to get the sucker to validate when that "Key nnnn already set"
>requester comes up? 

 You can use the various disk editors floating around to repair the
damage, if you can find the two (or more) places that are trying to
use 'Key nnnn' (which is really disk block nnnn; I don't know why
AmigaDOS calls them keys here). Doing this is hampered by a lack of
tools to find out who's trying to use that key; the disk validator
knows, but won't tell you, and I don't know of any other tools that
will. What I did when this happened to me was to figure out which file
the disk block really belonged to, copy the file to floppy, and delete
it by zapping the directory with the disk editor (thank god, the file
a block belongs to can be determined from the block itself most of the
time).

>Parenthetically, some other comments and observations:
>- I phoned Supra in Oregon and told all of this to them, and their answer
>  was "reformat it and restore your files from your backup..forget about
>  Diskdoctor, that'll just make it worse" (he was right, by the way...
>  before reformatting, I tried Diskdoctor just for the hell of it...
>  it made it worse. It reported a bunch of corrupt files, I got even
>  more "Key nnn already set" requesters with some VERY long numbers
>  for the "key", and it didn't solve the validation problem

 At least in my case, it also made various important bits of the hard
drive disappear into oblivion (I managed to rescue some of them with a
disk editor, but others disappeared for good).

>- I used SDBackup to backup all the files (good program although slow)
>  but since I had backed up a couple weeks ago, the 'archive bit' on most
>  of the files on the hard drive had been set. Well guess what happens
>  when a hard drive can't validate? You got it, you can't re-set the
>  archive bit, so every single file that I backed up with SDBackup
>  popped up another requester that said "Volume Supradrive is not
>  Validated!" which I had to answer with a mouse click. Figuring a
>  backup with a mouse click after every file would take days, since I
>  had HUNDREDS of files on DH0:, I got yet another mutated inspiration
>  to use Bryce's "cancel" program to cancel all the requesters automatically.

 What I did the second time this happened was to use a disk editor to
mark the disk valid; then SDBackup could write those archive bits (I
used this ability later on).

>  That worked for a while, and SDBackup merrily did its thing, till it
>  hit a file and reported "Volume SupraDrive has a read/write error"...
>  well since I didn't get a requester, since 'cancel' was cancelling it,
>  everything hung up... the mouse pointer wouldn't move, and I had to
>  reboot, thus messing up my backup. 

 The first time around, I had the same problem you had; what I did was
to go through all the files on the HD with tar, deleting the ones it
couldn't read. This worked OK, but was very time consuming (20M HD,
unpartitioned, making a tar archive in NIL:). The second time around I
got smart. I first marked the disk as validated, then went through
with SDBackup marking things as unbacked-up, then used SDBackup to
back stuff up. Because SDBackup could now mark what files it had
backed up, I could recover fairly fast from a bad file cropping up. I
could also leave the machine unattended to do the backup; when I came
back to find it crashed, I could just use -V to find out what file it
had been trying to back up and zap that file.
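
The same strategy, recording which file failed and carrying on, can be
sketched in Python as below. It assumes a read error surfaces as an
exception rather than a machine crash, which was not the case in the
situation described above. The backup_tree() and flaky_copy() names and
the file names are made up, and this is not how SDBackup works.

```python
import os
import shutil
import tempfile

def backup_tree(src, dst, copy=shutil.copy2):
    """Copy every file under src to dst; record failures instead of stopping."""
    copied, failed = [], []
    for folder, _dirs, files in os.walk(src):
        target = os.path.normpath(os.path.join(dst, os.path.relpath(folder, src)))
        os.makedirs(target, exist_ok=True)
        for name in sorted(files):
            source = os.path.join(folder, name)
            try:
                copy(source, os.path.join(target, name))
                copied.append(source)
            except OSError as err:
                failed.append((source, str(err)))   # note it and move on
    return copied, failed

def flaky_copy(source, dest):
    """Stand-in copier that fails on one file to imitate a bad block."""
    if os.path.basename(source) == "bad.dat":
        raise OSError("read/write error")
    shutil.copy2(source, dest)

with tempfile.TemporaryDirectory() as work:
    src, dst = os.path.join(work, "src"), os.path.join(work, "dst")
    os.makedirs(src)
    for name in ("good1.txt", "bad.dat", "good2.txt"):
        with open(os.path.join(src, name), "w") as handle:
            handle.write(name)
    copied, failed = backup_tree(src, dst, copy=flaky_copy)
    failed_names = [os.path.basename(path) for path, _ in failed]
    print(len(copied), "copied;", failed_names, "failed")
```

The list of failures plays the role of the -V check: it names the files
to zap before the next pass.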

-- 
	"I shall clasp my hands together and bow to the corners of the world."
			Number Ten Ox, "Bridge of Birds"
Chris Siebenmann		{allegra,mnetor,decvax,pyramid}!utgpu!radio!cks
cks@radio.toronto.edu	     or	...!utgpu!{chp!hak!ziebmef,ontmoh}!cks