[comp.sys.amiga] Close Call

mercurio@crash.CTS.COM (Phil Mercurio) (07/14/87)



The following is a description of a close call I had with my 20 MB
Supra hard disk on my Amiga.  Warning:  those of you with queasy
stomachs, who lie awake at night worrying about how thoroughly you've
backed up your hard disk, may find the material below distressing.
I will attempt to alleviate the suspense, however, by revealing now
that there is a happy ending.  I should also mention that this is
a tad long-winded.


First, I should describe my hardware configuration.  I have an old
Amiga 1000 (acquired in Sept. '85) with a 20 MB Supradrive, an
ASDG Minirack-C containing a 2 MB memory board plugged into the Supra
controller's bus connector, and two external 3.5" drives (I have a
long desk).  All of what I describe occurred under Kickstart/Workbench 
1.2.

Last night, a friend and I were attempting to use Carolyn Scheppner's
wonderful Cmd program to generate a printout that we would later 
send to an HP LaserJet.  We were using my PagePrint1.3 program (available
on DevDisk0005), which pretty-prints C source, and both the C file being
printed and the printer image file being generated by Cmd were on my Supra
hard disk (in different directories).  PagePrint accesses the printer by
Open()'ing "prt:" -- nothing magical going on here.  After a few seconds of
disk activity, the system froze (no Guru, just an unresponsive system).
This didn't really bother us, since a screen grabbing program (Snatch, part
of the commercially available WindowPrint) we'd been using earlier had been
crashing repeatedly.

We administer the Amiga/Vulcan nerve pinch (CTRL-Ah-Ah) and the system
reboots.  The first command in my Startup-Sequence is Supramount, as
it should be.  Upon the attempt to be mounted, the Supra starts blinking
its busy light madly and continuously (it normally mounts in a second
or so).  This went on for several seconds, maybe a minute, before my
friend and I realized something was not right.  We rebooted again, and
again the system seemed to get stuck attempting to mount the drive,
blinking that little busy light until the cows come home (there aren't
many cows here in San Diego -- I'm from Chicago, where cows have been
known to cause trouble in the past).

We attempted to boot off of a standard 1.2 Workbench disk (which does
not attempt to mount the drive), and it came up fine.  We edited the
Startup-Sequence on my Workbench to remove just the Supramount command
and booted from that disk -- no problems.  Attempting to mount the
drive again caused it to busy loop, again.

We rebooted again, this time shutting the power to the entire system off
before it knew what hit it.  We waited about 10 seconds, then turned the
power back on (I have everything plugged into one power strip so I can turn
everything on at once -- this has been working fine for months).  The
system kickstarted properly, but again busy looped at the attempt to
mount the drive.

This was starting to look serious.  Although I was reasonably well backed
up, I had no desire to reformat and reconstruct an 85% full 20 MB disk.
We had begun to suspect that the Supra was taking the initiative and
attempting the reformatting for me.

It was clearly time for action.  We had one observation upon which we
could base a decision:  the pattern of blinking of the busy light
demonstrated a small amount of variability -- signs of life!  I
consulted with my friend, Phil Cohen, whose wisdom I have sought out
and deferred to since we worked together at UCSD 9 years ago.  He'd
been witness to this entire melodrama.  We decided that the drive
probably either knew what it was doing, or else all was lost anyway.
We decided to attempt to mount the drive again, and this time, to
wait until the busy light stopped blinking.

So we waited, and we waited, and then we waited some more.  If this 
were a 1940's black-and-white movie, the hands of an analog clock would 
be seen advancing at many times their normal rate, then the pages
of a calendar would begin to tear off and flutter away.  Seasons
would change, the frost would melt, springtime birds would begin to
chirp ... sorry, I get carried away.  Seriously, I was too distraught
to think to time anything, but I would estimate that the light blinked
for over 5 minutes.  Occasionally, it would pause for a second or
two, then continue.

And then, it stopped.  

I gingerly approached the keyboard to cd to dh0: -- it was there!
I checked out a few important directories -- all there.  I did an
info -- the hard disk was as full as I had expected it to be.  We
toasted our good fortune and began backing up everything in sight.

And I've had no problems since then.  It ran flawlessly for several
hours more that night, and is still humming along this morning.  The
damn thing actually healed itself.

I have no idea why this occurred in the first place, and I have no
desire to attempt to replicate it.  I don't think the fault lies
with either Scheppner's Cmd or with my PagePrint, but rather with
the Supra itself.  But then, it did fix itself, so it's hard to
complain.  All in all, I'm pleased with the Supra's performance,
and would chalk this up to either good design on Supra's part or
just plain dumb luck.

Phil Mercurio
DevWare, Inc.

mercurio@pnet01.CTS.COM     Usenet
mercurio                    PeopleLink or GEnie

bryce@COGSCI.BERKELEY.EDU (Bryce Nesbitt) (07/15/87)

Sounds sort of like the AmigaDOS disk "validator".  You described a
system crash and when things came up again the drive spent 5 minutes
blinking its light.
If it's the validator then what happened was this:  The crash prevented
AmigaDOS from performing a final update on the hard disk.  Next
time things were booted AmigaDOS noticed something "strange" and decided
to validate the ENTIRE disk.

AmigaDOS keeps a lot of links that can be used to reconstruct most of
even a badly mangled disk.  THIS IS A FEATURE!  If you happened to zap
the dual copies of the FAT on an IBM disk you'd be "up data creek without
any file links".  (Your data would be quite scrambled)

-----------------------------
|\ /|  . Ack! (NAK, EOT, SOH)
{o O} . 
( " )	bryce@cogsci.berkeley.EDU -or- ucbvax!cogsci!bryce
  U	"Success leads to stagnation; stagnation leads to failure."

cmcmanis%pepper@Sun.COM (Chuck McManis) (07/15/87)

Phil, and others who will inevitably have this happen to them. I believe 
you have witnessed the disk-validator in the flesh. You see, when the
machine crashes while a file is still being written to the disk,
the disk has to be revalidated before AmigaDOS will trust it.
If this has happened to you on a full floppy you know that it took
nearly a minute to validate your 880K floppy, if it happened on your
20 meg hard disk, it doesn't take 20 minutes but it takes a good long
time. So yes, you will need to wait a while, and cross your fingers
and hope it doesn't say 'Disk structure Corrupt, Use DiskDoctor to 
correct it.' 
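
A rough check of those figures, as a sketch only. It assumes that
validation time scales linearly with capacity and that "nearly a minute"
means 60 seconds; neither is stated in the thread, and real times depend
on how many blocks are in use and on drive speed.

```python
# Naive linear scaling of validation time with disk capacity.
# Assumption: a full 880K floppy takes about 60 seconds to validate.
FLOPPY_KB = 880
HARD_DISK_KB = 20 * 1024          # a 20 MB drive
FLOPPY_VALIDATE_SECONDS = 60


def linear_estimate_seconds(disk_kb, ref_kb=FLOPPY_KB,
                            ref_seconds=FLOPPY_VALIDATE_SECONDS):
    """Scale the reference validation time by the capacity ratio."""
    return ref_seconds * disk_kb / ref_kb


estimate = linear_estimate_seconds(HARD_DISK_KB)
print(f"capacity ratio: {HARD_DISK_KB / FLOPPY_KB:.1f}x")
print(f"linear estimate: {estimate / 60:.1f} minutes")
```

The linear estimate comes out a little over 23 minutes, so a hard disk
that validates in five minutes or so is getting through its blocks
several times faster than the floppy would.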


--Chuck McManis
uucp: {anywhere}!sun!cmcmanis   BIX: cmcmanis  ARPAnet: cmcmanis@sun.com
These opinions are my own and no one else's, but you knew that didn't you.

higgin@cbmvax.UUCP (Paul Higginbottom SALES) (07/15/87)

In article <1385@crash.CTS.COM> mercurio@crash.CTS.COM (Phil Mercurio) writes:
$The following is a description of a close call I had with my 20 MB
$Supra hard disk on my Amiga.
$[...SNIP...]
$We decided to attempt to mount the drive again, and this time, to
$wait until the busy light stopped blinking. ...I would estimate that
$the light blinked for over 5 minutes.  And then, it stopped.
$I gingerly approached the keyboard to cd to dh0: -- it was there!
$I checked out a few important directories -- all there.  I did an
$info -- the hard disk was as full as I had expected it to be.
$Phil Mercurio
$DevWare, Inc.

What you had witnessed was simply the hard drive being VALIDATED by
AmigaDOS.  Since it was SOOOOO full, it took forever (well.. 5 minutes).

This was caused by the fact that the disk bitmap probably didn't
checksum correctly, and caused AmigaDOS to rebuild it.  And the bitmap
didn't checksum because a command had been writing to the disk when
the machine crashed so badly that DOS didn't have a chance to finish
writing the critical information back to the drive.  But, thanks to
the redundancy in AmigaDOS, it is able to heal itself.
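
A minimal sketch of how a block checksum of this kind can detect a
half-finished write. It assumes the scheme in which one 32-bit word of
each 512-byte block is chosen so that all the words in the block sum to
zero; the position of that word (CHECKSUM_INDEX) and the helper names
are illustrative assumptions, not taken from the thread.

```python
import struct

BLOCK_SIZE = 512                      # bytes per block
LONGS_PER_BLOCK = BLOCK_SIZE // 4     # 128 big-endian 32-bit words
CHECKSUM_INDEX = 0                    # assumed slot of the checksum word


def block_sum(block: bytes) -> int:
    """32-bit wraparound sum of every word in the block."""
    words = struct.unpack(f">{LONGS_PER_BLOCK}L", block)
    return sum(words) & 0xFFFFFFFF


def seal(block: bytes) -> bytes:
    """Return a copy whose checksum word makes the block sum to zero."""
    words = list(struct.unpack(f">{LONGS_PER_BLOCK}L", block))
    words[CHECKSUM_INDEX] = 0
    words[CHECKSUM_INDEX] = (-sum(words)) & 0xFFFFFFFF
    return struct.pack(f">{LONGS_PER_BLOCK}L", *words)


def checksum_ok(block: bytes) -> bool:
    return block_sum(block) == 0


# A bitmap-like block, sealed, then altered without updating the checksum,
# as a crash in the middle of an update might leave it.
good = seal(bytes([0xFF] * 64) + bytes(BLOCK_SIZE - 64))
torn = good[:100] + b"\x00\x00\x00\x01" + good[104:]
print(checksum_ok(good), checksum_ok(torn))
```

A block that fails this test cannot be trusted, which is the cue to
rebuild it from the rest of the disk structure.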

Ya know, some people complain about the poor performance of AmigaDOS,
but ask yourself something - have you ever lost a disk because of
the DOS?  In my experience it has ALWAYS been media failure, or
copy protection failure.

	Paul Higginbottom.

davidlo@madvax.UUCP (David Lo) (07/16/87)

In article <8707150606.AA03673@cogsci.berkeley.edu>, bryce@COGSCI.BERKELEY.EDU (Bryce Nesbitt) writes:
> If it's the validator then what happened was this:  The crash prevented
> AmigaDOS from performing a final update on the hard disk.  Next
> time things were booted AmigaDOS noticed something "strange" and decided
> to validate the ENTIRE disk.
> 
> AmigaDOS keeps a lot of links that can be used to reconstruct most of
> even a badly mangled disk.  THIS IS A FEATURE!  If you happened to zap
> the dual copies of the FAT on an IBM disk you'd be "up data creek without
> any file links".  (Your data would be quite scrambled)
> 

   I've noticed that the first time I boot from an almost full Workbench
   disk, the Amiga takes quite a while (1 to 3 minutes, or at least it seems
   that long) to load Workbench.  Subsequent reboots take less time.

   I guess what happens is that the Amiga actually attempts to rearrange the
   segments of the Workbench disk.  Is that right?


-- 
David Lo   (415)939-2400                                          /\  o
Varian Instruments, 2700 Mitchell Drive, Walnut Creek, CA 94598     \/
{ptsfa,lll-crg,zehntel,dual,amd,fortune,ista,rtech,csi,normac}varian!davidlo

rokicki@rocky.STANFORD.EDU (Tomas Rokicki) (07/16/87)

SCSI drives fix themselves.  If your box crashes while doing
a write to the drive, the SCSI drive will play with itself
for a while until it repairs itself.  The time this takes is
dependent on the size of your disk partition; this is an
excellent reason to partition a hard disk.  Perhaps a 4MByte
development partition where you develop your most crash-prone
programs, so the box comes back up quickly.

Sometimes, as happened to my CLtd drive a long time ago, the
light never stops flashing.  (We are talking days here.)  Then
it's time to worry.  I turned the system off for a week, turned
it back on, and everything was back.  I sent the drive back to
CLtd anyway . . .

-tom

fnf@mcdsun.UUCP (Fred Fish) (07/16/87)

In article <2122@cbmvax.UUCP> higgin@cbmvax.UUCP (Paul Higginbottom SALES) writes:
>Ya know, some people complain about the poor performance of AmigaDOS,
>but ask yourself something - have you ever lost a disk because of
>the DOS?  In my experience it has ALWAYS been media failure, or
>copy protection failure.

I suspect this is going to generate *lots* of heat and flames!  The DOS
seems to be absolutely the most fragile part of an otherwise nicely
engineered system (aside from lack of an MMU which is probably my number
one gripe).  I have never lost a single floppy to a media failure, after
it passed format and verified, though I've run into a few that wouldn't
format (far less than 1%).  But then I always use top grade DSDD floppies.
I have lost a couple that could be attributed to copy protection.  But
I've lost count of the number of floppies that have bit the dust because
the system guru'd while a write to the floppy was in progress.  I've
also had to completely reformat and reload my hard disk several times
because of the same problem.  Because I am probably more paranoid than
most people about the filesystem reliability, I've backed up my stuff
religiously and have not yet lost anything really important.

-Fred
-- 
= Drug tests; just say *NO*!
= Fred Fish  Motorola Computer Division, 3013 S 52nd St, Tempe, Az 85282  USA
= seismo!noao!mcdsun!fnf    (602) 438-5976o

daveh@cbmvax.UUCP (Dave Haynie) (07/16/87)

in article <1385@crash.CTS.COM>, mercurio@crash.CTS.COM (Phil Mercurio) says:
> Keywords: Amiga hard disk Supra horror story
> 
> The following is a description of a close call I had with my 20 MB
> Supra hard disk on my Amiga.  

Sounds like a long visit from the disk validator.  This can happen when the
system crashes somehow in the middle of a disk operation, even if the crash
occurs at a relatively safe time, like when no disk activity is actually 
taking place.  The disk's bitmap gets marked invalid, some operations take
place, and before the disk gets marked valid again, the crash takes place.
Next time you start up the disk's handler, it validates the bitmap, which on
a nearly full 20Meg hard drive certainly could take some time.  The SupraMount
command is probably why this happened before you could see the disk; on a
HD that gets mounted via BindDrivers, the disk usually shows up as soon as
the system comes up, even though the validator may run for some time.
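
The sequence above can be sketched as a toy model. The class and method
names are hypothetical, and the real handler's flag handling is certainly
more involved; this only shows why a crash at a quiet moment can still
leave the flag invalid.

```python
class Volume:
    """Toy volume with an on-disk 'bitmap valid' flag (hypothetical names)."""

    def __init__(self):
        self.bitmap_valid = True      # flag as stored on the disk
        self.validations = 0          # times the validator has had to run

    def write(self, crash_midway=False):
        self.bitmap_valid = False     # mark the bitmap invalid first
        if crash_midway:
            raise RuntimeError("system crashed")
        # ... allocate blocks and update the bitmap here ...
        self.bitmap_valid = True      # mark it valid once the update is done

    def mount(self):
        if not self.bitmap_valid:     # left invalid by a crash: rebuild it
            self.validations += 1
            self.bitmap_valid = True


vol = Volume()
vol.write()
vol.mount()                           # clean state: no validation needed
try:
    vol.write(crash_midway=True)
except RuntimeError:
    pass
vol.mount()                           # flag still invalid: validator runs
print(vol.validations)
```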

How is the Supra Drive?  A friend asked me about a problem he's having with it
in combination with a ComSpec memory card; maybe you or someone else out there
has some ideas.  He's got an Amiga with a 68010 (running DeciGel), the 
2 meg ComSpec, and the Supra.  When the ComSpec and Supra are used together,
the machine gurus in WorkBench.  Used separately, they work OK.  And he hasn't
been able to make them crash together from CLI.  I'm not at all familiar
with the Supra Drive, but that SupraMount command is the first thing that
makes me suspicious.

> (there aren't
> many cows here in San Diego -- I'm from Chicago, where cows have been
> known to cause trouble in the past).

We have lots of cow problems here in PA, so I can sympathize.  Can't keep the
suckers out of the lab or the computer rooms....

> Phil Mercurio
> DevWare, Inc.
-- 
Dave Haynie     Commodore-Amiga    Usenet: {ihnp4|caip|rutgers}!cbmvax!daveh
"The A2000 Guy"                    PLINK : D-DAVE H             BIX   : hazy
     "Catch a wave and you're sittin' on top of the world" -Beach Boys

mph@rover.UUCP (Mark Huth) (07/16/87)

In article <1385@crash.CTS.COM> mercurio@crash.CTS.COM (Phil Mercurio) writes:
>
>The following is a description of a close call I had with my 20 MB
>Supra hard disk on my Amiga.  
> .........
>Open()'ing "prt:" -- nothing magical going on here.  After a few seconds of
>disk activity, the system froze (no Guru, just an unresponsive system).
> .........
>We administer the Amiga/Vulcan nerve pinch (CTRL-Ah-Ah) and the system
>reboots.  The first command in my Startup-Sequence is Supramount, as
>it should be.  Upon the attempt to be mounted, the Supra starts blinking
>its busy light madly and continuously (it normally mounts in a second
>or so).  This went on for several seconds, maybe a minute, before my
>friend and I realized something was not right.  We rebooted again, and
>
This is normal behavior for a hard disk which has been rebooted with an invalid
bit map!  Those of you with hard disks, do not panic!  The disk validator (I
think) is performing its proper function.  When the drive is mounted the
validator checks out the drive, and discovers that the bit map on the disk is
INVALID.  This usually happens when the system has crashed with files opened
for writing.  Well, it takes a long time (5-10 minutes) for the validator to
examine EVERY sector on the drive and determine if it is allocated to a valid
file or not, and then repair the bit map.  LEAVE IT ALONE while this is
happening and give thanks to those that designed this program WHICH IS SAVING
your hard disk from becoming a useless bit bucket.
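
A toy sketch of the general idea of rebuilding an allocation map from
the file structure. The disk layout, block numbers and function name are
made up for illustration; how the real validator traverses the disk is
not described in this thread.

```python
TOTAL_BLOCKS = 16
ROOT = 8   # hypothetical root block number for this toy disk

# Toy disk: directory blocks list their children, file headers list
# the data blocks that belong to the file.
disk = {
    8: {"type": "dir", "children": [9, 12]},
    9: {"type": "file", "data": [10, 11]},
    12: {"type": "dir", "children": [13]},
    13: {"type": "file", "data": [14]},
}


def rebuild_bitmap(disk, root, total_blocks):
    """Walk everything reachable from the root; True means 'in use'."""
    used = [False] * total_blocks
    stack = [root]
    while stack:
        block = stack.pop()
        used[block] = True
        entry = disk[block]
        if entry["type"] == "dir":
            stack.extend(entry["children"])
        else:
            for data_block in entry["data"]:
                used[data_block] = True
    return used


bitmap = rebuild_bitmap(disk, ROOT, TOTAL_BLOCKS)
print([n for n, in_use in enumerate(bitmap) if in_use])
```

Any block not reached by the walk is treated as free, which is why the
rebuild has to finish before the volume can safely be written to again.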

I learned this from hard experience.  I am affiliated with a company which is
developing yet another hard disk for the Amiga.  It works through the parallel
port (groan - but there are good points to this as well) and is rather
inexpensive.  I'll give more details of this RSN, as the last bugs are being
ironed out.  Anyway, during the course of development of drivers and backup
utilities, I have left many a hard drive in limbo.  The first time that I
rebooted and the light came on and stayed on I was quite discouraged - so much
so that I couldn't do anything for several minutes.  Fortunate indeed, as I
discovered what was really happening when the light went out and continued the
startup sequence.  Well done!  (Actually, shades of Un*x after a crash.)

Another thing that I have learned is that the DiskDoctor program does work
on properly designed hard drives.  Having gotten the drive to work well enough
that I refused to work without it, I managed to discover some lurking hardware
bugs that occasionally corrupted the data on the disk.  After repairing these
bugs (not yet having developed the backup program) I was faced with many busted
directories.  So, with nothing to lose, I ran DiskDoctor.  It ran, and ran, and
reported its completion.  I did a dir on the drive and found to my horror that
it was empty!  I poked around with a disk dumping routine I've written, and
discovered that everything was intact, but disconnected from the root directory.

Well, thought I, this can be patched (and if Commodore would get off their
duffs and send us the developer kit for which we have paid, I might have used
DiskEd).  I decided to rerun DiskDoctor, and after some thrashing and the
trashing of some unimportant files, it completed.  Now the disk was intact!!
Someone has a sense of humor, though, because what used to be JE_sys_disk was
now called Lazarus.  I ran development on it for several more weeks until just
this weekend I was able to get both backup and restore to perform useful work.

FLAME ON - full heat to Commodore

Jefferson Enterprises played your silly developer games, and was granted the 
privilege of sending in our 50 bucks.  This procedure took several months, but
three months ago, WE SENT IN OUR MONEY.  We called a month later and were told
that the management shakeup had slowed things down a bit - be patient.  Two more
months have elapsed, and now we get an answering machine and most recently a
recording telling us that the number had been disconnected.  We really don't
want to turn Commodore in to the Postmaster General for mail fraud, but 
WHERE IS THE DEVELOPER KIT!!!!!!!!!  You took our money, using the US Mail, now
I want the goodies!  We would really like to make hardware for the expansion port
but I find that difficult without the bus timing information that is supposed
to be in the developer kit.  We've got plans for a really spiffy controller that
will have performance approximating a RAM:disk but WE NEED INFORMATION!

FLAME OFF - enter keyboard cool down period.


Mark Huth
seismo!nogo!mcdsun!rover!mph

>Phil Mercurio
>DevWare, Inc.
>
>mercurio@pnet01.CTS.COM     Usenet
>mercurio                    PeopleLink or GEnie

peter@sugar.UUCP (Peter da Silva) (07/19/87)

> Ya know, some people complain about the poor performance of AmigaDOS,
> but ask yourself something - have you ever lost a disk because of
> the DOS?  In my experience it has ALWAYS been media failure, or
> copy protection failure.
> 
> 	Paul Higginbottom.

So the only criterion we should apply to a file system is whether or not
it crashes the disk. Great.

Yes, of course I disagree. Speed is important. It shouldn't take more time
to get a directory listing than to read a file of the same size. This means
that on the Amiga floppy you should not have to do more than one disk access
to get the current directory until the total size of all the file headers in
the current directory exceeds 5K.
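
One plausible reading of the 5K figure, worked through as a sketch. It
assumes a standard 880K floppy geometry, that a single access fetches a
whole track, and that the directory block shares its track with file
headers of one block each; the post itself does not spell this out.

```python
BYTES_PER_BLOCK = 512
BLOCKS_PER_TRACK = 11     # one side of one cylinder on an 880K floppy
CYLINDERS = 80
SIDES = 2

disk_kb = BYTES_PER_BLOCK * BLOCKS_PER_TRACK * CYLINDERS * SIDES // 1024
track_bytes = BYTES_PER_BLOCK * BLOCKS_PER_TRACK

# Assumption: one block on the track holds the directory itself,
# leaving the rest for file headers.
header_blocks = BLOCKS_PER_TRACK - 1
header_bytes = header_blocks * BYTES_PER_BLOCK

print(disk_kb, track_bytes, header_blocks, header_bytes)
```

Under those assumptions a directory of up to ten files could be listed
from a single track read, and ten header blocks come to exactly 5K.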

At the very least you should *attempt* to put the file headers contiguously
after the directory, to the point of not allocating any other blocks on a
directory track until there simply isn't any more room elsewhere. Ideally you
should do a UNIX-style file system with preferential caching of inodes, then
directories, then files. It's also probably a good idea not to cache large
files that are being read sequentially. Yeh, and how about sequential block
prefetch and disk seek optimisation. Then you get loadseg to request all
blocks at once, to cut out the dreaded "run grind" you get after you've run
a program (or selected it from the workbench) and then thoughtlessly typed
another command that accesses the same disk.

Of course, your mileage may vary.
-- 
-- Peter da Silva `-_-' ...!seismo!soma!uhnix1!sugar!peter (I said, NO PHOTOS!)

carolyn@cbmvax.UUCP (Carolyn Scheppner CATS) (07/21/87)

In article <419@rover.UUCP> mph@rover.UUCP (Mark Huth) writes:
>[]
>FLAME ON - full heat to Commodore
>
>Jefferson Enterprises played your silly developer games, and was granted the 
>privilege of sending in our 50 bucks.  This procedure took several months, but
>three months ago, WE SENT IN OUR MONEY.  We called a month later and were told
>that the management shakeup had slowed things down a bit - be patient.  Two more
>months have elapsed, and now we get an answering machine and most recently a
>recording telling us that the number had been disconnected.  We really don't
>want to turn Commodore in to the Postmaster General for mail fraud, but 
>WHERE IS THE DEVELOPER KIT!!!!!!!!!  You took our money, using the US Mail, now
>I want the goodies!  We would really like to make hardware for the expansion port
>but I find that difficult without the bus timing information that is supposed
>to be in the developer kit.  We've got plans for a really spiffy controller that
>will have performance approximating a RAM:disk but WE NEED INFORMATION!
>
>FLAME OFF - enter keyboard cool down period.
>
>[]

GOOD NEWS:

   Lauren says that she has spoken to you about the delay.  Your package
was mailed about 2 months ago to a Post Office Box which was the wrong
address, and we were not aware of this until it was returned to us
recently.  I am not sure if the PO Box address was supplied on your
application or if that was an error at this end.  We apologize for
the delay.  Your package was remailed to your correct address last
week and you should be receiving it very soon.

   
BAD NEWS:

   The Certified developer package does not include DiskEd (which will
be on the Software Tools product), and does not include bus timings
(these are in the A1000 Schematics and Expansion Specs).  

   It does include a 1 year Amigamail subscription, $20 BIX discount
coupon, 1.2 Enhancer (which may be returned un-opened for any of
our $20 support materials), and a discount hardware price list.
The hardware discounts alone are worth the $50.

-- 
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Carolyn Scheppner -- CBM   >>Amiga Technical Support<<
                     UUCP  ...{allegra,caip,ihnp4,seismo}!cbmvax!carolyn 
                     PHONE 215-431-9180
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

carlos@io.UUCP (Carlos Smith) (07/22/87)

Dave Haynie had mentioned his suspicion of the "Supramount" command. It so
happens that I recently spoke with a tech support guy at Supra, and asked
him about it. He said that the supramount command uses the information in
the mountlist along with a file called Supra.0 in the devs: directory to
mount the drive. All that is in Supra.0 is partioning information for the
drive. He said that if one were to examine the information in Supra.0 and
use it to build the additional partitions in the mountlist, that one could
then just use the normal mount command. 

I examined this file, and sure enough, it seemed to mostly be low cylinder 
and high cylinder numbers for the partitions. Then looking at the mountlist
information, it appeared obvious how to explicitly define separate partitions
for the hard disk using this information. I will try this next time I 
repartition the disk, and see if it really works this way.
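
A small sketch of the bookkeeping involved in turning partition sizes
into low and high cylinder numbers. The 615-cylinder geometry, the
partition sizes and the function name are hypothetical; the actual
contents of Supra.0 and of the Supra mountlist are not reproduced here.

```python
def split_cylinders(total_cylinders, sizes_in_cylinders):
    """Return inclusive (low, high) cylinder ranges, laid out back to back."""
    ranges = []
    low = 0
    for size in sizes_in_cylinders:
        high = low + size - 1
        if high >= total_cylinders:
            raise ValueError("partitions exceed the drive")
        ranges.append((low, high))
        low = high + 1
    return ranges


# Hypothetical drive: 615 cylinders, one small and one large partition.
partitions = split_cylinders(615, [123, 492])
for unit, (low, high) in enumerate(partitions):
    print(f"dh{unit}: LowCyl = {low} ; HighCyl = {high}")
```

Each range would then go into its own mountlist entry, with the ranges
kept from overlapping.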

I guess they do this because they have a nice utility that you can run at
disk configuration time that lets you easily define partitions using
gadgets for the number and size of partitions. The mountlist contains only
information for the dh0: device. I figure that rather than mucking around with
the mountlist when you set up partitions, or making the user do it, they set
up this file and let the supramount command do it "magically". They also 
say that this set up causes no problems with the new, hard-disk optimized
file system which they say they have been testing.

Anyway, I am quite happy with the Supra. I run it daisy-chained with a CLtd
Amega board and have never had problems (the AMega is inboard of the Supra
controller). It's also nice to have a clock-calendar built in (though I have
had the date trashed by particularly violent crashes - copper going crazy,
weird sounds from the audio, etc.).
-- 
			Carlos Smith
			uucp:...!harvard!umb!ileaf!carlos
			Bix:	carlosmith

charles@hpcvcd.HP (Charles Brown) (07/30/87)

>> Ya know, some people complain about the poor performance of AmigaDOS,
>> but ask yourself something - have you ever lost a disk because of
>> the DOS?  In my experience it has ALWAYS been media failure, or
>> copy protection failure.
>> 	Paul Higginbottom.

I am not sure it is so clear whether a particular failure is caused
by the operating system or by the media.  Well, maybe the failure
really is in the media, but the operating system is not very
convenient about recovering the disk.  I want to be able to recover
as much as possible from a corrupt file.  AmigaDo* PROTECTS me from
that.

>At the very least you should *attempt* to put the file headers contiguously
>after the directory, to the point of not allocating any other blocks on a
>directory track until there simply isn't any more room elsewhere. Ideally you
>should do a UNIX-style file system with preferential caching of inodes, then
>directories, then files.
>-- Peter da Silva `-_-' ...!seismo!soma!uhnix1!sugar!peter

One of the nice things about Un*x, with its inodes, is linking.  Is
the Amig* file system able to do this?  I sorely miss it.
	Charles Brown	hplabs!hp-pcd!charles