[comp.sys.ibm.pc] Hard drive problem

bob@menno.UUCP (Robert Schwartz) (02/17/90)

I'm in charge of a small microcomputer lab (10 Franklin PC8000's with 
Seagate 225 hard drives attached).  Lately about half of our machines
have been exhibiting unusual behavior:  they have lost their ability to
boot.  We've run them for months without problems and now some refuse to
boot.  First fix attempt: reload the hidden system files.  Sometimes
this helps, sometimes it doesn't.  Second fix: reformat and reload the
software.  A Real Pain, but that's what undergraduate employees are 
for :-).  This usually doesn't change the situation much.  However, just
after the format, when only the system is loaded, it will boot fine.
After reloading all our software, nothing.  Final attempt:  Low-level
format.  Not even our employees are very willing to try this.  It
sometimes has to be done twice before things work again.  

So my questions are these:  what is wrong?  have we been victimized by a
virus of some sort?  Or should we just live with it and pay our
employees more?

-- 
Robert Schwartz				...texbell!ncrwic!menno!bob
Department of Computer Science          
Bethel College				bob@bethelks.edu
N. Newton, KS 67117

bbesler@vela.acs.oakland.edu (Brent Besler) (02/20/90)

Since so many of the machines are exhibiting the problem, it sounds like you
have a virus problem.  Unfortunately you will probably have to reformat and
reload all of the software to make sure the virus is gone.  You are also going
to have to pay special attention to make sure no contaminated disks are put
back into the machines.
                                               Brent H. Besler

chan@chansw.UUCP (Jerry H. Chan) (02/20/90)

In article <267@menno.UUCP>, bob@menno.UUCP (Robert Schwartz) writes:
> ...
> Lately about half of our machines
> have been exhibiting unusual behavior:  they have lost their ability to
> boot.  We've run them for months without problems and now some refuse to
> ...
> So my questions are these:  what is wrong?  have we been victimized by a
> virus of some sort?  Or should we just live with it and pay our
> employees more?

I have seen a disproportionate number of SEAGATES exhibit this problem, and for
that reason do not use SEAGATES in my systems anymore.  I believe that the
problem is thermal sensitivity.  If you power down your systems in the
evenings, this problem usually manifests itself as an unreadable cylinder
0 (guess where the boot info is kept on your disk?) for the first 5 or so
minutes until the disk warms up.  This problem was pseudo-confirmed by the
tech support person I deal with at my distributorship, in particular with
ST225's and ST251's (although I have seen at least one ST4096 exhibit this
problem too).

The fix (IMHO):

Option A: If your drives are still under warranty, swap them out under warranty
   and keep your eye on the new critters; unfortunately, it sometimes takes up
   to several months (in my experience) for this problem to rear its ugly head.

Option B: Low-level format -- sometimes this will help for a while, but from
   experience, the problem will probably come back to haunt you.  (Low
   level format with DEBUG, g=c800:5, Disk Manager, Speedstor, etc.)

Option C: Keep your machines powered on 24 hours a day, for obvious reasons.

To be fair, I have seen identical problems with several Miniscribe 20M 40msec
3.5" drives, as well as with a Conner (in my laptop).  My fix for these
has been warranty swaps. I consider this to be an industry-wide engineering
challenge for all disk manufacturers; in my opinion, Seagate hasn't fared too
well in this challenge.

Having seen my share of disk failures, I don't believe that the disk
manufacturers have quite the "handle" on the technology as they would like
us to believe -- there's still a fair amount of "black magic" to get these
things to work.
-- 
Jerry Chan 508-853-0747, Fax 508-853-2262  |"My views necessarily reflect the
Chan Smart!Ware Computer Services & Prods  | views of the Company because
Worcester, MA 01606                        | I *am* the Company." :-)
{bu.edu,husc6}!m2c!chansw!chan             \---------------------------------

steve@wintermute.ucsd.edu ({Darkavich}) (02/21/90)

Before you panic and reformat your drive, you should try a couple of things
first.  1) Try turning the unit 90 degrees: if it is on its side, put it on
its back, or vice versa.  This usually works.  2) Check the cables in the
machine; they may be loose or in need of replacement.  I always keep a spare
set of cables around.  (They are only $3-4, a great investment when one goes
bad.)  If neither of these works, try booting from a floppy and switching to
the C: drive (this will only work on the first disk partition).  If you can
read all the files but cannot boot from the drive, transfer the system files
over.  This has some random behaviours depending on how your disk partitions
were set up.  I believe (don't quote me on this) that fdisk partitions that
have an extended partition will get messed up.  This happened to me a long
long time ago and I do not remember if this was the definite cause.  If all
else fails, reformat the drive.  It is always a good idea to do a low-level
format of the drive when reformatting because it will look for bad sectors
on the drive and mark them off if they exist.  You can get software that
will automate the entire format procedure.
	
	Hope this helps.

	Steve Misrack
	steve@ucsd.edu

wdarden@nrtc.nrtc.northrop.com (Bill Darden <wdarden>) (02/21/90)

Before doing a destructive low-level format, you might try Spinrite
II and Norton's Disk Doctor or Mace's Emergency Room.  Spinrite II
will do a non-destructive low-level format and optimise your
interleave factor.  These are all short-term fixes.  If you run
critical applications, you should get rid of the drive.
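
For anyone wondering what "interleave factor" means in practice, here is a
minimal sketch of how logical sectors are laid out around one track for a
given interleave.  It assumes 17 sectors per track (the usual figure for MFM
drives like the ST225); the function name is made up for illustration and
this is not how Spinrite itself is implemented.

```python
def interleave_layout(sectors_per_track, interleave):
    """Return the logical sector number stored in each physical slot."""
    layout = [None] * sectors_per_track
    slot = 0
    for logical in range(1, sectors_per_track + 1):
        # If the target slot is already taken, slide forward to the next free one.
        while layout[slot] is not None:
            slot = (slot + 1) % sectors_per_track
        layout[slot] = logical
        # The next logical sector goes `interleave` slots further round the track.
        slot = (slot + interleave) % sectors_per_track
    return layout


# Assumed geometry: 17 sectors per track, 3:1 interleave.
print(interleave_layout(17, 3))
# [1, 7, 13, 2, 8, 14, 3, 9, 15, 4, 10, 16, 5, 11, 17, 6, 12]
```

With a 3:1 interleave the controller gets two sector times to digest each
sector before the next logical one arrives, at the price of roughly three
revolutions to read a whole track.  If the interleave is set tighter than the
controller can keep up with, each missed sector costs a full extra revolution,
which is why re-tuning it can make a visible difference.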

Good luck,

BiLL.....

bbesler@vela.acs.oakland.edu (Brent Besler) (02/22/90)

I have a shareware program called HD-TEST that will do non-destructive low level
formats also.  I can get the address if anyone is interested.
                                        Brent H. Besler 

elund@pro-graphics.cts.com (Eric Lund) (02/22/90)

> Before doing a destructive low-level format, you might try Spinrite
> II and Norton's Disk Doctor or Mace's Emergency Room.  Spinrite II
> will do a non-destructive low-level format and optimise your
> interleave factor.  These are all short-term fixes.  If you run
> critical applications, you should get rid of the drive.

I never thought I'd be one to look for room to plug Spinrite II.  I've heard
of a lot of problems using the program.  Stopping the program (by following
the instructions) can be dangerous and you risk losing boot capability and
maybe some data.  That is a scary factor, but it may help to know I had been
experiencing "Data error on drive c:" for quite some time, and then I took the
chance and ran Spinrite on both partitions.  In addition to solving all my
data error problems, Spinrite retrieved over 1/2 meg of "bad" areas.  

So I will reiterate my plug with the exception:  Spinrite II is excellent as
long as you let it run its course and don't interfere with it, no matter what
it tells you!  It's hard not to fiddle with it when it takes 8 hours to run.

As for Norton's Disk Doctor, it's worse.  Whereas Spinrite II gave me new
space, Norton methodically marked every one of my sectors bad.  However, it
performs well on floppy media, reconstructing boots et al.

Eric

 ProLine: elund@pro-graphics
    UUCP: ...crash!pro-graphics!elund
ARPA/DDN: pro-graphics!elund@nosc.mil
Internet: elund@pro-graphics.cts.com

greg@phoenix (greg Nowak) (02/23/90)

In article <1617@crash.cts.com>, elund@pro-graphics (Eric Lund) writes:

>So I will reiterate my plug with the exception:  Spinrite II is excellent as
>long as you let it run its course and don't interfere with it, no matter what
>it tells you!  It's hard not to fiddle with it when it takes 8 hours to run.

This is the longest-running Spinrite option, the only one that can restore
sectors marked bad to service. You can also use faster options that finish
in two hours... I usually run the 8-hour job overnight.

>As for Norton's Disk Doctor, it's worse.  Whereas Spinrite II gave me new
>space, Norton methodically marked every one of my sectors bad.  However, it
>performs well on floppy media, reconstructing boots et al.

Were you, perhaps, keeping your machine on its side? This can cause
hard disk problems in recognizing the physical format. Also, NDD and
Spinrite do different things: NDD doesn't do a reformat, but just
marks sectors bad; Spinrite will give you a new physical format. I've
been happy with both, and use them both now.

...!rutgers!phoenix.princeton.edu!greg

                           Greg Nowak/Phoenix Gang/Princeton NJ 08540

jdudeck@polyslo.CalPoly.EDU (John R. Dudeck) (02/23/90)

In article <1617@crash.cts.com> elund@pro-graphics.cts.com (Eric Lund) writes:
>I never thought I'd be one to look for room to plug Spinrite II.  I've heard
>of a lot of problems using the program.  Stopping the program (by following
>the instructions) can be dangerous and you risk losing boot capability and
>maybe some data.  That is a scary factor, but it may help to know I had been

I have been using Spinrite II ever since it was available, and have never
had problems interrupting it and restarting it.  Where did you hear this?
Perhaps you are thinking of the original Spinrite version?

I should mention that I always boot from floppy when running Spinrite II.
I have made up a diskette especially for that purpose, as they recommend.

While on the subject, I will mention that Spinrite II will handle large
partitions, whereas Spinrite only worked for 32M or smaller partitions.

-- 
John Dudeck                           "You want to read the code closely..." 
jdudeck@Polyslo.CalPoly.Edu             -- C. Staley, in OS course, teaching 
ESL: 62013975 Tel: 805-545-9549          Tanenbaum's MINIX operating system.

elund@pro-graphics.cts.com (Eric Lund) (02/24/90)

> In article <1617@crash.cts.com> elund@pro-graphics.cts.com (Eric Lund) writes:
>> I never thought I'd be one to look for room to plug Spinrite II.  I've heard
>> of a lot of problems using the program.  Stopping the program (by following
>> the instructions) can be dangerous and you risk losing boot capability and
>> maybe some data.  That is a scary factor, but it may help to know I had been
> 
> I have been using Spinrite II ever since it was available, and have never
> had problems interrupting it and restarting it.  Where did you hear this?
> Perhaps you are thinking of the original Spinrite version?

No, this was definitely Spinrite II.  I talked with several users (and saw the
frustration on their faces) as they tried to fix their drives.  The worst case
was an AT&T computer user who interrupted the program as instructed, exited the
program, and turned off the machine.  Later, it wouldn't boot.  He had a hell
of a time (since it was an old AT&T dos) getting it up and running again.  He
had to install a new DOS on the system, and his old DOS didn't take kindly to
that idea.  He did get his system working, but it consumed several hours.

This incident is undeniably linked to Spinrite II, but whether it was the
"interrupting" that did it or the main program, who knows.  I
also suspect his old system didn't help matters, but I'd rather believe the
fault lay in his interruption of the program.  

Eric

 ProLine: elund@pro-graphics
    UUCP: ...crash!pro-graphics!elund
ARPA/DDN: pro-graphics!elund@nosc.mil
Internet: elund@pro-graphics.cts.com

elund@pro-graphics.cts.com (Eric Lund) (02/24/90)

>> As for Norton's Disk Doctor, it's worse.  Whereas Spinrite II gave me new
>> space, Norton methodically marked every one of my sectors bad.  However, it
>> performs well on floppy media, reconstructing boots et al.
> 
> Were you, perhaps, keeping your machine on its side? This can cause
> hard disk problems in recognizing the physical format. Also, NDD and
> Spinrite do different things: NDD doesn't do a reformat, but just
> marks sectors bad; Spinrite will give you a new physical format. I've
> been happy with both, and use them both now.

I've had my machine in various positions over the years.  I haven't noticed a
performance change in any position, although I agree that a side stand could
cause problems.    I don't know why NDD didn't like my hard disk, but I really
couldn't trust it.  EVERY sector marked bad?  Perhaps it just didn't like my
hard drive.  However, I can't see what's different with a Seagate ST-251-1.

Eric

 ProLine: elund@pro-graphics
    UUCP: ...crash!pro-graphics!elund
ARPA/DDN: pro-graphics!elund@nosc.mil
Internet: elund@pro-graphics.cts.com

jdudeck@polyslo.CalPoly.EDU (John R. Dudeck) (02/25/90)

In article <1643@crash.cts.com> elund@pro-graphics.cts.com (Eric Lund) writes:
>
>No, this was definitely Spinrite II.  I talked with several users (and saw the
>frustration on their faces) as they tried to fix their drives.  The worst case
>was an AT&T computer user who interrupted the program as instructed, exited the
>program, and turned off the machine.  Later, it wouldn't boot.  He had a hell
>of a time (since it was an old AT&T dos) getting it up and running again.  He
>had to install a new DOS on the system, and his old DOS didn't take kindly to
>that idea.  He did get his system working, but it consumed several hours.
>
>This incident is undeniably linked to Spinrite II, but if it was the
>"interrupting" that did it, or the main program that did it, who knows.  I
>also suspect his old system didn't help matters, but I'd rather believe the
>fault lied in his interruption of the program.  

Hmm, interesting.  This would suggest that Spinrite II is not fully compatible
with some of the older or less IBM-compatible systems.  I don't know all the
details of how AT&T's are different, but I seem to notice that AT&T users
often have questions that nobody else seems to have.

Did the user follow the part of the instructions about having nothing else
in memory when running Spinrite?

As for problems installing a new version of DOS, this is of course one of
the things we DOS users always tear our hair out over, and isn't related
to AT&T, Spinrite, or anything else except the lack of thought of DOS's
designers.  In 9 cases out of 10, it is not worth the bother to try to
install a new DOS version without reformatting and then restoring your
files from backup.

-- 
John Dudeck                           "You want to read the code closely..." 
jdudeck@Polyslo.CalPoly.Edu             -- C. Staley, in OS course, teaching 
ESL: 62013975 Tel: 805-545-9549          Tanenbaum's MINIX operating system.

tim@lakesys.lakesys.com (Timothy Winslow) (02/26/90)

I have an 8MHz XT clone, and I have a small hard drive problem.  I have 2
10M hard drives.  One Seagate, one Cogito.  I have the Seagate as drive C,
and the Cogito as D.  I have a Western Digital WD1002A-WX1 controller.
When turning on the computer, the system NEVER recognizes D.  Then I reset
the system, and it recognizes it, no problem.  I hate this extra step.
I have the controller manual, and have set it up correctly, I am not
using anything weird.  I am just having to reboot once, every time I turn
on my system.


-- 
Timothy Winslow/N9ICD                                    Philosophy in life:
tim@lakesys.lakesys.com                                   Anything goes as
or: uunet!marque!lakesys!tim                               long as I can
or: csd4.csd.uwm.edu!lakesys.lakesys.com!tim                  watch!

elund@pro-graphics.cts.com (Eric Lund) (02/26/90)

> Did the user follow the part of the instructions about having nothing else
> in memory when running Spinrite?

I can only presume so.  That would, however, be a viable explanation. 
Actually, I hope that was the problem.  I hate it when a problem lies in a
program, rather than a user.

                                                  
Eric W. Lund *DISCLAIMER "Disclaimers are for weak people."* Prodigy: xcbr22b
UUCP: ...crash!pro-graphics!elund *COWS FOR RENT* ProLine: elund@pro-graphics
Internet: elund@pro-graphics.cts.com ** ARPA/DDN: pro-graphics!elund@nosc.mil
 

coffman@plains.UUCP (Clark Coffman) (02/27/90)

In article <1704@lakesys.lakesys.com> tim@lakesys.lakesys.com (Timothy Winslow) writes:
[I have an 8MHz XT clone, and I have a small hard drive problem.  I have 2
[10M hard drives.  One Seagate, one Cogito.  I have the Seagate as drive C,
[and the Cogito as D.  I have a Western Digital WD1002A-WX1 controller.
[When turning on the computer, the system NEVER recognizes D.  Then I reset
[the system, and it recognizes it, no problem.  I hate this extra step.
[I have the controller manual, and have set it up correctly, I am not
[using anything weird.  I am just having to reboot once, every time I turn
[on my system.
[
[
[-- 
  I've been watching to see if anyone has had this problem, but so far you're
the first one I've seen with the same problem. I've got two 20 meg drives
and occasionally the second one doesn't come up. I've switched the drives
around but no go. 
  However, I have written a program that checks the second
drive at boot and tries to read a file; if it can't, then it reboots
my machine so I don't have to. If you don't find the hardware solution,
let me know and if you want I'll send the program.
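
  The idea is roughly the following.  This is only a sketch of the logic in
Python, not the actual program: the probe path and the reboot callback are
made-up placeholders, and a real DOS version would be run from AUTOEXEC.BAT
and would trigger the reboot itself.

```python
def drive_is_readable(probe_path):
    """Return True if a file on the second drive can be opened and read."""
    try:
        with open(probe_path, "rb") as probe:
            probe.read(1)
        return True
    except OSError:
        # Missing drive, missing file, or a read error all count as "not up".
        return False


def check_second_drive(probe_path, reboot):
    """Call reboot() when the probe file cannot be read; return the probe result."""
    if drive_is_readable(probe_path):
        return True
    reboot()
    return False


# Hypothetical use at boot time (path and reboot action are assumptions):
#   check_second_drive("D:\\PROBE.TXT", reboot=trigger_warm_boot)
```

  Keeping the reboot action as a passed-in function means the check can be
tried out safely before it is wired to anything that really resets the
machine.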
  I've dealt with drive and card manufacturers before and it really doesn't
seem worth while for a problem like this, even if they could help, which I
doubt. Good Luck.

 Hey, who else would you expect to be responsible for what I say?  
           ----------(=-   Dagda Mor   -=)-----------
 Clark W. Coffman                        coffman@sparky.UUCP
 Fargo, N.D. -  701-232-9531             coffman@sparky.NoDak.edu
                                         nu116215@ndsuvm1
                                         nu116215@vm1.nodak.edu

kaleb@mars.jpl.nasa.gov (Kaleb Keithley) (02/28/90)

In article <3654@plains.UUCP> coffman@plains.UUCP (Clark Coffman) writes:
>In article <1704@lakesys.lakesys.com> tim@lakesys.lakesys.com (Timothy Winslow) writes:
>[I have an 8MHz XT clone, and I have a small hard drive problem.  I have 2
>[10M hard drives.  One Seagate, one Cogito.  I have the Seagate as drive C,
>[and the Cogito as D.  I have a Western Digital WD1002A-WX1 controller.
>[When turning on the computer, the system NEVER recognizes D.  Then I reset
>[the system, and it recognizes it, no problem.  I hate this extra step.

>the first one I've seen with the same problem. I've got two 20 meg drives
>and occasionally the second one doesn't come up, I've switched the drives
>around but no go. 

I had a similar problem, (slightly different symptoms,) that was solved by
getting a bigger power supply.  I'd suggest one of the 200W XT P.Ss that
are available for around $40 w/o a U.L. sticker, or about $50 w/ the U.L.
sticker.  I can't provide an algorithm to calculate which P.S. you really
need, but 135-150 watts is just not enough for two drives, floppy, color card,
and any extended/expanded memory you might have.

Chewey, get us outta here!
                 
kaleb@mars.jpl.nasa.gov            Jet Propeller Labs
Kaleb Keithley

amichiel@rodan.acs.syr.edu (Allen J Michielsen) (02/28/90)

In article <2928@jato.Jpl.Nasa.Gov> kaleb@mars.UUCP (Kaleb Keithley) writes:
>In article <3654@plains.UUCP> coffman@plains.UUCP (Clark Coffman) writes:
>>In article <1704@lakesys.lakesys.com> tim@lakesys.lakesys.com (Timothy Winslow) writes:
>>[ I have 2 drives.  Seagate, Cogito.  the Seagate as drive C,
>>[ a Western Digital WD1002A-WX1
>>[ When turning on.. NEVER recognizes D.  I reset
>>[ and it recognizes it,
>
>>the first one I've seen with the same problem. I've got two 20 meg drives
>>and occasionally the second one doesn't come up, I've switched the drives
>>around but no go. 
>
>but 135-150 watts is just not enough for two drives, floppy, color card,
>and any extended/expanded memory you might have.
>
>

   I strongly suspect that the problem is the time required for the
drives to spin up. Whenever D isn't ready when C is, this controller will
refuse to identify D.  Software that causes a controller reset may be possible...
If the problem is only occasional, as in the second post, I suspect the cause
is the variation in the amount of time required to spin up the drives.  Then
I would try using a park program before shutting down to see IF that affects
the pattern (i.e. makes one always slower).
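
   If spin-up time is the culprit, the software side amounts to "wait until
the drive answers, and only then reset or give up".  A minimal sketch of that
polling loop, in Python for readability; the is_ready callback, the timeout
and the poll interval are all assumptions, not anything this controller
provides:

```python
import time


def wait_until_ready(is_ready, timeout_s=30.0, poll_s=0.5,
                     sleep=time.sleep, clock=time.monotonic):
    """Poll is_ready() until it returns True or timeout_s seconds have passed."""
    deadline = clock() + timeout_s
    while True:
        if is_ready():
            return True
        if clock() >= deadline:
            # The drive never answered; the caller decides whether to reset.
            return False
        sleep(poll_s)
```

   The caller would only issue the controller reset (or reboot) once this
returns True, or report a real fault if it returns False.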
   As for the Power supply not being big enough, open the case & measure the
voltage levels from a cold start.  The evidence in this case should be very
clear by watching the meter.
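
   If you jot down the meter readings during a cold start, the check is
simple arithmetic.  A minimal sketch, assuming a +/-5% tolerance on the +5V
and +12V rails (an assumed figure; check your supply's own spec) and a
made-up function name:

```python
def out_of_tolerance(nominal_v, readings_v, tolerance=0.05):
    """Return the readings that stray more than `tolerance` from nominal."""
    limit = nominal_v * tolerance
    return [v for v in readings_v if abs(v - nominal_v) > limit]


# Example with made-up readings (not measurements from any real machine):
#   out_of_tolerance(12.0, [11.2, 11.9, 12.1]) flags only the 11.2 V reading.
```

   A supply that is too small should show the 12V rail sagging while the
drives spin up; an empty list for both rails points back at timing instead.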
   As for 135 watts not being big enough, I better go tell that to the 6 PC's
I have with antique 65watt supplies that run 24 hrs a day, have 2 floppies,
2 hard disks, ega's, an 80386 Inboard 386 with the 80387 & 5 MB's ram, &
2 serial, 2 par, clock, ieee 488 interfaces, etc.  I had only hoped these would
last long enough for the new 135 watt power supplies to be delivered, but they
have long outdone themselves at over a year now & not a problem.

al

Elbereth@moncam.co.uk (Dave Emmerson) (03/06/90)

It's one of those topics which won't go away, right?
Doubtless this has been said before, but while people keep asking
the same questions, I suppose they'll keep needing the same answers.

We thermally tested our 68020 VME system a couple of years back,
basically what we found was that all our boards (CPU's LS/F/HCT TTL,
PALs, GALs, LSI, R's and C's, Xtals - the lot) all worked fine up to
and beyond 70 Centigrade - plenty hot enough, you won't want to hold
them at that temp. The hard disks though were another story, NOT ONE
of the 24 assorted makes/models would continue to run reliably in an
ambient atmosphere of 40 Centigrade.

So what does this mean to you? Well look at the cooling system in your
machine, and see how well the most sensitive part of it is catered for:
in most cases it isn't, there's just an extractor fan on top of the
power supply- a good idea, but it only does half the job. Very little
of the air that it moves actually passes over the hard drive. If you've
spare spaces in the disk racking area, try to get the hard drives near
some vent holes, if there are any, or better, try to get a small fan
mounted where it will do some good. DON'T run the machine with the lid 
off! The hard drives will run hotter, not cooler, unless you use a fan.
In any case, check the existing fan for fouling with fluff, especially
if the room is carpeted. Tower style boxes which stand on the floor are
especially prone to gaining an interior fur coat, air intake filters 
don't help much if nobody cleans them regularly either! As a last resort,
if the HD is mounted in the front of your machine, you might try it
with the plastic bezel removed, or take it off, drill some vents in it
and replace it - any of the above will invalidate your warranty of
course.

Given that it is difficult to manufacture a precision drive with a 
wide temperature tolerance, perhaps they should sacrifice a little
height and add some small heatsink fins. 
Having seen the insides of a couple of those 'small footprint' machines,
I don't think I'd want one unless the HD was ultra-reliable. I think
I'll stick with the 'battleship mentality' style a while longer...

We were lucky of course, we remodelled our case before it went into
production, I wouldn't want to test PCs the same way!

Dave E.

news@udenva.cair.du.edu (netnews) (03/08/90)

In article <144@vela.acs.oakland.edu> bbesler@vela.acs.oakland.edu (Brent Besler) writes:
>I have a shareware program called HD-TEST that will do non-destructive low level
>formats also.  I can get the address if anyone is interested.
>                                        Brent H. Besler 

Speedstor does this also.  It will even change the interleave.  Very slow,
but occasionally useful.

tim