[comp.sys.amiga.hardware] Request FEEDBACK on A3000 Glitch

jesup@cbmvax.commodore.com (Randell Jesup) (03/24/91)

[ followups to comp.sys.amiga.hardware, .tech is supposed to be gone ]

In article <1991Mar21.180521.9469@swbatl.sbc.com> jburnes@swbatl.sbc.com (Jim Burnes - 235-0709) writes:
>Has anyone noticed any hard drive errors on new A3000s?  We've been
>getting CRC errors on our Quantum hard drives.  What's really weird is
>that after running continuous overnight diagnostics on the system, the
>errors won't occur.  After bringing this machine (and we're not the
>only one in St. Louis that has this problem) ...after bringing it to
>our dealer MULTIPLE TIMES and having every major system component
>swapped out the errors keep occurring.  Sometimes they occur when
>running the 2.0 version of DPAINT III.  Sometimes we can't even predict
>a crash.  I'm beginning to think this is an engineering design problem.
>Some sort of strange bus activity/noise problem that causes the machine
>to fail.

	There are other possibilities.  You may be getting glitches on your
power lines (these can be REALLY confusing), so it might happen only at
your home.  It can be hard to diagnose unless lights or other equipment show
changes (like dimming or brightening).

	It could be a PD or commercial utility leaving "time bombs" 
unintentionally.  It may be an interaction of two incorrect programs (for
example, one writes to location 0, the other reads it and assumes it's 0).
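
A toy sketch of that interaction, in Python purely for illustration.  The
16-byte "memory", the function names, and the 0xDEADBEEF value are all made
up for this example and are not taken from any real program.  It assumes a
machine where address 0 is ordinary writable memory shared by every task:

```python
import struct

# Toy model of unprotected low memory: one flat byte array shared by all programs.
MEMORY = bytearray(16)


def buggy_writer(value: int) -> None:
    """Hypothetical program A: stores a longword through a NULL pointer."""
    struct.pack_into(">I", MEMORY, 0, value)  # the store lands on address 0


def buggy_reader() -> int:
    """Hypothetical program B: reads through a NULL pointer and trusts it to be 0."""
    (count,) = struct.unpack_from(">I", MEMORY, 0)
    return count  # treated as a length or loop count; only harmless when it is 0


def run(order):
    """Clear memory, run the named programs in order, return what B read (or None)."""
    MEMORY[:] = bytes(len(MEMORY))
    seen = None
    for name in order:
        if name == "A":
            buggy_writer(0xDEADBEEF)
        elif name == "B":
            seen = buggy_reader()
    return seen
```

Either program run alone looks healthy, and so does B followed by A.  Only
A followed by B hands B a garbage value, which is why this kind of fault
depends on exactly what was run, and in what order, before the crash.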

	What version of 2.0 are you running?  Does it happen if you run 1.3?
(You should have 2.02 (version 36.207)).  There was a WB disk for 2.02; did
you update your system using it?

	The sort of thing you mention is a theoretical possibility, but that
sort of error is quite unlikely assuming you did swap motherboard/disk/PS.
(You did swap those, if I read your message right.)

>My Commodore dealer is stumped because everyone he calls at Commodore
>says that they haven't heard a thing about this.  I think they are 
>lying.  More than one of his customers has had the same exact problem.
>If Commodore won't admit to it, I'm forced to take an informal survey.

	I certainly haven't heard anything like this, but then again I'm
a software engineer, not hardware, let alone PA or Service.

NOTE: this is (as usual) a personal unofficial comment/opinion, not that of
Commodore or any branch thereof.
-- 
Randell Jesup, Keeper of AmigaDos, Commodore Engineering.
{uunet|rutgers}!cbmvax!jesup, jesup@cbmvax.commodore.com  BIX: rjesup  
The compiler runs
Like a swift-flowing river
I wait in silence.  (From "The Zen of Programming")  ;-)

daveh@cbmvax.commodore.com (Dave Haynie) (03/26/91)

In article <20071@cbmvax.commodore.com> jesup@cbmvax.commodore.com (Randell Jesup) writes:
>[ followups to comp.sys.amiga.hardware, .tech is supposed to be gone ]

>>I'm beginning to think this is an engineering design problem.  Some sort of 
>>strange bus activity/noise problem that causes the machine to fail.

>	There are other possibilities.  You may be getting glitches on your
>power lines (these can be REALLY confusing), so it might happen only at
>your home.  

Actually, the power line would be my guess.  We had a similar problem back in
the early days of the A2620.  The systems worked fine in the offices, but 
failed at my lab station.  I was chasing down a failure that kicked on about
once every two days.  Since I thought there was some obscure timing problem,
all eyes were on me, while the UNIX software folks merrily worked along without
a glitch.  I eventually caught a power line induced glitch on my system, which
had the effect of freezing everything up.  Of course, I wasn't using a hard 
disk system there, just the CPU running in a constant loop.  A line glitch can
as easily clobber a hard disk as it can a CPU.

As far as long term hard disk activity, I don't know what QA has done, but I
have run tests of my own.  I set an A3000 up in the lab with three hard disks;
the built-in, an A2091, and a Hardframe.  They were each grabbing stuff from
each other, deleting, copying, etc. in a tight loop.  This ran until I took it
down, a little over two weeks, without a single problem.  I think most of us
using A3000s find similar reliability -- they stay up until you crash them or
the power goes out (not totally unheard of around here).  
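
For anyone who wants to try a similar soak test, here is a minimal sketch
in Python.  It is not the test described above, just one way to do the same
kind of thing: the file name, sizes, and the soak() and crc32_of() helpers
are assumptions made for this example.  It passes one file of random data
around a set of directories (one per disk), checks a CRC-32 after every
copy, and deletes the source each time:

```python
import os
import shutil
import zlib


def crc32_of(path, chunk=65536):
    """Return the CRC-32 of a file, read in fixed-size chunks."""
    crc = 0
    with open(path, "rb") as f:
        while True:
            block = f.read(chunk)
            if not block:
                break
            crc = zlib.crc32(block, crc)
    return crc & 0xFFFFFFFF


def soak(volumes, passes, size=1 << 20):
    """Pass a random file around `volumes`, verifying the CRC after each copy.

    Returns a list of (hop_number, source_path, dest_path) for every mismatch.
    """
    if len(volumes) < 2:
        raise ValueError("need at least two directories to copy between")
    errors = []
    cur = 0
    src = os.path.join(volumes[cur], "soak.dat")  # hypothetical scratch file name
    with open(src, "wb") as f:
        f.write(os.urandom(size))
    expected = crc32_of(src)
    for hop in range(passes * len(volumes)):
        nxt = (cur + 1) % len(volumes)
        dst = os.path.join(volumes[nxt], "soak.dat")
        shutil.copyfile(src, dst)
        if crc32_of(dst) != expected:
            errors.append((hop, src, dst))
        os.remove(src)  # delete the old copy, as in a copy/delete loop
        src, cur = dst, nxt
    os.remove(src)  # clean up the last copy
    return errors
```

One caveat: on a system with a disk cache, the verify read may be served
from memory rather than from the platter, so use a file larger than the
cache if the goal is to exercise the drive itself.  An empty list after
many passes says the disk path held up; it says nothing about power.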

If you're running into this kind of thing, where the machine fails at home,
works fine at the repair shop, examine what's being done closely.  You of
course have to make sure the tests at the shop mimic the failure mode at
home -- you're not likely to have a problem in 5 minutes at a shop if you don't
see the problem at home for hours.  But if the shop can't make it fail, 
there's a real good chance it's not your machine.  Radio Shack currently sells
some kind of power line monitor thing for about $20 or so which may be useful
in tracking down glitches (I haven't used the thing, we have an expensive one
here at work which prints out fluctuations on paper tape rather than via LEDs).

>	It could be a PD or commercial utility leaving "time bombs" 
>unintentionally.  It may be an interaction of two incorrect programs (for
>example, one writes to location 0, the other reads it and assumes it's 0).

Which is why it's so important to duplicate the exact failure mode if you're
trying to show it off at the store.  If BonzoPaint left a time bomb, you're
not likely to get the same failure with IggyCAD no matter how long you play
with it.

>	The sort of thing you mention is a theoretical possibility, but that
>sort of error is quite unlikely assuming you did swap motherboard/disk/PS.
>(You did swap those, if I read your message right.)

It's theoretically possible, but very unlikely.  When solving problems, you
eliminate the likely first, before venturing into the unlikely.  

>>My Commodore dealer is stumped because everyone he calls at Commodore
>>says that they haven't heard a thing about this.  

I personally have heard relatively isolated stories, on occasion, of such 
random crashes, on every system I have worked on, from the PLUS/4 and C128
on up to the A3000.  It's never proven to be a design flaw.  Always something
else -- bad power line spikes, flakey or misinserted expansion boards, 
software uglies, etc.  Sure, there's always a first time for everything, but
a design flaw of any kind kicks up a large number of failures, and we do
indeed hear about it.  Very loudly, in fact, if it's anything that's to do with
engineering, rather than production problems.

>>I think they are lying.  

I don't understand this attitude that Commodore is some evil giant intent on
taking your money and delivering systems that give you heartburn.  Maybe if
we were IBM, I could understand it.  But really, there's no heinous cover up
or anything here, and all our index fingers are shorter than our middle 
fingers.  And you're talking here to the engineers, too, not The Great
Uninformed.

>	I certainly haven't heard anything like this, but then again I'm
>a software engineer, not hardware, let alone PA or Service.

And I can't say for certain that service hasn't isolated some kind of problem
either, anything's possible.  But they do contact us about problems they can't
solve.  And I haven't been hiding or anything.  So I do suggest investigation
of those other areas I have mentioned.

>Randell Jesup, Keeper of AmigaDos, Commodore Engineering.

-- 
Dave Haynie Commodore-Amiga (Amiga 3000) "The Crew That Never Rests"
   {uunet|pyramid|rutgers}!cbmvax!daveh      PLINK: hazy     BIX: hazy
      "That's me in the corner, that's me in the spotlight" -R.E.M.