[comp.sys.apollo] Can you swap drive electronics packages?

casey@CS.UCLA.EDU (10/11/88)

  We recently had a problem with one of our DN4000s losing its disk.
This was a big problem since we still don't have any comprehensive backup
plan and that drive hadn't ever been backed up (presumably our 8mm video
backup system should be arriving in about a week now).

  Well, the Apollo hardware person came in.  She swapped the disk controller
and no go.  Then she swapped disks and voila, it worked.  Well, as far as
she was concerned, that was it, she was ready to wrap up her tools and
head for home.

  But wait, I said, we really can't do this unless it's absolutely
necessary!  I mean, that drive error didn't look like a media failure;
why don't we try swapping the drive electronics packages?  ``What?'' was
all she said.  So I had to show her the little circuit card on the drive
and explain its function.  She looked dubious.

  In any case, I finally convinced her to call into headquarters and ask
whether it was possible.  The guy there, also recalcitrant, finally gave
the go ahead as long as ``no soldering is involved.'' These people really
know their work.  So we swapped the electronics packages (three minutes
of work) and the old disk worked perfectly and passed all her standard
tests.

  Now, one day later I get a call from that Apollo hardware person and
she wants to come back and swap the drive out anyway because they once
swapped drive electronics packages, or they heard of someone swapping
drive electronics packages once, and the drive failed two days later.

  What I want to know is this: am I dealing with superstition or a
professional opinion?  I'm willing to do the work if it's indeed
necessary, but that's a day of my time wasted if it's not.

  And, while I'm on the subject, if it does turn out that swapping drive
electronics packages is legitimate, why doesn't Apollo do it by default?
It's only three minutes of work and even if you did do a backup the
night before, you're going to lose work you didn't have to if it's the
drive electronics package that's failed.

Casey

krowitz@RICHTER.MIT.EDU (David Krowitz) (10/13/88)

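Just out of curiosity ... what was the error message that led you to
suspect the electronics on the drive and not the HDA? Every time we've
had a drive fail, it has turned out to be the HDA (of course, they
weren't 5 1/4" drives, they were the MSD-167 in the DN460/660 or the
MSD-500 in the DSP-80).

Actually, come to think of it, when our MSD-500's failed (three of them!),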
one of the things they did try was swapping the electronics on the drive ...
which they did after trying a new controller board. Of course, it's more
of a pain for them to swap out a whole MSD-500 than it is to swap out one
of the 5 1/4" drives (the MSD-500 must weigh at least 100 lb).


 -- David Krowitz

krowitz@richter.mit.edu   (18.83.0.109)
krowitz%richter@eddie.mit.edu
krowitz%richter@athena.mit.edu
krowitz%richter.mit.edu@mitvma.bitnet
(in order of decreasing preference)

casey@admin.cognet.ucla.edu (Casey Leedom) (10/14/88)

| From: krowitz@richter.MIT.EDU (David Krowitz)
| 
| Just out of curiosity ... what was the error message that led you to
| suspect the electronics on the drive and not the HDA? Every time we've
| had a drive fail, it has turned out to be the HDA (of course, they
| weren't 5 1/4" drives, they were the MSD-167 in the DN460/660 or the
| MSD-500 in the DSP-80).
| 
| Actually, come to think of it, when our MSD-500's failed (three of them!),
| one of the things they did try was swapping the electronics on the drive
| ...  which they did after trying a new controller board. Of course, it's
| more of a pain for them to swap out a whole MSD-500 than it is to swap
| out one of the 5 1/4" drives (the MSD-500 must weigh at least 100 lb).

  It was really just a guess.  We hadn't heard any terrible screeching,
etc. and we weren't getting lots of bad data off the drive, it was just
not really there as far as the software was concerned.  It *felt* like a
piece of electronics going away.  I might very well have been wrong, but
as it turns out, I wasn't.

  I also have had an awful lot of experience with maintenance of larger
drives where engineers would come in and swap out various modules of the
drive subsystem.  That's why I was surprised that the Apollo service type
wasn't going to try replacing the drive electronics PCB.

  In any case, they came yesterday and forced me to replace the drive.
I'm really pretty pissed about this because I still think that it was
more superstition than professional opinion on their part.  So now I get
to waste a day reinstalling software and restoring a backup I made the
night before.

Casey

casey@admin.cognet.ucla.edu (Casey Leedom) (10/14/88)

| From: hollaar@cs.utah.edu (Lee Hollaar)
| 
| Generally, the manufacturers of the small winchester drives view them as
| a complete unit, without field-replaceable parts.  Warranty service, or
| having them fixed by the manufacturer, consists of returning the complete
| unit.  They really aren't set up to handle board fixes in the field, or
| at least don't like to do it.

  Oh, and so why is it so easy to swap the drive electronics package?
All that was required to do it was take out two screws, disconnect a
couple of cables, and then slip the new PCB in.  It seems to me that the
drive manufacturer (Micropolis in this case) has gone to great pains to
house a modular component on a replaceable platform.  It sounds more to
me like Apollo and other companies just aren't taking advantage of that.

Casey

--------
Bush doesn't fall on his tongue and Dukakis doesn't ``bat a home run'' -
therefore Bush ``won'' the debate.  At least that's what the media told
us afterwards.  I love not having to think for myself any more.

giebelhaus@hi-csc.UUCP (Timothy R. Giebelhaus) (10/15/88)

What you are dealing with is no procedure.  That is, there is no
procedure set up at the current time.  Thus, there is no knowledge
of what will happen if the entire disk is not swapped.  I will forward
the idea of further investigation of separating the electronics
and the media of the drive.

In the meantime, I would let your service engineer replace the
entire drive unless you are into taking risks.
-- 
UUCP: {uunet, ihnp4!umn-cs}!hi-csc!giebelhaus
ARPA: hi-csc!giebelhaus@umn-cs.arpa
Nobody I know admits to sharing my opinions.  I don't even have a pet
which will share my opinion.

giebelhaus@hi-csc.UUCP (Timothy R. Giebelhaus) (10/15/88)

OK, I talked to a technical manager (though this is still an unofficial
piece of news).  The reason they swap out the entire unit is:

  1) Swapping the entire unit makes the hardware fix vendor
     independent.  That is, a hardware engineer does not have to
     know who supplied the drive to service the drive; she can
     simply replace a 155 meg drive with a 155 meg drive (not
     necessarily the same drive).

  2) Many drives would take too much time to debug.  It is much more
     efficient to send the entire drive back to the product repair
     center where it can be debugged than to debug the thing in
     the field.

There is no guarantee that the electronics that were replaced on
your disk are 100% compatible with the disk itself.  They may
be a different revision or entirely different.

casey@admin.cognet.ucla.edu (Casey Leedom) (10/18/88)

| From: giebelhaus@hi-csc.UUCP (Timothy R. Giebelhaus)
| 
| What you are dealing with is no procedure.  That is, there is no
| procedure set up at the current time.  Thus, there is no knowledge of
| what will happen if the entire disk is not swapped.  I will forward the
| idea of further investigation of separating the electronics and the media
| of the drive.
| 
| ... OK, I talked to a technical manager (though this is still an unofficial
| piece of news).  The reason they swap out the entire unit is:
| 
|   1) Swapping the entire unit makes the hardware fix vendor independent.
|      That is, a hardware engineer does not have to know who supplied the
|      drive to service the drive; she can simply replace a 155MB drive
|      with a 155MB drive (not necessarily the same drive).
| 
|   2) Many drives would take too much time to debug.  It is much more
|      efficient to send the entire drive back to the product repair center
|      where it can be debugged than to debug the thing in the field.
| 
| There is no guarantee that the electronics that were replaced on your disk
| are 100% compatible with the disk itself.  They may be a different revision
| or entirely different.

Tim,
  Thanks a lot for checking into this for us!!!  We really appreciate it!

  However, point 2 is totally irrelevant.  As I said in my earlier
notes, it only took three minutes to swap the electronics boards.  This
gives you only three components which need to be swap-tested: the
controller sitting in the system bus, the drive electronics board, and
the drive itself.  This gives you a very easy procedure:

1. Swap controller in system bus.
   If problem goes away, you're done.  Stop.

2. Swap drives.
   If problem goes away, you've narrowed down the problem:
      Swap old drive back in and install new drive electronics board.
      If problem goes away, you're done.  Stop.
      Otherwise, it's the media:
         Swap drive electronics boards back and install new drive.
         Stop.

  The extra steps in step 2 only involve about ten minutes worth of work.
At the rate we're paying for maintenance, it's definitely justified to
save the data on the disk.
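
  The swap-test procedure above can be sketched as a small decision
routine.  This is only an illustration, not anything Apollo service
actually uses: the names isolate_fault, Probe and make_probe are made up
for the example, it assumes exactly one part has failed, and it assumes
the original controller goes back in before the drive is swapped (the
procedure doesn't say either way).

```python
from typing import Callable

# Hypothetical probe: given which controller, electronics board and HDA
# are installed ("old" or "new"), report whether the system sees the disk.
Probe = Callable[[str, str, str], bool]


def isolate_fault(works: Probe) -> str:
    """Follow the two-step swap test and name the part to replace."""
    # Step 1: swap the controller in the system bus.
    if works("new", "old", "old"):
        return "controller"

    # Step 2: swap in a complete new drive (new board and new HDA).
    # Assumption: the original controller has been put back first.
    if works("old", "new", "new"):
        # Narrow it down: old HDA back in, fitted with the new board.
        if works("old", "new", "old"):
            return "drive electronics"  # the data on the old HDA survives
        return "HDA"  # it really is the media; install the new drive

    # Neither swap helped; the procedure has nothing more to say.
    return "unresolved"


def make_probe(faulty: str) -> Probe:
    """Simulate a machine in which exactly one original part is bad."""
    def probe(controller: str, board: str, hda: str) -> bool:
        installed = {
            "controller": controller,
            "drive electronics": board,
            "HDA": hda,
        }
        # The disk is visible once the faulty part has been replaced.
        return installed[faulty] == "new"
    return probe


if __name__ == "__main__":
    for part in ("controller", "drive electronics", "HDA"):
        print(part, "->", isolate_fault(make_probe(part)))
```

  In the worst case this costs one extra swap beyond what the service
engineer already did, and it is the only branch in which the old HDA,
and the data on it, stays in the machine.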

  Your first point is slightly more relevant, but not very.  It's a minor
point to keep track of what drives are in which machines and just bring
the correct type of replacement.  Apollo doesn't stock so many different
155MB drives that this would be a problem.

  The only really relevant point seems to be the drive electronics
compatibility point in your final paragraph.  I'm not aware of any
guarantee from Micropolis (or any other vendor) which says that these are
standard parts on their drives whose interface to the rest of the
drive won't change with succeeding releases of their drive product.  Now,
if they *do* make some sort of statement to that effect, I don't think
Apollo (or any other maintenance organization) has a leg to stand on in
refusing to do the drive electronics swapping.

| In the meantime, I would let your service engineer replace the entire
| drive unless you are into taking risks.

  Well, that's what we did.  But it was sure nice to have that drive back
for the one day with the new electronics package so we could recover the
data.  I made a backup, we installed the new disk, I reinstalled the OS,
I restored the files from the backup, and I lost a day and the people
using the machine lost three days in the process.

Casey

--------
The American public wants to be lied to.  They want to be told that they
are the envy of the world.  They want to be told that things are ok.
Dukakis points at the serious problems that must be addressed and
proposes some of the hard decisions that he thinks will be necessary.
Bush tells us that everything is wonderful.  Who's ahead in the polls?