casey@CS.UCLA.EDU (10/11/88)
We recently had a problem with one of our DN4000's losing its disk. This was a big problem since we still don't have any comprehensive back up plan and that drive hadn't ever been backed up (presumably our 8mm video back up system should be arriving in about a week now).

Well, the Apollo hardware person came in. She swapped the disk controller and no go. Then she swapped disks and voila, it worked. Well, as far as she was concerned, that was it; she was ready to wrap up her tools and head for home. But wait, I said, we really can't do this unless it's absolutely necessary! I mean, that drive error didn't look like a media failure; why don't we try swapping the drive electronics packages? ``What?'' was all she said. So I had to show her the little circuit card on the drive and explain its function. She looked dubious. In any case, I finally convinced her to call in to headquarters and ask whether it was possible. The guy there, also recalcitrant, finally gave the go ahead as long as ``no soldering is involved.'' These people really know their work.

So we swapped the electronics packages (three minutes of work) and the old disk worked perfectly and passed all her standard tests.

Now, one day later, I get a call from that Apollo hardware person and she wants to come back and swap the drive out anyway because they once swapped drive electronics packages, or they heard of someone swapping drive electronics packages once, and the drive failed two days later.

What I want to know is this: am I dealing with superstition or a professional opinion? I'm willing to do the work if it's indeed necessary, but that's a day of my time wasted if it's not.

And, while I'm on the subject, if it does turn out that swapping drive electronics packages is legitimate, why doesn't Apollo do it by default? It's only three minutes of work, and even if you did do a back up the night before, you're going to lose work you didn't have to if it's the drive electronics package that's failed.

Casey
krowitz@RICHTER.MIT.EDU (David Krowitz) (10/13/88)
Actually, come to think of it, when our MSD-500's failed (three of them!), one of the things they did try was swapping the electronics on the drive ... which they did after trying a new controller board. Of course, it's more of a pain for them to swap out a whole MSD-500 than it is to swap out one of the 5 1/4" drives (the MSD-500 must weigh at least 100 lb).

-- David Krowitz
krowitz@richter.mit.edu (18.83.0.109)
krowitz%richter@eddie.mit.edu
krowitz%richter@athena.mit.edu
krowitz%richter.mit.edu@mitvma.bitnet
(in order of decreasing preference)
casey@admin.cognet.ucla.edu (Casey Leedom) (10/14/88)
| From: krowitz@richter.MIT.EDU (David Krowitz)
|
| Just out of curiosity ... what was the error message that led you to
| suspect the electronics on the drive and not the HDA? Every time we've
| had a drive fail, it has turned out to be the HDA (of course, they
| weren't 5 1/4" drives, they were the MSD-167 in the DN460/660 or the
| MSD-500 in the DSP-80).
|
| Actually, come to think of it, when our MSD-500's failed (three of them!),
| one of the things they did try was swapping the electronics on the drive
| ... which they did after trying a new controller board. Of course, it's
| more of a pain for them to swap out a whole MSD-500 than it is to swap
| out one of the 5 1/4" drives (the MSD-500 must weigh at least 100 lb).

It was really just a guess. We hadn't heard any terrible screeching, etc., and we weren't getting lots of bad data off the drive; it was just not really there as far as the software was concerned. It *felt* like a piece of electronics going away. I might very well have been wrong, but as it turns out, I wasn't.

I also have had an awful lot of experience with maintenance of larger drives where engineers would come in and swap out various modules of the drive subsystem. That's why I was surprised that the Apollo service type wasn't going to try replacing the drive electronics PCB.

In any case, they came yesterday and forced me to replace the drive. I'm really pretty pissed about this because I still think that it was more superstition than professional opinion on their part. So now I get to waste a day reinstalling software and restoring a back up I made the night before.

Casey
casey@admin.cognet.ucla.edu (Casey Leedom) (10/14/88)
| From: hollaar@cs.utah.edu (Lee Hollaar)
|
| Generally, the manufacturers of the small winchester drives view them as
| a complete unit, without field-replaceable parts. Warranty service, or
| having them fixed by the manufacturer, consists of returning the complete
| unit. They really aren't set up to handle board fixes in the field, or
| at least don't like to do it.

Oh, and so why is it so easy to swap the drive electronics package? All that was required to do it was to take out two screws, disconnect a couple of cables, and then slip the new PCB in. It seems to me that the drive manufacturer (Micropolis in this case) has gone to great pains to house a modular component on a replaceable platform. It sounds more to me like Apollo and other companies just aren't taking advantage of that.

Casey
--------
Bush doesn't fall on his tongue and Dukakis doesn't ``bat a home run'' - therefore Bush ``won'' the debate. At least that's what the media told us afterwards. I love not having to think for myself any more.
giebelhaus@hi-csc.UUCP (Timothy R. Giebelhaus) (10/15/88)
What you are dealing with is no procedure. That is, there is no procedure set up at the current time. Thus, there is no knowledge of what will happen if the entire disk is not swapped. I will forward the idea of further investigation of separating the electronics and the media of the drive.

In the mean time, I would let your service engineer replace the entire drive unless you are into taking risks.

--
UUCP: {uunet, ihnp4!umn-cs}!hi-csc!giebelhaus
ARPA: hi-csc!giebelhaus@umn-cs.arpa
Nobody I know admits to sharing my opinions. I don't even have a pet which will share my opinion.
giebelhaus@hi-csc.UUCP (Timothy R. Giebelhaus) (10/15/88)
OK, I talked to a technical manager (though this is still an unofficial piece of news). The reason they swap out the entire unit is:

1) Swapping the entire unit makes the hardware fix vendor independent. That is, a hardware engineer does not have to know who supplied the drive to service the drive; she can simply replace a 155 meg drive with a 155 meg drive (not necessarily the same drive).

2) Many drives would take too much time to debug. It is much more efficient to send the entire drive back to the product repair center where it can be debugged than to debug the thing in the field.

There is no guarantee that the electronics which were replaced on your disk are 100% compatible with the disk itself. They may be different revisions or entirely different.

--
UUCP: {uunet, ihnp4!umn-cs}!hi-csc!giebelhaus
ARPA: hi-csc!giebelhaus@umn-cs.arpa
Nobody I know admits to sharing my opinions. I don't even have a pet which will share my opinion.
casey@admin.cognet.ucla.edu (Casey Leedom) (10/18/88)
| From: giebelhaus@hi-csc.UUCP (Timothy R. Giebelhaus)
|
| What you are dealing with is no procedure. That is, there is not
| procedure set up at the current time. Thus, there is no knowledge of
| what will happen if the entire disk is not swapped. I will forward the
| idea of further investigation of separating the electronics and the media
| of the drive.
|
| ... OK, I talked to a technical manager (though this is still an unofficial
| piece of news). The reason they swap out the entire unit is:
|
| 1) Swapping the entire unit makes the hardware fix vendor independent.
| That is, a hardware engineer does not have to know who supplied the
| drive to service the drive; she can simply replace a 155Mb drive
| with a 155Mb drive (not necessarily the same drive).
|
| 2) Many drives would take too much time to debug. It is much more
| efficient to send the entire drive back to the product repair center
| where it can be debugged than to debug the thing in the field.
|
| There is no guarantee that the electronic which was replaced on your disk
| is 100% compatible with the disk itself. They may be different revisions
| or entirely different.

Tim,

Thanks a lot for checking into this for us!!! We really appreciate it!

However, point 2 is totally irrelevant. As I said in my earlier notes, it only took three minutes to swap the electronics boards. This gives you only three components which need to be swap-tested: the controller sitting in the system bus, the drive electronics board, and the drive itself. This gives you a very easy procedure:

  1. Swap controller in system bus. If problem goes away, you're done. Stop.

  2. Swap drives. If problem goes away, you've narrowed down the problem:

       Swap old drive back in and install new drive electronics board.
       If problem goes away, you're done. Stop.

       Otherwise, it's the media: swap drive electronics boards back and
       install new drive. Stop.

The extra steps in step 2 only involve about ten minutes worth of work.
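As a rough sketch only, the swap-test procedure above can be written as a small decision routine. The function name `diagnose`, the `works` test hook, and the "old"/"new" labels are illustrative assumptions, not anything from Apollo or Micropolis; `works` stands in for whatever standard tests the service engineer runs on a given mix of parts. The "unknown" result covers the case the procedure does not address, where neither swap helps.

```python
from typing import Callable


def diagnose(works: Callable[[str, str, str], bool]) -> str:
    """Return which part to replace, following the swap-test order above.

    `works(controller, board, hda)` is a hypothetical hook. Each argument
    is "old" or "new"; it returns True if the system passes its tests
    with that combination of parts installed.
    """
    # Step 1: swap the controller in the system bus.
    if works("new", "old", "old"):
        return "controller"

    # Step 2: swap the whole drive (new electronics board and new HDA).
    if not works("old", "new", "new"):
        return "unknown"  # neither swap helped; outside the procedure

    # Old drive back in, with only the electronics board replaced.
    if works("old", "new", "old"):
        return "drive electronics"  # the data on the old media survives

    # Otherwise it's the media: boards swapped back, new drive installed.
    return "drive (media)"


# Simulated fault: the system only passes with a new electronics board.
def failed_board(controller: str, board: str, hda: str) -> bool:
    return board == "new"


print(diagnose(failed_board))  # drive electronics
```

The point of the ordering is that the extra board test costs one more call to `works` and, when it succeeds, leaves the original media (and its data) in place.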
At the rate we're paying for maintenance, it's definitely justified to save the data on the disk.

Your first point is slightly more relevant, but not very. It's a minor point to keep track of what drives are in which machines and just bring the correct type of replacement. Apollo doesn't stock so many kinds of 155Mb drive that this would be a problem.

The only really relevant point seems to be the drive electronics compatibility point in your final paragraph. I'm not aware of any guarantee from Micropolis (or any other vendor) which says that these are standard parts on their drives whose interface to the rest of the drive won't change with succeeding releases of their drive product. Now, if they *do* make some sort of statement to that effect, I don't think Apollo (or any other maintenance organization) has a leg to stand on in refusing to do the drive electronics swapping.

| In the mean time, I would let your service engineer replace the entire
| drive unless you are into taking risks.

Well, that's what we did. But it was sure nice to have that drive back for the one day with the new electronics package so we could recover the data. I made a back up, we installed the new disk, I reinstalled the OS, I restored the files from the back up, and I lost a day and the people using the machine lost three days in the process.

Casey
--------
The American public wants to be lied to. They want to be told that they are the envy of the world. They want to be told that things are ok. Dukakis points at the serious problems that must be addressed and proposes some of the hard decisions that he thinks will be necessary. Bush tells us that everything is wonderful. Who's ahead in the polls?