bediger@isis.cs.du.edu (bruce allen ediger) (06/28/91)
I'm posting this for a friend who doesn't have usenet access, so email replies to me, and I'll get them to him. Thanks. We have FOUR CDC/4680 (MIPS ECL R6000) systems and all four go up and down like yo-yos, have poor network performance (especially NFS -- gag!), disk controller and ethernet board failures, CPU board failures, and so on and on and on. CDC claims they have a total of 48 systems installed and these are the only FOUR that have problems. We actually received shipment of SIX systems, but TWO were irreparably DOA and had to be shipped back. A THIRD was not DOA but had to be swapped out with a new box after about two months of failed efforts to make it work. SO, since our experience is that SEVEN consecutive systems shipped to us from MIPS, and installed by CDC with their variant of the MIPS OS, were utter dogs, it seems hard to take CDC at their word that the other 40-some-odd systems are happy-go-lucky.
cprice@mips.com (Charlie Price) (06/29/91)
In article <1991Jun28.155256.24743@mnemosyne.cs.du.edu> bediger@isis.cs.du.edu writes:
>I'm posting this for a friend who doesn't have usenet access, so email
>replies to me, and I'll get them to him. Thanks.
>
>
>We have FOUR CDC/4680 (MIPS ECL R6000) systems and all four go up and down
>like yo-yos, have poor network performance (especially NFS -- gag!),
>disk controller and ethernet board failures, CPU board failures, and so on
>and on and on. CDC claims they have a total of 48 systems installed and these
>are the only FOUR that have problems. We actually received shipment of SIX
>systems, but TWO were irreparably DOA and had to be shipped back. A THIRD
>was not DOA but had to be swapped out with a new box after about two months
>of failed efforts to make it work. SO, since our experience is that SEVEN
>consecutive systems shipped to us from MIPS, and installed by CDC with their
>variant of the MIPS OS, were utter dogs, it seems hard to take CDC at their
>word that the other 40-some-odd systems are happy-go-lucky.

1) When one posts an article saying something so critical of a product, it seems only fair that one not do so anonymously. I realize that the poster did not write the comment, nor the writer create the article -- but I believe the original writer's name and organization should appear with the comments.

2) What "replies" does this article solicit? There are no question marks in the body of the article, so I don't see any explicit questions. Does the original writer want information, or does he/she just want to complain?

I'll take a guess about what is wanted and make some general comments.

We have a number of R6000-based machines in engineering (running MIPS OS and not a CDC-modified version). Though I don't keep track of reliability explicitly, I have some anecdotal experience from working with real machines and from talking to various hardware and software folks working on the boxes.
The RC6280 and RC6260 (MIPS product numbers) are our most complicated products (fastest, most memory, most controllers, most disks, most nets, ...) and are not our most reliable products. However, they can't generally be described as "yo-yos". That said:

1) The machines are complicated and are somewhat sensitive to the detailed nature of the work load. Over time, the computer center here and people in the field have turned up hardware problems by using machines in a new way. Some very heavy loads don't provoke any problems, while other, seemingly less stressful loads can trigger them. If an application mix happens to discover a new hardware bug, then the machines probably trip over it a lot. I don't find it hard to believe that one installation has a lot of problems that other installations just never see. The machines *are* getting globally better over time, as is the OS software.

2) That means it is important to have the latest Engineering Changes to hardware and revisions of software. Presumably the CDC field people know that.

3) No matter what the machine, you can always find an individual machine that just has problems. Having 4 of them does seem like rather a lot, so I would guess that you have a new bug. (Lucky you :-) )

4) (Point-of-comment disclaimer: Remember, I speak for myself alone.) I wouldn't encourage you to accept wretched reliability from MIPS or any other vendor. It seems to me there are 3 main choices, and the one you choose depends on your skills, the time you have, your relationship with the vendor, the alternatives available, and other stuff.

   1) Complain, put up with it, and hope somebody (else) finds the problem. This doesn't take any special effort, but also doesn't necessarily make anything run better.

   2) Rip the machines out and send them back. You don't have the reliability problem any more, but you don't have a working computer system either.
   3) Actively "help" the vendor discover the problem by *making* them pay attention to you and then giving them all the assistance that you can. The nature of load-dependent or environment-dependent reliability problems means that the users and administrators are often an important part of identifying the problem area. This can be (very) painful, and not every installation has the time/skills/resources to do it. If you can do it, you stand a much better chance of getting your particular problems fixed.
--
Charlie Price    cprice@mips.mips.com        (408) 720-1700
MIPS Computer Systems / 928 Arques Ave.  MS 1-03 / Sunnyvale, CA 94088-3650
k2@bl.physik.tu-muenchen.de (Klaus Steinberger) (07/01/91)
bediger@isis.cs.du.edu (bruce allen ediger) writes:
>I'm posting this for a friend who doesn't have usenet access, so email
>replies to me, and I'll get them to him. Thanks.
>We have FOUR CDC/4680 (MIPS ECL R6000) systems and all four go up and down
>like yo-yos, have poor network performance (especially NFS -- gag!),
>disk controller and ethernet board failures, CPU board failures, and so on
[something deleted]
>word that the other 40-some-odd systems are happy-go-lucky.

We have one CDC CD4680, and we are very happy with the system, after some initial problems (we triggered a bug in the floating-point chip). The system is really fast and runs reliably. We don't experience poor network performance; are you sure your network is OK? (Be sure you don't run EP/IX 1.2.1, which has some trouble with YP.) We experienced some failures with the CPU board, but they were all related to heavy board swapping during the search for the floating-point chip bug. CDC's support is very good, for hardware as well as software.

Sincerely,
Klaus Steinberger
--
Klaus Steinberger    Beschleunigerlabor der TU und LMU Muenchen
Phone: (+49 89)3209 4287    Hochschulgelaende
FAX: (+49 89)3209 4280    D-8046 Garching, Germany
BITNET: K2@DGABLG5P    Internet: k2@bl.physik.tu-muenchen.de
guscus@katzo.rice.edu (Gustavo E. Scuseria) (07/01/91)
In article <5229@spim.mips.COM> cprice@mips.com (Charlie Price) writes:
>In article <1991Jun28.155256.24743@mnemosyne.cs.du.edu> bediger@isis.cs.du.edu writes:
>>
>> [stuff deleted]
>>
>>We have FOUR CDC/4680 (MIPS ECL R6000) systems and all four go up and down
>>like yo-yos, have poor network performance (especially NFS -- gag!),
>>disk controller and ethernet board failures, CPU board failures, and so on
>>and on and on.
>> [more stuff deleted]

My RC6280 had pretty much the same problems ... I was going to buy it (trading in one of my m-2000s) but gave up after 6 months and innumerable board exchanges. Charlie Price's recommendations in such a case are:

>
>
>    1) Complain, put up with it, and hope somebody (else) finds
>       the problem. This doesn't take any special effort,
>       but also doesn't necessarily make anything run better.
>
>    2) Rip the machines out and send them back.
>       You don't have the reliability problem any more,
>       but you don't have a working computer system either.
>
>    3) Actively "help" the vendor discover the problem by
>       *making* them pay attention to you and then giving them
>       all the assistance that you can.
>       The nature of load-dependent or environment-dependent
>       reliability problems means that the users and administrators
>       are often an important part of identifying the problem area.
>       This can be (very) painful and not every installation
>       has the time/skills/resources to do it.
>       If you can do it, you stand a much better chance of getting
>       your particular problems fixed.

which sounds very good ... In my case, MIPS demanded a maintenance contract on the machine to continue an unsuccessful effort to keep it up longer than 24 hours! It would crash for any reason, at any time. Of course, I sent the 6280 back and bought an IBM 6000/530. With the money left, I'm also buying an IBM 550 or an HP 730. Haven't made up my mind yet ...
Either of them easily beats the 6280 in floating-point speed, not to mention that they cost only a fraction of the price of the MIPS box. IMHO, that's the way you get your problems fixed.
--
Gustavo E. Scuseria         | guscus@katzo.rice.edu
Department of Chemistry     |
Rice University             | office: (713) 527-4082
Houston, Texas 77251-1892   | fax   : (713) 285-5155