[comp.sys.cdc] R6000 systems, anyone?

bediger@isis.cs.du.edu (bruce allen ediger) (06/28/91)

I'm posting this for a friend who doesn't have usenet access, so email
replies to me, and I'll get them to him.  Thanks.


We have FOUR CDC/4680 (MIPS ECL R6000) systems and all four go up and down
like yo-yos, have poor network performance (especially NFS -- gag!),
disk controller and ethernet board failures, CPU board failures, and so on
and on and on. CDC claims they have a total of 48 systems installed and these
are the only FOUR that have problems. We actually received shipment of SIX
systems, but TWO were irreparably DOA and had to be shipped back. A THIRD
was not DOA but had to be swapped out with a new box after about two months
of failed efforts to make it work. SO, since our experience is that SEVEN
consecutive systems shipped to us from MIPS, and installed by CDC with their
variant of the MIPS OS, were utter dogs, it seems hard to take CDC at their
word that the other 40-some-odd systems are happy-go-lucky.

cprice@mips.com (Charlie Price) (06/29/91)

In article <1991Jun28.155256.24743@mnemosyne.cs.du.edu> bediger@isis.cs.du.edu writes:
>I'm posting this for a friend who doesn't have usenet access, so email
>replies to me, and I'll get them to him.  Thanks.
>
>
>We have FOUR CDC/4680 (MIPS ECL R6000) systems and all four go up and down
>like yo-yos, have poor network performance (especially NFS -- gag!),
>disk controller and ethernet board failures, CPU board failures, and so on
>and on and on. CDC claims they have a total of 48 systems installed and these
>are the only FOUR that have problems. We actually received shipment of SIX
>systems, but TWO were irreparably DOA and had to be shipped back. A THIRD
>was not DOA but had to be swapped out with a new box after about two months
>of failed efforts to make it work. SO, since our experience is that SEVEN
>consecutive systems shipped to us from MIPS, and installed by CDC with their
>variant of the MIPS OS, were utter dogs, it seems hard to take CDC at their
>word that the other 40-some-odd systems are happy-go-lucky.

1) When one posts an article saying something so critical of a product,
   it seems only fair that one not do so anonymously.
   I realize that the poster did not write the comments, and that the
   writer did not post the article, but I believe the original
   writer's name and organization should appear with the comments.

2) What "replies" does this article solicit?
   There are no question marks in the body of the article
   so I don't see any explicit questions.
   Does the original writer want information
   or does he/she just want to complain?

I'll take a guess about what is wanted and make some general comments.

We have a number of R6000-based machines in engineering
(running MIPS OS and not a CDC-modified version).
Though I don't keep track of reliability explicitly I have some
anecdotal experience with real machines and through talking to
various hardware and software folks working on the boxes.

The RC6280 and RC6260 (MIPS product numbers)
are our most complicated products
(fastest, most memory, most controllers, most disks, most nets, ...)
and are not our most reliable products.
However, they can't generally be described as "yo-yos".

However:
1) The machines are complicated and are somewhat sensitive to the
   detailed nature of the work load.
   Over time, the computer center here and people in the field have
   turned up hardware problems by using machines in a new way.
   Some very heavy loads don't provoke any problems, and other,
   seemingly less stressful loads can trigger them.
   If an application mix happens to discover a new hardware bug
   then the machines probably trip over it a lot.

   I don't find it hard to believe that one installation has a lot
   of problems that other installations just never see.

   The machines *are* getting globally better over time
   as is the OS software.

2) That means it is important to have the latest Engineering Changes
   to hardware and revisions of software.
   Presumably the CDC field people know that.

3) No matter what the machine, you can always find an individual
   machine that just has problems.  Having 4 of them does seem
   like rather a lot, so I would guess that you have a new bug.
   (Lucky you :-) )

4) (Point-of-comment disclaimer:  Remember; I speak for myself alone).
   I wouldn't encourage you to accept wretched reliability from
   MIPS or any other vendor.
   It seems to me there are 3 main choices, and the one you choose
   depends on the skills and time you have, your relationship with
   the vendor, the alternatives available, and other factors.

   1) Complain, put up with it, and hope somebody (else) finds
      the problem.  This doesn't take any special effort,
      but also doesn't necessarily make anything run better.

   2) Rip the machines out and send them back.
      You don't have the reliability problem any more,
      but you don't have a working computer system either.

   3) Actively "help" the vendor discover the problem by
      *making* them pay attention to you and then giving them
      all the assistance that you can.
      The nature of load-dependent or environment-dependent
      reliability problems means that the users and administrators
      are often an important part of identifying the problem area.
      This can be (very) painful and not every installation
      has the time/skills/resources to do it.
      If you can do it, you stand a much better chance of getting
      your particular problems fixed.
-- 
Charlie Price    cprice@mips.mips.com        (408) 720-1700
MIPS Computer Systems / 928 Arques Ave.  MS 1-03 / Sunnyvale, CA   94088-3650

k2@bl.physik.tu-muenchen.de (Klaus Steinberger) (07/01/91)

bediger@isis.cs.du.edu (bruce allen ediger) writes:

>I'm posting this for a friend who doesn't have usenet access, so email
>replies to me, and I'll get them to him.  Thanks.


>We have FOUR CDC/4680 (MIPS ECL R6000) systems and all four go up and down
>like yo-yos, have poor network performance (especially NFS -- gag!),
>disk controller and ethernet board failures, CPU board failures, and so on
[something deleted]
>word that the other 40-some-odd systems are happy-go-lucky.

We have one CDC CD4680, and we are very happy with the system after
some initial problems (we triggered a bug in the floating point chip).

The system is really fast, and runs reliably. We don't experience
poor network performance; are you sure your network is OK?
(Be sure you don't run EP/IX 1.2.1, which has some trouble with YP.)

We experienced some failures with the CPU board, but they were all
related to heavy board swapping during the search for the floating
point chip bug.

CDC's support is very good, hardware as well as software.

Sincerely,
Klaus Steinberger

--
Klaus Steinberger               Beschleunigerlabor der TU und LMU Muenchen
Phone: (+49 89)3209 4287        Hochschulgelaende
FAX:   (+49 89)3209 4280        D-8046 Garching, Germany
BITNET: K2@DGABLG5P             Internet: k2@bl.physik.tu-muenchen.de

guscus@katzo.rice.edu (Gustavo E. Scuseria) (07/01/91)

In article <5229@spim.mips.COM> cprice@mips.com (Charlie Price) writes:
>In article <1991Jun28.155256.24743@mnemosyne.cs.du.edu> bediger@isis.cs.du.edu writes:
>>
>>     [stuff deleted]
>>
>>We have FOUR CDC/4680 (MIPS ECL R6000) systems and all four go up and down
>>like yo-yos, have poor network performance (especially NFS -- gag!),
>>disk controller and ethernet board failures, CPU board failures, and so on
>>and on and on. 
>>     [more stuff deleted]

    My RC6280 had pretty much the same problems ...
    I was going to buy it (trading in one of my m-2000s)
    but gave up after 6 months and innumerable board exchanges.

    Charlie Price's recommendations in such a case are:
>
>
>   1) Complain, put up with it, and hope somebody (else) finds
>      the problem.  This doesn't take any special effort,
>      but also doesn't necessarily make anything run better.
>
>   2) Rip the machines out and send them back.
>      You don't have the reliability problem any more,
>      but you don't have a working computer system either.
>
>   3) Actively "help" the vendor discover the problem by
>      *making* them pay attention to you and then giving them
>      all the assistance that you can.
>      The nature of load-dependent or environment-dependent
>      reliability problems means that the users and administrators
>      are often an important part of identifying the problem area.
>      This can be (very) painful and not every installation
>      has the time/skills/resources to do it.
>      If you can do it, you stand a much better chance of getting
>      your particular problems fixed.

  which sounds very good ... 

  In my case, MIPS demanded a maintenance contract on the machine
  to continue an unsuccessful effort to keep it up longer than
  24 hours! It would crash for any reason, at any time.
  Of course, I sent the 6280 back and bought an IBM 6000/530.
  With the money left, I'm also buying an IBM 550 or an HP 730.
  I have not made up my mind yet ... Either of them easily beats the
  6280 in floating point speed, not to mention that they cost only
  a fraction of the MIPS box's price. IMHO, that's the way you get
  your problems fixed.
 
--
Gustavo E. Scuseria              | guscus@katzo.rice.edu
Department of Chemistry          |
Rice University                  | office: (713) 527-4082
Houston, Texas 77251-1892        | fax   : (713) 285-5155