[comp.sys.sgi] reliability questions

rion@FORD-WDL1.ARPA (Rion Cassidy) (03/17/88)

I want to get some feedback from those of you who have had significant
experience with SGI equipment.  From our experiences either SGI
hardware is very low quality or we have a lemon.  It is important for
us to know which because we have to make a decision regarding the
purchase of a new machine soon. I will explain; you judge for
yourself.  I am not trying to run down SGI, just explain the facts.

Our IRIS 3120 was originally purchased in November 1986.  In July 1987
we started having intermittent RAM parity problems.  This is the sort
of problem that stops the IRIS dead in its tracks and frequently can
ruin files.  The SGI tech came out six different times, each time
saying, "we're pretty sure it's fixed *this* time."  Finally, after
some angry phone calls, the sixth visit (in October) seemed to do the
trick.  Perhaps we had an elusive, unusual problem that no one else
could've solved any faster.

In January we started getting error messages about track errors on
the disk.  These seemed pretty innocuous at first since no real harm
was being done.  First the tech came out and made some disk
configuration changes and rebooted the system.  That seemed to work
for a couple of weeks until the same error message started popping up
again.  The decision then was to reformat the disk, eliminating the
spurious bad sector messages.  Trouble here was that my backup of the
system apparently was not successful; tar got conflicting file sizes
on many files.  We didn't realize this until the disk had been
reformatted and we were trying to restore everything.  The backup tape
was trash; a month's work was lost.
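
A check like the one below, run before the reformat, would have caught
the bad tape.  This is only a minimal sketch: it assumes the backup is
a tar archive readable from a machine with Python on it (not something
available on the IRIS), and verify_tar is a hypothetical helper name,
not part of any SGI tool.  It reads every file in the archive to the
end and compares the byte count against the size in the tar header.

```python
import tarfile


def verify_tar(path):
    """Read every regular file in a tar archive and report problems.

    Returns a list of human-readable problem strings; an empty list
    means every member could be read in full.
    """
    problems = []
    try:
        with tarfile.open(path) as archive:
            for member in archive:
                if not member.isfile():
                    continue  # directories, links, devices carry no data
                stream = archive.extractfile(member)
                read = 0
                while True:
                    chunk = stream.read(65536)
                    if not chunk:
                        break
                    read += len(chunk)
                if read != member.size:
                    problems.append(
                        "%s: header says %d bytes, read %d"
                        % (member.name, member.size, read)
                    )
    except (tarfile.TarError, OSError, EOFError) as exc:
        # A damaged or truncated archive usually surfaces here.
        problems.append("archive unreadable: %s" % exc)
    return problems
```

Calling verify_tar on the archive (or on the tape device node, if it
can be opened as a file) and refusing to reformat unless the returned
list is empty is the whole idea.  It does not compare against the
files on disk; it only proves the backup can be read back.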

The IRIS only worked for a while after that.  In February we were
getting occasional error messages about the fbc and font ram (I never
did figure out what that meant!).  We reported this but before they
had a chance to attempt a fix, the disk went completely south.  There
were continuous I/O error messages and it wouldn't reboot.  So now they
had to give us a new disk.  This problem cost a week of
access time and lost work.  The gotcha on the repair was that the new
disk would only run under SGI's system release 3.6 (new model, not
previously supported I imagine).  Our backup was an entire dump of
the disk.  The new disk had just release 3.6, no user files, and doing
a complete restoration would wipe out the new operating system without
which we couldn't use the disk.  So we had to do a file by file
restore in some cases (we were foolish enough to have used SGI's
'backup' program which splits directories between tape cartridges).
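
The selective restore described above comes down to deciding which
archived paths belong to the operating system and skipping them.  A
rough sketch of that decision follows, assuming the dump were a tar
archive (the actual backup format is not stated here).  The prefix
list and the names is_system_path and user_files are made up for
illustration and would need adjusting to the real release layout.

```python
import tarfile

# Assumed system directories; adjust to match the installed release.
SYSTEM_PREFIXES = ("bin/", "etc/", "lib/", "usr/bin/", "usr/lib/")


def is_system_path(name, prefixes=SYSTEM_PREFIXES):
    """Return True if an archive member name falls under a system directory."""
    while name.startswith("./"):
        name = name[2:]
    name = name.lstrip("/")
    for prefix in prefixes:
        if name.startswith(prefix) or name == prefix.rstrip("/"):
            return True
    return False


def user_files(archive_path, prefixes=SYSTEM_PREFIXES):
    """List the member names that are safe to restore over a fresh OS."""
    with tarfile.open(archive_path) as archive:
        return [
            member.name
            for member in archive
            if not is_system_path(member.name, prefixes)
        ]
```

The resulting list can then be handed to whatever extraction tool is
in use, so the new system files are never overwritten.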

It seems to be OK now.  Occasionally when I put a tape in the drive
and attempt a tar command the system locks up.  Drive light off, no
response on the console.  Should I call in on this too?  We've got Sun
workstations all over the place and they don't seem to have anywhere
near as many hardware problems.

In our estimation, the total of system down time and lost work comes
to about 10%.  When it comes in large chunks during critical
development periods it seems a lot worse.

My question: have other SGI users had problems with this frequency?
Or do we just have a lemon?  Are repeated visits for the same problem
normal in the world of hardware maintenance?

Any feedback would be appreciated.  If there are enough interesting
responses, I will post a summary.

P.S.  I solved the problem with kermit crashing (discussed in a
previous posting).  It has to be installed in /bin *and* /usr/bin or
else it will crash when a send is attempted.

Rion Cassidy
Ford Aerospace
rion@ford-wdl1.arpa
...{sgi,sun,ucbvax}!wdl1!rion

Disclaimer: The above is solely the opinion of the author.