[comp.sys.sun] soft ecc errors on the sun 4/260

blackman@hodgkin.med.upenn.edu (David Blackman) (01/14/89)

I am looking for advice on how to track down a memory problem on a Sun
4/260.  The following is recorded in our /usr/adm/messages file about
every two minutes:

axon vmunix: mem0: soft ecc addr f8248 syn 4f<S16,S2,S1,S0,SX> 47 U1647

Thanks, David
Blackman@hodgkin.med.upenn.edu

eap@bu-it.bu.edu (Eric A. Pearce) (01/25/89)

blackman@hodgkin.med.upenn.edu (David Blackman) says:
>I am looking for advice on how to track down a memory problem on a Sun
>4/260.  The following is recorded in our /usr/adm/messages file about
>every two minutes:
>
>axon vmunix: mem0: soft ecc addr f8248 syn 4f<S16,S2,S1,S0,SX> 47 U1647

The "mem0" is the number of the board (they start at 0), so this is your
first memory board, which should be in slot 6.  

The "U1647" refers to the chip position on the board.  If you look at the
back of the board, you will see the number under each chip.  The chips are
soldered in, so you don't have much choice but to replace the entire card.
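
For anyone who wants to pull the board number and chip position out of the
log automatically, here is a minimal sketch in Python.  The field names are
my own labels, and treating "addr" and "syn" as hexadecimal is an assumption
based on how the sample looks.  The bare number just before the chip ID is
not explained in this thread, so the sketch matches it but does not
interpret it.

```python
import re

# Pattern for the kernel message quoted above. The group names are
# illustrative labels, not official field names.
SOFT_ECC = re.compile(
    r"mem(?P<board>\d+): soft ecc addr (?P<addr>[0-9a-f]+) "
    r"syn (?P<syn>[0-9a-f]+)<(?P<bits>[^>]*)> (?P<unexplained>\S+) (?P<chip>U\d+)"
)


def parse_soft_ecc(line):
    """Return a dict of fields from a soft ECC log line, or None if no match."""
    m = SOFT_ECC.search(line)
    if m is None:
        return None
    return {
        "board": int(m.group("board")),          # mem0 -> 0, the first board
        "addr": int(m.group("addr"), 16),        # assumed to be hexadecimal
        "syndrome": int(m.group("syn"), 16),     # assumed to be hexadecimal
        "syndrome_bits": m.group("bits").split(","),
        "chip": m.group("chip"),                 # position printed on the board
    }


sample = "axon vmunix: mem0: soft ecc addr f8248 syn 4f<S16,S2,S1,S0,SX> 47 U1647"
print(parse_soft_ecc(sample))
```

The "board" and "chip" values are the two you need when you go looking at
the card itself.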

You should be able to find the bad board by booting in diag mode (flip the
little toggle switch on the CPU board to "diag").  A "correctable error" will
light the "CE" LED on the bad board and an "uncorrectable error" will light
up "UE".  It would be a good idea to boot in diag again after replacement
to make sure you've fixed it.  If your system crashes due to a memory
error while in normal use, go look at the back of it to check the
LEDs.

If you have 8 meg boards, you can pull out the bad one, and move the
jumpers on the other memory cards if needed.  If you have the newer 32 meg
card, I guess you are stuck until you can get it replaced.  If you are
doing the work yourself, check to make sure you have the resistor
terminator block only on the first memory board.  It is located near the
center backplane connector.

If you are getting errors often (more than one a week), call Sun and have
it replaced.  This may take some effort, as Sun field service told me not
to worry about it until I was getting several hundred per day.  It has
been my experience that the "soft" errors usually lead to "hard" ones that
crash your system.  
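
To judge whether you are over the "more than one a week" mark, a rough
sketch like the following tallies soft ECC messages per board and chip.  It
is written in Python, the function names are hypothetical, and it assumes
every message ends with the chip ID as in the sample above.  Since the
quoted line carries no timestamp, the length of the observation period is
passed in by hand rather than read from the log.

```python
import re
from collections import Counter

# Matches the board number and the trailing chip ID of a soft ECC message.
PATTERN = re.compile(r"mem(\d+): soft ecc .* (U\d+)\s*$")


def tally(lines):
    """Count soft ECC messages per (board, chip) pair."""
    counts = Counter()
    for line in lines:
        m = PATTERN.search(line)
        if m:
            counts[(int(m.group(1)), m.group(2))] += 1
    return counts


def exceeds_weekly_rate(count, days_observed, per_week=1.0):
    """True if count errors over days_observed days is more than per_week a week."""
    return count / days_observed * 7 > per_week


# Stand-in for the contents of /usr/adm/messages.
sample_lines = [
    "axon vmunix: mem0: soft ecc addr f8248 syn 4f<S16,S2,S1,S0,SX> 47 U1647",
    "axon vmunix: mem0: soft ecc addr f8248 syn 4f<S16,S2,S1,S0,SX> 47 U1647",
    "axon vmunix: some unrelated message",
]

for (board, chip), n in tally(sample_lines).items():
    flag = exceeds_weekly_rate(n, days_observed=1)
    print("mem%d %s: %d errors, over threshold: %s" % (board, chip, n, flag))
```

On a real system you would pass the open messages file to tally() in place
of sample_lines.  If every count lands on the same board and chip, that is
the card to ask Sun about.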

 -e

 Eric Pearce                                   ARPANET eap@bu-it.bu.edu
 Boston University Information Technology      CSNET   eap%bu-it@bu-cs
 111 Cummington Street                         JNET    jnet%"ep@buenga" 
 Boston MA 02215                               UUCP    !harvard!bu-cs!bu-it!eap 
 617-353-2780 voice  617-353-6260 fax          BITNET  ep@buenga