[comp.unix.aix] Flaky behavior after installing 3002 upgrade

henkel%nepjt@ncsuvx.ncsu.edu (Chuck Henkel) (12/10/90)

I just installed the 3002 upgrade to a 320 system and I am seeing some
erratic behavior. I'm curious whether other people are having similar
problems, or whether I just screwed something up.

Notes:

- Upgrade applied from tape.

- System was at level 3000, *not* 3001, when I applied the PKGID U401202
  Release 310 tape IBM sent me. IBM guy knew I was at 3000 when he
  sent me the tape, so this shouldn't be the problem.

- Due to space limitations I had to apply and commit the upgrade at
  the same time. In hindsight, I wish I hadn't done that. (So keep
  that in mind when yours comes...)

Problems:

1) The main problem is that dbx causes the entire system to crash with
   the flashing "888" LED display. The hidden message is 102 300 0c0,
   which is supposed to mean "Data storage interrupt - processor type".

   Invoking "dbx" alone at the shell prompt (csh) gives me the dbx
   prompt OK, but "dbx file" crashes the system immediately.

   I haven't investigated this further because I'm tired of having to
   reboot.

2) I was trying to use dbx to debug another problem which the upgrade
   seems to have introduced. A program which execs two other programs
   only manages to exec one of them and then hangs. I can't be more
   specific because of 1) above, but it worked yesterday.


_Chuck Henkel
--
| Chuck Henkel                      |                            |
| N.C. State University             | Curious about evolution?   |
| Department of Nuclear Engineering |   Read Stephen J. Gould.   |
| henkel%nepjt@ncsuvx.ncsu.edu      |                            |

eddjp@edi386.UUCP ( Dewey Paciaffi ) (12/10/90)

In article <HENKEL%NEPJT.90Dec9223901@nepjt.ncsuvx.ncsu.edu> henkel@nepjt.ncsu.edu (Chuck Henkel) writes:
-
-I just installed the 3002 upgrade to a 320 system and I am seeing some
-erratic behavior. I'm curious whether other people are having similar
-problems, or whether I just screwed something up.
-
-Notes:
-
-- Upgrade applied from tape.
-
-- System was at level 3000, *not* 3001, when I applied the PKGID U401202
-  Release 310 tape IBM sent me. IBM guy knew I was at 3000 when he
-  sent me the tape, so this shouldn't be the problem.
-
-Problems:
-
-1) The main problem is that dbx causes the entire system to crash with
-   the flashing "888" LED display. The hidden message is 102 300 0c0,
-   which is supposed to mean "Data storage interrupt - processor type".

I began getting this particular interrupt while backing up NFS file 
systems. It turned out that I was running out of disk space (Data Storage)
in my /tmp filesystem. While I'm not sure exactly what the correlation
is, throwing a few more megabytes into /tmp cured my problem.

I don't remember if this happened pre-3002 or not.

-- 
Dewey Paciaffi           ...!uunet!edi386!eddjp

pemurray@miavx1.acs.muohio.edu (Peter Murray) (12/17/90)

In article <127@edi386.UUCP>, eddjp@edi386.UUCP ( Dewey Paciaffi ) writes:
> In article <HENKEL%NEPJT.90Dec9223901@nepjt.ncsuvx.ncsu.edu> henkel@nepjt.ncsu.edu (Chuck Henkel) writes:
> -Problems:
> -
> -1) The main problem is that dbx causes the entire system to crash with
> -   the flashing "888" LED display. The hidden message is 102 300 0c0,
> -   which is supposed to mean "Data storage interrupt - processor type".
> 
> I began getting this particular interrupt while backing up NFS file 
> systems. It turned out that I was running out of disk space (Data Storage)
> in my /tmp filesystem. While I'm not sure exactly what the correlation
> is, throwing a few more megabytes into /tmp cured my problem.
> 
> I don't remember if this happened pre-3002 or not.

We're running the "September Update" (that's 3001, isn't it?) and we see
the machine crashing with the 102 300 0c0 message.  We happened to narrow
it down to an MS-DOS file being served to a PC over NFS (and it's
reproducible), but IBM hasn't come up to look at it yet.

We have 7 megs of space in the /tmp partition (the way it came installed).

Peter
-- 
Peter Murray            Neat UNIX Stunts #5:             pemurray@miavx1.bitnet
176 Thompson Hall    sh> drink <bottle; opener    pmurray@apsvax.aps.muohio.edu
Oxford, OH 45056                       NeXT Mail:  pmurray@next4.acs.muohio.edu

jfh@rpp386.cactus.org (John F Haugh II) (12/17/90)

In article <3149.276bd842@miavx1.acs.muohio.edu> pemurray@miavx1.acs.muohio.edu (Peter Murray) writes:
>We're running the "September Update" (that's 3001, isn't it?) and we see
>the machine crashing with the 102 300 0c0 message.  We happened to narrow
>it down to an MS-DOS file being served to a PC over NFS (and it's
>reproducible), but IBM hasn't come up to look at it yet.

The "July" update was 3001.  "September" was 3002, and may have had those
problems corrected.  I ran into a number of problems involving NFS crashes
in early levels of AIX which were fixed by upgrading the software.  I
don't know if 3003 has been published yet, but you might want to start
clamoring for it.
-- 
John F. Haugh II                             UUCP: ...!cs.utexas.edu!rpp386!jfh
Ma Bell: (512) 832-8832                           Domain: jfh@rpp386.cactus.org
"While you are here, your wives and girlfriends are dating handsome American
 movie and TV stars. Stars like Tom Selleck, Bruce Willis, and Bart Simpson."

henkel%nepjt@ncsuvx.ncsu.edu (Chuck Henkel) (01/18/91)

In article <3149.276bd842@miavx1.acs.muohio.edu>
pemurray@miavx1.acs.muohio.edu (Peter Murray) writes:
> We're running the "September Update" (that's 3001, isn't it?) and we see
> the machine crashing with the 102 300 0c0 message.  We happened to narrow
> it down to an MS-DOS file being served to a PC over NFS (and it's
> reproducible), but IBM hasn't come up to look at it yet.

Here's what IBM came up with regarding my original problem, the machine
crashing when I invoke dbx:

It turned out that I had a hardware bug on the sysplanar0 board.
This was discovered by examining the output of the lscfg command:

% lscfg -v | head -30

INSTALLED RESOURCE LIST WITH VPD

The following resources are installed on your machine.
	
  sysunit0          00-00             RISC System/6000 System Unit
  sysplanar0        00-00             CPU Planar

        Part Number.................81F7774 
        EC Level....................2604441 
        Processor Identification....00015223
        ROS Level and ID............IPLVER0.0 LVL0.00,81F7775 
        Processor Component ID......0000003100000031
        Device Specific.(Z0)........01250B
        Device Specific.(Z1)........02FF02
        Device Specific.(Z2)........032001
        Device Specific.(Z3)........042104
        Device Specific.(Z4)........1D2006
        Device Specific.(Z5)........FFFFFF
        Device Specific.(Z6)........0A3005
        Device Specific.(Z7)........2A3005
        Device Specific.(Z8)........FFFFFF
        Device Specific.(Z9)........FFFFFF
        ROS Level and ID............OCS(05000000) 
        ROS Level and ID............SEEDS(05000000) 

The line:

        Device Specific.(Z1)........02FF02
                                    ^^^^
apparently means that hardware component "02" is at firmware revision
"FF", but the highest revision level is only supposed to be "21", so
this suggested to the IBMer in Austin that I had a hardware problem in
the planar.
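
For anyone who wants to run the same check on their own machine, here is
a minimal Python sketch. It assumes the reading given above (first byte
is a component ID, second byte is its revision level), which is only what
the IBMer's comment suggests and not a documented format. It also assumes
that entries reading FFFFFF are unused slots. The function name
suspicious_vpd_fields is made up for illustration.

```python
import re

# Matches lines such as "Device Specific.(Z1)........02FF02".
VPD_LINE = re.compile(r"Device Specific\.\((Z\d)\)\.+([0-9A-Fa-f]{6})\s*$")


def suspicious_vpd_fields(lscfg_output):
    """Return (field, component, revision) for entries whose revision is FF.

    Assumption taken from the post: byte 1 is a component ID and byte 2
    is its revision level. Entries that read FFFFFF are assumed to be
    unused slots and are skipped.
    """
    hits = []
    for line in lscfg_output.splitlines():
        match = VPD_LINE.search(line)
        if not match:
            continue
        field, value = match.group(1), match.group(2).upper()
        if value == "FFFFFF":
            continue  # assumed to be an unpopulated entry
        component, revision = value[0:2], value[2:4]
        if revision == "FF":
            hits.append((field, component, revision))
    return hits


# Z0-Z9 lines copied from the lscfg output in this post.
SAMPLE = """\
        Device Specific.(Z0)........01250B
        Device Specific.(Z1)........02FF02
        Device Specific.(Z2)........032001
        Device Specific.(Z3)........042104
        Device Specific.(Z4)........1D2006
        Device Specific.(Z5)........FFFFFF
        Device Specific.(Z6)........0A3005
        Device Specific.(Z7)........2A3005
        Device Specific.(Z8)........FFFFFF
        Device Specific.(Z9)........FFFFFF
"""

print(suspicious_vpd_fields(SAMPLE))  # [('Z1', '02', 'FF')]
```

To check a live system, the output of "lscfg -v" could be saved to a file
and its contents passed to the function. An empty list only means that no
revision byte reads FF; it says nothing about whether the other revision
levels are valid.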

So I called 1-800-IBM-SERV and arranged for a service guy to come out
and replace the board. He agreed to come out the next day only after I
insisted that, no, immediately wasn't a good time for me.

_Chuck

--
| Chuck Henkel                      |                           |
| Department of Nuclear Engineering | Your tax dollars at work. |
| N.C. State University             |                           |
| henkel%nepjt@ncsuvx.ncsu.edu      |                           |