[comp.sys.att] 3B2 crashes mysteriously

jkg@prism.gatech.EDU (Jim Greenlee) (10/18/90)

I have a 3B2/310 that is crashing at seemingly random times (sometimes
daily) due to a kernel MMU fault. This machine is on a local area network
that has about 8 other 3B2s that are connected via RFS. We are running SVR
3.2 and WIN TCP/IP 3.0.1 (which we're using as the netspec for RFS). Only
one of the machines is exhibiting this trouble - the rest have been up for
a month with no problems. Another department here on campus is having the
same problem with a 3B2/1000, only less frequently (once every 3-4 weeks).

I have tried swapping a different machine in and checking the network cabling.
Everything appears to be working properly. I can't really pin the problem
down to any specific situation, but it seems to be happening when someone
is logged in from a terminal that is connected directly to the PORTS card.
Logging in from the console or via rlogin/telnet doesn't seem to cause
problems (this is purely speculation on my part, and may not be the cause
of the trouble).

I seem to remember seeing a posting a while back that mentioned a possible
bug in the kernel that would cause this. My recollection is that a fix is
available from the Hotline, but I haven't called them yet (I'm planning to
do that tomorrow). In the meantime, could anybody shed any light on this
problem for me? Thanks.

						Jim Greenlee
-- 
Jim Greenlee - Instructor, School of ICS, Georgia Tech     jkg@cc.gatech.edu

Jryy, abj lbh'ir tbar naq qbar vg! Whfg unq gb xrrc svqqyvat jvgu vg
hagvy lbh oebxr vg, qvqa'g lbh?!

ram@attcan.UUCP (Richard Meesters) (10/18/90)

In article <15406@hydra.gatech.EDU>, jkg@prism.gatech.EDU (Jim Greenlee) writes:
> I have a 3B2/310 that is crashing at seemingly random times (sometimes
> daily) due to a kernel MMU fault. This machine is on a local area network
> that has about 8 other 3B2s that are connected via RFS. We are running SVR
> 3.2 and WIN TCP/IP 3.0.1 (which we're using as the netspec for RFS). Only
> one of the machines is exhibiting this trouble - the rest have been up for
> a month with no problems. Another department here on campus is having the
> same problem with a 3B2/1000, only less frequently (once every 3-4 weeks).
> 

We've seen some problems up here in Canada with kernel MMU fault (F_ACCESS) on
machines running SVR3.2.2 and TCP/IP.  The problem is apparently fixed in the 
3.2.3 release of UNIX.  I don't know if this release applies to your 3B2/310
though.  Best bet is to call the hotline (like you say you're going to do 
anyways).  They should recognize it as a known problem.

Regards,

------------------------------------------------------------------------------
     Richard A Meesters                |
     Technical Support Specialist      |     Insert std.logo here
     AT&T Canada                       |
                                       |     "Waste is a terrible thing
     ATTMAIL: ....attmail!rmeesters    |      to mind...clean up your act"
     UUCP:  ...att!attcan!ram          |
------------------------------------------------------------------------------

jahn@sapphire.idbsu.edu (Greg Jahn) (10/22/90)

In article <12793@attcan.UUCP> ram@attcan.UUCP (Richard Meesters) writes:
>We've seen some problems up here in Canada with kernel MMU fault (F_ACCESS) on
>machines running SVR3.2.2 and TCP/IP.  The problem is apparently fixed in the 
>3.2.3 release of UNIX.  I don't know if this release applies to your 3B2/310
>though.  Best bet is to call the hotline (like you say you're going to do 
>anyways).  They should recognize it as a known problem.

Don't bet on the hotline recognizing it.  We are experiencing the same
KERNEL MMU FAULT (F_ACCESS) problem.  We run a 3B2/600 with SVR3.2.2 and TCP/IP
3.0.1.  Our system sometimes failed DURING the transition to 'init 3', and 
other times within 15 minutes of coming up.  The hotline said to do
a partial restore (a non-trivial solution).  Then our system was stable
for about two weeks; now it has failed twice in the last three days.
The hotline didn't seem familiar with the problem.  DOES ANYONE OUT
THERE KNOW WHAT'S GOING ON?

------ 
... on a completely different topic:  Is anyone out there running
a similar configuration with smail3?  We'd like to support mail for
locally attached uucp accounts, but we do not have any outgoing
uucp connections, so we need uucp mail and Internet mail to
talk to each other.  Any help or advice would be greatly appreciated;
the easier it is for me to do this, the better the chance it'll
happen.

	- Greg 


-- 
Greg Jahn, Boise State Univ.| The struggle itself towards the heights
/ jahn@sapphire.idbsu.edu / | is enough to fill a man's heart. One must
/ dosjahn@idbsu.bitnet    / | imagine Sisyphus happy.
/ (208)385-3891.thephone  / |			- Albert Camus

wmb@ulysses.att.com (W M Brelsford) (10/23/90)

In article <15406@hydra.gatech.EDU>, jkg@prism.gatech.EDU (Jim Greenlee) writes:
> I have a 3B2/310 that is crashing at seemingly random times (sometimes
> daily) due to a kernel MMU fault. This machine is on a local area network
> that has about 8 other 3B2s that are connected via RFS. We are running SVR
> 3.2 and WIN TCP/IP 3.0.1 (which we're using as the netspec for RFS).

In article <12793@attcan.UUCP> ram@attcan.UUCP (Richard Meesters) writes:
>We've seen some problems up here in Canada with kernel MMU fault (F_ACCESS) on
>machines running SVR3.2.2 and TCP/IP.

In article <1990Oct22.164803.24643@sapphire.idbsu.edu>, jahn@sapphire.idbsu.edu (Greg Jahn) writes:
> Don't bet on the hotline recognizing it.  We are experiencing the same
> KERNEL MMU FAULT (F_ACCESS) problem.  We run a 3B2/600 with SVR3.2.2 and TCP/IP
> 3.0.1.  Our system sometimes failed DURING the transition to 'init 3', and 
> other times within 15 minutes of coming up.

We've been suffering with this problem (F_ACCESS) too, but it only
seems to occur when trying to access a system (via rlogin, rcp, etc.)
that is currently down.  And in such cases it always happens -- rather
than reporting "Connection timed out", it waits about 2 minutes and
crashes.  So we're very careful to comment out /etc/hosts lines for
systems that go down -- we have a daemon doing it.
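Bill doesn't describe how his daemon works. As a rough sketch only, assuming some separate check (not shown) has already decided which hosts are down, the /etc/hosts rewriting step might look like the Python below. The function names, the `#DOWN# ` marker, and the sample addresses are all hypothetical; the marker lets the daemon restore only the lines it commented out itself, leaving ordinary comments alone.

```python
# Hypothetical sketch: comment out /etc/hosts entries for hosts that are down,
# and restore them when they come back.  Deciding *which* hosts are down
# (e.g. by probing them) is assumed to happen elsewhere and is not shown.

MARKER = "#DOWN# "  # hypothetical tag so we only restore lines we disabled


def disable_hosts(hosts_text, down):
    """Prefix MARKER to every active entry that names a host in `down`."""
    out = []
    for line in hosts_text.splitlines():
        # Ignore anything after '#'; already-commented lines yield no fields.
        fields = line.split("#", 1)[0].split()
        # fields[0] is the address; the rest are the hostname and its aliases.
        if len(fields) >= 2 and set(fields[1:]) & set(down):
            line = MARKER + line
        out.append(line)
    return "\n".join(out) + "\n"


def restore_hosts(hosts_text, up):
    """Strip MARKER from disabled entries that name a host in `up`."""
    out = []
    for line in hosts_text.splitlines():
        if line.startswith(MARKER):
            fields = line[len(MARKER):].split("#", 1)[0].split()
            if set(fields[1:]) & set(up):
                line = line[len(MARKER):]
        out.append(line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    sample = "127.0.0.1\tlocalhost\n192.9.200.2\tjoplin\n"  # made-up entries
    print(disable_hosts(sample, {"joplin"}), end="")
```

A daemon would read /etc/hosts, call disable_hosts() or restore_hosts() after each probe cycle, and write the result back, ideally to a temporary file that is then renamed over the original so a crash mid-write can't leave a truncated hosts file.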

We're running SVR3.1 with an older version of TCP/IP.  We were hoping
our upcoming upgrade to both would fix it.

Bill Brelsford
AT&T, Basking Ridge NJ
wmb@joplin.att.com

jrallen@devildog.att.com (Jon Allen) (10/23/90)

>>We've seen some problems up here in Canada with kernel MMU fault (F_ACCESS) on
>>machines running SVR3.2.2 and TCP/IP.
>
>> KERNEL MMU FAULT (F_ACCESS) problem.  We run a 3B2/600 SVR3.2.2, TCP/IP

This is a well-known bug; the AT&T hotline has a fix for 3.0.1.  I haven't 
tested WIN/3B 3.2 yet to see if the fix is included in that release.  Many 
of our customers were having the problem, and it turned out to be an 
improperly freed STREAMS queue in one of the TCP modules.  This bug shows up
quickly if you have heavy TCP/IP traffic.

-Jon
jrallen@devildog.att.com

ram@attcan.UUCP (Richard Meesters) (10/24/90)

In article <1990Oct22.164803.24643@sapphire.idbsu.edu>, jahn@sapphire.idbsu.edu (Greg Jahn) writes:
| Don't bet on the hotline recognizing it.  We are experiencing the same
| KERNEL MMU FAULT (F_ACCESS) problem.  We run a 3B2/600 with SVR3.2.2 and TCP/IP
| 3.0.1.  Our system sometimes failed DURING the transition to 'init 3', and 
| other times within 15 minutes of coming up.  The hotline said to do
| a partial restore (a non-trivial solution).  Then our system was stable
| for about two weeks; now it has failed twice in the last three days.
| The hotline didn't seem familiar with the problem.  DOES ANYONE OUT
| THERE KNOW WHAT'S GOING ON?
| 

Like I said, there's a problem running 3.2.2 UNIX with 3.0.1 TCP/IP, which 
will cause the machines to intermittently panic with a Kernel MMU Fault
(F_ACCESS).  I don't know what the cause of the panic is, but in my 
experience with it, it's an intermittent failure that may or may not 
show up on a particular machine.  I've seen cases where, of two 
virtually identical machines, one has the problem and the other doesn't.  
The only fix I know of is to upgrade the machine to UNIX 3.2.3.  I haven't 
been able to find the fix separately.


Regards,


laver@siodo.UCSD.EDU (Mick Laver) (10/27/90)

>Like I said, there's a problem running 3.2.2 UNIX with 3.0.1 TCP/IP, which 
>will cause the machines to intermittently panic with a Kernel MMU Fault
>(F_ACCESS).  I don't know what the cause of the panic is, but in my 
....
>The only fix I know of is to upgrade the machine to UNIX 3.2.3.  I haven't 
>been able to find the fix separately.

We have the same problem on both a 400 and 600 running 3.1.1. I've been
assured by the hotline folks that the 3.2 tcp/ip release will fix it.
Can anyone verify that? 

ram@attcan.UUCP (Richard Meesters) (10/29/90)

In article <345@siodo.UCSD.EDU>, laver@siodo.UCSD.EDU (Mick Laver) writes:
> We have the same problem on both a 400 and 600 running 3.1.1. I've been
> assured by the hotline folks that the 3.2 tcp/ip release will fix it.
> Can anyone verify that? 

According to the 3.2 release notes, there's a fix for "an MMU fault when
using rlogin and hitting the interrupt key" and also for a periodic close panic
having to do with the ptys.

I haven't had time to test it thoroughly.  I've never seen the problem on
my lab machines, only on customer machines, and since it 
appears to be an intermittent problem (in the cases I've seen), I haven't
yet been able to evaluate the fix fully.

Regards,
