[comp.sys.att] Which UNIX pc kernel is better? 3.51 or 3.51a?

lenny@icus.UUCP (Lenny Tropiano) (06/04/88)

Which kernel has the "net-at-large" found to be more reliable for a system
with heavy UUCP traffic?  Recently my system has been hanging with the
"slow-down-and-die" syndrome, random panics, etc.  I've been running 3.51a
for quite some time, and just recently this has started happening again.
What could I have done?  I used to be able to keep my machine running
24 hrs/day, 7 days/week for over a month before rebooting.  Now I have to
reset the machine on average twice every 3 days...

I've been getting:

	panic:  addr fault in kernel
	panic:  page fault in kernel

Sounds like a bad kernel to me?  Phone driver?  Window Driver?  Or all
three?

$ ls -l /UNIX3.51a /unix
-rwxr-xr-x  2 root    root     168915 Mar 26 22:01 /UNIX3.51a*
-rwxr-xr-x  2 root    root     168915 Mar 26 22:01 /unix*

$ sum /UNIX3.51a
45337 165 /UNIX3.51a

Help!  [I wish the source were available for this machine.  Does anyone
know if it is?]

-Lenny
-- 
US MAIL  : Lenny Tropiano, ICUS Computer Group        IIIII  CCC U   U  SSS
           PO Box 1                                     I   C    U   U S
           Islip Terrace, New York  11752               I   C    U   U  SS 
PHONE    : (516) 968-8576 [H] (516) 582-5525 [W]        I   C    U   U    S
TELEX    : 154232428 [ICUS]                           IIIII  CCC  UUU  SSS 
AT&T MAIL: ...attmail!icus!lenny  
UUCP     : ...{mtune, ihnp4, boulder, talcott, sbcs, bc-cis}!icus!lenny 

rjg@sialis.mn.org (Robert J. Granvin) (06/05/88)

>Which kernel has the "net-at-large" found to be more reliable for a system
>with heavy UUCP traffic?  Recently my system has been hanging with the
>"slow-down-and-die" syndrome, random panics, etc.  I've been running 3.51a
>for quite some time, and just recently this has started happening again.
>What could I have done?  I used to be able to keep my machine running
>24 hrs/day, 7 days/week for over a month before rebooting.  Now I have to
>reset the machine on average twice every 3 days...

>I've been getting:
>
>	panic:  addr fault in kernel
>	panic:  page fault in kernel

>Sounds like a bad kernel to me?  Phone driver?  Window Driver?  Or all
>three?

There is a bug within the Unix kernel, version 3.51a.  There is
apparently a problem when the OBM tries to close its last buffer after
a communication sequence.  This causes the above panics.

NOTE:  These problems will _only_ occur if you are using the On Board
Modem for UUCP type stuff.  If you use a modem attached to any of the
serial ports rather than use the OBM, you'll never see this problem.
Also, these problems do not occur if you use the OBM for cu. 

AT&T is working on this problem, plus another kernel problem that is
intensely obscure.  I've been in somewhat regular contact with the
technicians trying to keep tabs on when it will be released.  When I
have a confirmed date or the actual fixdisk, I'll post the
information.

This fixdisk will most likely have to be applied on top of an already
applied 3.51a fixdisk, and will probably be released on an as-needed
basis only.  It would not automatically come with a 3.51a fixdisk.

>$ ls -l /UNIX3.51a /unix
>-rwxr-xr-x  2 root    root     168915 Mar 26 22:01 /UNIX3.51a*
>-rwxr-xr-x  2 root    root     168915 Mar 26 22:01 /unix*
>
>$ sum /UNIX3.51a
>45337 165 /UNIX3.51a

These are correct.
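
The listing above shows a link count of 2 with matching sizes and dates,
which suggests /unix and /UNIX3.51a are two names for one file.  A minimal
sketch for confirming that two kernel paths hold the same bytes, assuming
a Bourne-compatible shell with cmp available (same_kernel is a hypothetical
helper name, not a UNIX PC utility):

```shell
# Sketch only: same_kernel is a hypothetical helper, not a system command.
# It compares two files byte for byte and reports the result.
same_kernel() {
    a=$1
    b=$2
    # cmp -s is silent and exits 0 only when the files are identical
    if cmp -s "$a" "$b"; then
        echo "identical: $a $b"
    else
        echo "different: $a $b"
        return 1
    fi
}
```

It could be run as "same_kernel /UNIX3.51a /unix"; a "different" result
would mean /unix is not the kernel you think you booted.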

In any case, I've found 3.51a to be much more stable than 3.51 even
with the above problem.

-- 
"I've been trying for some time to                           Robert J. Granvin
 develop a life-style that doesn't          National Information Systems, Inc.
 require my presence."                                       rjg@sialis.mn.org
    -Garry Trudeau          ...uunet!{{amdahl,hpda}!bungia,rosevax}!sialis!rjg

kls@ditka.UUCP (Karl Swartz) (06/07/88)

In article <380@icus.UUCP> lenny@icus.UUCP (Lenny Tropiano) writes:
>Which kernel has the "net-at-large" found to be more reliable for a system
>with heavy UUCP traffic?

I frequently dial into ditka from a vt340 pretending to be a
vt100.  After running uEmacs, rn, Elm, or just about anything
else that does fancy-ish screen stuff, the 3.51a kernel goes
into a mode where all output looks sort of like it's got the
wrong baud rate (but not exactly).  The Hot-Line folks said
"Yup, it's hosed"; I switched back to the 3.51 kernel.

With 3.51 I was seeing a fair number of the hung window
manager problems until I got HDB and a TrailBlazer.  Not
sure which one did the trick, but I've only had one crash
of any kind since then, and that was a memory parity error.

-- 
Karl Swartz		|UUCP	{pacbell,emoryu1,decvax!formtek}!ditka!kls
1-412/937-4930 office	|	{pitt,psuvax1}!idis!formtek!ditka!kls
			|BIX	kswartz
"I never let my schooling get in the way of my education."  (Twain)

bob@rush.cts.com (Bob "Rush" Ames) (06/07/88)

In article <536@sialis.mn.org>, rjg@sialis.mn.org (Robert J. Granvin) writes:
> >Which kernel has the "net-at-large" found to be more reliable for a system
> >with heavy UUCP traffic?
> 
> NOTE:  These problems will _only_ occur if you are using the On Board
> Modem for UUCP type stuff.  If you use a modem attached to any of the
> serial ports rather than use the OBM, you'll never see this problem.
> Also, these problems do not occur if you use the OBM for cu. 

No.  This is the second time I've read this on the net, so I'm speaking up.

I'm running with 3.51 because whenever I move 3.51a to /unix I get
about 2 panics a day.  I'm NOT USING the OBM.  I am using HDB, 2400 Baud
external modem, system configured for ONE VOICE line.

I have run 3.51 for about 3 months without a re-boot.  Whenever I
try 3.51a I'm down within 8 hours.  

I've been told that there is a bug in a PROM on the COMBO cards.  I've
heard a rumor that this kernel panic bug is related to these defective
PROMs, which I've heard AT&T will replace for free.  I've lost the
information on how to determine whether you've got the old, bad PROM
or the new one.

3.51, 2.5Mb, 67Mb, HDB, 1 voice line, tty000: 2400Baud modem, tty001: tvi950

To prove that the OBM has nothing to do with this, I configured the system
as one voice line.  Even with the OBM totally disconnected and no inittab
entries for it (coloned-out ph1 line), I still panicked.

Personally, I'm pretty happy with the old kernel, although the other
fixes on the disk are quite good.

Bob Ames @ Rush UNIKS PC Support Center 619-432-6860   INET: bob@rush.cts.com
UUCP: {cbosgd, ucsd, nosc, sun!ihnp4, hplabs!hp-sdd}!crash!rush!bob
                                                     jack!/
"We each pay a fabulous price for our visions of paradise" - Neil Peart 1987

david@ms.uky.edu (David Herron -- One of the vertebrae) (06/07/88)

In article <737@rush.cts.com> bob@rush.cts.com (Bob "Rush" Ames) writes:
>In article <536@sialis.mn.org>, rjg@sialis.mn.org (Robert J. Granvin) writes:
>> NOTE:  These problems will _only_ occur if you are using the On Board
>> Modem for UUCP type stuff.  If you use a modem attached to any of the
>> serial ports rather than use the OBM, you'll never see this problem.
>> Also, these problems do not occur if you use the OBM for cu. 
>No.  This is the second time I've read this on the net, so I'm speaking up.

>I'm running with 3.51 because whenever I move 3.51a to /unix I get
>about 2 panics a day.  I'm NOT USING the OBM.  I am using HDB, 2400 Baud
>external modem, system configured for ONE VOICE line.

>I have run 3.51 for about 3 months without a re-boot.  Whenever I
>try 3.51a I'm down within 8 hours.  

Well, my experience is a little different from yours.  Once I fixdisk'd
to 3.51a I eventually started panicking a couple of times a day, like
you say happened to you.  So I started backing the changes out one at
a time, starting with the kernel.  That didn't help, so I put the new
one back in and backed out the uucico.  I've been running on the old
uucico since without a single panic.

I USE the OBM.  For dial-outs only.  I do have a combo card installed
and use it from time to time.  My roommate and I both have computers
hooked to it as terminals, one Mac and one Amiga.

-- 
<---- David Herron -- The E-Mail guy                         <david@ms.uky.edu>
<---- s.k.a.: David le casse\*'   {rutgers,uunet}!ukma!david, david@UKMA.BITNET
<---- 
<---- Goodbye RAH.

dgy@sigmast.UUCP (Dave Yearke) (06/08/88)

In article <737@rush.cts.com> bob@rush.cts.com (Bob "Rush" Ames) writes:
>I've been told that there is a bug in a PROM on the COMBO cards.  I've
>heard a rumor that this kernel panic bug is related to these defective
>PROMs, which I've heard AT&T will replace for free.  I've lost the
>information on how to determine whether you've got the old, bad PROM
>or the new one.

Yes, AT&T will replace the chip for free if you have a bad one.  Here's what
you can do:

1)  Pull the combo board out of the machine.

2)  Look for a small white square along the edge.  If it has an "F" or higher
    ("G", "H", ...) you have the bad one.

3)  Call the hotline.  Have the serial number of your machine handy, and keep
    the combo board in front of you, because they will probably ask you for the
    manufacturer and code of the chip (it's the big socketed one in the middle
    of the board towards the serial connectors).

Disclaimer:  I am not affiliated with AT&T.  We had a problem with a curses
application that was receiving spurious delete characters, but only from
terminals connected to the combo board.  After replacing the chip, the problem
went away.

-- 
		Dave Yearke, Sigma Systems Technology, Inc.
		   5813 Main St, Williamsville, NY 14221
		  ...!{sunybcs,ames!canisius}!sigmast!dgy