[comp.sys.att] panic: page fault in kernel

bruce@blilly.UUCP (Bruce Lilly) (01/06/91)

I've seen the subject message several times on at least two 3B1's running 3.51m.

The following parts of the panic message seem to be fairly consistent:

type = 0x2   pid = <varies, of course> pc = 0x56F62  rps = 0x2000 p = <varies around 0x4[6-7]xxx>
GSR = CD00  BSR0 = 7D23  BSR1 = <varies, I've seen 904E and 2046>  PHYSPF = 0
D0 line varies
D4 = FC43  D5 = 0  D6 = DD00  D7 = 430000
A0 = 5781A  A1 = 59064  A2 = <varies, 23xxxx>  A3 = 1F452
A4 = 708B6  A5 = 9297C  A6 = 7081E  userA7 = <varies>/kernA7 = 70812
KI-RAM@56F5C: 0800082A  00070008  67122202  E089E089
KI-RAM@70810:
   these lines vary somewhat
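
One way to line the dump up with the registers: the KI-RAM@56F5C line starts
a few bytes below the reported pc, so subtracting the two addresses shows
which dumped word the pc points at. A minimal Python sketch of that
arithmetic, using only the values quoted above. The function name is made up
for illustration, and it assumes each group in the dump is a big-endian
32-bit longword, as on a 68010.

```python
def locate_pc(dump_base, longwords, pc):
    """Split a KI-RAM dump into 16-bit words and find the one at pc."""
    words = []
    for lw in longwords:
        # Big-endian: the high half of each longword sits at the lower address.
        words.append((lw >> 16) & 0xFFFF)
        words.append(lw & 0xFFFF)
    offset = pc - dump_base
    if offset < 0 or offset % 2 or offset // 2 >= len(words):
        raise ValueError("pc is odd or outside the dumped range")
    return offset, words, offset // 2


# Values copied from the panic message above.
offset, words, idx = locate_pc(
    0x56F5C, [0x0800082A, 0x00070008, 0x67122202, 0xE089E089], 0x56F62
)
print("pc is %d bytes into the dump, at word %04X" % (offset, words[idx]))
```

On a 68010 the saved pc may not point at the start of the faulting
instruction, so the words just before that position are worth disassembling
as well.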

Anybody have any idea what's going on?

Thanks in advance.
-- 
	Bruce Lilly		blilly!bruce@Broadcast.Sony.COM
					...uunet!sonyusa!sonyd1!blilly!bruce

thad@cup.portal.com (Thad P Floryan) (01/11/91)

alex@umbc3.UMBC.EDU (Alex S. Crain) in <4832@umbc3.UMBC.EDU> writes:

	There is an alternate shared memory driver called lipc that you can use
	instead. I don't remember the details of the problems, but I remember
	reading something like: "nipc has some problems, but lipc isn't well
	tested, so try nipc first and use the other if it breaks."

	I can't for the life of me remember where I saw this, sorry. 

	Loaded drivers are more susceptible to paging problems than the regular
	kernel, but I've never been able to figure out exactly what the
	difference is.

	In any case, try switching over to lipc. To do this, edit
	/etc/lddrv/drivers and reboot. Good luck.

Actually, "lipc" is the standard one and "nipc" is the optional, enhanced one.

On page 4 of "Important Information for Users of UNIX PC 3.51 Software" (a
loose supplement to the 3.51 docs) is found (page numbers refer to AT&T UNIX
PC Owner's Manual):

``	Page 4-26.  There are two IPC drivers listed in the Loadable Device
	Driver Interface window:

		Standard Sys V IPC (lipc)

			and

		Enhanced Sys V IPC (nipc)

	The standard IPC driver will be loaded by default and should remain
	so under most circumstances. The enhanced IPC driver is a noncertified
	version containing fixes for specific system problems.  If when running
	a particular software package you receive a system message indicating:

		kernel crash: rmfree panic

	you should unload the standard IPC driver and load the enhanced IPC
	driver.  Follow the directions in the Owner's Manual.
''

and looking at the drivers (this is ONE of the reasons I wrote "coffdate"
which I recently posted):

	thadlabs ksh 24989/24990> cd /etc/lddrv
	thadlabs ksh 24989/24990> ls -l lipc* nipc*
	-rwxrwxrwx  1 root    root      34127 Oct 13 23:54 lipc
	-rw-r--r--  1 bin     bin       18628 Jan  1  1970 lipc.o
	-rw-r--r--  1 bin     bin       20752 Jan  1  1970 nipc.o
	thadlabs ksh 24989/24990> coffdate lipc* nipc*
	Sat Oct 13 23:54:32 1990  lipc
	Sat Apr 18 15:36:05 1987  lipc.o
	Sat Apr 18 15:37:55 1987  nipc.o
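
For readers without "coffdate": the dates above differ from the ls -l dates
because a COFF object carries its own creation time in the file header
(the f_timdat field, right after f_magic and f_nscns). The following is a
rough Python sketch of reading that field. It is not the coffdate source; the
function names are hypothetical, and the big-endian byte order is an
assumption that fits 68010 objects like these.

```python
import struct
import time


def coff_timestamp(header_bytes, byteorder=">"):
    """Return f_timdat (seconds since the epoch) from a COFF file header.

    The header begins: f_magic (2 bytes), f_nscns (2 bytes), f_timdat (4 bytes).
    """
    if len(header_bytes) < 8:
        raise ValueError("need at least the first 8 bytes of the file")
    _magic, _nscns, timdat = struct.unpack(byteorder + "HHl", header_bytes[:8])
    return timdat


def coffdate(path):
    """Format one line in the style of the listing above, in local time."""
    with open(path, "rb") as f:
        stamp = coff_timestamp(f.read(8))
    return "%s  %s" % (time.asctime(time.localtime(stamp)), path)
```

Something like coffdate("/etc/lddrv/nipc.o") would then print the embedded
date, which survives even when the file's modification time has been reset to
Jan 1 1970 as in the ls -l output.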

I really do NOT believe the problem is related to nipc because I just found
the following (but NOTE the WARNING (this is from the guy who "did" 3.51m)):

+ Relay-Version: version B 2.10.3 4.3bsd-beta 6/6/85; site portal.UUcp
+ Path: portal!uunet!ginosko!xanth!ames!apple!sun-barr!rutgers!att!andante!\
+   ulysses!dptg!mtunb!jcm
+ From: jcm@mtunb.ATT.COM (was-John McMillan)
+ Newsgroups: unix-pc.general
+ Subject: lipc  vs  nipc -- and 'ipcs'
+ Message-ID: <1567@mtunb.ATT.COM>
+ Date: 18 Jul 89 20:00:56 GMT
+ Date-Received: 19 Jul 89 11:21:49 GMT
+ Reply-To: jcm@mtunb.UUCP (John McMillan)
+ Organization: AT&T ISL Middletown NJ USA
+ Lines: 38
+ Keywords: lipc nipc ipcs
+ Summary: 'ipcs' only knows of 'lipc'
+ Portal-Origin: Usenet
+ Portal-Type: text
+ Portal-Location: 5262.3.1470.1
+ 
+ If you do not use Inter-Process Communication features --
+ 	specifically, if you do not use 'nipc' instead of 'lipc' --
+ 	then ignore the following.
+ 
+ 			- - - - - - - - - - - -
+ 			This is a minor warning:
+ 
+ Many moons ago, as CT was terminating their work and a new release was
+ 	being finalized, an IPC bug was fixed -- but CT would not
+ 	accept the 'fix' because there was inadequate time/resources
+ 	to validate it to their satisfaction.  The loadable driver
+ 	'lipc.o' -- their package -- was retained and the new version was
+ 	added to the /etc/lddrv directory as 'nipc.o'.
+ 
+ Those users who reported an error message of something like "rmfree error"
+ 	were instructed to switch to using 'nipc'.  If this switch
+ 	was made by REPLACING 'lipc.o' with 'nipc.o', no problem
+ 	arises.
+ 
+ However, if the user changed the 'drivers' file such that 'nipc' was
+ 	specified as the driver to use -- rather than 'lipc' -- the
+ 	program 'ipcs' cannot find the correct information as it
+ 	has no information about 'nipc'.  IPC works just fine, but
+ 	the report-program indicates IPC is not implemented.
+ 
+ 			- - - - - - - - - - - -
+ 			REQUEST FOR INFORMATION
+ 			- - - - - - - - - - - -
+ 
+ In the two intervening years, not a single 'nipc' bug has been
+ 	reported (to me).  IF YOU USE *NIPC* and have observed
+ 	any bugs, please E-MAIL me any details:
+ 
+ 			att!mtunb!jcm
+ 
+ Otherwise, I am inclined to replace lipc with nipc in the fix disk.
+ 
+ john mcmillan	-- att!mtunb!jcm


Thad Floryan [ thad@cup.portal.com ]

rmfowler@texrex.uucp (Rex Fowler) (01/12/91)

A few weeks ago I posted that I had finally fixed my 3b1 kernel parity
error crash problem.  Well, I lied.  It must have been a lucky 30 hours
uptime since the normal uptime was around 1-4 hours before crashing.  
This crash problem lasted over 4 months!

3 days ago I decided to see what drivers I had loaded.  I was surprised.
I have a starlan card but it never occurred to me that a loadable driver was
being loaded at boottime.  I had removed the starlan card previously
and still had crashes so I dismissed the starlan card as the problem.

So, I decided to play with the drivers and here's what happened.
I replaced lipc with nipc, took out starlan driver, added ate driver
to /etc/lddrv/drivers.  Rebooted..
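
For anyone repeating this, the change amounts to a small edit of the driver
list. A rough Python sketch follows. It assumes, without confirmation from
this thread, that /etc/lddrv/drivers is a plain list with one driver name per
line. The function name is hypothetical, and the literal entries "starlan"
and "ate" in the example are guesses at how those drivers are listed.

```python
def edit_driver_list(text, swap=None, remove=(), add=()):
    """Return a new driver list: drop, rename, then append entries."""
    swap = swap or {}
    out = []
    for line in text.splitlines():
        name = line.strip()
        if not name or name in remove:
            continue  # skip blank lines and drivers being taken out
        name = swap.get(name, name)
        if name not in out:
            out.append(name)
    for name in add:
        if name not in out:
            out.append(name)
    return "\n".join(out) + "\n"


# Hypothetical starting list; a real file will have other entries too.
before = "lipc\nstarlan\n"
after = edit_driver_list(
    before, swap={"lipc": "nipc"}, remove=("starlan",), add=("ate",)
)
print(after, end="")
```

Per John McMillan's note quoted earlier in the thread, listing nipc in the
drivers file this way leaves "ipcs" unable to report IPC status, even though
IPC itself keeps working.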

I've been up for 3 days now and TODAY I see postings about the loadable
drivers causing kernel parity error/crashes!  Why didn't this discussion
come 4 months ago! :^)

Anyway, I still don't know which driver was guilty since I changed so 
many of them at once.  After my machine has been up for 5-6 days, (just
to be safe) I will try to isolate which of the drivers was screwing me.  
Right now I am just relieved that FINALLY, 4 months later, I have a 
reliable machine! I am pretty sure that it was the starlan driver.  I 
didn't get the errors associated with the lipc driver that I saw in 
today's posting by Thad.

-- 
Rex Fowler <rmfowler%texrex@cirr.com>
UUCP:  egsner!texrex!rmfowler

ssb@quest.UUCP (Scott Sheldon Bertilson) (01/16/91)

  I started using "nipc" sometime before John sent his note and have
continued to use it, sometimes heavily, with no problems.  By the way,
the system includes "nipcs" to provide status information if running
with "nipc".  At one point, I think I found that I could re-name
"nipc.o" to be "lipc.o" and "ipcs" would work just fine.
  I should also mention that it is possible to modify the allocation
sizes in the object file if you need more semaphores or whatever.
I did this in order to play with a version of KA9Q NOS which whacks
the semaphores at a tremendous rate.
-- 

Scott S. Bertilson   ...ssb@quest.UUCP
			scott@poincare.geom.umn.edu