[comp.bugs.4bsd] 4.2bsd memall mfind crash

urban@spp2.UUCP (Michael Urban) (06/11/88)

We have recently added some additional disk drives to a Vax 11/780
running 4.2bsd (yeah, I know, I know).  We have since been crashing
on a "memall mfind" panic about once a day.  At first I thought the
problem was that we were mounting 17 file systems with NMOUNT set to
only 15, but increasing NMOUNT to 20 failed to alleviate the problem.
I note that the clause in memall that is producing the crash is a
patch that was added (four years ago!) to alleviate another panic, the
vaguely remembered MUNHASH bug.  

Does anyone know what the problem is here, and how to fix it?  This is
starting to cost us big bucks.

-- 
   Mike Urban
	...!trwrb!trwspp!spp2!urban 

"You're in a maze of twisty UUCP connections, all alike"

chris@mimsy.UUCP (Chris Torek) (06/15/88)

In article <339@algol.spp2.UUCP> urban@spp2.UUCP (Michael Urban) writes:
>We have recently added some additional disk drives to a Vax 11/780
>running 4.2bsd (yeah, I know, I know).  We have since been crashing
>on a "memall mfind" panic about once a day.  At first I thought the
>problem was that we were mounting 17 file systems with NMOUNT set to
>only 15, but increasing NMOUNT to 20 failed to alleviate the problem.

Did you also change cmap.h, and then (as that necessitates) locore.s?

>I note that the clause in memall that is producing the crash is a
>patch that was added (four years ago!) to alleviate another panic, the
>vaguely remembered MUNHASH bug.  

That was a workaround for a compiler bug.

>Does anyone know what the problem is here, and how to fix it?

Switch to 4.3BSD.
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris

phb@dcdwest.UUCP (Peter H. Berens) (06/16/88)

From article <339@algol.spp2.UUCP>, by urban@spp2.UUCP (Michael Urban):
> We have recently added some additional disk drives to a Vax 11/780
> running 4.2bsd (yeah, I know, I know).  We have since been crashing
> on a "memall mfind" panic about once a day.  At first I thought the
> problem was that we were mounting 17 file systems with NMOUNT set to
> only 15, but increasing NMOUNT to 20 failed to alleviate the problem.
> I note that the clause in memall that is producing the crash is a
> patch that was added (four years ago!) to alleviate another panic, the
> vaguely remembered MUNHASH bug.  
> 
> Does anyone know what the problem is here, and how to fix it?  This is
> starting to cost us big bucks.

We came across this problem about a year ago and suffered with it
for a long time before finding the cause.  If you have too many
mounted file systems, then the core map structure has a bit field
that is no longer big enough to store the index into the mount table.
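
To see why a too-narrow field fails silently rather than loudly, here is a
minimal sketch. Only the 4-bit width comes from the stock cmap.h in the diff;
`struct mdev_demo` and `store_mdev()` are hypothetical names used for
illustration. Assigning a value to an unsigned bit field keeps only its low
bits, so no error is raised when an index does not fit.

```c
/*
 * Hypothetical demo: store a mount-table index in a field as narrow
 * as the unpatched c_mdev (4 bits) and read back what it really holds.
 */
struct mdev_demo {
	unsigned int c_mdev:4;	/* same width as the stock c_mdev */
};

unsigned int store_mdev(unsigned int index)
{
	struct mdev_demo d;

	d.c_mdev = index;	/* unsigned bit field: value reduced modulo 2^4 */
	return d.c_mdev;	/* indices 16 and up wrap around to small values */
}
```

Here `store_mdev(16)` returns 0 and `store_mdev(17)` returns 1. Assuming
mount-table indices start at 0, a 17th mounted file system gets index 16 and
is recorded as 0, the same as the first entry, so a lookup keyed on
<blkno, mdev> could match a page from the wrong file system.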

I believe the only thing we had to change was cmap.h, whose diff I
provide below, but I also remember some other define
that has to be changed when you bump NMOUNT.  In param.h you may need
to increase MSWAPX as well.
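
One way to keep NMOUNT, MSWAPX and the c_mdev width from drifting apart again
is a compile-time check. This is a sketch under stated assumptions:
`CMAP_MDEV_BITS` and `mdev_config_ok()` are hypothetical names, the values are
illustrative rather than copied from a real param.h, and MSWAPX is assumed to
be a pseudo mount-table index stored in c_mdev for swap pages, so it must fit
in the field and must not equal any real mount index. A pre-ANSI compiler may
not support `#error`.

```c
/*
 * Compile-time guard tying c_mdev's width to the mount-table limits.
 * CMAP_MDEV_BITS is hypothetical; NMOUNT and MSWAPX values are illustrative.
 */
#define CMAP_MDEV_BITS	5	/* c_mdev width after the cmap.h patch */
#define NMOUNT		20	/* mount-table size, as raised in the post */
#define MSWAPX		31	/* assumed swap pseudo-index; not a real slot */

#define CMAP_MDEV_MAX	((1 << CMAP_MDEV_BITS) - 1)	/* largest storable index */

#if NMOUNT - 1 > CMAP_MDEV_MAX
#error "c_mdev cannot hold every mount-table index"
#endif
#if MSWAPX > CMAP_MDEV_MAX
#error "c_mdev cannot hold MSWAPX"
#endif
#if MSWAPX < NMOUNT
#error "MSWAPX collides with a real mount-table index"
#endif

/* The same three rules as a function, for checking other configurations. */
int mdev_config_ok(int bits, int nmount, int mswapx)
{
	int max = (1 << bits) - 1;

	return nmount - 1 <= max && mswapx <= max && mswapx >= nmount;
}
```

For example, `mdev_config_ok(4, 20, 15)` and `mdev_config_ok(5, 20, 15)` both
return 0: with NMOUNT at 20, a swap pseudo-index of 15 lands on a real mount
slot even after c_mdev is widened, which is why MSWAPX may need raising too.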

Hope this helps.

					Pete Berens
					ITT Defense Communications
					(619) 578-3080 x240

*** /tmp/,RCSt1001638	Wed Jun 15 15:42:24 1988
--- cmap.h	Tue Nov 17 15:51:35 1987
***************
*** 1,4
- static char RCSid[] = "$Header: cmap.h,v 1.1 87/10/31 23:44:13 phb Exp $";
  /*
   * $Log:	cmap.h,v $
   * Revision 1.1  87/10/31  23:44:13  phb

--- 1,3 -----
  /*
   * $Header: cmap.h,v 1.2 87/11/17 15:50:50 phb Exp $
   *
***************
*** 1,5
  static char RCSid[] = "$Header: cmap.h,v 1.1 87/10/31 23:44:13 phb Exp $";
  /*
   * $Log:	cmap.h,v $
   * Revision 1.1  87/10/31  23:44:13  phb
   * Initial revision

--- 1,6 -----
  /*
+  * $Header: cmap.h,v 1.2 87/11/17 15:50:50 phb Exp $
+  *
   * $Log:	cmap.h,v $
   * Revision 1.2  87/11/17  15:50:50  phb
   * Increased c_mdev field to 5 bits.  Decreased c_blkno to 19 bits from
***************
*** 1,6
  static char RCSid[] = "$Header: cmap.h,v 1.1 87/10/31 23:44:13 phb Exp $";
  /*
   * $Log:	cmap.h,v $
   * Revision 1.1  87/10/31  23:44:13  phb
   * Initial revision
   * 

--- 2,11 -----
   * $Header: cmap.h,v 1.2 87/11/17 15:50:50 phb Exp $
   *
   * $Log:	cmap.h,v $
+  * Revision 1.2  87/11/17  15:50:50  phb
+  * Increased c_mdev field to 5 bits.  Decreased c_blkno to 19 bits from
+  * 20.  Rearranged entries to avoid problems with locore.s.
+  * 
   * Revision 1.1  87/10/31  23:44:13  phb
   * Initial revision
   * 
***************
*** 14,21
  {
  unsigned int 	c_next:13,	/* index of next free list entry */
  		c_prev:13,	/* index of previous free list entry */
! 		c_mdev:4,	/* which mounted dev this is from */
! 		c_lock:1,	/* locked for raw i/o or pagein */
  		c_want:1,	/* wanted */
  		c_page:16,	/* virtual page number in segment */
  		c_hlink:13,	/* hash link for <blkno,mdev> */

--- 19,25 -----
  {
  unsigned int 	c_next:13,	/* index of next free list entry */
  		c_prev:13,	/* index of previous free list entry */
! 		c_mdev:5,	/* which mounted dev this is from */
  		c_want:1,	/* wanted */
  		c_page:16,	/* virtual page number in segment */
  		c_hlink:13,	/* hash link for <blkno,mdev> */
***************
*** 23,29
  		c_free:1,	/* on the free list */
  		c_gone:1,	/* associated page has been released */
  		c_type:2,	/* type CSYS or CTEXT or CSTACK or CDATA */
! 		c_blkno:20,	/* disk block this is a copy of */
  		c_ndx:10;	/* index of owner proc or text */
  };
  

--- 27,34 -----
  		c_free:1,	/* on the free list */
  		c_gone:1,	/* associated page has been released */
  		c_type:2,	/* type CSYS or CTEXT or CSTACK or CDATA */
! 		c_lock:1,	/* locked for raw i/o or pagein */
! 		c_blkno:19,	/* disk block this is a copy of */
  		c_ndx:10;	/* index of owner proc or text */
  };
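
Torek's question about locore.s and the "Rearranged entries" log message both
hinge on where each field sits inside a core map entry, presumably because
locore.s refers to some cmap fields by hard-wired bit position rather than
through the C declaration. The sketch below computes those positions for both
layouts. Field widths come from the diff; the c_intrans:1 member on line 22
(unchanged, so not shown) is an assumption, as is the layout model: fields
packed low-to-high, never straddling a 32-bit unit. `field_offset()`,
`layout_bits()` and the tables are hypothetical helpers.

```c
#include <string.h>

/* One bit-field member: its name and declared width in bits. */
struct field {
	const char *name;
	int width;
};

/* Stock layout ("***" side of the diff); c_intrans:1 is assumed. */
const struct field old_cmap[] = {
	{"c_next", 13}, {"c_prev", 13}, {"c_mdev", 4}, {"c_lock", 1},
	{"c_want", 1}, {"c_page", 16}, {"c_hlink", 13}, {"c_intrans", 1},
	{"c_free", 1}, {"c_gone", 1}, {"c_type", 2}, {"c_blkno", 20},
	{"c_ndx", 10},
};

/* Patched layout ("---" side): c_mdev widened, c_lock moved, c_blkno narrowed. */
const struct field new_cmap[] = {
	{"c_next", 13}, {"c_prev", 13}, {"c_mdev", 5}, {"c_want", 1},
	{"c_page", 16}, {"c_hlink", 13}, {"c_intrans", 1}, {"c_free", 1},
	{"c_gone", 1}, {"c_type", 2}, {"c_lock", 1}, {"c_blkno", 19},
	{"c_ndx", 10},
};

#define NFIELDS(a) ((int)(sizeof(a) / sizeof((a)[0])))

/* Start bit for a field of `width` bits when the previous one ended at `bit`. */
static int place(int bit, int width)
{
	if (bit / 32 != (bit + width - 1) / 32)
		bit = (bit / 32 + 1) * 32;	/* would straddle: move to next unit */
	return bit;
}

/* Bit offset of the named field, or -1 if the layout has no such field. */
int field_offset(const struct field *f, int n, const char *name)
{
	int bit = 0;

	for (int i = 0; i < n; i++) {
		int start = place(bit, f[i].width);

		if (strcmp(f[i].name, name) == 0)
			return start;
		bit = start + f[i].width;
	}
	return -1;
}

/* Total storage in bits, rounded up to whole 32-bit units. */
int layout_bits(const struct field *f, int n)
{
	int bit = 0;

	for (int i = 0; i < n; i++)
		bit = place(bit, f[i].width) + f[i].width;
	return (bit + 31) / 32 * 32;
}
```

Both layouts come to 96 bits, and c_want, c_page, c_hlink, c_intrans, c_free,
c_gone, c_type and c_ndx start at the same bit in each; only c_mdev (wider),
c_lock (bit 30 to bit 66) and c_blkno (bit 66 to bit 67) change. Because every
32-bit group sums to exactly 32 in both layouts, the no-straddle assumption
does not affect these numbers. The cost of the patch is that narrowing c_blkno
from 20 to 19 bits halves the largest block number it can hold, from
2^20 - 1 = 1,048,575 to 2^19 - 1 = 524,287.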