dhb@rayssd.UUCP (03/30/85)
Has anyone ever successfully gotten more than 15 file systems on a 4.2 BSD system? After many long delays, we are finally going to convert from 4.1 to 4.2, and we need to be able to mount more than 15 file systems. I tried making the same changes that I made in 4.1 (increase the size of mdev in the cmap structure, increase NMOUNT and NSWAPX in param.h, fix mount/umount) but it doesn't seem to work. I even talked to Mike Karrels in Dallas and he indicated that that was all I had to do.

The problem we are experiencing is that random processes dump core at random times. This can be very annoying if the shell core dumps, and it can be disastrous if "init" core dumps. The behaviour seems to indicate some kind of swapping error. At first I didn't even associate this problem with the changes to the coremap structure, but in a final act of desperation I backed off the change and now the system runs fine. We have been trying to track what we thought was a weird swapping error for three months (tues and wed eve.) and have now been running smoothly WITHOUT the coremap changes for over two weeks.

We now feel that all our other changes are done and the system is ready to release to the users. The only problem is that one of our machines currently has eighteen mounted file systems and another one has twenty-three! To compound the problem, we are also expecting delivery of six new disk drives (400M Eagles). Before anyone says "Why don't you just make a few bigger file systems?", there are internal political reasons why we need to portion out the disk space in relatively small (30 - 60 Meg) chunks.

Sorry for rambling on so much, but if anyone has ever gotten more than 15 file systems to work, PLEASE let me know how you did it.
--
Dave Brierley
Raytheon Co.; Portsmouth RI; (401)-847-8000 x4073
...!decvax!brunix!rayssd!dhb
...!allegra!rayssd!dhb
...!linus!rayssd!dhb
mojo@sun.uucp (Joseph Moran) (03/31/85)
In article <681@rayssd.UUCP> dhb@rayssd.UUCP writes:
>Has anyone ever successfully gotten more than 15 file systems on a 4.2 BSD
>system? After many long delays, we are finally going to convert from 4.1
>to 4.2, and we need to be able to mount more than 15 file systems. I tried
>making the same changes that I made in 4.1 (increase the size of mdev in the
>cmap structure, increase NMOUNT and NSWAPX in param.h, fix mount/umount) but
>it doesn't seem to work. I even talked to Mike Karrels in Dallas and he
>indicated that that was all I had to do. The problem we are experiencing
>is that random processes dump core at random times. This can be very
>annoying if the shell core dumps, and it can be disastrous if "init" core
>dumps. The behaviour seems to indicate some kind of swapping error. At
>first I didn't even associate this problem with the changes to the coremap
>structure but in a final act of desperation I backed off the change and
>now the system runs fine. We have been trying to track what we thought
>was a weird swapping error for three months (tues and wed eve.) and have
>now been running smoothly WITHOUT the coremap changes for over two weeks.
>
> ...

Your problem is the "Fastreclaim" code in vax/locore.s. This code is an optimization put into 4.2, and it knows about the cmap structure. If you change anything in the cmap structure w/o rewriting this code, you are bound to get bad paging problems. As it turns out, you can take out the call to Fastreclaim, as it is simply an optimization; in the long run you'll want to rewrite the code for your new cmap structure.

It turns out that this code also knows a few other magic numbers, w/o using the right symbols to reference them (like UPAGES). This second problem can be avoided by figuring out some of the magic numbers in the code and putting in an expression using the right symbols.

We were bit by this same problem here at Sun twice. We changed the cmap structure for use with the nfs (network file system). We had a hard time figuring out why random pages got paged in incorrectly and processes were dying when we were running the nfs kernel, until it was tracked down to Fastreclaim. Later we were playing with changing UPAGES and got bit by Fastreclaim again. Sometimes changing .h files doesn't do everything it really needs to. Hats off to Bill Shannon for finding both of these.

Joe Moran
sun!mojo
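[Editorial illustration, not part of the original post.] The failure mode described above, hand-written code that keeps using an old structure size after a header changes, can be sketched in portable C. Everything here is hypothetical: `struct entry`, `STALE_ENTRY_SIZE`, and the helper functions are invented for the illustration and are not the 4.2BSD cmap or the Fastreclaim code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical table entry; NOT the real 4.2BSD struct cmap. */
struct entry {
    uint32_t id;
    uint32_t flags;
    uint32_t extra;     /* added later: the entry grows from 8 to 12 bytes */
};

/* Stride baked into the hand-written fast path before `extra` existed. */
#define STALE_ENTRY_SIZE 8

/* Mimics assembly that indexes the table with a hardcoded stride. */
static uint32_t stale_lookup_id(const void *table, size_t index)
{
    uint32_t id;

    memcpy(&id, (const unsigned char *)table + index * STALE_ENTRY_SIZE,
           sizeof id);
    return id;
}

/* What compiled C does: the stride follows sizeof(struct entry). */
static uint32_t correct_lookup_id(const struct entry *table, size_t index)
{
    return table[index].id;
}

/* Fill n entries with id = 100 + i, flags = 0, extra = 7. */
static void fill_table(struct entry *table, size_t n)
{
    size_t i;

    assert(table != NULL);
    for (i = 0; i < n; i++) {
        table[i].id = 100 + (uint32_t)i;
        table[i].flags = 0;
        table[i].extra = 7;
    }
}
```

With a four-entry table filled by `fill_table`, `correct_lookup_id(table, 1)` returns 101, while `stale_lookup_id(table, 1)` lands on entry 0's `extra` field and returns 7. Nothing crashes at the point of the error; the wrong data is simply used, which fits the "random" symptoms reported in this thread.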
chris@umcp-cs.UUCP (Chris Torek) (04/04/85)
I strongly urge all 4.2 VARs to do something about this. BRL (and UCB
and others) have modified h/cmap.h to contain #defines for use by
locore.s. This at least centralizes the information. (By the way, I
have a sneaking suspicion that Fastreclaim was done as a quick hack by
the Franz group. Anyone "in the know" care to comment?)
[Excerpts from the BRL version of h/cmap.h:]
/*
 * core map entry
 *
 * Limits imposed by this structure:
 *
 *      limit                           cur. size       fields
 *      Physical memory+                64 Mb           c_next, c_prev, c_hlink
 *      Mounted filesystems             255             c_mdev
 *      size of a process segment       1 Gb            c_page
 *      filesystem size                 2 Gb            c_blkno
 *      proc, text table size           1024            c_ndx
 *
 *      + memory can be expanded by converting first three entries
 *      to bit fields, shrinking c_unused, and increasing MAXMEM below.
 */
#ifndef LOCORE
struct cmap
{
[...]
};
#else LOCORE
/*
* bit offsets of elements in cmap
*/
#define C_INTRANS 66
#define C_FREE 67
#define SZ_CMAP 16 /* sizeof(struct cmap) */
#define MAXMEM 64*1024 /* maximum memory, in Kbytes */
#endif LOCORE
[...]
--
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 4251)
UUCP: {seismo,allegra,brl-bmd}!umcp-cs!chris
CSNet: chris@umcp-cs ARPA: chris@maryland
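[Editorial illustration, not part of the original post.] Centralizing the constants still leaves them maintained by hand, so a small check that compares them against the compiler's actual layout can help. The sketch below is one possible approach, under loud assumptions: `struct demo_cmap`, its field names and widths, and `DEMO_SZ_CMAP` are all invented here, because the real structure body is elided in the excerpt above. Bit-field layout is implementation-defined in C, so the probe measures what the compiler does rather than assuming it. Bits are numbered with bit 0 as the least significant bit of byte 0, the numbering a little-endian machine such as the VAX would use.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a core map entry; NOT the real struct cmap. */
struct demo_cmap {
    unsigned int d_next    : 15;
    unsigned int d_prev    : 15;
    unsigned int d_intrans : 1;
    unsigned int d_free    : 1;
    unsigned int d_mdev    : 8;
    unsigned int d_unused  : 24;
};

/* Hand-maintained size, as a locore-style header would carry it. */
#define DEMO_SZ_CMAP 8

/* Compile-time check: fails the build if the struct size drifts. */
_Static_assert(sizeof(struct demo_cmap) == DEMO_SZ_CMAP,
               "DEMO_SZ_CMAP is out of date");

/* Index of the only set bit in buf, or -1 unless exactly one bit is set. */
static int single_set_bit(const unsigned char *buf, size_t len)
{
    int found = -1;
    size_t i;
    int b;

    assert(buf != NULL);
    for (i = 0; i < len; i++) {
        for (b = 0; b < CHAR_BIT; b++) {
            if (buf[i] & (1u << b)) {
                if (found != -1)
                    return -1;
                found = (int)(i * CHAR_BIT) + b;
            }
        }
    }
    return found;
}

/* Define a function that measures the bit offset of a one-bit field. */
#define DEFINE_BIT_PROBE(fn, field)                     \
    static int fn(void)                                 \
    {                                                   \
        struct demo_cmap c;                             \
        unsigned char raw[sizeof c];                    \
                                                        \
        memset(&c, 0, sizeof c);                        \
        c.field = 1;                                    \
        memcpy(raw, &c, sizeof c);                      \
        return single_set_bit(raw, sizeof raw);         \
    }

DEFINE_BIT_PROBE(probe_intrans, d_intrans)
DEFINE_BIT_PROBE(probe_free, d_free)

/* Nonzero when the hand-written bit offsets match the measured ones. */
static int constants_match(int c_intrans, int c_free)
{
    return c_intrans == probe_intrans() && c_free == probe_free();
}
```

A build step could call `constants_match` with the values from the header and refuse to proceed on a mismatch. The size check costs nothing at run time; the bit-offset probe has to run on (or be compiled for) the target, since another compiler or architecture may lay the fields out differently.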
dhb@rayssd.UUCP (04/29/85)
Thanks to all who replied to my request for a way to mount more than fifteen file systems in 4.2 BSD. There is indeed something besides the size of the fields in the core map that needs to be changed and it CAN cause weird swapping problems or other random behaviour.

In locore.s there is a routine called "Fastreclaim" which "knows" about the core map structure. What it knows is the size of the structure and the location of two one-bit fields within it. Therefore if you make any changes to this structure you have to change the code in locore.s. One suggestion that was made was to make the size of the structure and the offsets to these fields #defines in the cmap.h file. I hope that the people at Berkeley take this suggestion and put it in the next release.

What I did to solve the problem was to increase the size of the "mdev" field in the core map structure and move the field that followed it to the end of the structure. This way the offsets to the two other fields remained constant and I only had to change one line in locore.s.

If you want to know the specific changes that I made, send me mail and I will send you a diff listing. If I get enough requests I will post it to the net.
--
Dave Brierley
Raytheon Co.; Portsmouth RI; (401)-847-8000 x4073
...!decvax!brunix!rayssd!dhb
...!allegra!rayssd!dhb
...!linus!rayssd!dhb
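[Editorial illustration, not part of the original post.] The layout trick described above (widen one field into the space of its neighbour, and move the neighbour to the end) can be sketched with ordinary byte-sized fields so that `offsetof` can verify it. The structures and field names below are hypothetical and are not the actual cmap layout or the actual diff; they only show why the other offsets stay put while the size changes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical "before" layout; names are illustrative only. */
struct old_entry {
    uint8_t  mdev;      /* too narrow */
    uint8_t  tag;       /* the field that follows mdev */
    uint16_t flags;     /* offset hardcoded elsewhere */
    uint32_t blkno;     /* offset hardcoded elsewhere */
};

/* Hypothetical "after" layout. */
struct new_entry {
    uint16_t mdev;      /* widened into the space tag used to occupy */
    uint16_t flags;     /* same offset as before */
    uint32_t blkno;     /* same offset as before */
    uint8_t  tag;       /* moved to the end; this is what grows the struct */
};

/* The offsets that other code depends on must not move. */
_Static_assert(offsetof(struct new_entry, flags) ==
               offsetof(struct old_entry, flags), "flags offset moved");
_Static_assert(offsetof(struct new_entry, blkno) ==
               offsetof(struct old_entry, blkno), "blkno offset moved");

/* Bytes added per entry: the one number hand-written code must pick up. */
static size_t entry_size_growth(void)
{
    assert(sizeof(struct new_entry) >= sizeof(struct old_entry));
    return sizeof(struct new_entry) - sizeof(struct old_entry);
}
```

With this arrangement the only stale constant left in hand-written code is the structure size, which matches the "one line in locore.s" described in the post. The cost is some padding at the end of each entry on most ABIs.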