[net.unix-wizards] MAX FILES PER PROCESS PROBLEM

jjr@ccieng5.UUCP (11/14/83)

One of the development groups here is developing a system which needs
more than 20 files open per process. I believe I have to change the
value of NOFILE in param.h and the value of _NFILE in stdio.h.

Does anyone know if there are any other changes that have to be made? Will
making this change cause any serious problems? Any help anyone can offer
will be appreciated. Thanks.
				Jim Roche
				Computer Consoles Inc.
				seismo!rochester!ritcv!ccieng5!jjr

sdyer@bbncca.ARPA (Steve Dyer) (11/19/83)

What you describe (increasing NOFILE and _NFILE) should work just
fine.  At BBN, we routinely run with NOFILE set to 40 (needed for some
network monitoring programs). You might want to increase NINODE and
NFILE in param.h, too, if many programs are going to take advantage of
this--otherwise, don't bother.
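
Concretely, the edits amount to something like this (a sketch only; the
exact paths and surrounding lines vary from system to system, and you
must rebuild the kernel and recompile programs before they see the new
limits):

    /* sys/h/param.h -- the kernel's per-process open file limit */
    #define	NOFILE	40		/* max open files per process */

    /* /usr/include/stdio.h -- size of stdio's _iob table; keep it
       in step with NOFILE so stdio can use every descriptor the
       kernel will hand out. */
    #define	_NFILE	40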

Steve Dyer
decvax!bbncca!sdyer
sdyer@bbncca

dbj.rice%rand-relay@sri-unix.UUCP (11/20/83)

From:  Dave Johnson <dbj.rice@rand-relay>

Under 4.1bsd, you cannot make NOFILE greater than 31 thanks to the vread
and vwrite system calls.  In the page table entry for a vread page, there
is a 5-bit wide field called pg_fileno which is set to the file descriptor
number that the page is mapped from.  That is the meaning of the comment:

    #define	NOFILE	20		/* max open files per process */
    /* NOFILE MUST NOT BE >= 31; SEE pte.h */

from param.h where NOFILE is defined.
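
The structure in question looks roughly like this (a simplified sketch
after 4.1bsd's pte.h; surrounding fields are elided and the widths shown
are approximate):

    /* Fill-on-demand page table entry, abbreviated.  pg_fileno is
       5 bits wide, so it holds 0..31, and one value is reserved to
       mean "filled from the text image, not from a vread descriptor",
       which is why file descriptors above 30 cannot be recorded. */
    struct fpte {
            unsigned int    pg_blkno:24;    /* block the page maps */
            unsigned int    pg_fileno:5;    /* open file it comes from */
            unsigned int    pg_fod:1;       /* fill-on-demand flag */
            unsigned int    pg_spare:2;     /* (remaining bits elided) */
    };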

                                        Dave Johnson
                                        Dept. of Math Science
                                        Rice University
                                        dbj.rice@CSNet-Relay

msc@qubix.UUCP (Mark Callow) (11/21/83)

From: Steve Dyer, decvax!bbncca!sdyer
	What you describe (increasing NOFILE and _NFILE) should work just
	fine.  At BBN, we routinely run with NOFILE set to 40 (needed for some
	network monitoring programs). You might want to increase NINODE and
	NFILE in param.h, too, if many programs are going to take advantage of
	this--otherwise, don't bother.

You do not say what version of Unix you are running.  However, I believe
the original question was referring to 4.1bsd.  In the document
"Installing and Operating 4.1bsd" it says that NOFILE cannot be made
greater than 31 because of a bitfield used used in some unspecified data
structure.  It says that they expect this limitation to be removed in
future versions of the system.  However the same limitation is mentioned in
the document "Installing and Operating 4.2bsd".
-- 
	Mark Callow, Saratoga, CA.
	...{decvax,ucbvax,ihnp4}!decwrl!
		      ...{ittvax,amd70}!qubix!msc
	decwrl!qubix!msc@Berkeley.ARPA

chris@umcp-cs.UUCP (11/22/83)

Of course, you could go in and perform surgery on the paging & vread code
and rip out all the pg_fod/pg_fileno stuff, and thus get rid of the
31 max files limit.  (As I recall the limit would be 32 except that they
use a special number to indicate that pg_fileno isn't valid and instead
it's a text segment being fod'ed.  Just changing the whole thing to toss
out vread would probably clean much stuff up immensely.  And, how many
people actually *use* vread??)

Chris
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci
UUCP:	{seismo,allegra,brl-bmd}!umcp-cs!chris
CSNet:	chris@umcp-cs		ARPA:	chris.umcp-cs@CSNet-Relay

usenet@abnjh.UUCP (usenet) (11/28/83)

I would definitely look for another solution to your problem before I
would go ahead with modifying the kernel and stdio to increase the max
number of open files allowed.  Twenty should be enough for anybody.
The main reason why I feel this way is that 20 is (much) larger than
seven plus or minus two.  Any program that really needs more than 20
open file descriptors at one time is probably too complicated to be
maintainable anyway, and should be redesigned.

Rick Thomas
ihnp4!abnjh!usenet   or   ihnp4!abnji!rbt

thomas@utah-gr.UUCP (Spencer W. Thomas) (11/29/83)

We ran into this recently when we got a commercial finite element
package.  It's written in Fortran (bad enough), and wants to open 25
"tape" files.  I guess it uses them for temp storage, I dunno.  Anyway,
we didn't really have a choice - we "had" to increase the number of
available files/process.

=Spencer

preece@uicsl.UUCP (11/30/83)

			  Any program that really needs more than 20
	open file descriptors at one time is probably too complicated to be
	maintainable anyway, and should be redesigned.
----------
I've responded to this statement before, but it's worth repeating.
Even if YOUR problems don't naturally require lots of files, you
should not assume you've seen all the problems in the world.  I'll
give you one easy example: a distribution sort of alphabetic records.
Just open one file for each letter and partition the incoming records
into 26 files. That's easy, obvious, and no problem to maintain. It also
requires at least 27 files open at once.  Our group has come across
several situations where we could have used a great number of file
descriptors and where reorganizing to use fewer files at once added
significantly to the complexity of the problem.
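
The distribution pass is only a few lines of C (a hypothetical sketch;
error handling is minimal, and records that don't begin with a letter
are simply dropped):

    #include <stdio.h>
    #include <ctype.h>

    /* Partition stdin into one temporary file per initial letter.
       Together with the input, this holds 27 files open at once. */
    int main(void)
    {
        FILE *bin[26];
        char line[512], name[16];
        int i, c;

        for (i = 0; i < 26; i++) {
            sprintf(name, "bin.%c", 'a' + i);
            if ((bin[i] = fopen(name, "w")) == NULL) {
                perror(name);
                return 1;
            }
        }
        while (fgets(line, sizeof line, stdin) != NULL) {
            c = tolower((unsigned char)line[0]);
            if (c >= 'a' && c <= 'z')
                fputs(line, bin[c - 'a']);
        }
        for (i = 0; i < 26; i++)
            fclose(bin[i]);
        return 0;
    }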

scott preece
ihnp4!uiucdcs!uicsl!preece

cfh@cca.UUCP (Christopher Herot) (12/01/83)

Rick Thomas's suggestion is precisely the kind of haughty arrogance
that gives programmers a bad name.  Such arbitrary restrictions are
often the cause of MORE complicated programs, since the programmer
must invent work-arounds to get the system to work.

As one example of why 20 is not very large, consider a program which
has to drive 3 graphics displays, 3 touch screens, one data tablet,
one digitiser, a sound box, and an ascii terminal.  Now add a file for
error logging, and a few pipes to other processes and you don't have
very many left over for the actual disc files the program may need
to access.  As a result, the programmer may be prevented from using
simple and elegant schemes such as storing one relation from his database
in one Unix file.  Instead he must invent a more complex scheme to
store multiple relations in one file, or write a program to open
and close files in order to conserve the precious descriptors.

kvm@basservax.SUN (12/06/83)

Don't let vread and vwrite get in your way.  Throw them out.
Then rid section 2 of the manual of ridiculous vread/vwrite
caveats like the one in the creat(2) entry.

ron%brl-vgr@sri-unix.UUCP (12/07/83)

From:      Ron Natalie <ron@brl-vgr>

Try most people's Fortran implementations (UNIX and others) and tell me
if you don't find some relatively low ( < infinite) limit on the number
of open files.

-Ron

tihor%nyu@sri-unix.UUCP (12/07/83)

From:  Stephen Tihor <tihor@nyu>

The maximum number of files available under VMS is a per-user authorization
parameter with a default of 20.  We leave ours set to 45 for most real jobs
since it just costs some P1 space when the data structures are allocated.

preece@uicsl.UUCP (12/08/83)

[The following response to my distribution sort example was received
by mail]
	But your example helps make the point that many-file algorithms almost
	certainly need to be replaced with better approaches.  As soon as your
	client for the "distribution sort of alphabetic records" notices that
	the records filed under each letter are not sorted, he is likely to
	ask for that obvious improvement.  Then what do you do, insist on
	using 26^N (where N ~ 10) open files?  That is a disgusting sorting
	scheme!
----------
The respondent has apparently not read volume 3 of Knuth.  After applying the
distribution, the individual sets can be sorted by whatever means and
concatenated (not merged), since they are ordered (that is, each item in set
s is greater than each item in set s-1 and less than each item in set s+1).
The idea of gaining sort speed by subdividing is well known; Quicksort is
an example of the method.  Distribution sorting is convenient when the key
lends itself to distribution. It is also possible to sort by distribution
entirely by distributing on the least significant positions first,
concatenating, and moving on to the next most significant, etc.  This may
be familiar to you if you've ever sorted on a card sorting machine (most of
the net is probably too young to have had that particular experience).
With variable-length keys it may be convenient to distribute first by
length, so that short keys are skipped until we get to the key positions
they include (my particular application is sorting text words for file
inversion).
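
For fixed-length keys the least-significant-position-first scheme is
easy to write down (an in-core sketch for illustration; the card-sorter
version would put each pile in its own file, which is exactly where the
appetite for open descriptors comes from):

    #include <string.h>

    #define KEYLEN  4       /* keys: exactly KEYLEN lowercase letters */
    #define MAXREC  1000

    /* Stable LSD distribution sort: distribute on the last key
       position, concatenate the 26 piles in order, then repeat on
       each position to the left. */
    void lsd_sort(char rec[][KEYLEN + 1], int n)
    {
        static char pile[26][MAXREC][KEYLEN + 1];
        int count[26], pos, i, j, k;

        for (pos = KEYLEN - 1; pos >= 0; pos--) {
            memset(count, 0, sizeof count);
            for (i = 0; i < n; i++) {           /* distribute */
                j = rec[i][pos] - 'a';
                strcpy(pile[j][count[j]++], rec[i]);
            }
            for (k = 0, j = 0; j < 26; j++)     /* concatenate */
                for (i = 0; i < count[j]; i++)
                    strcpy(rec[k++], pile[j][i]);
        }
    }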

At any rate, distribution sorting is well known and effective; the respondent
is apparently ignorant as well as abrasive.

scott preece
ihnp4!uiucdcs!uicsl!preece

gwyn%brl-vld@sri-unix.UUCP (12/11/83)

From:      Doug Gwyn (VLD/VMB) <gwyn@brl-vld>

The respondent knows perfectly well about distribution (and other)
sorting schemes.  Funny how you missed the point.

ron%brl-vgr@sri-unix.UUCP (12/16/83)

From:      Ron Natalie <ron@brl-vgr>

Actually, the classic sorter, an IBM 82, had 13 bins: one for each card
row plus a reject bin.  It also had two knobs and some buttons.  If this
were an IBM and we were using EBCDIC, we could actually simulate the kind
of sort they were doing.

-Ron Natalie
(Born in 1959, not too late to use punch cards without the benefit
of a computer.)

edhall%rand-unix@sri-unix.UUCP (12/16/83)

Yes, the classic IBM sorter I used had 13 bins, as you described.

I stand corrected; my memory must be deteriorating with age.  :-)

I think the issue of limitations to various UNIX features, as
distinguished from the *lack* of features, is an interesting one.
I find UNIX to be relatively free of unreasonable restrictions
(at least the Berkeley virtual-memory versions).

		-Ed

dm%bbn-unix@sri-unix.UUCP (12/16/83)

From:  Dave Mankins <dm@bbn-unix>


Sorting aside, there are legitimate uses for allowing more than
20 open files to a process.  Our original implementation of TCP
(on an 11/70 running a modified version 6 UNIX many years ago)
was written as a user-mode program which was the only process
allowed to talk to the network device (a slight
oversimplification).  User programs (e.g., telnet, ftp, the
mailer) would then open up RAND ports (which are similar to
"named pipes") to talk with the TCP program, which, in turn
controlled communications with the network.  Limiting a process
to 20 open files means you can have only about 8 processes using
the network (network connections each take two files, one in each
direction, and it's nice to have a logging output file as well).

TCP is a large, complicated protocol, and is somewhat difficult
to fit into an 11's kernel space without coming up with an
overlay scheme (although I understand that someone at MIT has
succeeded in doing so).  It would also be nice if the kernel didn't
take up the entire memory of a small 11, and putting TCP in user
mode meant it could be swapped once in a while.

This was before there were VAXes, of course.

lwa%mit-csr@sri-unix.UUCP (12/16/83)

Well, actually our TCP implementation here at MIT mostly runs
in user processes.  The kernel contains those pieces of the
protocols necessary for demultiplexing incoming packets out
to user processes; each user process contains the rest of
the protocol layers (for example, Internet, TCP, and user Telnet).
                                                  -Larry Allen

edhall%rand-unix@sri-unix.UUCP (12/19/83)

>                                                         ...  This may
> be familiar to you if you've ever sorted on a card sorting machine (most of
> the net is probably too young to have had that particular experience).

Well, sonny, it sure is nice to know that most people on the net are
younger than my ripe old age of 29.  (:-))  However, I seem to remember
the card-sorting machine I used having only 10 stacks, not 26.  A limit
of twenty open files is generous by those standards!  (And, yes, there
were techniques for doing alphabetic sorts on this card sorter.  This
required two passes per column, though.)

Here in the real world we constantly have to make compromises based on
physical constraints.  Perhaps the `real world' of computing is so
flexible that *any* constraint seems fair game for removal.  But I
hardly consider the 20- (or 32-) file limit to be holding back progress
in the same manner as, say, a 16-bit address.

Have no fear, however!  Note that the 4.2 features which use bit
strings (such as select()) use a *pointer* to integer, leaving open
the possibility of an arbitrary number of bits for such things, and
not a maximum of the magic `32'.
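
For the record, the 4.2 calling sequence looks something like this (a
sketch of the int-mask form described in the manual; later systems dress
the masks up in an fd_set type):

    #include <sys/time.h>

    /* Wait up to five seconds for fd to become readable.  The masks
       are passed by pointer, so an array of ints could in principle
       carry more than 32 bits' worth of descriptors. */
    int wait_readable(int fd)
    {
        int readmask = 1 << fd;
        struct timeval tv;

        tv.tv_sec = 5;
        tv.tv_usec = 0;
        return select(fd + 1, &readmask, (int *)0, (int *)0, &tv);
    }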

		-Ed Hall
		edhall@rand-unix        (ARPA)
		decvax!randvax!edhall   (UUCP)