[comp.unix.wizards] unix question: files per directory

dxxb@beta.lanl.gov (David W. Barts) (04/11/89)

How many files can there be in a single UNIX directory
(I realize this may depend on the variety of UNIX; I expect
the Berkeley fast file system would allow more)?  I need
a better answer than "a lot" or "at least 2000", if possible.

(This concerns an application program we are currently running
on an Apollo under Aegis; it depends on a LOT of files being
in a single directory and Aegis's limit of 1500 or so can
be a pain.)

I realize that as directories get bigger, they slow down, but
how much?  Just what IS the maximum directory size?

			Thanks in advance,
David W. Barts  N5JRN, Ph. 509-376-1718 (FTS 444-1718), dxxb@beta.lanl.GOV
BCS Richland Inc.                  |       603 1/2 Guernsey St.
P.O. Box 300, M/S A2-90            |       Prosser, WA  99350
Richland, WA  99352                |       Ph. 509-786-1024

grr@cbmvax.UUCP (George Robbins) (04/11/89)

In article <24110@beta.lanl.gov> dxxb@beta.lanl.gov (David W. Barts) writes:
> 
> How many files can there be in a single UNIX directory
> (I realize this may depend on the variety of UNIX; I expect
> the Berkeley fast file system would allow more)?  I need
> a better answer than "a lot" or "at least 2000", if possible.

At least 33,000  8-) 

I recently played with an archive of comp.sys.amiga from day 1 and
it was on this order.  
 
> I realize that as directories get bigger, they slow down, but
> how much?  Just what IS the maximum directory size?

Yeah, it gets real slow and turns the whole system into a dog while
you are accessing the directories.  Still, the time is finite; the
whole restore took maybe 16 hours (I had other stuff going on).

The tape went from almost continual motion to twitching several times
a minute...

I seem to recall that the Mach people at CMU were dabbling with some
kind of hashed directories or an auxiliary hashing scheme, which would
make this a lot quicker.

I don't know if there is a theoretical maximum, except that the
directory must be smaller than the maximum possible file size.  I am
curious, though, about what constitutes an efficient limit: if I build
a directory tree with n entries at each level, what is a reasonable
tradeoff between tree depth and search time?

This was with Ultrix/BSD; I don't know what limits might pertain to
Sys V and other variants.

-- 
George Robbins - now working for,	uucp: {uunet|pyramid|rutgers}!cbmvax!grr
but no way officially representing	arpa: cbmvax!grr@uunet.uu.net
Commodore, Engineering Department	fone: 215-431-9255 (only by moonlite)

chris@mimsy.UUCP (Chris Torek) (04/11/89)

In article <24110@beta.lanl.gov> dxxb@beta.lanl.gov (David W. Barts) writes:
>How many files can there be in a single UNIX directory ....
>I realize that as directories get bigger, they slow down, but
>how much?  Just what IS the maximum directory size?

The maximum size is the same as for files, namely 2^31 - 1 (2147483647)
bytes.  (This is due to the use of a signed 32 bit integer for off_t.
The limit is larger in some Unixes [Cray], but is usually smaller due
to disk space limits.)  Directory performance falls off somewhat at
single indirect blocks, moreso at double indirects, and still more at
triple indirects.  It takes about 96 kbytes to go to single indirects
in a 4BSD 8K/1K file system.

Each directory entry requires a minimum of 12 bytes (4BSD) or exactly
16 bytes (SysV); 16 is a nice `typical' size, so divide 96*1024 by 16 to
get 6144 entries before indirecting on a BSD 8K/1K file system.

The actual slowdown due to indirect blocks is not clear; you will have
to measure that yourself.
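
As a back-of-the-envelope check, here is that arithmetic as a tiny C
program (a sketch; the 12 direct blocks for 4BSD and 10 for a stock
System V file system are assumed defaults, not taken from the post):

#include <stdio.h>

/*
 * Sketch: how many `typical' 16-byte directory entries fit in an
 * inode's direct blocks before the first indirect block is needed.
 */
static long
entries_before_indirect(long ndirect, long blocksize, long entsize)
{
	return (ndirect * blocksize) / entsize;
}

int
main(void)
{
	printf("4BSD 8K/1K:  %ld entries\n",
	    entries_before_indirect(12, 8192, 16));	/* 6144 */
	printf("System V 1K: %ld entries\n",
	    entries_before_indirect(10, 1024, 16));	/* 640 */
	return 0;
}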
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris

rikki@macom1.UUCP (R. L. Welsh) (04/11/89)

From article <24110@beta.lanl.gov>, by dxxb@beta.lanl.gov (David W. Barts):
> 
> How many files can there be in a single UNIX directory
...

You will undoubtedly run out of inodes before you reach any theoretical
limit.  Every new file you create will use up one inode.  If you are
seriously contemplating having a huge number of files (be they in one
directory or many), you may have to remake a filesystem to have enough
inodes -- see mkfs(1M), in particular the argument blocks:inodes.  The
optional ":inodes" part is often left off and the defaults taken.  My
manual (old ATT Sys V) says that the maximum number of inodes is
65500.

Also (on Sys V) do "df -t" to check how many inodes your filesystem
currently accommodates.
-- 
	- Rikki	(UUCP: grebyn!macom1!rikki)

dxxb@beta.lanl.gov (David W. Barts) (04/11/89)

Thanks to everyone who responded to my question.  As several 
responses have pointed out, the only limit is imposed by file
size; however, things get painfully slow well before the directory
size reaches the maximum file size.

David W. Barts  N5JRN, Ph. 509-376-1718 (FTS 444-1718), dxxb@beta.lanl.GOV
BCS Richland Inc.                  |       603 1/2 Guernsey St.
P.O. Box 300, M/S A2-90            |       Prosser, WA  99350
Richland, WA  99352                |       Ph. 509-786-1024

lm@snafu.Sun.COM (Larry McVoy) (04/12/89)

>In article <24110@beta.lanl.gov> dxxb@beta.lanl.gov (David W. Barts) writes:
>>How many files can there be in a single UNIX directory ....
>>I realize that as directories get bigger, they slow down, but
>>how much?  Just what IS the maximum directory size?

If you are on a POSIX system, try this

#include <unistd.h>

long					/* pathconf() returns a long, not an int */
dirsentries(dirpath)
	char *dirpath;
{
	return pathconf(dirpath, _PC_LINK_MAX);
}

Unfortunately, on systems that allow entries up to the file size, pathconf
will almost certainly return -1 (indicating "infinity").  But machines with
a hard limit should give you that limit.
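
A caller can tell ``no limit'' from a real error by checking errno,
something like this (a sketch, not part of the code above; POSIX says
pathconf() returns -1 with errno unchanged when there is no fixed
limit, and -1 with errno set when the call itself fails):

#include <errno.h>
#include <stdio.h>
#include <unistd.h>

int
main(int argc, char **argv)
{
	const char *dir = (argc > 1) ? argv[1] : ".";
	long lim;

	errno = 0;
	lim = pathconf(dir, _PC_LINK_MAX);
	if (lim == -1 && errno != 0)
		perror("pathconf");		/* the call itself failed */
	else if (lim == -1)
		printf("%s: no fixed limit\n", dir);
	else
		printf("%s: limit is %ld\n", dir, lim);
	return 0;
}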

Larry McVoy, Lachman Associates.			...!sun!lm or lm@sun.com

kremer@cs.odu.edu (Lloyd Kremer) (04/12/89)

In article <24110@beta.lanl.gov> dxxb@beta.lanl.gov (David W. Barts) writes:
>How many files can there be in a single UNIX directory

In article <16839@mimsy.UUCP> chris@mimsy.UUCP (Chris Torek) writes:
>The maximum size is the same as for files, namely 2^31 - 1 (2147483647)
>bytes.  (This is due to the use of a signed 32 bit integer for off_t.


Point of curiosity:

Why was it decided that off_t should be signed?  Why should it not be
unsigned long where unsigned longs are supported, or unsigned int where int
is a 32 bit quantity?  It seems that signed long imposes an unnecessary
2GB limit on file size.

There are many devices having a capacity greater than 4 or 5 GB.  It seems
reasonable that one might want a file greater than 2GB on such a device,
such as the product of something akin to 'tar -cf' of a whole filesystem.

And it doesn't make sense to have a negative offset into a file.  The
only exception that comes to mind is that of returning an error code from
a function like lseek(), and this special case could be macro'd like

	#define SEEK_ERR ((off_t)(-1))

in <sys/types.h> or <sys/stat.h>.

					Just curious,

					Lloyd Kremer
					Brooks Financial Systems
					{uunet,sun,...}!xanth!brooks!lloyd

m5@lynx.uucp (Mike McNally) (04/12/89)

In article <8420@xanth.cs.odu.edu> kremer@cs.odu.edu (Lloyd Kremer) writes:
>Why was it decided that off_t should be signed?  . . .
>And it doesn't make sense to have a negative offset into a file . . .

Except when performing relative seeks:

    newpos = lseek(fd, (off_t) distance, L_INCR);

I suppose one could create a new type for this purpose, and iterate to
traverse distances greater than 2GB.

Of course, an implementation with 64-bit longs would probably stave off
these complaints for quite some time.

-- 
Mike McNally                                    Lynx Real-Time Systems
uucp: {voder,athsys}!lynx!m5                    phone: 408 370 2233

            Where equal mind and contest equal, go.

peter@ficc.uu.net (Peter da Silva) (04/12/89)

In article <8420@xanth.cs.odu.edu>, kremer@cs.odu.edu (Lloyd Kremer) writes:
> And it doesn't make sense to have a negative offset into a file.  The
> only exception that comes to mind is that of returning an error code from
> a function like lseek()...

Speaking of lseek, how about this case:

	lseek(fd, -1024L, 2);	/* Seek to last 1K of file */

It would negatively impact the performance of tail(1) to not have this
ability.
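
For the record, the idiom looks roughly like this (a sketch of the
technique, not tail's actual source; SEEK_END and SEEK_SET are the
POSIX names for the old constants 2 and 0):

#include <sys/types.h>
#include <unistd.h>

/*
 * Sketch: copy the last 1K of an open file to standard output by
 * seeking backwards from the end -- the negative-offset case that
 * needs a signed off_t.
 */
void
print_last_1k(int fd)
{
	char	buf[1024];
	ssize_t	n;

	if (lseek(fd, (off_t)-1024, SEEK_END) == (off_t)-1)
		(void) lseek(fd, (off_t)0, SEEK_SET);	/* file shorter than 1K */
	while ((n = read(fd, buf, sizeof buf)) > 0)
		(void) write(1, buf, (size_t)n);
}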
-- 
Peter da Silva, Xenix Support, Ferranti International Controls Corporation.

Business: uunet.uu.net!ficc!peter, peter@ficc.uu.net, +1 713 274 5180.
Personal: ...!texbell!sugar!peter, peter@sugar.hackercorp.com.

guy@auspex.auspex.com (Guy Harris) (04/13/89)

>Why was it decided that off_t should be signed?  Why should it not be
>unsigned long where unsigned longs are supported, or unsigned int where int
>is a 32 bit quantity?  It seems that signed long imposes an unnecessary
>2GB limit on file size.

And an unsigned long on a machine where that's 32 bits long imposes a
4GB limit on file size.  If 2GB is a limitation, I suspect 4GB will be
one shortly.... 

>And it doesn't make sense to have a negative offset into a file.  The
>only exception that comes to mind is that of returning an error code from
>a function like lseek(),

"lseek" also provides another exception.  Its second argument is an
"off_t", and since it can use that argument either as an absolute offset
in the file (i.e., relative to the beginning of the file) *or* as a
possibly-negative offset relative to the current position in the file or
to the end of the file, it has to be signed. 

Making "off_t" unsigned is a stopgap measure with its own problems.  If
(when) offsets > 2GB are needed, the fix will probably be to go to
64-bit offsets of some sort.

rml@hpfcdc.HP.COM (Bob Lenk) (04/13/89)

>>How many files can there be in a single UNIX directory ....
>	return pathconf(dirpath, _PC_LINK_MAX);

This will tell you how many links you can make *to* dirpath, not how
many links you can make *in* dirpath.  There might be a tangential
relationship to the number of subdirectories you can make in dirpath,
but don't even count on that.

		Bob Lenk
		hplabs!hpfcla!rml
		rml%hpfcla@hplabs.hp.com

andrew@alice.UUCP (Andrew Hume) (04/13/89)

in the fifth edition, directories that could no longer fit in the directly
mapped blocks caused unix to crash.

nowadays, the only reason not to have huge directories is that they
make a lot of programs REAL slow; it takes time to scan all those dirents.

les@chinet.chi.il.us (Leslie Mikesell) (04/13/89)

In article <3823@ficc.uu.net> peter@ficc.uu.net (Peter da Silva) writes:

>Speaking of lseek, how about this case:

>	lseek(fd, -1024L, 2);	/* Seek to last 1K of file */
>
>It would negatively impact the performance of tail(1) to not have this
>ability.

There are still a few unused values available for the third argument
to lseek().  How about adding:

  3 = the pointer is set to its current location minus offset
  4 = the pointer is set to the size of the file minus offset
?

Les Mikesell

rec@dg.dg.com (Robert Cousins) (04/14/89)

In article <9195@alice.UUCP> andrew@alice.UUCP (Andrew Hume) writes:
>
>
>in the fifth edition, directories that could no longer fit in the directly
>mapped blocks caused unix to crash.
>
>nowadays, the only reason not have huge directories is that they
>make a lot of programs REAL slow; it takes time to scan all those dirents.

There is a more practical limit on directory sizes in the System V file
system: there can only be 64K inodes per file system.  As I recall (and
it has been a while since I actually looked at it), the directory entry
was something like this:

	struct dirent {
		unsigned short inode;	/* or some special 16 bit type */
		char filename[14];
	};

which yields a 16-byte entry.  Since there is a maximum number of links
to a file (2^10, or 1024?), the absolute maximum would be:

	64K * 1024 * 16 = 2^16 * 2^10 * 2^4 = 2^30 = 1 gigabyte

This brings up one of the major physical limitations of the System V
file system: if you can have 2^24 blocks but only 2^16 discrete files,
then to harness the entire file system space each file will (on
average) have to be 2^8 blocks long, or 128K with 512-byte blocks.
Since we know that about 85% of all files on most unix systems are
less than 8K and about half are under 1K, I personally feel that the
16-bit inode number is a severe handicap.
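
Spelling that bound out in code (a sketch of the arithmetic above; the
struct name, the 16-bit inode field, and the 1024-link ceiling are the
assumptions stated in the text, not taken from any particular manual):

#include <stdio.h>

/* Old-style System V directory slot: a 2-byte inode number plus a
 * 14-character name, 16 bytes in all.  Declared here only so that
 * sizeof() confirms the entry size used in the arithmetic. */
struct sysv_dirent {
	unsigned short	d_ino;
	char		d_name[14];
};

int
main(void)
{
	unsigned long inodes = 1UL << 16;	/* 64K inodes per file system */
	unsigned long links  = 1UL << 10;	/* assumed 1024-link ceiling */
	unsigned long entry  = sizeof(struct sysv_dirent);	/* 16 bytes */

	/* 2^16 * 2^10 * 2^4 = 2^30 bytes = 1 gigabyte */
	printf("absolute maximum directory size: %lu bytes\n",
	    inodes * links * entry);
	return 0;
}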

Comments?

Robert Cousins

Speaking for myself alone.

dg@lakart.UUCP (David Goodenough) (04/14/89)

From article <8420@xanth.cs.odu.edu>, by kremer@cs.odu.edu (Lloyd Kremer):
> In article <16839@mimsy.UUCP> chris@mimsy.UUCP (Chris Torek) writes:
>>The maximum size is the same as for files, namely 2^31 - 1 (2147483647)
>>bytes.  (This is due to the use of a signed 32 bit integer for off_t.
> 
> Why was it decided that off_t should be signed?

I'd imagine an unsigned off_t would make seeking backwards along a file a
bit messy.

Comments Chris??
-- 
	dg@lakart.UUCP - David Goodenough		+---+
						IHS	| +-+-+
	....... !harvard!xait!lakart!dg			+-+-+ |
AKA:	dg%lakart.uucp@xait.xerox.com		  	  +---+

andrew@frip.wv.tek.com (Andrew Klossner) (04/15/89)

Larry McVoy writes:

>> How many files can there be in a single UNIX directory ....

> If you are on a POSIX system, try this
> #include <unistd.h>
> dirsentries(dirpath)
> 	char *dirpath;
> {
> 	return pathconf(dirpath, _PC_LINK_MAX);
> }

This will tell you how many directories a directory can contain, not
how many files.  Adding a file to a directory does not increment its
link count.

  -=- Andrew Klossner   (uunet!tektronix!orca!frip!andrew)      [UUCP]
                        (andrew%frip.wv.tek.com@relay.cs.net)   [ARPA]

lm@snafu.Sun.COM (Larry McVoy) (04/16/89)

In article <5980047@hpfcdc.HP.COM> rml@hpfcdc.HP.COM (Bob Lenk) writes:
>>>How many files can there be in a single UNIX directory ....
>>	return pathconf(dirpath, _PC_LINK_MAX);
>
>This will tell you how many links you can make *to* dirpath, not how
>many links you can make *in* dirpath.  There might be a tangential
>relationship to the number of subdirectories you can make in dirpath,
>but don't even count on that.

This simply isn't true.  The pathconf LINK_MAX has different meanings
depending on the path.  If you ask about anything but a directory you
get the number of links you may make to the file (MAXLINKS or whatever 
your kernel calls it).  If you ask about a directory the answer must be
the number of entries allowed in the specified directory, *not* the 
number of links allowed to the directory.  This information was obtained
(by me) from the person responsible for that page of POSIX and he has 
assured me that this is the intent.  POSIX kernel people take note.

Larry McVoy, Lachman Associates.			...!sun!lm or lm@sun.com

allbery@ncoast.ORG (Brandon S. Allbery) (04/18/89)

As quoted from <6576@cbmvax.UUCP> by grr@cbmvax.UUCP (George Robbins):
+---------------
| In article <24110@beta.lanl.gov> dxxb@beta.lanl.gov (David W. Barts) writes:
| > How many files can there be in a single UNIX directory
| > (I realize this may depend on the variety of UNIX; I expect
| > the Berkeley fast file system would allow more)?  I need
| > a better answer than "a lot" or "at least 2000", if possible.
| 
| At least 33,000  8-) 
| 
| I recently played with an archive of comp.sys.amiga from day 1 and
| it was on this order.  
+---------------

System V has no limit, aside from maximum file size (as modified by ulimit,
presumably).  As a PRACTICAL limit, when your directory goes triple-indirect,
it is too slow to search in a reasonable amount of time.  Assuming the
standard 2K block size of SVR3, this is (uhh, let's see... 2048 bytes/block
/ 16 bytes/dirent = 128 dirents/block; times 10 direct blocks is 1,280
dirents direct; add the single-indirect block = 128 * 512 pointers/block
[2048 / 4 bytes/pointer] = 65,536 entries single-indirect; multiply that by
512 for the double-indirect block and add it all up) 33,621,248 directory
entries before you go triple-indirect.  (I personally think that even going
single-indirect gets too slow; 1,280 directory entries is more than I ever
wish to see in a single directory!  But even limiting yourself to
single-indirect blocks, you get 66,816 directory entries.)  (I included the
math deliberately; that number looks way too large to me, even though I
worked it twice.  Maybe someone else in this newsgroup can double-check.
Of course, I'm no Obnoxious Math Grad Student ;-)
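
For anyone who wants to check it, here is the same arithmetic as a
small C program (a sketch; the 2K block, 16-byte entry, ten direct
blocks, and 4-byte block pointers are the assumptions stated above):

#include <stdio.h>

int
main(void)
{
	long blk    = 2048;		/* assumed SVR3 block size */
	long dirent = 16;		/* bytes per directory entry */
	long perblk = blk / dirent;	/* 128 entries per block */
	long ptrs   = blk / 4;		/* 512 block pointers per indirect block */

	long direct = 10 * perblk;		/* 1,280 */
	long single = ptrs * perblk;		/* 65,536 */
	long dbl    = ptrs * ptrs * perblk;	/* 33,554,432 */

	printf("direct blocks only:   %ld entries\n", direct);
	printf("+ single indirect:    %ld entries\n", direct + single);
	printf("+ double indirect:    %ld entries\n", direct + single + dbl);
	return 0;
}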

The Berkeley FFS is still based on direct and indirect blocks (it's how
they're arranged on the disk that speeds things up); however, directory
entries are not fixed in size in the standard FFS.  (I have seen FFS with
System V directory entries; the two aren't necessarily linked.  But they
usually are, as flexnames are nicer than a 14-character maximum.)  You can't
simply calculate a number; you must figure the lengths of filenames -- and
the order of deletions and additions combined with file name lengths can
throw in jokers, at least on systems without directory compaction.

I have no doubt that if I screwed up somewhere, we'll both hear about it. ;-)

++Brandon
-- 
Brandon S. Allbery, moderator of comp.sources.misc	     allbery@ncoast.org
uunet!hal.cwru.edu!ncoast!allbery		    ncoast!allbery@hal.cwru.edu
      Send comp.sources.misc submissions to comp-sources-misc@<backbone>
NCoast Public Access UN*X - (216) 781-6201, 300/1200/2400 baud, login: makeuser

allbery@ncoast.ORG (Brandon S. Allbery) (04/18/89)

As quoted from <13577@ncoast.ORG> by allbery@ncoast.ORG (Brandon S. Allbery):
+---------------
| System V has no limit, aside from maximum file size (as modified by ulimit,
| presumably).  As a PRACTICAL limit, when your directory goes triple-indirect,
>--------------------------------------------------------------^^^^^^  HAH!
| it is too slow to search in a reasonable amount of time.  Assuming the
| standard 2K block size of SVR3, this is (uhh, let's see... 2048 bytes/block
| / 16 bytes/dirent = 128 dirent/block; times 10 is 1280 dirent direct, add
| single-indirect = 128 * 512 pointers/block [2048 / 4 bytes/pointer] = 65,536
| entries single-direct; multiply that by 512 to get double-indirect)
| 33,621,248 directory entries before you go triple-indirect.
+---------------

I completely forgot about the inode limit.  System V limits you to 65,535
inodes per file system; as a result, your directory will never go
double-indirect.  However, even single-indirect is noticeably slower than
direct blocks.

++Brandon
-- 
Brandon S. Allbery, moderator of comp.sources.misc	     allbery@ncoast.org
uunet!hal.cwru.edu!ncoast!allbery		    ncoast!allbery@hal.cwru.edu
      Send comp.sources.misc submissions to comp-sources-misc@<backbone>
NCoast Public Access UN*X - (216) 781-6201, 300/1200/2400 baud, login: makeuser

root@helios.toronto.edu (Operator) (04/19/89)

In article <4822@macom1.UUCP> rikki@macom1.UUCP (R. L. Welsh) writes:
>From article <24110@beta.lanl.gov>, by dxxb@beta.lanl.gov (David W. Barts):
>> 
>> How many files can there be in a single UNIX directory
>
>You will undoubtedly run out of inodes before you reach any theoretical
>limit.  

Another thing you may run into is that some UNIX utilities seem to store
the names of all of the files somewhere before they do anything with them,
and if there are a lot of files in the directory, you won't be able to
run the utility on all of them at once. (This won't prevent you from creating
them, though). In particular I am thinking of "rm". When cleaning up after
installing the NAG library, I tried to "rm *" in the source code directory.
It refused (I think the error was "too many files"). I had to go through and 
"rm a*", "rm b*" etc. until it was down to a level that rm would accept. I 
found this surprising.  In at least the case of wildcard matching, why
wouldn't it just read each name from the directory file in sequence,
compare it for a match, and delete it if it matched?  Having to buffer
*all* the names builds in an inherent limit such as the one I ran into,
unless one uses a linked list or some such.

Does anyone know:
     1. why "rm" does it this way, and
     2. are there other utilities similarly affected?

I don't know exactly how many files were in the directory, but it was many
hundreds.
-- 
 Ruth Milner          UUCP - {uunet,pyramid}!utai!helios.physics!sysruth
 Systems Manager      BITNET - sysruth@utorphys
 U. of Toronto        INTERNET - sysruth@helios.physics.utoronto.ca
  Physics/Astronomy/CITA Computing Consortium

dansmith@well.UUCP (Dan "Bucko" Smith) (04/19/89)

>As quoted from <6576@cbmvax.UUCP> by grr@cbmvax.UUCP (George Robbins):
>+---------------
>| In article <24110@beta.lanl.gov> dxxb@beta.lanl.gov (David W. Barts) writes:
>| > How many files can there be in a single UNIX directory
>| > (I realize this may depend on the variety of UNIX; I expect
>| > the Berkeley fast file system would allow more)?  I need
>| > a better answer than "a lot" or "at least 2000", if possible.
>| 
>| At least 33,000  8-) 
[etc...]

	One thing to keep in mind that I haven't seen brought up yet...

	You don't want so many files in a directory that a shell
script would break on them.  You can really get stung if some shell
script that goes along with your application breaks the first time it
tries to set a variable equal to '*' and you have more than 10k or so
worth of filenames.  Yep, there are workarounds (xargs, patch the shell
to allow very large (64k) variables), but if it were me, I'd really
look for a way of breaking things up into subdirectories.  Can you
imagine a day's worth of Usenet articles in one directory?  It boggles
the mind :-)

				dan
-- 
                         Dan "Bucko" Smith
     well!dansmith  unicom!daniel@pacbell.com  daniel@island.uu.net
ph: (415) 332 3278 (h), 258 2176 (w) disclaimer: Island's coffee was laced :-)
My mind likes Cyberstuff, my eyes films, my hands guitar, my feet skiing...

les@chinet.chi.il.us (Leslie Mikesell) (04/20/89)

In article <776@helios.toronto.edu> sysruth@helios.physics.utoronto.ca (Ruth Milner) writes:

[ rm * fails with large number of files..]

>Does anyone know:
>     1. why "rm" does it this way, and
>     2. are there other utilities similarly affected?

Actually the shell expands the * and can't pass the resulting list to
rm because there is a fixed limit on the size of a command's argument
list.  All programs are affected in the same way, except where you
quote the wildcard to prevent shell expansion (find -name '*' is the
common case; its -exec operator can be used to operate on each file,
or if you have xargs you can pipe find's -print output into xargs).

However, if your version of unix doesn't automatically compact
directories (SysV doesn't), you should rm -r the whole directory and
recreate it, or the empty entries will continue to waste space.

Les Mikesell

rwhite@nusdhub.UUCP (Robert C. White Jr.) (04/21/89)

> In article <776@helios.toronto.edu> sysruth@helios.physics.utoronto.ca (Ruth Milner) writes:
>>When cleaning up after
>>installing the NAG library, I tried to "rm *" in the source code directory.
>>It refused (I think the error was "too many files").

The shell can't make an argument list that long... do the following:

ls | xargs rm

The ls will produce a list of file names on standard output, and xargs
will repeatedly invoke its arguments as a command with as many
additional arguments as will fit, taking those additional arguments
from its standard input...

Voila!  rm of a long directory.

weaver@prls.UUCP (Michael Weaver) (04/22/89)

Note that although Aegis 9 and below had strict limits on the number 
of directory entries, Aegis 10, the latest version, is supposed to 
allow any number of files, as long as you've got the disk space.
'Just like real Unix' (almost, no inodes).


-- 
Michael Gordon Weaver                     Phone: (408) 991-3450
Signetics/Philips Components              Usenet: ...!mips!prls!weaver
811 East Arques Avenue, Bin 75
Sunnyvale CA 94086 USA

guy@auspex.auspex.com (Guy Harris) (04/23/89)

>Yep, there are workarounds (xargs, patch the shell to allow very large (64k)
>variables),

It's not always just the shell; it may be the kernel (or whatever
implements "exec").  The Bourne shell has no wired-in limit; most
flavors of UNIX have a limit between 5,120 characters and 20,480
characters.  (SunOS has a limit of 1MB; while the Bourne shell will
cheerfully let you use all of it, the C shell has its own limitations,
alas.)
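
On a POSIX system the kernel's figure can be queried directly (a
sketch; _SC_ARG_MAX is the POSIX name for the exec argument-space
limit):

#include <stdio.h>
#include <unistd.h>

int
main(void)
{
	long argmax = sysconf(_SC_ARG_MAX);	/* bytes of args + environment exec() allows */

	if (argmax == -1)
		printf("no fixed limit (or sysconf cannot say)\n");
	else
		printf("exec argument space: %ld bytes\n", argmax);
	return 0;
}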

>but if it were me, I'd really look for a way of breaking
>things up in subdirectories.

This is a good idea, since directory searching gets slower the more
entries there are in a directory.

news@brian386.UUCP (Wm. Brian McCane) (04/27/89)

In article <776@helios.toronto.edu> sysruth@helios.physics.utoronto.ca (Ruth Milner) writes:
>In article <4822@macom1.UUCP> rikki@macom1.UUCP (R. L. Welsh) writes:
=>From article <24110@beta.lanl.gov>, by dxxb@beta.lanl.gov (David W. Barts):
==> 
==> How many files can there be in a single UNIX directory
=>
=>You will undoubtedly run out of inodes before you reach any theoretical
=>limit.  
>
>Another thing you may run into is that some UNIX utilities seem to store
>the names of all of the files somewhere before they do anything with them,
>and if there are a lot of files in the directory, you won't be able to
>run the utility on all of them at once. (This won't prevent you from creating
>them, though). In particular I am thinking of "rm". When cleaning up after
>installing the NAG library, I tried to "rm *" in the source code directory.
>It refused (I think the error was "too many files"). I had to go through and 
>"rm a*", "rm b*" etc. until it was down to a level that rm would accept. I 
>
>Does anyone know:
>     1. why "rm" does it this way, and
>     2. are there other utilities similarly affected?
>
> Ruth Milner          UUCP - {uunet,pyramid}!utai!helios.physics!sysruth


You didn't actually run into an "rm" bug/feature, you hit a shell
FEECHER.  The shell expands the wildcard pattern and then passes the
generated list to the exec'd command as its arguments.  "rm" can only
handle a limited number of files (or maybe the shell will only pass
a limited number; who knows, it's a FEECHER after all ;-), so rm then
gave the error message about too many filenames.  I would like it if
"rm" were similar to most other commands, i.e. you could rm "*",
postponing the expansion of the * to all file names until "rm" got it,
but that returns the message "rm: * non-existent" on my machine, Sys5r3.0.

	brian

(HMmmm.  That new version of "rm" I mentioned sounded kinda useful, I
wonder if anyone out there has 1 already?? HINT ;-)


-- 
Wm. Brian McCane                    | Life is full of doors that won't open
                                    | when you knock, equally spaced amid
Disclaimer: I don't think they even | those that open when you don't want
            admit I work here.      | them to. - Roger Zelazny "Blood of Amber"

guy@auspex.auspex.com (Guy Harris) (05/02/89)

>I would like it if "rm" were similar to most other commands, ie. you
>could rm "*", preventing the expansion of the * to all file names
>until "rm" got it,

Uhh, to what other commands are you referring?  Most UNIX commands don't
know squat about expanding "*"; they rely on the shell to do so, and
merely know about taking lists of file names as arguments.  Other OSes
do things differently; perhaps that's what you're thinking of?

allbery@ncoast.ORG (Brandon S. Allbery) (05/05/89)

As quoted from <432@brian386.UUCP> by news@brian386.UUCP (Wm. Brian McCane):
+---------------
| >Another thing you may run into is that some UNIX utilities seem to store
| >the names of all of the files somewhere before they do anything with them,
| 
| You didn't actually run into a "rm" bug/feature, you hit a shell
| FEECHER.  The shell expands for the regexp, and then passes the
| generated list to the exec'd command as the arguments.  "rm" can only
| handle a limited number of files, (or it may be the shell will only pass
| a limited number, who knows, its a FEECHER after all ;-), so rm then
+---------------

Sorry, it's a kernel limitation.

The combined size of all elements of argv[] must be less than some size
(I have seen 1024, 5120, and 10240 bytes on various systems).  This limit is
enforced by the execve() system call (from which all the other exec*() calls
are derived).  If the argument list is longer than this limit, exec()
returns an error which the shell (NOT rm) reports back to the user.
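
You can watch the limit bite with a few lines of C (a sketch; it hands
execv() a few megabytes of arguments for /bin/echo, which on most
systems should fail with E2BIG -- the error the shell turns into its
complaint):

#include <stdio.h>
#include <unistd.h>

int
main(void)
{
	/* Build an argument list far larger than any of the limits
	 * quoted above: 200,000 strings of 16 characters each is
	 * roughly 3.4MB before counting the pointers themselves. */
	enum { NARGS = 200000 };
	static char *args[NARGS + 2];
	int i;

	args[0] = "echo";
	for (i = 1; i <= NARGS; i++)
		args[i] = "aaaaaaaaaaaaaaaa";
	args[NARGS + 1] = NULL;

	execv("/bin/echo", args);
	perror("execv");	/* expect: Argument list too long (E2BIG) */
	return 1;
}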

+---------------
| gave the error message of too many filenames.  I would like it if "rm"
| were similar to most other commands, ie. you could rm "*", preventing
| the expansion of the * to all file names until "rm" got it, but it
| returns the message "rm: * non-existent" on my machine, Sys5r3.0.
+---------------

Most other WHAT commands?  MS-DOS?  VMS?  *Certainly* not Unix commands.

The advantage of having the shell expand wildcards like * is that the code
need only be in the shell, rather than enlarging every utility that
might have to parse filenames.  In these days of shared libraries, that may
not be as necessary as it used to be; however, having it in one place does
ensure that all utilities expand filenames in the same consistent way
without any extra work on the part of the programmer.

++Brandon
-- 
Brandon S. Allbery, moderator of comp.sources.misc	     allbery@ncoast.org
uunet!hal.cwru.edu!ncoast!allbery		    ncoast!allbery@hal.cwru.edu
      Send comp.sources.misc submissions to comp-sources-misc@<backbone>
NCoast Public Access UN*X - (216) 781-6201, 300/1200/2400 baud, login: makeuser