dxxb@beta.lanl.gov (David W. Barts) (04/11/89)
How many files can there be in a single UNIX directory (I realize this may
depend on the variety of UNIX; I expect the Berkeley fast file system would
allow more)?  I need a better answer than "a lot" or "at least 2000", if
possible.  (This concerns an application program we are currently running
on an Apollo under Aegis; it depends on a LOT of files being in a single
directory and Aegis's limit of 1500 or so can be a pain.)

I realize that as directories get bigger, they slow down, but how much?
Just what IS the maximum directory size?

Thanks in advance,

David W. Barts  N5JRN, Ph. 509-376-1718 (FTS 444-1718), dxxb@beta.lanl.GOV
BCS Richland Inc.         | 603 1/2 Guernsey St.
P.O. Box 300, M/S A2-90   | Prosser, WA 99350
Richland, WA 99352        | Ph. 509-786-1024
grr@cbmvax.UUCP (George Robbins) (04/11/89)
In article <24110@beta.lanl.gov> dxxb@beta.lanl.gov (David W. Barts) writes:
> How many files can there be in a single UNIX directory
> (I realize this may depend on the variety of UNIX; I expect
> the Berkeley fast file system would allow more)? I need
> a better answer than "a lot" or "at least 2000", if possible.

At least 33,000 8-)

I recently played with an archive of comp.sys.amiga from day 1 and it was
on this order.

> I realize that as directories get bigger, they slow down, but
> how much? Just what IS the maximum directory size?

Yeah, it gets real slow and turns the whole system into a dog when you are
accessing the directories.  Still, the time is finite, and the whole restore
took maybe 16 hours (I had other stuff going on).  The tape went from almost
continual motion to twitching several times a minute...

I seem to recall that the Mach people at CMU were dabbling with some kind of
hashed directories or auxiliary hashing scheme; this would make it lots
quicker.

I don't know if there is a theoretical maximum, except that the directory
must be smaller than the maximum possible file size, though I am curious
about what constitutes an efficient limit: if I build a directory tree with
n entries at each level, what is a reasonable tradeoff between tree depth
and search time?

This was with Ultrix/BSD; I don't know what limits might pertain to Sys V
and other variants.
--
George Robbins - now working for,     uucp: {uunet|pyramid|rutgers}!cbmvax!grr
but no way officially representing    arpa: cbmvax!grr@uunet.uu.net
Commodore, Engineering Department     fone: 215-431-9255 (only by moonlite)
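George's closing question -- n entries per level versus tree depth -- is
usually answered in practice by hashing file names into a fixed set of
subdirectories so no single directory grows too large.  A minimal sketch of
that idea, assuming an arbitrary FANOUT of 64 and a toy hash function
(neither comes from the thread):

/*
 * Spread files over FANOUT bucket subdirectories chosen by hashing the
 * file name, so each directory holds roughly N/FANOUT entries.  The hash
 * and the fanout value are illustrative assumptions only.
 */
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

#define FANOUT 64                       /* number of bucket directories */

static unsigned bucket(const char *name)
{
    unsigned h = 0;
    while (*name)
        h = h * 31 + (unsigned char)*name++;
    return h % FANOUT;
}

/* Build "root/NN/name" in path, creating the bucket directory if needed. */
int bucket_path(const char *root, const char *name, char *path, size_t len)
{
    char dir[1024];

    snprintf(dir, sizeof dir, "%s/%02u", root, bucket(name));
    if (mkdir(dir, 0755) == -1 && errno != EEXIST)
        return -1;
    snprintf(path, len, "%s/%s", dir, name);
    return 0;
}

A deeper tree just applies the same trick again inside each bucket; the
tradeoff George asks about is between extra path lookups per open and the
linear scan cost of each directory.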
chris@mimsy.UUCP (Chris Torek) (04/11/89)
In article <24110@beta.lanl.gov> dxxb@beta.lanl.gov (David W. Barts) writes:
>How many files can there be in a single UNIX directory ....
>I realize that as directories get bigger, they slow down, but
>how much? Just what IS the maximum directory size?

The maximum size is the same as for files, namely 2^31 - 1 (2147483647)
bytes.  (This is due to the use of a signed 32 bit integer for off_t.  The
limit is larger in some Unixes [Cray], but is usually smaller due to disk
space limits.)

Directory performance falls off somewhat at single indirect blocks, moreso
at double indirects, and still more at triple indirects.  It takes about 96
kbytes to go to single indirects in a 4BSD 8K/1K file system.  Each directory
entry requires a minimum of 12 bytes (4BSD) or exactly 16 bytes (SysV); 16 is
a nice `typical' size, so divide 96*1024 by 16 to get 6144 entries before
indirecting on a BSD 8K/1K file system.

The actual slowdown due to indirect blocks is not clear; you will have to
measure that yourself.
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain: chris@mimsy.umd.edu     Path: uunet!mimsy!chris
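Chris's figures can be reproduced mechanically.  A minimal sketch that just
restates his arithmetic, hard-coding the constants he cites (12 direct block
pointers, 8K blocks, a 16-byte "typical" entry) rather than probing a live
file system:

/*
 * Worked version of the arithmetic above, using the 4BSD figures from the
 * post.  The constants are assumptions taken from the text.
 */
#include <stdio.h>

int main(void)
{
    long ndirect = 12;            /* direct block pointers per inode */
    long blocksize = 8192;        /* 4BSD 8K/1K file system */
    long entsize = 16;            /* "typical" directory entry size */

    long direct_bytes = ndirect * blocksize;   /* 98304 bytes = 96K */
    long entries = direct_bytes / entsize;     /* 6144 entries */

    printf("bytes before single indirects:   %ld\n", direct_bytes);
    printf("entries before single indirects: %ld\n", entries);
    return 0;
}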
rikki@macom1.UUCP (R. L. Welsh) (04/11/89)
From article <24110@beta.lanl.gov>, by dxxb@beta.lanl.gov (David W. Barts):
>
> How many files can there be in a single UNIX directory ...

You will undoubtedly run out of inodes before you reach any theoretical
limit.  Every new file you create will use up one inode.

If you are seriously contemplating having a huge number of files (be they in
one directory or many), you may have to remake a filesystem to have enough
inodes -- see mkfs(1M), in particular the argument blocks:inodes.  The
optional ":inodes" part is often left off and the defaults taken.  My manual
(old ATT Sys V) says that the maximum number of inodes is 65500.

Also (on Sys V) do "df -t" to check how many inodes your filesystem
currently accommodates.
-- 
	- Rikki	(UUCP: grebyn!macom1!rikki)
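For the curious, the inode head-room Rikki mentions can also be checked from
a program.  The sketch below uses statvfs(), a later POSIX interface rather
than the System V "df -t" or ustat() of the time, so treat it as an
illustration, not period code:

/* Report total and free inodes for the filesystem holding a path. */
#include <stdio.h>
#include <sys/statvfs.h>

int main(int argc, char **argv)
{
    struct statvfs vfs;
    const char *path = argc > 1 ? argv[1] : ".";

    if (statvfs(path, &vfs) == -1) {
        perror("statvfs");
        return 1;
    }
    printf("%s: %lu inodes total, %lu free\n",
           path, (unsigned long)vfs.f_files, (unsigned long)vfs.f_ffree);
    return 0;
}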
dxxb@beta.lanl.gov (David W. Barts) (04/11/89)
Thanks to everyone who responded to my question.  As several responses have
pointed out, the only limit is imposed by file size; however, things get
painfully slow well before the directory size reaches the maximum file size.

David W. Barts  N5JRN, Ph. 509-376-1718 (FTS 444-1718), dxxb@beta.lanl.GOV
BCS Richland Inc.         | 603 1/2 Guernsey St.
P.O. Box 300, M/S A2-90   | Prosser, WA 99350
Richland, WA 99352        | Ph. 509-786-1024
lm@snafu.Sun.COM (Larry McVoy) (04/12/89)
>In article <24110@beta.lanl.gov> dxxb@beta.lanl.gov (David W. Barts) writes:
>>How many files can there be in a single UNIX directory ....
>>I realize that as directories get bigger, they slow down, but
>>how much? Just what IS the maximum directory size?

If you are on a POSIX system, try this

	#include <unistd.h>

	dirsentries(dirpath)
		char *dirpath;
	{
		return pathconf(dirpath, _PC_LINK_MAX);
	}

Unfortunately, on systems that allow entries up to the file size, pathconf
will almost certainly return -1 (indicating "infinity").  But machines with
a hard limit should give you that limit.

Larry McVoy, Lachman Associates.	...!sun!lm or lm@sun.com
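One wrinkle with Larry's suggestion: POSIX pathconf() returns -1 both for
"no limit" (errno left unchanged) and for an error (errno set), so a caller
has to clear errno first to tell the two apart.  A minimal wrapper sketch;
the function name is ours, not Larry's:

/* Return the _PC_LINK_MAX value for a directory, reporting the two
 * different meanings of -1 separately. */
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

long dir_link_max(const char *dirpath)
{
    long n;

    errno = 0;
    n = pathconf(dirpath, _PC_LINK_MAX);
    if (n == -1 && errno != 0)
        perror("pathconf");                     /* real error */
    else if (n == -1)
        printf("%s: no fixed limit\n", dirpath);
    return n;
}

(Note Andrew Klossner's follow-up below on what this number actually
counts.)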
kremer@cs.odu.edu (Lloyd Kremer) (04/12/89)
In article <24110@beta.lanl.gov> dxxb@beta.lanl.gov (David W. Barts) writes:
>How many files can there be in a single UNIX directory

In article <16839@mimsy.UUCP> chris@mimsy.UUCP (Chris Torek) writes:
>The maximum size is the same as for files, namely 2^31 - 1 (2147483647)
>bytes. (This is due to the use of a signed 32 bit integer for off_t.

Point of curiosity:

Why was it decided that off_t should be signed?  Why should it not be
unsigned long where unsigned longs are supported, or unsigned int where int
is a 32 bit quantity?

It seems that signed long imposes an unnecessary 2GB limit on file size.
There are many devices having a capacity greater than 4 or 5 GB.  It seems
reasonable that one might want a file greater than 2GB on such a device,
such as the product of something akin to 'tar -cf' of a whole filesystem.

And it doesn't make sense to have a negative offset into a file.  The only
exception that comes to mind is that of returning an error code from a
function like lseek(), and this special case could be macro'd like

	#define SEEK_ERR ((off_t)(-1))

in <sys/types.h> or <sys/stat.h>.

Just curious,

Lloyd Kremer
Brooks Financial Systems
{uunet,sun,...}!xanth!brooks!lloyd
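For what it's worth, the error return is not the only place a negative
off_t shows up: lseek() also accepts negative offset arguments relative to
SEEK_CUR or SEEK_END for seeking backwards, which is one reason commonly
given for keeping the type signed.  A minimal sketch of both idioms, not
taken from the thread:

#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Absolute seek with the conventional (off_t)-1 error check. */
int seek_to(int fd, off_t where)
{
    if (lseek(fd, where, SEEK_SET) == (off_t)-1) {
        perror("lseek");
        return -1;
    }
    return 0;
}

/* Seek back n bytes from the current position: a legitimate negative
 * offset argument. */
off_t back_up(int fd, off_t n)
{
    return lseek(fd, -n, SEEK_CUR);
}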
andrew@alice.UUCP (Andrew Hume) (04/13/89)
in the fifth edition, directories that could no longer fit in the directly
mapped blocks caused unix to crash.

nowadays, the only reason not to have huge directories is that they make a
lot of programs REAL slow; it takes time to scan all those dirents.
rec@dg.dg.com (Robert Cousins) (04/14/89)
In article <9195@alice.UUCP> andrew@alice.UUCP (Andrew Hume) writes:
>in the fifth edition, directories that could no longer fit in the directly
>mapped blocks caused unix to crash.
>
>nowadays, the only reason not to have huge directories is that they
>make a lot of programs REAL slow; it takes time to scan all those dirents.

There is a more real limit to directory sizes in the System V file system:
there can only be 64K inodes per file system.  As I recall (and it has been
a while since I actually looked at it), the directory entry was something
like this:

	struct dirent {
		unsigned short	inode;		/* or some special 16 bit type */
		char		filename[14];
	};

which yielded a 16 byte entry.  Since there is a maximum number of links to
a file (2^10 or 1024?), the absolute maximum would be:

	64K * 1024 * 16 = 2^16 * 2^10 * 2^4 = 2^30 bytes = 1 gigabyte

This brings up one of the major physical limitations of the System V file
system: if you can have 2^24 blocks, and only 2^16 discrete files, then to
harness the entire file system space each file will (on average) have to be
2^8 blocks long, or 128K.  Since we know that about 85% of all files on most
unix systems are less than 8K and about half are under 1K, I personally feel
that the 16 bit inode number is a severe handicap.

Comments?

Robert Cousins

Speaking for myself alone.
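The figures Robert quotes can be packaged as a quick check.  The struct
layout and the 1024-link ceiling are as he recalls them, so the "maximum"
below is only as good as those assumptions:

/* Classic System V on-disk directory entry, roughly as described above,
 * plus the arithmetic for the theoretical maximum directory size. */
#include <stdio.h>

struct sysv_direct {
    unsigned short d_ino;        /* 16-bit inode number */
    char           d_name[14];   /* fixed 14-character name */
};                               /* 16 bytes on disk */

int main(void)
{
    long inodes = 1L << 16;      /* 64K inodes per file system */
    long links  = 1L << 10;      /* assumed maximum link count */
    long entsz  = sizeof(struct sysv_direct);

    printf("entry size: %ld bytes\n", entsz);
    printf("theoretical max directory size: %ld bytes\n",
           inodes * links * entsz);
    return 0;
}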
andrew@frip.wv.tek.com (Andrew Klossner) (04/15/89)
Larry McVoy writes:

>> How many files can there be in a single UNIX directory ....

> If you are on a POSIX system, try this
>	#include <unistd.h>
>	dirsentries(dirpath)
>		char *dirpath;
>	{
>		return pathconf(dirpath, _PC_LINK_MAX);
>	}

This will tell you how many directories a directory can contain, not how
many files.  Adding a file to a directory does not increment its link count.

  -=- Andrew Klossner   (uunet!tektronix!orca!frip!andrew)      [UUCP]
                        (andrew%frip.wv.tek.com@relay.cs.net)   [ARPA]
bph@buengc.BU.EDU (Blair P. Houghton) (04/15/89)
In article <127@dg.dg.com> rec@dg.UUCP (Robert Cousins) writes:
>
>This brings up one of the major physical limitations of the System V
>file system: if you can have 2^24 blocks, and only 2^16 discrete
>files, then to harness the entire file system space each file will
>(on average) have to be 2^8 blocks long, or 128K. Since we know that
>about 85% of all files on most unix systems are less than 8K and about
>half are under 1K, I personally feel that the 16 bit inode number is
>a severe handicap.
>
>Robert Cousins
>
>Speaking for myself alone.

I'll stand behind you, big guy.

I just hacked up a program to check out my filesizes, and I'll be damned if
I didn't think my thing was real big...

On the system I checked (the only one where I'm remotely "typical" :), I
have 854 files, probably two dozen of them zero-length (the result of some
automated VLSI-data-file processing).  The mean is 10.2k, stdev is 60k
(warped by a few megabyte-monsters), and the median is 992 bytes (do you
also guess people's weight? :)

Of these 854 files of mine, 84% are under 8000 bytes, and a paltry eight
exceed the 128k "manufacturer's suggested inode load" you compute above.

For another machine:

	1740 files
	Median	1304 bytes
	Mean	7752
	StDev	28857
	77% < 8kB

And only 4 (that's FOUR) over the 128k optimal mean.

Hrmph.  And I thought I was more malevolent than that.  At least the
sysadmins can't accuse me of being a rogue drain on the resources...

Consider that "block" can be 1, 2, 4kB or more, and you're talking some
BIIIG files we have to generate to be efficient with those blocknumbers.

				--Blair
				  "...gon' go lick my wounded ego...
				   and ponder ways to make file systems
				   more efficient, or at least more
				   crowded.  ;-)"
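For anyone wanting to repeat Blair's measurement, here is a minimal sketch
of that kind of quick hack -- one directory, no recursion, and certainly not
his actual program:

/* Count regular files in a directory and report the mean size and the
 * fractions under 1K and 8K. */
#include <dirent.h>
#include <stdio.h>
#include <sys/stat.h>

int main(int argc, char **argv)
{
    const char *dir = argc > 1 ? argv[1] : ".";
    DIR *dp = opendir(dir);
    struct dirent *de;
    struct stat st;
    char path[4096];
    long n = 0, under1k = 0, under8k = 0;
    long long total = 0;

    if (dp == NULL) {
        perror(dir);
        return 1;
    }
    while ((de = readdir(dp)) != NULL) {
        snprintf(path, sizeof path, "%s/%s", dir, de->d_name);
        if (stat(path, &st) == 0 && S_ISREG(st.st_mode)) {
            n++;
            total += st.st_size;
            if (st.st_size < 1024) under1k++;
            if (st.st_size < 8192) under8k++;
        }
    }
    closedir(dp);
    if (n > 0)
        printf("%ld files, mean %lld bytes, %ld%% under 1K, %ld%% under 8K\n",
               n, total / n, under1k * 100 / n, under8k * 100 / n);
    return 0;
}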
allbery@ncoast.ORG (Brandon S. Allbery) (04/18/89)
As quoted from <6576@cbmvax.UUCP> by grr@cbmvax.UUCP (George Robbins):
+---------------
| In article <24110@beta.lanl.gov> dxxb@beta.lanl.gov (David W. Barts) writes:
| > How many files can there be in a single UNIX directory
| > (I realize this may depend on the variety of UNIX; I expect
| > the Berkeley fast file system would allow more)? I need
| > a better answer than "a lot" or "at least 2000", if possible.
| 
| At least 33,000 8-)
| 
| I recently played with an archive of comp.sys.amiga from day 1 and
| it was on this order.
+---------------

System V has no limit, aside from maximum file size (as modified by ulimit,
presumably).  As a PRACTICAL limit, when your directory goes triple-indirect,
it is too slow to search in a reasonable amount of time.  Assuming the
standard 2K block size of SVR3, this is (uhh, let's see... 2048 bytes/block /
16 bytes/dirent = 128 dirents/block; times 10 direct blocks is 1,280 dirents;
a single-indirect block adds 512 pointers/block [2048 / 4 bytes/pointer] *
128 = 65,536 more entries; multiply that by 512 again for the double-indirect
block) 33,621,248 directory entries before you go triple-indirect.

(I personally think that even going single-indirect gets too slow; 1280
directory entries is more than I ever wish to see in a single directory!
But even limiting to single-indirect blocks, you get 66,816 directory
entries.)

(I included the math deliberately; that number looks way too large to me,
even though I worked the math twice.  Maybe someone else in this newsgroup
can double-check.  Of course, I'm no Obnoxious Math Grad Student ;-)

The Berkeley FFS is still based on direct and indirect blocks (it's how
they're arranged on the disk that speeds things up); however, directory
entries are not fixed in size in the standard FFS.  (I have seen FFS with
System V directory entries; the two aren't necessarily linked.  But they
usually are, as flexnames are nicer than a 14-character maximum.)  You can't
simply calculate a number; you must figure the lengths of filenames -- and
the order of deletions and additions combined with file name lengths can
throw in jokers, at least on systems without directory compaction.

I have no doubt that if I screwed up somewhere, we'll both hear about it. ;-)

++Brandon
-- 
Brandon S. Allbery, moderator of comp.sources.misc	allbery@ncoast.org
uunet!hal.cwru.edu!ncoast!allbery	ncoast!allbery@hal.cwru.edu
Send comp.sources.misc submissions to comp-sources-misc@<backbone>
NCoast Public Access UN*X - (216) 781-6201, 300/1200/2400 baud, login: makeuser
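Brandon's arithmetic, parameterized so the block size can be changed; the
2K block size and the 10 direct blocks are his assumptions, carried over
here as-is:

/* Directory-entry capacity of a System V style inode before it needs a
 * triple-indirect block, given a block size and a 16-byte entry. */
#include <stdio.h>

int main(void)
{
    long bsize = 2048;                    /* assumed SVR3 block size */
    long entsize = 16;                    /* System V directory entry */
    long ndirect = 10;                    /* direct blocks per inode */
    long perblock = bsize / entsize;      /* 128 entries per block */
    long ptrs = bsize / 4;                /* 512 pointers per indirect block */

    long direct = ndirect * perblock;              /* 1,280 */
    long single = direct + ptrs * perblock;        /* 66,816 */
    long dbl    = single + ptrs * ptrs * perblock; /* 33,621,248 */

    printf("direct blocks only:      %ld entries\n", direct);
    printf("through single indirect: %ld entries\n", single);
    printf("through double indirect: %ld entries\n", dbl);
    return 0;
}

Running it reproduces his 66,816 and 33,621,248 figures, so the math above
checks out.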
root@helios.toronto.edu (Operator) (04/19/89)
In article <4822@macom1.UUCP> rikki@macom1.UUCP (R. L. Welsh) writes:
>From article <24110@beta.lanl.gov>, by dxxb@beta.lanl.gov (David W. Barts):
>>
>> How many files can there be in a single UNIX directory
>
>You will undoubtedly run out of inodes before you reach any theoretical
>limit.

Another thing you may run into is that some UNIX utilities seem to store
the names of all of the files somewhere before they do anything with them,
and if there are a lot of files in the directory, you won't be able to run
the utility on all of them at once.  (This won't prevent you from creating
them, though.)  In particular I am thinking of "rm".  When cleaning up after
installing the NAG library, I tried to "rm *" in the source code directory.
It refused (I think the error was "too many files").  I had to go through
and "rm a*", "rm b*" etc. until it was down to a level that rm would accept.

I found this surprising.  In at least the case of wildcard matching, why
wouldn't it just read each name from the directory file in sequence,
comparing each for a match, and deleting it if it was?  Having to buffer
*all* the names builds in an inherent limit such as the one I ran into,
unless one uses a linked list or some such.

Does anyone know:
	1. why "rm" does it this way, and
	2. are there other utilities similarly affected?

I don't know exactly how many files were in the directory, but it was many
hundreds.
-- 
Ruth Milner                  UUCP - {uunet,pyramid}!utai!helios.physics!sysruth
Systems Manager              BITNET - sysruth@utorphys
U. of Toronto                INTERNET - sysruth@helios.physics.utoronto.ca
Physics/Astronomy/CITA Computing Consortium
les@chinet.chi.il.us (Leslie Mikesell) (04/20/89)
In article <776@helios.toronto.edu> sysruth@helios.physics.utoronto.ca (Ruth Milner) writes:
[ rm * fails with large number of files.. ]
>Does anyone know:
>	1. why "rm" does it this way, and
>	2. are there other utilities similarly affected?

Actually the shell expands the * and can't pass the resulting list to rm,
because there is a fixed limit on the total size of a command's argument
list.  All programs are affected in the same way, except where you quote the
wildcard to prevent shell expansion.  The common case is find with a quoted
pattern (find . -name '*' ...), where the -exec operator can be used to
operate on each file, or, if you have xargs, "find . -print | xargs command".

However, if your version of unix doesn't automatically compact directories
(SysV doesn't), you should rm -r the whole directory or the empty entries
will continue to waste space.

Les Mikesell
rwhite@nusdhub.UUCP (Robert C. White Jr.) (04/21/89)
> In article <776@helios.toronto.edu> sysruth@helios.physics.utoronto.ca (Ruth Milner) writes:
>>When cleaning up after
>>installing the NAG library, I tried to "rm *" in the source code directory.
>>It refused (I think the error was "too many files").

The shell can't make an argument list that long... do the following:

	ls | xargs rm

The ls will produce a list of files on standard output, and xargs will
repeatedly invoke its arguments as a command with as many additional
arguments as it can fit, taking those additional arguments from its standard
input... WALLHAH!  rm of a long directory.
weaver@prls.UUCP (Michael Weaver) (04/22/89)
Note that although Aegis 9 and below had strict limits on the number of
directory entries, Aegis 10, the latest version, is supposed to allow any
number of files, as long as you've got the disk space.  'Just like real
Unix' (almost, no inodes).
-- 
Michael Gordon Weaver                  Phone: (408) 991-3450
Signetics/Philips Components           Usenet: ...!mips!prls!weaver
811 East Arques Avenue, Bin 75
Sunnyvale CA 94086 USA
news@brian386.UUCP (Wm. Brian McCane) (04/27/89)
In article <776@helios.toronto.edu> sysruth@helios.physics.utoronto.ca (Ruth Milner) writes:
>In article <4822@macom1.UUCP> rikki@macom1.UUCP (R. L. Welsh) writes:
=>From article <24110@beta.lanl.gov>, by dxxb@beta.lanl.gov (David W. Barts):
==>
==> How many files can there be in a single UNIX directory
=>
=>You will undoubtedly run out of inodes before you reach any theoretical
=>limit.
>
>Another thing you may run into is that some UNIX utilities seem to store
>the names of all of the files somewhere before they do anything with them,
>and if there are a lot of files in the directory, you won't be able to
>run the utility on all of them at once. (This won't prevent you from creating
>them, though). In particular I am thinking of "rm". When cleaning up after
>installing the NAG library, I tried to "rm *" in the source code directory.
>It refused (I think the error was "too many files"). I had to go through and
>"rm a*", "rm b*" etc. until it was down to a level that rm would accept.
>
>Does anyone know:
>	1. why "rm" does it this way, and
>	2. are there other utilities similarly affected?
>
> Ruth Milner        UUCP - {uunet,pyramid}!utai!helios.physics!sysruth

You didn't actually run into an "rm" bug/feature, you hit a shell FEECHER.
The shell expands the wildcard and then passes the generated list to the
exec'd command as its arguments.  "rm" can only handle a limited number of
files (or it may be that the shell will only pass a limited number, who
knows, it's a FEECHER after all ;-), so rm then gave the error message of
too many filenames.  I would like it if "rm" were similar to most other
commands, i.e. you could rm "*", preventing the expansion of the * to all
file names until "rm" got it, but it returns the message "rm: * non-existent"
on my machine, Sys5r3.0.

brian

(HMmmm.  That new version of "rm" I mentioned sounded kinda useful, I wonder
if anyone out there has 1 already??  HINT ;-)
-- 
Wm. Brian McCane                    | Life is full of doors that won't open
                                    | when you knock, equally spaced amid
Disclaimer: I don't think they even | those that open when you don't want
            admit I work here.      | them to.
                                    |    - Roger Zelazny "Blood of Amber"
guy@auspex.auspex.com (Guy Harris) (05/02/89)
>I would like it if "rm" were similar to most other commands, ie. you >could rm "*", preventing the expansion of the * to all file names >until "rm" got it, Uhh, to what other commands are you referring? Most UNIX commands don't know squat about expanding "*"; they rely on the shell to do so, and merely know about taking lists of file names as arguments. Other OSes do things differently; perhaps that's what you're thinking of?
allbery@ncoast.ORG (Brandon S. Allbery) (05/05/89)
As quoted from <432@brian386.UUCP> by news@brian386.UUCP (Wm. Brian McCane):
+---------------
| >Another thing you may run into is that some UNIX utilities seem to store
| >the names of all of the files somewhere before they do anything with them,
| 
| You didn't actually run into an "rm" bug/feature, you hit a shell
| FEECHER.  The shell expands the wildcard and then passes the
| generated list to the exec'd command as its arguments.  "rm" can only
| handle a limited number of files (or it may be that the shell will only
| pass a limited number, who knows, it's a FEECHER after all ;-), so rm then
+---------------

Sorry, it's a kernel limitation.  The combined size of all elements of
argv[] must be less than some size (I have seen 1024, 5120, and 10240 bytes
on various systems).  This limit is enforced by the execve() system call
(from which all the other exec*() calls are derived).  If the argument list
is longer than this limit, exec() fails with E2BIG, and it is the shell
(NOT rm) that reports the error back to the user.

+---------------
| gave the error message of too many filenames. I would like it if "rm"
| were similar to most other commands, i.e. you could rm "*", preventing
| the expansion of the * to all file names until "rm" got it, but it
| returns the message "rm: * non-existent" on my machine, Sys5r3.0.
+---------------

Most other WHAT commands?  MS-DOS?  VMS?  *Certainly* not Unix commands.

The advantage of making the shell expand wildcards like * is that the code
need only be in the shell, rather than enlarging every utility that might
have to parse filenames.  In these days of shared libraries, that may not be
as necessary as it used to be; however, having it in one place does ensure
that all utilities expand filenames in the same consistent way without any
extra work on the part of the programmer.

++Brandon
-- 
Brandon S. Allbery, moderator of comp.sources.misc	allbery@ncoast.org
uunet!hal.cwru.edu!ncoast!allbery	ncoast!allbery@hal.cwru.edu
Send comp.sources.misc submissions to comp-sources-misc@<backbone>
NCoast Public Access UN*X - (216) 781-6201, 300/1200/2400 baud, login: makeuser
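The limit Brandon describes is what POSIX calls ARG_MAX.  A minimal sketch
of asking for it at run time; sysconf() is the POSIX spelling, while older
systems exposed a compile-time ARG_MAX or NCARGS constant instead:

/* Print the kernel's limit on the combined size of exec() arguments
 * (which on most systems includes the environment as well). */
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    long argmax = sysconf(_SC_ARG_MAX);

    if (argmax == -1)
        printf("no fixed limit reported\n");
    else
        printf("argument list limit: %ld bytes\n", argmax);
    return 0;
}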