bet@orion.mc.duke.edu (Bennett Todd) (06/15/89)
In article <4530@ficc.uu.net>, peter@ficc (Peter da Silva) writes:
>I've added a cumulative total
>
> 392  1   392   0.57%
> ...
>1533 14 65848  94.96%   <-- Covers most bases.
> ...
>  23 30 69284  99.92%   <-- Covers virtually all bases.
> ...
>   1 51 69340 100.00%
>
>14 corresponds to SysV. 30 corresponds to SysV with DIRSIZ doubled. There were
>56 files, or 0.08%, that were longer than this.

Out of curiosity I ran this over the 2187 files under my home directory; some of the statistics came out a little differently. Specifically, the rows shown above come out like so for me:

 1   237 10.84%   237  10.84%
 ...
14    55  2.51%  1805  82.53%
 ...
30     4  0.18%  2166  99.04%
 ...
53     1  0.05%  2187 100.00%

(I just noticed my column ordering is different; I used the awk program someone posted, which I append at the end.)

The 14-character long names only handle ~83% of my filenames (this includes directory names, and in particular includes "." and ".." for every directory, so there is some structural weighting acting against my statistics here). Further, the 30-character names still left nearly 1% of my names (21 out of 2187) chopped.

Some of our users would show much higher filename length distributions, others lower. Having a shell with filename completion certainly removes much of the incentive for short, cryptic filenames.

Also, I personally think that statistics like this should be collected over home directories, not over everything below root, since many of the filenames in the root and /usr filesystems are inherited from the original UNIX system rather than chosen since. Further, the most useful place I've seen for really long filenames is in organizing personal archives, where you can make the name descriptive enough to be easy to find later.
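The "chopped" figures follow directly from the cumulative column: subtract the cumulative count at the cutoff from the total, then divide by the total. A small sketch of that arithmetic, assuming a POSIX awk that accepts -v; the function name `chopped` is hypothetical:

```shell
#!/bin/sh
# chopped TOTAL CUMULATIVE (hypothetical helper): print how many names
# exceed the cutoff, i.e. TOTAL minus the cumulative count at the cutoff,
# and what percentage of TOTAL that is.
chopped() {
    awk -v total="$1" -v cum="$2" 'BEGIN {
        n = total - cum
        printf("%d %.2f%%\n", n, n / total * 100)
    }'
}

chopped 2187 2166      # home directory, 30-character cutoff
chopped 2187 1805      # home directory, 14-character cutoff
chopped 69340 69284    # Peter's figures, 30-character cutoff
```

With the counts quoted above this prints "21 0.96%", "382 17.47%" and "56 0.08%", matching the percentages in the tables.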
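For completeness, here's the program I used (a shell script I wrapped around an awk program someone else posted):

#!/bin/sh
# Histogram of filename lengths (last path component) for everything
# find(1) reports under the given directories, default ".".
progname=`basename "$0"`
awkprg=/tmp/$progname$$
# Remove the temporary awk program on exit or on SIGHUP/SIGINT/SIGQUIT.
trap "rm -f $awkprg;exit 1" 0 1 2 3
cat >$awkprg <<'EOF'
# Split each path on "/" so that $NF is the name itself.
BEGIN {FS = "/"}
{
    l = length($NF)
    c[l]++
    if(l>max) max=l
}
# Columns: length, count, percent, cumulative count, cumulative percent.
END {
    for(i=1; i<=max; i++) {
        s += c[i]
        printf("%2d %5d %5.2f%% %5d %6.2f%%\n", i, c[i], c[i]/NR*100, s, s/NR*100)
    }
}
EOF
if test $# -eq 0
then
    set '.'
fi
find "$@" -print | awk -f $awkprg
rm -f $awkprg
# File already removed; cancel the cleanup traps so we exit with status 0.
trap "" 0 1 2 3
exit 0

-Bennett
bet@orion.mc.duke.edu

P.S. Tonight I'm going to run the same thing over everyone's home directories on our system, as well as over everything from the root down; I'll post the results tomorrow if all goes well.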
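If all you want is the number of names a fixed DIRSIZ would truncate, rather than the whole histogram, the same FS="/" and $NF technique can be narrowed to a single count. A sketch under that assumption; the function name `count_longer` and its argument order are hypothetical, and it assumes a POSIX awk with -v:

```shell
#!/bin/sh
# count_longer LIMIT [dir ...] (hypothetical helper): report how many
# names under the given directories (default ".") are longer than LIMIT
# characters, e.g. 14 for a SysV-style DIRSIZ. As in the script above,
# the starting directories themselves are counted too.
count_longer() {
    limit=$1
    shift
    if test $# -eq 0
    then
        set '.'
    fi
    find "$@" -print | awk -v limit="$limit" '
        BEGIN { FS = "/" }                    # $NF is the last component
        length($NF) > limit + 0 { n++ }       # force a numeric comparison
        END { printf("%d of %d names longer than %d\n", n, NR, limit) }'
}
```

For example, `count_longer 14 $HOME` prints a single line of the form "N of M names longer than 14".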
bet@orion.mc.duke.edu (Bennett Todd) (06/15/89)
In article <14749@duke.cs.duke.edu>, I wrote:
>[...]
>The 14-character long names only handle ~83% of my filenames (this
>includes directory names, and in particular includes "." and ".." for
>every directory, so there is some structural weighting acting against my
>statistics here).

...which is of course completely wrongo. Thanks to Matt Crawford for pointing this out to me in a very polite letter. Sorry about this misinformation. Find(1) is of course smart enough to refrain from reporting "." and ".."; indeed, I shouldn't have even had to check to see what its behavior is. Upon thinking about it even briefly, it becomes obvious that many, even most, of the uses to which find(1) is put would be broken if it didn't omit "." and ".." (and emitted them :-).

-Bennett
bet@orion.mc.duke.edu

(1) Start brain
(2) Engage mouth
Do not perform in reverse order.
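This is easy to confirm on a live tree. A minimal sketch, assuming a modern find(1) and POSIX awk; the helper name `dot_entries` is hypothetical. It counts lines of find's output that end in "/." or "/..", which should be 0 because find skips those directory entries while descending:

```shell
#!/bin/sh
# dot_entries DIR (hypothetical helper): count the lines of
# `find DIR -print` that name a "." or ".." entry below DIR, i.e. that
# end in "/." or "/..". Dotfiles such as ".profile" do not match.
dot_entries() {
    find "$1" -print | awk '/\/\.\.?$/ { n++ } END { print n + 0 }'
}
```

Running `dot_entries $HOME` should print 0 on any ordinary system.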
bet@orion.mc.duke.edu (Bennett Todd) (06/15/89)
I said I would post statistics over all our home directories, and over the whole system, when the runs finished. Well, they finished much more quickly than I had feared, and I looked them over. Basically, they agreed much more closely with Peter da Silva's figures than with those from my home directory. I guess I'm an anomaly :-).

I still think fixed-length filenames belong in the same category as fixed-length line buffers in editors and suchlike: a reasonable design compromise for a first revision, to prove the concept and keep the prototype implementation simple, but not a desirable limitation in a final production version. And, as with line-length limits in editors, I believe that a GOOD implementation won't inflict either undue code bulk or undue speed degradation as the price of going to a dynamically allocated, variable-length implementation.

-Bennett
bet@orion.mc.duke.edu
dik@cwi.nl (Dik T. Winter) (06/15/89)
In article <14752@duke.cs.duke.edu> bet@orion.mc.duke.edu (Bennett Todd) writes:
> ...which is of course completely wrongo. Thanks to Matt Crawford for
> pointing this out to me in a very polite letter. Sorry about this
> misinformation. Find(1) is of course smart enough to refrain from
> reporting "." and ".."; indeed, ...

Wrongo again. Of course find is smart enough to include ".".
-- 
dik t. winter, cwi, amsterdam, nederland
INTERNET   : dik@cwi.nl
BITNET/EARN: dik@mcvax
dik@cwi.nl (Dik T. Winter) (06/15/89)
In article <8192@boring.cwi.nl> I write:
> Wrongo again. Of course find is smart enough to include ".".

Wrong of course. I should never have written this. If I could cancel, I would.
-- 
dik t. winter, cwi, amsterdam, nederland
INTERNET   : dik@cwi.nl
BITNET/EARN: dik@mcvax
allbery@ncoast.ORG (Brandon S. Allbery) (06/20/89)
As quoted from <8192@boring.cwi.nl> by dik@cwi.nl (Dik T. Winter):
+---------------
| In article <14752@duke.cs.duke.edu> bet@orion.mc.duke.edu (Bennett Todd) writes:
| > misinformation. Find(1) is of course smart enough to refrain from
| > reporting "." and ".."; indeed,
| ...
| Wrongo again. Of course find is smart enough to include ".".
+---------------

Sigh. Find includes "." ONLY if you say "find . (...)". ONE instance, maximum.

(If you were correct then find would have output like:

    /foo/bar
    /foo/bar/.
    /foo/bar/baz
    ...

and watch everything that uses find break!)

++Brandon
-- 
Brandon S. Allbery, moderator of comp.sources.misc
allbery@ncoast.org  uunet!hal.cwru.edu!ncoast!allbery  ncoast!allbery@hal.cwru.edu
Send comp.sources.misc submissions to comp-sources-misc@<backbone>
NCoast Public Access UN*X - (216) 781-6201, 300/1200/2400 baud, login: makeuser
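Brandon's point is easy to check directly. A minimal sketch, assuming a modern find(1) and POSIX awk; the helper name `count_dot` is hypothetical. It runs find from inside a directory and counts output lines that are exactly ".", which should be 1 when the starting point is "." (the starting point itself) and 0 for any other starting point:

```shell
#!/bin/sh
# count_dot DIR START (hypothetical helper): run `find START -print` from
# inside DIR, in a subshell so the caller's directory is unchanged, and
# count the output lines that are exactly ".".
count_dot() {
    (cd "$1" && find "$2" -print) | awk '$0 == "." { n++ } END { print n + 0 }'
}
```

For example, `count_dot /tmp .` should print 1, while `count_dot / tmp` should print 0.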