[comp.sys.ibm.pc] Large number of files slows machine.

tim@j.cc.purdue.edu (Timothy Lange) (12/15/88)

I am dealing with a user who has around 650 files in a subdirectory.
We noticed that accessing the files at the bottom of the DIR listing
is much slower than accessing the files near the beginning.  The
performance really drops off at about the 512th file.  Now I know that
putting that many files in one directory is not good, so don't flame
me.  I also know that a directory entry takes 32 bytes, a sector is
512 bytes, and on this machine a cluster is 4 sectors.  He has
'BUFFERS=40' in his config.sys file.  What is magical about 512 files?
I cannot figure out why that number is significant (but it looks
good!).  The first 500 or so files are accessed quite rapidly; the
rest have terrible access times.  In fact, for files of equal size, I
can copy the first ten files ten times quicker than the last ten.  I
would love some info that explains these access times.
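
For reference, the arithmetic from those numbers: 650 entries x 32 bytes
= 20,800 bytes, or 41 sectors (11 clusters), while 512 entries x 32 bytes
= 16,384 bytes, exactly 32 sectors (8 clusters).  So the whole directory
occupies only a handful of clusters, and nothing in the on-disk layout
itself obviously changes at entry number 512.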

Tim.

-- 
Timothy Lange / Purdue University Computing Center / Mathematical Sciences Bldg
West Lafayette, IN  47907 / 317-494-1787 / tim@j.cc.purdue.edu / CIS 75410,525

ward@chinet.chi.il.us (Ward Christensen) (12/17/88)

In article <8545@j.cc.purdue.edu> tim@j.cc.purdue.edu (Timothy Lange) writes:
>I am dealing with an user that has around 650 files in a subdirectory.
>We noticed that accessing the files at the bottom of the DIR listing
>is much slower than the files near the beginning.  The performance
>really drops off at about the 512th file.
  That's an easy one.  I have had it happen many times.  (of course it
wasn't easy at FIRST  ;-).
  What is happening is that your directory is fragmented, resulting in
many seeks while processing the directory.
  The solution is very simple: fewer files; or, more realistically for
your application, compact the disk with something like Norton Advanced
Utilities' SD (Speed Disk).  This brings the pieces of the directory
back together, and it runs quite fast.
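  (I don't have the manual in front of me, but the invocation is along
the lines of SD C: run from wherever the Norton programs live - check
your version's documentation for the exact switches.)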
  Good luck!

del@Data-IO.COM (Erik Lindberg) (12/20/88)

In article <7192@chinet.chi.il.us> ward@chinet.chi.il.us (Ward Christensen) writes:
>In article <8545@j.cc.purdue.edu> tim@j.cc.purdue.edu (Timothy Lange) writes:
>>I am dealing with an user that has around 650 files in a subdirectory.
>>We noticed that accessing the files at the bottom of the DIR listing
>>is much slower than the files near the beginning.  The performance
>>really drops off at about the 512th file.
>  That's an easy one.  I have had it happen many times.  (of course it
>wasn't easy at FIRST  ;-).
>  What is happening is the fragmentation of your directory resulting in
>many seeks while processing the directory.

This might seem to be what is happening on your system, but it isn't right.
I run a large disk cache on my system, the directories easily fit in cache
memory. I have observed the same behaviour: a massive increase in file
access time when the number of files in a subdirectory exceeds 512. And
that is with *NO* physical disk activity at all.

>  THe solution is very simple: fewer files; or more realistically for your
>application: compress the disk such as with Norton Advanced Utility's SD

Fewer files would be nice, if the application did not suffer for it.  I
doubt that SpeedDisk would provide a significant improvement.

-- 
del (Erik Lindberg) 
uw-beaver!tikal!pilchuck!del

greggt@VAX1.CC.UAKRON.EDU (Gregg Thompson) (12/21/88)

	I have noticed that on machines with too many files in a directory,
increasing the BUFFERS setting in config.sys makes the problem go away.
I usually set BUFFERS to twice the number of files, and if there are a
lot of files I will go as high as 99.
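	For a directory this size that means going right to 99; each buffer
costs a bit over half a K of memory, so there is a trade-off.  The line
in config.sys is simply:

	BUFFERS=99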
-- 
To live is to die, to die is to live forever;			GRegg Thompson
Where will you spend eternity?			     greggt@vax1.cc.uakron.edu

simon@ms.uky.edu (Simon Gales) (12/21/88)

In article <8545@j.cc.purdue.edu> tim@j.cc.purdue.edu (Timothy Lange) writes:
>I am dealing with an user that has around 650 files in a subdirectory.
>We noticed that accessing the files at the bottom of the DIR listing
>is much slower than the files near the beginning.  The performance
>really drops off at about the 512th file.

Using Norton's SD to de-fragment your disk may help, but that probably
isn't the problem.  DOS is having to search through a _lot_ of
directory entries to find the one it wants to open.  If the directory
is in your path, it should at least be the last one listed.

If you aren't accessing all the files often, sort the directory so that
the files you access most are at the beginning (with Norton's DS,
Directory Sort).

Try using FASTOPEN to cache your directory entries, and make the cache
at least as big as your directory (e.g. FASTOPEN C:=700).

Best of all, try splitting the directory into two or more directories.
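
As a rough sketch of that last suggestion (the directory names and the
wildcard here are made up - use whatever grouping fits the application),
DOS has no MOVE command, so splitting a directory is a COPY followed by
a DEL:

MD C:\BIGDIR\OLD
COPY C:\BIGDIR\*.88 C:\BIGDIR\OLD
DEL C:\BIGDIR\*.88

Note that the big directory itself never shrinks when entries are
deleted; the freed slots just get reused by new files.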

-- 
/--------------------------------------------------------------------------\
  Simon Gales@University of Ky         UUCP:   {rutgers, uunet}!ukma!simon 
                                       Arpa:   simon@ms.uky.edu 
  MaBell: (606) 263-2285/257-3597      BitNet: simon@UKMA.BITNET  

les@chinet.chi.il.us (Leslie Mikesell) (12/22/88)

In article <10723@s.ms.uky.edu> simon@ms.uky.edu (Simon Gales) writes:

>DOS is having to search through a _lot_ of files
>to find the one it wants to open.  If the directory is in your path,
>it should at least be the last one.

Is there any way to avoid having DOS search your current directory
(i.e. only look in the PATH) for programs?  This is especially a problem
when working in large directories over a network.

Les Mikesell

toma@tekgvs.GVS.TEK.COM (Tom Almy) (12/23/88)

In article <8545@j.cc.purdue.edu> tim@j.cc.purdue.edu (Timothy Lange) writes:
>I am dealing with an user that has around 650 files in a subdirectory.
>We noticed that accessing the files at the bottom of the DIR listing
>is much slower than the files near the beginning.  The performance
>really drops off at about the 512th file.

I have performed tests on my machine and found out that directory
searching performance drops off dramatically (*Orders of Magnitude*) when
there are more than 12*(number of block buffers specified in config.sys)
files in the directory.  Three suggestions (in order of value):

1). Move the files into multiple subdirectories.

2). Use a disk caching program -- this will typically improve all disk
    performance, but the performance drop off is still there, just not
    as dramatic.

3). Increase the number of block buffers with the config.sys buffers=
    command.  Beyond about 20, performance actually starts dropping, and
    you would need over 50!
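
Plugging in the numbers from the original posting: with BUFFERS=40 the
formula gives 12 x 40 = 480 files, which is just about where Tim saw the
performance fall off; covering all 650 files would take 650 / 12, or
about 55 buffers - hence the "over 50" above.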


Tom Almy
toma@tekgvs.TEK.COM
Standard Disclaimers Apply

ward@chinet.chi.il.us (Ward Christensen) (12/23/88)

In article <1078@pilchuck.Data-IO.COM> del@pilchuck.Data-IO.COM 
  (Erik Lindberg) writes about... (attributions getting too lengthy,
  Pruned)
  The original article wondered why a disk with >650 files/subdir was slow
doing DIR and other things.
  I stated:
>>  What is happening is the fragmentation of your directory resulting in
>>many seeks while processing the directory.
   while Erik stated "This might SEEM to be what is happening on your 
system, but it isn't right", and said that in a case where his directories
completely fit in cache (eliminating seek time), he found "a massive 
increase in file access time when the number of files in a subdirectory 
exceeds 512."
  To defend my initial comment (though not to say Erik is wrong the way
he said I was): my Compuserve capture directory slowed WAY down at one
time, and I finally traced it.  Deleting files from the directory
didn't help, so I looked - and found my directory in 3 extents "all
across the disk" (say, at 1/4, 1/2, and 3/4 of the way across).
  By copying the files in the 3rd extent to a temp directory, manually
patching the FAT so it no longer pointed to the 3rd extent, and then
copying the files back - so I had the SAME NUMBER of files, but in 2
extents instead of 3 - I got a very significant speed improvement.
  So, Erik, in MY case, the NUMBER OF DIRECTORY EXTENTS was VERY significant
- probably due to the slow seek time.  In YOUR case it was the sheer number
of files.  We're both right - like a car that can mis-fire from either fuel
or electrical problems, a hard disk can slow down from either too many
files or too many directory extents (and probably more things).
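  (If you would rather not patch the FAT by hand: a slower but safer way
to rebuild a directory - the names below are made up, and this assumes no
hidden files or subdirectories, plus a good backup first - is roughly

    MD C:\CAPTMP
    COPY C:\CAPTURE\*.* C:\CAPTMP
    DEL C:\CAPTURE\*.*
    RD C:\CAPTURE
    MD C:\CAPTURE
    COPY C:\CAPTMP\*.* C:\CAPTURE
    DEL C:\CAPTMP\*.*
    RD C:\CAPTMP

Whether the recreated directory comes out in fewer extents depends on
where the free space is, so it is not guaranteed to help as much as the
FAT surgery did.)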
  Happy Holly-Daze to you all.

dave@westmark.UUCP (Dave Levenson) (12/24/88)

In article <7244@chinet.chi.il.us>, les@chinet.chi.il.us (Leslie Mikesell) writes:
> In article <10723@s.ms.uky.edu> simon@ms.uky.edu (Simon Gales) writes:
> 
> >DOS is having to search through a _lot_ of files
> >to find the one it wants to open.  If the directory is in your path,
> >it should at least be the last one.
> 
> Is there any way to avoid having DOS search your current directory
> (i.e. only look in the PATH) for programs?  This is especially a problem
> when working in large directories over a network.


The only way in standard MS-DOS is to use complete pathnames for
your executables.  For example, don't say:

DISKCOPY A: B:

but instead, if your DOS commands are stored in C:\DOS, for example,
use the command:

C:\DOS\DISKCOPY A: B:

In this case, no search of the current directory or the PATH is
made.  This will speed things up if you use a long PATH, if your
current directory is large, or if it lives on a slow (e.g. network)
device.  It also makes you less likely to be hit by a trojan horse
named DISKCOPY that someone sneaks into your current directory!
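
If typing full pathnames gets tedious, one compromise (the names here
are just an example) is a short batch file kept in a directory near
the front of your PATH, say C:\BAT\DC.BAT:

ECHO OFF
C:\DOS\DISKCOPY %1 %2

DOS still has to search for DC.BAT itself, but the DISKCOPY that
actually runs is always the one in C:\DOS.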

-- 
Dave Levenson
Westmark, Inc.		The Man in the Mooney
Warren, NJ USA
{rutgers | att}!westmark!dave