[comp.sys.amiga] AmigaDos directory knowledge

hamilton@intersil.uucp (Fred Hamilton) (11/26/89)


I just remembered a question I've had since WorkBench 1.0.  When you type
a command like:

	List c:edd     ;(A typo, for instance)

You get an (almost) instant response of "object not found".
But if you type:

	List c:ed#?

The OS thrashes through the entire directory before giving you the list.

What's happening here?  It's like the OS is saying "Well, I'm not sure
what EXACTLY is in the c: directory, but I KNOW there isn't a program called
"Edd." It seems to my feeble mind that for it to know what's NOT in the
directory without a long search, it has to already know what IS in the
directory.  What's happening here? 

-- 
Fred Hamilton                  Any views, comments, or ideas expressed here
Harris Semiconductor           are entirely my own.  Even good ones.
Santa Clara, CA

lphillips@lpami.wimsey.bc.ca (Larry Phillips) (11/26/89)

In <291@intersil.uucp>, hamilton@intersil.uucp (Fred Hamilton) writes:
>I just remembered a question I've had since WorkBench 1.0.  When you type
>a command like:
>
>	List c:edd     ;(A typo, for instance)
>
>You get an (almost) instant response of "object not found".
>But if you type:
>
>	List c:ed#?
>
>The OS thrashes throught the entire directory before giving you the list.
>
>What's happening here?  It's like the OS is saying "Well, I'm not sure
>what EXACTLY is in the c: directory, but I KNOW there isn't a program called
>"Edd." It seems to my feeble mind that for it to know what's NOT in the
>directory with out a long search, it has to already know what IS in the 
>directory.  What's happening here? 

Yes, this is exactly what happens.  The big advantage of the Amiga filing
system is that a fully known filename is found VERY quickly. The reason for
this is that filenames are hashed into tables in the directory blocks, and can
be found within a (relatively) few seeks, simply by looking in the appropriate
header and offset within that header to find the pointer to the file. Hash
'collisions' are handled by chaining from one directory entry to the next, but
even then, there are relatively few reads to do.
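The mechanism Larry describes can be sketched in a few lines of Python. The hash function below is the one commonly documented for the original Amiga file system (seed with the name length, multiply-and-add by 13, mask to 11 bits, reduce modulo the 72 hash slots of a 512-byte directory block); the surrounding data structures are purely illustrative, not the on-disk layout:

```python
def amiga_hash(name, table_size=72):
    # The commonly documented OFS/FFS directory hash.
    h = len(name)
    for ch in name.upper():          # AmigaDOS names are case-insensitive
        h = (h * 13 + ord(ch)) & 0x7FF
    return h % table_size

class Directory:
    def __init__(self):
        self.table = [None] * 72     # hash slots -> chain of (name, block)

    def add(self, name, block):
        slot = amiga_hash(name)
        chain = self.table[slot] or []
        chain.append((name, block))  # collisions chain to the next header
        self.table[slot] = chain

    def lookup(self, name):
        # A fully known name costs one hash plus a short collision chain;
        # absence is proven without ever scanning the whole directory.
        chain = self.table[amiga_hash(name)]
        for entry_name, block in chain or []:
            if entry_name.upper() == name.upper():
                return block
        return None                  # "object not found"

d = Directory()
d.add("Ed", 880)
d.add("List", 123)
print(d.lookup("ed"))    # -> 880, via one hash slot
print(d.lookup("edd"))   # -> None, again via one hash slot
```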

When even one character is unknown, there can be no hashing of the name, and
the only way the file system can find a match is by following ALL the table
entries in the directory until it finds the match, and then continuing on
because it has no way of knowing if that's the only match.

To find out there is not a file named 'edd', when you specify it fully, might
take at most 10 or 15 seek/reads on a fairly large directory, while specifying
it as 'ed#?' will take the same number of seek/reads as there are files and
subdirectories in C:.
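The wildcard case can be sketched the same way: the pattern cannot be hashed, so every entry must be visited. This toy matcher translates just the two most common AmigaDOS pattern tokens into a regular expression (a deliberate simplification; real AmigaDOS patterns are richer):

```python
import re

def amiga_pattern_to_regex(pat):
    # Handle only the two most common AmigaDOS tokens:
    #   ?   matches any single character
    #   #?  matches any run of characters (like * elsewhere)
    out = []
    i = 0
    while i < len(pat):
        if pat.startswith("#?", i):
            out.append(".*"); i += 2
        elif pat[i] == "?":
            out.append("."); i += 1
        else:
            out.append(re.escape(pat[i])); i += 1
    return re.compile("".join(out) + "$", re.IGNORECASE)

def match_pattern(entries, pat):
    # No hash can help here: every directory entry must be examined,
    # which on OFS means one seek/read per file header block.
    rx = amiga_pattern_to_regex(pat)
    return [name for name in entries if rx.match(name)]

c_dir = ["Ed", "Echo", "EndCLI", "List", "Dir", "Edit"]
print(match_pattern(c_dir, "ed#?"))   # -> ['Ed', 'Edit']
```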

-larry

--
" All I ask of my body is that it carry around my head."
         - Thomas Alva Edison -
+-----------------------------------------------------------------------+ 
|   //   Larry Phillips                                                 |
| \X/    lphillips@lpami.wimsey.bc.ca -or- uunet!van-bc!lpami!lphillips |
|        COMPUSERVE: 76703,4322  -or-  76703.4322@compuserve.com        |
+-----------------------------------------------------------------------+

peter@cbmvax.UUCP (Peter Cherna) (11/27/89)

In article <291@intersil.uucp> hamilton@intersil.uucp (Fred Hamilton) writes:
>.....................................................C<................
>
>I just remembered a question I've had since WorkBench 1.0.  When you type
>a command like:
>
>	List c:edd     ;(A typo, for instance)
>
>You get an (almost) instant response of "object not found".
>But if you type:
>
>	List c:ed#?
>
>The OS thrashes throught the entire directory before giving you the list.
>
>What's happening here?

If you ask for a file by name, the OS looks for it by name, and through
the magic of hash tables very quickly finds out where that file is, or
alternatively that that file isn't.  If you ask for a file by pattern,
the OS has no better way than to check each possible file for a match.

Imagine if I asked you to tell me if there was anybody in the phone book
with the last name of "Cherna".  How long would it take you?  Now I'll
ask you how many people in the phone book have last names which end in
"erna".  Let me know when you're done.

>-- 
>Fred Hamilton                  Any views, comments, or ideas expressed here
>Harris Semiconductor           are entirely my own.  Even good ones.
>Santa Clara, CA

--
     Peter Cherna, Software Engineer, Commodore-Amiga, Inc.
     {uunet|rutgers}!cbmvax!peter    peter@cbmvax.cbm.commodore.com
My opinions do not necessarily represent the opinions of my employer.

"A friend of mine is into Voodoo Acupuncture.  You don't have to go.  You'll
just be walking down the street and ..... oooohhh, that's much better..."
     - Steven Wright

billsey@agora.UUCP (Bill Seymour) (11/28/89)

From article <291@intersil.uucp>, by hamilton@intersil.uucp (Fred Hamilton):
: I just remembered a question I've had since WorkBench 1.0.  When you type
: a command like:
: 
: 	List c:edd     ;(A typo, for instance)
: 
: You get an (almost) instant response of "object not found".
: But if you type:
: 
: 	List c:ed#?
: 
: The OS thrashes throught the entire directory before giving you the list.
: 
: What's happening here?

	When you have the system look for a specific file (no wildcards) it
will use the hash value for that file in order to find the header block. If
there is no header at that hash, it kicks back very quickly. If you use a
wildcard, it has to search through each and every file header looking for
names that match the pattern.

: -- 
: Fred Hamilton                  Any views, comments, or ideas expressed here
: Harris Semiconductor           are entirely my own.  Even good ones.
: Santa Clara, CA
-- 
     -Bill Seymour             ...tektronix!reed!percival!agora!billsey
                               ...tektronix!sequent.UUCP!calvin!billsey
Bejed, Inc.       NES, Inc.        Northwest Amiga Group    At Home Sometimes
(503) 691-2552    (503) 246-9311   (503) 656-7393 BBS       (503) 640-0842

filbo@gorn.santa-cruz.ca.us (Bela Lubkin) (11/28/89)

[note that I have directed followups to the .tech group]

In article <832@lpami.wimsey.bc.ca> Larry Phillips writes:
>The big advantage of the Amiga filing
     ^^^^^^^^^^^^^
>system is that a fully known filename is found VERY quickly. The reason for
>this is that filenames are hashed into tables in the directory blocks, and can
>be found within a (relatively) few seeks, simply by looking in the appropriate
>header and offset within that header to find the pointer to the file. Hash
>'collisions' are handled by chaining from one directory entry to the next, but
>even then, there are relatively few reads to do.

Oh, come on.  This is and always has been bogus.  Even MS-DOS
does far better than AmigaDOS in this area.  In the case of a known filename,
MS-DOS reads the directory (which is effectively a single small binary file),
searches it for the named file, and is done.  Since each MS-DOS directory
entry is 32 bytes, and a sector is 512 bytes, 16 entries fit in a sector; it
must read (n/16) sectors to find a file, where n=the position of the desired
file within the directory.  In general that means it has to read 1-2 sectors
for MOST files.  It would have to read 5 sectors to find ANY file in my Amiga
C: directory (70 files); 2 for my System: directory (31 files).  Those sectors
would almost always be physically contiguous.  Compare this to AmigaDOS, which
must read an arbitrary number of sectors (often 1, but sometimes more), which
are virtually guaranteed to be all over the disk.  Then compare wildcard
operations.  MS-DOS must still read the entire directory, which is still only
a few (probably contiguous) sectors.  AmigaDOS must read one sector for
each file in the directory, probably scattered all over the disk.  I don't
like this tradeoff in the least.
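Bela's sector counts are just ceiling division over 16 entries per sector; a quick check (the 70- and 31-file directory sizes are his own examples):

```python
from math import ceil

ENTRIES_PER_SECTOR = 512 // 32   # 16 MS-DOS directory entries per sector

def msdos_sectors_to_scan(n_files):
    # Worst case: the whole directory must be read, e.g. to prove absence.
    return ceil(n_files / ENTRIES_PER_SECTOR)

print(msdos_sectors_to_scan(70))  # the 70-file C: directory  -> 5
print(msdos_sectors_to_scan(31))  # the 31-file System: dir   -> 2
```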

This is a case of overblown data structures.  I'd take a monolithic directory
any day.  In AmigaDOS' case, this would probably involve keeping the current
directory blocks, but adding a true index for each directory -- a file that
has nothing in it but filenames and the block numbers of their directory
blocks.  This could be a file of 34-byte records (30 characters + a longword
block number).  It would probably be best to put 15 of these in a sector and
use the extra two bytes for something else, rather than starting a 16th entry
there and carrying it over to the 2nd directory entry, to make it easier to
reconstruct a damaged file system.
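The record layout proposed above packs exactly as claimed; a sketch using Python's struct module (the precise field encoding beyond "30-character name plus longword block number" is an assumption):

```python
import struct

# One proposed index record: 30-byte name field + 4-byte (longword) block.
RECORD = struct.Struct(">30sL")            # big-endian, as on the 68000
SECTOR = 512

per_sector = SECTOR // RECORD.size         # 15 whole records fit...
spare = SECTOR - per_sector * RECORD.size  # ...with 2 bytes left over

rec = RECORD.pack(b"Ed".ljust(30, b"\0"), 880)
name, block = RECORD.unpack(rec)
print(RECORD.size, per_sector, spare)      # 34 15 2
print(name.rstrip(b"\0"), block)
```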

The other argument I've heard for AmigaDOS' directory structure is "unlimited
directory size", which is also bogus.  Nothing limits any directory under
MS-DOS, >except< the root directory.  All others are just files which are
treated like any other file.  So the root directory is limited?  Well, that's the
choice MicroSoft made with MS-DOS.  Needn't be that way.  Just define a
"super-root" directory which is not directly visible to the user and which
has only one entry -- a pointer to the actual root directory, which is a
normal directory file and can grow to whatever size is needed.

[I speak of MS-DOS for two reasons: I know it well; and it uses SIMPLE
solutions which happen to work pretty well.  Perhaps in 6 months I'd be
arguing for some form of UNIX file system or something, but I don't know
it well enough yet.  Actually, what I suggest sounds suspiciously like
inodes, in a twisted kind of way.]

Bela Lubkin    * *    //  filbo@gorn.santa-cruz.ca.us  CI$: 73047,1112 (slow)
     @       * *     //  belal@sco.com  ..ucbvax!ucscc!{gorn!filbo,sco!belal}
R Pentomino    *   \X/  Filbo @ Pyrzqxgl +408-476-4633 and XBBS +408-476-4945

a186@mindlink.UUCP (Harvey Taylor) (11/29/89)

In Msg-ID: <89.filbo@gorn.santa-cruz.ca.us>, in c.s.a, Bela Lubkin writes:
In Msg-ID: <87.filbo@gorn.santa-cruz.ca.us>, in c.s.a, Bela Lubkin writes:
In Msg-ID: <88.filbo@gorn.santa-cruz.ca.us>, in c.s.a, Bela Lubkin writes:
In Msg-ID: <89.filbo@gorn.santa-cruz.ca.us>, in c.s.a.t, Bela Lubkin writes:
In Msg-ID: <87.filbo@gorn.santa-cruz.ca.us>, in c.s.a.t, Bela Lubkin writes:
In Msg-ID: <88.filbo@gorn.santa-cruz.ca.us>, in c.s.a.t, Bela Lubkin writes:
Subject : Re: AmigaDos directory knowledge
Posted: 28 Nov 89 10:10:25 GMT
      [about MSDOS & AmigaDOS FileSystems]
 Ummm, Bela, did you mean to post _6_ copies of that message?
 If not, is there something funny going on at your site?

 <-Harvey

 "Success is the one unpardonable sin against one's fellows." -Bierce
      Harvey Taylor      Meta Media Productions
       uunet!van-bc!rsoft!mindlink!Harvey_Taylor
               a186@mindlink.UUCP

bleys@tronsbox.UUCP (bleys) (11/29/89)

In the first case, it's looking for 'e', followed by 'd', followed by 'd'. 
In the second case, it's looking for 'e', followed by 'd', followed by 'a',
then 'e', followed by 'd', followed by 'b', and so on, and so on...

ccplumb@rose.waterloo.edu (Colin Plumb) (11/30/89)

In article <88.filbo@gorn.santa-cruz.ca.us> filbo@gorn.santa-cruz.ca.us (Bela Lubkin) writes:
>[note that I have directed followups to the .tech group]

Yes, but they know all this stuff already...

>In article <832@lpami.wimsey.bc.ca> Larry Phillips writes:
>>The big advantage of the Amiga filing
>     ^^^^^^^^^^^^^
>>system is that a fully known filename is found VERY quickly. The reason for
>>this is that filenames are hashed into tables in the directory blocks, and can
>>be found within a (relatively) few seeks, simply by looking in the appropriate
>>header and offset within that header to find the pointer to the file. Hash
>>'collisions' are handled by chaining from one directory entry to the next, but
>>even then, there are relatively few reads to do.

>Oh, come on.  This is and always has been bogus.  Even MS-DOS
>does far better than AmigaDOS in this area.  In the case of a known filename,
>MS-DOS reads the directory (which is effectively a single small binary file),
>searches it for the named file, and is done.  Since each MS-DOS directory
>entry is 32 bytes, and a sector is 512 bytes, 16 entries fit in a sector; it
>must read (n/16) sectors to find a file, where n=the position of the desired
>file within the directory.  In general that means it has to read 1-2 sectors
>for MOST files.  It would have to read 5 sectors to find ANY file in my Amiga
>C: directory (70 files); 2 for my System: directory (31 files).  Those sectors
>would almost always be physically contiguous.  Compare this to AmigaDOS, which
>must read an arbitrary number of sectors (often 1, but sometimes more), which
>are virtually guaranteed to be all over the disk.

AmigaDOS has to read an average of n/72+1 sectors, which actually beats n/16,
but the problem is that they're scattered.  MS-DOS can get scattered, too, but
because it never shrinks directories, the preallocation reduces the effect.

FFS stores hash chains in sorted order, significantly reducing the scattering.
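Colin's averages for a known-name lookup can be tabulated directly (the formulas are his; n is the number of entries in the directory, and the comparison ignores the scattering he points out):

```python
def amigados_avg_reads(n):
    # One read for the directory block's hash table, plus an average
    # chain of n/72 file header blocks behind the chosen hash slot.
    return n / 72 + 1

def msdos_avg_reads(n):
    # Linear scan of 16-entry sectors up to the file's position.
    return n / 16

for n in (31, 70, 200):
    print(n, round(amigados_avg_reads(n), 2), round(msdos_avg_reads(n), 2))
```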

> Then compare wildcard
>operations.  MS-DOS must still read the entire directory, which is still only
>a few (probably contiguous) sectors.  AmigaDOS must read one sector for
>each file in the directory, probably scattered all over the disk.  I don't
>like this tradeoff in the least.

Even reading all the sectors isn't a problem, the problem is the OFS's bad
habit of reading the sectors in an arbitrary order with no thought
*whatsoever* to efficiency.

The FFS searches them in optimal order, producing order-of-magnitude speedups.

>This is a case of overblown data structures.  I'd take a monolithic directory
>any day.  In AmigaDOS' case, this would probably involve keeping the current
>directory blocks, but adding a true index for each directory -- a file that
>has nothing in it but filenames and the block numbers of their directory
>blocks.  This could be a file of 34-byte records (30 characters + a longword
>block number).  It would probably be best to put 15 of these in a sector and
>use the extra two bytes for something else, rather than starting a 16th entry
>there and carrying it over to the 2nd directory entry, to make it easier to
>reconstruct a damaged file system.

Everyone keeps suggesting this, but there is a problem: there is no way
to ask the file system for this information.  There are any number of
problems with the ExNext call that I won't go into, except to mention
that I'd like to introduce the fellow who thought it up to the Spanish
Inquisition, in its heyday, but one is that there is no way to ask for
less information than that provided by "list," namely file name, size,
protection bits, comment, type, and DiskKey.  The ExNext call returns
all this, and all the software that reads directories uses ExNext, so
to have a better file system produce speedups with existing software,
you have to be able to find all of this information quickly.

>The other argument I've heard for AmigaDOS' directory structure is "unlimited
>directory size", which is also bogus.  Nothing limits any directory under
>MS-DOS, >except< the root directory.

And there's no good reason for that limitation.  No, AmigaDOS doesn't have
any monopoly on large directories.  But it does very well on searching long
paths.  The n/72 beats MS-DOS's n/16 quite handily.  When you consider the
number of times you have to find a specific entry in a directory to find
"foo" in path x:a/b/c;y:bar/baz/quux;z:gurble/glorble/frobnitz/greeble/thlorp,
MS-DOS jumps all over the disk, too.  (If you use assign instead of putting
long strings into the path, AmigaDOS wins even bigger.  But that's an orthogonal
issue that could, in theory, be implemented on top of MS-DOS's file system.)

>[I speak of MS-DOS for two reasons: I know it well; and it uses SIMPLE
>solutions which happen to work pretty well.  Perhaps in 6 months I'd be
>arguing for some form of UNIX file system or something, but I don't know
>it well enough yet.  Actually, what I suggest sounds suspiciously like
>inodes, in a twisted kind of way.]

Yes, the problem is when you do an ls -l.  Or better yet, open each file
in a directory.  There are all kinds of access patterns that require thought.
It is true that getting a list of all the names is a very common one that
should have more thought put into it, though.
-- 
	-Colin

filbo@gorn.santa-cruz.ca.us (Bela Lubkin) (12/02/89)

In article <750@mindlink.UUCP> Harvey Taylor writes:
>In Msg-ID: <89.filbo@gorn.santa-cruz.ca.us>, in c.s.a, Bela Lubkin writes:
>In Msg-ID: <87.filbo@gorn.santa-cruz.ca.us>, in c.s.a, Bela Lubkin writes:
>In Msg-ID: <88.filbo@gorn.santa-cruz.ca.us>, in c.s.a, Bela Lubkin writes:
>In Msg-ID: <89.filbo@gorn.santa-cruz.ca.us>, in c.s.a.t, Bela Lubkin writes:
>In Msg-ID: <87.filbo@gorn.santa-cruz.ca.us>, in c.s.a.t, Bela Lubkin writes:
>In Msg-ID: <88.filbo@gorn.santa-cruz.ca.us>, in c.s.a.t, Bela Lubkin writes:
> Ummm, Bela, did you mean to post _6_ copies of that message?
> If not, is there something funny going on at your site?

Argh!!!!!!!!!!

Yes, there was something funny going on at my site.  I was losing characters
in messages -- the same ones each time, in fact.  I canceled each of the
messages, but the local software decided to send them out anyway.  I've
solved the character loss problem.  No idea on the cancellation problem.
Hopefully I won't need to cancel articles in the future.  :-(

Apologies to everyone who reads these groups for the multiple postings, lost
characters, and for this message which is 98% irrelevant.

Bela Lubkin    * *    //  filbo@gorn.santa-cruz.ca.us  CI$: 73047,1112 (slow)
     @       * *     //  belal@sco.com  ..ucbvax!ucscc!{gorn!filbo,sco!belal}
R Pentomino    *   \X/  Filbo @ Pyrzqxgl +408-476-4633 and XBBS +408-476-4945

andrewt%watsnew.waterloo.edu@cunyvm.cuny.edu (12/06/89)

> > Then compare wildcard operations.  MS-DOS must still read the entire
> > directory, which is still only a few (probably contiguous) sectors.
>
> Here MS-DOS would read 9 files for my C: directory, AmigaDOS would read
> 133.  That's a definite loss to AmigaDOS, even though it took much less time
> to load the directory program in the first place (I'm talking real life
> here; I refuse to use the built-in Dir command under MS-DOS, so I use a
> UNIX-alike LS which must be loaded from disk, same as the Amiga command).
>

It seems to me that the single most common interactive file system
operation is to find a directory listing.  In second place is the use
of wildcards in file names, and running a very close third is the use
of filename completion for any shell that has it (you may disagree
with my rankings).  Under the above example, you lose bigtime with
Amiga's file system, as all of these examples are equivalent to
wildcarding.  I can't complain about finding my commands quickly, but
any shell that hashes its path will speed that up after the first time
anyway.  Why is AmigaDOS castrating its interactive use in favour of
fast command lookup, when the fast command lookup will only affect the
initial use of a command?

Is it possible to have both?  Could the file system store a secondary
file containing only a set of file names?  Even an ascii file would be
fine.  It would require more work for file creation, deletion, and
moving, but it would massively speed up wildcard activities.  It could
be there as a secondary file to the filesystem so losing it would not
be important.  A simple chkdsk-like program could be used to recreate
this file if it became corrupt by finding each file the hard way.

The only other thing you have to do is keep this file out of reach of
programmers.

The disk usage is not really a problem.  With 15 files to a sector, a
disk with 500 files on it will only need 8K to store this information.
I would happily make this trade for the speed it would provide.

Is there something basically wrong with this scheme?  It really sounds
like a horrible kluge in some ways, but it's an issue that needs to be
addressed.
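The scheme above, with its chkdsk-style recovery pass, might look like this sketch (everything here is hypothetical; nothing like it exists in AmigaDOS, and the dict merely stands in for file header blocks scattered across the disk):

```python
def rebuild_index(directory_blocks):
    # The chkdsk-like recovery pass: visit every header the hard way
    # and emit the flat name list the fast path will use afterwards.
    return sorted(directory_blocks.keys())

def wildcard_prefix(index, prefix):
    # Wildcard queries now touch only the small contiguous index,
    # not one scattered header block per file.
    return [n for n in index if n.lower().startswith(prefix.lower())]

headers = {"Ed": 880, "Edit": 881, "List": 123, "Dir": 124}
index = rebuild_index(headers)
print(wildcard_prefix(index, "ed"))   # -> ['Ed', 'Edit']
```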

--

Andrew Thomas
andrewt@watsnew.waterloo.edu    Systems Design Eng.     University of Waterloo
"If a million people do a stupid thing, it's still a stupid thing." - Opus
