[comp.sys.amiga.tech] AmigaDos directory knowledge

filbo@gorn.santa-cruz.ca.us (Bela Lubkin) (11/28/89)

[note that I have directed followups to the .tech group]

In article <832@lpami.wimsey.bc.ca> Larry Phillips writes:
>The big advantage of the Amiga filing
     ^^^^^^^^^^^^^
>system is that a fully known filename is found VERY quickly. The reason for
>this is that filenames are hashed into tables in the directory blocks, and can
>be found within a (relatively) few seeks, simply by looking in the appropriate
>header and offset within that header to find the pointer to the file. Hash
>'collisions' are handled by chaining from one directory entry to the next, but
>even then, there are relatively few reads to do.

Oh, come on.  This is and always has been bogus.  Even MS-DOS
does far better than AmigaDOS in this area.  In the case of a known filename,
MS-DOS reads the directory (which is effectively a single small binary file),
searches it for the named file, and is done.  Since each MS-DOS directory
entry is 32 bytes, and a sector is 512 bytes, 16 entries fit in a sector; it
must read (n/16) sectors to find a file, where n=the position of the desired
file within the directory.  In general that means it has to read 1-2 sectors
for MOST files.  It would have to read 5 sectors to find ANY file in my Amiga
C: directory (70 files); 2 for my System: directory (31 files).  Those sectors
would almost always be physically contiguous.  Compare this to AmigaDOS, which
must read an arbitrary number of sectors (often 1, but sometimes more), which
are virtually guaranteed to be all over the disk.  Then compare wildcard
operations.  MS-DOS must still read the entire directory, which is still only
a few (probably contiguous) sectors.  AmigaDOS must read one sector for
each file in the directory, probably scattered all over the disk.  I don't
like this tradeoff in the least.
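
In code, the arithmetic I'm talking about is just this (my own directory
sizes plugged in; a back-of-the-envelope sketch, not real filesystem code):

    #include <stdio.h>

    int main(void)
    {
        int files = 70;                    /* e.g. my C: directory      */
        int per_sector = 512 / 32;         /* 16 MS-DOS entries/sector  */
        int worst = (files + per_sector - 1) / per_sector;

        printf("worst case %d sectors, average about %d\n",
               worst, (worst + 1) / 2);
        return 0;
    }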

This is a case of overblown data structures.  I'd take a monolithic directory
any day.  In AmigaDOS' case, this would probably involve keeping the current
directory blocks, but adding a true index for each directory -- a file that
has nothing in it but filenames and the block numbers of their directory
blocks.  This could be a file of 34-byte records (30 characters + a longword
block number).  It would probably be best to put 15 of these in a sector and
use the extra two bytes for something else, rather than starting a 16th entry
there and carrying it over into the next sector, to make it easier to
reconstruct a damaged file system.
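
In C, the index record I have in mind would look something like this (the
names are mine, purely illustrative):

    /* One entry of the proposed per-directory index file: 30-character
       name plus a longword block number = 34 bytes, so 15 fit in a
       512-byte sector with 2 bytes left over.  A 68000 compiler packs
       this to 34 bytes since offset 30 is already word-aligned.        */
    struct IndexEntry {
        char          ie_Name[30];      /* filename, NUL-padded           */
        unsigned long ie_HeaderBlock;   /* block # of its directory block */
    };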

The other argument I've heard for AmigaDOS' directory structure is "unlimited
directory size", which is also bogus.  Nothing limits any directory under
MS-DOS, >except< the root directory.  All others are just files which are
treated like any other file.  So the root directory is limited?  Well, that's the
choice MicroSoft made with MS-DOS.  Needn't be that way.  Just define a
"super-root" directory which is not directly visible to the user and which
has only one entry -- a pointer to the actual root directory, which is a
normal directory file and can grow to whatever size is needed.

[I speak of MS-DOS for two reasons: I know it well; and it uses SIMPLE
solutions which happen to work pretty well.  Perhaps in 6 months I'd be
arguing for some form of UNIX file system or something, but I don't know
it well enough yet.  Actually, what I suggest sounds suspiciously like
inodes, in a twisted kind of way.]

Bela Lubkin    * *    //  filbo@gorn.santa-cruz.ca.us  CI$: 73047,1112 (slow)
     @       * *     //  belal@sco.com  ..ucbvax!ucscc!{gorn!filbo,sco!belal}
R Pentomino    *   \X/  Filbo @ Pyrzqxgl +408-476-4633 and XBBS +408-476-4945

limonce@pilot.njin.net (Tom Limoncelli) (11/29/89)

On my MS-DOS system, my path has about 10 entries.  On my Amiga
system, my path has about 10 entries.  About the same number of files
are searched, yet the Amiga does fewer seeks to the disk.  (Diskbuffers
flushed, of course.)  Why is this?  Because the Amiga is doing O(N)
seeks with hashing, while MS-DOS does O(N*M/Z) sector reads.

(N = number of directories; M= number of files per directory; Z =
number of directory entries per sector)
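
Plugging in rough (made-up, but typical) numbers: with N=10 directories of
M=50 files each and Z=16 entries per sector, MS-DOS reads on the order of
10*50/16 = ~31 sectors for a full path search, while the hashed lookups
cost roughly two reads per directory, or about 20 -- and the gap only
widens as M grows.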

It balances out and the Amiga wins.  Also, the Amiga's FFS caches
side-sector and directory information better than MS-DOS.  I've been
told that FFS is modeled after the Berkeley Fast File System and has
been benchmarked to be the fastest FS when judged with similar
hardware.

Why do you think MicroSoft is pushing their "new" high-performance
file system for OS/2?  There is a LOT of room for improvement over
MS-DOS.

This has been debated over and over here on comp.sys.amiga.

It comes down to this (IMHO): when searching 1 directory for a couple of
files, sure, the MS-DOS FS is a win.  But when you are dealing with lots
of files, FFS is the winner.  Remember, a bubble sort can be faster
than a QuickSort if you have a very small number of items.  A bubble
sort re-written in tight-coded assembler is still O(N^2) and is still
a bubble sort.

[ Insert favorite computer scientist vs. engineer joke here :-) ]

-Tom
-- 
 Tom Limoncelli -- limonce@pilot.njin.net -- tlimonce@drunivac.bitnet
 rutgers!njin!tlimonce -- Drew University, Madison, NJ -- 201-408-5389

daveh@cbmvax.UUCP (Dave Haynie) (11/29/89)

in article <88.filbo@gorn.santa-cruz.ca.us>, filbo@gorn.santa-cruz.ca.us (Bela Lubkin) says:

> In article <832@lpami.wimsey.bc.ca> Larry Phillips writes:
>>The big advantage of the Amiga filing
>      ^^^^^^^^^^^^^
>>system is that a fully known filename is found VERY quickly. The reason for
>>this is that filenames are hashed into tables in the directory blocks, and can
>>be found within a (relatively) few seeks

> Oh, come on.  This is and always has been bogus.  Even MS-DOS
> does far better than AmigaDOS in this area.  

Huh? Have you actually _used_ an MS-DOS system with paths set up similar to
the way an Amiga normally is? I have about 10x as many commands on my Amiga
path as on my MS-DOS path (AT Bridge Card).  There's a noticeable wait for
any path search on the MS-DOS side.  There isn't on the Amiga.  That's what
I consider the bottom line.  But onward... 

> In the case of a known filename, MS-DOS reads the directory (which is effectively 
> a single small binary file),

Actually several sectors, since the directory very often (in fact, I
can think of very few counterexamples in actual use) has far more than 16
files. 

> Since each MS-DOS directory entry is 32 bytes, and a sector is 512 bytes, 16 
> entries fit in a sector; it must read (n/16) sectors to find a file..

So we'd have to read 9 sectors, worst case, to get an arbitrary file in my
C: directory of 132 files.  And perform up to 131 string comparisons.  The
midpoint would be 4 sectors and 65 string comparisons. 

On the Amiga, I'd have to read at least 2 sectors, the directory and the
file block.  I'd have to do one longword comparison to check for a hash
chain collision, and a string compare and possible sector read for each
hash chain entry until I find the file.  For my 132 entry directory,
there are likely to be only one or two entries on each hash chain, given the
hashtable size of 72, so the average file would be found in no more than 3
sector reads, a string comparison, and 2 longword comparisons.  If there were 4
hash collisions in any hash table entry, that would grow to a worst case of
5 sector reads, 4 string comparisons, and 5 longword comparisons. 
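
For reference, the hash in question works (from memory -- check the RKM
before depending on it) roughly like this:

    #include <ctype.h>
    #include <string.h>

    /* Hash a filename into one of the 72 hash-table slots of a
       512-byte AmigaDOS directory block.                        */
    int DirHash(const char *name)
    {
        unsigned long hash = strlen(name);
        int i;

        for (i = 0; name[i]; i++)
            hash = (hash * 13 + toupper((unsigned char)name[i])) & 0x7FF;

        return (int)(hash % 72);
    }

One read of the directory block, index into that slot, and you have the
block number of the first file header on the chain.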

> Those sectors would almost always be physically contiguous.  

On an empty hard disk, sure.  But even then, you can't have it both ways. 
If your file is contiguous with the directory header, and you add a bunch of
new stuff to that directory, something's not contiguous anymore.  Either
you have to uproot all the files or you're running all over the place to
collect those directory blocks. 

> Compare this to AmigaDOS, which must read an arbitrary number of sectors (often 1, 
> but sometimes more), which are virtually guaranteed to be all over the disk.  

No they're not.  Have you ever actually looked at how the file system (FFS
especially) allocates things on the disk? I have, extensively.  Try running
DiskSalv over your hard disk sometime; things are nicely grouped.  And
also, no matter how badly fragged you get, hash collisions are linked in
sorted order, so you never jump all over the place, as you imply, to find a
named file. 

> Then compare wildcard operations.  MS-DOS must still read the entire directory, which is 
> still only a few (probably contiguous) sectors.  

Here MS-DOS would read 9 sectors for my C: directory, AmigaDOS would read
133.  That's a definite loss to AmigaDOS, even though it took much less time
to load the directory program in the first place (I'm talking real life
here; I refuse to use the built-in Dir command under MS-DOS, so I use a 
UNIX-alike LS which must be loaded from disk, same as the Amiga command).

> AmigaDOS must read one sector for each file in the directory, probably scattered all 
> over the disk.  I don't like this tradeoff in the least.

Again, the Amiga files aren't scattered all over the disk.  Do you really
think they just allocate things at random?  Sure, the disk would support
blocks scattered all over, just as any filesystem that doesn't choke when
nearing full would have to do.  But even on my 70% full 80 meg disk, I don't 
see any problems like those you seem to believe in.

The other thing you get from the Amiga FFS (even better under SFS) is ease
of reconstruction.  You can completely rebuild the entire disk structure
even if every directory in the system were destroyed.  There's no way to
lose more than one file with a single damaged block, and actually you'd
only lose part of that file in many cases (if you destroy the file block
header).  Under SFS, there's no way to lose any more than 1 block in a
file with a randomly damaged sector.  You've already told me I can put a
serious hurtin' on 16 files with one random sector blasted away.  What
happens if I destroy the root directory?

> The other argument I've heard for AmigaDOS' directory structure is "unlimited
> directory size", which is also bogus.  Nothing limits any directory under
> MS-DOS, >except< the root directory.  

Nothing limits the size of an FFS directory except the hard disk.  I think
we win here.  Microsoft made lots of bad decisions for MS-DOS, but the worst
was that they didn't have the foresight back then to build a system that let
these things be fixed later very compatibly.  You can yell all day about What
Could Have Been, but most folks are far more concerned with What Actually Is
and What Yet Will Be.  Though they say that the 35 meg, or whatever, limit on 
hard disks is going Real Soon Now.

Good thing about AmigaDOS is that you can pick the filesystem handler you
want.  Can't do that under MS-DOS.  So you can use the CrossDOS filesystem
for your hard disk instead of FFS.  It won't be as fast, but you'll
probably be happy with it...

> Bela Lubkin    * *    //  filbo@gorn.santa-cruz.ca.us  CI$: 73047,1112 (slow)
-- 
Dave Haynie Commodore-Amiga (Systems Engineering) "The Crew That Never Rests"
   {uunet|pyramid|rutgers}!cbmvax!daveh      PLINK: hazy     BIX: hazy
                    Too much of everything is just enough

cmcmanis%pepper@Sun.COM (Chuck McManis) (11/29/89)

In article <87.filbo@gorn.santa-cruz.ca.us> (Bela Lubkin) writes:
>Oh, come ois and always has been bogus.  Even MS-DOS
>does far better than AmigaDOS in this area.  In the case of a known filename,
>MS-DOS reads the directory (which is effectively a single small binary file),
>searches it for the named file, and is done.

Do your homework, Bela: MS-DOS does a linear search of the directory structure
and several searches when it is looking through nested directories. Linear
structures work fine for small things (floppies come to mind) but don't 
scale well at all to big things like hard disks. Fill up a 20Meg hard disk
with files in directories and subdirectories and benchmark MS-DOS against
AmigaDOS when finding random files in the name space. Then fill up a 30Meg
disk, if you have DOS 3.3 or higher try it on a 40Meg disk and then an 80
meg disk. You will find that MS-DOS is getting slower and slower whereas
the time for AmigaDOS is constant. Arguing the superiority of MS-DOS using
a floppy is like arguing the superiority of AmigaDOS using a 300Meg hard 
drive. The boundary conditions are uninteresting. If you have one file system
for all disk devices in the system, what do you optimize for? I would argue
that the world is backwards, since you can hardly find an MS-DOS system that
doesn't have a hard drive, and yet lots of Amigas don't. The MS-DOS filesystem
was born of the CP/M file system and it was built for floppies. Great, the 
AmigaDOS file system was born of TriPOS which was built for Winchester drives.
The heritage shows through. Fortunately, the Amiga's file systems are loadable
so you can use one file system for floppies and one for harddisks. Now isn't
that special?

Of course the Amiga file system isn't as well suited to wildcarding as a 
linear search is. Of course if you benchmark which is used more, Open() or
ExNext(), you might be surprised to find there is a bigger win optimizing 
Open() than there is ExNext().  My opinion, of course.
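
To make that concrete with rough numbers (mine, not benchmarks): an Open()
of a known name on FFS costs on the order of two block reads -- directory
block plus file header, plus one more per hash collision -- no matter how
big the directory is.  An ExNext() sweep of a 100-file directory costs
roughly 101 block reads, against maybe 7 sectors for a linear MS-DOS
directory.  Which hurts more depends entirely on how often your programs
do each.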


--Chuck McManis
uucp: {anywhere}!sun!cmcmanis   BIX: cmcmanis  ARPAnet: cmcmanis@Eng.Sun.COM
These opinions are my own and no one elses, but you knew that didn't you.
"If it didn't have bones in it, it wouldn't be crunchy now would it?!"

andrewt@watsnew.waterloo.edu (Andrew Thomas) (11/30/89)

> > Then compare wildcard operations.  MS-DOS must still read the entire directory, which is 
> > still only a few (probably contiguous) sectors.  
> 
> Here MS-DOS would read 9 sectors for my C: directory, AmigaDOS would read
> 133.  That's a definite loss to AmigaDOS, even though it took much less time
> to load the directory program in the first place (I'm talking real life
> here; I refuse to use the built-in Dir command under MS-DOS, so I use a 
> UNIX-alike LS which must be loaded from disk, same as the Amiga command).
> 

It seems to me that the single most common interactive file system
operation is to find a directory listing.  In second place is the use
of wildcards in file names, and running a very close third is the use
of filename completion for any shell that has it (you may disagree
with my rankings).  Under the above example, you lose bigtime with
Amiga's file system, as all of these operations are equivalent to
wildcarding.  I can't complain about finding my commands quickly, but
any shell that hashes its path will speed that up after the first time
anyway.  Why is AmigaDOS castrating its interactive use in favour of
fast command lookup, when the fast command lookup will only affect the
initial use of a command?

Is it possible to have both?  Could the file system store a secondary
file containing only a set of file names?  Even an ascii file would be
fine.  It would require more work for file creation, deletion, and
moving, but it would massively speed up wildcard activities.  It could
be there as a secondary file to the filesystem so losing it would not
be important.  A simple chkdsk-like program could be used to recreate
this file, by finding each file the hard way, if it ever became corrupt.
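
The rebuild program is nearly trivial -- something like this sketch
(1.3-style dos.library calls, error handling stripped, and the names are
made up):

    #include <exec/types.h>
    #include <exec/memory.h>
    #include <libraries/dos.h>
    #include <proto/exec.h>
    #include <proto/dos.h>
    #include <stdio.h>

    /* Walk a directory with Examine()/ExNext() -- one disk read per
       entry, which is exactly the cost being complained about -- and
       write one filename per line to the cache file.                 */
    void RebuildNameCache(char *dirname, char *cachefile)
    {
        BPTR lock = Lock(dirname, ACCESS_READ);
        struct FileInfoBlock *fib;
        FILE *out;

        if (!lock)
            return;
        fib = (struct FileInfoBlock *)
              AllocMem(sizeof(struct FileInfoBlock), MEMF_PUBLIC | MEMF_CLEAR);
        out = fopen(cachefile, "w");
        if (fib && out && Examine(lock, fib))
            while (ExNext(lock, fib))
                fprintf(out, "%s\n", fib->fib_FileName);
        if (out)
            fclose(out);
        if (fib)
            FreeMem(fib, sizeof(struct FileInfoBlock));
        UnLock(lock);
    }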

The only other thing you have to do is keep this file out of reach of
programmers.

The disk usage is not really a problem.  With 15 files to a sector, a
disk with 500 files on it will only need about 17K to store this information.
I would happily make this trade for the speed it would provide.

Is there something basically wrong with this scheme?  It really sounds
like a horrible kluge in some ways, but it's an issue that needs to be
addressed.

--

Andrew Thomas
andrewt@watsnew.waterloo.edu	Systems Design Eng.	University of Waterloo
"If a million people do a stupid thing, it's still a stupid thing." - Opus

barrett@jhunix.HCF.JHU.EDU (Dan Barrett) (12/01/89)

	I don't really understand all the concern about the speed of
directory listings and wildcards, at least for hard drives. I have a slow
(65ms access time) hard drive, with diskperf results like 140Kb/sec, and I
find directory listings to be of acceptable speed.  A 100-file directory
listing is quite quick.  (I use "ls" version 3.1.)
	And finding a particular program in my 8-directory, ~400-file
search path is instantaneous.

                                                        Dan

 //////////////////////////////////////\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\
| Dan Barrett     -      Systems Administrator, Computer Science Department |
| The Johns Hopkins University, 34th and Charles Sts., Baltimore, MD  21218 |
| INTERNET:   barrett@cs.jhu.edu           | UUCP:   barrett@jhunix.UUCP    |
| COMPUSERVE: >internet:barrett@cs.jhu.edu | BITNET: barrett@jhuvms.bitnet  |
 \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\/////////////////////////////////////

usenet@cps3xx.UUCP (Usenet file owner) (12/01/89)

In article <3459@jhunix.HCF.JHU.EDU> barrett@jhunix.UUCP (Dan Barrett) writes:
>
>	I don't really understand all the concern about the speed of
>directory listings and wildcards, at least for hard drives. I have a slow

Try doing a "dir df0:c" with your workbench disk in the
floppy drive....

Now do you understand???
 Joe Porkka   porkka@frith.egr.msu.edu

morgan@camtwh.UUCP (Morgan W. Jones) (12/02/89)

In article <ANDREWT.89Nov29154933@watsnew.waterloo.edu> andrewt@watsnew.waterloo.edu (Andrew Thomas) writes:
>with my rankings).  Under the above example, you lose bigtime with
>Amiga's file system, as all of these examples are equivalent to
>wildcarding.  I can't complain about finding my commands quickly, but
>any shell that hashes its path will speed that up after the first time
>anyway.  Why is AmigaDOS castrating its interactive use in favour of
>fast command lookup, when the fast command lookup will only affect the
>initial use of a command?

You see a saving on command lookup for each and every invocation,
and caching has nothing to do with this.

The purpose of caching is so that you don't have to scan the whole
filesystem (or at least those directories listed in your path) every
time that you execute your command.  Once the command has been found and
executed, the shell remembers the full path to it, but the operating
system still has to look the command up in each directory along that
path to find the inode (or file header on the amiga) for the file.

Where the amiga gains is in the directory search.  MessyDos and Unix
have to do a linear search on the directory (which for large
directories may involve several disk accesses) whereas the amiga does
a hashed lookup (which is usually one or two disk accesses).  If the
path to the file is several levels deep, the number of disk accesses
would be significantly reduced.
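
As a rough example (the path is made up): opening dh0:work/src/foo.c takes
three directory lookups either way, but each one is a couple of block
reads under the hashed scheme regardless of directory size, versus a
linear scan of each directory -- possibly many sectors -- under MessyDos.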


All this aside, though, I do agree that the minor (and on a HD
insignificant) improvement in direct access time does not justify the
horrifically slow directory listings.
-- 
Morgan W. Jones					(morgan@camtwh)
Orama Incorporated, Toronto, Canada.		(416) 369-5088

new@udel.edu (Darren New) (12/02/89)

In article <5615@cps3xx.UUCP> porkka@frith.UUCP (Joe Porkka) writes:
>In article <3459@jhunix.HCF.JHU.EDU> barrett@jhunix.UUCP (Dan Barrett) writes:
>>	I don't really understand all the concern about the speed of
>>directory listings and wildcards, at least for hard drives. I have a slow
>Try doing a "dir df0:c" with your workbench disk in the
>floppy drive....
>Now do you understand???
> Joe Porkka   porkka@frith.egr.msu.edu

Something else I don't understand: 
The disk should be able to read about 5 tracks/second if you look only
at the revolutions.  Yet when doing something like DiskCopy, the
disk takes more than one second/track to read or write.  What's the
slowdown?  Is the blitter and system overhead really four times the raw read time?
	       -- Darren

ccplumb@rose.waterloo.edu (Colin Plumb) (12/04/89)

In article <5141@nigel.udel.EDU> new@udel.edu (Darren New) writes:
> Something else I don't understand: 
> The disk should be able to read about 5 tracks/second if you look only
> at the revolutions.  Yet when doing something like DiskCopy, the disk
> takes more than one second/track to read or write.  What's the slowdown?
> Is the blitter and system overhead really four times the raw read time?

First of all, you have to read about 1.1 revolutions to be sure of getting
every sector on the track.  Then remember that you need to read both
top and bottom surfaces, and write the worst-case (slowest rotation speed)
track, which is a bit under 1.1 nominal tracks.

That adds up to 4.4 rotations to copy a cylinder, and a 90mm floppy has 80
cylinders: 352 rotations at 5 revolutions per second comes to 70.4 seconds.
The fastest copy program I know takes 69 seconds, so it's pretty close.  It,
I believe, reads one track while processing the previous one, so it's always
reading or writing.  This hides essentially all the blitter overhead.  But
Diskcopy isn't quite as smart.
-- 
	-Colin

rar@auc.UUCP (Rodney Ricks) (12/05/89)

In article <5615@cps3xx.UUCP> porkka@frith.UUCP (Joe Porkka) writes:
>In article <3459@jhunix.HCF.JHU.EDU> barrett@jhunix.UUCP (Dan Barrett) writes:
>>
>>	I don't really understand all the concern about the speed of
>>directory listings and wildcards, at least for hard drives. I have a slow
>
>Try doing a "dir df0:c" with your workbench disk in the
>floppy drive....

Actually, that is not a good test.  There is another reason why the Amiga's
dir command seems so slow.

If my memory serves me correctly, the Amiga's dir command reads in a list of
all of the files in the directory, then sorts them, THEN it prints them out.
The MSDOS dir prints out what it gets to as it gets to it.

So the Amiga's dir command SEEMS much slower because it doesn't start printing
to the screen until it has seen ALL of the files in a directory.

Use the list command instead (I do).  It seems much faster than the dir
command in that, like the MSDOS dir command, it lists the files as it gets
to them, instead of after it has read and sorted them.

> Joe Porkka   porkka@frith.egr.msu.edu

Rodney Ricks
-- 
"We may have come over here in different ships,
 but we're all in the same boat now."   --   Jesse Jackson                   //
                                                                       \\  //
Rodney Ricks,   Morehouse College                                        \/