hamilton@intersil.uucp (Fred Hamilton) (11/26/89)
I just remembered a question I've had since WorkBench 1.0.  When you type
a command like:

   List c:edd    ;(A typo, for instance)

You get an (almost) instant response of "object not found".
But if you type:

   List c:ed#?

The OS thrashes through the entire directory before giving you the list.

What's happening here?  It's like the OS is saying "Well, I'm not sure
what EXACTLY is in the c: directory, but I KNOW there isn't a program called
'Edd.'"  It seems to my feeble mind that for it to know what's NOT in the
directory without a long search, it has to already know what IS in the
directory.  What's happening here?
--
Fred Hamilton           Any views, comments, or ideas expressed here
Harris Semiconductor    are entirely my own.  Even good ones.
Santa Clara, CA
lphillips@lpami.wimsey.bc.ca (Larry Phillips) (11/26/89)
In <291@intersil.uucp>, hamilton@intersil.uucp (Fred Hamilton) writes:
>I just remembered a question I've had since WorkBench 1.0.  When you type
>a command like:
>
>   List c:edd    ;(A typo, for instance)
>
>You get an (almost) instant response of "object not found".
>But if you type:
>
>   List c:ed#?
>
>The OS thrashes through the entire directory before giving you the list.
>
>What's happening here?  It's like the OS is saying "Well, I'm not sure
>what EXACTLY is in the c: directory, but I KNOW there isn't a program called
>"Edd."  It seems to my feeble mind that for it to know what's NOT in the
>directory without a long search, it has to already know what IS in the
>directory.  What's happening here?

Yes, this is exactly what happens.  The big advantage of the Amiga filing
system is that a fully known filename is found VERY quickly.  The reason for
this is that filenames are hashed into tables in the directory blocks, and
can be found within a (relatively) few seeks, simply by looking in the
appropriate header, and at the offset within that header, to find the
pointer to the file.  Hash 'collisions' are handled by chaining from one
directory entry to the next, but even then, there are relatively few reads
to do.

When even one character is unknown, there can be no hashing of the name, and
the only way the file system can find a match is by following ALL the table
entries in the directory until it finds the match, and then continuing on,
because it has no way of knowing if that's the only match.  To find out
there is not a file named 'edd', when you specify it fully, might take at
most 10 or 15 seek/reads on a fairly large directory, while specifying it as
'ed#?' will take the same number of seek/reads as there are files and
subdirectories in C:.

-larry

--
"All I ask of my body is that it carry around my head."
                         - Thomas Alva Edison -
+-----------------------------------------------------------------------+
|   //   Larry Phillips                                                 |
| \X/    lphillips@lpami.wimsey.bc.ca -or- uunet!van-bc!lpami!lphillips |
|        COMPUSERVE: 76703,4322  -or-  76703.4322@compuserve.com        |
+-----------------------------------------------------------------------+
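Larry's description corresponds to the widely documented AmigaDOS directory
hash; a minimal Python sketch, assuming the standard 512-byte directory block
with a 72-slot hash table (the exact constants are from the published OFS/FFS
algorithm, not from this thread):

```python
# Sketch of the AmigaDOS directory-name hash described above.
def amigados_hash(name: str, table_size: int = 72) -> int:
    h = len(name)
    for ch in name:
        h = (h * 13 + ord(ch.upper())) & 0x7FF   # upper-case: names are case-insensitive
    return h % table_size                        # slot index in the directory block

# A fully specified name maps straight to one hash-chain slot:
slot = amigados_hash("edd")

# A pattern like "ed#?" cannot be hashed at all, so the file system must
# walk the chain hanging off every one of the 72 slots.
```

This is why the exact-name case is a handful of reads while the wildcard case
scales with the number of entries in the directory.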
peter@cbmvax.UUCP (Peter Cherna) (11/27/89)
In article <291@intersil.uucp> hamilton@intersil.uucp (Fred Hamilton) writes:
>I just remembered a question I've had since WorkBench 1.0.  When you type
>a command like:
>
>   List c:edd    ;(A typo, for instance)
>
>You get an (almost) instant response of "object not found".
>But if you type:
>
>   List c:ed#?
>
>The OS thrashes through the entire directory before giving you the list.
>
>What's happening here?

If you ask for a file by name, the OS looks for it by name, and through the
magic of hash tables very quickly finds out where that file is, or
alternately that that file isn't.

If you ask for a file by pattern, the OS has no better way than to check
each possible file for a match.

Imagine if I asked you to tell me if there was anybody in the phone book
with the last name of "Cherna".  How long would it take you?  Now I'll ask
you how many people in the phone book have last names which end in "erna".
Let me know when you're done.

--
     Peter Cherna, Software Engineer, Commodore-Amiga, Inc.
     {uunet|rutgers}!cbmvax!peter    peter@cbmvax.cbm.commodore.com
My opinions do not necessarily represent the opinions of my employer.
"A friend of mine is into Voodoo Acupuncture.  You don't have to go.
You'll just be walking down the street and ..... oooohhh, that's much
better..."  - Steven Wright
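The phone-book analogy is exactly the difference between a keyed lookup and a
linear scan; a hypothetical sketch (the names and block numbers are made up,
a Python dict stands in for the hash table, and `fnmatch`'s `*` stands in for
AmigaDOS's `#?`):

```python
import fnmatch

# Hypothetical directory: name -> header block number.
directory = {"ed": 880, "list": 881, "dir": 882}

# Exact name: one hashed lookup answers "is it there?" immediately.
present = "edd" in directory           # False, known without any scan

# Pattern: no choice but to test every entry, one by one.
matches = [n for n in directory if fnmatch.fnmatch(n, "ed*")]
```

The first test costs the same no matter how big the directory is; the second
costs one comparison (and on disk, potentially one seek) per entry.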
billsey@agora.UUCP (Bill Seymour) (11/28/89)
From article <291@intersil.uucp>, by hamilton@intersil.uucp (Fred Hamilton):
: I just remembered a question I've had since WorkBench 1.0.  When you type
: a command like:
:
:    List c:edd    ;(A typo, for instance)
:
: You get an (almost) instant response of "object not found".
: But if you type:
:
:    List c:ed#?
:
: The OS thrashes through the entire directory before giving you the list.
:
: What's happening here?

	When you have the system look for a specific file (no wildcards) it
will use the hash value for that file in order to find the header block.  If
there is no header at that hash, it kicks back very quickly.  If you use a
wildcard, it has to search through each and every file header looking for
names that match the pattern.

--
-Bill Seymour           ...tektronix!reed!percival!agora!billsey
                        ...tektronix!sequent.UUCP!calvin!billsey
Bejed, Inc.       NES, Inc.       Northwest Amiga Group    At Home Sometimes
(503) 691-2552    (503) 246-9311  (503) 656-7393 BBS       (503) 640-0842
filbo@gorn.santa-cruz.ca.us (Bela Lubkin) (11/28/89)
[note that I have directed followups to the .tech group]

In article <832@lpami.wimsey.bc.ca> Larry Phillips writes:
>The big advantage of the Amiga filing
                      ^^^^^^^^^^^^^
>system is that a fully known filename is found VERY quickly. The reason for
>this is that filenames are hashed into tables in the directory blocks, and can
>be found within a (relatively) few seeks, simply by looking in the appropriate
>header and offset within that header to find the pointer to the file. Hash
>'collisions' are handled by chaining from one directory entry to the next, but
>even then, there are relatively few reads to do.

Oh, come on.  The Amiga filing system is and always has been bogus.  Even
MS-DOS does far better than AmigaDOS in this area.  In the case of a known
filename, MS-DOS reads the directory (which is effectively a single small
binary file), searches it for the named file, and is done.  Since each
MS-DOS directory entry is 32 bytes, and a sector is 512 bytes, 16 entries
fit in a sector; it must read (n/16) sectors to find a file, where n=the
position of the desired file within the directory.  In general that means
it has to read 1-2 sectors for MOST files.  It would have to read 5 sectors
to find ANY file in my Amiga C: directory (70 files); 2 for my System:
directory (31 files).  Those sectors would almost always be physically
contiguous.  Compare this to AmigaDOS, which must read an arbitrary number
of sectors (often 1, but sometimes more), which are virtually guaranteed to
be all over the disk.

Then compare wildcard operations.  MS-DOS must still read the entire
directory, which is still only a few (probably contiguous) sectors.
AmigaDOS must read one sector for each file in the directory, probably
scattered all over the disk.  I don't like this tradeoff in the least.

This is a case of overblown data structures.  I'd take a monolithic
directory any day.
In AmigaDOS' case, this would probably involve keeping the current directory
blocks, but adding a true index for each directory -- a file that has
nothing in it but filenames and the block numbers of their directory blocks.
This could be a file of 34-byte records (30 characters + a longword block
number).  It would probably be best to put 15 of these in a sector and use
the extra two bytes for something else, rather than starting a 16th entry
there and carrying it over to the 2nd sector, to make it easier to
reconstruct a damaged file system.

The other argument I've heard for AmigaDOS' directory structure is
"unlimited directory size", which is also bogus.  Nothing limits any
directory under MS-DOS, >except< the root directory.  All others are just
files which are treated like any other file.  So the root directory is
limited?  Well, that's the choice MicroSoft made with MS-DOS.  Needn't be
that way.  Just define a "super-root" directory which is not directly
visible to the user and which has only one entry -- a pointer to the actual
root directory, which is a normal directory file and can grow to whatever
size is needed.

[I speak of MS-DOS for two reasons: I know it well; and it uses SIMPLE
solutions which happen to work pretty well.  Perhaps in 6 months I'd be
arguing for some form of UNIX file system or something, but I don't know it
well enough yet.  Actually, what I suggest sounds suspiciously like inodes,
in a twisted kind of way.]

Bela Lubkin    * *   //  filbo@gorn.santa-cruz.ca.us  CI$: 73047,1112 (slow)
             @ * *  //  belal@sco.com  ..ucbvax!ucscc!{gorn!filbo,sco!belal}
R Pentomino   * \X/  Filbo @ Pyrzqxgl +408-476-4633 and XBBS +408-476-4945
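Bela's proposed 34-byte index record can be sketched with a packed structure
(the big-endian layout and the `pack_entry` helper are assumptions for
illustration, not anything AmigaDOS actually defines):

```python
import struct

# One index record: a 30-byte name field plus a longword block number.
RECORD = struct.Struct(">30sL")        # 30 + 4 = 34 bytes, big-endian

def pack_entry(name: str, block: int) -> bytes:
    # NUL-pad the name out to the fixed 30-byte field.
    return RECORD.pack(name.encode("ascii").ljust(30, b"\0"), block)

entry = pack_entry("ed", 880)

# 15 records fill 510 bytes of a 512-byte sector, leaving 2 spare bytes,
# so no record ever straddles a sector boundary -- which is the point of
# stopping at 15 rather than starting a 16th.
per_sector_bytes = 15 * RECORD.size    # 510
```

Keeping records sector-aligned is what makes a damaged index easy to
reconstruct: every sector is independently parseable.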
a186@mindlink.UUCP (Harvey Taylor) (11/29/89)
In Msg-ID: <89.filbo@gorn.santa-cruz.ca.us>, in c.s.a, Bela Lubkin writes:
In Msg-ID: <87.filbo@gorn.santa-cruz.ca.us>, in c.s.a, Bela Lubkin writes:
In Msg-ID: <88.filbo@gorn.santa-cruz.ca.us>, in c.s.a, Bela Lubkin writes:
In Msg-ID: <89.filbo@gorn.santa-cruz.ca.us>, in c.s.a.t, Bela Lubkin writes:
In Msg-ID: <87.filbo@gorn.santa-cruz.ca.us>, in c.s.a.t, Bela Lubkin writes:
In Msg-ID: <88.filbo@gorn.santa-cruz.ca.us>, in c.s.a.t, Bela Lubkin writes:
 Subject : Re: AmigaDos directory knowledge
 Posted: 28 Nov 89 10:10:25 GMT
 [about MSDOS & AmigaDOS FileSystems]

   Ummm, Bela, did you mean to post _6_ copies of that message?
   If not, is there something funny going on at your site?

   <-Harvey

"Success is the one unpardonable sin against one's fellows."  -Bierce
Harvey Taylor      Meta Media Productions
uunet!van-bc!rsoft!mindlink!Harvey_Taylor   a186@mindlink.UUCP
bleys@tronsbox.UUCP (bleys) (11/29/89)
In the first case, it's looking for 'e', followed by 'd', followed by 'd'. In the second case, it's looking for 'e', followed by 'd', followed by 'a', then 'e', followed by 'd', followed by 'b', and so on, and so on...
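The character-by-character comparison described above can be made concrete
with a toy matcher; this is a sketch handling only literal characters, `?`
(any one character), and `#?` (any run of characters), not the full AmigaDOS
pattern language:

```python
def amiga_match(pattern: str, name: str) -> bool:
    """Toy AmigaDOS-style matcher: literals, '?', and the '#?' wildcard."""
    if pattern.startswith("#?"):
        rest = pattern[2:]
        # '#?' swallows zero or more characters; try every split point.
        return any(amiga_match(rest, name[i:]) for i in range(len(name) + 1))
    if not pattern:
        return not name                 # both exhausted => match
    if not name:
        return False                    # pattern left over => no match
    if pattern[0] == "?" or pattern[0].lower() == name[0].lower():
        return amiga_match(pattern[1:], name[1:])
    return False
```

Even with a matcher this cheap, the file system still has to fetch every
entry's name before it can run the comparison -- that fetching, not the
matching, is where the time goes.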
ccplumb@rose.waterloo.edu (Colin Plumb) (11/30/89)
In article <88.filbo@gorn.santa-cruz.ca.us> filbo@gorn.santa-cruz.ca.us (Bela Lubkin) writes:
>[note that I have directed followups to the .tech group]

Yes, but they know all this stuff already...

>Oh, come on.  The Amiga filing system is and always has been bogus.  Even
>MS-DOS does far better than AmigaDOS in this area.  In the case of a known
>filename, MS-DOS reads the directory (which is effectively a single small
>binary file), searches it for the named file, and is done.  Since each
>MS-DOS directory entry is 32 bytes, and a sector is 512 bytes, 16 entries
>fit in a sector; it must read (n/16) sectors to find a file, where n=the
>position of the desired file within the directory.  [...]  Compare this to
>AmigaDOS, which must read an arbitrary number of sectors (often 1, but
>sometimes more), which are virtually guaranteed to be all over the disk.

AmigaDOS has to read an average of n/72+1 sectors, which actually beats
n/16, but the problem is that they're scattered.  MS-DOS can get scattered,
too, but because it never shrinks directories, the preallocation reduces the
effect.  FFS stores hash chains in sorted order, significantly reducing the
scattering.

>Then compare wildcard operations.  MS-DOS must still read the entire
>directory, which is still only a few (probably contiguous) sectors.
>AmigaDOS must read one sector for each file in the directory, probably
>scattered all over the disk.  I don't like this tradeoff in the least.

Even reading all the sectors isn't a problem; the problem is the OFS's bad
habit of reading the sectors in an arbitrary order with no thought
*whatsoever* to efficiency.  The FFS searches them in optimal order,
producing order-of-magnitude speedups.

>This is a case of overblown data structures.  I'd take a monolithic
>directory any day.  In AmigaDOS' case, this would probably involve keeping
>the current directory blocks, but adding a true index for each directory --
>a file that has nothing in it but filenames and the block numbers of their
>directory blocks.  [...]

Everyone keeps suggesting this, but there is a problem: there is no way to
ask the file system for this information.  There are any number of problems
with the ExNext call that I won't go into, except to mention that I'd like
to introduce the fellow who thought it up to the Spanish Inquisition, in its
heyday, but one is that there is no way to ask for less information than
that provided by "list," namely file name, size, protection bits, comment,
type, and DiskKey.  The ExNext call returns all this, and all the software
that reads directories uses ExNext, so to have a better file system produce
speedups with existing software, you have to be able to find all of this
information quickly.

>The other argument I've heard for AmigaDOS' directory structure is
>"unlimited directory size", which is also bogus.  Nothing limits any
>directory under MS-DOS, >except< the root directory.

And there's no good reason for that limitation.  No, AmigaDOS doesn't have
any monopoly on large directories.  But searching long paths, it does very
well on.  The n/72 beats MS-DOS's n/16 quite handily when you consider the
number of times you have to find a specific entry in a directory to find
"foo" in path x:a/b/c;y:bar/baz/quux;z:gurble/glorble/frobnitz/greeble/thlorp.
MS-DOS jumps all over the disk, too.  (If you use assign instead of putting
long strings into the path, AmigaDOS wins even bigger.  But that's an
orthogonal issue that could, in theory, be implemented on top of MS-DOS's
file system.)

>[I speak of MS-DOS for two reasons: I know it well; and it uses SIMPLE
>solutions which happen to work pretty well.  Perhaps in 6 months I'd be
>arguing for some form of UNIX file system or something, but I don't know
>it well enough yet.  Actually, what I suggest sounds suspiciously like
>inodes, in a twisted kind of way.]

Yes, the problem is when you do an ls -l.  Or better yet, open each file in
a directory.  There are all kinds of access patterns that require thought.
It is true that getting a list of all the names is a very common one that
should have more thought put into it, though.
--
	-Colin
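The n/16 and n/72+1 figures traded back and forth in this exchange are
simple arithmetic; a sketch (the 70- and 31-file counts are Bela's own
directories, and the function names are illustrative):

```python
import math

def msdos_scan_sectors(n_entries: int, per_sector: int = 16) -> int:
    """Sectors MS-DOS reads to scan a whole n-entry directory
    (32-byte entries, 512-byte sectors => 16 entries per sector)."""
    return math.ceil(n_entries / per_sector)

def amigados_avg_lookup(n_entries: int, slots: int = 72) -> float:
    """Average blocks read for an exact-name AmigaDOS lookup: the hash
    chain averages n/72 links, plus the directory block itself."""
    return n_entries / slots + 1

print(msdos_scan_sectors(70))    # 5 sectors for the 70-file C: directory
print(msdos_scan_sectors(31))    # 2 sectors for the 31-file System: directory
print(amigados_avg_lookup(70))   # just under 2 blocks for an exact name
```

So for exact names AmigaDOS reads fewer blocks on average; the catch, as
both posters note, is that its blocks are scattered while MS-DOS's are
usually contiguous.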
filbo@gorn.santa-cruz.ca.us (Bela Lubkin) (12/02/89)
In article <750@mindlink.UUCP> Harvey Taylor writes:
> Ummm, Bela, did you mean to post _6_ copies of that message?
> If not, is there something funny going on at your site?

Argh!!!!!!!!!!  Yes, there was something funny going on at my site.  I was
losing characters in messages -- the same ones each time, in fact.  I
canceled each of the messages, but the local software decided to send them
out anyway.  I've solved the character loss problem.  No idea on the
cancellation problem.  Hopefully I won't need to cancel articles in the
future.  :-(

Apologies to everyone who reads these groups for the multiple postings, lost
characters, and for this message which is 98% irrelevant.

Bela Lubkin    * *   //  filbo@gorn.santa-cruz.ca.us  CI$: 73047,1112 (slow)
             @ * *  //  belal@sco.com  ..ucbvax!ucscc!{gorn!filbo,sco!belal}
R Pentomino   * \X/  Filbo @ Pyrzqxgl +408-476-4633 and XBBS +408-476-4945
andrewt%watsnew.waterloo.edu@cunyvm.cuny.edu (12/06/89)
> > Then compare wildcard operations.  MS-DOS must still read the entire
> > directory, which is still only a few (probably contiguous) sectors.
>
> Here MS-DOS would read 9 files for my C: directory, AmigaDOS would read
> 133.  That's a definite loss to AmigaDOS, even though it took much less
> time to load the directory program in the first place (I'm talking real
> life here; I refuse to use the built-in Dir command under MS-DOS, so I use
> a UNIX-alike LS which must be loaded from disk, same as the Amiga command).

It seems to me that the single most common interactive file system operation
is to find a directory listing.  In second place is the use of wildcards in
file names, and running a very close third is the use of filename completion
for any shell that has it (you may disagree with my rankings).  Under the
above example, you lose bigtime with Amiga's file system, as all of these
operations are equivalent to wildcarding.  I can't complain about finding my
commands quickly, but any shell that hashes its path will speed that up
after the first time anyway.  Why is AmigaDOS castrating its interactive use
in favour of fast command lookup, when the fast command lookup will only
affect the initial use of a command?

Is it possible to have both?  Could the file system store a secondary file
containing only a set of file names?  Even an ASCII file would be fine.  It
would require more work for file creation, deletion, and moving, but it
would massively speed up wildcard activities.  It could be there as a
secondary file to the filesystem, so losing it would not be important.  A
simple chkdsk-like program could be used to recreate this file if it became
corrupt, by finding each file the hard way.  The only other thing you have
to do is keep this file out of reach of programmers.

The disk usage is not really a problem.  With 15 files to a sector, a disk
with 500 files on it will only need about 17K to store this information.  I
would happily make this trade for the speed it would provide.

Is there something basically wrong with this scheme?  It really sounds like
a horrible kluge in some ways, but it's an issue that needs to be addressed.

--
Andrew Thomas
andrewt@watsnew.waterloo.edu	Systems Design Eng.	University of Waterloo
"If a million people do a stupid thing, it's still a stupid thing." - Opus
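Andrew's secondary-names-file idea can be sketched in a few lines (all paths
and helper names here are illustrative, not part of any real AmigaDOS
interface; Python's `*` glob stands in for the Amiga `#?` pattern):

```python
import fnmatch
import os
import tempfile

def write_name_cache(dir_path: str, cache_path: str) -> None:
    """Rebuildable ASCII cache: one sorted filename per line."""
    with open(cache_path, "w") as f:
        for name in sorted(os.listdir(dir_path)):
            if name != os.path.basename(cache_path):   # the cache hides itself
                f.write(name + "\n")

def wildcard_lookup(cache_path: str, pattern: str) -> list:
    # One small, contiguous read replaces a seek per directory entry.
    with open(cache_path) as f:
        names = f.read().splitlines()
    return [n for n in names if fnmatch.fnmatch(n, pattern)]

# Demonstration in a throwaway directory:
d = tempfile.mkdtemp()
for name in ("ed", "edd", "list"):
    open(os.path.join(d, name), "w").close()
cache = os.path.join(d, ".names")
write_name_cache(d, cache)
print(wildcard_lookup(cache, "ed*"))   # ['ed', 'edd']
```

Because the cache is derivable from the real directory structure, losing or
corrupting it costs nothing but a rebuild -- exactly the property Andrew
wants from his chkdsk-like recovery tool.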