mcp@ziebmef.uucp (Marc Plumb) (05/13/89)
I'm actually writing the file system I was blathering about so much some months ago, and am having still more problems with that nemesis of all Amiga FS writers, ExNext. I would cheerfully strangle the dimwit who defined half of its semantics so badly, and didn't define the other half at all. But I think I've already posted that flame. Suffice it to say that modifying a directory while it's being ExNext()ed can cause files to be skipped, reported twice, or even reported despite the fact that they reside in a different directory. FFS fixes some of those bugs, and they're all pretty rare, but it's still tremendously brain-damaged. Anyway, in TransAmi 2,2, John Toebes and Doug Walker mention that some programs (death by slow torture) trash everything in a FIB except the DiskKey and the first 30 characters of the name between ExNext calls. In my FS, I'm guaranteeing that when used properly (same lock used for all ExNext calls, full FIB passed), ExNext() will have no bugs, but as a concession to existing software, I'm trying to make it no more buggy than the old FS when abused. So if I could make good use of the fib_DiskKey (since a file can be renamed at any time, the name is pretty useless), it would be a bonus. However, I've also heard that some people (Ameristar?) use the DiskKey as a unique file id. So I'm not quite sure what is and is not allowed with it. Since its interpretation is not documented, then strictly speaking, I could return 32 arbitrary bits from any Examine() or ExNext() call. But in the interests of not breaking everything at once, I'm willing to include a few kludges. My questions are: Is it required that Examine()ing the same file will always return the same DiskKey? Is it required that Examine()ing different files will always return different DiskKeys? Note that these two are orthogonal - I could guarnatee that no two Examine()s will ever return the same DiskKey and satisfy the second but not the first. Is it required that ExNext() calls obey the same rules? Only with other ExNext() calls, or must similarity and uniqueness guarantees span Examine() and ExNext()? Any other information about the semantics of ExNext() or the various fields of the FileInfoBlock (what does a fib_EntryType of 0 mean?) is also greatly appreciated. Thanks! -- -Colin Plumb P.S. Does anyone have any ideas about directory formats? I'm hovering in the vicinity of 24 bytes for an empty file with a single-character name and no comment, so I can promise cheap files, but whether to use an Amiga-style directory that's a lot of pointers to file headers or a CP/M-style directory that's basically an array of file headers proper is still an open question. Then there's the Mac HFS scheme... does anyone know any other directory schemes? Note that at the level I'm working, there is no way to ask for a file's name without size, date, flags, and comment, so splitting them up "to get all those big comments out of the way since dir never uses them anyway" is not useful. You could argue for splitting up the information reported in a fib from the pointers to the file's disk space, but this means that growing a file would require you to update both places - one to record the location of the new data, and the other to record the new size. Thanks a lot for any ideas!
page%swap@Sun.COM (Bob Page) (05/15/89)
You walk on rice paper, grasshopper. Since a disk key in a Lock is a physical block number, it should be easy to see if the fib_DiskKey corresponds to it. (I just did some testing with the "pickpacket" program and everything looked OK on OFS; I didn't get too far into testing FFS when I sent a bad packet and hung the task (in fact I have a requester telling me to finish up all disk activity and reboot right now)). If you want to know about all the other handlers, you'd going to have to ask them. At one session during the last Amiga Developer's Conference (May '88) Steve Beats (Commodore's Mr.Filesystem) asked how many people were writing their own file system. Over two dozen people raised their hands. Steve (after his shock) said he'd put together a document to help those folks out, including all the stress points you have to pass with Workbench, which is very file-system nasty. I never saw the document, maybe others did, or maybe he's finishing it up for the DevCon next month. ;-) On directory formats, I'd go for fast directory scans at the expense of file open speed. The difference won't be noticable at the open() call, but will be at the scan, since any increase in scan time is cumulative for each file. I happen to *like* the FFS directory format, except I'd like to have the file names in the directory header, which would make scans much faster if you didn't need the fib. Since it's relatively atomic (you'll have to read several disk blocks for a large directory) you lose all the funky race conditions that ExNext has. For applications like file requesters, or pattern matches, or shell file-name completion, you win big time. It means directories take up more space, but I think it's worth it, and would give the Amiga a radically different feel. For compatibilty, you can instruct programmers to add some magic cookies in Arg3/Arg4 (like arg3: 0xf0f0a5a5, arg4: *char buffer) before they call ExNext, that tell ExNext implementations "in the know" they're only interested in a bulk file name dump, not individual fibs. The return code would tell the program whether it got a "new" ExNext or an old one. [Gotta reboot my machine. How come layers are so slow when a requester is up? Somebody do a Forbid()? Is this needed? The de-dicer is so slow I feel like I'm working on a Sun. :-) ] ..bob
rick@sbcs.sunysb.edu (Rick Spanbauer) (05/15/89)
In article <1989May12.183725.25380@ziebmef.uucp>, mcp@ziebmef.uucp (Marc Plumb) writes: [text deleted] > However, I've also heard that some people (Ameristar?) use the DiskKey > as a unique file id. So I'm not quite sure what is and is not allowed > with it. For the first few revs of software we kept the NFS readdir cookie in the disk key field. The latest rev of software hides all NFS state variables in an opaque data type stored around the Lock. NFS returns the diskkey set to the inode number of the file. Such numbers may or may not be unique - we pass them on unprocessed from the server. We do not rely on them for internal purposes. > Any other information about the semantics of ExNext() or the various fields > of the FileInfoBlock (what does a fib_EntryType of 0 mean?) is also greatly > appreciated. Thanks! One bug we found a few years back was that the C data structure which defines the FIB had its comment field defined too large. If you ref'ed the last few elements of the array it would crash certain programs. We reported this to Commodore, but never checked if they did something about the definition. > -Colin Plumb Rick Spanbauer Ameristar > P.S. Does anyone have any ideas about directory formats? I'm hovering in > the vicinity of 24 bytes for an empty file with a single-character name and > no comment, so I can promise cheap files, but whether to use an Amiga-style > directory that's a lot of pointers to file headers or a CP/M-style > directory that's basically an array of file headers proper is still an > open question. Then there's the Mac HFS scheme... does anyone know any > other directory schemes? Note that at the level I'm working, there is > no way to ask for a file's name without size, date, flags, and comment, so > splitting them up "to get all those big comments out of the way since dir > never uses them anyway" is not useful. You could argue for splitting > up the information reported in a fib from the pointers to the file's disk > space, but this means that growing a file would require you to update both > places - one to record the location of the new data, and the other to record > the new size. Well, here is one idea to help you bootstrap your filesystem. A technique I use in my prototype NFS server to provide better directory semantics is to layer a new filesystem over FFS. After all FFS does a fine job at handling reads/writes so why rewrite those immediately? To speed directory handling, the server does it own directory caching/creation/deletion, etc to get at least a 10X speedup over FFS Examine/ExNext, etc. Of course the NFS server does consume quite a lot more memory but in exchange it does provide faster dirops, symbolic links, etc. I would urge you to consider enhancing functionality rather than limiting it, ie don't reduce names from 30 chars -> 24 chars - expand them from 30 chars -> 128 chars! Add symbolic links, generalized attributes, more meaningful protection, additional times information, etc. The future has multimbyte floppies, huge hard disks, etc. Saving a few bytes solves a problem that will not exist in a year.
wade@pnet01.cts.com (Wade Bickel) (05/17/89)
In reading the PS to this posting I noticed you are thinking of dropping the comment field. I've been toying with the idea of using this field to store a syntax specification of CLI commands. I can see you reasoning, but please leave at least 32 characters. Thanks, Wade. UUCP: {nosc ucsd hplabs!hp-sdd}!crash!pnet01!wade ARPA: crash!pnet01!wade@nosc.mil INET: wade@pnet01.cts.com
mcp@ziebmef.uucp (Marc Plumb) (05/23/89)
In article <2826@sbcs.sunysb.edu> rick@sbcs.sunysb.edu (Rick Spanbauer) writes: >In article <1989May12.183725.25380@ziebmef.uucp>, mcp@ziebmef.uucp (me) writes: > >[text deleted] >> However, I've also heard that some people (Ameristar?) use the DiskKey >> as a unique file id. So I'm not quite sure what is and is not allowed >> with it. > > For the first few revs of software we kept the NFS readdir cookie > in the disk key field. The latest rev of software hides all NFS > state variables in an opaque data type stored around the Lock. > NFS returns the diskkey set to the inode number of the > file. Such numbers may or may not be unique - we pass them on > unprocessed from the server. We do not rely on them > for internal purposes. Thanks for the info. Now what about doing NFS to an Amiga volume? How do you translate a FIB returned by ExNext() into readdir() or whatever? Do you translate the DiskKey into the inode number or something? > One bug we found a few years back was that the C data structure > which defines the FIB had its comment field defined too large. If > you ref'ed the last few elements of the array it would crash certain > programs. We reported this to Commodore, but never checked if > they did something about the definition. In the 1.3 Includes, the comment fields is now 80 bytes followed by 36 bytes of reserved space. Annoying, since I let a user set a 255-byte comment. >> P.S. Does anyone have any ideas about directory formats? I'm hovering in >> the vicinity of 24 bytes for an empty file with a single-character name and >> no comment > I would urge you to consider > enhancing functionality rather than limiting it, ie don't reduce > names from 30 chars -> 24 chars - expand them from 30 chars -> 128 > chars! Add symbolic links, generalized attributes, more meaningful > protection, additional times information, etc. The future has > multimbyte floppies, huge hard disks, etc. Saving a few bytes > solves a problem that will not exist in a year. I have 127-character filenames, and can go to 255 if anyone complains. The question of what to do with names more than 107 characters long when returning FIBs is an open one. (First time round, I'll just truncate.) When I mentioned 24 bytes, that's the size of a file header block with a short name. I can add 127 bytes of name, 255 bytes of comment, and a lot of pointers to extents to make it a lot larger. I'd like to add symlinks, more dates, better protection, etc., but the new features aren't understood by existing software, and I don't know how to design the interfaces. Multiple links, arbitrary graph file systems (the parent of the current dir is the one you came in via), symbolic links, mount points, more timestamps, etc. are all useful things, but I'd like to either design a whole new FS interface (with no BPTRs anywhere and using exec I/O messages), and/or get at least one other person to agree with me on the best way to tack it on to the existing framework. If a file has multiple links, should the comment go with the name or the the file? Should the hidden bit just be a convention observed by directory listing programs, or should it be enforced by ExNext? If it is enforced, how to find the names of hidden files? (How about ExNext returning hidden files if there are no non-hidden ones in the directory?) I'd be perfectly happy to add features to the bottom layers if someone can show me a good interface to them. Consider it a challenge! But for now, a file has name, comment, size, timestamp, parent, and protection bits. I have added a comment to the root directory and the ability to seek past the end of a file, but no more obvious extrapolations come to mind. I'm seriously considering allowing Unix-style create-and-delete files without names, but it's tricky. Thnaks for your comments. -- -Colin Plumb
mcp@ziebmef.uucp (Marc Plumb) (05/23/89)
Bob Page wrote: > You walk on rice paper, grasshopper. It's trickier than I thought at first. > Since a disk key in a Lock is a physical block number, it should be > easy to see if the fib_DiskKey corresponds to it. I know this is how the old file system does it, but I'm *writing* a new handler, not using an existing one, so I want to know what undocumented or obscure features of the old ones people use so I can try to give the sinners time to repent. My file system is *very* different from the current ones and there is no obvious analogue. Steve Beats' document sounds very interesting, if only I could get my hands on it. In the interim, the series in the Amiga Transactor seems to be the best thing going. I think the FFS directory format is incredibly wasteful. In my system, a small (up to a few K) file "foo" has about 30 bytes of overhead (including entry in parent directory), so a 1-byte file takes 31 bytes of disk space. FFS uses 1K. Half a K for an empty file? I need about 23 bytes. While I do use the pointers-to-file-headers scheme, the headers' small size lets me keep lots on one track and I use clustering heuristics. The hashing is a neat idea - I use a 256-way hash myself. This produces fast failed opens, which is needed for path searching. Your idea of providing a different way to get a directory listing is nice, although I don't understand the suggested implementation. One way is just to use open/close/read/write and return a data stream in a standard format. This conflicts with the PATH: device, but I think Rico will survive. Or why not just create a new call that uses locks instead of FIBs and also returns the file's name (and maybe a file/dir bit). You could Examine() the lock if you need more info, Open() it if you want the file (no race condition), or pass it back to the call (null lock passed starts a directory scan, null lock returned indicates end) to get the next one. Or DupLock() the lock if you're globbing to a list of files and want to build it up, or just drop the lock and end the scan. A handler that didn't understand this would return a packet not understood error and you could fall back to ExNext. Anyway, thanks for the vote for scans over opens. While I've already decided most things, I was considering a hack for .info files that would keep small files with their headers, reducing the number of headers that would fit on a track. I think I'll punt on that, unless I want to let the file system know that files with names ending in ".info" should be optimised this way. It's a premature optimisation, anyway. (I've just gone and talked to a few people who actually *use* WorkBench, and they've convinced me to at least look at the problem. I've got a fair bit to do before then, though.) I'm trying really hard for killer directory speeds. We'll see... I've rambled a bit. Oh, well. I just worked out a scheme that would let me return DiskKeys in permanent one-to-one correspondence with files and use only them to keep track of my position in a directory listing, but it requires keeping 6 bytes per entry in the directory being scanned for the duration of the lock being ExNext()ed along. Does that seem reasonable? -- -Colin Plumb
limonce@pilot.njin.net (Tom Limoncelli) (05/24/89)
In article <1989May22.162035.19970@ziebmef.uucp> mcp@ziebmef.uucp (Marc Plumb) writes: > I'm trying really hard for killer directory speeds. We'll see... How about using a hash table? Wildcards would be a little slow if the cache system isn't good (could be improved in a later FS release) but lookups (finding commands, opening files, all the things I usually do) would be SUPER FAST! Nah, it'll never work. :-) > -Colin Plumb -- Tom Limoncelli -- tlimonce@drunivac.Bitnet -- limonce@pilot.njin.net Drew University -- Box 1060, Madison, NJ -- 201-408-5389 Standard Disclaimer: I am not the mouth-piece of Drew University
mcp@ziebmef.uucp (Marc Plumb) (05/28/89)
In article <4208@crash.cts.com> wade@pnet01.cts.com (Wade Bickel) writes: >In reading the PS to this posting I noticed you are thinking of dropping the >comment field. Nope! 255 character comments, the limit allowed by the BSTR interface at the packet level. Does anyone have any ideas better than truncation for returning 80 (or 116 if you're willing to risk bugs) character and longer comments? One idea I had was to return the proper length byte and just truncate the data, but I don't think the dos.library is bright enough to cope with that when it translates to a C string. In fact, I've added the ability to have a comment on the root directory, and the only thing that's keeping me from adding lots more features (more timestamps and things) is the lack of a way to get the new information to the user. -- -Colin Plumb
mcp@ziebmef.uucp (Marc Plumb) (05/31/89)
In article <May.24.00.02.54.1989.15356@pilot.njin.net> limonce@pilot.njin.net (Tom Limoncelli) writes: >How about using a hash table? Wildcards would be a little slow if the >cache system isn't good (could be improved in a later FS release) but >lookups (finding commands, opening files, all the things I usually do) >would be SUPER FAST! > >Nah, it'll never work. :-) I do! A directory is an array of pointers to inodes, so you have all the pointers at once and can sort them by track for fast access, but each array element also contains an 8-bit hash value (an 8-bit CRC of the name) so you can throw away 255/256 of the directory if you know the name exactly. And you have pointers directly to all the file names with that hash value, so you can sort *them*, trying everything in cache before resorting to disk access. The hashing is one good idea AmigaDOG had that I didn't throw away. I considered an extensible hashing scheme (if I detect a collision, I use a second-, third-, etc. order hash on the colliding names until they are resolved) that would allow a directory to map a name onto a unique inode directly, but it didn't seem worth it. Does anyone dissent? -- -Colin Plumb