guy@auspex.auspex.com (Guy Harris) (01/15/91)
(Perhaps this should move to "comp.os.misc". I've sent followups there; if you disagree, remember to send your posting and any followups elsewhere.)

>Plan 9 (an experimental system from AT&T) is better, but not worth
>considering, as it is a research effort at the moment.

Not worth considering from what standpoint? Not worth considering as something to buy for a site with N different vendors' machines, perhaps, as it's probably not available for them. I think it's worth considering as a source of ideas on how to make a good widely-available OS, though (along with many other OSes).

>Consider what would happen if the process table was kept in *the*
>(file system) name space. Then "ls /proc" would report on which
>processes are running.

Yup - and it *does* on some versions of UNIX, e.g. S5R4 and, I think, the later Research versions.

>It also eliminates the need for a program to get rid of unwanted processes:
>just use the utility which removes files (rm).

Well, maybe. I don't think the Research "/proc" did that, and I seem to remember that they didn't consider it the right way to go. I'm not sure why, but *do* note that there isn't a UNIX program "to get rid of unwanted processes" that I know of - there's a program to send an *arbitrary* signal to a process or set of processes, but not one to simply "get rid" of them (you can send SIGTERM with "kill", which lets the process clean up before exiting, but isn't a guaranteed kill; you can send some signal that causes a core dump, which may not let it clean up but may give it a core dump, but that's not guaranteed either; you can send a SIGKILL, which will kill anything not stuck in some unbreakable wait).

>I have yet to see a system that implements good IPC. MS-DOS doesn't have
>any. UNIX has pipes, fifos, sockets, ad nauseam. There should be one
>type of IPC (link) which should cover all cases and work exactly like a file.

Exactly like a file being accessed sequentially, anyway. If you send "seek" messages to cope with random access, it won't look like a file to the guy on the other end, as they'll have to interpret those messages....
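Guy's distinction between the polite SIGTERM and the guaranteed SIGKILL can be sketched concretely. The following is a minimal illustration (Python used purely for demonstration; the thread is about UNIX semantics in general):

```python
import os, signal, time

# A small sketch of the point above: "kill" only sends signals, and
# SIGTERM can be ignored (or caught) by the target, while SIGKILL
# cannot be caught, blocked, or ignored.
signal.signal(signal.SIGTERM, signal.SIG_IGN)   # inherited by the child
pid = os.fork()
if pid == 0:
    while True:                                 # child just spins
        time.sleep(0.05)

signal.signal(signal.SIGTERM, signal.SIG_DFL)   # restore in the parent
os.kill(pid, signal.SIGTERM)                    # polite request: ignored
time.sleep(0.2)
survived_sigterm = os.waitpid(pid, os.WNOHANG) == (0, 0)
os.kill(pid, signal.SIGKILL)                    # the unconditional kill
_, status = os.waitpid(pid, 0)
died_of_sigkill = os.WIFSIGNALED(status) and \
                  os.WTERMSIG(status) == signal.SIGKILL
print(survived_sigterm, died_of_sigkill)
```

Because signal dispositions are inherited across fork(), the child starts life ignoring SIGTERM; only SIGKILL actually removes it.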
new@ee.udel.edu (Darren New) (01/15/91)
In article <5233@auspex.auspex.com> guy@auspex.auspex.com (Guy Harris) writes:
>(Perhaps this should move to "comp.os.misc". I've sent followups there;
>if you disagree, remember to send your posting and any followups
>elsewhere.)
>
>>Plan 9 (an experimental system from AT&T) is better, but not worth
>>considering, as it is a research effort at the moment.

From the 15 pages I've read about it, I see Plan-9 doing all the same things wrong that UNIX did. They seem to have only files which are byte arrays, in spite of the fact that 99.44% of programs I've seen want records, and most want keyed records. UNIX then seems to attempt to stuff every object that *isn't* a file into this same mode, and poorly at that. Plan-9 seems to be going the same way.

They have put the windowing stuff neither in the kernel, where you would expect it to have to work to be commercially viable, or totally out of the kernel where you can replace it when you need to (unless there are installable device drivers/servers/whatever). In addition, the interface is via bitblt, leading to device dependence and probably inefficient display over a slower-speed network.

I suspect that the security features of the file systems are just as bad as on UNIX too; however, I have no evidence to that effect except that they have kept the rest of the uglinesses too.

I realize this is pretty inflammatory. Content-free flames via email, please. Flames that actually make a point can post. Thanks. -- Darren

--- Darren New --- Grad Student --- CIS --- Univ. of Delaware ---
----- Network Protocols, Graphics, Programming Languages, Formal
Description Techniques (esp. Estelle), Coffee, Amigas -----
=+=+=+ Let GROPE be an N-tuple where ... +=+=+=
plinio@boole.seas.ucla.edu (Plinio Barbeito/;093091;allsites) (01/15/91)
In article <5233@auspex.auspex.com> guy@auspex.auspex.com (Guy Harris) writes:
>>Consider what would happen if the process table was kept in *the*
>>(file system) name space. Then "ls /proc" would report on which
>>processes are running.
[...]
>>It also eliminates the need for a program to get rid of unwanted processes:
>>just use the utility which removes files (rm).

I agree with the (former) poster; I also see this as an elegant solution. You could get rid of a family of processes all at once by using wildcards! Simple example: kill all of the rpc daemons running by typing 'rm rpc.*d'.

>Well, maybe. I don't think the Research "/proc" did that, and I seem to
>remember that they didn't consider it the right way to go. I'm not sure
>why, but *do* note that there isn't a UNIX program "to get rid of
>unwanted processes" that I know of - there's a program to send an
>*arbitrary* signal to a process or set of processes, but not one to
>simply "get rid" of them (you can send SIGTERM with "kill", which lets
>the process clean up before exiting, but isn't a guaranteed kill; you
>can send some signal that causes a core dump, which may not let it clean
>up but may give it a core dump, but that's not guaranteed either; you
>can send a SIGKILL, which will kill anything not stuck in some
>unbreakable wait).

Any process that catches a signal would effectively be modifying its file protection bits. To a 'process' file like this:

    pr-------- 2 user 512 Jan 11 13:53 process

the user could do something like:

    chmod +w process

to get:

    prw------- 2 user 512 Jan 11 13:53 process

so that a subsequent rm would be able to remove the process. Other characters besides 'w' could be used for the different signals; this is just an example. The complexity is thus moved to the file protection bits, which is OK, IMO, because by looking at these you could tell beforehand whether or not you'd have to do the equivalent of a kill -9. Typically, I'll do a kill on a process to find out that it didn't go away, then have to do a kill -9 to get rid of it.

plini b
----- Disclaimers: No Disclaimers
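No UNIX implements anything like this scheme; as a thought experiment only, the proposed mapping from protection characters to removal signals might be sketched like so (every name and character choice below is invented):

```python
import signal

# Hypothetical sketch of the proposal above: extra mode characters on a
# "process file" say which signal a plain rm may send. Nothing like
# this exists in UNIX; 'w' and 'k' are arbitrary stand-ins.
MODE_SIGNALS = {
    'w': signal.SIGTERM,   # +w: rm sends the catchable SIGTERM
    'k': signal.SIGKILL,   # +k: rm may use the sure kill (kill -9)
}

def rm_signal(mode_string):
    """Return the signal an rm of this 'process file' would send,
    or None if the mode forbids removal entirely."""
    for ch in 'kw':        # prefer the strongest permitted signal
        if ch in mode_string:
            return MODE_SIGNALS[ch]
    return None

print(rm_signal('prw-------'))   # removable, via SIGTERM
print(rm_signal('pr--------'))   # not removable at all
```

The mode string thus answers, in advance, the question of whether a kill -9 equivalent will be needed.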
kenw@skyler.calarc.ARC.AB.CA (Ken Wallewein) (01/15/91)
Some want a stream of bytes; some want a fully structured data file. I want both. I want to be able to choose.

When one is dealing with complex non-volatile data structures, a stream-of-bytes file can be awkward. Suppose one is using shared record access, and a process crashes? It's nice to have the OS cooperate to clear the locks. On the other hand, when one _doesn't_ want records, being forced to deal with them can be a pain.

Seems to me what we need is an object-oriented OS. In a sense, that's what record-oriented file systems are trying to give us. Both approaches, however, are getting a little old.

--
/kenw
Ken Wallewein                    A L B E R T A
kenw@noah.arc.ab.ca              R E S E A R C H
(403)297-2660                    C O U N C I L
new@ee.udel.edu (Darren New) (01/16/91)
In article <1211@cvbnetPrime.COM> aperez@cvbnet.UUCP (Arturo Perez x6739) writes:
>Well, actually, what I would like is a "method" for accessing the file that
>is defined by the application that creates the file.

Yes, that would be the UNIX-like way of doing it right (if I might be so bold). Keep the accessor out of the kernel. I've thought about this. The only problems would be these:

1) composable accessors, so you didn't have to reinvent the wheel if you
   wanted an encrypted, shared, record-oriented file.
2) security of the accessors: make sure that I could not circumvent your
   accessors to bypass your security.
3) a minimal set of functionality, so that a program like (say) vi could
   at least read a file, and preferably write it.

For example, say that the accessor of a file would handle putting line numbers on each record in the editor such that each change did not renumber all the lines in the file, and hence you could keep a listing around that would still have some connection to reality after the first change. How would the compiler ask the accessor program to get records back one at a time in the correct order? How would the compiler report errors by line number?

>Records are pretty
>bogus, if you ask me, because the OS has to maintain them when everyone knows
>that the OS doesn't give a hoot about the internal structure of files.

Baloney. The OS certainly cares about the internal structure of files we call directories, as well as the internal structure of block and some of the char special files, all of which are record-oriented. Our networks are record-oriented (and we hide that via TCP, which X Windows then has to go break up again). Our disks are record-oriented. Our tapes are record-oriented. In addition, some of the OS that isn't in the kernel (login, for example) uses records in its control files. About the only programs that don't talk about records are cp and rlogin. Other than that, I can't think offhand of any program that does not maintain some idea of record structure. Applications that really need random-access keyed files under UNIX tend to instead make a directory with files naming the keys. Observe, for instance, termcap.

>In other
>words, record-oriented filesystems are a kind of holdover because nobody knows
>how to get the OS to do more than that; e.g., what would be more useful is a
>"binary-tree" file for storing data with an associated "file call" rather than
>an 80-byte record file where blah, blah, blah.

No, records are there for the same reason that files are there: because they are a natural way of breaking up data. Any program that deals with less than an entire file at a time deals with records. I agree that a more sophisticated system than records would be useful. KSAM files, Mac-style resource forks, accessor functions, the FTAM virtual file store, etc. could all be useful. I would be interested in discussing the 'ideal' system with anyone who might also be so inclined.

>It's a sort of resource oriented approach to files whereby an application can
>"install" a file-access method with the file. If there is no method the default
>can be to read the file as a byte-array (or stream, if you prefer), write by
>appending records sequentially.

I agree, except that there should be a richer interface between the accessor and reader by default. Calls like "How would the user identify this record" leap to mind, in order that the compiler can answer in editor-specific line numbers for a source file, but a login-creator can answer with the user name, or whatever is used for the key in the /etc/passwd equivalent.

>Another example would be backup information. It would be nice if a file "resource"
>for last time backup'd could be included so that the backup program could use
>something other than the date to decide whether to back a file up.

This was one of the timestamps on CP/V: last backup date+time. You could show that to the operator and she could get the correct tape. Otherwise, you (the user) have no idea of when the most recent backup file was made.

>Forgive my Macintosh weeny-OS
>leanings here, but it is a good idea...

I have nothing against the Mac OS (except that it was a pain to program back when I was programming Macs). I think there are many good ideas. Look how clean much of the system is simply by adding the idea of a resource fork (i.e., a keyed record-oriented file system) to the OS. Now you can replace menus with other languages, add fonts to the system file, and so on, because it is built into the OS and every program can assume it is there and works.

>> They have put the windowing stuff neither in the kernel, where you
>> would expect it to have to work to be commercially viable,
>
>I am really curious about your statement and give the above as justification
>for why I don't get it. I'm not trying to start an argument or anything!

As I said in a previous post, I didn't recommend putting it in the kernel. I only meant that *if* it was in the kernel, it had better work well or people won't buy the system.

>> I suspect that the security features of the file systems are just as
>> bad as on UNIX too; however, I have no evidence to that effect except
>> that they have kept the rest of the uglinesses too.
>
>I think my idea above might be able to offer a pretty good security system for
>'sensitive' files.

Except that it is up to each user to get it right unless you include some pretty sophisticated accessor functions that can be composed. CP/V was pretty good this way. Each file could have a list of accounts that could read, a list that could write, a list that could execute, a password, an encryption seed (not stored with the file, but the file would en/decrypt automatically if a seed was given), and an "under" language. Basically, if the file was executable but not readable and the user was running the "under" program, then that program, when it opened the file, would get an open file along with an error message saying the file is execute-only. The language would then take steps to keep the file from being examined incorrectly. (This could be used in UNIX to make execute-only shell scripts. You would just set the file's perms to be read(none), execute(all), under(csh).) In practice, this was used to make BASIC and APL programs execute-only.

Take, for example, an adventure game. You could simply set the file as above except under(adv), and only adventure could access the file. Under UNIX, you have to use SetUID, which requires either that you request the management to make a new account for this one program, or you have to be very careful that the ADV program never reads or writes any files except the exactly correct one. Only two levels of permissions in UNIX really makes for a difficult security system to get right, and we are still plagued with problems this simplistic mechanism has caused. -- Darren

--- Darren New --- Grad Student --- CIS --- Univ. of Delaware ---
----- Network Protocols, Graphics, Programming Languages, Formal
Description Techniques (esp. Estelle), Coffee, Amigas -----
=+=+=+ Let GROPE be an N-tuple where ... +=+=+=
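The "composable accessors" idea from problem (1) above can be made concrete with a toy sketch: each layer exposes the same interface and wraps the layer below, so a record view stacks on an "encryption" transform which stacks on raw bytes. The layer names and the trivial add-13 cipher are invented for illustration only:

```python
import io

# Toy composable accessors: LineRecords(Add13Layer(ByteStore(...))).
# None of these classes correspond to any real system.
class ByteStore:
    """Lowest layer: raw bytes, as a UNIX file gives you."""
    def __init__(self, data):
        self._f = io.BytesIO(data)
    def read(self):
        return self._f.read()

class Add13Layer:
    """Stand-in for a real decryption accessor."""
    def __init__(self, inner):
        self.inner = inner
    def read(self):
        return bytes((b + 13) % 256 for b in self.inner.read())

class LineRecords:
    """Record accessor: turns the byte stream into newline records."""
    def __init__(self, inner):
        self.records = inner.read().split(b"\n")
    def read_record(self, n):
        return self.records[n]

# "Encrypt" a file on disk, then stack decryption + record access on it.
plaintext = b"first record\nsecond record\n"
stored = bytes((b - 13) % 256 for b in plaintext)
f = LineRecords(Add13Layer(ByteStore(stored)))
print(f.read_record(0), f.read_record(1))
```

Because every layer speaks the same read() interface, an encrypted, record-oriented file needs no wheel-reinventing: the layers just stack.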
guy@auspex.auspex.com (Guy Harris) (01/16/91)
>From the 15 pages I've read about it, I see Plan-9 doing all the same
>things wrong that UNIX did. They seem to have only files which are
>byte arrays,

As opposed to VMS, which has only files that are disk-block arrays? :-) I.e., are you complaining about the fact that UNIX doesn't come standard with an ISAM package (some vendors may actually provide one standard with their UNIX releases, I dunno), or about the fact that UNIX doesn't come standard with one *below user-mode*?

>in spite of the fact that 99.44% of programs I've seen
>want records, and most want keyed records.

Different people see different things; most of the programs I've seen recently want lines or byte streams (not necessarily *character* streams, just *byte* streams - and yes, I include compilers/assemblers and linkers in the latter category).

>UNIX then seems to attempt to stuff every object that *isn't* a file
>into this same mode, and poorly at that. Plan-9 seems to be going the
>same way.

Eh? To which objects are you referring? Many devices either fit the byte-stream model, or the sort-of record model wherein each "write()" writes one record/block and each "read()" reads one. The same applies to many network connection types. Even "/proc" actually seems to fit the file model pretty well.

>They have put the windowing stuff neither in the kernel, where you
>would expect it to have to work to be commercially viable,

DEC, Sun, IBM, HP, etc. all seem to be commercially-viable companies making products that use a window system implemented *not* in the kernel of their systems, but in a user-mode process that accepts IPC connections.... (You may not *like* X, but it seems to be "good enough" for lots of people.)

>or totally out of the kernel where you can replace it when you need to (unless
>there are installable device drivers/servers/whatever).

From what I read of the Plan 9 paper, the only parts of the window system that *might* be done in the kernel are the lowest-level "/dev/cons", "/dev/mouse", and "/dev/bitblt" drivers. The rest is in user mode, where it's replaceable, as far as I can tell; it might be nice if you could replace the remaining bits, but the same could be said about other kernel features (in UNIX, and in other systems as well).

>In addition, the interface is via bitblt, leading to device dependence
>and probably inefficient display over a slower-speed network.

Anybody who's actually used it know how good or bad it actually *is* over various speeds of network (how slow is a "slower-speed" network?)?

>I suspect that the security features of the file systems are just as
>bad as on UNIX too; however, I have no evidence to that effect except
>that they have kept the rest of the uglinesses too.

To what are you referring here? The lack of ACLs? Something else?
guy@auspex.auspex.com (Guy Harris) (01/16/91)
>Any process that catches a signal would effectively be modifying
>its file protection bits.

Umm, but that's not *all* it'd be doing; it's not catching signals *in general*, it's catching *specific* signals.

>so that a subsequent rm would be able to remove the process.

OK, what does "remove" mean here? Send SIGKILL? Send SIGTERM? Send SIGABRT? Send SIGHUP? Send SIGUSR1? Etc.

>Other characters besides 'w' could be used for the different signals;

Such as? We're talking UNIXoid system here, which gives you, at present, only 9 protection bits, but more signals. In addition, it seems to me to be a rather silly form of overloading to say "oh yeah, the group execute bit being set means that the process has caught SIGABRT"; the bits in question wouldn't mean anything *like* what they'd mean for a plain file. In fact, even the owner-write bit would have a different meaning; the "rm" command is what normally asks before removing a file not writable by the user - the "unlink()" call only cares about the permissions on the *containing directory*. I suppose you could make "/proc" a sticky directory....

>The complexity is thus moved to the file protection bits, which is OK,
>IMO, because by looking at these you could tell beforehand whether or not
>you'd have to do the equivalent of a kill -9. Typically, I'll do a
>kill on a process to find out that it didn't go away, then have to
>do a kill -9 to get rid of it.

The question there is whether the process is *ignoring* SIGTERM, or *catching* it but just not cleaning up after itself as quickly as you'd like. Which one are you suggesting be reflected as missing "owner write" permission?
new@ee.udel.edu (Darren New) (01/16/91)
In article <5258@auspex.auspex.com> guy@auspex.auspex.com (Guy Harris) writes:
>I.e., are you complaining about the fact that UNIX doesn't come standard
>with an ISAM package (some vendors may actually provide one standard
>with their UNIX releases, I dunno), or about the fact that UNIX doesn't
>come standard with one *below user-mode*?

Mainly that the programs that UNIX uses the most cannot depend on having an ISAM package available. I.e., that the editors can't store line numbers because the compilers won't use them.

>>in spite of the fact that 99.44% of programs I've seen
>>want records, and most want keyed records.
>
>Different people see different things; most of the programs I've seen
>recently want lines or byte streams (not necessarily *character*
>streams, just *byte* streams - and yes, I include compilers/assemblers
>and linkers in the latter category).

And lines are records, are they not? I think it is incorrect to state that compilers want byte streams. I think that compilers for languages which are not line-bound (C, Pascal) want byte streams. I think that the shell, awk, make, SCCS, assemblers, FORTRAN, BASIC all want lines. Even C knows that things are records, because it gives error numbers in terms of lines.

>>UNIX then seems to attempt to stuff every object that *isn't* a file
>>into this same mode, and poorly at that. Plan-9 seems to be going the
>>same way.
>
>Eh? To which objects are you referring? Many devices either fit the
>byte-stream model, or the sort-of record model wherein each "write()"
>writes one record/block and each "read()" reads one.

But that isn't how it works, is it? If I write many small records to a tape and then read back one big one, it won't work the same as if I do it to a file on disk. The terminal, the mouse, and the windows are not files. Sure, you can make an interface to them that looks like a file, but that is the same thing as making records on top of keyed files, and has all the same problems. The terminal is record-oriented (unless you switch it to raw). The mouse is clearly record-oriented, in that reading half a `record' tells you nothing. The paper I read indicated that IOCTLs are used to do bitblts to the window (if I remember right), and that isn't like a file either.

>The same applies
>to many network connection types.

Well, on the network side, we have IP (record-oriented). On top of that, we put TCP to make it reliable. Many protocols I've seen put records on top of TCP, because the behaviour is inherently record-oriented.

>Even "/proc" actually seems to fit
>the file model pretty well.

Say what?! It fits the name-space model OK, but the idea that /proc/1406/status is a file containing an ASCII representation of the CPU time used so far has nothing to do with a process being a file. I think that /dev/kmem is much more `file like' than /proc is, but I don't think /dev/kmem is the way to go either.

>>They have put the windowing stuff neither in the kernel, where you
>>would expect it to have to work to be commercially viable,
>
>DEC, Sun, IBM, HP, etc. all seem to be commercially-viable companies
>making products that use a window system implemented *not* in the
>kernel of their systems, but in a user-mode process that accepts IPC
>connections.... (You may not *like* X, but it seems to be "good enough"
>for lots of people.)

OH!!! I see why so many people misinterpreted what I meant! I mistyped!!! I meant to say "They have *to* put the windowing...", i.e., either it is in the kernel and works or it is not in the kernel. I meant to make the statement a conditional, not past tense. Sorry for the confusion!

>>I suspect that the security features of the file systems are just as
>>bad as on UNIX too; however, I have no evidence to that effect except
>>that they have kept the rest of the uglinesses too.
>
>To what are you referring here? The lack of ACLs? Something else?

Lack of ACLs; lack of file passwords; only two privilege levels (root and non-root); for non-privileged stuff (i.e., setUID), lack of any good way to know that it is right and won't trash or permit access to other files of the same account; lack of any good network security; etc. -- Darren

--- Darren New --- Grad Student --- CIS --- Univ. of Delaware ---
----- Network Protocols, Graphics, Programming Languages, Formal
Description Techniques (esp. Estelle), Coffee, Amigas -----
=+=+=+ Let GROPE be an N-tuple where ... +=+=+=
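Darren's observation that protocols keep re-imposing records on top of TCP can be made concrete. A common fix is length-prefixed framing: the stream delivers bytes with no boundaries, so each record is sent with a length prefix and the receiver re-slices the stream. The 4-byte big-endian prefix below is one common convention, not any particular protocol:

```python
import struct

# Records on top of a byte stream, via a length prefix per record.
def frame(records):
    """Flatten records into a single boundary-free byte stream."""
    return b"".join(struct.pack(">I", len(r)) + r for r in records)

def unframe(stream):
    """Recover the original record boundaries from the stream."""
    records, pos = [], 0
    while pos < len(stream):
        (n,) = struct.unpack_from(">I", stream, pos)
        pos += 4
        records.append(stream[pos:pos + n])
        pos += n
    return records

msgs = [b"first message", b"second", b""]   # empty records survive, too
assert unframe(frame(msgs)) == msgs
print("round trip ok")
```

This is exactly the layering Darren complains about: IP gives records, TCP hides them into a stream, and the application builds them back.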
new@ee.udel.edu (Darren New) (01/16/91)
In article <KENW.91Jan15131820@skyler.calarc.ARC.AB.CA> kenw@skyler.calarc.ARC.AB.CA (Ken Wallewein) writes:
> [[ lots of good stuff]]
> Seems to me what we need is an object-oriented OS. In a sense,
>that's what record-oriented file systems are trying to give us. Both
>approaches, however, are getting a little old.

Well put. I can see (offhand) three ways of having a filesystem that is unlike the traditional file systems (defined as those I am familiar with :-):

1) An object-oriented file system. Probably requires that the base
   system be object-orientedly-programmed.
2) A filesystem wherein everything (including user process address
   spaces) is just one big address space. Probably unusable. Kind of a
   variant of the FORTH block system.
3) An active file system. An object-oriented file system where each
   object (i.e., each file) is actually a running program which could
   maintain itself and modify itself asynchronously.

It looks to me like the Plan-9 filesystem is either (1) or (3), but with a really underpowered interface to the objects.

I've been thinking along the following lines: Each name in the address space would be able to receive asynchronous messages. Synchronous operations would be a special case of async messages (send then wait) instead of how UNIX now works it. Each message would be passed to each name in the path of the object being sent to until that object declared that it could handle the message. For example, the message 'status /proc/16543' would first go to "/", which would say "I can't answer that." Then 'proc' would get 'status 16543' and would be able to answer the status of that process. The message 'dir usr/joe/bin' would go to usr, which would send 'dir joe/bin' to joe, which would send 'dir bin' to bin, which would return the directory of files in that subdir. You get the idea so far, I'm sure. Once the appropriate object is found, the message is sent to that object.

Since it would be nice to have composable access functions, I include in each file a list of access functions. For example, one file might have the following access list:

    key-to-seq-converter (user-supplied)
        handles 'rewind' 'readnext' 'readprev'
        outputs 'readbykey' 'findfirstkey' 'findnextkey'
        openinstancedata 'currentkey'
    block-to-key-converter (user-supplied)
        handles 'readbykey' 'findfirstkey' 'findnextkey' 'whichkey' 'writebykey'
        outputs 'readblock' 'writeblock'
    access-control (system-supplied, reads auth info)
        handles 'changemode' 'open' 'close' 'remove'
        outputs 'open' 'close' 'remove'
        fileinstancedata 'readers=a,b,c; writers=a,d,f'
    block-control (system-supplied, does I/O instructions)
        handles 'open' 'close' 'readblock' 'writeblock'
        outputs 'allocblock'
        fileinstancedata 'slow block device for data'
    allocation-control (system-supplied, reads/writes quotas)
        handles 'allocblock'
        fileinstancedata 'slow block device for data'
    block-control (system-supplied, does I/O instructions)
        handles 'open' 'close' 'readblock' 'writeblock' 'allocblock'
        fileinstancedata 'fast block device for keys'

Note that these access objects might even get messages through the network, allowing (say) the keys to be stored on a different machine than the data. Feedback from the user process (to obtain authorization information, say) could be in the form of callback-like messages, allowing the opening process to either supply the information (which would need to be checked for correctness in some way) or to refuse to, lest a trojan access-control function attempt to collect all account names, say.

At the lowest level, there will be I/O devices that are similar to the special devices under UNIX, which will have privileges that would not be released to normal users.

Anyway, I've rambled enough. I'll be thinking about this for a few days (in spite of the need to get real work done :-). -- Darren

--- Darren New --- Grad Student --- CIS --- Univ. of Delaware ---
----- Network Protocols, Graphics, Programming Languages, Formal
Description Techniques (esp. Estelle), Coffee, Amigas -----
=+=+=+ Let GROPE be an N-tuple where ... +=+=+=
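Darren's dispatch rule (a message is offered to each name along the path until some object declares it can handle the remainder) can be sketched in a few lines. Everything here is a toy: the Node class, the message verbs, and the path names are all invented:

```python
# Toy of the path-walking message dispatch described above: each name
# either answers the message or passes the remainder to the next name.
class Node:
    def __init__(self, name, handles=(), children=None):
        self.name = name
        self.handles = set(handles)
        self.children = children or {}

    def send(self, verb, path):
        if verb in self.handles:
            return "%s answered '%s %s'" % (self.name, verb, path)
        head, _, rest = path.partition("/")
        child = self.children.get(head)
        if child is None:
            return "%s: I can't answer that." % self.name
        return child.send(verb, rest)   # pass the message down the path

proc = Node("proc", handles={"status"})
root = Node("/", children={"proc": proc})
print(root.send("status", "proc/16543"))
```

So 'status /proc/16543' is first offered to "/", which can't answer it, and then to 'proc', which answers 'status 16543' itself, just as in the /proc example in the post.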
guy@auspex.auspex.com (Guy Harris) (01/17/91)
>Mainly that the programs that UNIX uses the most cannot depend on having >an ISAM package available. I.e., that the editors can't store line >numbers because the compilers won't use them. I.e., you're complaining that 1) UNIX doesn't come standard with an ISAM package and 2) that UNIX doesn't support stuff like "variable with fixed-length control" files with the "fixed-length control" used for line numbers. I have no great problem with the former complaint (I haven't needed an ISAM package for what I've done, but others do use it). I'm not sure what having line numbers built into the lines buys you, though. >And lines are records, are they not? They're also sequences of bytes terminated with some character or characters, on many systems. >I think it is incorrect to state that compilers want byte streams. I think it's incorrect to state that they want records, too; they want lines, and generally don't care whether the lines are defined as "a bunch of bytes with a (LF/CR-LF/CR) at the end" or "a byte count followed by the bytes of the line". >I think that compilers for languages which are not line-bound (C, >Pascal) want byte streams. I think that the shell, awk, make, SCCS, >assemblers, FORTRAN, BASIC all want lines. Lines, not records. >Even C knows that things are records, because it gives error numbers in >terms of lines. Lines, not records - and since: 1) UNIX text files don't have embedded line numbers and 2) nevertheless, UNIX C compilers give error numbers in terms of lines why do you want embedded line numbers? >But that isn't how it works, is it? If I write many small records to a >tape and then read back one big one, it won't work the same as if I do >it to a file on disk. No, it doesn't; indeed, programs that don't understand record boundaries can't always deal properly with tapes. 
However, programs that *do* understand tape blocks *can* deal with disk files as well as tapes; e.g., "tar" and "cpio" can maintain their archives directly on tape or on a disk file, which is quite useful (consider the number of FTP-able or UUCP-able packages archived as possibly-compress "tar" files). >The terminal, the mouse, and the windows are not >files. Sure, you can make an interface to them that looks like a file, >but that is the same thing as making records on top of keyed files, I don't think you meant that, unless "making records on top of keyed files" means something quite non-obvious. >and has all the same problems. The terminal is record oriented (unless you >switch it to raw). Yes, but I don't see why that's a problem. Programs that read and write text files can generally read and write the terminal as well. >The mouse is clearly record oriented, in that reading >half a `record' tells you nothing. Yes, but I'm not sure why this renders the fact that you access the mouse by opening "/dev/mouse" and doing "read()" calls from it at all bad. >The paper I read indicated that IOCTLs are used to do bitblts to the >window (if I remember right), Either you read a different paper, or don't remember right; the paper *I* read indicated that "...when a client process in a window *writes a message* (to the 'bitblt' file) to clear the screen, the window system clears only that window." >and that isn't like a file either. 
No, it isn't identical to a disk file in its behavior (one generally doesn't write messages to a disk file and expect it to do something), but: 1) it might make it possible to trace the window system actions being performed by your program by interposing something like "tee" between it and "/dev/bitblt"; 2) even if it doesn't look exactly like a disk file, the fact that it (or other kinds of objects) exists in the same *namespace* as files, and supports some subset of the same kinds of operations, can be useful if for no other reason that multiple kinds of these specialized objects behave similarly (e.g., the low-level screen bitblt code and the window-system bitblt-in-a-window code). Note that the key thing here isn't purely the file-like behavior; the notion that many types of objects inherit behavior from some common class is part of it. It may be that "file" isn't the right class, but.... >>The same applies >to many network connection types. > >Well, on the network side, we have IP (record oriented). Most programs don't talk raw IP. Some talk UDP, or SPP, or... (record oriented) and, as I noted, the "write() sends a single packet, read() reads one" model still holds. >On top of that, we put TCP to make it reliable. Many protocols I've seen put >records on top of TCP, because the behaviour is inherently record- >oriented. Which, of course, says nothing about the advantages of byte-stream-orientation vs. record-orientation in UNIX; the byte-stream-orientation vs. record-orientation is a characteristic of the protocol. Most programs don't >>Even "/proc" actually seems to fit >the file model pretty well. > >Say what?! It fits the name-space model OK, but the idea that >/proc/1406/status is a file containing an ASCII representation >of the CPU time used so far is nothing to do with a process being >a file. Yup, it's more that a process is a *directory*. 
:-)

>I think that /dev/kmem is much more `file like' than
>/proc is, but I don't think /dev/kmem is the way to go either.

"/proc/1406/mem" is a file more like "/dev/kmem" (and, in S5R4 and, I think, the Research UNIXes that introduced "/proc", "/proc/<PID>" is a file containing the virtual memory of the process).

>Lack of ACLs; lack of file passwords;

Why are file passwords a good idea? I'm genuinely curious; what do they buy you that either 1) file encryption with a password or 2) some access control mechanism like file modes or ACLs doesn't?

>for non-privledged stuff (i.e., setUID)

I presume here you mean "i.e., not setUID".

>lack of any good way to know that it is right and won't trash or
>permit access to other files of the same account;

Or perhaps you mean "privileged stuff (i.e., setUID)"; i.e., if you write a set-UID program, there's no good way to verify that it's secure? Or you *did* mean non-privileged stuff, and the complaint is that you don't know that running some J. Random program won't trash *your* stuff?

>lack of any good network security; etc.

To what sort of network security are you referring here? Security problems with allowing network access to files without providing some network authentication scheme such as Kerberos or the "DES authentication" scheme in ONC RPC?
guy@auspex.auspex.com (Guy Harris) (01/17/91)
>Right. But the issue of which signal it's catching, and exactly what >mode bit is being used for which signal, etc., is an implementation detail. >Mainly, I was trying to attack the user interface problem; the reasoning >was that it would simplify things to unify the user interface for dealing >with all system resources. I think the reasoning is fundamentally flawed, because I don't think it *does* unify the user interface. >That I won't be able to solve all of the >implementation problems right here and now doesn't mean that one could >not develop an attractive and practical solution. But it might mean that the solution being proposed can't be made attractive and practical due to fundamental flaws; the appearance of problems that can't be solved right here and now might be due to those flaws. >You could present separate bits (not necessarily rwxrwx... but perhaps >something like prakqt21) for the different signals. But then you haven't unified the user interface; those bits don't apply at all to plain files! >But if we need more bits, we get more bits! Are you worried about >overhead, No. >filesystem incompatibility? Yes - I'm worried that the objects you're constructing now end up looking sufficiently different from files that you really haven't unified the user interface.... >I think those issues can be contained. I don't; you'll have to demonstrate a way of containing them before I believe they can be contained. >Whether or not it turns out to be silly depends on the implementation. >Yes, it could turn out to be rather tacky if you desire. Some things, >like the date of creation, user name, translate fairly well across >boundaries. Other things can be decided by the user interface that >'ls' or lower level routines wish to provide. Bits are bits. How you >interpret or show what they mean is up to the user interface. I suspect *any* implementation that tries to pretend that processes are *that* much like files will turn out tacky. 
Or, to put it another way, I refuse to believe that it *won't* turn out tacky until somebody demonstrates a design that doesn't, so if you want to convince me, you're stuck with solving the problems here and now.... In effect, this seems to resemble using "file" as a base class, and deriving "process" as a subclass; doing this by overriding some of the class methods seems to require that you end up with methods of the same name ("ls", "rm") but with behaviors that don't resemble the behaviors of those methods in the parent class. I'd be inclined to add new methods ("ps", "kill") instead.... If your complaint is that you can't find out which signals a process is catching/ignoring/blocking, a better solution might be to have "ps" or some such command be capable of reporting that. (Note, BTW, that just because a process is catching SIGTERM, that doesn't mean you should kill it with SIGKILL; in fact, I'd be *more* inclined to kill it with SIGTERM, because if it's catching SIGTERM, it presumably has stuff that it wants to clean up before it exits, and if I kill it with SIGKILL that stuff won't get cleaned up. The ones you need to kill with SIGKILL are the ones that: 1) ignore SIGTERM - which I suspect is an error on their part; they should probably *block* it for a short time, instead, so that the SIGTERM gets to them eventually; 2) have a bug in their SIGTERM cleanup code, such that it doesn't exit. Were those problems less common, SIGTERM would be more likely to work, and I'd probably have no need to care, except in rare cases, whether the process is catching SIGTERM.) >>In fact, even the owner-write bit would have a different meaning; the >>"rm" command is what normally asks before removing a file not writable >>by the user - the "unlink()" call only cares about the permissions on >>the *containing directory*. I suppose you could make "/proc" a sticky >>directory.... >You could jumble everything into a /proc, and/or have a /proc/user, or >~user/proc. 
Except that the write permission on the *file* still wouldn't indicate whether it could be unlinked.
guy@auspex.auspex.com (Guy Harris) (01/17/91)
>Not so. None of the editors on UNIX use keys (line numbers). Why? Because >none of the other programs (say, compilers) know about them. How many >files would benefit from having keys that were quick to access? >/etc/passwd /etc/group termcap /etc/hosts and many more, including every >directory on the system. Well: 1) the keys you *want* on the aforementioned files aren't line numbers, they're things like the user name, the user ID, the terminal type, etc; 2) some file system types for UNIX might well *give* directories indices - directories generally aren't plain-text files in any case; 3) some UNIX systems have keyed-access versions of some of those files, although the keyed-access version has to be generated by a separate program after you edit them - but, if the program that edits them doesn't know the format of the file, that'd have to be the case *anyway*, unless you had, as an attribute of the file, something that told the keyed access package or the editor how to scan the text of a line to figure out the key. >I agree that the windowing system should not be in the kernel. My only >point was that if it *was* in the kernel, it would have to work when >you got it. I.e., kernels don't have bugs? I don't believe that. I'm not convinced that merely by putting the window system, or the print spooler, or whatever into the kernel, you necessarily increase the chances that it'll work when delivered - except perhaps by putting it there reducing its complexity, probably by trading away functionality.... >I realise that. However, if I can't write and install my own device >drivers from a non-privledged account (hopefully without rebooting), >then I can't replace the windowing system and it might as well be in >the kernel. Here, I can run sunview or X as needed. I don't know if >plan-9 can do that or not. Then perhaps it can, and your argument against it is incorrect. 
As they note, the window system is a user-mode server; I suspect that, in fact, you *can* run your own window system server without needing privileges (and run it within a window of another instance of the window system). >Basically, that pretty much sums it up. I think relatively minor >additions in the kernel of UNIX (primarily file handling) "kernel" in the sense of "privileged supervisor" or "core set of OS services"? The object-oriented scheme suggested by some postings might be better implemented, to a large degree, outside the privileged supervisor.
sef@kithrup.COM (Sean Eric Fagan) (01/17/91)
In article <5293@auspex.auspex.com> guy@auspex.auspex.com (Guy Harris) writes: >>The mouse is clearly record oriented, in that reading >>half a `record' tells you nothing. Yes, and that's why records in files are bad. Thanks for pointing that out. -- Sean Eric Fagan | "I made the universe, but please don't blame me for it; sef@kithrup.COM | I had a bellyache at the time." -----------------+ -- The Turtle (Stephen King, _It_) Any opinions expressed are my own, and generally unpopular with others.
sef@kithrup.COM (Sean Eric Fagan) (01/17/91)
>The mouse is clearly record oriented, in that reading
>half a `record' tells you nothing.

Incidentally, I once *did* have a use for reading 'half records' from a mouse; I did this via awk, and it fit very nicely in with the rest of my "program" (a pipeline). In unix, "records" are consensual hallucinations. And, frankly, this is how most of us want them.

--
Sean Eric Fagan | "I made the universe, but please don't blame me for it;
sef@kithrup.COM | I had a bellyache at the time."
-----------------+ -- The Turtle (Stephen King, _It_)
Any opinions expressed are my own, and generally unpopular with others.
new@ee.udel.edu (Darren New) (01/17/91)
In article <5293@auspex.auspex.com> guy@auspex.auspex.com (Guy Harris) writes: [many questions related to but mostly orthogonal to the points I tried to make]

>I have no great problem with the former complaint (I haven't needed an
>ISAM package for what I've done, but others do use it). I'm not sure
>what having line numbers built into the lines buys you, though.

The ability to change a file without invalidating all the hardcopies you may have of the file. The same problem leads documenters to include "this page intentionally left blank" so all indexes don't have to be changed every time a page is inserted. *I* have used both and *I* prefer line numbers. I never missed line numbers in a system where I would never look at a hardcopy (like Smalltalk).

>I think it's incorrect to state that they want records, too; they want
>lines, and generally don't care whether the lines are defined as "a
>bunch of bytes with a (LF/CR-LF/CR) at the end" or "a byte count
>followed by the bytes of the line".

Either is fine. I'm considering lines and records to be the same thing here. I don't know why you think I want a particular format for records or lines, if you do.

>Lines, not records.

Why don't you tell me the difference as you see it. I don't understand what this statement is implying.

>why do you want embedded line numbers?

See above. If I get ten error messages from the compiler, I have to fix them in reverse order lest the line numbers get changed while I edit. Increasingly sophisticated tools can be used to overcome this problem (like a tool that intersperses error messages into the source file), but I believe this to be curing the symptoms.

>No, it doesn't; indeed, programs that don't understand record boundaries
>can't always deal properly with tapes.
>However, programs that *do*
>understand tape blocks *can* deal with disk files as well as tapes;
>e.g., "tar" and "cpio" can maintain their archives directly on tape or
>on a disk file, which is quite useful (consider the number of FTP-able
>or UUCP-able packages archived as possibly-compressed "tar" files).

No dispute there. Certainly UNIX is elegant in that many utilities can be made to work on unexpected inputs because everything is a file. My point is only that the file access metaphor is not what I would have chosen. Tapes are a superset of byte streams. Windows are a superset of byte streams (hence SIGWINCH). Processes are supersets of byte streams. Everything out there is a superset of a byte stream.

If you force an application to access your device as a bytestream, you can get contorted interfaces (like all the different names for my one tape drive) but your tools work. If you allow an application to access your device in a non-bytestream way (like some windowing systems where things are calls rather than streams), none of your tools work. I believe that with a higher lowest common denominator, both of the above problems could be avoided. Nothing prevents you from *also* allowing access as a bytestream: witness the stream-oriented access to the Macintosh resource forks, which allows uploading and copying of structured files.

>>The terminal, the mouse, and the windows are not
>>files. Sure, you can make an interface to them that looks like a file,
>>but that is the same thing as making records on top of keyed files,

>I don't think you meant that, unless "making records on top of keyed
>files" means something quite non-obvious.

Sorry. I meant "making records on top of stream files". Brain damage, you know. I promise to proof-read my posts better in the future. The point is that the access to the mouse as a file is bogus because the mouse is not a memory-like device. If I read it now, I'll get a different answer than if I read it later.
Either that, or I have information piling up and out of date. The mouse is an active entity, and if you want to access everything as a file, then files should be active also. >Yes, but I don't see why that's a problem. Programs that read and write >text files can generally read and write the terminal as well. Again, no argument from me. I'm not against having devices all be readable as files. I object to the simplicity of the interface between the user and the files. This simplicity leads (IMHO) to elegance of simple tools but reinvention of the wheel on larger programs. >Yes, but I'm not sure why this renders the fact that you access the >mouse by opening "/dev/mouse" and doing "read()" calls from it at all >bad. Same point. I object to the lack of records, not to the mouse being a file. >Either you read a different paper, or don't remember right; the paper I probably misremember. I don't remember a bitblt file, but a /dev/cons file with a bitblt IOCTL. Oh well. >1) it might make it possible to trace the window system actions being > performed by your program by interposing something like "tee" between > it and "/dev/bitblt"; Very useful. >2) even if it doesn't look exactly like a disk file, the fact that it (or > other kinds of objects) exists in the same *namespace* as files, and > supports some subset of the same kinds of operations, can be useful > if for no other reason that multiple kinds of these specialized > objects behave similarly (e.g., the low-level screen bitblt code and > the window-system bitblt-in-a-window code). Bingo! Sophisticated devices support a *superset*, not a subset, of what stream-oriented files can do. However, there is no way in UNIX of accessing these enhanced features except through the one IOCTL call, (vastly overloaded and widely varying from release to release let alone vendor to vendor,) and that IOCTL cannot be used by most tools. 
Also, file names themselves play an important part of the I/O behaviour of a file object, which seems really strange to me (the tape device comes to mind). >Note that the key thing here isn't purely the file-like behavior; the >notion that many types of objects inherit behavior from some common >class is part of it. It may be that "file" isn't the right class, >but.... No, that forcing the behaviour of everything to be accessed via the behaviour of (almost) the simplest file structure imaginable is the problem. >>On top of that, we put TCP to make it reliable. Many protocols I've seen put >>records on top of TCP, because the behaviour is inherently record- >>oriented. > >Which, of course, says nothing about the advantages of >byte-stream-orientation vs. record-orientation in UNIX; the >byte-stream-orientation vs. record-orientation is a characteristic of >the protocol. Right. The file access mode is inherently tied up with the file you are accessing. Hence, you have to go through extra work to put the lower-level structure (IP), as hidden by the higher level structure (TCP), back. By using TCP for reliability, you lose the advantages of a record-oriented file, and have to rebuild it from streams. RPC went the other way, and used the records of UDP and had to tack the reliability back on top. >>Say what?! It fits the name-space model OK, but the idea that >>/proc/1406/status is a file containing an ASCII representation >>of the CPU time used so far is nothing to do with a process being >>a file. > >Yup, it's more that a process is a *directory*. :-) Right. Here, we substitute "file" for the word "record" and "directory" for the word "keyed file". Other points deleted because you are right. >>Lack of ACLs; lack of file passwords; >Why are file passwords a good idea? I'm genuinely curious; what do they >buy you that either > 1) file encryption with a password Costs time on every read/write to encrypt/decrypt. 
Others can still make a copy or even destroy it if no ACLs are provided. Difficult to change while the file is open (like a TP file which stays open all the time), hence making security breaches a bigger problem. Passwords (I assumed) were checked only at open time and could be changed while the file is open, much like UNIX mode bits.

> 2) some access control mechanism like file modes or ACLs

I can tell you the password and have you give it to whoever needs it. I can't do that with an ACL, unless one of the flags of the ACL says who is allowed to change the ACL, which is probably the best solution of all. I have no objection to both ACLs and passwords and encryption -- each for the appropriate use.

>>for non-privledged stuff (i.e., setUID)
>I presume here you mean "i.e., not setUID".

I meant "non-privileged" as in "not superuser". If I want to give limited access to a file without first obtaining superuser privileges for some length of time, I need to make a setUID program, which then has my permissions when accessing *any* file.

>Or perhaps you mean "privileged stuff (i.e., setUID)"; i.e., if you write a
>set-UID program, there's no good way to verify that it's secure?

Yes, that.

>Or you *did* mean non-privileged stuff, and the complaint is that you
>don't know that running some J. Random program won't trash *your* stuff?

I think that as long as I don't set *any* of my programs to setUID and nobody with root privileges messes up a setuid program owned by the root, then I'm acceptably safe from Jay's programs. As it currently stands, with only one level of privilege, setUID programs must be hand crafted to achieve the goals. To step outside the rwxrwxrwx sophistication requires hand-crafting of a setUID program.
Stepping outside of more complex ACLs also requires hand-crafting some sort of application program to control access, but I think it is much more rare and much easier to assure that it will at worst mung only files associated with the given application when the ACLs are indeed sufficiently powerful. Also, with only one level of privilege, it is easy to accidentally cause a major lossage, like archiving the tape to the disk partition :-(.

As a counter-example, CP/V had the following privilege levels (probably amongst others):

00 -- could run startup command only, could not read/write any files not
      explicitly permitted in the file's ACL.
10 -- added: could run programs in the system directory naming this account
20 -- added: could read/write files owned by that account and files
      read(public).
30 -- added: could run own programs
40 -- normal privileges. added: could read files marked read(all), etc
50 -- could list accounts. equiv of being able to ls /usr.
60 -- could list files in other accounts (ls -l of otherwise-non-readable
      dirs).
80 -- could read any memory (like ps).
90 -- could read any non-passworded file not owned by "root"
A0 -- could read any file (used for backups)
B0 -- could write any file
C0 -- could write any memory, all privilege checking disabled. (Superuser,
      including the ability to bump privilege still further.)
D0 -- untracked resource allocation (could request pages from disk w/o
      putting in file, pages of memory that stick around when program
      exits, etc)
E0 -- Realtime control, ability to turn off task switching, etc.

Using the right level for the right job prevented many problems. I personally saw the system saved from crashes at least three times because the sysprog had boosted his privs only to what he needed to implement and test the system program being worked on.

>>lack of any good network security; etc.
>To what sort of network security are you referring here?
>Security
>problems with allowing network access to files without providing some
>network authentication scheme such as Kerberos or the "DES
>authentication" scheme in ONC RPC?

Well, the ACLs in UNIX are still non-network ACLs for the most part. We have {user, group, and the entire universe} access. Why not {user, group, machine, local net, world} accesses? It would be nice to be able to put something up for anon FTP that anybody at University of Delaware could download but that nobody else could download.

Kerberos works, but I don't know that it addresses the problems of which I am thinking, as it seems to allow setUID programs (say, the mail server) to confirm that the caller is as represented (and vice versa). It does nothing to allow me, Darren New, to let only certain files be accessed by Jay Random. These two problems are orthogonal. We still have problems with internet worms, passwords in anonymous-FTPable files, and so on. How far would the internet worm have gotten if the only files it was allowed to access were those that were explicitly read(ftpd)? -- Darren

--- Darren New --- Grad Student --- CIS --- Univ. of Delaware ---
----- Network Protocols, Graphics, Programming Languages,
Formal Description Techniques (esp. Estelle), Coffee, Amigas -----
=+=+=+ Let GROPE be an N-tuple where ... +=+=+=
new@ee.udel.edu (Darren New) (01/17/91)
In article <5298@auspex.auspex.com> guy@auspex.auspex.com (Guy Harris) writes:

> 1) the keys you *want* on the aforementioned files aren't line
> numbers, they're things like the user name, the user ID, the
> terminal type, etc;

Right. Line numbers are a special case of a more general feature.

> 2) some file system types for UNIX might well *give* directories
> indices - directories generally aren't plain-text files in any
> case;

Not any more. They used to be. This broke a lot of programs at the time.

> [...] but, if the
> program that edits them doesn't know the format of the file,
> that'd have to be the case *anyway*, unless you had, as an
> attribute of the file, something that told the keyed access
> package or the editor how to scan the text of a line to
> figure out the key.

Exactly my point. If keys are in the filesystem as opposed to a library, then every program has access to that information automatically. Compare this to file locking: if it is in the kernel, then all programs will respect a lock. If it is in a library, then I can still change things under your program in spite of the fact that your program has a lock on the file. Which is `better'?

>I.e., kernels don't have bugs? I don't believe that.

It has been my experience that bugs in the kernel get fixed faster than bugs in the non-kernel packages because more people complain and bugs are generally more catastrophic and there is less often a fall-back program. This is not to say that I think it is a good idea.

>>Basically, that pretty much sums it up. I think relatively minor
>>additions in the kernel of UNIX (primarily file handling)

>"kernel" in the sense of "privileged supervisor" or "core set of OS
>services"? The object-oriented scheme suggested by some postings might
>be better implemented, to a large degree, outside the privileged
>supervisor.

Right. Since the filesystem *is* part of the privileged supervisor in UNIX, then the question is moot.
However, it does lead me to my next flame-bait topic: What *should* be in the kernel? What *must* be in the kernel? Is it possible to come up with a definitive minimal set of services that *must* be in the kernel of a multi-use, multi-tasking, multi-user computer OS? For example, we can see that authentication need not be a privileged operation (see Kerberos). However, I don't know of any filesystems that allow the user to specify his own routines for ACLs on files; rather, you need to make the filesystem an active entity (like a mail server) and have the server call Kerberos.

For example, some of the things that are in the UNIX kernel that are not in other OSs:

Closing file handles upon exit, which was done by the shell in CP/V.
Core dumps, which were done by the shell in CP/V.
The filesystem, which is a library in AmigaDOS.
Device drivers, which are libraries in AmigaDOS and several other systems.
Executable loading (exec()), which is done by the shell in MS-DOS.

Can somebody give me a good reference on Mach or Multics? Thanks in advance! -- Darren

--- Darren New --- Grad Student --- CIS --- Univ. of Delaware ---
----- Network Protocols, Graphics, Programming Languages,
Formal Description Techniques (esp. Estelle), Coffee, Amigas -----
=+=+=+ Let GROPE be an N-tuple where ... +=+=+=
sef@kithrup.COM (Sean Eric Fagan) (01/17/91)
In article <41907@nigel.ee.udel.edu> new@ee.udel.edu (Darren New) writes: >> 2) some file system types for UNIX might well *give* directories >> indices - directories generally aren't plain-text files in any >> case; >Not any more. They used to be. This broke a lot of programs at the time. Bzzt. Wrong answer, but thanks for playing. Directories have never been plain-text files; they were, in fact, the only place that the kernel enforced records. Directories have never been writable, except through a limited number of system calls (creat, unlink, link, mknod, and later mkdir and rmdir), but they have always been readable. NFS does not let one read a directory, if I remember correctly, but I have read directories under BSD many, many times. -- Sean Eric Fagan | "I made the universe, but please don't blame me for it; sef@kithrup.COM | I had a bellyache at the time." -----------------+ -- The Turtle (Stephen King, _It_) Any opinions expressed are my own, and generally unpopular with others.
sef@kithrup.COM (Sean Eric Fagan) (01/17/91)
In article <41902@nigel.ee.udel.edu> new@ee.udel.edu (Darren New) writes:

>The ability to change a file without invalidating all the hardcopies
>you may have of the file. The same problem leads documentors to
>include "this page intentionally left blank" so all indexes don't
>have to be changed every time a page is inserted.

Gee, most documenters I've seen (myself included, at one point in the not-so-distant past) used blank pages because chapters should begin on odd-numbered pages. I have never seen an odd-numbered page marked, 'This page intentionally left blank.' Anyone who has done so is being very stupid.

>*I* have used
>both and *I* prefer line numbers. I never missed line numbers in
>a system where I would never look at a hardcopy (like Smalltalk).

*I* have used both, and *I* prefer simple bytestreams.

--
Sean Eric Fagan | "I made the universe, but please don't blame me for it;
sef@kithrup.COM | I had a bellyache at the time."
-----------------+ -- The Turtle (Stephen King, _It_)
Any opinions expressed are my own, and generally unpopular with others.
fmayhar@hermes.ladc.bull.com (Frank Mayhar) (01/18/91)
In article <1991Jan16.201253.3869@kithrup.COM>, sef@kithrup.COM (Sean Eric Fagan) writes: |> In article <5293@auspex.auspex.com> guy@auspex.auspex.com (Guy Harris) writes: |> >>The mouse is clearly record oriented, in that reading |> >>half a `record' tells you nothing. |> |> Yes, and that's why records in files are bad. Thanks for pointing that out. Huh??? You apparently read something into that statement that I didn't see. Care to elaborate? "Reading half a record" depends on the application in question. I can imagine many cases where some applications need entire records, and others (using the same file) get by fine with only a part of the record. In fact, I do this myself occasionally. -- Frank Mayhar fmayhar@hermes.ladc.bull.com (..!{uunet,hacgate}!ladcgw!fmayhar) Bull HN Information Systems Inc. Los Angeles Development Center 5250 W. Century Blvd., LA, CA 90045 Phone: (213) 216-6241
fmayhar@hermes.ladc.bull.com (Frank Mayhar) (01/18/91)
In article <1991Jan17.004509.5435@kithrup.COM>, sef@kithrup.COM (Sean Eric Fagan) writes: |> In article <41907@nigel.ee.udel.edu> new@ee.udel.edu (Darren New) writes: |> >> 2) some file system types for UNIX might well *give* directories |> >> indices - directories generally aren't plain-text files in any |> >> case; |> >Not any more. They used to be. This broke a lot of programs at the time. |> |> Bzzt. Wrong answer, but thanks for playing. |> |> Directories have never been plain-text files; they were, in fact, the only |> place that the kernel enforced records. Directories have never been |> writable, except through a limited number of system calls (creat, unlink, |> link, mknod, and later mkdir and rmdir), but they have always been readable. |> NFS does not let one read a directory, if I remember correctly, but I have |> read directories under BSD many, many times. Hmm. McKusick and Karels (and a guy named Chris Landaur (sp?)) disagree with you. According to them, directories originally were basically plaintext files, and handled no differently than other files. This caused problems, though, and was changed. All this was very early on in Unix evolution. -- Frank Mayhar fmayhar@hermes.ladc.bull.com (..!{uunet,hacgate}!ladcgw!fmayhar) Bull HN Information Systems Inc. Los Angeles Development Center 5250 W. Century Blvd., LA, CA 90045 Phone: (213) 216-6241
guy@auspex.auspex.com (Guy Harris) (01/18/91)
>> 2) some file system types for UNIX might well *give* directories >> indices - directories generally aren't plain-text files in any >> case; >Not any more. They used to be. This broke a lot of programs at the time. Wow! You used UNIX back before 1975? :-) Directories weren't plain text files back in V6 (which was around in the 1975 time-frame); a directory entry was a 2-byte binary inumber followed by 14 bytes of possibly-null-terminated file name. You may be thinking of the transition from the V7 file system to the 4.2andupBSD file system, but that wasn't a change from plain-text-file directories to non-plain-text-file directories, it was a change from one non-plain-text format to another. >> [...] but, if the >> program that edits them doesn't know the format of the file, >> that'd have to be the case *anyway*, unless you had, as an >> attribute of the file, something that told the keyed access >> package or the editor how to scan the text of a line to >> figure out the key. >Exactly my point. If keys are in the filesystem as opposed to a library, >then every program has access to that information automatically. No, not exactly your point at all, as far as I can tell. *My* point is that in order to have "/etc/passwd" as an indexed file, but one that you can edit as a plain-text file, some indication of how to generate the key for an entry in that file would have to be recorded in or with the file; merely having "the filesystem" support keyed files isn't sufficient. This also raises questions of what "the filesystem" is, as opposed to "a library". In RSX-11, as I remember, the Files-11 ACP's only support of records, keys, etc. was that it would store various file attributes in the file header, and let user-mode code retrieve and change them. The model presented to user-mode code, at the QIO level, was that of an array of blocks. The stuff that implemented records, keys, and the like was, in fact, a user-mode library. 
So, in RSX-11, are keys in "the filesystem" or in "a library"? In VMS, I think the only difference is that the RMS library runs in executive mode. Does that change whether keys are in "the filesystem" or "a library"? Were all UNIX systems to come with an ISAM library - with the C-ISAM programmatic interface, say - and were all UNIX systems to have a file system that permits storing and retrieving of attributes the way Files-11 does, and were all the ISAM libraries to use that, would keys be in "the filesystem" or in "a library"?

>It has been my experience that bugs in the kernel get fixed faster than
>bugs in the non-kernel packages because more people complain and bugs
>are generally more catastrophic and there is less often a fall-back
>program. This is not to say that I think it is a good idea.

If you didn't think it was a good idea, why did you present it as part of your argument against the Plan 9 window system?

>Right. Since the filesystem *is* part of the privledged supervisor in
>UNIX, then the question is moot.

Again, the question of "what is 'the filesystem'?" comes to mind. Imagine a UNIX system (this is inspired by my understanding of the way Apollo systems work - but I'm not sure whether code implementing file types is in any way privileged or not; it might be) with run-time dynamic linking. (SunOS 4.1[.x], S5R4, probably OSF/1 - and also Domain/OS, as you might guess from the previous parenthetical note....) Let's also add one small feature to the current UNIX file systems: a 32-bit or 64-bit arbitrary attribute for every file, with kernel calls to set and get that attribute. The kernel doesn't interpret it at all. (There's probably room in most 4.2andupBSD file system inodes for this.) Now, imagine that "open()", "close()", "read()", "write()", "lseek()", "ioctl()", etc. didn't trap directly into the kernel.
Imagine them as library routines - shared library, if that's what it takes to move them out of "a (mere) library" :-) - which get at the raw data in the file using something like "mmap()". "open()" might have to sneak around the back and do the standard trap into the kernel around which "open()" is normally just wrapped. The "open()" routine would read the attribute in question, and use it to find some dynamically-loadable object that implements the "read()", "write()", etc. operations on that file. Some file types could have additional operations implemented on them, such as "read by key". This could even handle the "/etc/passwd" problem - you'd add a "password file" file type, wherein "read()" would just give you the file's data as is (or perhaps give you only one line at a time), and "write()" would have to update the keys. Now, in this system, what is "the filesystem"? Is it the code - which in this case, let us say is still in the kernel - that supports a namespace, and "bags of bytes", with associated attributes, to which the names in that namespace refer? Or is it that code *plus* the "open()"/"read()"/"write()"/etc. implementation, which supports text files, keyed files, etc., etc., etc.? >The filesystem, which is a library in AmigaDOS. So are the file system's features in AmigaDOS in "the filesystem" or "a library"? :-)
guy@auspex.auspex.com (Guy Harris) (01/18/91)
>In article <5293@auspex.auspex.com> guy@auspex.auspex.com (Guy Harris) writes: >>>The mouse is clearly record oriented, in that reading >>>half a `record' tells you nothing. Actually, he writes stuff that's a response to the aforementioned (or, at least, to something in the same posting as the aforementioned); Mr. New said the stuff you're quoting.
guy@auspex.auspex.com (Guy Harris) (01/18/91)
>The ability to change a file without invalidating all the hardcopies >you may have of the file. Which requires that line numbers be something other than simply the ordinal number of the line within the file, as I presume is the case on the systems where you used them. How often, if ever, did you have to "renumber" the lines in a file, say because you needed to insert something between two lines with adjacent line numbers? >I never missed line numbers in a system where I would never look at a >hardcopy (like Smalltalk). I never miss line numbers in a system where I rarely look at hardcopies, and even more rarely, if ever, line-number them in any case (like UNIX). >Either is fine. I'm considering lines and records to be the same >thing here. OK, then in *that* case, UNIX *does* support records, to some extent; see "fgets()". (Yeah, it's in "a library"; see my other posting, which asks what the difference really is, and why it makes a difference.) >>Lines, not records. > >Why don't you tell me the difference as you see it. I'm assuming *you* saw a difference, because you didn't consider UNIX to support "records" in the file system, but it does have a standard interface for reading a single line from a file ("fgets()"). That interface may not be the lowest-level interface you can use to get at the data in a file, but then it's not the lowest-level interface on at least some systems that *are* considered to support records "in the file system".... >See above. If I get ten error messages from the compiler, I have to >fix them in reverse order lest the line numbers get changed while >I edit. Increasingly sophisticated tools can be used to overcome >this problem (like a tool that intersperses error messages into the >source file), but I believe this to be curing the symptoms.
I don't, unless your editor on the system that handles line numbers uses some "insert a line" primitive on the file being edited that causes a correct line number (one between the line numbers of the lines between which the new line is being inserted) to be chosen automatically - and even in that case, I wouldn't consider that a solution if the requirement to manage the file that way imposed restrictions on the view it could give me of the file. (I tend to use EMACSish editors that, to a large degree, model the file as one huge character string, treating newlines like any other character, which can be inserted or deleted, and that let me cut arbitrary substrings of that string without regard to line boundaries.) Unless you have such an editor, as far as I can see you need some specialized handling; either you need an editor that manages line numbers, or you need the aforementioned tool (by which I presume you mean something such as "error") or something like the mechanism in some EMACSish editors for running a compile, capturing the error output, parsing it, and walking through the file(s) to which the errors refer and going to each error. (I don't know if those mechanisms continue to work if you update the file before going to the next error, but I wouldn't be surprised if they did.)
new@ee.udel.edu (Darren New) (01/18/91)
In article <1991Jan17.004509.5435@kithrup.COM> sef@kithrup.COM (Sean Eric Fagan) writes: >Directories have never been plain-text files They were plain text files in the sense that you used the same open(),read() and close() system calls to get a list of file names as you did to get info out of a plain text file. Now, instead, one uses readdir, opendir, etc. I suspect that this is one of the reasons that there are no directory-handling routines in the K&R standard I/O library. Of course I'm aware that the system maintains the directories for you and prevents you from writing to them. -- Darren -- --- Darren New --- Grad Student --- CIS --- Univ. of Delaware --- ----- Network Protocols, Graphics, Programming Languages, Formal Description Techniques (esp. Estelle), Coffee, Amigas ----- =+=+=+ Let GROPE be an N-tuple where ... +=+=+=
new@ee.udel.edu (Darren New) (01/18/91)
In article <1991Jan17.004729.5517@kithrup.COM> sef@kithrup.COM (Sean Eric Fagan) writes: >Gee, most documentors I've seen (myself included, at one point in the >not-so-distant past) used blank pages because chapters should begin on >odd-number pages. I have never seen an odd-numbered page marked, 'This page >intentionally left blank.' Anyone who has done so is being very stupid. Why would you insult people like this? You have a real attitude problem. "I have not seen this, so anyone who does it must be stupid." Jeez. Have you ever considered what it would take to update a manual set that contains a dozen loose-leaf volumes, where the index is published as a separate volume? Why send 500 pages when there are changes on 20 pages? Why send a new 100 page index to every user when the only change has been to delete three paragraphs in the middle of one loose-leaf binder? Tell me how you would handle this in a better way than just leaving the pages blank or addig extra pages in the middle saying "ignore these pages", or I will just have to assume that you are even more stupid than anyone who has done it my way. (See, I know how to play with and otherwise offend other people's egos, as well as hold a conversation.) (Aside: Really people, I'm not looking for a fight here, but a discussion. If you can't check your defensive aggression at the door, why bother posting, since you aren't going to change my mind by calling anybody stupid.) -- Darren -- --- Darren New --- Grad Student --- CIS --- Univ. of Delaware --- ----- Network Protocols, Graphics, Programming Languages, Formal Description Techniques (esp. Estelle), Coffee, Amigas ----- =+=+=+ Let GROPE be an N-tuple where ... +=+=+=
new@ee.udel.edu (Darren New) (01/18/91)
In article <5337@auspex.auspex.com> guy@auspex.auspex.com (Guy Harris) writes: >>The ability to change a file without invalidating all the hardcopies >>you may have of the file. > >Which requires that line numbers be something other than simply the >ordinal number of the line within the file, as I presume is the case on >the systems where you used them. Wrong assumption. vi under UNIX will already give me that kind of line number. Normally, the line numbers I'm talking about either have decimal places or are normally numbered more than one apart, say, every ten or hundred. Think about old BASICs where the editor is built in; how did you number your programs? >How often, if ever, did you have to "renumber" the lines in a file, say >because you needed to insert something between two lines with adjacent >line numbers? The lines I used had three decimal places; I rarely ran out. I usually renumbered before making a new, clean listing, which was usually when I ran out of different color pens :-) >>I never missed line numbers in a system where I would never look at a >>hardcopy (like Smalltalk). >I never miss line numbers in a system where I rarely look at hardcopies, >and even more rarely, if ever, line-number them in any case (like UNIX). >>Either is fine. I'm considering lines and records to be the same >>thing here. > >OK, then in *that* case, UNIX *does* support records, to some extent; >see "fgets()". (Yeah, it's in "a library"; see my other posting, which >asks what the difference really is, and why it makes a difference.) Right. To some extent. Which is why most programs consider a newline to be the end of a line. However, I think the real difference between 'records' and 'lines' as I would use them comes when you want to replace a line in the middle of a file with another line of a different length.
>I'm assuming *you* saw a difference, because you didn't consider UNIX to >support "records" in the file system, but it does have a standard >interface for reading a single line from a file ("fgets()"). And the fputs() interface for writing a 'record' can change other records, can write multiple records, can leave a record 'unterminated', and so on. Also, fgets() only works on text files; reading binary files with fgets() is error-prone. >I don't, unless your editor on the system that handles line numbers uses >some "insert a line" primitive on the file being edited that causes a >correct line number (one between the line numbers of the lines between >which the new line is being inserted) to be chosen automatically Of course. Just as vi or emacs interprets newline to mean go to the next line on the screen. >even in that case, I wouldn't consider that a solution if the >requirement to manage the file that way imposed restrictions on the view >it could give me of the file. On any filesystem that is not extensible, you have problems with viewing the files in a different way. UNIX has problems viewing files in a way that you can change the lengths of lines in the middle of a file. Record-oriented filesystems have problems viewing files as byte arrays. I would prefer a filesystem that supports both, or even better, an active filesystem that supports anything I care to write an interface for. >Unless you have such an editor, as far as I can see you need some >specialized handling; either you need an editor that manages line >numbers, An editor on such a system would naturally handle inserting lines between other lines without renumbering, just as editors on UNIX treat \012 as 'end of line' and handle 'preserve-type' functions because they can't update the middle of a file when you make the change. Again, think about old microcomputer BASIC editors. Of course you can insert lines between other lines.
Eventually, you must renumber the lines, but it does not happen every time you make a change. Using vi under UNIX can be like using a TeX-like typesetter where you had to give the numeric citation in the text, and find and renumber them every time you inserted a citation that came earlier in the bibliography. >the mechanism in some EMACSish editors for running a compile, capturing >the error output, parsing it, and walking through the file(s) to which Which fails to satisfy the hardcopy problem. Again, the ideal solution would support both (or all three: numbered, unnumbered, and stream) types of access (as it did under CP/V). >(I tend to use EMACSish editors that, to >a large degree, model the file as one huge character string, treating >newlines like any other character, which can be inserted or deleted, and >that let me cut arbitrary substrings of that string without regard to >line boundaries.) You can make a tool to do that in a record-oriented environment as easily as you can recognise \012 as an end-of-line character. Just read each record into memory, separating them with your desired separation character. Lines that had not been changed need not even be written out if there are keys on the file. If lines were inserted, they can be given appropriate keys upon saving of the file. Of course, using this scheme, you keep the whole file in memory, limiting how big your files can get and extending how long it takes to make small changes to large files. Sometimes with vi, I want to add "%!PS-Adobe" to the front of a postscript file so our printer driver recognises it. A one-line change can take several minutes because I have to suck in a multi-megabyte file, write it back out to /tmp, make one change, and store it back out again, rewriting the file each time, paging like mad. With keyed files, I don't even need to read or write anything but the first line.
If the editor (or even the system) dies, all changes I've made so far are saved in the file and I don't need to do anything magic in the editor to preserve those changes. If I *want* to be able to back out, the editor can store just the changed lines (and keys) either in a separate file or in memory until I choose to apply those changes. -- Darren -- --- Darren New --- Grad Student --- CIS --- Univ. of Delaware --- ----- Network Protocols, Graphics, Programming Languages, Formal Description Techniques (esp. Estelle), Coffee, Amigas ----- =+=+=+ Let GROPE be an N-tuple where ... +=+=+=
barmar@think.com (Barry Margolin) (01/18/91)
In article <42010@nigel.ee.udel.edu> new@ee.udel.edu (Darren New) writes: >But if the filesystem supports keys, then your editors probably do also. This appears to be the crux of your argument (with various replacements for "editors"). I think this is an incorrect assumption to make. Many OSes have keyed files as a basic filesystem type, but they generally aren't used for user-editable text files, or if they are the keys are ignored (making them equivalent to non-keyed variable-length record files). The problem with using general-purpose text editors to manipulate keyed files is that the editor doesn't know the semantics of the keys. A text or program file might have line numbers in the keys, while the password file might have the user names in the keys. Many systems use keyed files as the building block of databases, and then provide database query/update applications that understand the semantics by looking at a data dictionary or some database-specific attribute file. This may not be a bad thing, by the way. Why is it necessary that the password file be editable with a general-purpose text editor? If it were a keyed file with binary record data it would be faster to parse (no scanning for ':' characters) and harder to screw up (it would be updated by a program that understands the format, rather than edited by hand). By the way, to answer Guy's (I think) query about the existence of editors that store unchanging line numbers, I can recall one such system: DTSS. Its standard editor was basically the standard BASIC program input/update user interface. I remember entering and editing text formatter files in this environment in the late 70's. -- Barry Margolin, Thinking Machines Corp. barmar@think.com {uunet,harvard}!think!barmar
boyd@necisa.ho.necisa.oz.au (Boyd Roberts) (01/18/91)
In article <KENW.91Jan15131820@skyler.calarc.ARC.AB.CA> kenw@skyler.calarc.ARC.AB.CA (Ken Wallewein) writes: > > Some want a stream of bytes; some want a fully structured data file. >I want both. I want to be able to choose. > I've always thought that the I/O system should be re-written so that everything is a stream. That way you could get arbitrary functionality by pushing a line discipline (or stack of them) onto _any_ file. Although, the semantics of a tty with a record I/O line discipline on it may be a bit interesting. Sticking stuff in the file-system like /proc and Plan 9 is a really neat idea. If all those files were streams, the possibilities are endless. Plan 9 also has an append file type. All writes go at the end of the file. I don't know whether this implies exclusive access, but if it does, writing a mail deliverer would be trivial & there are all sorts of other applications for such a file-type. Boyd Roberts boyd@necisa.ho.necisa.oz.au ``When the going gets weird, the weird turn pro...''
boyd@necisa.ho.necisa.oz.au (Boyd Roberts) (01/18/91)
In article <41907@nigel.ee.udel.edu> new@ee.udel.edu (Darren New) writes: > >It has been my experience that bugs in the kernel get fixed faster than >bugs in the non-kernel packages because more people complain and bugs >are generally more catastrophic and there is less often a fall-back >program. This is not to say that I think it is a good idea. So tell me about the `System V lost inode bug' and its resolution? Boyd Roberts boyd@necisa.ho.necisa.oz.au ``When the going gets weird, the weird turn pro...''
francis@cs.ua.oz.au (Francis Vaughan) (01/18/91)
Hmm, just to be a stir the mud a bit..... Most of the comments I've seen so far seem to be asking not so much "What constitutes a good OS?" but rather what constitutes a good OS that looks like the OS's that we are used to? Much of this discussion seems to assume that OS means "something a lot like what we use UNIX/VMS etc for" and begs the question as to what an OS is. A nice meta level definition we use a lot is to define the OS as a virtual machine. No different in concept to the way a microcoded processor presents the user with a virtual machine (at a different level). So what is an OS? Does it include the file system? The record management system? The window manager? The shells? The network manager? The compilers? DB Query language? Try answering all of these questions with a yes/no with the OS's you are familiar with. Try Unix, VMS, Multics, PICK, Burroughs, Macintosh, MS-DOS. It is hard to get a definite yes/no for each one, but perhaps revealing to address the questions anyway. Should your definition of OS include the kernel? (This gets involved in the discussion about how much of the above list gets to be in the kernel.) Look at Mach and Chorus (Mach 3.0 at least). You could happily build a VMS emulator on these kernels and never be able to tell the difference at the user level. Yet both of these are used as Unix platforms (I am writing this on an Encore Multimax running Mach; looks a lot like Unix to me unless I use the extra features). An implicit assumption has been that we want files. The argument has been what sort of access to files are wanted, not to question why files at all. Files as byte streams, files as structured records, files that support the programs I am writing. People seem to want files into which arbitrary data structures can be placed. Our work in persistent systems presents the user with a flat persistent space in which programs can run. This is distributed across peers on a network.
Nobody has any concept of a file, or indeed a query language. Structures are simply placed into the space by the program with no need to worry whether they are persistent or ephemeral; such issues are orthogonal to the programmer's task. Other programs access these structures as needed in the same space. Programmers are also freed of the need to reason about the consistency of data, as program state is also part of the persistent space and the underlying stability mechanisms guarantee that a program will never see an inconsistent data space (even though the program may have to be resumed from an intermediate state if failure occurs in part of the system). Of course there is much left unsaid here, but we ARE building an OS. Just a little different. Just like to stir things up a bit :-) Francis Vaughan.
francis@cs.ua.oz.au (Francis Vaughan) (01/18/91)
In article <2271@sirius.ucs.adelaide.edu.au>, I write.... |> |> Hmm, just to be a stir the mud a bit..... Urk, something nasty happened here. Needless to say this reads a lot better if you remove the "be a". Sorry. Francis Vaughan.
jack@cwi.nl (Jack Jansen) (01/18/91)
>>... >> The message 'dir usr/joe/bin' would go to usr, which would send >> 'dir joe/bin' to joe, which would send 'dir bin' to bin, which would >> return the directory of files in that subdir. >> >> You get the idea so far, I'm sure. >>... > Oh, I like it :-). 'bin' could be a library or archive or tar file, >and could return a 'dir' of its contents -- like I used to be able to >do on my old CP/M system :->. Been wanting that on a "real" system >for years! This is more-or-less exactly the way things work under Amoeba. I say more-or-less, because there are a few details that differ: 1. The sending of the 'dir' message to 'joe/bin' is not done by the server for 'usr', because this would tie down resources in the usr-server, which is a bad thing. Instead, a pointer to the server for 'joe' is returned to the calling program with the indication that the lookup isn't finished yet. 2. There is an optimisation that a server will gobble up multiple pathname components if it notices that it can. So, if usr and usr/joe live on the same directory server it will do both lookups. This reduces the number of messages needed for the lookup. 3. The idea of using it for tar files as well can be implemented, but it doesn't happen automatically. You have to provide a server that knows what a tar file looks like and will do the lookups for you. For more information about Amoeba: there are ample papers available in the literature, and the full manuals are available for ftp access. Contact me for addresses. -- -- Een volk dat voor tirannen zwicht | Oral: Jack Jansen zal meer dan lijf en goed verliezen | Internet: jack@cwi.nl dan dooft het licht | Uucp: hp4nl!cwi.nl!jack
guy@auspex.auspex.com (Guy Harris) (01/19/91)
>They were plain text files in the sense that you used the same open(),read() >and close() system calls to get a list of file names as you did to get >info out of a plain text file. Err, umm, no, you didn't - at least not in V6 and later (the claim was made in an earlier posting that "early in the evolution of UNIX", directories were plain-text files, so maybe it was true then). The way you get info out of a plain text file is that you open the file and read text from it. The way you get a list of file names out of a V7-style directory is that you open the file and read *directory entries* from it, extracting the names from the directory entries. There was never any guarantee that you could, for example, see if there's a file named "foobar" in the directory "bletch" by doing "grep foobar bletch". It might work if you were lucky, or it might not. >Now, instead, one uses readdir, opendir, etc. The reason one uses "readdir()", "opendir()", etc. is that having the format of directories known by programs that read directories meant that those programs had to be rewhacked when Berkeley changed the directory format. They actually introduced those routines *before* they changed the directory format; they wrote a version of those routines, which was implemented using "open()", "read()", and "close()", that ran on 4.1BSD-flavored systems that still had a V7-flavored file system with V7-style directories. They then implemented a version, which was *also* implemented using "open()", "read()", and "close()", to read BSD-style directories. Later, Sun added the "getdirentries()" call, which asks the particular file system type to return directory entries in a file-system-independent format (yes, it happened to look like the format of 4.2BSD directory entries - but not all the file systems atop which it was implemented used that format internally, e.g. the MS-DOS file system). 
This was done to let a program read directories on any mountable file system type, as SunOS in 2.0 added the VFS mechanism to let you have multiple file system types (the original ones Sun did were the BSD file system/UFS, NFS, and the MS-DOS file system; the latter was only a consulting special until SunOS 4.1.1). "readdir()" used that call (and later "getdents()"); "opendir()" still uses "open()". So, unless you have a fairly odd notion of what "plain text file" means, directories weren't plain text files in V6/V7/4.1BSD.
new@ee.udel.edu (Darren New) (01/19/91)
In article <1991Jan18.023602.11202@kithrup.COM> sef@kithrup.COM (Sean Eric Fagan) writes: >And, once again, I've cat'ed out directories under BSD. Hmmm.... My mistake. I tried it on a diskless workstation instead of rlogging in to the file server, where this seems to work. Thus, it seems that the network filesystem semantics don't match the local filesystem semantics. -- Darren -- --- Darren New --- Grad Student --- CIS --- Univ. of Delaware --- ----- Network Protocols, Graphics, Programming Languages, Formal Description Techniques (esp. Estelle), Coffee, Amigas ----- =+=+=+ Let GROPE be an N-tuple where ... +=+=+=
guy@auspex.auspex.com (Guy Harris) (01/19/91)
>I meant that you used the standard open() read() and close() routines, >as opposed to readdir(), opendir(), etc. Hence, you could use >cat, sed, etc to look at and process directories, Only to a limited extent. See Sean Fagan's posting. Directories contained *B*I*N*A*R*Y* data back in V6/V7 days, and did *N*O*T* necessarily have nice newlines in them, and there was *N*O* guarantee that "sed" could understand them. The fact that you could use "cat" was irrelevant, unless you were using one of the "cat" flags that expected the file to be a text file; I can use "cat" on "/vmunix" if I want to. >which you can no longer do. I can still use "cat" on directories on the local machine, although I can't use the flags that expect the file to be a text file, nor use it to display the directories in some nice easily-read format - but then I couldn't exactly do that under V6/V7 either. >But if the filesystem supports keys, then your editors probably do also. I wouldn't assume that - VMS includes keyed file support, but I don't think its editors support keys. (I assume by "editor" you mean "text editor"; however, since by "text file" you appear to have meant, in the discussion on whether directories were plain text files or not, "files you can read with 'open()', 'read()', and 'close()'", which is a rather non-standard definition, I'd like some verification that you actually mean the same thing by "editor" that many of the rest of us do. If not, bear in mind that by using words in the same way that others do, you make communication easier....) VMS's editors do, I think, support line numbers - but those aren't keys. >>So, in RSX-11, are keys in "the filesystem" or in "a library"? > >Both. Why should I have to choose? Because, in an earlier posting, you said: If keys are in the filesystem as opposed to a library, then every program has access to that information automatically.
which indicates that you do *NOT* consider them equivalent, and therefore that keys *cannot* both be in "the filesystems" and in "a library". Please be more careful in the way you phrase things, and more consistent between postings; it's hard to consider ideas - and I think a lot of what you have to say is worth considering - when you have to spend cycles interpreting the way they're phrased due to unusual use of descriptive phrases, or have to shift mindset from posting to posting because in posting A something in "a library" wasn't the same as something in "the filesystem" while in posting B they were the same. >The real choice is between a user-mode library and the kernel, and >maybe even that can be resolved. OK, so now if keys are in "a library", a program may still have access to that information automatically (*every program* doesn't, unless somehow the library can manage to *completely* hide the management of the keys from the program, or every program is written to use the calls that manage keys), and having keys in "the filesystem" is no longer opposed to having them in "a library"? Choose one or the other; I refuse to accept a choice of both.... >*** The other choice (which is what you are actually asking) is a >choice between a link-time library and an open-time library. If I need >to code special calls into my application to handle different formats >of files, What exactly do you mean by an "open-time library", and how does the above have anything to do with the choice between a "link-time library" and an "open-time library"? >then I can't use new formats with old programs; Exactly my point in the parenthetical note above - if you have to explicitly make calls to manage keys, any application that you'd want to manage keys would have to be written knowing about those calls. 
Maybe when *reading* a file, you can have an interface to get only the data, not the keys, from a file; that doesn't work when writing out a new file, unless your library can magically figure out what key should be assigned to each unit of the file. >for example, I can't use .PAG and .DIR files with old versions of 'cp' >without expanding the actual size the files take by filling in the holes. One way to solve that *without* teaching "cp" about ".dbm" files is to have it check for all-zero blocks and write them out as holes. >One of my complaints with the UNIX file system is that none of the tools >that make UNIX useful can handle any of the complexly-structured files >I need to build. If the mechanism for complex files was in place from >the start, then all the tools would handle them. I don't believe that - you'll have to prove it. As indicated, that requires that "all the tools" know about any stuff that can't be hidden in the library interface. >Even a simple mechanism whereby a portion of the middle of a file could >be replaced by a different size portion (records), and preferably able >to be named and retrieved in order (keys), would make many of my programs >much easier. Do you need that ability for *all* files, or would an ISAM file type - such as you'd have were all UNIXes to come standard with ISAM libraries - be sufficient? >Actually, the RSX-11 method seems pretty elegant to me, from what I've >heard friends say of it. (I've never actually programmed on it.) Don't >you think that the RSX-11 mechanism is more in the UNIX philosophy than >the current UNIX filesystem? More modular? More getting of stuff out >of the kernel? More reusable tool-ish? No, not necessarily. They *did* happen to move the *low-level* file system (the thing that manages a namespace containing "bags of bytes") out of the kernel, into a separate process - but I'm told that VMS moved it back in (XQPs as opposed to ACPs).
That is, however, not at all connected to the issue of the user-mode file access libraries in RSX including support for keyed files as standard vs. the user-mode file access libraries in UNIX (unless you count "dbm") not including them, nor the issue of the RSX libraries *perhaps* permitting the *same* calls being used to read, and perhaps write, both formats. (In other words, can you use PIP to copy an ISAM file, and have the copied file usable as an ISAM file?) >>Were all UNIX systems to come with an ISAM library - with the C-ISAM >>programmatic interface, say - and were all UNIX systems to have a file >>system that permits storing and retrieving of attributes the way >>Files-11 does, and were all the ISAM libraries to use that, would keys be >>in "the filesystem" or in "a library"? > >A library. It would be possible and easier to write tools that did not >use the library It's *possible* to do that under RMS; just do your own QIOs. It may be a *pain* to do so, but I'm not sure that makes a difference. >and hence were incompatible with any files written with >the library. Therefore, most tools would be incompatible, just as many >tools under UNIX are incompatible with binary files. If text-processing >software (editors, compilers, etc) all made use of the keys, then I >would be happy to use this mechanism. *ONCE AGAIN*, and please *ANSWER* this question: How are arbitrary programs that *generate* text files to assign keys to them, or is it the case that there's no *need* for them to generate keyed files? The "standard" sorts of systems that provide that functionality do not appear to have any way of letting it happen automagically. The "object-oriented" systems of the Domain/OS sort might do it, especially if it's relatively easy to make a new class that inherits the methods from "boring sequential text file" by inheriting methods from "keyed file" and building the "boring sequential text file" operations from them.
>It's kind of like having the paragraph string in VI, whereby you can tell
>vi what troff macros indicate a paragraph break.  Sadly, vi can't tell
>what TeX macros indicate a paragraph break and which LaTeX macros
>indicate a paragraph break.  If `paragraphs' were part of the
>filesystem, then vi, LaTeX, TeX, and everyone else would probably use
>this information and interoperate.

That depends on whether the filesystem's notion of what a "paragraph" is matches what all those programs want. Bear in mind that TeX has to run on operating systems other than WonderfulOS (and that LaTeX is just a TeX macro package and as such probably inherits *all* its notion of what a "paragraph" is from TeX).

>>Again, the question of "what is 'the filesystem'?" comes to mind.
>
>In my mind, the filesystem is the interface between applications and
>the kernel for purposes of doing input and output operations.

OK, what does "the kernel" mean here?

Is it "the privileged supervisor"? If so, *please* make up your mind - earlier *in this very posting to which I'm responding* you indicated that code in a user-mode library could be part of the file system. Now you're saying it has to be in "the kernel". You can't have it both ways....

Is it "that part of the systems software that applications generally go through to get at e.g. files"? If so, that's a fairly different usage of the term "kernel" than I've ever heard - and, frankly, I'd be tempted to just call that part of the systems software "the filesystem", and answer the question "what is 'the filesystem'?" that way....

Or is the problem that you really mean "in my mind, *in typical UNIX implementations* the filesystem is the interface between applications and the kernel for doing input and output operations"? It sounds then as if you're essentially complaining that UNIX has multiple levels at which you can access files - you can use "read()" or "fgets()", for example, to read a text file - and that not everybody uses the same level.
Many applications use standard I/O for doing I/O, others go straight to the "kernel" calls (which may or may not actually be in "the kernel").

>>Now, in this system, what is "the filesystem"?  Is it the code - which
>>in this case, let us say is still in the kernel - that supports a
>>namespace, and "bags of bytes", with associated attributes, to which the
>>names in that namespace refer?  Or is it that code *plus* the
>>"open()"/"read()"/"write()"/etc. implementation, which supports text
>>files, keyed files, etc., etc., etc.?
>
>The latter.  Applications doing an open() get the "correct" routine
>without contortions and without being able to bypass it without
>special work, just as BSD UNIX puts symbolic links in the filesystem
>which applications must do special work to bypass.

OK, fine, so as of this paragraph, the filesystem is *NOT* the interface between the application and the kernel, it's the interface that typical applications use to get at files *regardless* of whether it runs in privileged mode or not.
new@ee.udel.edu (Darren New) (01/19/91)
In article <5362@auspex.auspex.com> guy@auspex.auspex.com (Guy Harris) writes:

>>which you can no longer do.
>I can still use "cat" on directories on the local machine,

Sorry. I forgot to test this on a local disk, and only tried it over NFS, which told me Phhhtttt!

>>But if the filesystem supports keys, then your editors probably do also.
>I wouldn't assume that - VMS includes keyed file support, but I don't
>think its editors support keys.

OK. Again, my mistake. Every system I've used that had keys built in has had editors that could edit based on keys.

>(I assume by "editor" you mean "text editor";

Yes. The problem in part is that this conversation appears to have encompassed many different areas in the same thread. I started out saying "I want keys" and one reason was "so the editor could use them". Hence, editors (text editors, that is) got mixed up into the conversation. Then we got mixed up with directories (I don't remember how) and object-oriented file systems and lots of other stuff. I probably am mixing and matching from different sub-threads. Sorry.

>were plain text files or not, "files you can read with "open()",
>"read()", and "close()",

I was trying to distinguish "plain text" from "block special" or "character special." Clearly, I failed by choosing the wrong word and by testing only the NFS files. Never mind.

>VMS's editors do, I think, support line numbers - but those aren't keys.

Well, the difference in my mind is that line numbers of later lines change when you insert lines before them, whereas keys don't. How do VMS's line numbers work?

Let's take this as a definition:

line -- a piece of information in a file that is of fixed size once written.
record -- a piece of information in a file that can be replaced by a different-sized piece later.
line-number [of a record] -- the ordinal number of lines or records before this line or record.
key -- an arbitrary identifier associated with a line or record which can be used for random access.
Sometimes (usually in text editors) these keys are numbers, giving rise to confusion between keys and line numbers.

>>>So, in RSX-11, are keys in "the filesystem" or in "a library"?
>>Both.  Why should I have to choose?
>Because, in an earlier posting, you said:
>	If keys are in the filesystem as opposed to a library,
>	then every program has access to that information automatically.
>which indicates that you do *NOT* consider them equivalent, and
>therefore that keys *cannot* both be in "the filesystem" and in "a
>library".

OK. As I tried to say later, CORs (which I called a library above) which are "connected" to the application when the application opens a file (at open time) would be considered (by me) to be in a library and in the filesystem. Routines (like the ISAM libraries which people have mentioned) are not in the filesystem, because not all programs use them: they have to be present when the program is compiled (link-time libraries). In this case, I meant "library" as "collection of routines," which I shall henceforth abbreviate as COR.

>Please be more careful in the way you phrase things, and more consistent
>between postings; it's hard to consider ideas - and I think a lot of
>what you have to say is worth considering - when you have to spend
>cycles interpreting the way they're phrased due to unusual use of
>descriptive phrases, or have to shift mindset from posting to posting
>because in posting A something in "a library" wasn't the same as
>something in "the filesystem" while in posting B they were the same.

I apologise. Henceforth, I'll try to follow up with multiple postings quoting only the relevant bits. Otherwise, I tend to use your terminology sometimes and my terminology sometimes, and where mine does not match yours, confusion reigns. This thread is especially bad, because I'm using words (library, file system, kernel) whose meanings everybody thinks they know but nobody really agrees on.
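By those definitions, the key/line-number distinction fits in a few lines: inserting a record shifts the ordinal line-numbers of everything after it, but keys stay put. A toy model (everything here - the list-of-pairs representation, the numbers - is invented purely to illustrate the definitions above):

```python
# Toy model: a "file" is an ordered list of (key, record) pairs.
doc = [(100, "first"), (110, "second"), (120, "third")]

def line_number(doc, key):
    """Ordinal position (1-based) of the record carrying this key."""
    return 1 + [k for k, _ in doc].index(key)

before = line_number(doc, 120)     # "third" is line 3
doc.insert(1, (105, "inserted"))   # insert between keys 100 and 110
after = line_number(doc, 120)      # now "third" is line 4...

assert (before, after) == (3, 4)   # line numbers shifted, but...
assert dict(doc)[120] == "third"   # ...key 120 still finds the record
```

This is exactly why keys (and EDT-style sparse numbering like 100, 110, 120) survive insertions that would invalidate a plain line count.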
>>The real choice is between a user-mode library and the kernel, and
>>maybe even that can be resolved.
>
>OK, so now if keys are in "a library", a program may still have access
>to that information automatically (*every program* doesn't, unless
>somehow the library can manage to *completely* hide the management of
>the keys from the program, or every program is written to use the calls
>that manage keys), and having keys in "the filesystem" is no longer
>opposed to having them in "a library"?

Consider this: other than the directory management stuff, did switching from the V7 disk layout to the BSD 4.x layout "somehow ... manage to *completely* hide the management of keys [in this case, file names] from the program"? Yes, because every program *was* written to use the calls (open(), ...) that manage the "keys" (in this case, file names). Programs that *didn't* use the calls (fsck) had to be rewritten.

In this case, I'm drawing an analogy between keys as record identifiers and keys as file identifiers. In the same way that a filesystem with keyed files in it can present keyed files to an application that does not care about keys in such a way as to make keyed files look like consecutive files, the BSD NFS routines present an IP network to an application and make it look like a local disk with files on it. An application compiled without any knowledge of non-local disks can nevertheless access non-local disks as collections of files when moved to an appropriate machine; such a change does not require any special calls.

>>*** The other choice (which is what you are actually asking) is a
>>choice between a link-time library and an open-time library.  If I need
>>to code special calls into my application to handle different formats
>>of files,
>What exactly do you mean by an "open-time library", and how does the
>above have anything to do with the choice between a "link-time library"
>and an "open-time library"?

Ah, here is the real crux.
If the library behaviour is determined by the file you opened, then it is an open-time library (a collection of routines, or COR). If it is determined by what was compiled into the program, it is a link-time COR. (I make up the phrase COR to attempt to reduce ambiguity about "library".) The choice between the NFS routines and the local disk management routines is made when you open the file.

If I want to create a new layout (say, one which stores an entire collection of files in one file called a "stuffed file"), I can do it in one of two ways: I can write a third open-time COR, and all my binaries (cat, vi, cc, ...) can access my stuffed files, or I can make it a link-time COR which only my own applications can use. In the latter case, I would probably wind up writing a program to "unstuff" files which would have to be run before I could compile programs. UNIX already does this to some extent with tar and SCCS and so on.

Imagine that SCCS was written like NFS instead of like tar. What would you have? Maybe something similar to VMS's file generations? Except that with SCCS, you need to check in and check out, whereas with VMS every program knows how to handle generational files.

>Maybe
>when *reading* a file, you can have an interface to get only the data,
>not the keys, from a file; that doesn't work when writing out a new
>file, unless your library can magically figure out what key should be
>assigned to each unit of the file.

True. However, if you are writing a file from scratch and the standard text editor normally numbers a file from 100 in steps of 10, then maybe the interface COR could "magically" choose the same defaults. In reality, I've used systems where you could not open a keyed file for writing in consecutive mode, for exactly the reason you stated above. In practice, this wasn't a problem.

>One way to solve that *without* teaching "cp" about ".dbm" files is to
>have it check for all-zero blocks and write them out as holes.

Certainly.
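The open-time/link-time distinction can be sketched as a dispatch that happens at open: the routines that will service the I/O are picked by looking at the file itself, so every binary gets the format support without relinking. Everything below - the magic string, the handler names, the "stuffed" transformation - is made up for illustration:

```python
def plain_reader(path):
    """Default COR: treat the file as a plain bytestream."""
    with open(path, "rb") as f:
        return f.read()

def stuffed_reader(path):
    """Toy format-aware COR: skip the 4-byte magic and transform the
    rest (standing in for 'unstuffing' a structured file)."""
    with open(path, "rb") as f:
        return f.read()[4:].upper()

HANDLERS = {b"STUF": stuffed_reader}   # magic bytes -> open-time COR

def open_time_read(path):
    """An 'open-time' read: which COR services the I/O is decided here,
    from the file's own contents, not at link time in the caller."""
    with open(path, "rb") as f:
        magic = f.read(4)
    return HANDLERS.get(magic, plain_reader)(path)
```

A program calling only open_time_read() never knows - or cares - whether the bytes came from a plain file or a "stuffed" one; that is the property being claimed for open-time CORs. A link-time COR would instead require each program to be rebuilt against stuffed_reader to gain the same ability.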
And you have to do this also with tar, cpio, .... What about all those other programs you don't have source to?

>>One of my complaints with the UNIX file system is that none of the tools
>>that make UNIX useful can handle any of the complexly-structured files
>>I need to build.  If the mechanism for complex files was in place from
>>the start, then all the tools would handle them.
>I don't believe that - you'll have to prove it.  As indicated, that
>requires that "all the tools" know about any stuff that can't be hidden
>in the library interface.

Look at the Macintosh. Every program I know of can handle the resource fork. Sometimes it isn't visible to the user, but if you open any font, you have just used a complexly-structured file. Even the code segments are in the resource fork, just like VM paging is in UNIX. Saying that tools on the Mac don't all use "the mechanism for complex files" is like saying that not all programs under UNIX use virtual memory.

I think I've sufficiently "proven" my point by an example of an OS where the complex file systems were there from the start.

>>Even a simple mechanism whereby a portion of the middle of a file could
>>be replaced by a different-size portion (records), and preferably able
>>to be named and retrieved in order (keys), would make many of my programs
>>much easier.
>
>Do you need that ability for *all* files, or would an ISAM file type -
>such as you'd have were all UNIXes to come standard with ISAM libraries
>- be sufficient?

It would also be *nice* if reading an ISAM file with plain old "open()" returned the keyed records as newline-separated bytestreams in order of the keys.

>the issue of the RSX libraries
>*perhaps* permitting the *same* calls being used to read, and perhaps
>write, both formats.

That is what I want.

>(In other words, can you use PIP to copy an ISAM file, and have the
>copied file usable as an ISAM file?)

I would hope so.
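That "plain open() sees newline-separated records in key order" behaviour is easy to mock up. A toy sketch - the class, its record layout, and method names are all invented here, not any real ISAM interface:

```python
class KeyedFile:
    """Toy keyed file: random access and replacement by key, plus the
    sequential bytestream view a key-unaware program would see."""

    def __init__(self):
        self._recs = {}             # key -> record bytes

    def put(self, key, data):
        self._recs[key] = data      # a replacement may change the size

    def get(self, key):
        return self._recs[key]

    def as_bytestream(self):
        """What a plain open()/read() could return: records in key
        order, newline-separated, with the keys stripped."""
        return b"".join(self._recs[k] + b"\n" for k in sorted(self._recs))
```

The read direction is the easy one, as noted above: the stream view merely discards the keys. Going the other way - reconstructing keys from a bytestream written through write() - is where the "magically figure out what key" problem bites.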
Maybe I have not used any *really bad* implementations of a complex file system.

>>A library.  It would be possible and easier to write tools that did not
>>use the library
>It's *possible* to do that under RMS; just do your own QIOs.  It may be
>a *pain* to do so, but I'm not sure that makes a difference.

I think it would. I think most applications that didn't need the extra complexity would not include it. Under RMS, as you say, you need extra complexity in the code to make it *not* work.

>*ONCE AGAIN*, and please *ANSWER* this question:
>How are arbitrary programs that *generate* text files to assign keys to
>them, or is it the case that there's no *need* for them to generate
>keyed files?

I would think that if the human user is not assigning keys (as with compiler error message listings) then unkeyed files would be OK. This assumes there would be a utility to put the "standard" keys for the text editor onto the file. Not every file needs keys, even if they come with the OS.

>That depends on whether the filesystem's notion of what a "paragraph" is
>matches what all those programs want.

True. I was making an analogy and didn't expect you to take me literally.

>>>Again, the question of "what is 'the filesystem'?" comes to mind.
>>In my mind, the filesystem is the interface between applications and
>>the kernel for purposes of doing input and output operations.
>OK, what does "the kernel" mean here?

Umm... the part that isn't an application? I'm not sure where user open-time CORs come into this. Basically, I don't know how to define this cleanly and will attempt to avoid "kernel" in the future.

>Is it "the privileged supervisor"?

The privileged supervisor is certainly wholly part of the kernel. I was thinking of something like the things that are accessed via calls in section two of the UNIX manual. Not all things accessed through the kernel are necessarily privileged or supervisor-resident, nor are they filesystem-related (e.g., signal()).
>If so, *please* make up your mind - earlier *in this very posting to
>which I'm responding* you indicated that code in a user-mode library
>could be part of the file system.  Now you're saying it has to be in "the
>kernel".  You can't have it both ways....

In the case where user-mode open-time CORs get attached to the application when the application opens the file, the CORs are not part of the privileged supervisor but are part of the filesystem. In this case, the kernel calls the user-mode CORs, which define in part the interface between the application and the stored data (the "filesystem"), but clearly the CORs are not part of the privileged supervisor. Here, parts of the filesystem lie outside the kernel but are presumably (for security) accessible only through kernel calls.

Analogise with a UNIX system where NFS is a user-mode process and attempts to talk to an NFS file are intercepted by the NFS daemon. There, NFS would be part of the filesystem (as it defines the interface between the application and its data) but not part of the privileged supervisor. If the NFS daemons provide services that other parts of the filesystem don't (say, something that reports via IOCTL the load on the system holding the file) then I would say that (i.e., the new call) is part of the filesystem, just as /dev/pty became part of the filesystem between V7 and BSD.

>OK, fine, so as of this paragraph, the filesystem is *NOT* the interface
>between the application and the kernel, it's the interface that typical
>applications use to get at files *regardless* of whether it runs in
>privileged mode or not.

OK, you are right. If the filesystem is entirely outside the kernel, then the above is correct, as it is in AmigaDOS. However, on all multiuser machines I have ever seen, there is a desire to limit access to the filesystem, and hence access to files has always been mediated by a privileged chunk of code that cannot normally be bypassed, which I called the kernel above.
This was the assumption that I think caused some of the confusion. I was indeed mixing bits of different things. I would be quite interested in studying any system where the files are secure but not guarded by a privileged bit of code enforced in hardware.
	   -- Darren
-- 
--- Darren New --- Grad Student --- CIS --- Univ. of Delaware ---
----- Network Protocols, Graphics, Programming Languages,
Formal Description Techniques (esp. Estelle), Coffee, Amigas -----
=+=+=+ Let GROPE be an N-tuple where ... +=+=+=
guy@auspex.auspex.com (Guy Harris) (01/21/91)
>Sorry.  I forgot to test this on a local disk, and only tried it over NFS,
>which told me Phhhtttt!

Yup. Many NFS implementations forbid using "read()" to read directories over the wire; this was done to catch naive programs that still thought "read()" would give you what you wanted. (NFS implementations would mostly give you the *raw bits* of the directory, if they gave you anything at all, rather than the directory entries in some "canonical" form understandable by the system reading them; an alternative might have been to have an NFS read operation on a directory act like a "read directory" operation.)

This is only indirectly connected with the difference between the V7 and BSD file systems. NFS is *not* a BSD-ism; SunOS was the first system to have it - BSD, as in "a system you get from Berkeley", didn't have it until 4.3-reno - and systems with V7/S5 file systems, BSD file systems, DOS file systems, VMS file systems, etc., etc. have NFS implementations of various sorts (some client-only, some server-only, some both - and if you guessed "client-only" for DOS, you're wrong, there *do* exist server implementations...).

The multiplicity of file system types is the reason why NFS makes reading a directory an operation that returns "canonicalized" directory entries rather than the raw directory data, and why SunOS modified the "readdir()" routine to use a "read canonicalized directory entries" call rather than "read()". (As indicated, making "read()" on a directory return canonicalized entries might have been another way of doing that.)

>>VMS's editors do, I think, support line numbers - but those aren't keys.
>
>Well, the difference in my mind is that line numbers of later lines
>change when you insert lines before them, whereas keys don't.  How do
>VMS's line numbers work?

They may change, or they may not.
According to an EDT manual lying around here, EDT will try to use decimal fractions for line numbers in order to avoid renumbering lines, but "in extreme cases EDT may be forced to renumber lines after the last line you insert". I assume those line numbers become the "fixed control" portion of the line if the file is in "variable with fixed control" format. I don't see anything in the manual that indicates what happens if you insert stuff between line 17.1 and line 17.2, say.

>Let's take this as a definition:
>line -- a piece of information in a file that is of fixed size once written.
>record -- a piece of information in a file that can be replaced by a
>	different-sized piece later.
>line-number [of a record] -- the ordinal number of lines or records before
>	this line or record.
>key -- an arbitrary identifier associated with a line or record which can
>	be used for random access.  Sometimes (usually in text editors) these
>	keys are numbers, giving rise to confusion between keys and
>	line numbers.

In that case, as far as I know, in RSX and VMS, text files consist of lines, not records, and files don't have line numbers as keys, by those definitions. (Text files aren't random-access files. You *might* be able to replace a line with a shorter line, but not with a longer line, and probably not even with a shorter line. Line numbers are attributes hung off lines, but aren't keys for random access.) I've never seen a system that uses records, by that definition, as lines, nor any that use line numbers as keys. I presume you have.

>Consider this:
>Other than the directory management stuff, did switching from the V7
>disk layout to the BSD4.x layout "somehow ... manage to *completely*
>hide the management of keys [in this case, file names] from the
>program"?  Yes, because every program *was* written to use the calls
>(open(), ...) that manage the "keys" (in this case, file names).
>Programs that *didn't* use the calls (fsck) had to be rewritten.
I.e., not *every* program was written to use the calls. "fsck" wasn't, but it's a special case; it manipulates raw file systems, rather than files on a file system. More relevantly, programs that read directories weren't written to use calls that hide the implementation details of directories from the program, because no such calls existed. Nowadays, programs are generally being written to use those calls, now that they exist, so the programs need at most be recompiled when built on a new system, and may not even need to be recompiled if you plug a new file system type into a system.

>Imagine that SCCS was written like NFS instead of like tar.
>What would you have?

DSEE. :-)

Apollo's OS has the mechanism I described earlier, and one of the uses they made of it was to implement SCCS-oid files as "file types", so that the ordinary "stream I/O" read operations on an SCCS file would return the text of the latest version of the file, by default (I think there's a way of having it give some other version, perhaps by setting something like an environment variable). I'm not sure what'd happen if you did a "write this file out" operation; it might create a new version. Of course, that might have the problem that it wouldn't know where to get the history comments for the new version of the file, so perhaps that's not how it works.

(This is another case, like the case of keyed files, where having an abstract "stream read" operation works a bit better than an abstract "stream write" operation; the "stream read" operation generally only has to *discard* data to make a more complexly-structured file look like a stream of lines or whatever, while a "stream write" operation would have to *add* data, and few systems implement the Read User's Mind operation.)

>True.  However, if you are writing a file from scratch and the standard
>text editor normally numbers a file from 100 in steps of 10, then maybe
>the interface COR could "magically" choose the same defaults.
>In reality, I've used systems where you could not open a keyed file
>for writing in consecutive mode, for exactly the reason you stated above.
>In practice, this wasn't a problem.

It wasn't a problem because text files weren't keyed files, or it wasn't a problem because all the editors knew about keys?

>>>One of my complaints with the UNIX file system is that none of the tools
>>>that make UNIX useful can handle any of the complexly-structured files
>>>I need to build.  If the mechanism for complex files was in place from
>>>the start, then all the tools would handle them.
>>I don't believe that - you'll have to prove it.  As indicated, that
>>requires that "all the tools" know about any stuff that can't be hidden
>>in the library interface.
>
>Look at the Macintosh.  Every program I know of can handle the resource
>fork.  Sometimes it isn't visible to the user, but if you open any font,
>you have just used a complexly-structured file.  Even the code segments
>are in the resource fork, just like VM paging is in UNIX.  Saying that
>tools on the Mac don't all use "the mechanism for complex files" is like
>saying that not all programs under UNIX use virtual memory.
>
>I think I've sufficiently "proven" my point by an example of an OS
>where the complex file systems were there from the start.

Well, maybe. What happens if - let's use the example you chose of "the tools that make UNIX useful", under the assumption that many of them are programs whose output is a character-stream or byte-stream file - you need to have one of those programs generate a file with stuff in the resource fork? Would it be possible to have some Mac C implementation's implementations of "write()" or of the standard I/O library routines generate the right stuff to shove in the resource fork, and would that be the case for *all* of the tools? Or would some tools have to be modified to put "non-default" stuff in the resource fork, or to put anything there at all?
Again, this sounds like a case where providing a "stream of lines" read interface to a structured file is easy, but providing a "stream of lines" write interface to a structured file isn't so easy. (Also, are there any cases where the tool would really *want* to know about the extra structure?)

>>(In other words, can you use PIP to copy an ISAM file, and have the
>>copied file usable as an ISAM file?)
>
>I would hope so.  Maybe I have not used any *really bad* implementations
>of a complex file system.

I don't know which systems let you do that - or how many of them accomplish that by not letting you add your own *new* structured file types to the system. (Note that the latter tends to require that *somewhere* in the system there's a way of getting at the "raw bits" of the file. I think the Apollo system used memory-mapping to do that, sort of as if "read()" and "write()" and the like were implemented atop "mmap()".)

>>How are arbitrary programs that *generate* text files to assign keys to
>>them, or is it the case that there's no *need* for them to generate
>>keyed files?
>
>I would think that if the human user is not assigning keys (like
>compiler error message listings) then unkeyed files would be OK.
>This assumes there would be a utility to put the "standard" keys for
>the text editor onto the file.

Yup, but now we're back to "tar" managing "tar" files, and SCCS having special utilities to manage SCCS files, and....

>Umm... the part that isn't an application?  I'm not sure where
>user open-time CORs come into this.  Basically, I don't know how
>to define this cleanly and will attempt to avoid "kernel" in the
>future.

Good idea; in SunOS 4.x/S5R4, and probably in other systems, stuff bound to at run-time needn't live in the kernel.
It's conceivable that you could re-implement "read()" and "write()" in userland, perhaps atop "mmap()" (and perhaps with the object-oriented file system notions mentioned before) - and even have existing dynamically-linked binaries run without change.

>In the case where user-mode open-time CORs get attached to the
>application when the application opens the file, the CORs are not part
>of the privileged supervisor but are part of the filesystem.  In this
>case, the kernel calls the user-mode CORs,

Not necessarily. In the example in my previous paragraph, the application would directly call the usermode routines, which would then ultimately call routines that trap into the kernel in order to actually get a file descriptor for the file, or map it into the process's address space, or whatever.

>which define in part the interface between the application and the
>stored data (the "filesystem"), but clearly the CORs are not part of
>the privileged supervisor.  Here, parts of the filesystem lie outside
>the kernel but are presumably (for security) accessible only through
>kernel calls.

I wouldn't necessarily presume that; in some systems, the "abstract" file access routines (the ones providing the interface that most programs see) are completely unprivileged. While this does permit a program running with sufficient privileges to write a file to destroy the structuring of the file, it doesn't allow a program *not* running with those privileges to modify the file in any way. (That is, as far as I know, the case with RSX; it may be the case with VMS, unless you can't QIO a write on top of an RMS file outside of executive mode.)

If you want to have that level of security, I might prefer to have a fairly general mechanism to provide it.
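The "read() in userland atop mmap()" idea can be sketched roughly as follows. This is only an illustration of the shape of such a routine (in a modern scripting language, with an invented name), not how any real system implemented it:

```python
import mmap
import os

def user_read(path, offset, length):
    """A user-mode stand-in for read(): map the file into the address
    space and copy the bytes out, so the only kernel involvement is
    the open/fstat/mmap that sets up the mapping."""
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        with mmap.mmap(fd, size, access=mmap.ACCESS_READ) as m:
            return bytes(m[offset:offset + length])
    finally:
        os.close(fd)
```

Once the abstract access routines live out here, a dynamically-linked binary could in principle pick up a new file-type implementation without relinking - which is the point being made about SunOS 4.x/S5R4 run-time binding.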
You may forbid arbitrary unprivileged code that has write permission on a keyed file from destroying the key structure, but that wouldn't necessarily forbid a program from using keyed accesses to destroy some higher-level structure above the key structure. E.g., if the file is a database of routines and variables in some large program, you could enter the name of a non-existent routine into it and say it lives in some non-existent source file, or in some source file that actually doesn't contain that routine.

How much protection do you want to enforce with that level of security? Do you provide a mechanism that basically encapsulates arbitrary abstract data types and prevents any code other than the type manager from accessing the representation of that data type (barring, of course, compiler, hardware/microcode, privileged software, etc. bugs)? Or do you "draw the line" somewhere and say "stuff below the line is protected, stuff above the line isn't"? (I'm not advocating or attacking either technique - there are costs and benefits to both - I'm just asking.)
guy@auspex.auspex.com (Guy Harris) (01/21/91)
>I've always thought that the I/O system should be re-written so that
>everything is a stream.

Including files you access randomly?

>That way you could get arbitrary functionality by pushing a line
>discipline (or stack of them) onto _any_ file.

After moving the streams mechanism out of the kernel? Or do I have to add all this functionality as kernel-mode streams modules? In some ways this seems equivalent to an Apollo-like object-oriented file system, or implementable atop that (in fact, Apollo implemented S5-style streams, or STREAMS if you will :-), atop their object-oriented file system; see "A Dynamically Extensible Streams Implementation" in the proceedings of the Summer 1987 USENIX).

>Plan 9 also has an append file type.  All writes go at the end of the
>file.

I.e., it's more or less a file that, under UNIX, would always have O_APPEND turned on whenever it's opened for writing?
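For readers unfamiliar with O_APPEND: every write through such a descriptor goes to the current end of file, even after a seek, which is roughly the per-open version of Plan 9's per-file append type. A small demonstration (the file name is arbitrary):

```python
import os

# Open with O_APPEND: each write is forced to the end of the file,
# regardless of where the file offset has been positioned.
path = "/tmp/append_demo"
fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC | os.O_APPEND)
os.write(fd, b"first\n")
os.lseek(fd, 0, os.SEEK_SET)   # try to rewind...
os.write(fd, b"second\n")      # ...but the write still lands at the end
os.close(fd)

with open(path, "rb") as f:
    assert f.read() == b"first\nsecond\n"
```

The difference is who decides: under UNIX each opener chooses (or forgets) the flag, whereas a Plan 9 append-only file enforces the behaviour for every open.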
guy@auspex.auspex.com (Guy Harris) (01/21/91)
>>Which requires that line numbers be something other than simply the
>>ordinal number of the line within the file, as I presume is the case on
>>the systems where you used them.
>
>Wrong assumption.  vi under UNIX will already give me that kind of line
>number.  Normally, the line numbers I'm talking about either have decimal
>places or are normally numbered more than one apart, say, every ten or
>hundred.

Umm, then, in that case, exactly as I said, line numbers are something other than simply the ordinal number of the line within the file; my presumption appears to be *correct*. Or did you read me as saying that I presumed that the line numbers in your files *were* just ordinal numbers? Unfortunately, the way I stated it, it could conceivably have been read either way; sorry if it wasn't clear what I stated.

>Right. To some extent.  Which is why most programs consider a newline to
>be the end of a line.  However, I think the real difference between
>'records' and 'lines' as I would use them comes when you want to
>replace a line in the middle of a file with another line of a different
>length.

In that case, as indicated in another posting, I've never seen a system where lines are implemented as "records" by your definition; they've all tended to implement text files as sequential-access files with no keys. I presume such systems exist, but I don't know how common they are.

>>the mechanism in some EMACSish editors for running a compile, capturing
>>the error output, parsing it, and walking through the file(s) to which
>
>Which fails to satisfy the hardcopy problem.

So those for whom that's a problem need to use some kind of other editor. Lots of us don't really have the hardcopy problem as you present it....

>You can make a tool to do that in a record-oriented environment as
>easily as you can recognise \014 as an end-of-line character.  Just
>read each record into memory, separating them with your desired
>separation character.
Yup, that's probably how EMACS works on systems such as VMS wherein text files can be sequential files composed of "records" (in a sense other than yours).

>Of course, using this scheme, you keep the whole file in memory,
>limiting how big your files can get and extending how long it takes to
>make small changes to large files.

A limitation that hasn't been severe for me; I've read some pretty humongous files with EMACS and not had any horrible problem editing them.  (Although "vi" doesn't keep the whole file in memory; it uses some other scheme.  In some sense, you could possibly think of the scheme as maintaining the file in something vaguely like a keyed file, although "replacing" a record means "adding it to the end of the file" - it doesn't bother trying to reuse the space for the previous record, or at least "ed" doesn't and I don't think "ex"/"vi" does either.)

>Sometimes with vi, I want to add "%!PS-Adobe" to the front of a
>postscript file so our printer driver recognises it.  A one-line change
>can take several minutes because I have to suck in a multi-megabyte
>file, write it back out to /tmp, make one change, and store it back out
>again, rewriting the file each time, paging like mad.  With keyed files,
>I don't even need to read or write anything but the first line.

I guess that seems sufficiently like a sufficiently rare problem to me - and probably to lots of other people, which may explain why I've never run into any systems that maintain text files as keyed files.
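A rough sketch (not from either poster; names and Python itself are modern stand-ins) of why that one-line change to a byte-stream file costs a full rewrite: every byte after the insertion point has to move, so the whole file gets copied through a temporary file, which is exactly the multi-megabyte shuffle described above.

```python
import os
import tempfile

def prepend_line(path, line):
    """Insert `line` at the front of a byte-stream file by copying
    everything through a temp file - the only option when the file
    has no keys and no record structure."""
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    with os.fdopen(fd, "wb") as out, open(path, "rb") as src:
        out.write(line)                 # the one-line change...
        while True:                     # ...plus a copy of everything else
            chunk = src.read(64 * 1024)
            if not chunk:
                break
            out.write(chunk)
    os.replace(tmp, path)               # swap in, like vi writing back

with open("doc.ps", "wb") as f:
    f.write(b"...postscript body...\n")
prepend_line("doc.ps", b"%!PS-Adobe\n")
```

With a keyed file, by contrast, only the first record would be touched.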
guy@auspex.auspex.com (Guy Harris) (01/21/91)
>Actually, it's my belief (being a file management sort of guy) that "lines" and
>"records" are one and the same thing.

Unfortunately, Darren doesn't agree; he had specific notions in mind of what a "record" was that included more than just what a "line" is - i.e., he was thinking of "records" as the components of a randomly-accessible and randomly-updatable file.

>Mostly because if I add a single line early in the file, it throws off the
>entire line numbering of the listing I'm holding in my hand.  The more lines I
>add, the worse the problem.  Alternatively, if the compiler listed the
>"imbedded line numbers," it wouldn't necessarily throw things off; my listing
>would still be useful.  This has annoyed me many times, particularly since in
>my CP6 (successor to CP-V, btw) work,

Hmm.  Methinks I'm starting to see more evidence that the Baby Duck Syndrome is showing up here on the anti-UNIX side, as well as on the pro-UNIX side.  Perhaps UNIX people tend not to hold line-numbered listings in their hand, have gotten used to the idea of doing things differently, and don't give a fig about line numbering, while CP-V/CP6 people use line-numbered listings frequently and find the notion of a system that *doesn't* use line numbers heavily to be unpleasant.

I know *I* sure don't miss big printed listings, much less big *line-numbered* printed listings....
new@ee.udel.edu (Darren New) (01/22/91)
In article <5392@auspex.auspex.com> guy@auspex.auspex.com (Guy Harris) writes: >They may change, or they may not. According to an EDT manual lying >around here, EDT will try to use decimal fractions for line numbers in >order to avoid renumbering lines, but "in extreme cases EDT may be >forced to renumber lines after the last line you insert". Edit (under CP-V) would simply bop you out of insert mode if you tried to insert something between 1736.165 and 1736.166. >In that case, as far as I know, in RSX and VMS, text files consist of >lines, not records, and files don't have line numbers as keys, by those >definitions. (Text files aren't random-access files. You *might* be >able to replace a line with a shorter line, but not with a longer line, >and probably not even with a shorter line. Line numbers are attributes >hung off lines, but aren't keys for random access.) What a shame. Then editors have to read an entire file and then write it back out again to make one change in the middle. The line numbers become the equivalent of card sequence numbers. Yuk. >I've never seen a system that uses records, by that definition, as >lines, nor any that use line numbers as keys. I presume you have. Yes, CP-V and CP6 both have such file types. Makes life much easier, believe me. >(This >is another case, like the case of keyed files, where having an abstract >"stream read" operation works a bit better than an abstract "stream write" >operation; the "stream read" operation generally only has to *discard* >data to make a more complexly-structured file look like a stream of >lines or whatever, while a "stream write" operation would have to *add* >data, and few systems implement the Read User's Mind operation.) Agreed. I don't see any problem with making a poor guess or allowing a simple file type (stream or non-keyed records) to coexist with keyed files. >It wasn't a problem because text files weren't keyed files, or it wasn't >a problem because all the editors knew about keys? 
It wasn't a problem because the editor could read unkeyed files and put keys on if you wanted. More important, you rarely actually edited files that were generated in such a way as to require keys; that is, the files that were generated consecutively were not often modified randomly, and when they were you just had to put on the keys. Since the keys had no significance to the contents, the lack of good initial keys was not a problem in practice. I can see where putting the "correct" keys on might be useful. For example, the Pascal compiler could put error messages in a specially-flagged comment: {%%% Error 104: Undefined variable} and key it in such a way that the error message could be merged into the source at the correct place. The special tag could make the errors easy to delete. The compiler could check to see if the error message is already there in the source and not generate it again if it is. >Well, maybe. What happens if - let's use the example you chose of "the >tools that make UNIX useful", under the assumption that many of them are >programs whose output is a character-stream or byte-stream file - you >need to have one of those programs generate a file with stuff in the >resource fork? This is unfair, but let me answer first before I explain. Take, for example, kermit on the Mac. That implementation has been changed to recognise an initial tag during the download of the resource fork of a file. That is, if the initial N bytes of a binary transfer are 0x????????, then that download is read into the resource fork. Upon upload, the user is prompted on whether to read or write the resource fork. On the Mac, there are both a character-byte-stream oriented portion, and a "resource" object-oriented portion (kind of). When you create a file, you must tell the OS the type of the file and who is creating it. 
The type is used to allow programs to recognise compatible files and to get the right icon for the file when the program writes several kinds of files (e.g., tables, reports, views, ...).  The creator is used when the file is opened to determine which application should be started.  Every open call must specify these two parameters.  I would imagine that an fopen() implementation would have values hardcoded in or fetched from the environment in some way.  However, this info must be specified somewhere.

The resource-oriented files are accessed in a different way, closer to mmap() than anything else.  You ask for a particular resource, and you get a pointer to it.  You can change it and write it back, or create new empty resources.

The reason I claim that the question is unfair is that there is no need for the UNIX-like programs to put stuff in the resource fork.  What goes in the resource fork is Mac-specific, and UNIX ports don't use Mac-specific stuff.  The tools that make the Mac useful either put the correct stuff into the resource fork because that is their job (like a font installer) or they don't need to change the resource fork at all (like a text editor).  What makes the resource fork useful is that things there can be changed after the application is compiled, new items can be added that are never looked at by the originating application, and so on.

>Again, this sounds like a case where providing a "stream of lines"
>read interface to a structured file is easy, but providing a "stream of
>lines" write interface to a structured file isn't so easy.

Correct.  Specifically, the Mac stores text like UNIX does, so the "stream of lines" is the same on both.  However, if I'm not speaking English, I can modify all the Mac menus to be written in (say) German.  This is because I can change the menus in the resource fork, making them different sizes and in different places, and not screw up anything else.
>>>(In other words, can you use PIP to copy an ISAM file, and have the
>>>copied file usable as an ISAM file?)
>>I would hope so.  Maybe I have not used any *really bad* implementations
>>of a complex file system.
>I don't know which systems let you do that

Again, CP-V and (I understand) CP6 would even allow you to write keyed files to a tape, allowing you to access them exactly as if they were on a disk, random access and all, except that you could not write them.

> - or how many of them
>accomplish that by not letting you add your own *new* structured file
>types to the system.

Yes, CP-V was limited in this way.  I think an extensible file system would be best, but just a slightly more complex file system would make many things easier.

>Yup, but now we're back to "tar" managing "tar" files, and SCCS having
>special utilities to manage SCCS files, and....

You are always going to need some lowest-level structure.  If all you have is keyed files, then there won't be any programs that don't write them.  If you have multiple types of files, you need programs to translate back and forth.  The advantage in the latter case is that you don't have to write those programs yourself!

>I wouldn't necessarily presume that; in some systems, the "abstract"
>file access routines (the ones providing the interface that most
>programs see) are completely unprivileged;

Actually, I was thinking of the following method to go with my OO filesystem: Each file has the "stack" of access mechanisms.  The user may, at open time, ask for new access modes to be added on top (say, adding stream access on top of keyed files).  The access control would say which modules and which users could send which messages to the file.
So therefore, if you had:

    access accesscontrol
        handles     any
        sends       any read-user-account-name
        extra-info  open(john,bill) read-by-key(bill) read-next(john)

    access consec-over-keys
        handles     read-next open close
        sends       read-by-key open close

    access keys-over-diskblocks
        handles     read-by-key open close write-by-key
        sends       read-block write-block alloc-block

In this case, john and bill can both open the file, bill can read it randomly and john can read it consecutively.  Had the access-control been put below the consec-over-keys, then all consecutive calls would have been translated to keyed calls before getting to the access-control mechanism and the access-control mechanism would not have prevented bill from reading consecutively.  Of course, some other access control would be imposed upon creation of the file to prevent read-block and write-block from being invoked with block numbers not belonging to that file.

>If you want to have that level of security, I might prefer to have a
>fairly general mechanism to provide it.

Done.  Although under CP-V, I think the answer was that you could not write unkeyed records to a keyed file and vice versa.  On the other hand, I can imagine a file type where each key consists of "key+lines" (where there are unkeyed lines after this key which cannot be accessed except consecutively).

>How much protection do you want to enforce with that level of security?

With an active file system, that level of security is not a problem.

>Do you "draw the line" somewhere and say "stuff below the line is
>protected, stuff above the line isn't"?  (I'm not advocating or
>attacking either technique - there are costs and benefits to both - I'm
>just asking.)

I would let the user draw the line.  There is no reason why you could not allow some users to access stuff based on "key" and others to access it based on "higher-level-data-type".  Right now, UNIX does not give you that choice either except in the most cumbersome way.
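A sketch of that stack of access mechanisms in modern terms.  The class names, message names, and the send() dispatch are all invented for illustration; they are not from CP-V or any real system.  The point it demonstrates is the layering argument above: because access control sits on top of consec-over-keys, it sees read-next as read-next rather than as the read-by-key it later gets translated into.

```python
class KeysOverBlocks:
    """Bottom layer: serves read-by-key from an in-memory 'disk'."""
    def __init__(self, records):
        self.records = dict(records)          # key -> record

    def read_by_key(self, user, key):
        return self.records[key]

class ConsecOverKeys:
    """Middle layer: handles read-next by translating it into
    read-by-key sends on the layer below."""
    def __init__(self, below):
        self.below = below
        self.order = sorted(below.records)    # key order = file order
        self.pos = {}                         # per-user read position

    def read_next(self, user):
        i = self.pos.get(user, 0)
        self.pos[user] = i + 1
        return self.below.read_by_key(user, self.order[i])

    def read_by_key(self, user, key):         # passed straight down
        return self.below.read_by_key(user, key)

class AccessControl:
    """Top layer: checks who may send which message before passing it on."""
    def __init__(self, below, allowed):
        self.below, self.allowed = below, allowed

    def send(self, user, message, *args):
        if message not in self.allowed.get(user, ()):
            raise PermissionError(f"{user} may not {message}")
        return getattr(self.below, message.replace("-", "_"))(user, *args)

disk = KeysOverBlocks({1: "first record", 2: "second record"})
f = AccessControl(ConsecOverKeys(disk),
                  {"john": ("read-next",),     # consecutive access only
                   "bill": ("read-by-key",)})  # random access only
```

Putting AccessControl below ConsecOverKeys instead would let bill read consecutively, since his read-next calls would reach it already translated into the read-by-key he is allowed.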
There is nothing that prevents you from corrupting the structure of SCCS files unless you make a user that owns all the SCCS files on the system and you only access them through SCCS, in which case you have created an active file system, and one which does not work well with the rest of the UNIX tools.

Actually, adding keyed records to UNIX would not be too bad if it could be done by rewriting fopen, fgets, ....  The only problem would be with programs that ftell and expect it to be a byte offset into the file.  Then all that would be needed would be recompiling everything.  Extra calls could be used to indicate that a program recognises the keyed file and that keyed I/O is requested.  Putting keyed files on top of stream files is not difficult; maintaining the consistency is difficult.
              -- Darren

--
--- Darren New --- Grad Student --- CIS --- Univ. of Delaware ---
----- Network Protocols, Graphics, Programming Languages, Formal
      Description Techniques (esp. Estelle), Coffee, Amigas -----
=+=+=+ Let GROPE be an N-tuple where ... +=+=+=
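A sketch of that "rewrite fgets" idea: a read-only, stream-of-lines view over a keyed file, so that old programs still see ordinary text.  Python stands in for a C stdio shim, and all names here are invented.  Note that an ftell-equivalent on such a view could not return a byte offset into the underlying keyed file, which is exactly the compatibility problem named above.

```python
class LineStream:
    """Present a keyed file (modeled as key -> line) as a consecutive
    stream of newline-terminated lines, the way fgets would."""
    def __init__(self, keyed):
        self._lines = [keyed[k] for k in sorted(keyed)]  # key order
        self._pos = 0

    def fgets(self):
        if self._pos >= len(self._lines):
            return None                  # EOF, like fgets returning NULL
        line = self._lines[self._pos]
        self._pos += 1
        return line + "\n"

# A keyed file with two records, read back as plain text:
f = LineStream({10: "first", 20: "second"})
```

The read direction only has to *discard* the keys; a stream-of-lines write interface would have to invent them, as discussed earlier in the thread.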
new@ee.udel.edu (Darren New) (01/22/91)
In article <5394@auspex.auspex.com> guy@auspex.auspex.com (Guy Harris) writes:
>Or did you read me as saying that
>I presumed that the line numbers in your files *were* just ordinal
>numbers?

Yes.  Sorry.

>I presume such systems exist, but I don't know how common they are.

Sadly, not too common.  Although I don't think keyed text files are the only reason to have keyed files, it was helpful at the time I was using them.  My main thrust is not that I want keys on text files, but that I want keyed files, and if such files are implemented, it should be possible to use them for text files.  This is not the case in UNIX, because even though I can implement keyed files, I cannot store text in them and expect tools to recognise it.

>Lots of us don't really have the hardcopy problem as you
>present it....

Lots of you never worked on a hardcopy-only terminal.  Being able to go back three or four pages and look to see what you did is a great help.  I could edit faster on 300 baud decwriters than I often can with modern editors.  How much would you enjoy using *any* of the UNIX editors on a decwriter?  EMACS, vi, ed, take your pick...

>I guess that seems sufficiently like a sufficiently rare problem to me -
>and probably to lots of other people, which may explain why I've never
>run into any systems that maintain text files as keyed files.

Sure there are always workarounds.  However, I've seen EMACS users say how terrible vi is because it doesn't have this and that and t'other.  I'm saying UNIX doesn't have this and that and t'other and you're saying "well, there is this great workaround called blah, and besides I've never needed it."  I suspect that you *have* needed keyed record files and have reimplemented them every time you need them.  For example, passwd must read the entire password file, *find the right record by key*, make the change, and write it back out again as an atomic operation, requiring extra locking files and all.
With keyed files, one would simply read the record into memory, make the change, and write it back out again.  No locking, no messy error-recovery files, no user-coded lookup mechanisms, no problems with large passwd files running you out of memory, etc.  I recognise that this is kind of bogus in that passwd records usually don't change length and that passwd files are not large enough to run you out of memory, but the analogy is still good.

Another file needing keyed records (where it was assumed from the start that it would need fast random access) is the terminal descriptions.  Termcap (or is it terminfo?) uses file names inside directories as keys for one line of information.  Why not have one keyed file with the terminal name as the key and the record holding the description?  Because UNIX does not support keyed files and it is easier to make them separate files.

Actually, there are many programs that maintain text files as keyed files (via DBM), and the passwd file is often one of them.  If it is such a rare problem, why do we seem to have fixed it?  I contend that it is because we have finally gotten to where we have systems large enough that simple bytestream text files are not good enough.  This is the same reason we moved from HOSTS.TXT accessed via FTP to the DNS hierarchy.  Unfortunately, every solution ends up being ad hoc, inefficient or buggy in some way, or incompatible with the old way of looking at files.  I had hoped that at the base level, Plan-9 would have rethought this and chosen differently.  I have no doubt that they rethought it, but I am a little surprised that they did not redo the lowest interface (as I understand it).
              -- Darren

--
--- Darren New --- Grad Student --- CIS --- Univ. of Delaware ---
----- Network Protocols, Graphics, Programming Languages, Formal
      Description Techniques (esp. Estelle), Coffee, Amigas -----
=+=+=+ Let GROPE be an N-tuple where ... +=+=+=
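The DBM approach mentioned above survives in Python's standard `dbm` module, which makes a convenient modern stand-in for the keyed-record update being described.  The file name and record contents below are illustrative only.

```python
import dbm

# Build a keyed "passwd" database: login name is the key.
with dbm.open("passwd.db", "c") as db:
    db[b"root"] = b"root:x:0:0:/:/bin/sh"
    db[b"darren"] = b"darren:x:1001:100:/home/darren:/bin/sh"

# Changing one user's shell touches only that record; no full-file
# read, no rewrite of every other entry, no temp-file dance.
with dbm.open("passwd.db", "w") as db:
    db[b"darren"] = b"darren:x:1001:100:/home/darren:/bin/ksh"
```

The trade-off stands as stated in the thread: the keyed form is no longer directly editable as a text file, so some tool has to mediate between the two views.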
barmar@think.com (Barry Margolin) (01/22/91)
In article <42346@nigel.ee.udel.edu> new@ee.udel.edu (Darren New) writes: >What a shame. Then editors have to read an entire file and then write it >back out again to make one change in the middle. The line numbers become >the equivalent of card sequence numbers. Yuk. Even on systems where line numbers are keys, you still need to rewrite a file in order to insert a change in the middle. Unless the file is implemented as a linked list of records, but I've never heard of a system that does this for ordinary (i.e. non-DBMS) files. The filesystem might hide this rewriting from the application, but it still has to be done. -- Barry Margolin, Thinking Machines Corp. barmar@think.com {uunet,harvard}!think!barmar
new@ee.udel.edu (Darren New) (01/22/91)
In article <1991Jan21.235826.7250@Think.COM> barmar@think.com (Barry Margolin) writes: >Even on systems where line numbers are keys, you still need to rewrite a >file in order to insert a change in the middle. Not at all. You either rewrite the record in place (if it is the same or smaller length) or you put the new record at the end and point the old key to it. How else would you be able to insert a record in the middle without rewriting the entire file? Granted, sometimes inserting a key could require several new blocks of keys to be added to the file (like when one of the keyblocks fills up) but that is exceptional. -- Darren -- --- Darren New --- Grad Student --- CIS --- Univ. of Delaware --- ----- Network Protocols, Graphics, Programming Languages, Formal Description Techniques (esp. Estelle), Coffee, Amigas ----- =+=+=+ Let GROPE be an N-tuple where ... +=+=+=
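A sketch of the append-and-repoint scheme Darren describes: the index maps key to (offset, length); replacing a record with something the same size or smaller rewrites it in place, while a longer replacement is appended to the end of the file and the key repointed, leaving every other record untouched.  The layout is invented for illustration, not CP-V's actual one.

```python
import io

class KeyedFile:
    def __init__(self):
        self.data = io.BytesIO()      # stands in for the disk file
        self.index = {}               # key -> (offset, length)

    def write(self, key, record):
        off, length = self.index.get(key, (None, 0))
        if off is not None and len(record) <= length:
            # Same or smaller: rewrite in place, pad the leftover space.
            self.data.seek(off)
            self.data.write(record.ljust(length))
        else:
            # Longer (or new): append and point the key at the new copy.
            self.data.seek(0, io.SEEK_END)
            off = self.data.tell()
            self.data.write(record)
        self.index[key] = (off, len(record))

    def read(self, key):
        off, length = self.index[key]
        self.data.seek(off)
        return self.data.read(length)

kf = KeyedFile()
kf.write(10, b"line ten")
kf.write(20, b"line twenty")
kf.write(10, b"a much longer replacement line")   # middle record grows
```

Growing the middle record never rewrites its neighbors; the cost is dead space where the old copy lived, plus the occasional key-block split noted above.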
guy@auspex.auspex.com (Guy Harris) (01/23/91)
>Edit (under CP-V) would simply bop you out of insert mode if you
>tried to insert something between 1736.165 and 1736.166.

Oh, that's *real* swell.  Is there no way I can say "dammit, I NEED to have a line inserted between these two lines, no arguments from you, Mr. Editor?"  Do I have to renumber the file (and invalidate all those line-numbered listings that CP-V/CP6 people seem to be finding so useful)?

Even if it's rare that a file has to be renumbered, because the editor works hard at trying to keep the space of line numbers sparse, I'd rather not have to do that *at all* - I like editors where inserting text is simply a matter of pointing and typing; I don't like to have to think about what I have to do to insert text.

>>I've never seen a system that uses records, by that definition, as
>>lines, nor any that use line numbers as keys.  I presume you have.
>
>Yes, CP-V and CP6 both have such file types.  Makes life much easier,
>believe me.

Uh, if I can't insert something between two lines in the fashion you describe, that would seem to make life harder, not easier....  (And support for keeping line-numbered listings might make your life easier, but not mine.)

>It wasn't a problem because the editor could read unkeyed files and put
>keys on if you wanted.

Arbitrary keys - in which case, how do you tell it where to get the keys? - or line numbers?

>More important, you rarely actually edited files that were generated in
>such a way as to require keys; that is, the files that were generated
>consecutively were not often modified randomly, and when they were you
>just had to put on the keys.  Since the keys had no significance to the
>contents, the lack of good initial keys was not a problem in practice.
In other words, keyed files don't solve the "/etc/passwd" problem magically, in the sense that either:

	1) "/etc/passwd" would be editable as a text file, but wouldn't
	   have keys;

	2) "/etc/passwd" would have keys, but wouldn't be editable as a
	   text file;

	3) "/etc/passwd" would be editable as a text file, would have
	   keys in some sense, but you'd have to do something special to
	   generate the keys (the 4.3BSD solution).

(Note to those who would say that 2), say, is a solution: I didn't say it wasn't, I just said it wasn't a "magical" solution.  I.e., if UNIX had native keyed files, and allowed you to read them as text files, it would still have to have chosen some implementation technique for the password database other than implementing it as a plain text file in order to give it keys.)

>I can see where putting the "correct" keys on might be useful. For example,
>the Pascal compiler could put error messages in a specially-flagged comment:
>	{%%% Error 104: Undefined variable}
>and key it in such a way that the error message could be merged into
>the source at the correct place.  The special tag could make the errors
>easy to delete.  The compiler could check to see if the error message is
>already there in the source and not generate it again if it is.

I'd rather have the error messages show up in another file, or in another window, perhaps with a way to jump to the point at which the error occurred (e.g., the compile mode some EMACSes support).

>The reason I claim that the question is unfair is that there is no need
>for the UNIX-like programs to put stuff in the resource fork.  What goes
>in the resource fork is Mac-specific, and UNIX ports don't use
>Mac-specific stuff.  The tools that make the Mac useful either put the
>correct stuff into the resource fork because that is their job (like a
>font installer) or they don't need to change the resource fork at all
>(like a text editor).
What makes the resource fork useful is that >things there can be changed after the application is compiled, new >items can be added that are never looked at by the originating >application, and so on. Sounds similar to the OS/2 HPFS notion of a bag of name/attribute pairs hung from a file; some people are looking at adding this to some UNIX file systems (I think the UNIX International file system group is). >>Again, this sounds like a case where providing a "stream of lines" >>read interface to a structured file is easy, but providing a "stream of >>lines" write interface to a structured file isn't so easy. > >Correct. Specifically, the Mac stores text like UNIX does, so the >"stream of lines" is the same on both. However, if I'm not speaking >English, I can modify all the Mac menus to be written in (say) German. >This is because I can change the menus in the resource fork, making >them different sizes and in different places, and not screw up anything >else. With what file is, say, the resource that specifies menus associated? >Again, CP-V and (I understand) CP6 would allow you to even write >keyed files to a tape, allowing you to access them exactly as if >they were on a disk, random access and all, except that you >could not write them. I suspect that in most cases you wouldn't *want* to access them randomly; tapes aren't known for their speed of random access.... (In the UNIX world, and I suspect in lots of other OSes, tapes are generally used as archival, backup, and file transfer media.)
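The HPFS-style "bag of name/attribute pairs hung from a file" mentioned above can be sketched in miniature.  Everything here is a toy: the bag is just a sidecar dictionary keyed by path, not a real filesystem feature, and the attribute names are invented.

```python
# path -> {attribute name: value}; a real system would store this
# in the file's metadata, not in a process-local dict.
attributes = {}

def set_attr(path, name, value):
    attributes.setdefault(path, {})[name] = value

def get_attr(path, name):
    return attributes.get(path, {}).get(name)

# Hang a couple of attributes off a file without touching its data fork:
set_attr("report.txt", "icon", "text-document")
set_attr("report.txt", "creator", "ed")
```

The appeal is the same as the resource fork's: new attributes can be added after the fact, and programs that don't know about them never see them.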
guy@auspex.auspex.com (Guy Harris) (01/23/91)
>My main thrust is not that I want keys on text files, but that
>I want keyed files, and if such files are implemented, it should be
>possible to use them for text files.

Unfortunately, the example you chose to bolster your case for that appears to be predicated on the widespread use of hardcopy terminals; choose another example, 'cuz when the CP-V/CP6 fans talk about how wonderful that scheme is because you don't invalidate the line numbers on your listings, lots of the rest of us - and not just the UNIX users, either - tend to ask "what's a line-numbered listing, and why is it so important not to invalidate them?" :-)

>>Lots of us don't really have the hardcopy problem as you
>>present it....
>
>Lots of you never worked on a hardcopy-only terminal.

Lots of us used to work on hardcopy-only terminals, but are now damn glad that we have display terminals, or, even better, large displays with multiple windows.  Maybe in the days of hardcopy terminals persistent line numbers were a win (it's been a *long* time since I was stuck on a hardcopy terminal), but they would affect my life very little now....  (I.e., they seem like a solution to something that's not a problem for a lot of us.)

>Being able to go back three or four pages and look to see what you did
>is a great help.  I could edit faster on 300 baud decwriters than I
>often can with modern editors.

I've not been unlucky enough to find a modern editor that bad....

>Sure there are always workarounds.  However, I've seen EMACS users say
>how terrible vi is because it doesn't have this and that and t'other.

In a lot of cases, that occurs, I suspect, because the EMACS users have gotten used to doing something in one particular way that requires the stuff EMACS has that "vi" doesn't, and, *for them*, it'd be horrible not to have an editor that let you do things in that way.
"vi" users have often gotten used to doing something in a different way, that uses what "vi" has, and I suspect that in a lot of cases neither way is "better" in some objective sense, or the way in which it's "better" is outweighed by the effort it'd take for a person to change their way of editing. >I'm saying UNIX doesn't have this and that and thother and you're >saying "well, there is this great workaround called blah, and besides >I've never needed it." I suspect that you *have* needed keyed record >files and have reimplemented them every time you need them. I suspect you haven't figured out what I was saying. I'm not saying *keyed files* aren't useful - I know that they are, for some purposes - I'm saying that having text files as keyed files with persistent line numbers as the keys would buy me precisely *nothing*. >Termcap (or is it terminfo?) uses file names inside >directories as keys for one line of information. It's "terminfo". >Actually, there are many programs that maintain text files as >keyed files (via DBM), and the passwd file is often one of them. >If it is such a rare problem, why have we seemed to have fixed it? We haven't "fixed" the "problem" to which I was referring; I wasn't referring to the lack of keyed files, I was referring to the lack of text files implemented as indexed files with persistent line numbers as the keys. There are problems for which keyed files provide a winning solution; however, the *particular* problem of not invalidating your line-numbered listings is one that I simply don't have, and one that I suspect a lot of users of UNIX, VMS, DOS, OS/2, etc., etc., etc. don't have, either, because hardcopy terminals aren't as common as they used to be. >I contend that it is because we have finally gotten to where we >have systems large enough that simple bytestream text files are >not good enough. 
There are some files that aren't best represented solely as bytestream text files; however, making the source files with which I work, or netnews postings, or mail messages, etc. into indexed files with persistent line numbers as the keys would, at least for me, be a complete waste of effort.
guy@auspex.auspex.com (Guy Harris) (01/23/91)
>Every system I've worked on that allowed keyed records has been >easier to program than systems that don't. So lobby for more UNIX systems to come standard with ISAM packages or "(n)dbm" or whatever. So far, the only example you've given with indexed files *that can look just like text files* is the line-number example, and frankly that doesn't look like a win at all to me. Most systems I've seen have different access methods for sequential files and keyed files, and implement text files as sequential files. Making some keyed-file access method something you can count on being present on a UNIX system would make UNIX resemble those other systems; if you want me to believe you need more, you'll need to make a better case than the one made for persistent line numbers.
new@ee.udel.edu (Darren New) (01/23/91)
In article <5427@auspex.auspex.com> guy@auspex.auspex.com (Guy Harris) writes:
>Oh, that's *real* swell.  Is there no way I can say "dammit, I NEED to
>have a line inserted between these two lines, no arguments from you, Mr.
>Editor?"  Do I have to renumber the file (and invalidate all those
>line-numbered listings that CP-V/CP6 people seem to be finding so
>useful)?

You could just renumber the lines in the immediate area.  You're changing that area anyway.  In practice, I never needed it, as I never inserted 1000 lines of changes in the same place before making a new listing.

>I like editors where inserting
>text is simply a matter of pointing and typing; I don't like to have to
>think about what I have to do to insert text.

But with my proposed file system, we could *both* have our way.  For the same reason you probably would not want to do without regular expression searches, I don't want to do without line numbers.  "I don't like to think about what I have to do to" find text.

>Uh, if I can't insert something between two lines in the fashion you
>describe, that would seem to make life harder, not easier....  (And
>support for keeping line-numbered listings might make your life easier,
>but not mine.)

Why would it make your life harder?  If it doesn't, why would you deny it of me?  It is less a restriction than saying "you can't have file names longer than 255 characters."  It comes up so rarely in practice that you just never think about how to get around it in advance.  You *can* insert something between those two lines.  You just have to renumber some of them.  It is less a restriction than under UNIX saying "I can't insert a newline in the middle of a text line without getting two lines."  It is about the same level of problem as saying "dammit, why do I need to run ctags every time I insert a new function into a file?"

>Arbitrary keys - in which case, how do you tell it where to get the
>keys? - or line numbers?

It just uses the defaults.  Why is this a problem?
You are saying "mixing keyed files with unkeyed files is no good, because adding keys to files that don't need keys does not add any information." >In other words, keyed files don't solve the "/etc/passwd" problem >magically, in the sense that either: > > 1) "/etc/passwd" would be editable as a text file, but wouldn't > have keys; > > 2) "/etc/passwd" would have keys, but wouldn't be editable as a > text file; > > 3) "/etc/passwd" would be editable as a text file, would have > keys in some sense, but you'd have to do something special to > generate the keys (the 4.3BSD solution). > >(Note to those who would say that 2), say, is a solution: I didn't say >it wasn't, I just said it wasn't a "magical" solution. I.e., if UNIX >had native keyed files, and allowed you to read them as text files, it >would still have to have chosen some implementation technique for the >password database other than implementing it as a plain text file in >order to give it keys.) Sure. But I contend that that program to do either 2) or 3) with keyed files is easier than the program to do either 2) or 3) without keyed files. I'm not insisting that you give up bytestream files, you know. (If you really want to know, the passwd file was edited by a special editor, so it falls under category 2. However, that editor did quite a few integrity checks that you don't get with the UNIX passwd editor :0) >I'd rather have the error messages show up in another file, or in >another window, perhaps with a way to jump to the point at which the >error occurred (e.g., the compile mode some EMACSes support. And you could do that. It takes a special command to merge in the error messages from a separate file. I do it my way, and you do it your way. What's the problem? >With what file is, say, the resource that specifies menus associated? The menus specific to the application are in the application's code. 
On the Mac, other menus are required to be read out of other files and
manually inserted, due to the way menu selections are indicated to the
application (ordinally rather than by keyword).

A better example (because of the way menus are handled on the Mac) is
"what file are the font resources associated with?"  Fonts that are
available to every program are in the "system" file (the vmunix
equivalent).  Fonts that are available only within a specific application
are in the application's executable file.  Fonts that are available only
within a particular document (say, one sent to you by somebody else who
knows you don't have that font) are in the resource fork of that document.
The "resource editor" can move resources from applications to system files
and so on.  When the application looks for a font, the system looks
through a stack of files that normally runs document<-application<-system
for the first resource matching the request.  Other files (for example, a
specific font file) can be inserted into this chain.

Another example: the window border is drawn by a WIND resource (or
something like that).  If you want a particular application to have a
different-looking window, simply replace the WIND resource in that
application file and it will override the one found in the system file.

>I suspect that in most cases you wouldn't *want* to access them
>randomly; tapes aren't known for their speed of random access....

True.  But again, you are making a pointless restriction.  Usually, I
don't want to access the passwd file sequentially; however, that was the
only way to do it for quite some time.  By making tapes randomly
accessible, you make it possible for programs to work on tapes just as
they work on disks.  For example, the archiver (ar, tar, whatever) stored
files by adding keys or prefixing keys (depending on the original file).
Hence, just as tar can read from either a disk file or a tape, the CP-V
archiver could read from either a disk or a tape.
Primarily this capability was there so that system utilities (dump,
quotas, etc.) did not have to bypass the filesystem to do their work.

Most of your objections seem to be of the form "Well, I don't work that
way, and here is how to get around the problem of which you speak."  I'm
not trying to make you work my way.  I'm just pointing out that there are
other possibilities.        -- Darren

--
--- Darren New --- Grad Student --- CIS --- Univ. of Delaware ---
----- Network Protocols, Graphics, Programming Languages, Formal
      Description Techniques (esp. Estelle), Coffee, Amigas -----
=+=+=+ Let GROPE be an N-tuple where ... +=+=+=
guy@auspex.auspex.com (Guy Harris) (01/24/91)
>Sure.  But I contend that that program to do either 2) or 3) with keyed
>files is easier than the program to do either 2) or 3) without keyed
>files.  I'm not insisting that you give up bytestream files, you know.
>(If you really want to know, the passwd file was edited by a special
>editor, so it falls under category 2.  However, that editor did quite
>a few integrity checks that you don't get with the UNIX passwd editor :0)

Note that I referred to 3) as "the 4.3BSD solution"; it uses the keyed
file mechanism native to V7 and 4.xBSD, namely "dbm" or "ndbm".  I'm not
saying keyed files aren't useful; I'm saying that you can get somewhere
between 90 and 99% of the benefit of them without having to have them as
deeply "integrated" as you seem to want - i.e., going back to the
complaints about Plan 9, you could port the "dbm" or "ndbm" stuff, or some
B-tree package, or whatever, to Plan 9, and that would answer almost all
the complaints about the lack of keyed files in Plan 9....

>The menus specific to the application are in the application's code.

That could be done with a separate file - or with *multiple* separate
files, one per language, each containing menu information for that
language, which seems a bigger win on a multi-user system *OR* on a
single-user system getting the application from a file server.  You don't
need to have the notion of a "resource fork" for this....

>A better example (because of the way menus are handled on the Mac) is
>"what file are the font resources associated with?"  Fonts that are
>available to every program are in the "system" file (vmunix
>equivalent).  Fonts that are available only within the specific
>application are in the application's executable file.  Fonts that are
>available only within a particular document (say, that has been sent
>from somebody else (who knows you don't have that font) to you) are in
>the resource fork of that document.
The fonts attached to the application's executable could be in separate
files, with no loss that I can see.  The ability to link a font to a
document would be nice for the example you give of giving a document to
somebody else, although I'm not convinced that it needs two-fork files or
that two-fork files were the best solution.

>Another example: the window border is drawn by a WIND resource (or
>something like that).  If you want a particular application to have a
>different-looking window, simply replace the WIND resource in that
>application file and it will override the one found in the system
>file.

See the comment above concerning menus; I'd rather not modify the
application file, especially if I'm sharing the file with somebody else.

>>I suspect that in most cases you wouldn't *want* to access them
>>randomly; tapes aren't known for their speed of random access....
>
>True.  But again, you are making a pointless restriction.

It's pointless only if there's minimal software cost to allowing tapes to
be accessed as keyed files; if the keyed-access mechanism has to "know"
whether it's accessing a disk file or a tape file and requires a
significant amount of device-dependent code to handle the two cases, or if
some tape-handling software below the keyed-access mechanism has to do a
significant amount of work to hide the difference, the point of the
restriction is to avoid doing that development work for something that
seems like little benefit.

No, the point of the restriction is to simplify the
keyed-access-mechanism implementation, or the tape "driver"
implementation.  Either the keyed access mechanism has to know whether
it's dealing with a disk file or a tape file, and use the appropriate
lower-level operations for the medium it's using, or the tape "driver"
implementation - i.e., whatever the keyed access mechanism sits atop - has
to hide that difference.
>Most of your objections seem to be of the form "Well, I don't
>work that way, and here is how to get around the problem of
>which you speak."  I'm not trying to make you work my way.
>I'm just pointing out that there are other possibilities.

And I'm just trying to point out that the existence of the workarounds
means that it's not "obvious" that you need the mechanisms you're
advocating.  I.e., to go back again to the original complaints about
Plan 9 that started this thread, it's not clear that Plan 9 should be
worrying about the things you said it should, given that there are a
finite number of things its developers can actually do something about,
that they have to choose which problems they want to solve, and that the
problems you cite can be dealt with.

Many systems, including UNIX-flavored ones such as UNIX and Plan 9,
provide, as a "platform", a "file system" that provides access to named
randomly-accessible collections of bytes, and treat text files, keyed
files of various sorts, and data structures of sorts often *not* provided
by various OS's "access methods" as applications atop that file system.
It doesn't bother me that keyed files in those systems aren't at the
lowest level of the file system; I see little if any win to that.

I haven't seen any good evidence yet that you *can't* build an
"object-oriented" facility for implementing objects including but *not*
limited to the sequential/keyed/etc. files provided by most "access
methods" atop such a platform, perhaps with some small additions made to
the platform.
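To illustrate the point, here is a deliberately tiny sketch of my own (not
the real dbm/ndbm interface - those hash their keys rather than scanning,
and all the names below are invented): a "keyed file" built entirely atop
a flat array of bytes, which stands in for a plain byte-stream file.

```c
/* A toy "keyed file" built atop a flat byte container: fixed-size
 * key/value records appended to an unstructured array of bytes.
 * Real packages (dbm, ndbm, B-tree libraries) index instead of
 * scanning, but they sit on the same unstructured container. */
#include <assert.h>
#include <string.h>

#define KEYSZ 16
#define VALSZ 16
#define RECSZ (KEYSZ + VALSZ)

/* Store a key/value pair by appending one record to the "file";
 * returns the new file length. */
long kv_put(char *file, long len, const char *key, const char *val)
{
    memset(file + len, 0, RECSZ);
    strncpy(file + len, key, KEYSZ - 1);
    strncpy(file + len + KEYSZ, val, VALSZ - 1);
    return len + RECSZ;
}

/* Fetch by scanning records; returns NULL if the key is absent. */
const char *kv_get(const char *file, long len, const char *key)
{
    for (long off = 0; off + RECSZ <= len; off += RECSZ)
        if (strncmp(file + off, key, KEYSZ) == 0)
            return file + off + KEYSZ;
    return NULL;
}
```

Nothing in the container level knows these records exist; all the "keyed"
structure lives in the two routines above - which is exactly the
layering argued for here.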
golding@saturn.ucsc.edu (Richard A. Golding) (01/24/91)
In article <5461@auspex.auspex.com> guy@auspex.auspex.com (Guy Harris) writes:
>>The menus specific to the application are in the application's code.
>
>That could be done with a separate file - or with *multiple* separate
>files, one per language, each containing menu information for that
>language, which seems a bigger win on a multi-user system *OR* on a
>single-user system getting the application from a file server.  You
>don't need to have the notion of a "resource fork" for this....

It's worth noting that this is exactly the approach taken for applications
using the X11 window system.  Each application has an application-defaults
file, essentially providing the functions of a Mac resource fork in the
application.  Users can have additional resource files, either loaded into
the window server or separate.  The resource files provide a hierarchical
key space, and some X toolkits make use of this for things like menus,
text strings, and so forth.  The only thing this lacks is
document-specific resources, which could still be provided (as a separate
file) but which no applications (to my knowledge) use.

This approach has a couple of important benefits.  First, resource files
don't require any special mechanism, and don't affect any existing Unix
utility unless it needs to be aware of resources.  For example, `ls' and
`cp' work just the same as they ever have.  Making these applications
aware of resources is non-trivial, as witnessed by the experience of the
OS/2 designers.  Second, the technique is portable, and doesn't depend on
the version of Unix, nor indeed on Unix itself; it works just as well
under VMS.

In general, this whole argument seems silly.  Nobody in their right mind
is going to provide *any* kind of file system in the kernel, if current
trends are to be believed.  Any operating system worth consideration would
be able to use a new "filesystem" module (object, server, ...) which
provided complex semantics.
And Unix functionality does seem to be the basis people are using to
develop new systems.  Just where Plan 9 fits into this I wouldn't care to
guess; I have only read the UKUUG papers, and they left me convinced that
the system was interesting, but that it was a research vehicle, and that I
would need more information to make an evaluation (were I to want to).
	-richard

--
Richard A. Golding			golding@cello.hpl.hp.com or
UC Santa Cruz CIS Board (grad student)	golding@cis.ucsc.edu (best) or
					golding@slice.ooc.uva.nl
Post: Baskin Centre for CE & IS, Appl. Sci. Bldg., UC, Santa Cruz CA 95064
new@ee.udel.edu (Darren New) (01/24/91)
In article <5461@auspex.auspex.com> guy@auspex.auspex.com (Guy Harris) writes:
>That could be done with a separate file - or with *multiple* separate
> [...]
>The fonts attached to the application's executable could be in separate
>files, with no loss that I can see.

Except that your applications normally contain dozens or hundreds of
different resources, and the system file can contain thousands.  Each menu
item, each menu, each menu bar, each window, each window type, each button
text, each button, each dialog window, each error message, each
informative bit of text, etc. is a separate resource.  Therefore, you
would probably want to make each application a directory, in which to hold
all these little files, many of which take more disk space for the
directory entry and the inode than for the data.  (Speaking of which,
where do you put the resources associated with a directory?)  Then, you
need something other than cp to copy applications and documents
conveniently.  exec() needs to be rewritten, as do your shells, to find
the executable resources within the directory.  It becomes quite easy to
accidentally fubar an application by giving a bad file name somewhere,
because all your programs always have write-access and delete-access to
all the resources of your application, unless you want to add more
protection bits that say which application owns the directory.

Sure, you can do it halfway, by pretending a directory full of files is
actually a keyed file, or by adding a set of libraries on top of every
program, or by making some sort of structure inside the file that is
problematic for shared updates - but do you really want the parameters for
each X Windows call in your application to be put in a separate file?

Of course none of this *needs* a more sophisticated access method.  But
I'll use a system where the things I need to do my job come with the
computer, thanks.
Again, I'm not arguing that UNIX should have such things added to the
kernel, or even that they should have been in there in the first place -
only that, when starting from scratch, using essentially 20-year-old file
access methods seems like a bad idea.

>See the comment above concerning menus; I'd rather not modify the
>application file, especially if I'm sharing the file with somebody else.

Normally, one does not modify the application while it's running.  Also,
the Mac has only one accessor for a file at a time.  However, if it is
built into the kernel, shared access can be mediated much more tightly and
efficiently than if it is handled through libraries.  The example of the
WIND resource was for somebody adding a new look for windows at
compile-time.  The same mechanism might have been used to implement
"tear-off menus" in HyperCard -- the menu handling code was overridden in
the application code at compile-time.

>Many systems, including UNIX-flavored ones such as UNIX and Plan 9,
>provide, as a "platform", a "file system" that provides access to named
>randomly-accessible collections of bytes, and treat text files, keyed
>files of various sorts, and data structures of sorts often *not*
>provided by various OS's "access methods" as applications atop that file
>system.  It doesn't bother me that keyed files in those systems aren't
>at the lowest level of the file system; I see little if any win to that.

But Plan-9 goes much further than that, in that all accessible objects are
accessed like files are.  It just seems very kludgy to me to say that
writing to /proc/1234/zelda will cause the zelda action to be performed by
that process.  It seems to me to be similar to saying "To create a pipe
under UNIX, write your process id to the file called /dev/makepipe and
then read that file to get back two integers which are the file handles."
You can do it, but it's ugly.  I'm not saying that putting everything in
the file namespace is bad.
I'm not saying that having files accessed as you describe above is
particularly bad.  I'm saying that having things other than files accessed
only as you describe above seems to me to be shoehorning strange things
into the wrong paradigm.  If writing to /proc/1234/ctl causes the process
to stop, why does the process not have to read its own ctl to get that
information?  Why is "cc" a program that you invoke, but "kill" a file you
write to?  Why do you rm a file to destroy it, close a window to destroy
it, and write a string to a process file to destroy the process?  Why do
you creat() a file but fork() a process, since you seem to talk to both
the same way?  Why is /dev/bitblt implemented as a single stream-oriented
file when calls to blit normally have five or six structured arguments?
Why not have six files, plus another for writing requests and another for
reading the responses to requests?  When checking for errors, does one
look at a global errno, or does one read /dev/errno after every call?

Before you bother to answer all this, realise first that hyperbole is
being used in some of the above examples, and second that I'm not asking
about these specifics, but rather about what basis is used to make these
decisions.  What is the motivation for such choices, and what *should* be
the motivation for such choices?  To me it looks like a mishmash of
contortions caused by the choice of a UNIX-like filesystem interface as
the primary IPC&I/O mechanism, with special cases provided for things that
are not efficiently handled in that way.

It just seems to me that a distributed or remote filesystem is a special
case of procedure calls, rather than RPC being a special case of file I/O.
Shoehorning RPC into a bytestream model seems like it might lead to *more*
device dependency rather than less.  I've seen systems (AmigaDOS and now
Amoeba) which both handle communication in a message- and object-oriented
way, and both seem to have much cleaner construction.
It is clear in Amoeba and in AmigaDOS, from just a simple description of
how the basic I/O operations work, how you would go about accessing remote
files and why particular choices were made in that mechanism - just as Ada
cannot be described in ten pages but Lisp and FORTH can.  There is usually
a "straightforward" and obvious method for implementing any capability one
might imagine.  For example, it isn't clear how processes get migrated
(especially since the paper doesn't say :-).  Would one copy the
/proc/123/* files?  Is it a special call?  If you copy the /proc files,
where do you copy them to?  Are names added to the name space by writing
to a special file?  Or by a creat() call?  Or both?  Are lightweight
threads created with a special call, or by writing to one of the /proc
files?  None of these are obvious when you look at a simple description
(on the order of 10 pages or so) of Plan-9.  The answers *are* obvious
from a ten-page description of Amoeba or AmigaDOS.

Another thing that bothers me is the comment that security is not an
integral part of Plan-9.  With bigger systems networked more widely, I
would not want to place my trust in an OS with security as difficult to
get right as it is under UNIX.  Under Amoeba, the same mechanism that
keeps me from writing your files keeps me from killing your processes and
keeps me from tapping your ethernet; the mechanism is simple, elegant,
robust, and easy to see is correct.

>I haven't seen any good evidence yet that you *can't* build an
>"object-oriented" facility for implementing objects including but *not*
>limited to the sequential/keyed/etc. files provided by most "acces
>methods" atop such a platform, perhaps with some small additions made to
>the platform.

Sure.  I can write recursive algorithms in FORTRAN, too.  I can write
massively parallel, distributed, object-oriented code in C.
However, it would be cleaner, easier to understand, easier to get correct,
and probably even more efficient to use the right tool for the right job.
Using directories as keyed files is something I consider "the wrong tool."
Restricting servers to respond to only a small number of kinds of
requests, and forcing non-file-like operations to use only file-like
operators, also seems like "the wrong tool."

$$$$ If you want to discuss only one point further, below is the point I
would like to discuss: $$$$

As a matter of fact, I can write the UNIX filesystem on top of the FORTH
filesystem, wherein individual disk blocks are read and written by integer
index (i.e., disks are just arrays of blocks).  I have yet to see any good
reason why one should have a filesystem that goes just as far as the UNIX
filesystem does, and then stops.  I've seen several good reasons for (and
examples of) why you would want something *less* complex than the UNIX
filesystem, and I've seen several good reasons for (and examples of)
something *more* complex than the UNIX filesystem, but as for something
*exactly* as complex as the UNIX filesystem, people usually just say "Oh,
put it in libraries."  Why not put directories, access control,
allocation, and concurrency controls in libraries?        -- Darren

--
--- Darren New --- Grad Student --- CIS --- Univ. of Delaware ---
----- Network Protocols, Graphics, Programming Languages, Formal
      Description Techniques (esp. Estelle), Coffee, Amigas -----
=+=+=+ Let GROPE be an N-tuple where ... +=+=+=
new@ee.udel.edu (Darren New) (01/25/91)
(Sorry if you see this twice.  Our newsserver crashed in the middle of
submitting it and I didn't see it show up at our site.)

In article <5461@auspex.auspex.com> guy@auspex.auspex.com (Guy Harris) writes:
>Many systems, including UNIX-flavored ones such as UNIX and Plan 9,
>provide, as a "platform", a "file system" that provides access to named
>randomly-accessible collections of bytes, and treat text files, keyed
>files of various sorts, and data structures of sorts often *not*
>provided by various OS's "access methods" as applications atop that file
>system.  It doesn't bother me that keyed files in those systems aren't
>at the lowest level of the file system; I see little if any win to that.

What I think you are saying here is "why put a complex filesystem in the
kernel when all the functionality can be provided in a library?"  I would
like to address this with an example.  Let's look at a filesystem that is
only marginally more complex than UNIX's.  A file, when created, can have
a flag set that says "This is the new kind of file".  (Necessary to allow
room for the extra structural information.)  This new file type is exactly
like the old one except that two new calls are added: insert(fd, buf,
size) inserts those bytes at the current file cursor, and delete(fd, size)
deletes that many bytes starting at the current file cursor.  For
simplicity, let's exclude lseek() on these files for now.

(Note: in the following discussion, when I say "library" I mean a
user-mode application library, dynamic or not.  When I say "kernel" or
"filesystem" I mean a filesystem that has access to presumably privileged
information in the kernel, such as device buffers and disk allocation
bitmaps, and that also has the ability to prevent or otherwise control
process preemption and/or resumption.  "Kernel" does not necessarily mean
resident, nor does "library" mean statically linked.)

My proposed organization is to make each file a linked list of blocks.
Each block has an additional three words of (non-user-data) information in
it: the previous block, the next block, and the number of bytes of used
user data in this block.  Clearly, in the simple cases, this is about as
easy to implement in a user library as it is in a kernel.  However, a
kernel implementation has quite a few advantages:

1) When the need to insert a block arises, a block near (in an access-time
sense) its predecessor or successor can be chosen.  In a user library, the
control of which block will be chosen is not available; even if it were,
the interface would need to be complicated, because the block could be
used between the time it was found and the time it was inserted; hence,
some sort of atomic call would be needed.  Also, new blocks being inserted
would probably be tacked onto the end of the file, where normal UNIX
semantics might say "allocate a block `close' to the current last block of
the file."

2) When a block is deleted from the middle of the file, the kernel version
can return that block to the free space pool immediately.  The user
library will need to either write all zeros to it and hope that the kernel
will notice this, or it will have to maintain a separate list of "free
blocks".  In either case, the fact that the block is free either has to be
recorded somewhere in the file (hence taking up more room for the free
list and not really being available for other files anyway) or the library
must scan the entire file looking for a block that, when read, looks like
all zeros.  It's not even possible to store the free block list inside the
free blocks, because then they won't all be zeros.

3) The forward and backward links in the blocks as stored by the kernel
can be direct disk-block numbers.  In a user library, they must be offsets
from the beginning of the file.  Hence, with the kernel, there need be no
"extension blocks" in the inode, and reading an entire file can be done as
fast as it takes to physically read the disk.
In the user-library implementation, blocks are not necessarily read in the
order in which they appear in the file; hence, extension-block caching may
not be efficient.  Once the files get above a certain (admittedly large)
size, you run out of extension blocks, which are not even needed in
new-files.

4) The number of kernel/user context switches is much greater when the
library is in user mode (requiring several separate calls to each of
lseek, read, and write just to insert a block, with probably only a dozen
instructions between each pair of calls).  This is very costly on machines
with large register sets or memory maps, or where the cache is flushed on
such switches.

5) semi-strawman: robustness is increased, because it's generally harder
for a user to screw up the kernel than it is for one to screw up a user
library.  I'm aware of the counter-arguments to this one.

6) semi-strawman #2: the memory and call-overhead penalties are probably
lower if the new file type is used often, as the dynamic library stubs
don't have to be linked in, the dynamic library cache does not need to be
stored, etc.  That is, if it's always going to be around anyway, it's
probably better to have it in the kernel than to go through the extra
overhead of making it a usually-resident dynamic library that then
accesses the kernel.

For a slightly more complex specification, let's add multi-user shared
access.  An assumed requirement is that all calls are atomic.  This is
easy in the kernel, and difficult in the user library.  It is easy in the
kernel because locks can be modified without releasing them when
information is inserted or deleted before them.  (Here, I'm assuming that
once you lock some bytes, the contents of those bytes cannot change, and
an insertion or deletion before those bytes will not "slide" them out from
under the lock.)  It is difficult in the user library not only because one
process cannot modify another's locks, but also because *in UNIX* locks
lock ranges of bytes in the file.
When the user requests to lock bytes 2000 through 2050, the library has no
idea which bytes of the file it needs to lock; it must find the correct
bytes to lock, which implies reading possibly-locked structural
information from the file.  (Note that under some OSes, e.g. HP MCP and
CP-V, locks are more symbolic and are disassociated from the files they
are locking.  This would make matters easier, but not easy.)

1) One possibility is to have each user lock the file, make all changes,
and then unlock the file, caching no information between calls.
Basically, you have no multi-user shared access.  The only overhead saved
is the cost of opening and closing the file each time.  Otherwise, all
information you need has to be copied into and out of the kernel buffers
on every call.  This is why we invented buffered I/O and setbuf.

2) If you lock only the blocks you are working on, you cannot even cache a
block between calls, as somebody may have written to it in the meantime.
It also prevents scanning the file to find all-zero blocks.  Without a
mechanism to asynchronously notify you that a file block you are locking
is being requested elsewhere, you cannot afford to hold locks for an
arbitrary length of time.

3) Any sort of free-space list you may use will need to be locked on both
reads and writes (so a writer does not delete a block that you then
continue to read), again causing excessive contention.

I would no more want to add this type of file in user mode than I would
want to implement directories in user mode, if the directories allowed
some processes to be in readdir() while others were in creat() or
unlink().  I've not seen a single-tasking OS that turned into a
multi-tasking OS where such conditions were not buggy for at least two OS
releases.  (It took AmigaDOS 4 releases to get this right, and the latest
tests aren't even in yet.)
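For concreteness, here is a toy, single-user, in-memory model of the
proposed organization (all the names are invented for the sketch, and
there is no locking, so it shows only the block-chain data structure, not
the contention problems just discussed): insert() splits the block the
offset falls in, instead of rewriting everything after it.

```c
/* Toy in-memory model of the proposed new-file type: a chain of
 * fixed-size blocks, each with a next link and a "used" count, so
 * insert() splits one block rather than shifting the whole file. */
#include <assert.h>
#include <string.h>

#define BLKSZ 8
#define NBLK  16
#define NIL   (-1)

struct blk { int next; int used; char data[BLKSZ]; };

struct nfile { struct blk b[NBLK]; int head; int nalloc; };

int blk_alloc(struct nfile *f)           /* grab a fresh block */
{
    int i = f->nalloc++;
    f->b[i].next = NIL;
    f->b[i].used = 0;
    return i;
}

/* Insert n bytes at a byte offset, splitting the block it falls in
 * and chaining in new blocks for the inserted data. */
void nf_insert(struct nfile *f, int off, const char *s, int n)
{
    int i = f->head;
    while (i != NIL && off > f->b[i].used) {  /* find the block */
        off -= f->b[i].used;
        i = f->b[i].next;
    }
    /* split: move the tail of this block into a new block */
    int tail = blk_alloc(f);
    f->b[tail].used = f->b[i].used - off;
    memcpy(f->b[tail].data, f->b[i].data + off, f->b[tail].used);
    f->b[tail].next = f->b[i].next;
    f->b[i].used = off;
    f->b[i].next = tail;
    /* then chain in the new bytes, one block per BLKSZ chunk */
    while (n > 0) {
        int k = n < BLKSZ ? n : BLKSZ;
        int nb = blk_alloc(f);
        memcpy(f->b[nb].data, s, k);
        f->b[nb].used = k;
        f->b[nb].next = f->b[i].next;
        f->b[i].next = nb;
        i = nb; s += k; n -= k;
    }
}

/* Read the whole chain back out as a flat string. */
int nf_read(const struct nfile *f, char *out)
{
    int len = 0;
    for (int i = f->head; i != NIL; i = f->b[i].next) {
        memcpy(out + len, f->b[i].data, f->b[i].used);
        len += f->b[i].used;
    }
    out[len] = '\0';
    return len;
}
```

Everything the kernel-vs-library argument is about - choosing *which*
block blk_alloc() hands back, freeing blocks atomically, locking across
the split - is exactly what this user-mode toy cannot express.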
Look at what the kernel can do, due to the fact that all users go through
the *same* kernel to get to the file:

1) Arbitrary portions of the file can be cached in memory.

2) Multiple atomic operations can be carried out concurrently, due to the
fact that the kernel *knows* which areas of the file are under
modification right now.

3) Read-ahead and write-behind are possible because the memory is being
shared between processes.

4) Since chances are good that upon reading a mostly-empty block, the
previous block will still be in memory, it may be possible to compress two
half-empty blocks into a single block with no additional overhead (due to
the write-behind).

Before you say all this stuff is just for "efficiency" or "programmer's
convenience", remember that interrupts, DMA, disk caching, demand paging,
and memory protection are all just "efficiency" and "programmer's
convenience", as is the rest of the operating system.

Let's add lseek() to this new file type.  First, how would I do it in the
kernel?  I could maintain an in-core table, created the first time a
new-file is lseek()ed to somewhere other than the first or last byte.  The
table could give me the offset-in-bytes, the physical block number, and
the byte-within-block.  If I only stored entries when seeking *from* the
middle of the file, seeking back to any place I'd taken an ftell() from
would be instantaneous; seeking elsewhere (say, ftell()+200) would require
some amount of reading, but since it is in the kernel, buffers need not be
copied into user space while skipping.  Since this cache is in shared
memory, somebody else deleting or inserting bytes could update the table's
offset-in-bytes entries.  The cache could grow based on how much kernel
memory is available, and shrink when *any* process anywhere needs that
memory.  The table or something like it could be stored along with the
file (in a separate block, say) while the file is closed, for efficiency.
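The in-core seek table might look something like this (a hypothetical
user-level model of what the kernel would keep; the names are invented,
and the insertion/deletion fixups and memory-pressure shrinking described
above are omitted):

```c
/* Hypothetical seek cache: remember (byte offset, block number,
 * byte-within-block) triples at places the file has been ftell()'d,
 * so a later seek walks the block chain from the nearest cached
 * point instead of from the front of the file. */
#include <assert.h>

#define NCACHE 8

struct seekent { long off; int blockno; int inblock; };
struct seekcache { struct seekent e[NCACHE]; int n; };

void cache_note(struct seekcache *c, long off, int blk, int inblk)
{
    if (c->n < NCACHE) {
        c->e[c->n].off = off;
        c->e[c->n].blockno = blk;
        c->e[c->n].inblock = inblk;
        c->n++;
    }
}

/* Return the cached entry closest below the target, or NULL; the
 * caller walks forward from there rather than from block 0. */
const struct seekent *cache_find(const struct seekcache *c, long target)
{
    const struct seekent *best = 0;
    for (int i = 0; i < c->n; i++)
        if (c->e[i].off <= target &&
            (!best || c->e[i].off > best->off))
            best = &c->e[i];
    return best;
}
```

In the kernel this table lives in shared memory, so an insert() by one
process can fix up every other process's cached offsets; that fixup is
precisely what the user-library version below cannot do.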
When disk space is low, these extra blocks (which are only there for
efficiency, after all) could be reaped out of the files on demand,
possibly even based on LRU times.  Most disks around here are >97% full
most of the time, since most people only delete files when they run out of
room; I won't accept the "so get a bigger disk" argument :-).  Also, the
number of blocks allocated to the file could be tracked, and when the
density fell below a certain level, the file could automatically be
compressed upon the last close, which the kernel is tracking anyway; you
can do this in the user library, but then you need all sorts of extra code
to keep track of file opens and closes and of when the file is being
compressed, in addition to the compression code itself.  Also, as above,
freeing blocks as they are compressed is problematic.  A user-mode daemon
could copy the file to a new, compressed file and then delete the old
version (as long as nobody else has it open), but then you need room for
two copies of the file that you are compressing, and the daemon has to be
triggered and has to *find* the right files to compress; the daemon also
has to be a trusted program.

How could something like this be done in the user library?  Well, we could
just read the whole file every time we did a seek.  Except that now we
again run into any locks that anybody else has, and we have to lock all
that part of the file while we're reading.  We also have several dozen
kernel/user context switches and lots of copying between kernel memory and
user memory.  We could store the cache in shared memory, except that we
don't have shared memory in BSD.  Last time I looked, System V has a
single pool of non-pageable shared memory (which is in the kernel anyway),
you can't find out when other processes need some and can't get it, and
the name-space connection between that shared memory buffer and the file
it is associated with is problematic.
You also need to make up semaphores (also in kernel memory, and limited)
to mediate access to the shared memory, again causing context switching
and name resolution complexities.  In addition, if the dynamic libraries
are usually open anyway, all that code is in memory as well, so you are
not saving memory by moving it to the user library.  (Besides, you could
page or overlay the parts of the kernel dealing with file structures, so
if nobody were using the complex structures, you wouldn't waste resident
real memory.)  Another choice may be to implement the whole thing with
caches in shared memory, using mmap() or something similar.  Note that if
you did things this way, you would essentially be rewriting huge portions
of the UNIX filesystem just to add this one tiny feature.  You would be
reimplementing disk-block allocation and deallocation, block buffering,
contention resolution, file descriptor consistency maintenance, and so on;
you are essentially treating a UNIX file as a block-structured device and
writing a new filesystem as a user library.

Anyway, I think that makes some good points, and I would be interested in
any discussion this may raise.  I would be especially interested in
hearing why putting such functionality in a user library is actually
*better* than putting it in the kernel, rather than simply possible, which
is all I've heard up to now.

Another point: given that we have insert() and delete() added to the
kernel, it becomes much simpler to add more complex structures as
user-mode libraries if desired.  For example, byte-counted records are
simple.  The file format is a byte count, that many bytes of data, and
then another record.  Lengthening and shortening records is no longer
traumatic.  KSAM files become much simpler, because the block-split
operation during addition of a new key is just the insert() operator with
the right parameters.  Libraries like dbm no longer have to return keys in
an "arbitrary" order, because they can put the keys in the right place to
start with.
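The byte-counted record format is simple enough to sketch (a toy of my
own over a flat buffer, with invented names; given the proposed insert()
and delete() calls, lengthening a record would become a single insert() at
the right offset instead of a rewrite of everything after it):

```c
/* Byte-counted records: each record is one count byte followed by
 * that many bytes of data, laid end to end in a flat "file". */
#include <assert.h>
#include <string.h>

/* Append one record; returns the new file length. */
long rec_append(unsigned char *file, long len,
                const void *data, unsigned char count)
{
    file[len] = count;
    memcpy(file + len + 1, data, count);
    return len + 1 + count;
}

/* Find the byte offset of record n (0-based) by hopping over the
 * counts, or return -1 if there is no such record. */
long rec_offset(const unsigned char *file, long len, int n)
{
    long off = 0;
    while (off < len) {
        if (n-- == 0)
            return off;
        off += 1 + file[off];
    }
    return -1;
}
```

On a plain byte-stream file, growing record 0 means rewriting every
record after it; with insert() it is one call, which is the whole point
of the proposal.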
Things like the Macintosh "resources" can be added to files efficiently. (BTW, Mac resources or keyed files are much like lightweight directories or lightweight dynamically-linked libraries. You can do it with the heavyweight versions, but it's much less efficient and uglier.) Insertion of key blocks into keyed files can insert them in an efficient-to-retrieve location. Editors no longer have to suck up the entire file and blow it back out to add one character to the first line.

If directory structures were based on new-file structures as opposed to the current UNIX-file structures, they could be maintained in sorted order, would automatically shrink when many files were deleted, and would have no wasted space in them.

Hence, I claim that the UNIX filesystem is only marginally too weak to support complex file structures conveniently.
-- 
--- Darren New --- Grad Student --- CIS --- Univ. of Delaware ---
----- Network Protocols, Graphics, Programming Languages, Formal
Description Techniques (esp. Estelle), Coffee, Amigas -----
=+=+=+ Let GROPE be an N-tuple where ... +=+=+=
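The byte-counted record format described above (a byte count, that many bytes of data, then the next record) is simple enough to sketch. This is purely illustrative and not from any of the systems under discussion; the 4-byte big-endian length prefix and the function names are my own assumptions:

```python
import io
import struct

def write_record(f, data):
    """Append one record: a 4-byte big-endian byte count, then the bytes."""
    f.write(struct.pack(">I", len(data)))
    f.write(data)

def read_records(f):
    """Yield each record's bytes until the byte counts run out."""
    while True:
        header = f.read(4)
        if len(header) < 4:
            return
        (count,) = struct.unpack(">I", header)
        yield f.read(count)

# Records of different lengths coexist with no padding or fixed blocking.
buf = io.BytesIO()
for rec in (b"short", b"a considerably longer record"):
    write_record(buf, rec)
buf.seek(0)
records = list(read_records(buf))
```

Note that this also illustrates Darren's point: lengthening a record in the middle of such a file still means rewriting everything after it, unless the container itself supplies something like his insert() and delete() primitives.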
guy@auspex.auspex.com (Guy Harris) (01/25/91)
>In general, this whole argument seems silly. Nobody in their right
>mind is going to provide *any* kind of file system in the kernel, if
>current trends are to be believed. Any operating system worth
>consideration would be able to use a new "filesystem" module (object,
>server, ...) which provided complex semantics. And Unix functionality
>does seem to be the basis people are using to develop new systems.

I think much of the problem here is that people are getting hung up by the fact that there are two levels at which UNIX - and some other systems - provide "file system" functionality. The levels are the "container" level and the "structure" level.

The level that is typically implemented by the kernel on UNIX systems is a "container". UNIX files, at that level, are randomly-accessible arrays of bytes, with no structure. The structure is - with the exception of directories - provided by stuff running atop that level, usually in user mode. Examples:

The standard I/O library knows, in some places, about "text files", which consist of sequences of bytes with newline characters ending "lines" - and, at least for "(f)gets()" and "(f)puts()", containing no '\0' characters. Some utilities that don't use standard I/O, or that use interfaces such as "getc()" and "putc()", know about "text files" as well.

The "(n)dbm" library knows about keyed files, as do various ISAM libraries supplied by the vendor or by third parties.

Assemblers and linkers, various libraries and library routines, run-time linkers in e.g. SunOS 4.x and S5R4 (and possibly other systems), and the "exec()" code often done in the kernel, know about object or executable image files.

This general model isn't unique to UNIX.
For example: RSX-11 implements the "container" level in the Files-11 ACP, which is a somewhat privileged process running in user mode, as I remember, to which messages are sent when certain QIO functions are performed; it's basically an array of 512-byte blocks, rather than bytes, but it's still unstructured. The structure is provided by libraries running in user mode which perform those QIO functions to talk to the Files-11 ACP. F11ACP does provide some per-file attributes, I think, that can be fetched and set via QIOs, but that it doesn't itself interpret; those are used by the user-mode libraries.

VMS is, I think, similar (although I think the ACP was replaced by the "extended QIO processor", which isn't a process receiving messages but is directly called, for performance reasons), although RMS runs, as I remember, in executive rather than user mode. Dunno if user-mode programs can, if they choose, get direct access to the "container" via QIOs or not.

I think MS-DOS provides a generally-accessible "container" level as well, with structuring done above that level, and I suspect OS/2 does the same. Amoeba's Bullet file server, from what I can tell, also implements containers without imposing or providing any structure on the contents. Multics also provides containers, i.e. segments, at the lowest level.

A lot of the complaints about UNIX seem ultimately either to be that it doesn't provide enough higher-level mechanisms that implement structured objects, or that the lowest level is too accessible. The former complaint is, I think, a fair one in many cases - but that doesn't say that providing unstructured containers is intrinsically a Bad Idea. The latter complaint strikes me as similar to the complaints that the instruction sets of modern machines are too low-level.
Many complaints about a "semantic gap" seemed to ultimately miss the fact that there is an *intrinsic* semantic gap between doped silicon and high-level languages, and that the debate was really over the extent to which you filled in that gap with microcode and hardware as opposed to compilers and run-time libraries. There is room for a debate as to which parts should be filled in below the instruction set and which parts should be filled in above the instruction set, but that issue seems generally best resolved by looking at the performance, security, transparency, and development cost implications of various choices, not by looking at it as a philosophical issue.

Similarly, there's a "semantic gap" between magnetized domains in iron oxide (or whatever) and data structures, but the layers at which that semantic gap should be filled in seem generally best resolved by looking at performance, security, transparency, and development cost issues, not philosophical issues. (E.g., by putting stuff at lower levels you can make changes to the implementation transparent to stuff at higher levels, and make it more secure if stuff at lower levels is "protected" while stuff at higher levels isn't. However: that doesn't necessarily reduce development costs - *somebody* has to fill in various parts of the aforementioned semantic gap in any case, and the question is which choice of where to fill it in makes it easier or means you have to do it fewer times; it doesn't necessarily increase performance.)

To go back to the question you brought up earlier, the new "filesystem" module could provide those complex semantics atop a raw disk or partition, or could provide them atop a "container" file provided by some lower-level "filesystem" module. I've not seen any indication that there's some obvious universally "right" choice - nor have I seen any indication that the latter choice is obviously "wrong", as some folks flaming UNIX or, say, Plan 9 seem to think.
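The container/structure split Guy describes can be made concrete in a few lines. A minimal sketch (mine, not from the thread; the sample bytes are arbitrary) showing the same "container" accessed at both levels:

```python
import io

# The "container" level: an unstructured, randomly accessible array of
# bytes, which is all the UNIX kernel promises.
container = io.BytesIO(b"first line\nsecond line\n\x00 binary too\n")

# Container-level access: seek to an arbitrary byte offset and read.
container.seek(6)
raw = container.read(4)      # just bytes; no notion of records here

# "Structure" level: a user-mode convention (newline-terminated lines,
# as the standard I/O library's (f)gets() assumes) layered on the very
# same bytes. Nothing in the container enforces it.
container.seek(0)
lines = container.read().split(b"\n")[:-1]
```

The point of the sketch is that the line structure exists only in the second reader's code; the container happily holds the '\0' byte that a "text file" convention would forbid.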
guy@auspex.auspex.com (Guy Harris) (01/25/91)
>Except that your applications normally contain dozens or hundreds of
>different resources, and the system file can contain thousands. Each
>menu item, each menu, each menu bar, each window, each window type,
>each button text, each button, each dialog window, each error message,
>each informative bit of text, etc is each a different resource.
>Therefore, you would probably want to make each application a
>directory, in which to hold all these little files,

What "little files"? Did I say that each one of those resources would go in a file of its own? No, I didn't.

>Of course none of this *needs* a more sophisticated access method.
>But I'll use a system where the things I need to do my job come
>with the computer, thanks. Again, I'm not arguing that UNIX
>should have such things added to the kernel, or even that they
>should be in there in the first place. Only that when starting
>from scratch, using essentially 20 year old file access methods
>seems like a bad idea.

To what "essentially 20 year old file access method" in particular are you referring here?

>Normally, one does not modify the application while it's running.
>Also, the Mac has only one accessor for a file at a time. However, if
>it is built into the kernel, shared access can be mediated much more
>tightly and efficiently than if it is handled through libraries. The
>example of the WIND resource was for somebody adding a new look for
>windows at compile-time. The same mechanism might have been used to
>implement "tear-off menus" in hypercard -- the menu handling code was
>overridden in the application code at compile-time.
>
>>Many systems, including UNIX-flavored ones such as UNIX and Plan 9,
>>provide, as a "platform", a "file system" that provides access to named
>>randomly-accessible collections of bytes, and treat text files, keyed
>>files of various sorts, and data structures of sorts often *not*
>>provided by various OS's "access methods" as applications atop that file
>>system.
>>It doesn't bother me that keyed files in those systems aren't
>>at the lowest level of the file system; I see little if any win to that.
>
>But Plan-9 goes much further than that, in that all accessible objects
>are accessed like files are.

That wasn't your entire original complaint against Plan 9:

From the 15 pages I've read about it, I see Plan-9 doing all the same things wrong that UNIX did. They seem to have only files which are byte arrays, in spite of the fact that 99.44% of programs I've seen want records, and most want keyed records.

That part of your complaint was the main part I objected to, and I think I've successfully demonstrated that "most want keyed records" is correct only if you interpret it as "most folks who like CP-V's keyed text files want keyed records"; unfortunately, the wins of CP-V keyed text files seem to occur only in fairly specialized cases, like folks who carry lots of line-numbered listings around....

(And if you're going to thump UNIX there, you might want to thump Amoeba, while you're at it; the Bullet file server is actively *hostile* to update-in-place keyed files of the CP-V sort that you were boosting in earlier articles, since it doesn't let you write to files - it just lets you replace files with files. I think that's an *extremely* interesting idea, but it's going further away than UNIX from some of the stuff you're advocating. In fact, the way they talk about handling database files in "The Design of a High-Performance File Server" resembles, in some ways, the "throw each entry into a file of its own" scheme you thump below - "Data bases can be subdivided over many smaller Bullet files, for example based on the identifying keys.")

The later part of the complaint:

UNIX then seems to attempt to stuff every object that *isn't* a file into this same mode, and poorly at that. Plan-9 seems to be going the same way.

I have fewer problems with.
Another poster took that notion to what I considered a *far* too extreme notion, namely overloading UNIX-style permission bits to indicate which signals the process is catching, and I objected to that in my reply. I completely agree that blindly placing all objects onto the Procrustean bed of text file access is a mistake. I consider using "rm" to kill processes - or, more particularly, to *send signals to* processes, as "kill" is somewhat of a misnomer in UNIX - somewhat extreme; I'd consider it equally extreme to use "mv" to do process migration.

>What is the motivation for such choices, and what *should* be the
>motivation for such choices?

The criterion I'd use is "if making some particular operations doable through the file system makes it possible to use existing UNIX tools to perform useful functions built on those operations, *and* won't give the developers an excuse to avoid making common operations convenient just by saying 'well, if you don't like it, wrap some shell script around it'", I'd say provide a file system mechanism to do the operation.

>>I haven't seen any good evidence yet that you *can't* build an
>>"object-oriented" facility for implementing objects including but *not*
>>limited to the sequential/keyed/etc. files provided by most "access
>>methods" atop such a platform, perhaps with some small additions made to
>>the platform.
>
>Sure. I can write recursive algorithms in FORTRAN, too. I can write
>massively parallel, distributed, object-oriented code in C. However,
>it would be cleaner, easier to understand, easier to get correct, and
>probably even more efficient to use the right tool for the right job.

See my followup to another article, in which I discuss "containers".
Ultimately, all that code is going to end up as 1's and 0's (barring, say, a base-3 machine); I've not been convinced that the 1's and 0's of a typical "conventional" computer are necessarily any worse for that purpose than the 1's and 0's of some of the "high-level-language-oriented" computers that have been proposed in the past, just as I've not been convinced that the 1's and 0's in "containers" like UNIX or Bullet files are any worse than the 1's and 0's in the disk blocks maintained by OS's that support "keyed files" or whatever at a lower level.

>Using directories as keyed files is something I consider as "the wrong
>tool."

Well, if you consider putting parts of keyed databases into separate files based on the key values to be part of the reason it's using "the wrong tool", you'd better let the folks doing Amoeba know you think they're making a mistake.... :-)

>Restricting servers to only respond to a small number of
>kinds of requests and forcing non-file-like operations to use only
>file-like operators also seems like "the wrong tool."

*Forcing*, yes. *Allowing*, no.

>$$$$ If you want to discuss only one point further, below is
>     the point I would like to discuss: $$$$
>
>As a matter of fact, I can write the UNIX filesystem on top of the
>FORTH filesystem, wherein individual disk blocks are read and written
>by integer index (i.e., disks are just arrays of blocks). I have yet
>to see any good reason why one should have a filesystem that goes just
>as far as the UNIX filesystem does, and then stops. I've seen several
>good reasons for (and examples of) why you would want something *less*
>complex than the UNIX filesystem,

What sort of examples are you talking about? Are you talking about something less complex than a bucket of bytes?

>and I've seen several good reasons for (and examples of) something
>*more* complex than the UNIX filesystem,

What sort of examples, again?
>but as far as something *exactly* as complex as the UNIX
>filesystem, people usually just say "Oh, put it in libraries." Why not
>put directories, access-control, allocation, and concurrency controls
>in libraries?

Directories? Sure, why not.
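Since "directories as keyed files" keeps coming up in this exchange, here is a toy sketch of the scheme being debated: one small file per key, all inside one directory. The class and its methods are hypothetical, purely to make the trade-off concrete; a real version would need key escaping, locking, and so on:

```python
import os
import tempfile

class DirKeyedStore:
    """Toy keyed 'file': each key becomes a small file in a directory."""

    def __init__(self, path):
        self.path = path
        os.makedirs(path, exist_ok=True)

    def put(self, key, value):
        # The directory entry IS the key; the file contents are the record.
        with open(os.path.join(self.path, key), "wb") as f:
            f.write(value)

    def get(self, key):
        with open(os.path.join(self.path, key), "rb") as f:
            return f.read()

    def keys(self):
        return sorted(os.listdir(self.path))

store = DirKeyedStore(tempfile.mkdtemp())
store.put("alpha", b"1")
store.put("beta", b"2")
found = store.get("beta")
```

The attraction is that every operation works with stock tools (ls, cat, rm); the cost is one inode and one directory entry per key, which is exactly the "wrong tool" complaint on the other side of this argument.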
cur022%cluster@ukc.ac.uk (Bob Eager) (01/25/91)
In article <5509@auspex.auspex.com>, guy@auspex.auspex.com (Guy Harris) writes:
> VMS is, I think, similar (although I think the ACP was replaced by the
> "extended QIO processor", which isn't a process receiving messages but
> is directly called, for performance reasons), although RMS runs, as I
> remember, in executive rather than user mode. Dunno if user-mode
> programs can, if they choose, get direct access to the "container" via
> QIOs or not.

Yes, they can. The container layer (RMS) allows an essentially transparent mode of access, and for higher efficiency one can call the disk driver using a QIO to obtain direct access to the file at the block level.
-------------------------+-------------------------------------------------
Bob Eager                | University of Kent at Canterbury
                         | +44 227 764000 ext 7589
-------------------------+-------------------------------------------------
new@ee.udel.edu (Darren New) (01/26/91)
In article <5510@auspex.auspex.com> guy@auspex.auspex.com (Guy Harris) writes:
>That part of your complaint was the main part I objected to, and I think
>I've successfully demonstrated that "most want keyed records" is correct
>only if you interpret it as "most folks who like CP-V's keyed text files
>want keyed records"; unfortunately, the wins of CP-V keyed text files
>seem to occur only in fairly specialized cases, like folks who carry
>lots of line-numbered listings around....

OK. So I exaggerated a little. The keyed text files were not the only reason for having keyed files, tho. So far, I have *still* not seen a good reason for putting it in the user-level libraries instead of the kernel. All you have said (as far as I remember) is that "Many people don't need it."

>(And if you're going to thump UNIX there, you might want to thump
>Amoeba, while you're at it; the Bullet file server is actively *hostile*
>to update-in-place keyed files of the CP-V sort that you were boosting
>in earlier articles, since it doesn't let you write to files - it just
>lets you replace files with files.

Ah, but in Amoeba, the Bullet server is not the only file server. There is *also* a logfile server, a UNIX file server, and a database file server. This kind of flexibility is what I think is lacking in Plan-9's filesystem.

>"Data bases can
>be subdivided over many smaller Bullet files, for
>example based on the identifying keys.")

I agree that this is using the wrong tool. However, Plan-9 does not let you do your own "right" tool.

>The criterion I'd use is "if making some particular operations doable
>through the file system makes it possible to use existing UNIX tools to
>perform useful functions built on those operations, *and* won't give the
>developers an excuse to avoid making common operations convenient just
>by saying 'well, if you don't like it, wrap some shell script around
>it'", I'd say provide a file system mechanism to do the operation.
But my objection is how the "filesystem" keeps you from making available all reasonable operations because it insists on a bytestream model. Note that processes, files, memory segments, etc are *all* in the "filesystem" of the Amoeba system, in the messages-to-capabilities sense.

>>Sure. I can write recursive algorithms in FORTRAN, too. I can write
>>massively parallel, distributed, object-oriented code in C. However,
>>it would be cleaner, easier to understand, easier to get correct, and
>>probably even more efficient to use the right tool for the right job.
>
>See my followup to another article, in which I discuss "containers".

See my followup in which I discuss why I want this stuff in the kernel.

>>Using directories as keyed files is something I consider as "the wrong
>>tool."
>
>Well, if you consider putting parts of keyed databases into separate
>files based on the key values to be part of the reason it's using "the
>wrong tool", you'd better let the folks doing Amoeba know you think
>they're making a mistake.... :-)

They are, if that is how they implement their database. Actually, maybe not, because the semantics of the Bullet server don't require lots of overhead. You could look at each bullet file as a record or as a whole file, as you see fit, because you don't need to name them, you can put direct references into the other files, and so on. But I would still prefer a database server.

>>Restricting servers to only respond to a small number of
>>kinds of requests and forcing non-file-like operations to use only
>>file-like operators also seems like "the wrong tool."
>
>*Forcing*, yes. *Allowing*, no.

That was my original intent in my complaints about Plan-9. It just came out wrong.

>What sort of examples are you talking about? Are you talking about
>something less complex than a bucket of bytes?

Yes. The Amoeba bullet file server is less complex than the UNIX system, with the benefit of extremely good performance.
The disk-block server allows sophisticated file structures to be built in user libraries with minimal fuss.

>>and I've seen several good reasons for (and examples of) something
>>*more* complex than the UNIX filesystem,
>
>What sort of examples, again?

Database backend filesystems. CP-V filesystems. PICK-OS file systems. Amoeba database filesystems. In all these cases, programming was easier and more robust than trying to do the same thing under UNIX because the kernel had more info and more control over the filesystem.

>>but as far as something *exactly* as complex as the UNIX
>>filesystem, people usually just say "Oh, put it in libraries." Why not
>>put directories, access-control, allocation, and concurrency controls
>>in libraries?
>
>Directories? Sure, why not.

That doesn't answer the question of why UNIX did *not* put it in libraries. Amoeba did, and there are clear benefits to doing so. Why not put TCP/IP in user libraries and just give people write-permission on the ethernet hardware? Why did UNIX make those choices, and why do so many people defend those choices without actually stating any benefits of having made those choices and not others? -- Darren
-- 
--- Darren New --- Grad Student --- CIS --- Univ. of Delaware ---
----- Network Protocols, Graphics, Programming Languages, Formal
Description Techniques (esp. Estelle), Coffee, Amigas -----
=+=+=+ Let GROPE be an N-tuple where ... +=+=+=
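The Bullet server's "replace files with files" discipline, as described in this exchange, has a familiar POSIX analogue: build a complete new file, then atomically swap it into place with rename(). The sketch below is that analogue, not Amoeba's actual protocol; the function name is mine:

```python
import os
import tempfile

def replace_whole_file(path, new_contents):
    """Write a complete new version, then atomically swap it into place.
    Readers see either the old contents or the new, never a mixture."""
    directory = os.path.dirname(path) or "."
    fd, tmp = tempfile.mkstemp(dir=directory)   # new version in same dir
    try:
        os.write(fd, new_contents)
    finally:
        os.close(fd)
    os.rename(tmp, path)   # atomic on POSIX within one filesystem

# "Updating" a record means rewriting its whole (small) file.
target = os.path.join(tempfile.mkdtemp(), "record.17")
replace_whole_file(target, b"balance=100")
replace_whole_file(target, b"balance=250")
with open(target, "rb") as f:
    final = f.read()
```

This makes concrete why the Bullet model is hostile to update-in-place keyed files: touching one record means rewriting the entire file that holds it, which only pays off when the files are small.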
new@ee.udel.edu (Darren New) (01/26/91)
In article <5510@auspex.auspex.com> guy@auspex.auspex.com (Guy Harris) writes:
>What "little files"? Did I say that each one of those resources would
>go in a file of its own? No, I didn't.

Again, I was analogising. If you don't put them all in little files, then you need special code in each application to pull them apart. If every application and the kernel uses this code, then you might as well put it in the kernel.

Every time I say "it's easier and more efficient to put it in the kernel" you say "but it's just as easy to put it in a library". When I say "it's very inefficient to put it into a library as easy to use as the kernel" you say "who said I was going to do it the easy way." I'm not arguing that you *can't* put it in the user lib, only that it is an order of magnitude harder and slower to put it in the user lib. You have not given me any reason so far to doubt that. Try thinking this way: It's already in the kernel. Why would you take it out and put it into a user lib? Then maybe this kind of response would help progress the discussion. :-) -- Darren
-- 
--- Darren New --- Grad Student --- CIS --- Univ. of Delaware ---
----- Network Protocols, Graphics, Programming Languages, Formal
Description Techniques (esp. Estelle), Coffee, Amigas -----
=+=+=+ Let GROPE be an N-tuple where ... +=+=+=
terryl@sail.LABS.TEK.COM (01/26/91)
In article <42793@nigel.ee.udel.edu> new@ee.udel.edu (Darren New) writes:
+In article <5510@auspex.auspex.com> guy@auspex.auspex.com (Guy Harris) writes:
+>What "little files"? Did I say that each one of those resources would
+>go in a file of its own? No, I didn't.
+
+Again, I was analogising. If you don't put them all in little files,
+then you need special code in each application to pull them apart. If
+every application and the kernel uses this code, then you might as well
+put it in the kernel.
+
+Every time I say "it's easier and more efficient to put it in the kernel"
+you say "but it's just as easy to put it in a library". When I say "it's
+very inefficient to put it into a library as easy to use as the kernel"
+you say "who said I was going to do it the easy way." I'm not
+arguing that you *can't* put it in the user lib, only that it is an
+order of magnitude harder and slower to put it in the user lib. You have
+not given me any reason so far to doubt that. Try thinking this way:
+It's already in the kernel. Why would you take it out and put
+it into a user lib? Then maybe this kind of response would help
+progress the discussion. :-) -- Darren

I don't know; why don't you go ask the folks at CMU who are doing MACH (and also ask the folks at OSF) why they're taking out of the kernel proper what is basically the whole Unix paradigm, and putting it into a user-level server process??? Note that this is NOT a rhetorical question; I have some answers myself....
__________________________________________________________
     Terry Laskodi        "There's a permanent crease
          of               in your right and wrong."
      Tektronix       Sly and the Family Stone, "Stand!"
__________________________________________________________
fmayhar@hermes.ladc.bull.com (Frank Mayhar) (01/28/91)
I'm coming into this kinda late. Excuse me, I've been out of town. But I thought I'd throw some more mud on the fire. :-)

In article <5510@auspex.auspex.com>, guy@auspex.auspex.com (Guy Harris) writes:
> [a whole buncha stuff, most of which I agree with]
|> [Darren New writes:]
|> >Using directories as keyed files is something I consider as "the wrong
|> >tool."
|>
|> Well, if you consider putting parts of keyed databases into separate
|> files based on the key values to be part of the reason it's using "the
|> wrong tool", you'd better let the folks doing Amoeba know you think
|> they're making a mistake.... :-)

OK, I will. (I'm one of the CP6 people you've (lightly) bashed in earlier postings. I'm the guy, in fact, that maintains the File Management part of the OS, including the ISAM-like and relational-database-like parts.) Some of my customers have databases consisting of *millions* of records, with the same number of unique keys, and at a minimum, hundreds of thousands of records with the same keys. How would you like to have your bread-and-butter database spread across two million different files on Amoeba? Talk about a maintenance headache! And for large applications, that's a fairly representative number. Plus, it seems on the face of it (not knowing the internals of Amoeba) that lookups would take *forever* under such a scheme.

It seems to me that combining all those different records into one file, even if it's distributed across different devices, would make life much more simple for everyone, and that it would significantly enhance database performance.
-- 
Frank Mayhar  fmayhar@hermes.ladc.bull.com  (..!{uunet,hacgate}!ladcgw!fmayhar)
Bull HN Information Systems Inc.  Los Angeles Development Center
5250 W. Century Blvd., LA, CA 90045    Phone:  (213) 216-6241
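This is the situation the "(n)dbm" library mentioned earlier in the thread was built for: many keyed records in one keyed store rather than one file per record. A small sketch using Python's modern descendant of that library (which dbm backend gets picked varies by system, so treat the on-disk details as unspecified):

```python
import dbm
import os
import tempfile

# Many keyed records live in ONE keyed store, not a million little files.
path = os.path.join(tempfile.mkdtemp(), "accounts")
db = dbm.open(path, "c")                  # "c": create if absent
for n in range(1000):
    db[b"key-%04d" % n] = b"record %d" % n
hit = db[b"key-0042"]                     # lookup by key, no directory scan
db.close()

# Reopen read-only; the keys and records persisted in the keyed store.
db = dbm.open(path, "r")
total = len(db.keys())
db.close()
```

At a thousand records the difference is already visible in inode count alone; at the millions of records described above, keeping the key-to-record mapping inside one structure is the whole argument.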
guy@auspex.auspex.com (Guy Harris) (02/02/91)
>Hmm. McKusick and Karels (and a guy named Chris Landaur (sp?)) disagree with
>you. According to them, directories originally were basically plaintext
>files, and handled no differently than other files. This caused problems,
>though, and was changed. All this was very early on in Unix evolution.

If by "McKusick and Karels" you're referring to the book by Leffler, McKusick, Karels, and Quarterman, *The Design and Implementation of the 4.3BSD UNIX(R) Operating System*, well, I checked all the references to "directory" from the index, and NONE of them claim that "directories originally were plaintext files". They may have said elsewhere that *VERY* early on - as in "prior to V6, and perhaps prior to V5" - they were plaintext files, but they most definitely positively absolutely were *NOT* plain-text files in V6 or later UNIX systems from AT&T.
guy@auspex.auspex.com (Guy Harris) (02/02/91)
>(Note: in the following discussion, when I say "library" I mean
>user-mode application library, dynamic or not. When I say "kernel" or
>"filesystem" I mean a filesystem that has access to presumably
>privileged information in the kernel such as device buffers, disk
>allocation bitmaps, etc and also has the ability to prevent or
>otherwise control process preemption and/or resumption. "kernel" does
>not necessarily mean resident, nor does library mean statically
>linked.)

 ...

>Anyway, I think that makes some good points, and I would be interested
>in any discussion this may raise. I would be especially interested in
>hearing why putting such functionality is actually *better* than
>putting it in the kernel, rather than simply possible, which is all
>I've heard up to now.

If you think you've heard from me any claim that a *container*-level object of the type you describe is better implemented outside a privileged subsystem than inside one, get your hearing checked.

I've already drawn the distinction between a "container" and an object with more structure; for instance, the arrays of bytes or blocks implemented by many OSes (UNIX, RSX-11, VMS and, I think, OS/360 and successors) in the privileged-subsystem portion of their file systems are "containers", and text files, sequential files, indexed files, etc. are the more structured objects stuffed into those containers.

It's somewhat similar to the structured address spaces provided by the privileged-subsystem portion of some OSes (e.g., more modern versions of UNIX) and various stuff that can be built atop it, often in userland; for example, many UNIX systems let you map files into memory with "mmap()", and then that can be used by such as shared library/dynamic linking mechanisms, multithreading packages (for multiple stacks), etc.
Given that your new kind of container probably wants to be able to allocate blocks for the file from a per-file-system pool which would be shared by other processes (and possibly other users), I have no problem with implementing low-level primitives such as your "insert()" and "delete()" primitives inside a privileged subsystem.

However, I don't see any point to putting stuff you build *on top of* that container, e.g. KSAM files, into the kernel, unless they need access to some shared resource that should be denied non-privileged code.

>Editors no longer have to suck up the entire file and blow it back
>out to add one character to the first line.

Of course, for a lot of files, that's no real win; lots of text files are quite small.
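The mmap() example above is easy to make concrete: the privileged subsystem hands userland the container (a mapped range of bytes), and any structure written into it is the user code's own convention. A minimal sketch, with an arbitrary one-page file:

```python
import mmap
import os
import tempfile

# Create a one-page "container" file of unstructured bytes.
path = os.path.join(tempfile.mkdtemp(), "container")
with open(path, "wb") as f:
    f.write(b"\x00" * 4096)

# Map it into the address space and impose structure from user mode.
with open(path, "r+b") as f:
    m = mmap.mmap(f.fileno(), 4096)
    m[0:5] = b"hello"          # the kernel neither knows nor cares
    m.flush()                  # push the page back to the container
    m.close()

with open(path, "rb") as f:
    first = f.read(5)
```

This is exactly the layering being described: the kernel provides protected access to the shared bytes, while the record format, the dynamic-linking tables, or the thread stacks living in them are userland's business.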
new@ee.udel.edu (Darren New) (02/06/91)
In article <5685@auspex.auspex.com> guy@auspex.auspex.com (Guy Harris) writes:
>If you think you've heard from me any claim that a *container*-level
>object of the type you describe is better implemented outside a
>privileged subsystem than inside one, get your hearing checked.

Or my eyes. :-)

>I've already drawn the distinction between a "container" and an object
>with more structure; for instance, the arrays of bytes or blocks
>implemented by many OSes (UNIX, RSX-11, VMS and, I think, OS/360 and
>successors) in the privileged-subsystem portion of their file systems
>are "containers", and text files, sequential files, indexed files, etc.
>are the more structured objects stuffed into those containers.

OK. But at what level do you put the "container" object? Disk blocks? Arrays of bytes? Arrays of resizable records? Entire database relations? This is my quandary. Why are "arrays of bytes" OK to go in the kernel, but "arrays of resizable records" not?

>Given that your new kind of container probably wants to be able to allocate
>blocks for the file from a per-file-system pool which would be shared by
>other processes (and possibly other users), I have no problem with
>implementing low-level primitives such as your "insert()" and "delete()"
>primitives inside a privileged subsystem.
>
>However, I don't see any point to putting stuff you build *on top of*
>that container, e.g. KSAM files, into the kernel, unless they need
>access to some shared resource that should be denied non-privileged
>code.

Because the KSAM code wants to be able to allocate blocks for the file from a per-file-system pool also. It also wants access to the block buffers that are in kernel memory. It also wants to be able to block user processes to keep them from both updating these files at the same time. All the arguments about "insert()" and "delete()" apply equally well to KSAM files.
You can build these in "userland" but then you have all the same problems that the "insert()" and "delete()" additions caused.

With a simpler "container" basic access method (say, allocate, deallocate, read, and write individual disk blocks, as well as locking of arbitrary conditions (file=xxx, key=yyy or zzz) rather than byte offsets within files) it could be efficient to implement all this stuff at the user level. At a higher level, the need to go "under" would be less common and the system would overall be more efficient. But right now the filesystem in UNIX is efficient for neither the hardware nor the software. -- Darren
-- 
--- Darren New --- Grad Student --- CIS --- Univ. of Delaware ---
----- Network Protocols, Graphics, Programming Languages, Formal
Description Techniques (esp. Estelle), Coffee, Amigas -----
=+=+=+ Let GROPE be an N-tuple where ... +=+=+=