rroba.DlosLV@xerox.com (06/06/89)
> Three things that should not be in an efficient OS:
> 1) virtual memory
> 2) symbolic links
> 3) long file names (BSD directories)

Perhaps some explanation is in order:

We have performance problems with SunOS that we don't have with Xenix on similar hardware. The reason for the difference in performance that we see between Xenix and SunOS is the presence of these three features in SunOS. The code in the Kernel which supports these features eats up memory and cpu time whether the user wants to use them or not. The typical size of Xenix on an 80386 architecture is 300K, SunOS for the 386i is about 900K (as distributed; some features can be deleted through reconfiguration). The historical philosophy of the UNIX community (AT&T, at least) has been Keep It Simple. The recent proclivity toward rampant featurism (BSD crowd in particular) has resulted in a corresponding decrease in system throughput.

>VM and Symbolic links are nice features. I'd say put them in. Let the user
>determine whether to use them or not. the fact that they are there
>doesn't hurt an OS. But, let the user, not the author, be the deciding factor.

VM kills performance whether the user chooses to use it or not. The typical placement of the swap device on single disk systems (as most UNIX systems seem to be now) is as a partition placed between a read-only root file system and a read-write user file system (this is done to "optimize" the disk's swap activity). The result is that the disk heads are constantly seeking between the outermost tracks (the root file system), and the innermost tracks (the user file system). Paging code in the Kernel does not come free. It requires space, and execution time. Virtual memory was justifiable when memory was expensive (i.e. $30,000 for an 8K core bomb). Memory now is just too cheap to pay for the execution penalty.

The real problem with symbolic links is that users are not given a choice whether to use them or not. If the OS distributor chooses to use them, and then rewrites utilities to optimize "real" paths, it is next to impossible for the user to remove them. (I am assuming that everybody in the audience understands the impact of a symbolic link on the amount of time required to open a file.)
guy@auspex.auspex.com (Guy Harris) (06/08/89)
>We have performance problems with SunOS that we don't have with Xenix
>on similar hardware. The reason for the difference in performance that
>we see between Xenix and SunOS is the presence of these three features
>in SunOS.

Evidence, please, for the conclusion that those particular three features account for all, or even most, of the difference in performance you see?

>The code in the Kernel which supports these features eats up memory and
>cpu time whether the user wants to use them or not.

Are those the *only* features present in SunOS but not in your Xenix? Does your Xenix have TCP/IP support, or NFS, for example? (For that matter, are you certain your Xenix lacks VM?)
jmagee@fenix.UUCP (Jim Magee) (06/09/89)
> Three things that should not be in an efficient OS:
> 1) virtual memory
> 2) symbolic links
> 3) long file names (BSD directories)

Well, I don't quite agree.

>Perhaps some explanation is in order:
>We have performance problems with SunOS that we don't have with Xenix
>on similar hardware. The reason for the difference in performance that
>we see between Xenix and SunOS is the presence of these three features
>in SunOS. The code in the Kernel which supports these features eats
>up memory and cpu time whether the user wants to use them or not.
>The typical size of Xenix on an 80386 architecture is 300K, SunOS for the
>386i is about 900K (as distributed; some features can be deleted through
>reconfiguration). The historical philosophy of the UNIX community (AT&T,
>at least) has been Keep It Simple. The recent proclivity toward rampant
>featurism (BSD crowd in particular) has resulted in a corresponding
>decrease in system throughput.

Well, the kernel code that supports virtual memory does not eat up that much kernel space. Now all the networking code, tty drivers, etc... in there are a totally different story, and look to de-kernelized OSs like Mach (and hopefully GNU, hint, hint...) to take care of this problem.

>>VM and Symbolic links are nice features. I'd say put them in. Let the
>>user determine whether to use them or not. the fact that they are there
>>doesn't hurt an OS. But, let the user, not the author, be the deciding
>>factor.
>
>VM kills performance whether the user chooses to use it or not. The
>typical placement of the swap device on single disk systems (as most UNIX
>systems seem to be now) is as a partition placed between a read-only root
>file system and a read-write user file system (this is done to "optimize"
>the disk's swap activity). The result is that the disk heads are
>constantly seeking between the outermost tracks (the root file system),
>and the innermost tracks (the user file system). Paging code in the Kernel
>does not come free.
>It requires space, and execution time. Virtual memory was justifiable when
>memory was expensive (i.e. $30,000 for an 8K core bomb). Memory now
>is just too cheap to pay for the execution penalty.

Well I have to totally disagree here. If you don't want the swap partition in between / and /usr, then buy another disk and put /usr on that. This gives you interleaved disks, and another drive is a hell of a lot cheaper than having to stick the maximum amount of memory on a system that you are ever going to need. I have worked on real memory systems, and seeing:

	Please wait: waiting for memory to be freed....

when you try to run ls is not exactly fun (especially when whatever has that memory occupied never frees it; how do you spell relief? R-E-B-O-O-T).

Plus, virtual memory can actually help performance, because it pages code in as well as out (if you don't need it, it won't be brought in). Try running emacs on a non-VM system. If you don't want a certain application to be swapped, then have process/page lockdown system calls (along with nice features like real-time scheduling etc, are you listening RMS? ;-)).

Please don't ever take my VM away. If you don't want it, use DOS.

--
Jim Magee - Unix Development | Encore Computer Corp
jmagee@gould.com             | 6901 W Sunrise Blvd MS407
...!uunet!gould!jmagee       | Ft Lauderdale, FL 33313
"I speak for nobody..."      | (305) 587-2900 x4925
rroba.DlosLV@xerox.com (06/10/89)
In response to an earlier posting, in which I said:

> Three things that should not be in an efficient OS:
> 1) virtual memory
> 2) symbolic links
> 3) long file names (BSD directories)

Bob Cherry says, "This list is extremely application dependent."

I agree. My purpose in posting these comments is to point out that what is appropriate for some applications is poison to others.

Bob goes on to say, "Virtual Memory: VMem is quite useful especially in CAD tools and high volume/resolution graphics. On multi-user systems it becomes unrealistic to keep gigabytes of RAM around in order to perform high volume graphics."

Again, I agree. My own background has been principally in embedded systems, in which 1) data throughput is the measure of success, and 2) system RAM requirements are pre-determinable. In this environment, VM is not needed and imposes an execution cost that I would rather not pay.

Then he says, "Eliminating VMem eliminates the ability to operate a wide range of applications and/or programming environments."

This is a good point. But, in the environments that I work in, the applications that will be run on a particular system are known and fixed, so this is not a concern. Throughput, however, is still a concern.

"Symbolic Links: These links are extremely useful when a third party application requires that it be run from a specific directory. If the particular directory is in its own disk partition and if that partition does not have adequate free space to install or operate, a link may be used to map the actual directory to the desired directory."

This is a description of a situation (partitioned drive) that was created by either the user or the (auto-installation program of the) OS distributor (i.e. Sun Microsystems). On multi-user systems, partitioning a drive facilitates the creation of secure backups from online file systems (because the partitions can be individually umounted and dumped). Partitioning a drive, however, always has a negative impact on file system throughput (through increased seek time), and should not be done on single-user or embedded systems. The exception is single-drive VM systems, in which the swap device should be located somewhere in the middle of the drive, in order to minimize the impact of VM on file system performance. But, even here, my opinion is that VM systems should never be single-drive.

Next, Bob echoes my own opinions, "Inode mapping is more efficient and Unix offers the ability to make both hard and soft links. Hard links do not impact disk access as much as soft links do."

But then, he says, "If a user doesn't make symbolic links to his environment, there should be no impact on the operation of the OS."

The problem is that most symbolic links are not created by the user (God knows I wouldn't make one, if I had a choice), but are distributed with the OS. In the GNU case, if the system is distributed as source without any assumptions about file system partitioning, my arguments will be moot; but these distributor-supplied links are a major cause of poor file system performance in SunOS (for the 386i).

"Long Filenames: I do not see any impact on an OS by allowing long names."

My objection is not against long filenames, in general, as much as it is against the structure of BSD directories in particular. The complicated system of counts and offsets that must be traversed in a BSD directory must consume much more cpu time than the relatively simple structure of AT&T's directories (I am aware that AT&T, sadly, will adopt BSD's directory structure in the 'unified UNIX'). I am not concerned about the time required to extract a long name from a BSD directory, as opposed to extracting a short name from a BSD directory. I am concerned about the time that it takes to extract the inode number of the nth entry in a BSD directory.
rroba.DlosLV@xerox.com (06/10/89)
In message <14460.8906081609@orchid.warwick.ac.uk>, somebody says:

>In article <19889@adm.BRL.MIL> you write:
>> (I am assuming that everybody in the audience understands the impact
>> of a symbolic link on the amount of time required to open a file.)
>
>No, I don't -- can you explain it please

When you attempt to open a file, you specify a path name. Before the file can be opened, the kernel must translate the file name into an inode number (the inode must be obtained to determine the location and size of the file on the disk drive). The inode number is recorded in the directory of the file. So, the kernel must open (access) the directory to read the inode number of the file; but before it can open the directory, it must first determine the inode number of the directory . . . and so on to the root directory.

This process actually begins, of course, with the root directory (at least in the case of absolute path names), and traces up to the file (opening directories and extracting the next inode number along the way).

In the case of symbolic links, this process is interrupted when the kernel finds, in some intermediate directory, not an inode number, but an alternate path name. At this point the kernel must begin again at the root directory, retracing its steps through another sequence of directories.

The difficulty of extracting an inode number from this sequence of directories is further complicated in BSD systems by the complexity of BSD directories, which are structured in a manner similar to a linked list (as opposed to AT&T directories, which are more like arrays of structs).
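
[Editor's sketch] For readers who have not looked at the two directory layouts being compared, here is a rough model of both. It assumes the V7/System V on-disk entry (a 2-byte inode number followed by a 14-byte NUL-padded name) and a simplified 4.2BSD entry (4-byte inode number, 2-byte record length, 2-byte name length, then the name, NUL-terminated and padded to a 4-byte boundary). The little-endian byte order and all function names are assumptions made for illustration, and this is Python rather than kernel code. It only shows that both lookups are linear scans: the V7 scan steps by a fixed 16 bytes, the BSD scan steps by each record's own length.

```python
import struct

V7_ENTRY = struct.Struct("<H14s")    # d_ino, d_name[14]; byte order assumed
BSD_HEADER = struct.Struct("<IHH")   # d_ino, d_reclen, d_namlen; name follows


def v7_pack(entries):
    """Build a V7-style directory image from (inode, name) pairs.

    Names longer than 14 bytes are silently truncated, as on V7.
    """
    return b"".join(V7_ENTRY.pack(ino, name.encode()) for ino, name in entries)


def v7_lookup(image, name):
    """Scan fixed 16-byte slots; inode 0 marks an unused slot."""
    target = name.encode()[:14]
    for offset in range(0, len(image), V7_ENTRY.size):
        ino, raw = V7_ENTRY.unpack_from(image, offset)
        if ino != 0 and raw.rstrip(b"\0") == target:
            return ino
    return None


def bsd_pack(entries):
    """Build a simplified BSD-style image of variable-length records."""
    out = bytearray()
    for ino, name in entries:
        raw = name.encode()
        # header + name + terminating NUL, rounded up to a multiple of 4
        reclen = (BSD_HEADER.size + len(raw) + 1 + 3) & ~3
        record = BSD_HEADER.pack(ino, reclen, len(raw)) + raw
        out += record.ljust(reclen, b"\0")
    return bytes(out)


def bsd_lookup(image, name):
    """Hop from record to record using each record's own length."""
    target = name.encode()
    offset = 0
    while offset < len(image):
        ino, reclen, namlen = BSD_HEADER.unpack_from(image, offset)
        if reclen == 0:          # guard against a corrupt image
            break
        start = offset + BSD_HEADER.size
        if ino != 0 and namlen == len(target) \
                and image[start:start + namlen] == target:
            return ino
        offset += reclen
    return None
```

Note that the stored name length lets the BSD scan reject most non-matching entries without comparing any name bytes, which is one reason the cost difference is better measured than guessed. The real 4.2BSD format also confines records to 512-byte chunks and lets the last record in a chunk absorb the free space; the sketch leaves that out.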
guy@auspex.auspex.com (Guy Harris) (06/11/89)
>Bob goes on to say, "Virtual Memory: VMem is quite useful especially in
>CAD tools and high volume/resolution graphics. On multi-user systems it
>becomes unrealistic to keep gigabytes of RAM around in order to perform
>high volume graphics. " Again, I agree. My own background has been
>principally in embedded systems, in which 1) data throughput is the measure
>of success, and 2) system RAM requirements are pre-determinable. In this
>environment, VM is not needed and imposes an execution cost that I would
>rather not pay.

OK, so change your statement from:

	Three things that should not be in an efficient OS:

to

	Three things that should not be in an efficient OS for embedded systems:

or move Virtual Memory from the list of "things that should not be in an efficient OS" to a separate list of "things that should not be in an efficient OS for embedded systems" (not having worked with those systems, I'll let those who have debate whether VM is ever appropriate for them). UNIX wasn't primarily intended as an OS for embedded systems....

>The problem is that most symbolic links are not created by the user
>(God knows I wouldn't make one, if I had a choice),

I've made many of them; they do come in handy for some of us. I will not defend the 386i version of SunOS's proliferation of them, but the fact that they can be perhaps used to excess doesn't render them useless....

>"Long Filenames: I do not see any impact on an OS by allowing long names. "
>My objection is not against long filenames, in general, as much as it is
>against the structure of BSD directories in particular. The complicated
>system of counts and offsets that must be traversed in a BSD directory must
>consume much more cpu time than the relatively simple structure of AT&T's
>directories (I am aware that AT&T, sadly, will adopt BSD's directory
>structure in the 'unified UNIX').

I'm not sad about it in the least. I'm quite glad that I'll be able to have an S5 system on which I'll be able to create files without having to worry about the length of the file's name. The various directory name caches present in more recent systems with the BSD file system (including S5R4, when it arrives) help reduce the time spent looking up entries.

If your objection is not to long filenames (although you *did* just say "long filenames" first and "(BSD directories)" second), note that extending the V7/S5 directory format to support longer file names makes directory entries larger, which also slows down the lookup time. It would be interesting to see the distribution of file name lengths on a BSD system (where the limit is probably essentially infinite for all but the most perverse user or application), to see if there's a bend in the curve suggesting a lower maximum length, and then see how a fixed-length-entry scheme supporting that maximum length does vs. the BSD scheme.
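
[Editor's sketch] The measurement proposed above is easy to try on a modern system. The following is one possible way to collect the name-length distribution and look for a candidate fixed maximum; the function names and the "smallest limit covering a given fraction of names" criterion are assumptions introduced here, not anything from the thread, and the sketch does not compare lookup speed of the two directory schemes.

```python
import os
from collections import Counter


def name_length_histogram(root):
    """Count directory entries under `root` by the length of their names."""
    counts = Counter()
    for _dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            counts[len(name)] += 1
    return counts


def smallest_limit_covering(counts, fraction):
    """Smallest length L such that at least `fraction` of names are <= L.

    Returns None if the histogram is empty.
    """
    total = sum(counts.values())
    running = 0
    for length in sorted(counts):
        running += counts[length]
        if running >= fraction * total:
            return length
    return None
```

For example, smallest_limit_covering(name_length_histogram("/usr"), 0.99) would report the name length that a fixed-length-entry scheme would need in order to hold 99% of the names under /usr unchanged; the remaining 1% is where the "bend in the curve" question gets interesting.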
guy@auspex.auspex.com (Guy Harris) (06/11/89)
>In the case of symbolic links, this process is interrupted when the kernel
>finds, in some intermediate directory, not an inode number, but an
>alternate path name.

You must be thinking of some flavor of symbolic links other than the one used in the UNIX systems with which I'm familiar. In the latter, the name lookup code finds an inode number, but the inode points to a file of type "symbolic link", which means the contents of the file are an alternate path name. This means the system has to "read" that file and *then* continue the lookup process.

>At this point the kernel must begin again at the root
>directory, retracing its steps through another sequence of
>directories.

Assuming, of course, that the symbolic link's contents are an absolute path name.
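
[Editor's sketch] The lookup behavior described in this reply can be modeled in a few lines. The inode table below is invented for the example, and the function is a toy named after the kernel's namei routine, not a copy of it: it ignores "..", permissions, mount points and caching. It counts directory searches and symlink reads so the extra cost of an absolute versus a relative link is visible.

```python
ROOT = 2  # conventional root inode number; everything else here is made up

# inode number -> (type, payload); the payload is a name->inode map for
# directories and the stored path for symbolic links
INODES = {
    2: ("dir", {"usr": 3, "export": 5}),
    3: ("dir", {"bin": 4}),
    4: ("symlink", "/export/bin"),   # absolute: lookup restarts at the root
    5: ("dir", {"bin": 6, "cur": 8}),
    6: ("dir", {"ls": 7}),
    7: ("file", None),
    8: ("symlink", "bin"),           # relative: lookup continues in place
}


def namei(path, start=ROOT, max_links=8):
    """Resolve `path`; return (inode, directory_searches, symlink_reads)."""
    searches = reads = 0
    current = ROOT if path.startswith("/") else start
    pending = [part for part in path.split("/") if part]
    while pending:
        name = pending.pop(0)
        kind, entries = INODES[current]
        if kind != "dir":
            raise NotADirectoryError(name)
        searches += 1                # one directory scanned for one name
        child = entries[name]        # KeyError if the name does not exist
        child_kind, target = INODES[child]
        if child_kind == "symlink":
            reads += 1               # the link's contents must be read
            if reads > max_links:
                raise OSError("too many levels of symbolic links")
            # splice the link's contents in front of the remaining names
            pending = [part for part in target.split("/") if part] + pending
            if target.startswith("/"):
                current = ROOT       # only absolute links go back to the root
        else:
            current = child
    return current, searches, reads
```

In this table, "/export/bin/ls" resolves with three directory searches, the relative link in "/export/cur/ls" costs one extra search plus one link read, and the absolute link in "/usr/bin/ls" costs two extra searches plus one link read. How much that matters in practice depends on caching, which the model does not attempt to represent.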
bzs@bu-cs.BU.EDU (Barry Shein) (06/11/89)
Re: Symbolic Links...

Also note that some vendors (eg. Encore) will store a symlink pathname directly into the inode if it will fit (I think the cut-off was 63 chars which isn't too sleazy, I never measured the hit rate tho I could easily I guess.) This means that getting the symlink path requires no extra disk accesses tho chasing down the result of course costs the same.

The moral is: before you condemn a feature just for being non-performant make sure the implementation can't be improved.

It would also be interesting to measure these things people claim are unacceptably non-performant. With all the caches etc I wouldn't trust people's intuitions, they might be complaining about nothing (ok, in certain real-time environments every cycle counts, but I doubt their problems are solved by merely avoiding symlinks, sounds like a red herring, and yes, I've done a fair amount of real-time stuff, in Unix even!)

--
	-Barry Shein

Software Tool & Die, Purveyors to the Trade
1330 Beacon Street, Brookline, MA 02146, (617) 739-0202
rbj@dsys.ncsl.nist.gov (Root Boy Jim) (06/13/89)
? From: Barry Shein <bzs@bu-cs.bu.edu>

? Re: Symbolic Links...

? Also note that some vendors (eg. Encore) will store a symlink pathname
? directly into the inode if it will fit (I think the cut-off was 63
? chars which isn't too sleazy, I never measured the hit rate tho I
? could easily I guess.) This means that getting the symlink path
? requires no extra disk accesses tho chasing down the result of course
? costs the same.

? The moral is: before you condemn a feature just for being
? non-performant make sure the implementation can't be improved.

I seem to remember something about a UNIX port to a big machine (Cray? 370?) that used 4k bytes/inode. Guess where small files were stored?

? -Barry Shein

? Software Tool & Die, Purveyors to the Trade
? 1330 Beacon Street, Brookline, MA 02146, (617) 739-0202

	Root Boy Jim is what I am
	Are you what you are or what?
jack@cwi.nl (Jack Jansen) (06/13/89)
In article <19981@adm.BRL.MIL> rbj@dsys.ncsl.nist.gov (Root Boy Jim) writes:
>
>I seem to remember something about a UNIX port to a big machine (Cray?
>370?) that used 4k bytes/inode. Guess where small files were stored?
>

Was this actually implemented? This idea was proposed by Sape Mullender and Andy Tanenbaum in the paper 'Immediate Files' (Software - Practice and Experience, April 1984), but I wasn't aware that people had actually done it.

I would be interested if anyone could provide more details.....

--
--
A people that yields to tyrants         | Oral:     Jack Jansen
will lose more than life and property:  | Internet: jack@cwi.nl
then the light goes out                 | Uucp:     mcvax!jack
paul@prcrs.UUCP (Paul Hite) (06/14/89)
In article <8187@boring.cwi.nl>, jack@cwi.nl (Jack Jansen) writes:
> In article <19981@adm.BRL.MIL> rbj@dsys.ncsl.nist.gov (Root Boy Jim) writes:
> >
> >I seem to remember something about a UNIX port to a big machine (Cray?
> >370?) that used 4k bytes/inode. Guess where small files were stored?
> >
> I would be interested if anyone could provide more details.....

I believe that I know the paper that Root Boy Jim remembers. But I'll bet that he confused a couple of things.

I found the paper in the AT&T Bell Labs Technical Journal Oct 1984 Vol. 63 No.8 Part 2. (This is one of the 2 all-unix issues. These two issues have been reprinted and are available now as "Unix Readings" or something.)

The paper is titled "A UNIX System Implementation for System/370" by W. A. Felton, G. L. Miller and J. M. Milner. And, Jack, the paper is dated Jan 9, 1984.

A couple of quotes:

	UNIX file systems on System/370 are in format identical to
	standard UNIX file systems, except that the block size has
	been enlarged to 4096 bytes.

But later:

	Files of less than 493 bytes are stored directly in the
	corresponding inode.

The paper doesn't get more explicit than that about inode size. I believe that they were just using large blocks with regular sized inodes. They put small files in the inodes because they were afraid of wasting space with big blocks. They didn't have any "fragment" concept. They actually call the fast access a "side effect".

Paul Hite   PRC Realty Systems   McLean,Va   uunet!prcrs!paul   (703) 556-2243
	DOS is a four letter word!
peter@ficc.uu.net (Peter da Silva) (06/15/89)
I remember reading that V7 (or was it 2BSD) stored files less than 39 bytes in the inode. A real saving for all those empty directories.

--
Peter da Silva, Xenix Support, Ferranti International Controls Corporation.
Business: uunet.uu.net!ficc!peter, peter@ficc.uu.net, +1 713 274 5180.
Personal: ...!texbell!sugar!peter, peter@sugar.hackercorp.com.
rbj@dsys.ncsl.nist.gov (Root Boy Jim) (06/24/89)
? From: Paul Hite <paul@prcrs.uucp>

? > In article <19981@adm.BRL.MIL> rbj@dsys.ncsl.nist.gov (Root Boy Jim) writes
? > >
? > >I seem to remember something about a UNIX port to a big machine (Cray?
? > >370?) that used 4k bytes/inode. Guess where small files were stored?

? I believe that I know the paper that Root Boy Jim remembers. But I'll
? bet that he confused a couple of things.

It won't be the first or the last time :-)

? I found the paper in the AT&T Bell Labs Technical Journal Oct 1984
? Vol. 63 No.8 Part 2. (This is one of the 2 all-unix issues. These two
? issues have been reprinted and are available now as "Unix Readings" or
? something.)

? The paper is titled "A UNIX System Implementation for System/370" by
? W. A. Felton, G. L. Miller and J. M. Milner. And, Jack, the paper is
? dated Jan 9, 1984.

That's the ticket!

? A couple of quotes:

?	UNIX file systems on System/370 are in format identical to
?	standard UNIX file systems, except that the block size has
?	been enlarged to 4096 bytes.

? But later:

?	Files of less than 493 bytes are stored directly in the
?	corresponding inode.

Hmmm. That would seem to imply an inode size of 512, with 20 bytes of mode/uid/gid/links/size/etc info. Exactly one sector.

? The paper doesn't get more explicit than that about inode size. I believe
? that they were just using large blocks with regular sized inodes. They
? put small files in the inodes because they were afraid of wasting space
? with big blocks. They didn't have any "fragment" concept. They actually
? call the fast access a "side effect".

No wonder. 128 direct blocks gives you 1/2 Meg of directly accessible data.

Another side effect is that if the buffer cache was modified to treat inode and data blocks differently (512 and 4k sizes), when a buffer was locked for I/O it wouldn't lock out all the other inodes in that buffer.

? Paul Hite PRC Realty Systems McLean,Va uunet!prcrs!paul (703) 556-2243
? DOS is a four letter word!

	Root Boy Jim is what I am
	Are you what you are or what?
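
[Editor's sketch] The arithmetic behind the inference in this post, spelled out. The 512-byte inode is the poster's guess, not something the quoted paper states (the previous poster believed the inodes were regular sized), and the 128-direct-block figure is likewise taken from the post as given. The variable names are chosen here for illustration.

```python
INODE_GUESS = 512      # bytes; the poster's inference, not from the paper
MAX_INLINE = 492       # "files of less than 493 bytes" live in the inode
BLOCK = 4096           # block size quoted from the paper
DIRECT_BLOCKS = 128    # figure used in the post, taken as given

header_bytes = INODE_GUESS - MAX_INLINE     # room left for mode/uid/gid/etc.
inodes_per_block = BLOCK // INODE_GUESS     # inodes sharing one 4k buffer
direct_bytes = DIRECT_BLOCKS * BLOCK        # data reachable without indirection

print(header_bytes, inodes_per_block, direct_bytes // 1024)
```

Under those assumptions the numbers are self-consistent: 20 bytes of header, 8 inodes per 4096-byte block (which is what the buffer-locking remark is about), and 512K of directly addressable data.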
whh@PacBell.COM (Wilson Heydt) (06/27/89)
In article <20100@adm.BRL.MIL>, rbj@dsys.ncsl.nist.gov (Root Boy Jim) writes:
> ? From: Paul Hite <paul@prcrs.uucp>
>
> ? The paper is titled "A UNIX System Implementation for System/370" by
> ? W. A. Felton, G. L. Miller and J. M. Milner. And, Jack, the paper is
> ? dated Jan 9, 1984.
>
> Hmmm. That would seem to imply an inode size of 512, with 20 bytes of
> mode/uid/gid/links/size/etc info. Exactly one sector.

Except for one *minor* thing--the drives on System/370s (and 360s, for that matter) don't *have* sectors. They're variable (or free) format. Be careful about system-provincialism.

	--Hal

=========================================================================
Hal Heydt             | In the old days, we had wooden
Analyst, Pacific*Bell | ships sailed by iron men. Now
415-645-7708          | we have steel ships and block-
whh@pbhya.PacBell.COM | heads running them. --Capt. D. Seymour
guy@auspex.auspex.com (Guy Harris) (06/29/89)
>I remember reading that V7 (or was it 2BSD) stored files less than 39 bytes
>in the inode. A real saving for all those empty directories.

V7 didn't do that. Sounds like you should throw out your references for V7 behavior and get some better ones....