[comp.unix.wizards] What kind of things would you want in the GNU OS

rroba.DlosLV@xerox.com (06/06/89)

> Three things that should not be in an efficient OS:
>	1) virtual memory
>	2) symbolic links
>	3) long file names (BSD directories)

Perhaps some explanation is in order:

We have performance problems with SunOS that we don't have with Xenix
on similar hardware.  The reason for the difference in performance that
we see between Xenix and SunOS is the presence of these three features
in SunOS.  The code in the Kernel which supports these features eats
up memory and cpu time whether the user wants to use them or not.
The typical size of Xenix on an 80386 architecture is 300K, SunOS for the
386i is about 900K (as distributed; some features can be deleted through
reconfiguration).

The historical philosophy of the UNIX community (AT&T, at least) has been
Keep It Simple.  The recent proclivity toward rampant featurism (the BSD
crowd in particular) has resulted in a corresponding decrease in system
throughput.

>VM and Symbolic links are nice features.  I'd say put them in.  Let the
>user determine whether to use them or not.  the fact that they are there
>doesn't hurt an OS.  But, let the user, not the author, be the deciding
>factor.

VM kills performance whether the user chooses to use it or not.  The
typical placement of the swap device on single-disk systems (as most
UNIX systems seem to be now) is as a partition placed between a
read-only root file system and a read-write user file system (this is
done to "optimize" the disk's swap activity).  The result is that the
disk heads are constantly seeking between the outermost tracks (the
root file system) and the innermost tracks (the user file system).
Paging code in the Kernel does not come free.  It requires space and
execution time.  Virtual memory was justifiable when memory was
expensive (i.e. $30,000 for an 8K core bomb).  Memory now is just too
cheap to pay for the execution penalty.

The real problem with symbolic links is that users are not given a choice
whether to use them or not.  If the OS distributor chooses to use them, and
then rewrites utilities to optimize "real" paths, it is next to impossible
for the user to remove them.  (I am assuming that everybody in the audience
understands the impact of a symbolic link on the amount of time required
to open a file.)

guy@auspex.auspex.com (Guy Harris) (06/08/89)

>We have performance problems with SunOS that we don't have with Xenix
>on similar hardware.  The reason for the difference in performance that
>we see between Xenix and SunOS is the presence of these three features
>in SunOS.

Evidence, please, for the conclusion that those particular three
features account for all, or even most, of the difference in performance
you see?

>The code in the Kernel which supports these features eats up memory and
>cpu time whether the user wants to use them or not.

Are those the *only* features present in SunOS but not in your Xenix?
Does your Xenix have TCP/IP support, or NFS, for example?

(For that matter, are you certain your Xenix lacks VM?)

jmagee@fenix.UUCP (Jim Magee) (06/09/89)

> Three things that should not be in an efficient OS:
>	1) virtual memory
>	2) symbolic links
>	3) long file names (BSD directories)

Well, I don't quite agree.

>Perhaps some explanation is in order:

>We have performance problems with SunOS that we don't have with Xenix
>on similar hardware.  The reason for the difference in performance that
>we see between Xenix and SunOS is the presence of these three features
>in SunOS.  The code in the Kernel which supports these features eats
>up memory and cpu time whether the user wants to use them or not.
>The typical size of Xenix on an 80386 architecture is 300K, SunOS for the
>386i is about 900K (as distributed; some features can be deleted through
>reconfiguration).
>The historical philosophy of the UNIX community (AT&T, at least) has been
>Keep It Simple.  The recent proclivity toward rampant featurism (BSD
>crowd in particular) has resulted in a corresponding decrease in system
>throughput.

Well, the kernel code that supports virtual memory does not eat up that
much kernel space.  Now all the networking code, tty drivers, etc... in
there are a totally different story, and look to de-kernelized OSs like
Mach (and hopefully GNU, hint, hint...) to take care of this problem.

>>VM and Symbolic links are nice features.  I'd say put them in.  Let the
>>user determine whether to use them or not.  the fact that they are there
>>doesn't hurt an OS.  But, let the user, not the author, be the deciding
>>factor.
>
>VM kills performance whether the user chooses to use it or not.  The
>typical placement of the swap device on single-disk systems (as most
>UNIX systems seem to be now) is as a partition placed between a
>read-only root file system and a read-write user file system (this is
>done to "optimize" the disk's swap activity).  The result is that the
>disk heads are constantly seeking between the outermost tracks (the
>root file system) and the innermost tracks (the user file system).
>Paging code in the Kernel does not come free.  It requires space and
>execution time.  Virtual memory was justifiable when memory was
>expensive (i.e. $30,000 for an 8K core bomb).  Memory now is just too
>cheap to pay for the execution penalty.

Well I have to totally disagree here.  If you don't want the swap partition
in between / and /usr, then buy another disk and put /usr on that.  This
gives you interleaved disks, and another drive is a hell of a lot cheaper
than having to stick the maximum amount of memory on a system that you are
ever going to need.  I have worked on real memory systems, and seeing:

Please wait: waiting for memory to be freed....

when you try to run ls is not exactly fun (especially when whatever has
that memory occupied never frees it; how do you spell relief? R-E-B-O-O-T).
Plus virtual memory can actually save performance, because it pages code in
as well as out; if you don't need it, it won't be brought in.  Try running
emacs on a non-VM system.

If you don't want a certain application to be swapped, then have process/page
lockdown system calls (along with nice features like real-time scheduling,
etc., are you listening RMS? ;-)).  Please don't ever take my VM away.  If
you don't want it, use DOS.
-- 
Jim Magee - Unix Development		| Encore Computer Corp
jmagee@gould.com			| 6901 W Sunrise Blvd  MS407
...!uunet!gould!jmagee			| Ft Lauderdale, FL 33313
"I speak for nobody..."			| (305) 587-2900 x4925

rroba.DlosLV@xerox.com (06/10/89)

In response to an earlier posting, in which I said:
> Three things that should not be in an efficient OS:
>	1) virtual memory
>	2) symbolic links
>	3) long file names (BSD directories)
Bob Cherry says, "This list is extremely application dependent."  I
agree.  My purpose in posting these comments is to point out that what is
appropriate for some applications is poison to others.

Bob goes on to say, "Virtual Memory:  VMem is quite useful especially in
CAD tools and high volume/resolution graphics.  On multi-user systems it
becomes unrealistic to keep gigabytes of RAM around in order to perform
high volume graphics."  Again, I agree.  My own background has been
principally in embedded systems, in which 1) data throughput is the measure
of success, and 2) system RAM requirements are pre-determinable.  In this
environment, VM is not needed and imposes an execution cost that I would
rather not pay.

Then he says, "Eliminating VMem eliminates the ability to operate a wide
range of applications and/or programming environments."  This is a good
point.  But, in the environments that I work in, the applications that will
be run on a particular system are known and fixed; so that this is not a
concern.  Throughput, however, is still a concern.
  
"Symbolic Links:  These links are extremely useful when a third party
application requires that it be run from a specific directory.  If the
particular directory is in its own disk partition and if that partition
does not have adequate free space to install or operate, a link may be used
to map the actual directory to the desired directory."  This is a
description of a situation (partitioned drive) that was created by either
the user or the (auto-installation program of the) OS distributor (i.e. Sun
Microsystems).  On multi-user systems, partitioning a drive facilitates the
creation of secure backups from online file systems (because the partitions
can be individually umounted and dumped).  Partitioning a drive, however,
always has a negative impact on file system throughput (through increased
seek time), and should not be done on single-user or embedded systems.  The
exception is single-drive VM systems, in which the swap device should be
located somewhere in the middle of the drive, in order to minimize the
impact of VM on file system performance.  But, even here, my opinion is
that VM systems should never be single-drive.

Next, Bob echoes my own opinions, "Inode mapping is more efficient and Unix
offers the ability to make both hard and soft links.  Hard links do not
impact disk access as much as soft links do."  But then, he says, "If a
user doesn't make symbolic links to his environment, there should be no
impact on the operation of the OS."  The problem is that most symbolic
links are not created by the user (God knows I wouldn't make one, if I had
a choice), but are distributed with the OS.  (Although in the GNU case, if
the system is distributed as source without any assumptions about file
system partitioning, my arguments will be moot.)  This is a major cause of
poor file system performance in SunOS (for the 386i).

"Long Filenames: I do not see any impact on an OS by allowing long names. "
My objection is not against long filenames, in general, as much as it is
against the structure of BSD directories in particular.  The complicated
system of counts and offsets that must be traversed in a BSD directory must
consume much more cpu time than the relatively simple structure of AT&T's
directories (I am aware that AT&T, sadly,  will adopt BSD's directory
structure in the 'unified UNIX').  I am not concerned about the time
required to extract a long name from a BSD directory, as opposed to
extracting a short name from a BSD directory.  I am concerned about the
time that it takes to extract the inode number of the nth entry in a BSD
directory. 

rroba.DlosLV@xerox.com (06/10/89)

In message <14460.8906081609@orchid.warwick.ac.uk>,  somebody says:
>In article <19889@adm.BRL.MIL> you write:
>> (I am assuming that everybody in the audience understands the impact
>> of a symbolic link on the amount of time required to open a file.)
>
>No, I don't -- can you explain it please

When you attempt to open a file, you specify a path name.  Before the file
can be opened, the kernel must translate the file name into an inode number
(the inode must be obtained to determine the location and size of the file
on the disk drive).   The inode number is recorded in the directory of the
file.  So, the kernel must open (access) the directory to read the inode
number of the file; but before it can open the directory, it must first
determine the inode number of the directory . . .  and so on to the root
directory.  This process actually begins, of course, with the root
directory (at least in the case of absolute path names), and traces up to
the file (opening directories and extracting the next inode number along
the way).

In the case of symbolic links, this process is interrupted when the kernel
finds, in some intermediate directory, not an inode number, but an
alternate path name.  At this point the kernel must begin again at the root
directory, retracing its steps through another sequence of directories.

The difficulty of extracting an inode number from this sequence of
directories is further complicated in BSD systems by the complexity of BSD
directories, which are structured in a manner similar to a linked list (as
opposed to AT&T directories, which are more like arrays of structs).

guy@auspex.auspex.com (Guy Harris) (06/11/89)

>Bob goes on to say, "Virtual Memory:  VMem is quite useful especially in
>CAD tools and high volume/resolution graphics.  On multi-user systems it
>becomes unrealistic to keep gigabytes of RAM around in order to perform
>high volume graphics. "   Again, I agree.  My own background has been
>principally in embedded systems, in which 1) data throughput is the measure
>of success, and 2) system RAM requirements are pre-determinable.  In this
>environment, VM is not needed and imposes an execution cost that I would
>rather not pay.

OK, so change your statement from:

	Three things that should not be in an efficient OS:

to

	Three things that should not be in an efficient OS for embedded
	systems:

or move Virtual Memory from the list of "things that should not be in an
efficient OS" to a separate list of "things that should not be in an
efficient OS for embedded systems" (not having worked with those
systems, I'll let those who have debate whether VM is ever appropriate
for them).

UNIX wasn't primarily intended as an OS for embedded systems....

>The problem is that most symbolic links are not created by the user
>(God knows I wouldn't make one, if I had a choice),

I've made many of them; they do come in handy for some of us.  I will
not defend the 386i version of SunOS's proliferation of them, but the
fact that they can be perhaps used to excess doesn't render them
useless....

>"Long Filenames: I do not see any impact on an OS by allowing long names. "
>My objection is not against long filenames, in general, as much as it is
>against the structure of BSD directories in particular.  The complicated
>system of counts and offsets that must be traversed in a BSD directory must
>consume much more cpu time than the relatively simple structure of AT&T's
>directories (I am aware that AT&T, sadly,  will adopt BSD's directory
>structure in the 'unified UNIX').

I'm not sad about it in the least.  I'm quite glad that I'll be able to
have an S5 system on which I'll be able to create files without having
to worry about the length of the file's name.  The various directory
name caches present in more recent systems with the BSD file system
(including S5R4, when it arrives) help reduce the time spent looking up
entries.

If your objection is not to long filenames (although you *did* just say
"long filenames" first and "(BSD directories)" second), note that
extending the V7/S5 directory format to support longer file names makes
directory entries larger, which also slows down the lookup time.  It
would be interesting to see the distribution of file name lengths on a
BSD system (where the limit is probably essentially infinite for all but
the most perverse user or application), to see if there's a bend in the
curve suggesting a lower maximum length, and then see how a
fixed-length-entry scheme supporting that maximum length does vs. the
BSD scheme.

guy@auspex.auspex.com (Guy Harris) (06/11/89)

 >In the case of symbolic links, this process is interrupted when the kernel
 >finds, in some intermediate directory, not an inode number, but an
 >alternate path name.

You must be thinking of some flavor of symbolic links other than the one
used in the UNIX systems with which I'm familiar.  In the latter, the
name lookup code finds an inode number, but the inode points to a file
of type "symbolic link", which means the contents of the file are an
alternate path name.  This means the system has to "read" that file and
*then* continue the lookup process. 

 >At this point the kernel must begin again at the root
 >directory, retracing it's steps through another sequence of
 >directories.

Assuming, of course, that the symbolic link's contents are an absolute
path name.

bzs@bu-cs.BU.EDU (Barry Shein) (06/11/89)

Re: Symbolic Links...

Also note that some vendors (e.g. Encore) will store a symlink pathname
directly into the inode if it will fit (I think the cut-off was 63
chars, which isn't too sleazy; I never measured the hit rate tho I
could easily, I guess).  This means that getting the symlink path
requires no extra disk accesses, tho chasing down the result of course
costs the same.

The moral is: before you condemn a feature just for being
non-performant make sure the implementation can't be improved.

It would also be interesting to measure these things people claim are
unacceptably non-performant. With all the caches etc I wouldn't trust
people's intuitions, they might be complaining about nothing (ok, in
certain real-time environments every cycle counts, but I doubt their
problems are solved by merely avoiding symlinks, sounds like a red
herring, and yes, I've done a fair amount of real-time stuff, in Unix
even!)
-- 
	-Barry Shein

Software Tool & Die, Purveyors to the Trade
1330 Beacon Street, Brookline, MA 02146, (617) 739-0202

rbj@dsys.ncsl.nist.gov (Root Boy Jim) (06/13/89)

? From: Barry Shein <bzs@bu-cs.bu.edu>

? Re: Symbolic Links...

? Also note that some vendors (eg. Encore) will store a symlink pathname
? directly into the inode if it will fit (I think the cut-off was 63
? chars which isn't too sleazy, I never measured the hit rate tho I
? could easily I guess.) This means that getting the symlink path
? requires no extra disk accesses tho chasing down the result of course
? costs the same.

? The moral is: before you condemn a feature just for being
? non-performant make sure the implementation can't be improved.

I seem to remember something about a UNIX port to a big machine (Cray?
370?) that used 4k bytes/inode. Guess where small files were stored?

? 	-Barry Shein

? Software Tool & Die, Purveyors to the Trade
? 1330 Beacon Street, Brookline, MA 02146, (617) 739-0202


	Root Boy Jim is what I am
	Are you what you are or what?

jack@cwi.nl (Jack Jansen) (06/13/89)

In article <19981@adm.BRL.MIL> rbj@dsys.ncsl.nist.gov (Root Boy Jim) writes:
>
>I seem to remember something about a UNIX port to a big machine (Cray?
>370?) that used 4k bytes/inode. Guess where small files were stored?
>
Was this actually implemented? This idea was proposed by Sape Mullender
and Andy Tanenbaum in the paper 'Immediate Files' (Software - Practice
and Experience, April 1984), but I wasn't aware that people had actually
done it.

I would be interested if anyone could provide more details.....
-- 
A people that yields to tyrants		| Oral:     Jack Jansen
will lose more than life and goods;	| Internet: jack@cwi.nl
then the light goes out			| Uucp:     mcvax!jack

paul@prcrs.UUCP (Paul Hite) (06/14/89)

In article <8187@boring.cwi.nl>, jack@cwi.nl (Jack Jansen) writes:
> In article <19981@adm.BRL.MIL> rbj@dsys.ncsl.nist.gov (Root Boy Jim) writes:
> >
> >I seem to remember something about a UNIX port to a big machine (Cray?
> >370?) that used 4k bytes/inode. Guess where small files were stored?
> >
> I would be interested if anyone could provide more details.....

I believe that I know the paper that Root Boy Jim remembers.  But I'll
bet that he confused a couple of things.

I found the paper in the AT&T Bell Labs Technical Journal Oct 1984
Vol. 63 No.8 Part 2.  (This is one of the 2 all-unix issues.  These two
issues have been reprinted and are available now as "Unix Readings" or
something.)

The paper is titled "A UNIX System Implementation for System/370"  by
W. A. Felton, G. L. Miller and J. M. Milner.  And, Jack, the paper is
dated Jan 9, 1984.

A couple of quotes:

	UNIX file systems on System/370 are in format identical to 
	standard UNIX file systems, except that the block size has
	been enlarged to 4096 bytes.

But later:

	Files of less than 493 bytes are stored directly in the
	corresponding inode.

The paper doesn't get more explicit than that about inode size.  I believe
that they were just using large blocks with regular sized inodes.  They 
put small files in the inodes because they were afraid of wasting space
with big blocks.  They didn't have any "fragment" concept.  They actually
call the fast access a "side effect".

Paul Hite   PRC Realty Systems  McLean,Va   uunet!prcrs!paul    (703) 556-2243
                      DOS is a four letter word!

peter@ficc.uu.net (Peter da Silva) (06/15/89)

I remember reading that V7 (or was it 2BSD) stored files less than 39 bytes
in the inode. A real saving for all those empty directories.
-- 
Peter da Silva, Xenix Support, Ferranti International Controls Corporation.

Business: uunet.uu.net!ficc!peter, peter@ficc.uu.net, +1 713 274 5180.
Personal: ...!texbell!sugar!peter, peter@sugar.hackercorp.com.

rbj@dsys.ncsl.nist.gov (Root Boy Jim) (06/24/89)

? From: Paul Hite <paul@prcrs.uucp>

? > In article <19981@adm.BRL.MIL> rbj@dsys.ncsl.nist.gov (Root Boy Jim) writes
? > >
? > >I seem to remember something about a UNIX port to a big machine (Cray?
? > >370?) that used 4k bytes/inode. Guess where small files were stored?

? I believe that I know the paper that Root Boy Jim remembers.  But I'll
? bet that he confused a couple of things.

It won't be the first or the last time :-)

? I found the paper in the AT&T Bell Labs Technical Journal Oct 1984
? Vol. 63 No.8 Part 2.  (This is one of the 2 all-unix issues.  These two
? issues have been reprinted and are available now as "Unix Readings" or
? something.)

? The paper is titled "A UNIX System Implementation for System/370"  by
? W. A. Felton, G. L. Miller and J. M. Milner.  And, Jack, the paper is
? dated Jan 9, 1984.

That's the ticket!

? A couple of quotes:

? 	UNIX file systems on System/370 are in format identical to 
? 	standard UNIX file systems, except that the block size has
? 	been enlarged to 4096 bytes.

? But later:

? 	Files of less than 493 bytes are stored directly in the
? 	corresponding inode.

Hmmm. That would seem to imply an inode size of 512, with 20 bytes of
mode/uid/gid/links/size/etc info. Exactly one sector.

? The paper doesn't get more explicit than that about inode size.  I believe
? that they were just using large blocks with regular sized inodes.  They 
? put small files in the inodes because they were afraid of wasting space
? with big blocks.  They didn't have any "fragment" concept.  They actually
? call the fast access a "side effect".

No wonder. 128 direct blocks gives you 1/2 Meg of directly accessible data.
Another side effect is that if the buffer cache was modified to treat
inode and data blocks differently (512 and 4k sizes), when a buffer was
locked for I/O it wouldn't lock out all the other inodes in that buffer.

? Paul Hite   PRC Realty Systems  McLean,Va   uunet!prcrs!paul (703) 556-2243
?                       DOS is a four letter word!

	Root Boy Jim is what I am
	Are you what you are or what?

whh@PacBell.COM (Wilson Heydt) (06/27/89)

In article <20100@adm.BRL.MIL>, rbj@dsys.ncsl.nist.gov (Root Boy Jim) writes:
> ? From: Paul Hite <paul@prcrs.uucp>
> 
> ? The paper is titled "A UNIX System Implementation for System/370"  by
> ? W. A. Felton, G. L. Miller and J. M. Milner.  And, Jack, the paper is
> ? dated Jan 9, 1984.
> 
> Hmmm. That would seem to imply an inode size of 512, with 20 bytes of
> mode/uid/gid/links/size/etc info. Exactly one sector.

Except for one *minor* thing--the drives on System/370s (and 360s, for
that matter) don't *have* sectors.  They're variable (or free) format.

Be careful about system-provincialism.

     --Hal

=========================================================================
  Hal Heydt                             | In the old days, we had wooden
  Analyst, Pacific*Bell                 | ships sailed by iron men.  Now
  415-645-7708                          | we have steel ships and block-
  whh@pbhya.PacBell.COM                 | heads running them. --Capt. D. Seymour

guy@auspex.auspex.com (Guy Harris) (06/29/89)

>I remember reading that V7 (or was it 2BSD) stored files less than 39 bytes
>in the inode. A real saving for all those empty directories.

V7 didn't do that.  Sounds like you should throw out your references for
V7 behavior and get some better ones....