gordoni@chook.adelaide.edu.au (Gordon Irlam) (03/06/91)
Followups directed to comp.os.misc.
This fell into a black hole the first time I tried to post it. The
original discussion which prompted this was an attempt to compare Mach
to various other flavours of unix by looking at the size of the
executable image.
One of the problems with comparing text sizes is that machine
architectures and compilers differ. Here is another (flawed) attempt
at comparing the sizes of a few operating systems.
Presented below are some rough measurements of the number of lines of
kernel source code for a number of different operating systems (including
comments and blank lines).
                         Kernel code
                        (lines / 1000)
Synthesis (Sun 3)               5    experimental
Plan 9 (SG Power)              15    experimental
V (Sun 3)                     >15    experimental
Unix 32/V (VAX)                17    basic unix
Minix 1.5 (IBM PC)             30    basic unix
Ninth Edition Unix (Sun 3)     80    unix
BSD 4.3 (VAX)                  90    unix
BSD 4.3 Tahoe (VAX)           100    unix
System V R3.2 (3b2)           120    unix
SunOS 4.03 (Sun 3 + Sun 4)    440    unix
Umax 4.2 (Multimax)           280    multi unix
Mach 2.0 (VAX)                140    multi unix (minimal)
Mach 2.0 (VAX)                400    multi unix (full)
Mach 3.0 (80386)              100    multi distributed kernel
Chorus 3.2 (Compaq 386)        60    multi distributed kernel
Chorus 3.2 (Compaq 386)       200    multi distributed kernel and unix
All these figures are very rough. Typically I ran du on the sources,
and applied the empirically determined constant of 38 lines per
kilobyte of source. I then adjusted some of the figures as I saw fit.
This was to account for sources that contained a large number of small
files (where du counts each file as a whole block), or when the kernel
directories contained a significant amount of documentation or dead
code that should not be included; I was after the number of lines of
code that are actually compiled to build a real kernel. Other factors
that I have not attempted to account for are differences in coding
density, number of comments, the presence of debugging code and so on.
Don't trust any figure to better than, say, 30%. A few of the
values have been plucked from the net or various research papers.
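The du-based estimate described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the procedure actually used: it applies the 38 lines-per-kilobyte constant to file sizes in bytes, and models du charging each file a whole block with a hypothetical 1 KB block size (real file systems of the time used various block and fragment sizes). The names `estimate_lines` and `file_sizes` are made up for this example, and the hand adjustments for documentation and dead code are not modelled.

```python
import os

# Empirically determined conversion factor from the post: source lines per KB.
LINES_PER_KB = 38


def estimate_lines(sizes, block_size=1024):
    """Estimate source lines from a list of file sizes in bytes.

    Returns a pair:
      - the estimate based on the files' actual byte counts;
      - the estimate after rounding each file up to a whole block,
        roughly what du reports, which inflates trees of many small files.
    block_size is an assumed allocation unit (hypothetical; file systems vary).
    """
    actual = sum(sizes)
    # Ceiling division: a 100-byte file is charged a full block.
    rounded = sum(-(-s // block_size) * block_size for s in sizes)

    def to_lines(nbytes):
        return nbytes / 1024 * LINES_PER_KB

    return to_lines(actual), to_lines(rounded)


def file_sizes(root):
    """Collect the byte size of every file under the directory root."""
    sizes = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            sizes.append(os.path.getsize(os.path.join(dirpath, name)))
    return sizes


if __name__ == "__main__":
    # Ten 100-byte files: a small tree where block rounding dominates.
    print(estimate_lines([100] * 10))
```

Running it prints `(37.109375, 380.0)`: ten 100-byte files amount to under 40 lines by content, but about ten times that when each is charged a whole block, which is the small-file distortion the adjustments above were meant to correct. For a real tree, pass `file_sizes("path/to/sys")` (a placeholder path) to `estimate_lines`.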
Notes follow (slightly inflammatory):
Synthesis, Columbia - 5k. This is a very experimental system. I
guess this is about as small as you can get and still have an
operating system.
Plan 9, Bell Labs - 15k. This is supposedly a real distributed
operating system. The size is surprising. Either a lot of the
functionality we have come to expect is not present, or most
operating systems have accumulated a lot of dead wood over the
years. Probably both. I would urge caution about the suggestion
that Plan 9 is going to replace System V, at least in the short
term.
V, British Columbia/Stanford - at least 15k. I have only seen the
size quoted in some early papers. I suspect the final version was
quite a bit larger. Deceased.
Unix 32/V, Bell Labs - 17k. The first version of unix to run on a
VAX.
Minix 1.5, Tanenbaum - 30k. A "toy" system designed to teach the
principles of operating systems design. Significantly larger than
32/V!
Ninth Edition, Bell Labs - 80k. A more recent version of unix from
Bell Labs (1987). Don't know enough about it to be able to make
any nasty comments.
BSD 4.3, Berkeley - 90-100k. Unix has grown by a factor of 5 in its
lifetime on the VAX, starting from 32/V, and there is more to come.
Admittedly an increasing portion of this has been to accommodate the
ever-increasing range of machine models and obscure peripherals that
are being developed.
System V R3.2, AT&T - 120k. Cleaner code than BSD (which is a bit
hacky), but not a very nice system to use.
SunOS 4.03, Sun - 440k. This includes both the Sun 3 and Sun 4
versions. For either one alone I would guess about 350k lines all
up. RPC, TMPFS, DLL, NFS, YP, POSIX, SVID, XPG, C2, it's all fun
stuff, but not without its cost. I guess one important thing to
note is the size of a basic Unix system is quite small in
comparison to the amount of extra stuff added to provide all the
functionality many people expect. But is it all really necessary,
and does it have to be in the kernel?
Umax 4.2, Encore - 280k. A reasonable attempt at porting BSD to a
multiprocessor. Perhaps a more difficult task than it sounds.
Despite all the documentation in the code, Encore is too scared of
the complexity to try to modify it so that it performs well.
Mach, Carnegie-Mellon - 100-400k. Sizes should probably be reduced by
about 20% to account for the RCS header logs that are included in
the sources. Mach version 2.0 was essentially a multiprocessor
version of BSD along with a few other bits that were re-written.
Note the very large size of the full system. A large number of
obscure device drivers are included, along with experimental
communication facilities. Ditching all this and the debugger
drops the size from 400k to 140k. A lot of barnacles have
accumulated on BSD over the years. Mach 3.0 is an attempt to get
rid of the barnacles and split the system into a small kernel and
a Unix sub-system running on top of the kernel. As for the word
"distributed", which I have used, I will leave it to someone at CMU
to justify; I can't.
Chorus 3.2, Chorus Systemes - 60-200k. A distributed multiprocessor
kernel developed from the ground up. The current Unix sub-system
is based on System V, alas. Developed outside of the United
States, and consequently largely ignored inside the United States.
For fun, here is the size of the total system, including all the
utilities and so on that are needed for a real system. Includes all
the bin, lib, and sys directories, but not the man and doc
directories. Varies a bit depending on whether the system comes with
a Fortran compiler and so on, but I attempted to ignore things like
X11.
                         Total code
                        (lines / 1000)
Minix 1.5 (IBM PC)            170
Unix 32/V (VAX)               180
BSD 4.3 (VAX)                 640
System V 3.2 (3b2)            960
Mach 2.0 (VAX)               1000
BSD 4.3 Tahoe (VAX)          1000
Umax 4.2 (Multimax)          1800
SunOS 4.03 (Sun 3, Sun 4)    2400
If anybody has any figures for the amount of source code in the
Amoeba, OSF/1 and System V R4.0 kernels could they please post them,
thanks.
Gordon Irlam
gordoni@cs.adelaide.edu.au
cur022%cluster@ukc.ac.uk (Bob Eager) (03/06/91)
In article <2551@sirius.ucs.adelaide.edu.au>, gordoni@chook.adelaide.edu.au (Gordon Irlam) writes:
> Minix 1.5, Tanenbaum - 30k. A "toy" system designed to teach the
> principles of operating systems design. Significantly larger than
> 32/V!
I always thought the 32V compiler had trouble recognising comments;
there were so few! Minix is the reverse: as a teaching tool, it has a
high proportion of comments.
In the same vein, I am sure there must be C compilers that have tiny
symbol tables, given the number of C programs that have meaningless
one- or two-character identifiers in them.
Yes, I do know about the *real* 6/8 character limit on some compilers :-)
-------------------------+-------------------------------------------------
Bob Eager                | University of Kent at Canterbury
                         | +44 227 764000 ext 7589
-------------------------+-------------------------------------------------
miles@cogsci.ed.ac.uk (Miles Bader) (03/07/91)
Note that the average identifier in most of the mach (2.5) code I've seen
is about 47 characters long...
-Miles
--
Miles Bader -- HCRC, University of Edinburgh -- Miles.Bader@ed.ac.uk
guy@auspex.auspex.com (Guy Harris) (03/12/91)
>SunOS 4.03, Sun - 440k. This includes both the Sun 3 and Sun 4
> versions. For either one alone I would guess about 350k lines all
> up. RPC, TMPFS, DLL, NFS, YP, POSIX, SVID, XPG, C2, it's all fun
> stuff, but not without its cost. I guess one important thing to
> note is the size of a basic Unix system is quite small in
> comparison to the amount of extra stuff added to provide all the
> functionality many people expect. But is it all really necessary,
> and does it have to be in the kernel?
No, not all of it does - which is why neither YP nor DLL, if by the
latter you mean "dynamically-linked libraries", *are* in the kernel.
The YP client and server code, and the run-time loader, run in user
mode; they make use of services that are in the kernel (e.g., file
system access and network access in the case of YP, file system access
and "mmap()" in the case of the run-time loader), but aren't
implemented entirely in the kernel.
You may also have included RFS in your count of source lines; you're
not obliged to include that as part of your system, if you don't need
it. (The same is true of NFS.)