[comp.unix.wizards] libraries

chris@mimsy.UUCP (Chris Torek) (12/20/88)

>In article <14946@mimsy.UUCP> I suggested, half kidding:
>>For that matter, why do we need object archives in the first place?
>>They are just a hack to save space (and perhaps, but not necessarily,
>>time).  How about /lib/libc/*.o?

In article <1269@nusdhub.UUCP> rwhite@nusdhub.UUCP (Robert C. White Jr.)
writes:
>Wrong! cammel breath ;-)
>
>The *proper* use of object libraries is to *organize* your objects into
>a usefull search order.

You might try reading the article to which you are responding, and then
thinking about it.%  A Unix `.a' `library' file is simply a file containing
other files, plus (depending on your system) a symbol table (in the
`sub-file' __.SYMDEF).  Now then, what is a Unix directory?
-----
% Especially if it is one of my articles. :-)  I might also add a cheap
shot here about using `spell'...
-----

If your answer was `a file containing other files', congratulations.

Now, aside from the actual implementation, what is the difference between
a library file that contains other files and a library directory that
contains other files?

If your answer was `none', congratulations again.

>How many times would you have to scan the contents of /usr/lib/*.o to
>load one relatively complex c program (say vn).

Either one time, or (preferably) zero times.

>As modules called modules that the program itself didn't use, you introduce
>the probability that the directory would have to be searched multiple times.
>If you tried to aleviate that the files would have to be ordered by names
>that reflected dependancies instead of content.

This is all quite false.  Even without using a ranlib (symbol table
file) scheme, the directory need only be searched once, and every file
within it opened once to build the linker's symbol table; then, or
after reading the symdef file, those files that contained needed
routines would have to be opened once to read their contents.
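
For illustration, something like this is all that single directory pass
amounts to (a sketch only, not ld source; the note_member() callback,
which would record the symbols each member defines and references, is
made up for the example):

	#include <dirent.h>
	#include <stdio.h>
	#include <string.h>

	/*
	 * One pass over a directory library: visit each .o exactly once so
	 * the caller can build its symbol table; a later pass opens only
	 * the members that turned out to be needed.
	 */
	int
	scan_dirlib(const char *dir, void (*note_member)(const char *path))
	{
		DIR *dp;
		struct dirent *de;
		char path[1024];
		size_t n;

		if ((dp = opendir(dir)) == NULL)
			return (-1);
		while ((de = readdir(dp)) != NULL) {
			n = strlen(de->d_name);
			if (n > 2 && strcmp(de->d_name + n - 2, ".o") == 0) {
				(void) snprintf(path, sizeof path, "%s/%s",
				    dir, de->d_name);
				note_member(path);
			}
		}
		(void) closedir(dp);
		return (0);
	}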

>Then you would have all the extra system calls that would spring up
>to open, search, and close all those files.

The extra system calls argument is valid: if you needed N object
modules out of the `-lc' library, you would have to open N+1 files (1
for the symtab file) rather than 1 file.  It is, however, the very same
argument that `proves' that fork()+exec() is wrong.  I claim that the
open and close calls---there are no `search' calls, though there may be
a higher percentage of read()s, along with fewer or no lseek()s---
*should* not be so expensive.  I also find it very likely that those
who claim it is too expensive have not tested it.
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris

jfh@rpp386.Dallas.TX.US (The Beach Bum) (12/21/88)

In article <15080@mimsy.UUCP> chris@mimsy.UUCP (Chris Torek) writes:
>This is all quite false.  Even without using a ranlib (symbol table
>file) scheme, the directory need only be searched once, and every file
>within it opened once to build the linker's symbol table; then, or
>after reading the symdef file, those files that contained needed
>routines would have to be opened once to read their contents.

[ Definitions:		N		Number of files in directory
			#ofModules	Number of files required for link
			EntriesPerBlock	Directory entries per FS block
]

This should display quadratic file system behavior [ or worse ;-) ]
even with a linear read of the directory.  Each file lookup requires
about N/2 directory entries [ more or less ... ] to be scanned
[ and don't forget those BIG BSD style directories ;-) ], and that
happens for each of the N files in the directory.  This gives up to
N**2/2 entries which must be checked.  The number of entries searched
converts to blocks of directory I/O at the rate of EntriesPerBlock,
which is either bounded, or [ for System V ] a constant.

With a symbol table file, the file system sees N/2 * #ofModules I/O.
With large links, #ofModules approaches N.

With a library, the file system sees at most 2 * LibrarySize blocks.
As an earlier poster noted, the average object file in a library is
smaller than a 1K block.  Thus, the directory block I/O dominates when
#ofModules is greater than EntriesPerBlock.  [ did I do that right? ]
For System V, where a 512-byte directory block holds 32 of the 16-byte
entries, this would occur when more than 32 modules were being loaded.
Your mileage may vary.

The big loser is namei(), who doesn't have the slightest clue as to
what you are doing.  With a library file the I/O routines can be
optimized to handle the specific task of locating object files.  namei()
was never intended to be a general purpose index searching tool ;-)
Perhaps we should have ISAM directories ;-)  COBOL here we come ...
-- 
John F. Haugh II                        +-Quote of the Week:-------------------
VoiceNet: (214) 250-3311   Data: -6272  |"Unix doesn't have bugs,
InterNet: jfh@rpp386.Dallas.TX.US       | Unix is a bug"
UucpNet : <backbone>!killer!rpp386!jfh  +--              -- author forgotten --

chris@mimsy.UUCP (Chris Torek) (12/21/88)

In article <15080@mimsy.UUCP> I wrote:
>[a directory is] `a file containing other files' ...

I guess I should answer the nit: Yes, it contains names and pointers to
files, rather than their contents.  I lumped this under `implementation
differences', although it is in fact a generalisation.

Now that everyone has had time to disagree with what they think I said:

>... I claim that the [added] open and close calls ... *should* not be
>so expensive.  I also find it very likely that those who claim it is
>too expensive have not tested it.

I suppose I should relent.  In fact, I too think it would be too
expensive (though `too expensive' is rather a matter of opinion).
But I have not tested it, and I will not say that it *is* so without
seeing some comparisons.
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris

rwhite@nusdhub.UUCP (Robert C. White Jr.) (12/21/88)

in article <15080@mimsy.UUCP>, chris@mimsy.UUCP (Chris Torek) says:
> 
> thinking about it.%  A Unix `.a' `library' file is simply a file containing
> other files, plus (depending on your system) a symbol table (in the
> `sub-file' __.SYMDEF).  Now then, what is a Unix directory?

> If your answer was `a file containing other files', congratulations.

Wrong-O kid!  An archive library is a "File which contains the
original contents of zero-or-more external sources, usually text or
object files, which have been reduced to a single system object."

As subjective proof of this address this question:  "Can you 'archive'
a a device special file and then access the device or service through
direct refrence to the archive?"  The answer is OF COURSE *NO* because
(1) device special files have no "contents" per-se and (2) The archive
does not preserve the "file" concept on an individual-entry basis.  If
you do not understand the difference between a  "system object" file,
and "the contents of a" file go to FILE MGMT. THEORY 101.  Do not pass
go, do not collect your next paycheck.

> -----
> % Especially if it is one of my articles. :-)  I might also add a cheap
> shot here about using `spell'...
> -----

I might counter with a cheap shot about doing research. ;-)

> Now, aside from the actual implementation, what is the difference between
> a library file that contains other files and a library directory that
> contains other files?
> 
> If your answer was `none', congratulations again.

Wrong-O again kid.  An archive is a single system-object which
contains the contents of something which may-or-may-not have ever been
a system object.  A directory is a system object which organizes
(by inclusive internal grouping) refrents to other system objects.
System objects are files (or other things on non UNIX System
arcitectures), the contents of systems objects are not.
(I say refrent because of the whole i-node/multiple link issue;  I say
may-not-have-been a system object because having the ability recreate
a file because you have the image of its contents does not mean that
you had the file in the first place.  See cpio or tar specs and
compare these to the inode structure of your machine.)

As an excersize in intuition and deduction try the
following: (1) get a listing of the modules in /lib/libc.a (If you
can't do this you might as well leave the concept of libraries
compleetly alone from here on out.) (2) Compare the number of entries
in this library to the number of files you may have open at the same
time.  (3) Multiply the number of entries in libc.a times the amount
of memory required by your system to manage a single open file.  (4)
Multiply the entries by the amount of time necessary to open, read,
and close a 30-to-3000 byte file.  (5) calculate about how much the
buffer colision of all this filing would cost each of your users.

Now take all that system performance information and multiply it by
three or four libraries (avrage for a large job) and then multiply
that by the number of programmers.

You can't just say "asside form the implementation" because in these
things implementation is everything.  After all "asside form the
implementation" Faster-Than-Light travel is a workable solution to
space flight.

>>How many times would you have to scan the contents of /usr/lib/*.o to
>>load one relatively complex c program (say vn).
> 
> Either one time, or (preferably) zero times.

A library directory you never scan would be useless. Ncest' Pa?
[sic ;-)]

>>As modules called modules that the program itself didn't use, you introduce
>>the probability that the directory would have to be searched multiple times.
>>If you tried to aleviate that the files would have to be ordered by names
>>that reflected dependancies instead of content.
> 
> This is all quite false.  Even without using a ranlib (symbol table
> file) scheme, the directory need only be searched once, and every file
> within it opened once to build the linker's symbol table; then, or
> after reading the symdef file, those files that contained needed
> routines would have to be opened once to read their contents.

How many i-node caches do you think your system has?  Try "sar -a"
sometime and then compare this to the number of entries in /lib/libc.a
and ... (see above)

I can garentee at least two scans.  More if more than one person is
compiling and the compiles are not in *PERFECT* sync.

>>Then you would have all the extra system calls that would spring up
>>to open, search, and close all those files.
> 
> The extra system calls argument is valid: if you needed N object
> modules out of the `-lc' library, you would have to open N+1 files (1
> for the symtab file) rather than 1 file.  It is, however, the very same
> argument that `proves' that fork()+exec() is wrong.  I claim that the
> open and close calls---there are no `search' calls, though there may be
> a higher percentage of read()s, along with fewer or no lseek()s---
> *should* not be so expensive.  I also find it very likely that those
> who claim it is too expensive have not tested it.

There are a few things I don not have to *test* to know.  They can be
proved by induction.  In the same sense that I have dropped things
that have fallen on my foot, heavy things do damage when dropped,
therefore I do not need to drop an anvil on my foot to find out if my
foot would be damaged by that;  I may say I know what directory searches
cost when run solo because I have run expire (from usenet);  I have
also run expire when others have been using the system and have seen that
it eats prefromance to hell and runs slowly;  therefore I do not have
to implement a looser of a library scheme using a symbol table file
and individual object files to know that it is a dumb idea.

IF you would like to see an example of what a dog it would be to
create/replace a symbol table run "expire -r -h -i" which will open
every article once and create a history entry for it.  (how did you
think news worked anyway?  If you had to read /usr/lib/news/history to
get your articles by group you would never last out to reading the
articles.  While I will cede that there may(?) be more articles in the
system at one moment than there would be objects via. your aproach I
strongly suspect that analogy is stronger than you will chose to
admit.)

Rob.

p.s. You try being dislexic for a few years and then make comments
about spelling.

bsy@PLAY.MACH.CS.CMU.EDU (Bennet Yee) (12/21/88)

In article <15080@mimsy.UUCP> chris@mimsy.UUCP (Chris Torek) writes:
]
]>In article <14946@mimsy.UUCP> I suggested, half kidding:
]>>For that matter, why do we need object archives in the first place?
]>>They are just a hack to save space (and perhaps, but not necessarily,
]>>time).  How about /lib/libc/*.o?
]
]In article <1269@nusdhub.UUCP> rwhite@nusdhub.UUCP (Robert C. White Jr.)
]writes:
]>Wrong! cammel breath ;-)
]>
]>The *proper* use of object libraries is to *organize* your objects into
]>a usefull search order.
]
]You might try reading the article to which you are responding, and then
]thinking about it.%  A Unix `.a' `library' file is simply a file containing
]other files, plus (depending on your system) a symbol table (in the
]`sub-file' __.SYMDEF).  Now then, what is a Unix directory?
]
]If your answer was `a file containing other files', congratulations.
]
]Now, aside from the actual implementation, what is the difference between
]a library file that contains other files and a library directory that
]contains other files?
]
]If your answer was `none', congratulations again.
]

This is not quite right.  The difference is that .a files impose an order
on the contents which is not present in normal directories.  In particular,
the version of _cleanup that you load can come from either findiop.o or
fakcu.o, so ordering is indeed important.  Of course, having
/lib/libc/{*.o,__.SYMDEF} can still work, except that you'd have to impose
the order via the contents of __.SYMDEF or of another file.

-bsy
-- 
Internet:  bsy@cs.cmu.edu		Bitnet:	 bsy%cs.cmu.edu%smtp@interbit
CSnet:     bsy%cs.cmu.edu@relay.cs.net	Uucp:    ...!seismo!cs.cmu.edu!bsy
USPS:      Bennet Yee, CS Dept, CMU, Pittsburgh, PA 15213-3890
Voice:     (412) 268-7571

chris@mimsy.UUCP (Chris Torek) (12/21/88)

[subject changed back to follow the other parallel thread]

In article <43200058@uicsrd.csrd.uiuc.edu> kai@uicsrd.csrd.uiuc.edu writes:
[re replacing archive libraries with directories full of .o files]
>Large numbers of object files?  You've apparently never worked on a program
>so huge that */*.o expands to overflow the shell's command line buffer, so
>there is absolutely no way to link without storing them all in a library
>first.

You are not thinking clearly.  Indeed, a large number of .o files is
one of the very reasons I was considering giving up or modifying the
current library archive scheme.  When you have that many .o files,
`ar c lib.a *.o' also runs out of argv space, and you must build the
library in pieces, because you must name all the .o files for ar.
Getting the library sorted becomes a major hassle.

But if you were to run `ld -X /lib/crt0.o -o foo foo.o -lc', how is
that different from when you now run `ld -X /lib/crt0.o -o foo foo.o -lc'?
So *what* if `-lc' tells ld `go look at /lib/libc/*.o' rather than
`go look at /lib/libc.a'?  Indeed, library directories and library
archive-files are not at all incompatible; one could (as I did)
imagine ld containing code rather like the following:

	struct libops {
		int	(*lib_getsyms)();
		int	(*lib_readobj)();
		...
	};
	int sprintf();

	...
		if (!arlib_open(&lib, libname) && !dirlib_open(&lib, libname))
			stop("cannot find library `%s'", libname);
	...

	int arlib_getsyms(), arlib_readobj();
	struct libops arlib_ops = { arlib_getsyms, arlib_readobj, ... };

	/* try for an archive .a file */
	int
	arlib_open(lib, libname)
		struct libdata *lib;
		char *libname;
	{
		struct arlib_data *p;
		int fd;
		char fn[MAXPATHLEN];

		(void) sprintf(fn, "%s.a", libname);
		if ((fd = open(fn, O_RDONLY)) < 0)
			return (0);	/* no ar file */

		/* got an ar file. set up private data, etc */
		p = (struct arlib_data *)xalloc(sizeof(*p));
		lib->lib_data = (caddr_t)p;
		lib->lib_ops = &arlib_ops;
		p->ar_fd = fd;
		p->ar_israndom = arlib_hasfile("__.SYMDEF");
		...
		return (1);
	}

	...

	int dirlib_getsyms(), dirlib_readobj();
	struct libops dirlib_ops = { dirlib_getsyms, dirlib_readobj, ... };

	/* try for a directory library */
	int
	dirlib_open(lib, libname)
		struct libdata *lib;
		char *libname;
	{
		struct dirlib_data *p;
		struct stat st;

		if (stat(libname, &st) || (st.st_mode & S_IFMT) != S_IFDIR)
			return (0);	/* not a directory library */

		/* like, similar, y'know? */
		p = (struct dirlib_data *)xalloc(sizeof(*p));
		lib->lib_data = (caddr_t)p;
		lib->lib_ops = &dirlib_ops;
		p->d_file = p->d_path + sprintf(p->d_path, "%s/", libname);
		...
		return (1);
	}

plus any other arbitrary library scheme one cared to come up with (such
as multiple .a files for `sub-groups' of the library, in which loops in
the call topology do not cause so much ordering trouble as they do in
separate libraries now, because all the sub-groups are treated as a
single library by the grouplib() routines).

(Some will recognise the above approach as the way one writes `object
oriented' code in C.)
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris

chris@mimsy.UUCP (Chris Torek) (12/21/88)

In article <10211@rpp386.Dallas.TX.US> jfh@rpp386.Dallas.TX.US
(The Beach Bum) writes:
>[directory libraries might] display quadratic file system behavior
>[ or worse ;-) ] ...

This has always been a problem with the Unix file system.  Directory
scanning tended to be O(n^2).  That is why 4.3BSD, current SunOSes, and
perhaps even SVR3 have name caches.  It reduces the behaviour to O(n)
in most cases.  Granted, O(n) is still considerably higher than O(1)....
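
The idea behind those caches is simple; a toy version (in no way the
4.3BSD code, and with every name invented for the sketch) of the
<directory, component> -> inode lookup looks about like this:

	#include <string.h>

	#define NCHASH	256		/* toy table size */

	struct ncentry {
		unsigned long	nc_dirino;	/* inode of the directory */
		unsigned long	nc_ino;		/* inode the name maps to */
		char		nc_name[32];
		int		nc_valid;
	};
	static struct ncentry nctab[NCHASH];

	static unsigned
	nchash(unsigned long dirino, const char *name)
	{
		unsigned h = (unsigned)dirino;

		while (*name)
			h = h * 33 + (unsigned char)*name++;
		return (h % NCHASH);
	}

	/* Hit: fill in *ino and return 1.  Miss: return 0; the caller does
	 * the usual linear directory scan and then calls nc_enter(). */
	int
	nc_lookup(unsigned long dirino, const char *name, unsigned long *ino)
	{
		struct ncentry *nc = &nctab[nchash(dirino, name)];

		if (nc->nc_valid && nc->nc_dirino == dirino &&
		    strcmp(nc->nc_name, name) == 0) {
			*ino = nc->nc_ino;
			return (1);
		}
		return (0);
	}

	void
	nc_enter(unsigned long dirino, const char *name, unsigned long ino)
	{
		struct ncentry *nc = &nctab[nchash(dirino, name)];

		if (strlen(name) >= sizeof(nc->nc_name))
			return;			/* too long to cache */
		nc->nc_dirino = dirino;
		nc->nc_ino = ino;
		(void) strcpy(nc->nc_name, name);
		nc->nc_valid = 1;
	}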

(And *that* is the real problem, not the side issues several others
have named.  Nonetheless, `directory libraries' would probably be handy
during library development, and damn the scan time.  When it gets too
bad, you give in and make the .a file.  Or the .a groups....)
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris

chris@mimsy.UUCP (Chris Torek) (12/22/88)

In article <1278@nusdhub.UUCP> rwhite@nusdhub.UUCP (Robert C. White Jr.)
writes:
>Wrong-O kid!  An archive library is a "File which contains the
>original contents of zero-or-more external sources, usually text or
>object files, which have been reduced to a single system object."

This is an implementation detail.

>As subjective proof of this address this question:  "Can you 'archive'
>a a device special file and then access the device or service through
>direct refrence to the archive?"  The answer is OF COURSE *NO* because ...

Indeed?  Is that creaking I hear coming from the limb upon which you
have climbed?  Perhaps I should break it off, lest others be tempted to
go out on it as well:

Why, certainly, you can `archive' a device special file and then access
the device via the archive.  What would you say if I told you I had
added ar-file searching to the directory scanning code in namei?
Insecure, yes, but on a single user workstation, so what?  (Note that
while `ar' ignores S_IFCHR and S_IFBLK on extraction, they do in fact
appear in the header mode field.  It is eminently possible to scan ar
files in the kernel as if they were directories.  Pointless, perhaps,
but easy.)

>(1) device special files have no "contents" per-se and (2) The archive
>does not preserve the "file" concept on an individual-entry basis.

% man 5 ar

AR(5)               UNIX Programmer's Manual                AR(5)
     ...
     A file produced by ar has a magic string at the start, fol-
     lowed by the constituent files, each preceded by a file
     header. ...

     Each file begins on a even (0 mod 2) boundary; a new-line is
     inserted between files if necessary. ...

That should tell you that each `entry' is a `file'.  [Argument from
authority: the manual says so :-) ]

>If you do not understand the difference between a  "system object" file,
>and "the contents of a" file go to FILE MGMT. THEORY 101.  Do not pass
>go, do not collect your next paycheck.

The difference is an *implementation* question again.  There is no
fundamental reason that the kernel could not use `ar' format for
directories that contain files with exactly one link.

(Since the rest of the argument is built on this, I am going to skip
ahead.)

>As an excersize in intuition and deduction try the following:

[steps left out: essentially, decide how much time it would take to
 link-edit by reading individual .o files instead of a .a file.]

I have already wasted more time and net bandwidth on this subject than
I really cared to use; but here is a very simple timing comparison for
a `hello world' program%.  Viz:

	# N.B.: these were both run twice to prime the cache and
	# make the numbers settle.

	% time ld -X /lib/crt0.o -o hw0 hw.o -lc
	0.5u 0.4s 0:02 52% 24+112k 41+3io 2pf+0w
	% time ld -X /lib/crt0.o -o hw1 *.o
	0.2u 0.4s 0:01 48% 25+98k 30+10io 2pf+0w
	%

Reading individual .o files is *FASTER*.  It took the *same* amount of
system time (to a first approximation) and *less* user time to read
the needed .o files than it did to read (and ignore the unneeded parts
of) the archive, for a total of 33% less time.  It also took less memory
space and fewer disk transfers.

-----
% `hw.o' needs only a few .o files, but hey, I want the results to look
good.
-----

Now, there were only a few .o files involved in this case: hw1 needed
only the set

	_exit.o bcopy.o bzero.o calloc.o cerror.o close.o doprnt.o
	exit.o findiop.o flsbuf.o fstat.o getdtablesize.o getpagesize.o
	hw.o ioctl.o isatty.o lseek.o makebuf.o malloc.o printf.o
	read.o sbrk.o perror.o stdio.o write.o

which is only 25 out of a potential 317 (that includes a few dozen
compatibility routines, over one hundred syscalls, etc.).  Real programs
would need more .o files, and it will indeed require more open calls.
There is another issue, which I shall address momentarily, and that
is deciding which .o files are needed (which I did `by hand' above,
so that it does not count in the output from `time').

>>>How many times would you have to scan the contents of /usr/lib/*.o to
>>>load one relatively complex c program (say vn).

>>Either one time, or (preferably) zero times.

>A library directory you never scan would be useless. Ncest' Pa?
>[sic ;-)]

(Ne c'est pa, if anyone cares... ne ~= not, c'est ~= is, pa ~= way:
is not that the way.)

Clearly we are not communicating.

The linker need not `look at' any .o files.  Its task is to link.  To
do this it must know which files define needed symbols, and which
symbols those files need, and so forth, recursively, until all needed
symbols are satisfied.  Now, how might ld perform that task?

For an archive random library---/lib/libc.a, for instance---it does not
scan the entire archive.  It pulls one sub-file out of the archive,
__.SYMDEF.  This file lists which symbols are defined by which files.
It does not now list which symbols are needed by which files, but it is
easy to imagine that, in a new scheme, the file that takes its place
does.

So what ld might do, then, is read the `.symtab' file and, using that
information, recursively build a list of needed .o files.  It could
then open and link exactly those .o files---never touching any that are
not needed.  If your C program consists of `main() {}', all you need
is exit.o.  ld would read exactly two files from the C library.  And
hey presto! we have scanned the contents of /lib/libc/*.o zero times.
If your C program was the hello-world example above, ld would read
exactly 26 files (the 25 .o's plus .symtab)---and again, scan the
contents of /lib/libc/*.o zero times.
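
In outline, the selection is just a worklist closure over that symbol
table; everything below, including the `.symtab' structures, is invented
for the sketch and is not how any existing ld does it:

	#include <string.h>

	#define MAXNEED	1024		/* arbitrary limit for the sketch */

	/* from the hypothetical .symtab file */
	struct symdef { char sym[32], member[32]; };	/* symbol -> defining .o */
	struct memref { char member[32], ref[32]; };	/* .o -> symbol it references */

	static const char *
	member_for(const struct symdef *defs, int ndefs, const char *sym)
	{
		int i;

		for (i = 0; i < ndefs; i++)
			if (strcmp(defs[i].sym, sym) == 0)
				return (defs[i].member);
		return (NULL);
	}

	/*
	 * Starting from the program's own undefined symbols in
	 * need[0..nneed-1] (the array must have room for MAXNEED entries),
	 * compute the closure of library members to load.  Only the members
	 * placed in chosen[] are ever opened.
	 */
	int
	select_members(const struct symdef *defs, int ndefs,
		       const struct memref *refs, int nrefs,
		       char need[][32], int nneed, char chosen[][32])
	{
		int nchosen = 0, i, j, k;
		const char *m;

		for (i = 0; i < nneed; i++) {
			if ((m = member_for(defs, ndefs, need[i])) == NULL)
				continue;	/* truly undefined: real ld would complain */
			for (j = 0; j < nchosen; j++)
				if (strcmp(chosen[j], m) == 0)
					break;
			if (j < nchosen)
				continue;	/* member already selected */
			(void) strcpy(chosen[nchosen++], m);
			/* the new member's own references become new needs */
			for (k = 0; k < nrefs; k++)
				if (strcmp(refs[k].member, m) == 0 && nneed < MAXNEED)
					(void) strcpy(need[nneed++], refs[k].ref);
		}
		return (nchosen);
	}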

>I can garentee at least two scans.

Perhaps you mean something else here: the number of times the kernel
must look at directory entries to find any given .o file.  If
directories were as fast as they should be, the answer would be `1'.
(Consider, e.g., a Unix with hashed directories.)

>There are a few things I don not have to *test* to know. ... I do not
>have to implement a looser of a library scheme using a symbol table
>file and individual object files to know that it is a dumb idea.

But if you do not test them, you may still be wrong.  See the hello-
world example above.  Linking individual .o files is sometimes *faster*
---even in the current (4.3BSD-tahoe) system.  And my original point
still stands: if archives are a necessary efficiency hack, there may
be something wrong with the underlying system.  I think that, in
principle, they *should* be unnecessary, and we should come up with
a way to make them so.  [But I must admit that it is not high on my
priority list.]

>p.s. You try being dislexic for a few years and then make comments
>about spelling.

(That, incidentally, was why the remark was in the `cheap shots'
footnote.  I apologise for it, although not for my general attitude.)
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris

jfh@rpp386.Dallas.TX.US (The Beach Bum) (12/22/88)

In article <1278@nusdhub.UUCP> rwhite@nusdhub.UUCP (Robert C. White Jr.) writes:
>Wrong-O kid!  An archive library is a "File which contains the
>original contents of zero-or-more external sources, usually text or
>object files, which have been reduced to a single system object."

Please don't be pedantic about it.  The discussion was over using a
directory to hold an object library, not as a general purpose file
archiver.

>A library directory you never scan would be useless. Ncest' Pa?
>[sic ;-)]

Also, don't use French if you can't spell it correctly.  N'est-ce pas?
We do not need to be impressed with your foreign language skills, and
Henry Spencer will no doubt include a reference to you in his next
.signature ;-)

>IF you would like to see an example of what a dog it would be to
>create/replace a symbol table run "expire -r -h -i" which will open
>every article once and create a history entry for it.

Weak analogy.  There are orders of magnitude of difference between
the number of news articles on a system and the number of elements in
an archive file.

The limitation in the library directory approach is the file system,
which is not designed with an efficient file name lookup mechanism.
This is similarly seen when reading news because of the high number
of file opens, etc.  File lookups display file system performance
of some hairy polynomial order.  This is aggravated by the Version 7
file system peculiarity of separating inode blocks and directory blocks
with large amounts of head motion.  [ Times the number of partitions or
drives which must be crossed - i.e., I mount both /usr and /usr/spool.
BIG head motion there. ]

Resolve the inefficiencies of the file system and the approach becomes
feasible.  Replace the current directory/inode approach with something
of higher performance than library/archive accessing, and suddenly
archive directories display higher performance.  Perhaps ISAM
using virtual directories?  Perhaps a filename/inode database of some
other nature?  Sorted directories and a binary search namei?  Any number
of implementation strategies exist to improve actual file system
behavior.

>p.s. You try being dislexic for a few years and then make comments
>about spelling.

I was unaware that the spell command didn't work on dyslexic computers?
[ =<:-) ]
-- 
John F. Haugh II                        +-Quote of the Week:-------------------
VoiceNet: (214) 250-3311   Data: -6272  |"Unix doesn't have bugs,
InterNet: jfh@rpp386.Dallas.TX.US       | Unix is a bug"
UucpNet : <backbone>!killer!rpp386!jfh  +--              -- author forgotten --

norm@oglvee.UUCP (Norman Joseph) (12/22/88)

From article <15080@mimsy.UUCP>, by chris@mimsy.UUCP (Chris Torek):

#               [...]  A Unix `.a' `library' file is simply a file containing
# other files, plus (depending on your system) a symbol table (in the
# `sub-file' __.SYMDEF).  Now then, what is a Unix directory?
#  [...]
# If your answer was `a file containing other files', congratulations.
# 
# Now, aside from the actual implementation, what is the difference between
# a library file that contains other files and a library directory that
# contains other files?
# 
# If your answer was `none', congratulations again.

I probably won't be the only one to point this out, but...

I was taught that a `Unix directory' contained filename/i-node number
pairs, and that the actual contents of the files listed in the directory
existed -outside- of the directory itself.

This certainly -would- be different from a Unix `.a' file if, in fact,
the contents of the (object) files it `archives' are actually contained
within the `.a' file proper.

Now, most of this goes without saying, so I believe that I must have
missed the point you were trying to make by using this analogy.

\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\*//////////////////////////////////////
 Norm Joseph                   | UUCP: ...!{pitt,cgh}!amanue!oglvee!norm
 Oglevee Computer System, Inc. | "Everything's written in stone, until the
 Connellsville, PA  15425      |  next guy with a sledgehammer comes along."
/////////////////////////////////////*\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\

dhesi@bsu-cs.UUCP (Rahul Dhesi) (12/23/88)

In article <15106@mimsy.UUCP> chris@mimsy.UUCP (Chris Torek) writes:
     Directory scanning tended to be O(n^2).  That is why 4.3BSD,
     current SunOSes, and perhaps even SVR3 have name caches.  It
     reduces the behaviour to O(n) in most cases.  Granted, O(n) is
     still considerably higher than O(1)....

In the long run (4.5BSD?) perhaps a bit could be found to mark a
directory as hashed.  To preserve compatibility with current utilities,
you would need a shadow directory with the hashed structure that
mimicked the current sequential list.
-- 
Rahul Dhesi         UUCP:  <backbones>!{iuvax,pur-ee}!bsu-cs!dhesi

rwhite@nusdhub.UUCP (Robert C. White Jr.) (12/23/88)

in article <15126@mimsy.UUCP>, chris@mimsy.UUCP (Chris Torek) says:
> 
> In article <1278@nusdhub.UUCP> rwhite@nusdhub.UUCP (Robert C. White Jr.)
> writes:
>>Wrong-O kid!  An archive library is a "File which contains the
>>original contents of zero-or-more external sources, usually text or
>>object files, which have been reduced to a single system object."
> 
> This is an implementation detail.

Where do you get this "implementation has nothing to do with it" bull?
We are TALKING inplementation after all.  "Implementation asside" your
comments on implementing archives as directories instead of files are
reduced or nothing.

>>As subjective proof of this address this question:  "Can you 'archive'
>>a a device special file and then access the device or service through
>>direct refrence to the archive?"  The answer is OF COURSE *NO* because ...
> 
> Indeed?  Is that creaking I hear coming from the limb upon which you
> have climbed?  Perhaps I should break it off, lest others be tempted to
> go out on it as well:

No, no creaking here!

> Why, certainly, you can `archive' a device special file and then access
> the device via the archive.  What would you say if I told you I had
> added ar-file searching to the directory scanning code in namei?

I would say that you do not understand "functional units" in terms
of real computer systems arcetecture.  Why would you take a (bad)
directory search routine and increase it's "baddness coefficient"
by including archive searching?? and if your were to do that, wouldn't
it, BY DEFINITION, no longer be a directory search routine?

Who are you anyway?

> Insecure, yes, but on a single user workstation, so what?  (Note that

Too bad life is not "single worksataions" with no intrest in security isn't it?

> while `ar' ignores S_IFCHR and S_IFBLK on extraction, they do in fact
> appear in the header mode field.  It is eminently possible to scan ar
> files in the kernel as if they were directories.  Pointless, perhaps,
> but easy.)
> 
>>(1) device special files have no "contents" per-se and (2) The archive
>>does not preserve the "file" concept on an individual-entry basis.
> 
> % man 5 ar
> 
> AR(5)               UNIX Programmer's Manual                AR(5)
>      ...
>      A file produced by ar has a magic string at the start, fol-
>      lowed by the constituent files, each preceded by a file
>      header. ...
       ^^^^^^

You will note that the archive procedes the "files" (meaning their contents)
with HEADER(s).  *Headers* are NOT i-nodes.  There is no "file" in an
archive, only the "contents of the original system object" and a
sufficient quantity of information to RE-CONSTITUTE a file which would
have the same *SYSTEM DEPENDANT* information.

The file is not preserved; only the contents (and by induction, the
system-object level information, because nature is part of contents.)

>      Each file begins on a even (0 mod 2) boundary; a new-line is
>      inserted between files if necessary. ...
> 
> That should tell you that each `entry' is a `file'.  [Argument from
> authority: the manual says so :-) ]

SHAME SHAME... Quoting out of context again.  To whit:  [ AR(4)]

DESCRIPTION:
	The archive command ar(1) is used to combine several files into
	one.

(...also...)

	All information in the file member headers is in printable
	ASCII.

NONE of this preserves the "file"ness of the entries, and it all states
that the contributers are all reduced to "one (file)" so it is prima
facie that you do not understand that of which you speak.  If you don't
understand the difference between "a file" and "something that will let
you create a file."  I suggest you compare some *.o file (as a concept)
to using the "cc" command and an *.c file.  This is the same thing as saying
/usr/lib/libc.a is identical to using "ar" and the directory
/usr/src/libs/libc/*.o.  NOT BLODDY LIKELY.

In terms of "practical authority" I suggest you compare the contents of
<inode.h> and <ar.h>.  Archive entries are substancially variant from
WHATEVER your, or anybody elses, computers file-as-valid-system-object
concept is.

>>If you do not understand the difference between a  "system object" file,
>>and "the contents of a" file go to FILE MGMT. THEORY 101.  Do not pass
>>go, do not collect your next paycheck.
> 
> The difference is an *implementation* question again.  There is no
> fundamental reason that the kernel could not use `ar' format for
> directories that contain files with exactly one link.

I can think of things like "consistent refrencing"  "Raw device
information" "file length extension (e.g. oppen for append)"
"stream/socket information" "Open and closure tracking" and perhaps a
dozen reasons that a kernel could not use portable/common archive
formats for actual file manipulation.

The "FUNDAMENTAL PROBLEM" is that the ar format does not have the
flexability or space to provide the kernel with the things it needs to
work with (adding things to/changing the format makes it nolonger ar
format so don't go off on an "I'll just add..." kick; it would make you
look like a fool)

Additional problems include:  Having to read/lseek-past every entry
which procedes the entry you are intrested in.  No convient method for
going backwards without altering the format.  No "lookup" capibility.
Non-tabular format.  Inefficent storage method for random access of
contents.  (this list could be longer, but I have a life to get on
with.)

You can't just say "that's implementation dependant and so not
important"  because your statement is one on implementation.

> (Since the rest of the argument is built on this, I am going to skip
> ahead.)

Convienent way of avoiding personal falure, "I'll just skip it..."
Let me guess, you're a fundy, right?

>>As an excersize in intuition and deduction try the following:
> 
> [steps left out: essentially, decide how much time it would take to
>  link-edit by reading individual .o files instead of a .a file.]

Reading time/effort for a set nember of bytes X is identical no matter
where the X originates.  lseek is faster than open and close.  lseek
does not require any additional fiel table entries.  No steps were
ommited.

If I really had left anything out you would have mentioned them in some
detail instead of just deleting the entire thing and instering a
"fuzzing" generallity.

> I have already wasted more time and net bandwidth on this subject than
> I really cared to use; but here is a very simple timing comparison for
> a `hello world' program%.  Viz:
> 
> 	# N.B.: these were both run twice to prime the cache and
> 	# make the numbers settle.
> 
> 	% time ld -X /lib/crt0.o -o hw0 hw.o -lc
> 	0.5u 0.4s 0:02 52% 24+112k 41+3io 2pf+0w
> 	% time ld -X /lib/crt0.o -o hw1 *.o
> 	0.2u 0.4s 0:01 48% 25+98k 30+10io 2pf+0w
> 	%
> 
> Reading individual .o files is *FASTER*.  It took the *same* amount of
> system time (to a first approximation) and *less* user time to read
> the needed .o files than it did to read (and ignore the unneeded parts
> of) the archive, for a total of 33% less time.  It also took less memory
> space and fewer disk transfers.

While the extreme case, e.g. 1 object include, sometimes shows a
reduction, unfortunatley most of us compile things slightly more complex
than "hello world" programs.  Your second example also is a fraud in
that it didn't search through a directory containing all the *.o files
normally found in libc.a.  If it had you example would have failed.  In
your "good case" example you only search for the hw.o file mentioned in
the "bad case" portion not a directory containing many *.o files.

More clearly there is no "selection and resolution" phase involved in
your second example, by manually including all the objects ( with *.o )
you are instructing the loader to use "minimum" objects sepsified.  Your
example never invokes the unresolved refrences code that does all the
time consuming things we are discussing.

> -----
> % `hw.o' needs only a few .o files, but hey, I want the results to look
> good.
> -----
> 
> Now, there were only a few .o files involved in this case: hw1 needed
> only the set
> 
> 	_exit.o bcopy.o bzero.o calloc.o cerror.o close.o doprnt.o
> 	exit.o findiop.o flsbuf.o fstat.o getdtablesize.o getpagesize.o
> 	hw.o ioctl.o isatty.o lseek.o makebuf.o malloc.o printf.o
> 	read.o sbrk.o perror.o stdio.o write.o
> 
> which is only 25 out of a potential 317 (that includes a few dozen
> compatibility routines, over one hundred syscalls, etc.).  Real programs
> would need more .o files, and it will indeed require more open calls.
> There is another issue, which I shall address momentarily, and that
> is deciding which .o files are needed (which I did `by hand' above,
> so that it does not count in the output from `time').

So you admit that you didn't scan the full 317, nor the directory that
contained a full 317, you only took the files you needed.  Invalidating
your example.

If you had scanned the full 317 in example 2 using the command indicated
the resulting executable would have been HUGE and this size difference
alone would be the penalty for the "speed" you "gained."  You can, after
all, include any unrelated objects in a load that you chose, so the
loader doesn't have to think about the load much, so it runs faster, so
it wastes time.

>>>>How many times would you have to scan the contents of /usr/lib/*.o to
>>>>load one relatively complex c program (say vn).
> 
>>>Either one time, or (preferably) zero times.
> 
>>A library directory you never scan would be useless. Ncest' Pa?
>>[sic ;-)]
> 
> (Ne c'est pa, if anyone cares... ne ~= not, c'est ~= is, pa ~= way:
> is not that the way.)


> Clearly we are not communicating.
> 
> The linker need not `look at' any .o files.  Its task is to link.  To
> do this it must know which files define needed symbols, and which
> symbols those files need, and so forth, recursively, until all needed
> symbols are satisfied.  Now, how might ld perform that task?
> 
> For an archive random library---/lib/libc.a, for instance---it does not
> scan the entire archive.  It pulls one sub-file out of the archive,
> __.SYMDEF.  This file lists which symbols are defined by which files.
> It does not now list which symbols are needed by which files, but it is
> easy to imagine that, in a new scheme, the file that takes its place
> does.
> 
> So what ld might do, then, is read the `.symtab' file and, using that
> information, recursively build a list of needed .o files.  It could
> then open and link exactly those .o files---never touching any that are
> not needed.  If your C program consists of `main() {}', all you need
> is exit.o.  ld would read exactly two files from the C library.  And
> hey presto! we have scanned the contents of /lib/libc/*.o zero times.
> If your C program was the hello-world example above, ld would read
> exactly 26 files (the 25 .o's plus .symtab)---and again, scan the
> contents of /lib/libc/*.o zero times.

Compare this to:

Read each .a file once.  Juggle the same pointers necessary in both
examples.  Write output.  exit.

LD(1) says that:  If any argument is a library, it is searched exactly
once at the point it is encountered in the argument list.  (order is
only significant in symbol conflict, etc.)

>>I can garentee at least two scans.
> 
> Perhaps you mean something else here: the number of times the kernel
> must look at directory entries to find any given .o file.  If
> directories were as fast as they should be, the answer would be `1'.
> (Consider, e.g., a Unix with hashed directories.)

(in terms of directory scanning)

Scan #1:  Looking for the directories mentioned (e.g. scanning parent
	directories)
Scan #2:  Looking for the .Symtab file.  Repeat #3 for each "archive"
	named.  e.g. /usr/lib/libcurses /usr/lib/libc /usr/lib/libm
	or whatever
Scan #n:  Looking for individual .o files  (then of course there is the
	opening and reading and closing of whatever.)
[Scanning can be reduced with potentially infinite disk buffering, but
who has infinite real memory to put it in?]

Please compare this to:

Scan #1:  Looking for the files mentioned (then of course there is the
	opening and reading and closing of whatever single files.)

Your example is artifically fast because you already did the search and
extract phase manually.  what was the "time" on that?

>>There are a few things I don not have to *test* to know. ... I do not
>>have to implement a looser of a library scheme using a symbol table
>>file and individual object files to know that it is a dumb idea.
>>[elipisis (...) represent conviently removed example of keyed
>>directory scanning (usenet) and arguments as to it's similarity to the
>>system purposed.]
> 
> But if you do not test them, you may still be wrong.  See the hello-
> world example above.  Linking individual .o files is sometimes *faster*
> ---even in the current (4.3BSD-tahoe) system.  And my original point
> still stands: if archives are a necessary efficiency hack, there may
> be something wrong with the underlying system.  I think that, in
> principle, they *should* be unnecessary, and we should come up with
> a way to make them so.  [But I must admit that it is not high on my
> priority list.]

As already stated, your "hello world" example is not accurate because
you did the extraction manually before hand, instead of having the
linker do an intellegent extract based on a symbol table.  The linker
will load as many arbitrary .o files as you like, and quite a lot faster
than the normal simbol refrencing and lookup which is encountered in
both schemes.  In your example there were no unresolved symbols nor
selective loading of objects done by the linker because you had doen the
selecting before hand.  How long did the selecting take you?  was it
longer then the .3u?

Rob.

campbell@redsox.UUCP (Larry Campbell) (12/23/88)

Why not keep directories sorted?  In SysV filesystems this is easy and
relatively inexpensive, since you can assume a fixed 16 bytes per name.
I am also assuming that lookups outnumber creations to a huge degree,
which I'm sure is the case.

Then namei becomes a binary search.

It also means ls doesn't need to sort things any more.
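
The lookup half is then only a few lines; a sketch (not real namei code,
and ignoring how entries are kept in order on creation) over those fixed
16-byte entries:

	#include <string.h>

	struct v7dirent {			/* the classic 16-byte entry */
		unsigned short	d_ino;		/* 0 marks an unused slot */
		char		d_name[14];	/* NUL-padded, not terminated */
	};

	/*
	 * Look up `name' in a directory whose live entries are kept sorted
	 * by name (unused slots compacted out).  Returns the inode number,
	 * or 0 if the name is not present.
	 */
	unsigned short
	dir_bsearch(const struct v7dirent *ents, int nents, const char *name)
	{
		int lo = 0, hi = nents - 1, mid, cmp;

		while (lo <= hi) {
			mid = (lo + hi) / 2;
			cmp = strncmp(name, ents[mid].d_name, 14);
			if (cmp == 0)
				return (ents[mid].d_ino);
			if (cmp < 0)
				hi = mid - 1;
			else
				lo = mid + 1;
		}
		return (0);
	}
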
-- 
Larry Campbell                          The Boston Software Works, Inc.
campbell@bsw.com                        120 Fulton Street
wjh12!redsox!campbell                   Boston, MA 02146

bzs@Encore.COM (Barry Shein) (12/24/88)

It seems to me the whole point is that I can create a file __.SYMDEF
in my object directory and have ld exploit it if it's there, at which
point almost all the complaints become moot.

	-Barry Shein, ||Encore||

jfh@rpp386.Dallas.TX.US (The Beach Bum) (12/26/88)

In article <4468@xenna.Encore.COM> bzs@Encore.COM (Barry Shein) writes:
>It seems to me the whole point is that I can create a file __.SYMDEF
>in my object directory and have ld exploit it if it's there, at which
>point almost all the complaints become moot.

Not quite the complete solution.  You would need a name cache [ thanks
Chris for pointing out how wonderful these are ... ] to reduce the
file name lookup overhead.  Large directories are VERY inefficient to
access.  Frequent lookups in large directories even more so.
-- 
John F. Haugh II                        +-Quote of the Week:-------------------
VoiceNet: (214) 250-3311   Data: -6272  |"Unix doesn't have bugs,
InterNet: jfh@rpp386.Dallas.TX.US       | Unix is a bug"
UucpNet : <backbone>!killer!rpp386!jfh  +--              -- author forgotten --

chris@mimsy.UUCP (Chris Torek) (12/26/88)

(I shall not bother much with technical points, but just clear up a few
things, after which I intend not to say anything further on this
subject.  Winning arguments in comp.unix.wizards, or having the last
word, is not that important to me.  Feel free to press `n' now :-) .
One `>' is rwhite@nusdhub.UUCP (Robert C. White Jr.); two `>>'s is me,
etc.)

First: my original point was a philosophical stance.  If I may
paraphrase Ken Thompson: File formats fill a much-needed gap in the
Unix operating system.  (He may have meant `in the kernel', but I think
the gap is needed everywhere, inasmuch as it is feasible to maintain
it.)  Perhaps the Unix file system *should* be flexible enough not to
require archive libraries for efficiency.  If it is not so, perhaps the
Unix file system is a failure.  [Douse the flames; I said *PERHAPS*!
:-/  Please take careful note of opinion words.  I am also trying not
to insult.]

Now, Mr. White took me to task for (as I understand it) giving even a
moment's consideration to the possibility that, in modern Unix file
systems---by which I mean the like of the BSD FFS or the V9 FS;
rehacked V7 implementations like those in the System V releases I know
of might not apply---directories might in fact suffice to replace `ar'
format libraries.  I said, in essence:  `Maybe it could work!  And if
it does work, maybe we can get rid of this file format, flush the extra
code from make(1) [see parallel discussion of BSD make vs archives],
recreate that much-needed gap, and live happily ever after.'

He (correctly) accused me of not having tested the proposition at all,
and then suggested

>>>... an excersize in intuition and deduction try the following:

Here we had an unfortunate misunderstanding:  I wrote (including the
square brackets):

>>[steps left out: essentially, decide how much time it would take to
>> link-edit by reading individual .o files instead of a .a file.]

This was poor wording on my part, for which I apologise.  I did not
mean `left out of the exercise', I meant `left out of my summary;
not copied into the >>>-level quote'.

At any rate, I then performed a small `proof of concept' test: the kind
of thing that a researcher should do before investing much effort in
some approach, to see whether the effort is worthwhile.  If the `proof'
(which here means `test') fails, the concept is not worth pursuing, not
in that form.  I was quite surprised to find that my simplified test
showed that directory libraries might possibly be *faster* than the
existing archive random library format, in some cases.  The concept
tested well, and therefore might be worth further work, investing
more effort.  It might still be hopeless, of course.  But it looked
promising.

Mr. White then insults my simple test (which, please note, I did not
claim to be a thorough example, or even a typical one---whatever that
may be):

>While the extreme case, e.g. 1 object include, sometimes shows a
>reduction, unfortunatley most of us compile things slightly more complex
>than "hello world" programs.  Your second example also is a fraud in
>that it didn't search through a directory containing all the *.o files
>normally found in libc.a.  If it had you example would have failed.  In
>your "good case" example you only search for the hw.o file mentioned in
>the "bad case" portion not a directory containing many *.o files.

Ah, but I did!  I must admit that my first test was on a small directory,
with only the needed .o files; but I repeated the test, this time using
the sequence:

	mkdir tmp
	cd tmp
	ar x /lib/libc.a

to fill the directory with .o files.  The times taken to link were,
within the resolution afforded (4.3BSD-tahoe on a VAX 8250),
identical.  The current BSD file system does not exhibit the O(n^2)
directory lookup times that more primitive V7-based systems still do;
directory lookup time is somewhere between O(1) and O(n), leaning
heavily towards O(1) (the namei cache averages about 85% effective on
our systems as they are currently used; cache hits are O(1); but I do
not care to guess as to what effects directory libraries might have
on this).

>More clearly there is no "selection and resolution" phase involved in
>your second example, by manually including all the objects ( with *.o )
>you are instructing the loader to use "minimum" objects sepsified.

As I myself pointed out.  Yet:

>Your example never invokes the unresolved refrences code that does all
>the time consuming things we are discussing.

I thought your claim was that the time-consuming part was in opening
N (here 26) files instead of 2.  That time is counted in `system' time.
The system times for the two links were identical.

>So you admit that you didn't scan the full 317, nor the directory that
>contained a full 317, you only took the files you needed.  Invalidating
>your example.

The directory contained all 317.  Ld opened only 25 of those---yet it
took ld no more time to open and read those 25 than it took it to open
and read /lib/libc.a.  One has to wonder why this is so.  (Proof-of-
concept: test! and then think, and test again!)

There are three obvious possibilities: (a) ld is hopeless; (b) the
open() syscall is cheap; (c) the cost of 25 opens is equalled and/or
exceeded by the cost of skipping the 292 files (oh very well, `archive
members' :-) ) contained within /lib/libc.a that were not needed by
this (overly) simple program.  Options (a) and (c) seem likeliest to
me, but the fact remains that *something* is up.

>>So what ld might do, then, is read the `.symtab' file and, using that
>>information, recursively build a list of needed .o files.  It could
>>then open and link exactly those .o files---never touching any that are
>>not needed. ...

>Compare this to:
>Read each .a file once.  Juggle the same pointers necessary in both
>examples.  Write output.  exit.
>LD(1) says that:  If any argument is a library, it is searched exactly
>once at the point it is encountered in the argument list.  (order is
>only significant in symbol conflict, etc.)

LD(1) lies.  At least, it does in 4.3BSD.  The `searched exactly once'
is one of those Little White Lies told to simplify the situation.  It
really means that ld will not back up over its argv list.  If, e.g.,
you ask it to read -lc, then -lI77, you may get unresolved references
from libI77 to libc.  In actuality, ld works something like this:

	do {
		hits = 0;
		for (p = ranlib_list; p < last; p++) {
			if (need_symbol(p->name)) {
				read_member(p->offset);
				hits++;
				off = p->offset;
				while (++p < last && p->offset == off)
					/* void */;
			}
		}
	} while (hits > 0);

This sequence causes such a slew of syscalls---read()s and lseek()s---
that ld can open 25 files in the same amount of system time.  (It will
read several times in the presence of loops in the call graph, since it
is attempting to maintain a partial order.  There are better ways....)
One might wonder, then, if perhaps the BSD file system has file-opening
sufficiently `jazzed up' that any loss of performance in changing from
the current scheme---or better, from an improved edition thereof---to a
directory scheme is small enough to ignore, for the sake of convenience
(just whose convenience, I will not spell out here).

Certainly nothing is free; and certainly name lookups seem inherently
more expensive than the lseek()s and read()s required by the current
scheme---and, before this began, I would have guessed they would be so
much more expensive that my simple test would fail.  Yet I *do* have to
test to know.  So I did, and now I know that directory libraries are
not utterly unworkable.  Less efficient in general, perhaps (though I
do not yet *know* this); but there are many aspects of existing Unix
systems that are less efficient than they might be, usually in the name
of convenience.  And even if, upon further testing, I were to deem the
scheme a failure after all, my philosophical objection still stands:
If name lookups were cheaper, how much would the whole system benefit?
If small files were stored more compactly---libraries do, in the
current system, save space---how much disk might we reclaim?  In
short:  If the file system made libraries unnecessary, would Unix
improve?
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris

peter@thirdi.UUCP (Peter Rowell) (12/29/88)

The following may have been suggested or inferred during this discussion,
but I haven't seen it yet, so here goes.....:

What if a "library" was simply an editable file that contained the
names (possibly including *'s and such) of interesting .o files.
Additionally, there could be an optional SYMDEF file that had the
already-munched global symbol info in it.
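
Concretely, such a library file (the name, the `#' comment, and the
wildcard syntax here are just examples, not a defined format) might
contain no more than:

	# mylib.lib -- members are loaded in the order listed
	/usr/src/lib/mylib/init.o
	/usr/src/lib/mylib/sub/*.o
	../common/misc.o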

The benefits include:

    1.	Files in a library can live anywhere they want, not just
	in a single directory.

    2.	There is no rebuild time. Changing one of the named .o
	files implicitly updates the library as seen by the loader.

    3.	Order can be specified if it is important.

    4.	The storage space involved is trivial.

    5.	The optional SYMDEF file would supply a performance boost
	for frequently accessed libraries (e.g. libc.a).

    6.	If the library-definition-file or any of the "touched" .o's
	(i.e. you were actually going to use it) was younger than a
	pre-existing SYMDEF file, a rebuild of the SYMDEF file is
	done automatically. (creeping featurism!)

This seems to handle what I was getting from Chris Torek's original
posting (quick, cheap libraries), and seems to handle the objections
of some of the other people.  It should be absolutely trivial to add
the first part (the library file itself), and pretty straightforward
to do the SYMDEF stuff.

Comments?

----------------------------------------------------------------------
Peter Rowell				"He made a symbolic gesture."
Third Eye Software, Inc.		+1 415 321 0967
Menlo Park, CA  94025 USA		peter@thirdi.UUCP

jfh@rpp386.Dallas.TX.US (The Beach Bum) (12/30/88)

In article <445@thirdi.UUCP> peter@thirdi.UUCP (Peter Rowell) writes:
>What if a "library" was simply an editable file that contained the
>names (possibly including *'s and such) of interesting .o files.
>Additionally, there could be an optional SYMDEF file that had the
>already-munched global symbol info in it.

For systems with symlinks this reduces to a directory full of symbolic
links.  The `file' containing the names is the directory itself.  Even
without symbolic links, the only restriction is that the files remain
in the same partition or disk.

I'm holding out for a namei cache and Chris' ld code ;-)
-- 
John F. Haugh II                        +-Quote of the Week:-------------------
VoiceNet: (214) 250-3311   Data: -6272  |"I don't stick my tongue out at just
InterNet: jfh@rpp386.Dallas.TX.US       | anybody, only you."  -- Helena Wright
UucpNet : <backbone>!killer!rpp386!jfh  +--------------------------------------

allbery@ncoast.UUCP (Brandon S. Allbery) (12/30/88)

As quoted from <1282@nusdhub.UUCP> by rwhite@nusdhub.UUCP (Robert C. White Jr.):
+---------------
| in article <15126@mimsy.UUCP>, chris@mimsy.UUCP (Chris Torek) says:
| > 
| > In article <1278@nusdhub.UUCP> rwhite@nusdhub.UUCP (Robert C. White Jr.)
| > writes:
| >>Wrong-O kid!  An archive library is a "File which contains the
| >>original contents of zero-or-more external sources, usually text or
| >>object files, which have been reduced to a single system object."
| > 
| > This is an implementation detail.
| 
| Where do you get this "implementation has nothing to do with it" bull?
+---------------

It comes from the fact that when I want to link an object with a library, I
don't care if it's an "ar" archive, "cpio" archive (and btw, your "ar doesn't
preserve file-ness because it archives the contents" argument falls flat on
its silly face when applied to tar or cpio), directory, or little glass
beads.  As long as it works reasonably quickly (and THIS is an implementation
detail; namei CAN be speeded up) I don't care about the implementation.

You're missing the forest for the trees.

++Brandon
-- 
Brandon S. Allbery, comp.sources.misc moderator and one admin of ncoast PA UN*X
uunet!hal.cwru.edu!ncoast!allbery		    ncoast!allbery@hal.cwru.edu
comp.sources.misc is moving off ncoast -- please do NOT send submissions direct
      Send comp.sources.misc submissions to comp-sources-misc@<backbone>.

peter@thirdi.UUCP (Peter Rowell) (12/31/88)

In article <10485@rpp386.Dallas.TX.US> jfh@rpp386.Dallas.TX.US (The Beach Bum) writes:
>In article <445@thirdi.UUCP> peter@thirdi.UUCP (Peter Rowell) writes:
>>What if a "library" was simply an editable file that contained the
>>names (possibly including *'s and such) of interesting .o files.
>>Additionally, there could be an optional SYMDEF file that had the
>>already-munched global symbol info in it.
>
>For systems with symlinks this reduces to a directory full of symbolic
>links.  The `file' containing the names is the directory itself.  Even
>without symbolic links, the only restriction is that the files remain
>in the same partition or disk.

Actually, it doesn't reduce to a directory, because you have no
control over the order of evaluation in a directory.  Also, the
statement "For systems with symlinks" is hardly inclusive of all
OS's that can reasonably claim to be UNIX.  Finally, my suggestion
can easily be made to work under *any* OS, not just those with
symlinks, cheap directories, and tweaked namei caching.

The suggestion may have been simple, but it was not simplistic.

jc@minya.UUCP (John Chambers) (01/08/89)

In article <445@thirdi.UUCP>, peter@thirdi.UUCP (Peter Rowell) writes:
> 
> What if a "library" was simply an editable file that contained the
> names (possibly including *'s and such) of interesting .o files.
> Additionally, there could be an optional SYMDEF file that had the
> already-munched global symbol info in it.
> 
> The benefits include:
> 
[ benefits deleted ]
> 
> This seems to handle what I was getting from Chris Torek's original
> posting (quick, cheap libraries), and seems to handle the objections
> of some of the other people.  It should be absolutely trivial to add
> the first part (the library file itself), and pretty straightforward
> to do the SYMDEF stuff.
> 
> Comments?

Yeah.  I've worked off and on with Intermetrics' set of compilers (the
one designed for doing cross-compiling and downloading for standalone,
embedded microprocessors).  They do exactly this sort of thing.  It
works just fine.  The performance of the linker seems to easily match
the Unix linkers, though the Intermetrics linker actually does quite
a bit more.  They also have "time wasting" features like object files
that are actually printable ASCII.  They still run quite fast.

As for the cost of all the extra directory searching:  Do you think 
that archives don't contain a directory, or that searching through 
them is free?  Why should one expect that the linker's search through 
an (unsorted) archive directory will be any faster than the kernel's?

-- 
John Chambers <{adelie,ima,maynard,mit-eddie}!minya!{jc,root}> (617/484-6393)

[Any errors in the above are due to failures in the logic of the keyboard,
not in the fingers that did the typing.]

rbj@nav.icst.nbs.gov (Root Boy Jim) (01/10/89)

? From: "Robert C. White Jr." <rwhite@nusdhub.uucp>
? in article <15126@mimsy.UUCP>, chris@mimsy.UUCP (Chris Torek) says:
? > In article <1278@nusdhub.UUCP> rwhite@nusdhub.UUCP (Robert C. White Jr.)
[ bantering deleted ]

Chris is being overly modest and polite in even deigning to carry on a
conversation with you, Mr. White. You are clearly out in left field.

Perhaps the most significant question you asked was:

? Who are you anyway?

Hint: find a 4.3 BSD manual and look in the Preface.
Or just listen in here for awhile.

	(Root Boy) Jim Cottrell	(301) 975-5688
	<rbj@nav.icst.nbs.gov> or <rbj@icst-cmr.arpa>
	Crackers and Worms -- Breakfast of Champions!

allbery@ncoast.UUCP (Brandon S. Allbery) (01/15/89)

As quoted from <7@minya.UUCP> by jc@minya.UUCP (John Chambers):
+---------------
| As for the cost of all the extra directory searching:  Do you think 
| that archives don't contain a directory, or that searching through 
| them is free?  Why should one expect that the linker's search through 
| an (unsorted) archive directory will be any faster than the kernel's?
+---------------

Good point.  In fact, "ar" archives (on systems lacking ranlib or the System
V archive TOC) *don't* have indexes, which in the worst case leads to ld
reading the entire archive only to discover that what it's looking for isn't
there.  (This can happen if I include an unnecessary archive reference, e.g.
"-ltermcap" when I'm not actually using termcap.)  Is this optimal?

In fact, "namei" is *less* expensive than searching through a library index
in user code, unless namei() can't be swapped out (I don't see why, the
state is tucked away inside a process's ublock, not global).

[BTW, I notice that my friend of the "an archive and a file are completely
different, you can't exchange one for the other" persuasion is being
silent.  Have I actually managed, for the first time in 6 months, to say
something in this group that contained some intelligence?  Given my past
record of flubs, I find that difficult to believe.  ;-) :-( ]

++Brandon
-- 
Brandon S. Allbery, moderator of comp.sources.misc    allbery@ncoast.org (soon)
uunet!hal.cwru.edu!ncoast!allbery		    ncoast!allbery@hal.cwru.edu
      Send comp.sources.misc submissions to comp-sources-misc@<backbone>
NCoast Public Access UN*X - (216) 781-6201, 300/1200/2400 baud, login: makeuser

buck@siswat.UUCP (A. Lester Buck) (01/16/89)

As quoted from <7@minya.UUCP> by jc@minya.UUCP (John Chambers):
+---------------
| As for the cost of all the extra directory searching:  Do you think 
| that archives don't contain a directory, or that searching through 
| them is free?  Why should one expect that the linker's search through 
| an (unsorted) archive directory will be any faster than the kernel's?
+---------------

When AIX 2.2 makes a new kernel, it bursts the lib0 and lib1 archives,
links, and deletes *.o.  It makes a boring modern-art pattern as
the screen overflows with the link, then overflows again with the rm.
Other Sys VR2 kernel makefiles I have used don't do this.

Is this a common practice?  Is this an efficiency gain, or is there
some other reason for bursting the archives?

-- 
A. Lester Buck		...!uhnix1!moray!siswat!buck