[comp.unix.wizards] IEEE 1003.2

barmar@think.COM (Barry Margolin) (12/12/88)

In article <9137@smoke.BRL.MIL> gwyn@brl.arpa (Doug Gwyn (VLD/VMB) <gwyn>) writes:
>Along these lines, does anybody know what on earth has possessed the
>IEEE 1003.2 working group?  They seem to be redesigning the standard
>utilities, in almost every case making them worse instead of better.
>They even had to debate whether "ar" should be usable with non-object
>module files!  (The latest minutes show that the -r option has been
>removed from "ar"; I sure hope that's not true!)  Somehow I missed
>getting into the balloting group for 1003.2, but I sure hope that
>there are enough proponents of clean design to keep the current mess
>from becoming a standard that will adversely affect the systems we
>have to use in the future.

As I understand it, POSIX is just a minimum.  Just because POSIX
doesn't require a command to have a particular option or capability,
that doesn't mean that vendors must remove that feature.  So, the
point is not whether "ar" CAN be usable with non-object module files,
but whether it MUST be usable with them.  Sounds like they are
requiring "ar" to be enough to create object libraries, not a
general-purpose archiving utility.

Systems that aren't really Unix, but wish to provide POSIX
compatibility, will appreciate that IEEE 1003.2 isn't including all of
Unix in the standard.  Regarding this particular example, many already
have their own general-purpose archiving utility, so they wouldn't
need a full-featured "ar".


Barry Margolin
Thinking Machines Corp.

barmar@think.com
{uunet,harvard}!think!barmar

chris@mimsy.UUCP (Chris Torek) (12/12/88)

>In article <9137@smoke.BRL.MIL> gwyn@brl.arpa (Doug Gwyn) writes:
>>They even had to debate whether "ar" should be usable with non-object
>>module files! ...

In article <33251@think.UUCP> barmar@think.COM (Barry Margolin) writes:
>... the point is not whether "ar" CAN be usable with non-object module
>files, but whether it MUST be usable with them.  Sounds like they are
>requiring "ar" to be enough to create object libraries, not a
>general-purpose archiving utility.

For that matter, why do we need object archives in the first place?
They are just a hack to save space (and perhaps, but not necessarily,
time).  How about /lib/libc/*.o?

(About 1/2 :-) ---the file system is *supposed* to be clean enough and
fast enough to support this sort of thing; why *are* we working against
it?)
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris

ekrell@hector.UUCP (Eduardo Krell) (12/12/88)

In article <14946@mimsy.UUCP> chris@mimsy.UUCP (Chris Torek) writes:

>For that matter, why do we need object archives in the first place?
>They are just a hack to save space (and perhaps, but not necessarily,
>time).  How about /lib/libc/*.o?

inodes were a scarce resource back then ...

We had an implementation of archives as plain directories (with the proper
changes to ar, ld, etc), and it worked just fine.
    
Eduardo Krell                   AT&T Bell Laboratories, Murray Hill, NJ

UUCP: {att,decvax,ucbvax}!ulysses!ekrell  Internet: ekrell@ulysses.att.com

gwyn@smoke.BRL.MIL (Doug Gwyn ) (12/13/88)

In article <33251@think.UUCP> barmar@kulla.think.com (Barry Margolin) writes:
>So, the point is not whether "ar" CAN be usable with non-object module files,
>but whether it MUST be usable with them.  Sounds like they are
>requiring "ar" to be enough to create object libraries, not a
>general-purpose archiving utility.

But MY point is that "ar" is practically the ONLY UNIX utility to have been
hammered over the years into producing a completely portable format (when
used for text files).  Even cpio -c and tar formats fail to be as portable.
This was a deliberate design choice that was eventually adopted for the
major UNIX variants.  The evolution to that format involved some fairly
painful accommodations in SGS software, but the price was considered
worthwhile in trade-off for the benefits of a portable character format.
For a group supposedly concerned about portability to be apparently unaware
of this important historical fact makes me wonder also about the other
choices they're making.

rwhite@nusdhub.UUCP (Robert C. White Jr.) (12/13/88)

in article <14946@mimsy.UUCP>, chris@mimsy.UUCP (Chris Torek) says:
> For that matter, why do we need object archives in the first place?
> They are just a hack to save space (and perhaps, but not necessarily,
> time).  How about /lib/libc/*.o?

Wrong! camel breath ;-)

The *proper* use of object libraries is to *organize* your objects into
a useful search order.  How many times would you have to scan the contents
of /usr/lib/*.o to load one relatively complex C program (say vn)?

Because modules call other modules that the program itself didn't reference,
you introduce the likelihood that the directory would have to be searched
multiple times.  If you tried to alleviate that, the files would have to be
named to reflect dependencies instead of content.  Then you would have all
the extra system calls needed to open, search, and close all those files.

A properly linked and tabled library is much better for such things.

Rob.

pcg@aber-cs.UUCP (Piercarlo Grandi) (12/14/88)

In article <9154@smoke.BRL.MIL> gwyn@brl.arpa (Doug Gwyn (VLD/VMB) <gwyn>)
writes:

    But MY point is that "ar" is practically the ONLY UNIX utility to have
    been hammered over the years into producing a completely portable format
    (when used for text files).

I do agree! Ar format is the great obscure portable format!

    Even cpio -c and tar formats fail to be as portable.

I have a small reservation about this.  Actually (and unfortunately) ar(5) is
not as portable in practice as it is in theory.

The problem is that the length of each archive member is encoded in its
header, so that if tabs are (un)expanded or lines get padded to a fixed
length or trailing white space is trimmed, you are quite out of luck.
(as in uploading/downloading something using cu/tip to/from a "dumb" system).

Of course you can still recognize the member headers by the leading "magic
string" and check for the subsequent structure, but existing ar(1) utilities
rely absolutely on the character count.
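A minimal Python sketch of that failure, assuming the common ASCII ar format: the "!<arch>\n" magic string, 60-byte member headers whose decimal size field occupies bytes 48 to 57 and which end in "`\n", and member data padded to an even length. The helpers make_ar and read_ar are hypothetical; member names are assumed to fit in 16 bytes and are space-padded in the BSD style.

```python
AR_MAGIC = b"!<arch>\n"
HDR_LEN = 60

def make_ar(members):
    """Build an archive from (name, data) pairs; names must fit in 16 bytes."""
    out = bytearray(AR_MAGIC)
    for name, data in members:
        hdr = (name.encode().ljust(16)        # member name, space padded
               + b"0".ljust(12)               # modification time
               + b"0".ljust(6) + b"0".ljust(6)  # uid, gid
               + b"644".ljust(8)              # mode (octal text)
               + str(len(data)).encode().ljust(10)  # size in bytes (decimal text)
               + b"`\n")                      # header terminator
        assert len(hdr) == HDR_LEN
        out += hdr + data
        if len(data) % 2:
            out += b"\n"  # next header must start on an even offset
    return bytes(out)

def read_ar(blob):
    """Return (name, data) pairs, trusting each header's size field."""
    if not blob.startswith(AR_MAGIC):
        raise ValueError("not an ar archive")
    pos, members = len(AR_MAGIC), []
    while pos < len(blob):
        hdr = blob[pos:pos + HDR_LEN]
        if len(hdr) < HDR_LEN or hdr[58:60] != b"`\n":
            raise ValueError(f"corrupt member header at offset {pos}")
        name = hdr[0:16].decode().rstrip()
        size = int(hdr[48:58].decode().strip())
        pos += HDR_LEN
        members.append((name, blob[pos:pos + size]))
        pos += size + (size % 2)  # skip data plus any padding byte
    return members
```

If a transfer expands the single tab in a member such as b"x\ty\n" into spaces, the size field still says 4, so read_ar lands in the middle of the text where it expects the next header and reports a corrupt header.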

Another archival format, Unix mailbox format, is, as far as I can see,
absolutely portable, but then it is not especially convenient for storing
files (well, I do that, for sources fished off the net).

All this discussion of portability, to one end: there have been some
complaints about the shar file format, and a search for alternatives. In my
view ar(5) format can be a very good alternative; if no funny things are
expected on white space, ar(5) is probably the best format, as virtually
everybody has an ar(1) extractor or can build one in virtually no time (using
shell scripts...), and at worst, you can just edit the ar(5) file.

Mailbox format is an alternative, as there are lots of mailers that
understand it and can give a nice view of a mailbox archive. Extraction to
named file is less easy, and usually has to be manual (e.g. by using the "w"
command in Mail/mailx), and you have to undo manually any '>' insertion
done before lines beginning with "From " in your files.
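The '>' insertion can be made reversible. The Python sketch below (hypothetical function names) uses the later convention usually called "mboxrd" rather than the classic rule: every body line matching `>*From ` gains one more '>', so stripping exactly one '>' from such lines restores the original. Under the classic rule, which quotes only lines starting with "From ", a quoted line and a file line that already began with ">From " become indistinguishable, which is why the undo has to be done by hand.

```python
import re

# A "From " line, possibly already preceded by one or more '>' characters.
_FROM_LINE = re.compile(r">*From ")

def quote_body(text):
    """Prefix one '>' to every line matching >*From (mboxrd-style quoting)."""
    lines = text.splitlines(keepends=True)
    return "".join(">" + ln if _FROM_LINE.match(ln) else ln for ln in lines)

def unquote_body(text):
    """Reverse quote_body by removing one '>' from every line matching >+From ."""
    lines = text.splitlines(keepends=True)
    return "".join(ln[1:] if ln.startswith(">") and _FROM_LINE.match(ln) else ln
                   for ln in lines)
```

After quoting, no body line can start with "From ", so a mailbox reader will not split the stored file at the wrong place.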
-- 
Piercarlo "Peter" Grandi			INET: pcg@cs.aber.ac.uk
Sw.Eng. Group, Dept. of Computer Science	UUCP: ...!mcvax!ukc!aber-cs!pcg
UCW, Penglais, Aberystwyth, WALES SY23 3BZ (UK)

bzs@Encore.COM (Barry Shein) (12/16/88)

In article <14946@mimsy.UUCP>, chris@mimsy.UUCP (Chris Torek) writes:
> For that matter, why do we need object archives in the first place?
> They are just a hack to save space (and perhaps, but not necessarily,
> time).  How about /lib/libc/*.o?
> 
> (About 1/2 :-) ---the file system is *supposed* to be clean enough and
> fast enough to support this sort of thing; why *are* we working against
> it?)

For that matter why not just combine tar and ar and add a flag to tar
to include an archive symbol table (and have tar recognize this has
been done on input.) It seems the two functions of these utilities
barely need to be distinguished. A simple shell script could replace
"ar" for backwards compatibility. Or vice versa.
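A rough Python sketch of this idea, using the standard tarfile module: an index member is written first, and the reader checks whether the first member is an index. The member name __.SYMDEF is borrowed from BSD ranlib, but the one-line-per-symbol text format and the function names are assumptions made for illustration. Each tar member still costs a 512-byte header plus data rounded up to 512 bytes, which is the space overhead Burditt quantifies.

```python
import io
import tarfile

SYMDEF = "__.SYMDEF"  # name borrowed from BSD ranlib; the text format is hypothetical

def write_library(members, symbols):
    """members: {member name: bytes}; symbols: {symbol: member name}.
    The index is written as the first member of the tar archive."""
    index = "".join(f"{sym} {name}\n" for sym, name in sorted(symbols.items())).encode()
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in [(SYMDEF, index)] + sorted(members.items()):
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()

def read_index(blob):
    """Return {symbol: member name} if the first member is an index, else None."""
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r") as tar:
        members = tar.getmembers()
        if not members or members[0].name != SYMDEF:
            return None  # an ordinary tar archive with no symbol table
        text = tar.extractfile(members[0]).read().decode()
    return dict(line.split(" ", 1) for line in text.splitlines())
```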

	-Barry Shein, ||Encore||

dhesi@bsu-cs.UUCP (Rahul Dhesi) (12/17/88)

No archive format that allows any of the following can be considered
portable:

    lines consisting of a single dot
    lines beginning with the word "From"
    lines ending with blanks
    lines containing embedded tab characters
    lines containing control characters
    lines containing graphic characters (e.g. { } \ |)
       for which there is no standard EBCDIC convention
    lines that exceed 80 characters

There is no existing portable text archive format.
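As an illustration only, the list above can be turned into a line checker. This is a hypothetical Python sketch: the set of risky graphic characters is just the examples given in the post, not a real EBCDIC mapping table, and "blanks" is taken to mean spaces or tabs.

```python
# Hypothetical: the graphic characters named as examples above; a real list
# would depend on which EBCDIC code pages are involved.
RISKY_GRAPHICS = set("{}\\|")
MAX_LINE = 80

def portability_problems(line):
    """Return the reasons, following the list above, that one line may not survive transfer."""
    problems = []
    if line == ".":
        problems.append("single dot")
    if line == "From" or line.startswith("From "):
        problems.append('begins with "From"')
    if line.endswith((" ", "\t")):
        problems.append("trailing blanks")
    if "\t" in line:
        problems.append("embedded tab")
    if any(ord(c) < 32 or ord(c) == 127 for c in line if c != "\t"):
        problems.append("control character")
    if any(c in RISKY_GRAPHICS for c in line):
        problems.append("risky graphic character")
    if len(line) > MAX_LINE:
        problems.append(f"longer than {MAX_LINE} characters")
    return problems

def check_text(text):
    """Map 1-based line numbers to their problems, omitting clean lines."""
    report = {}
    for lineno, line in enumerate(text.split("\n"), start=1):
        problems = portability_problems(line)
        if problems:
            report[lineno] = problems
    return report
```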
-- 
Rahul Dhesi         UUCP:  <backbones>!{iuvax,pur-ee}!bsu-cs!dhesi

gwyn@smoke.BRL.MIL (Doug Gwyn ) (12/17/88)

In article <5203@bsu-cs.UUCP> dhesi@bsu-cs.UUCP (Rahul Dhesi) writes:
>There is no existing portable text archive format.

You seem to be thinking of mailer problems.  That's not what
I had in mind when discussing portable archive formats.

Discussion of whether or not a file format is portable should
start by assuming that it is bit-for-bit transferred without
corruption of information it contains to the target system.
Then you should consider whether a reasonable program that
interprets the archive's contents would work without change
and without system-specific tailoring on both systems.

wnp@dcs.UUCP (Wolf N. Paul) (12/17/88)

In article <5203@bsu-cs.UUCP> dhesi@bsu-cs.UUCP (Rahul Dhesi) writes:
>No archive format that allows any of the following can be considered
>portable:
>
>    ... lines deleted ...
>    lines containing embedded tab characters
>    lines containing control characters
>    lines containing graphic characters (e.g. { } \ |)
>       for which there is no standard EBCDIC convention
>There is no existing portable text archive format.

Well, what is an archive program to do with these instances in order to
achieve portability????

Some DATA is non-portable, and any archive containing such data as a 
result is non-portable, also. That does not make the archive format
non-portable.

Since on some systems lines containing all of the above are both allowed
and sometimes necessary (certainly for binaries), an archive format
which does not allow them would not be portable.

Unless of course you mean that an archive program should handle such
lines in a manner analogous to uuencode or atob; but why make that part
of the archiver?

-- 
Wolf N. Paul * 3387 Sam Rayburn Run * Carrollton TX 75007 * (214) 306-9101
UUCP:     killer!dcs!wnp                 ESL: 62832882
DOMAIN:   dcs!wnp@killer.dallas.tx.us    TLX: 910-380-0585 EES PLANO UD

gordon@sneaky.TANDY.COM (Gordon Burditt) (12/19/88)

>For that matter why not just combine tar and ar and add a flag to tar
>to include an archive symbol table (and have tar recognize this has
>been done on input.) It seems the two functions of these utilities
>barely need to be distinguished. A simple shell script could replace

This is likely to cost quite a bit in disk space to store libraries, and
would probably at least double storage requirements.  

Lots of the object files in libraries are tiny.  A considerable number of
them just load a couple of registers and execute some kind of trapping
instruction.  These are likely to be smaller than the amount of disk space
used by an inode plus a directory entry.  

Take, for example, the libc.a on my system (Tandy 6000).
There are 202 files in libc.a, and it takes up 211 512-byte blocks, not
counting indirect blocks.  The average size of an object file, including
the ar header, is 534 bytes.  

If you used tar, the same library would take up 404 512-byte blocks,
minimum, and the average size of an object file, including the tar
header, would be at least 1024 bytes.  (None of the object files are
empty, so each would take up 1 block for the data plus one for the
tar header.)  Actually the total size would be closer to 530 blocks.

If you used the (Sys V) file system with a 512-byte blocksize, the files
would take up 328 512-byte blocks for data, plus 26 blocks for inodes,
plus 7 blocks for directory entries (calculated for the SysV file
system; the BSD file system would use about the same for directory
entries), for a total of 361 512-byte blocks.

If you used the (Sys V) file system with a 1024-byte blocksize, the
files would take up about 492 512-byte blocks, plus 26 blocks for 
inodes and 8 blocks for directory entries, for a total of 526 512-byte 
blocks.

Summary:
Method		Size (512-byte blocks)
ar		211
Filesys 512	361
Filesys 1024	526
tar		530
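The filesystem rows can be reproduced with a little arithmetic. This Python sketch assumes 64-byte on-disk System V inodes and 16-byte directory entries (a 2-byte inode number plus a 14-character name); the per-file data totals of 328 and 492 blocks cannot be derived from the averages, so they are taken from the post.

```python
import math

FILES = 202            # members in the Tandy 6000 libc.a (from the post)
AVG_AR_MEMBER = 534    # average bytes per member, ar header included (from the post)
DATA_BLOCKS = {512: 328, 1024: 492}  # data totals in 512-byte units (from the post)
INODE_BYTES = 64       # assumed: on-disk System V inode
DIRENT_BYTES = 16      # assumed: System V directory entry (2-byte inode + 14-char name)

def blocks(nbytes, blksize):
    """Whole blocks of size blksize needed to hold nbytes."""
    return math.ceil(nbytes / blksize)

def fs_overhead(blksize):
    """Inode plus directory-entry blocks for FILES files, in 512-byte units."""
    scale = blksize // 512
    inode_blocks = blocks(FILES * INODE_BYTES, blksize) * scale
    dir_blocks = blocks(FILES * DIRENT_BYTES, blksize) * scale
    return inode_blocks + dir_blocks

def summary():
    return {
        "ar": blocks(FILES * AVG_AR_MEMBER, 512),
        "tar (minimum)": FILES * 2,  # one header block plus at least one data block each
        "Filesys 512": DATA_BLOCKS[512] + fs_overhead(512),
        "Filesys 1024": DATA_BLOCKS[1024] + fs_overhead(1024),
    }
```

With 512-byte blocks the overhead is 26 inode blocks plus 7 directory blocks; with 1024-byte blocks it is 26 plus 8 (in 512-byte units). The tar entry here is the 404-block minimum, not the roughly 530-block estimate.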

Disk space is NOT the only important feature about an object library
format.  It's not totally unimportant, either.

Disk I/O for typical link:
	These are all fairly close to the same, as long as a convention
exists to find the index easily.  Tar formats would probably have to
adopt a method of reading the last block and scanning backwards to find
the index.

Ease of updating:
	Filesystem libraries are a big win.  Tar loses, since it adds 
replacement object files on the end without removing the old one, thus, 
the library is perpetually growing.  The index-builder also has to be 
careful not to include references to outdated modules.  
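A hypothetical sketch of that index-builder problem: given the members of an append-only archive in order, only the last copy of each member name should contribute symbols. Symbols are supplied as plain lists here rather than read from real object files, and the function name is made up for illustration.

```python
def build_index(members):
    """Map each defined symbol to (member name, position in archive).

    members: (member_name, defined_symbols) pairs in archive order, where a
    replacement copy of a module has simply been appended (as tar -r does).
    Only the last copy of each member name is treated as current.
    """
    current = {}
    for position, (name, symbols) in enumerate(members):
        current[name] = (position, symbols)  # a later copy overwrites an earlier one
    index = {}
    for name, (position, symbols) in current.items():
        for sym in symbols:
            index[sym] = (name, position)
    return index
```

A symbol defined only by a stale copy drops out of the index, and the recorded position tells the linker which of the duplicate members to seek to.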
	
Portability:
	Filesystems are not very portable by themselves, but there
are plenty of tools to package bunches of files (like tar) for
transportation.  Tar and ar formats are reasonably portable
(I did not say mailable or postable).  Of course, object files
themselves have limited portability.

					Gordon L. Burditt
					...!texbell!sneaky!gordon

kai@uicsrd.csrd.uiuc.edu (12/20/88)

> In article <14946@mimsy.UUCP>, chris@mimsy.UUCP (Chris Torek) writes:
> For that matter, why do we need object archives in the first place?
> They are just a hack to save space (and perhaps, but not necessarily,
> time).  How about /lib/libc/*.o?

Object libraries serve other functions as well (for some people).

Outside of object libraries, routines that reference each other will not
have their external references resolved correctly unless the object files
are linked in the correct order.  Put them in a library that was processed
by ranlib (or newer versions of ar), and ld handles it okay.  Maybe this is a
problem with ld.

Recursive references between two or more objects?  No matter how you order
the objects, you can forget it without libraries.  Okay, maybe this is
another problem with "ld".

Large numbers of object files?  You've apparently never worked on a program
so huge that */*.o expands to overflow the shell's command line buffer, so
there is absolutely no way to link without storing them all in a library
first.

You have a number of objects and want to test a new version of a single
object?  Well, cc -o testpgm main.o mytest.o mylib/*.o won't work, because
you'll get the original mytest.o as well, causing link errors; however,
cc -o testpgm main.o mytest.o -lmylib does work.
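A toy Python model of the behaviour described above, assuming a linker that loads every named .o file unconditionally and then repeatedly consults the library's symbol table until no more undefined symbols can be satisfied. The function and module names are hypothetical, and each module is reduced to a pair of (defined symbols, referenced symbols). It shows mutually recursive members resolving regardless of order, and a library member being skipped because the test version of mytest.o already defines its symbol.

```python
def link(objects, library):
    """Toy linker. objects and library map a module name to (defines, references).

    Every explicit object is loaded; a library member is loaded only when it
    defines a symbol that is still undefined. Returns module names in load order.
    """
    loaded, defined, undefined = [], {}, set()

    def load(name, module):
        defines, references = module
        for sym in defines:
            if sym in defined:
                raise ValueError(f"{sym} multiply defined in {defined[sym]} and {name}")
            defined[sym] = name
        loaded.append(name)
        undefined.update(references)
        undefined.difference_update(defined)  # drop anything now satisfied

    for name, module in objects.items():
        load(name, module)

    # Library symbol table: symbol -> member, the kind of index ranlib provides.
    symtab = {sym: name for name, (defines, _) in library.items() for sym in defines}
    pulled = set()
    progress = True
    while undefined and progress:
        progress = False
        for sym in sorted(undefined):
            member = symtab.get(sym)
            if member is not None and member not in pulled:
                pulled.add(member)
                load(member, library[member])
                progress = True
                break  # the undefined set changed, so start the scan again
    if undefined:
        raise ValueError(f"undefined symbols: {sorted(undefined)}")
    return loaded
```

Passing the library's members as plain objects instead (the mylib/*.o case) loads both copies of mytest.o and raises the multiply-defined error.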

Patrick Wolfe  (pat@kai.com, kailand!pat)

andrew@alice.UUCP (Andrew Hume) (12/20/88)

Rob, before badmouthing other people's ideas, check your facts.
Many Unixes (BSD, System V, ...) have symbol tables for their object archives;
the same thing could be done for directories. Even a simple-minded ld
could arrange things to remember the symbol table from each .o it looks
at and thus read each .o at most twice.

as one might say, as obvious as the nose on your face.
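The two-read bound is easy to see in a sketch. In this hypothetical Python model, a directory library is a mapping from file name to (defined, referenced) symbols, and every access counts as one read. Pass 1 reads each file once and caches its symbols (this cached table is the kind of thing a `__.SYMDEF` file in the directory could store). Resolution then happens in memory, and pass 2 reads only the files actually needed.

```python
from collections import Counter

class ObjectDir:
    """Hypothetical stand-in for a directory of .o files.

    Each file is modelled as (defined_symbols, referenced_symbols); every
    access is counted as one read of that file.
    """

    def __init__(self, files):
        self.files = files
        self.reads = Counter()

    def read(self, name):
        self.reads[name] += 1
        return self.files[name]


def link_from_dir(objdir, undefined):
    """Return the files needed to satisfy `undefined`, reading each file at most twice."""
    # Pass 1: read every file once and cache its symbols.
    table = {name: objdir.read(name) for name in sorted(objdir.files)}
    provider = {sym: name for name, (defs, _) in table.items() for sym in defs}

    # Resolve entirely from the cached table; no further reads, whatever the order.
    needed, pending = [], set(undefined)
    while pending:
        sym = pending.pop()
        name = provider.get(sym)
        if name is None:
            raise ValueError(f"undefined symbol: {sym}")
        if name not in needed:
            needed.append(name)
            pending.update(table[name][1])

    # Pass 2: read only the needed files, once each, to load their contents.
    for name in needed:
        objdir.read(name)
    return needed
```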

rml@hpfcdc.HP.COM (Bob Lenk) (12/22/88)

> But MY point is that "ar" is practically the ONLY UNIX utility to have been
> hammered over the years into producing a completely portable format (when
> used for text files).  Even cpio -c and tar formats fail to be as portable.

Interchange formats are part of the charter of 1003.1, not 1003.2.
1003.1 standardized both cpio -c and extended tar.  As far as I know, ar
format was never considered.  I'm relatively sure it wasn't brought up
in anyone's ballot.  It seems inappropriate for 1003.2 to attempt to
standardize ar as a better interchange format.

1003.1 is aware of limitations to extended tar and cpio, and is
considering proposals for (hopefully one) better archive/interchange
format.  Inputs to the 1003 mailing list and/or working group meetings
are welcome.  Informal inputs in forums like this may also prove useful.

		Bob Lenk
		hplabs!hpfcla!rml
		rml%hpfcla@hplabs.hp.com

allbery@ncoast.UUCP (Brandon S. Allbery) (12/26/88)

As quoted from <1269@nusdhub.UUCP> by rwhite@nusdhub.UUCP (Robert C. White Jr.):
+---------------
| in article <14946@mimsy.UUCP>, chris@mimsy.UUCP (Chris Torek) says:
| > For that matter, why do we need object archives in the first place?
| > They are just a hack to save space (and perhaps, but not necessarily,
| > time).  How about /lib/libc/*.o?
| 
| Wrong! camel breath ;-)
| 
| The *proper* use of object libraries is to *organize* your objects into
| a useful search order.  How many times would you have to scan the contents
| of /usr/lib/*.o to load one relatively complex c program (say vn).
+---------------

Why can't there be /lib/libc/__.SYMDEF?  It could even keep the same format
as it has now.  After all, it's just stored as another file in the archive.
(So is the System V archive symbol table, although that would have to be
given a real name to work in this case.)

Let's not insult unix.gurus without thinking, shall we?

++Brandon
-- 
Brandon S. Allbery, comp.sources.misc moderator and one admin of ncoast PA UN*X
uunet!hal.cwru.edu!ncoast!allbery		    ncoast!allbery@hal.cwru.edu
comp.sources.misc is moving off ncoast -- please do NOT send submissions direct
      Send comp.sources.misc submissions to comp-sources-misc@<backbone>.

rbj@nav.icst.nbs.gov (Root Boy Jim) (01/04/89)

? From: Chris Torek <chris@mimsy.uucp>

? For that matter, why do we need object archives in the first place?
? They are just a hack to save space (and perhaps, but not necessarily,
? time).  How about /lib/libc/*.o?

Which is what `-lc' would `mean'. How about /lib/c/*.o?

? (About 1/2 :-) ---the file system is *supposed* to be clean enough and
? fast enough to support this sort of thing; why *are* we working against
? it?)

Yeah, except that the directories aren't sorted or hashed so you can't
use them very well as a database :-)

(yes, I know how to sort directories)

? From: Eduardo Krell <ekrell@hector.uucp>

? inodes were a scarce resource back then ...

? We had an implementation of archives as plain directories (with the proper
? changes to ar, ld, etc), and it worked just fine.

Exactly! And `ranlib dir' just creates `dir/__.SYMDEF'!

? Eduardo Krell                   AT&T Bell Laboratories, Murray Hill, NJ

? From: Gordon Burditt <gordon@sneaky.tandy.com>

? Lots of the object files in libraries are tiny.  A considerable number of
? them just load a couple of registers and execute some kind of trapping
? instruction.  These are likely to be smaller than the amount of disk space
? used by an inode plus a directory entry.  

? Take, for example, the libc.a on my system (Tandy 6000).
? There are 202 files in libc.a, and it takes up 211 512-byte blocks, not
? counting indirect blocks.  The average size of an object file, including
? the ar header, is 534 bytes.  

So glom some of them together! Chances are, if you're gonna use `open',
you're gonna use `close', and most likely `read', `write', and `lseek' too.
And even if you don't, the minimal extra space isn't gonna kill you.

And with shared libraries, it doesn't even matter, as you get the whole
thing anyway (at least I think so :-).

	(Root Boy) Jim Cottrell	(301) 975-5688
	<rbj@nav.icst.nbs.gov> or <rbj@icst-cmr.arpa>
	Crackers and Worms -- Breakfast of Champions!

aglew@urbana.mcd.mot.com (01/04/89)

...> Library archives as directories ( /lib/c/*.o)

Why not be able to do it both ways?  Why not define a
"filesystem within a file" filesystem type, so that you
can look at a library both as a file and as a directory.
In some ways, the raw disk is already like this.

Has anyone else out there got a "filesystem within a
file"? Methinks der Mouse has something like that at
McGill. Here, when the Little Software House on the
Prairie was owned by Gould, we did something very like
that for generating distributions, although we never
put in the kernel hooks (couldn't figure out how to
make them secure).

Andy "Krazy" Glew   aglew@urbana.mcd.mot.com   uunet!uiucdcs!mcdurb!aglew
   Motorola Microcomputer Division, Champaign-Urbana Design Center
	   1101 E. University, Urbana, Illinois 61801, USA.
   
My opinions are my own, and are not the opinions of my employer, or
any other organisation. I indicate my company only so that the reader
may account for any possible bias I may have towards our products.