[net.micro.68k] A Standard Source Archive Format

mwm@ucbopal.berkeley.edu (Mike (I'll be mellow when I'm dead) Meyer) (05/11/86)

In article <1209@lsuc.UUCP> jimomura@lsuc.UUCP (Jim Omura) writes:
>>The par.c mechanism uses the BSD ar format for the file. This format,
>>unlike the SysV ar format, is pure ascii. i.e. if the files par'ed
>>together are all ascii files, the entire file is ascii.
>
>     That's essentially what I wanted to find out.  If there's been
>a substantial effort in that direction, then I think we should use
>'par.c' for a standard archive.  I didn't know how much work had been
>done in either direction.
>
>     Any other comments either way?

YES!

The 4BSD ar format is unsuitable as an OS-9 (and AmigaDOS, Unix,
MS-DOS, etc.) standard archive for one simple reason: it doesn't
understand directories. This is why James Jones wrote something for
OS-9 that correctly disassembles BSD tar files, including the directory
creation, and why I ported it to AmigaDOS - so we could download
directory structures (like microemacs). [For those interested, this
code - in a form that should compile on both OS-9 and AmigaDOS - has
been posted to net.micro.amiga.]

Unfortunately, the 4BSD tar format uses lots of NUL bytes, which will
get eaten by mailers. Also, there is no public domain program for
building tar files (yet).
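
To make the NUL problem concrete, here is a small sketch that builds one
tar header block and counts the NUL bytes in it. It uses Python's
standard tarfile module, and it assumes the ustar layout is close enough
to the 4BSD one for this purpose; the member name hello.c is made up.

```python
import tarfile

# Build the 512-byte header block for a single small member.
# USTAR_FORMAT is an assumed stand-in for the 4BSD tar layout.
info = tarfile.TarInfo(name="hello.c")
info.size = 42
header = info.tobuf(format=tarfile.USTAR_FORMAT)

# Unused parts of the fixed-width fields are padded with NUL bytes.
nul_count = header.count(b"\0")
print(len(header), "byte header,", nul_count, "of them NUL")
```

Most of the block is padding, so a mailer that drops NULs destroys the
fixed-width layout the extractor depends on.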

Might I suggest this problem be tackled a different way: decide what
the tool should be, then what features it has to have, then which of
the public domain archivers will be easiest to modify to do that?

To start it off, I think that what we should really be looking for is
an archive format for moving source through various mailers, and the
archive should be suitable for any system with a Unix-like file
structure, not just OS-9. The discussion I've seen tends to suggest
that this is what people are really looking for, but the Subject line
(which I've changed) didn't suggest that.

I feel that the minimum set of features should be:

	1) PD versions available for most major OS's, both for micros
		and Internet hosts.
	2) The headers contain only characters common to ASCII and
		EBCDIC, and no tabs.
	3) A checksum of some kind is included on each file.
	4) The format includes provisions for creating directories.
	5) It shouldn't choke on binary data in the archive.
	
Some discussion:

1) Obviously, to get as wide a distribution as possible. Probably those
on non-Unix Internet hosts will have to have someone write a version for
that host.

2) Since people on BITNET are interested in sources, we shouldn't make
the headers incompatible with their mailers. This means no non-EBCDIC
characters. Also, BITNET (for some reason) eats tabs, so we shouldn't
put those in the header either. Of course, most source will have them,
but why make things more difficult than we have to?
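
A rough sketch of how an archiver could enforce this on its own header
lines. The set of "safe" characters below is a conservative guess at
what survives an ASCII/EBCDIC gateway, not a verified translation table,
and the function name and sample header text are hypothetical.

```python
import string

# Assumed safe set: letters, digits, space and a little punctuation.
# This is a guess, not a checked ASCII/EBCDIC translation table.
SAFE_HEADER_CHARS = set(string.ascii_letters + string.digits + " ./-_=:")

def header_is_mail_safe(line):
    """Return True if every character of a header line is in the safe set.

    Tabs, control characters and 8-bit characters are rejected simply
    because they are not members of SAFE_HEADER_CHARS.
    """
    return all(ch in SAFE_HEADER_CHARS for ch in line)

print(header_is_mail_safe("FILE src/main.c 1024"))
print(header_is_mail_safe("FILE\tsrc/main.c 1024"))
```

A whitelist is used rather than a list of forbidden characters, so
anything not known to be safe is refused by default.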

3) Of course; required for sending data through the mail.
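
As an illustration only: the list above asks for "a checksum of some
kind", and CRC-32 is just one possible choice. The sketch below uses
zlib.crc32 from the Python standard library; the function names are
hypothetical, and the result is written as hex digits so it stays
within the header character restrictions of point 2.

```python
import zlib

def file_checksum(data):
    """Return a CRC-32 of the file contents as 8 uppercase hex digits."""
    return format(zlib.crc32(data) & 0xFFFFFFFF, "08X")

def verify(data, expected):
    """Return True if data still matches the checksum from the header."""
    return file_checksum(data) == expected

original = b"main() { return 0; }\n"
stored = file_checksum(original)

# Simulate a mailer that turned a tab-free line into something else.
damaged = original.replace(b"return", b"retur")
print(verify(original, stored), verify(damaged, stored))
```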

4) This is harder, as some of the hosts don't have the same directory
format as Unix/OS-9/AmigaDOS. The archive format should specify what
directories look like (probably Unix), and those implementing versions
for other systems can decide how to handle things. For instance, what
does OS-9 do with an archive that has both README and Readme in it?
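
One thing an extractor could do is at least detect the problem before
writing anything. This sketch assumes the target system ignores letter
case in file names, as the README/Readme question implies; the function
name is hypothetical, and what to do with a collision (rename, refuse,
ask) is left to the implementor.

```python
from collections import defaultdict

def find_case_collisions(paths):
    """Group archive paths that differ only in letter case."""
    groups = defaultdict(list)
    for path in paths:
        groups[path.lower()].append(path)
    # Only groups with more than one spelling are a problem.
    return [names for names in groups.values() if len(names) > 1]

print(find_case_collisions(["README", "Readme", "src/main.c"]))
```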

5) Obviously, otherwise we'll have a different format for local use.
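
A minimal sketch of how points 4 and 5 can fit together, using a made-up
layout that is not taken from any existing archiver: each member starts
with a plain text header line, either "DIR path" or "FILE path size",
and a file body is read back by byte count rather than by looking for a
delimiter. It assumes paths contain no spaces.

```python
import io

def write_member(out, path, data=None):
    """Append one member; data=None records a directory."""
    if data is None:
        out.write(("DIR %s\n" % path).encode("ascii"))
    else:
        out.write(("FILE %s %d\n" % (path, len(data))).encode("ascii"))
        out.write(data)
        out.write(b"\n")

def read_members(inp):
    """Yield (kind, path, data) tuples; data is None for directories."""
    while True:
        line = inp.readline()
        if not line:
            return
        fields = line.decode("ascii").split()
        if fields[0] == "DIR":
            yield ("DIR", fields[1], None)
        else:
            size = int(fields[2])
            data = inp.read(size)  # read by count, never by delimiter
            inp.read(1)            # skip the newline after the body
            yield ("FILE", fields[1], data)

# Round trip a directory and a file whose body looks like a header.
blob = b"\x00\x01\nFILE fake 3\n\xff"
buf = io.BytesIO()
write_member(buf, "src")
write_member(buf, "src/blob", blob)
buf.seek(0)
members = list(read_members(buf))
print(members)
```

Because the reader trusts the size field, a body containing NULs,
newlines or text that looks like a header comes back unchanged. Getting
such a body through a mailer is a separate problem.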

Most of this is obvious and straightforward; just thought that it ought
to get said before a decision is made.

Also, will net.micro readers please excuse the cross-posting. I posted to
the original discussion groups, and to net.micro as I thought it was
important enough to need to be seen by that group. Followups have been
pointed to net.micro ONLY.

	Thanx for the time,
	<mike