[comp.sources.d] "Archive-name:" proposed change

pml@usl.usl.edu (Patrick Landry) (03/24/89)

I am currently putting together a script to save
postings to the sources and binaries groups currently
using the extra header lines a la comp.sources.unix.

One thing I would like to see is some indication of the
number of parts in the package in these headers.
Once a directory is created and parts of a package start
pouring in it is difficult to tell if all the parts have
arrived. Currently this information is only found in the
Subject: line and not in any standard format.

My proposal would be to add a number to the end of the
Archive-naem header such as
	Archive-name: foo/bar/Part01of25
If articles were stored using these archive names an ls
in the directory would indicate whether all parts were there.
The only other option I have come up with is 
	Archive-name: foo/bar/Part25.final
on the last part. Personally I don't care for this as much.
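
A minimal sketch of the "ls" check described above, assuming parts are stored
under the proposed PartNNofTT names. The function name missing_parts and the
regular expression are illustrative only and not part of any existing script.

```python
import re

# Hypothetical pattern for the proposed "PartNNofTT" file names.
PART_RE = re.compile(r"^Part(\d+)of(\d+)$")


def missing_parts(filenames):
    """Return the sorted part numbers not yet present in a directory listing.

    Returns None when no PartNNofTT name is found at all.
    """
    seen = set()
    total = None
    for name in filenames:
        m = PART_RE.match(name)
        if m is None:
            continue  # not a part file; ignore it
        seen.add(int(m.group(1)))
        # Assumption: if parts disagree on the total, trust the largest one.
        total = max(total or 0, int(m.group(2)))
    if total is None:
        return None
    return [n for n in range(1, total + 1) if n not in seen]
```

Called on a listing such as ["Part01of03", "Part03of03"], this reports part 2
as missing; in practice the listing could come from os.listdir("foo/bar").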

I sent mail to Rich Salz and he suggested opening up the 
discussion here. Your thoughts?
--
patrick
pml@usl.usl.edu
...!uunet!dalsqnt!usl!pml

ncoverby@ndsuvax.UUCP (Glen Overby) (03/25/89)

In article <780@usl.usl.edu> pml@usl.usl.edu (Patrick Landry) writes:
>I am currently putting together a script to save
>postings to the sources and binaries groups currently
>using the extra header lines a la comp.sources.unix.
        [ ... ]
>My proposal would be to add a number to the end of the
>Archive-naem header such as
>       Archive-name: foo/bar/Part01of25
>If articles were stored using these archive names an ls
>in the directory would indicate whether all parts were there.
>The only other option I have come up with is
>       Archive-name: foo/bar/Part25.final
>on the last part. Personally I don't care for this as much.

If I'm interpreting your proposal correctly, all you're changing is the last
filename element of the Archive-Name line.  This might be nice, but I get
the feeling that it would make things a bit more cluttered.

I have a program which will parse the message header and part of the body
for recognisable strings, such as the standard header lines, Rich's extra
header-style lines, uudecode lines, several types of shell archives, etc.  I
find the current Archive-Name to work extremely well.  I assume there are
many others like me who run such programs; if the Archive-Name field is
going to be significantly modified, advance warning should be posted so
that those of us with automatic news savers can update our programs.

As an alternative to modifying the current Archive-Name line, I would like
to propose an additional header-style line, "Archive-Part", of the format:

        Archive-Part: NN of TT

where NN is the part number and TT is the total number of archives.  It
could also be extended for patches, but that might be overkill.

A reasonably smart program could then maintain a database of what programs
have been received in full, and possibly combine and decode (binary) or
un-archive (source) the distribution.
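
One way such a database might look, as a sketch only: it assumes the
"Archive-Part: NN of TT" syntax proposed above and keeps everything in
memory. The class name PartTracker and its methods are hypothetical.

```python
import re

# Assumed syntax: "Archive-Part: NN of TT", as proposed above.
ARCHIVE_PART_RE = re.compile(
    r"^Archive-Part:\s*(\d+)\s+of\s+(\d+)\s*$", re.IGNORECASE
)


class PartTracker:
    """In-memory record of which parts of each archive have arrived."""

    def __init__(self):
        self.received = {}  # archive name -> set of part numbers seen
        self.totals = {}    # archive name -> most recently announced total

    def record(self, archive, header_line):
        """Record one Archive-Part line; return False if it does not parse."""
        m = ARCHIVE_PART_RE.match(header_line)
        if m is None:
            return False
        part, total = int(m.group(1)), int(m.group(2))
        self.received.setdefault(archive, set()).add(part)
        self.totals[archive] = total
        return True

    def is_complete(self, archive):
        """True once every part from 1 to the announced total is present."""
        total = self.totals.get(archive)
        if total is None:
            return False
        return self.received[archive] >= set(range(1, total + 1))
```

Once is_complete() returns True for an archive, the saver could go on to
combine and unpack the parts.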

If you want a program to save news, look at the "narc" program in
comp.sources.unix (one of the past 3 volumes).  I haven't looked at it too
deeply, but it was a pretty nice program.  You might not have to build
your own wheel after all!
--
                Glen Overby     <ncoverby@plains.nodak.edu>
        uunet!ndsuvax!ncoverby (UUCP)   ncoverby@ndsuvax (Bitnet)

gnu@hoptoad.uucp (John Gilmore) (03/27/89)

I have seen plenty of postings that came out as "Part 38 of 37" because
something was forgotten along the way.  I don't think this can be
automated.

I'm wondering why you want to automate it, actually.

If you want to know if your archive contains everything ever posted to
comp.sources.unix, compare it to the index that's periodically posted,
or suck down uunet's "ls -lR.Z" from ~uucp or ~tcp and compare it to
yours.  This will not only check for all "parts" but will also tell you
if you missed a 1-part thing or a patch.  Also, if this shows that you
have *more* stuff archived than uunet or Rich, they can check
whether *they* have a problem :-).
-- 
John Gilmore    {sun,pacbell,uunet,pyramid,amdahl}!hoptoad!gnu    gnu@toad.com
"Use the Source, Luke...."
Copyright 1989 John Gilmore; you may redistribute only if your recipients may.

mhw@wittsend.LBP.HARRIS.COM (Michael H. Warfield) (03/28/89)

In article <780@usl.usl.edu> pml@usl.usl.edu (Patrick Landry) writes:
>My proposal would be to add a number to the end of the
>Archive-naem header such as
>	Archive-name: foo/bar/Part01of25
>If articles were stored using these archive names an ls
>in the directory would indicate whether all parts were there.

     We use "Part01_25" which amounts to the same thing.  Yeah, I like this idea
a lot.  I have a binary program running the archiving here because some of the
gyrations (compressing and linking the archive name with the "volume/issue"
name) are just too inefficient from a script (even if it is done only at night).

----
Michael H. Warfield  (The Mad Wizard)	| gatech.edu!galbp!wittsend!mhw
  (404)  270-2123 / 270-2098		| mhw@wittsend.LBP.HARRIS.COM
An optimist believes we live in the best of all possible worlds.
A pessimist is sure of it!

barnett@crdgw1.crd.ge.com (Bruce Barnett) (03/28/89)

>>	Archive-name: foo/bar/Part01of25

I have a version of savenews that works with any article, and saves it
under the form
	/usr/spool/savenews/news.group/yy-mm/message-id

Where yy-mm is the year and month of the article.
If a compressed version of the file is there, or if two articles
with the same message ID come in, it keeps both copies.
I have scripts that weed out duplicates, compress large files, etc.
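
The path layout above could be built roughly as follows. This is a sketch
only: the function name save_path is hypothetical, and stripping the angle
brackets from the message ID is an assumption, not something stated here.

```python
import posixpath
from datetime import date

SPOOL = "/usr/spool/savenews"  # root directory taken from the layout above


def save_path(newsgroup, posted, message_id):
    """Build SPOOL/news.group/yy-mm/message-id for one article."""
    yy_mm = posted.strftime("%y-%m")
    # Assumption: the surrounding <> of the Message-ID are dropped.
    return posixpath.join(SPOOL, newsgroup, yy_mm, message_id.strip("<>"))
```

For example, save_path("comp.sources.d", date(1989, 3, 28), "<780@usl.usl.edu>")
gives /usr/spool/savenews/comp.sources.d/89-03/780@usl.usl.edu.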

Yes, it doesn't handle the archive name, but it doesn't really bother me.
I can grep the LOGS files (which contain the filename and subject line,
one file per newsgroup) and extract all of the pieces with a shell script.
This allows me to do commands like
	cd /usr/spool/savenews/LOGS
	grep -i emacs *editors* gnu* comp.unix* |grep VMS |browse-articles

Sources usually end up in the same directory anyway.
I can archive older directories onto tape.

And it works for any newsgroup. I have about 100,000 articles on disk
and 200,000 on tape. Then again, I don't have an automated archival retrieval
system in place (this machine is not on the internet and has no UUCP links).

--
Bruce G. Barnett	<barnett@crdgw1.ge.com>  a.k.a. <barnett@[192.35.44.4]>
			uunet!steinmetz!barnett, <barnett@steinmetz.ge.com>

greg@ncr-sd.SanDiego.NCR.COM (Greg Noel) (03/29/89)

In article <780@usl.usl.edu> pml@usl.usl.edu (Patrick Landry) writes:
>My proposal would be to add a number to the end of the
>Archive-naem [sic] header such as
>       Archive-name: foo/bar/Part01of25

In article <2464@ndsuvax.UUCP> ncoverby@ndsuvax.UUCP (Glen Overby) writes:
>... all you're changing is the last
>filename element of the Archive-Name line.  This might be nice, but I get
>the feeling that it would make things a bit more cluttered.

I'd like to see it become even more "cluttered" -- I'd like to see the
version identifier and the classification be included as part of the
archive-name.  That is, if the posting is, say, a revised version of a
previously-posted game of hangman, the archive-name might be:
	Archive-name: fun/hangman/V01r01-src/Part01of03
(I could imagine that the classification could be even more articulated.)
This way, not only do the pieces of the program end up associated, but
bug fixes and patches could be posted to the same location, thus keeping
them together as well.  Suffixes of src, bin, and doc could be attached
to the version specifier to separate particular components of the package.
And it would keep programs of a similar nature together, so that if you
wanted to look for all the games (or to ignore all the games), that could
be automated as well.

That's a rather messy paragraph, but I think you get the general idea.
-- 
-- Greg Noel, NCR Rancho Bernardo   Greg.Noel@SanDiego.NCR.COM  or  greg@ncr-sd

dhesi@bsu-cs.UUCP (Rahul Dhesi) (03/29/89)

In article <1187@ncr-sd.SanDiego.NCR.COM> greg@ncr-sd.SanDiego.NCR.COM (Greg
Noel) recommends headers like:
>	Archive-name: fun/hangman/V01r01-src/Part01of03

In comp.binaries.ibm.pc, I make sure the Archive-name: is in legal
filename syntax for 4.xBSD, System V, and MS-DOS.  This allows the
archive name to be used by MS-DOS users too.

Perhaps what we need is a new header:

     Part-info:  total=9, first=part01, last=part09

The first field gives the total number of parts.  The second tells you
the archive-name of the first part, and the third tells you the
archive-name of the last part.  The *only* difference between the names
of the various parts should be in the last two characters, which should
be two decimal digits.  This provides all the information you need to
check for missing parts.

The Part-info header should occur only in the first part of a multipart
posting.  If the part-info header is found again on another part, the
header specifying a greater total= should replace the other.  This will
allow the part numbers to be corrected.

E.g., we initially post:

     Part-info:  total=7, first=part00, last=part06

This means that part00, part01, ..., part06 are the various parts.
Then the moderator suddenly realizes that another part is needed.  When
he posts this new part, he includes another Part-info: header:
  
     Part-info:  total=8, first=part00, last=part07

The archiving software finds another Part-info: header, notes that the
total count is now greater, and lets this new header supersede the old
Part-info header.

What if the total was too high?  In this case the moderator needs
to post the right number of parts anyway, with the superfluous
one(s) simply being place-holders with a summary line of the
type:  "place holder for incorrect part count -- may be deleted".
With any luck this should happen very rarely.
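
A sketch of how archiving software might apply these rules, assuming the
exact "total=, first=, last=" syntax shown above. The function names are
hypothetical.

```python
import re

# Assumed syntax, copied from the examples above.
PART_INFO_RE = re.compile(
    r"^Part-info:\s*total=(\d+),\s*first=([^,\s]+),\s*last=([^,\s]+)\s*$"
)


def parse_part_info(line):
    """Parse one Part-info: header into a dict, or return None."""
    m = PART_INFO_RE.match(line)
    if m is None:
        return None
    return {"total": int(m.group(1)), "first": m.group(2), "last": m.group(3)}


def merge_part_info(current, new):
    """Let the header with the greater total= supersede the other."""
    if current is None or new["total"] > current["total"]:
        return new
    return current


def expected_names(info):
    """List every part name; names differ only in the last two digits."""
    prefix = info["first"][:-2]
    lo, hi = int(info["first"][-2:]), int(info["last"][-2:])
    return ["%s%02d" % (prefix, n) for n in range(lo, hi + 1)]
```

Comparing expected_names() against the names actually on disk then shows
which parts are still missing.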
-- 
Rahul Dhesi         UUCP:  <backbones>!{iuvax,pur-ee}!bsu-cs!dhesi
                    ARPA:  dhesi@bsu-cs.bsu.edu

page%rishathra@Sun.COM (Bob Page) (04/01/89)

I'm not a fan of the 'part X of Y' stuff in the archive name because
moderators tend to miscount.  Even so, I do it in the subject line.
If you want to add XofY info, you can get it from there and add it to
the archive name at your leisure.  The R$-supplied 'post' program
that many moderators use does this in a standard format.  Just don't
start flaming when you see 'Part Y+1 of Y'.

greg@ncr-sd.SanDiego.NCR.COM (Greg Noel) proposed something like:
>Archive-name: fun/hangman/V01r01-src/Part01of03

The problem with this is the sites that use things like tar and cpio
to store the entire collection of a posting.  I've had a couple of
requests to limit my archive names to seven unique characters,
so folks could archive the postings as 'pgmname.CPIO.Z' and
still fit in 14 characters (the current SystemV limit).  I'm not
sure how the above proposed archive-name would work for those sites.

..bob
Bob Page    page@sun.com    sun!page    415/336-2745

djz@cbnews.ATT.COM (Danny Zerkel) (04/03/89)

In article <97030@sun.Eng.Sun.COM> page@sun.UUCP (Bob Page) writes:
>I'm not a fan of the 'part X of Y' stuff in the archive name because
>moderators tend to miscount.  Even so, I do it in the subject line.
...
>greg@ncr-sd.SanDiego.NCR.COM (Greg Noel) proposed something like:
>>Archive-name: fun/hangman/V01r01-src/Part01of03
>
>The problem with this is the sites that use things like tar and cpio
>to store the entire collection of a posting.  I've had a couple of
>requests to limit my archive names to seven unique characters,
>so folks could archive the postings as 'pgmname.CPIO.Z' and
>still fit in 14 characters (the current SystemV limit).  I'm not
>sure how the above proposed archive-name would work for those sites.
>
>..bob
>Bob Page    page@sun.com    sun!page    415/336-2745

I've been doing my best to archive the sources from comp.sources.unix,
comp.sources.games, comp.sources.x, comp.sources.misc, and alt.sources.
I tried comp.binaries.ibm.*, but being unfamiliar with their contents
I was unsure what would be a useful naming scheme.

For the comp.sources.* stuff, I use the following method:

Use regular expressions to search the first 2048 bytes of the article--
looking for lines like:

  Archive-name: fred/Part02
  Archive-name: fred/Patch01
  Archive-name: fred-killer

Parts are then stored thusly:
fred.02

Usually in a directory reflecting its original group, such as:
unix/fred.02

Patches have the following appearance:
unix/fred.p01

And single part postings are given the number 00:
unix/fred-kille.00

Notice the limit on name length is 10, which leaves 4 characters for
part separation.  These parts and patches are then lumped together in
a zoo archive:
unix/fred.zoo
unix/fred-kille.zoo
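
The mapping from Archive-name to stored file name described above might be
sketched like this. It assumes only the three one-level forms listed
(name/PartNN, name/PatchNN, and a bare name); the function name stored_name
is hypothetical.

```python
import re

NAME_LIMIT = 10  # leaves 4 of the 14 characters for ".NN" or ".pNN"


def stored_name(archive_name, group_dir):
    """Map an Archive-name value to the file name it is stored under."""
    base, _, last = archive_name.partition("/")
    base = base[:NAME_LIMIT]
    m = re.fullmatch(r"(Part|Patch)(\d+)", last)
    if m is None:
        suffix = "00"  # single-part posting
    elif m.group(1) == "Part":
        suffix = "%02d" % int(m.group(2))
    else:
        suffix = "p%02d" % int(m.group(2))
    return "%s/%s.%s" % (group_dir, base, suffix)
```

With group_dir set to "unix", the three sample headers come out as
unix/fred.02, unix/fred.p01 and unix/fred-kille.00.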

I have only recently started using zoo; before that I was using cpio and compress
and calling the archive fred.ZZ.  But these are difficult to deal with.
Zoo seems to work well and the automatic compression is nice, but it acts
a bit twitchy about implied .zoo extensions on archives.  So I've taken to
typing the .zoo at all times.

This system easily expands to handle fixes: .f?? (usually the number is
selected by me at random), repostings .r??, and other miscellaneous bits
which I have seen.  The only problem with the fixes stuff is that I can
never remember the original archive name for a fix article like:

>
>Wow dudes!  I kept getting core dumps in "Bogus Rouges from Space"
>until I figured out this fix:
>
>  o.main.c: 66327
>       printf("Narley!");
>  main.c: 66327
>	printf("Fer sure!");
>

So if some standardized method of naming and archiving is developed, I'm
very interested in extending it to fixes of an unofficial nature.  Of course,
anything would help in alt.sources (ie, "Subject: Bit Shuffler PART 1 of 83").

I currently pipe whole newsgroups into my archiving program, and end up
adding more expressions every time it burps and saves parts 1 through 83
as funky/bitshuff.00.

****************************************************************************
Danny J. Zerkel
AT&T Bell Labs
Maloderous-Cow-Town-USA, OH

greg@ncr-sd.SanDiego.NCR.COM (Greg Noel) (04/05/89)

There seem to be two issues here:  Those of us who wish to conveniently
index and retrieve the packages, and those who wish to conveniently store
the packages.

greg@ncr-sd.SanDiego.NCR.COM (that's me) proposed something like:
>Archive-name: fun/hangman/V01r01-src/Part01of03

In article <97030@sun.Eng.Sun.COM> page@sun.UUCP (Bob Page) writes:
>I'm not a fan of the 'part X of Y' stuff in the archive name ....

I don't really care if the '... of Y' stuff is in the archive name; my
point was that the terminal component of the name frequently carries too
much semantic weight.  In my example above, Bob might have posted it as
"fun/hangman11src.1" -- that is, the name, version, type, and number are
all squeezed into one component.  (And you can't really tell if it's
version one or eleven...)  What I'm saying is that I'd like to see the
components separated out to make it easy to archive and retrieve.  Whether
it is given as Part03of05 or just Part03 makes little difference here.

Bob does an admirable job in providing the initial categorization (even if
he occasionally puts something in the "langauge" category); I consider the
categorization to be the \hard/ part.  But for those of us that archive and
index these groups, it's not always obvious that "fun/hangman.p1" is a patch
to version 1.1 of the source, while "fun/hangman/V01r01-src/Patch01" is.

>I've had ... requests to limit my archive names to seven unique characters,
>so folks could archive the postings as 'pgmname.CPIO.Z' and still fit in 14
>characters (the current SystemV limit).  ....

(Aside: I'm one of those people using SysV; I'd certainly like to keep the
pathname \components/ under fourteen characters -- that's an intentional
side-effect of my suggestion.)

I think this is a valid point, but it's a different problem.  Perhaps this
could be met by dividing the archive-name into two pieces: One to identify
the \package/ and the other one to identify the \component/.  That is,
"Archive-name: fun/hangman V01r02-src/Part01" would work if the pieces were
separated by white space.  Then those who wish to sort their indexes so
that source, binary, and patches of one version are kept together can do
it, while those who wish to keep compressed cpio archives can strip off
the leading pathname components of the first piece and store it under
"hangman.CPIO.Z".
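
A sketch of how a saver might handle the two-piece form, assuming the
package and component are separated by white space as suggested above.
Both function names are hypothetical.

```python
import posixpath


def split_archive_name(value):
    """Split 'fun/hangman V01r02-src/Part01' into (package, component)."""
    pieces = value.split(None, 1)
    if not pieces:
        return "", ""
    package = pieces[0]
    component = pieces[1].strip() if len(pieces) > 1 else ""
    return package, component


def cpio_name(value):
    """Strip the leading pathname components and add '.CPIO.Z'."""
    package, _ = split_archive_name(value)
    return posixpath.basename(package) + ".CPIO.Z"
```

Sites that index by version would use both pieces, while a site keeping
compressed cpio archives would call only cpio_name() and ignore the rest.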

This is a serious proposal.  Right now, I have to duplicate some of Bob's
work (determining the version and part) before I can archive and index
the articles; it's amazing how much effort that can be sometimes.  (Although
having to do that really makes one respect just how much work the moderators
put in to have a smoothly-running news group.  It's not at all surprising
that they will sometimes make a mistake and have a 'Part 10 of 9' posting!)
-- 
-- Greg Noel, NCR Rancho Bernardo   Greg.Noel@SanDiego.NCR.COM  or  greg@ncr-sd

allbery@ncoast.ORG (Brandon S. Allbery) (04/12/89)

As quoted from <1220@ncr-sd.SanDiego.NCR.COM> by greg@ncr-sd.SanDiego.NCR.COM (Greg Noel):
+---------------
| >I've had ... requests to limit my archive names to seven unique characters,
| >so folks could archive the postings as 'pgmname.CPIO.Z' and still fit in 14
| >characters (the current SystemV limit).  ....
| 
| (Aside: I'm one of those people using SysV; I'd certainly like to keep the
| pathname \components/ under fourteen characters -- that's an intentional
| side-effect of my suggestion.)
+---------------

Side comment:  I try to keep my archive names to a maximum of 10 characters;
I also save compressed cpio's of some things.  My convention is to use the
extension ".cZ"....

+---------------
| This is a serious proposal.  Right now, I have to duplicate some of Bob's
| work (determining the version and part) before I can archive and index
| the articles; it's amazing how much effort that can be sometimes.  (Although
| having to do that really makes one respect just how much work the moderators
| put in to have a smoothly-running news group.  It's not at all surprising
| that they will sometimes make a mistake and have a 'Part 10 of 9' posting!)
+---------------

At least once, I've received a "part 5 of 4" from the *author* of a package.

I'm trying to come up with some kind of convention to make archive names
more useful myself.  My conventions go a bit farther than these; and I'm
still contemplating moving parts elsewhere.

Example:  For about a month, Archive-names in comp.sources.misc included a
compatibility section (and I have continued this in certain cases); this
consists of something like ".bsd" or ".s5" (or ".xenix" or ".uport", etc.)
appended to the archive name.  I've since decided that this belongs in a
standardized Keywords: line, and I'll probably add this feature for Volume 7
if I have enough time to work on it.  (I will include a comprehensive
dictionary of keywords in the Welcome! posting when I implement this.)  By
standardizing the contents of the Keywords: line, you can use it to generate
an index to find only, say, System V-compatible sources.

Thought:  extend the auxiliary headers as follows:

Submitted-by: ...
Posting-number: ...
Archive-name: foobar/Part01
Archive-version: 1.0
Archive-keywords: C, K&R, SunOS4, NeWS, binary
Warnings: contains uuencoded, compressed bitmaps

and possibly,

Moderator: ...

to make it easy to identify postings by a sub-moderator (although this may
not be necessary or even desirable; it's just a thought).
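
As a rough sketch of how a saver could read such a block: the code below
assumes the auxiliary headers sit at the top of the article body as simple
"Name: value" lines ended by a blank line. The function names are
hypothetical.

```python
def parse_aux_headers(lines):
    """Collect 'Name: value' lines up to the first blank or malformed line."""
    headers = {}
    for line in lines:
        if not line.strip():
            break  # a blank line ends the auxiliary header block
        name, sep, value = line.partition(":")
        if not sep:
            break  # not a header line; stop here
        headers[name.strip().lower()] = value.strip()
    return headers


def keywords(headers):
    """Split the comma-separated Archive-keywords: value into a list."""
    raw = headers.get("archive-keywords", "")
    return [k.strip() for k in raw.split(",") if k.strip()]
```

An indexer could then select, say, every posting whose keyword list contains
"SunOS4" without opening the sources themselves.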

Note that the Archive-keywords: line includes information about the language
required; there have been times when I've FTP'd something large only to
discover that it requires ANSI C (ncoast, being System III, has neither an
ANSI C nor the ability to run GCC -- at least, not without a massive rewrite
and a guarantee of thrashing the system while it's running...).  This also
handles the occasional Ada or Pascal source (and I seem to remember at least
one posting in CLU!).

The list above isn't necessarily definitive.  Suggestions, anyone?

++Brandon
-- 
Brandon S. Allbery, moderator of comp.sources.misc	     allbery@ncoast.org
uunet!hal.cwru.edu!ncoast!allbery		    ncoast!allbery@hal.cwru.edu
      Send comp.sources.misc submissions to comp-sources-misc@<backbone>
NCoast Public Access UN*X - (216) 781-6201, 300/1200/2400 baud, login: makeuser

greg@ncr-sd.SanDiego.NCR.COM (Greg Noel) (04/14/89)

In article <13558@ncoast.ORG> allbery@ncoast.UUCP (Brandon S. Allbery) writes:
>Thought:  extend the auxiliary headers as follows:

>Submitted-By: ...

I'd like to see this expanded to include the author's name (if different):
  Submitted-by: Joe Jones   Written-by: Sam Smith
(I know, it's usually in the documentation, but I think it's valuable to
give highlighted credit where credit is due.)

>Archive-name: foobar/Part01
>Archive-version: 1.0

It's my point that this is really foobar/1.0/Part01 (or actually, to keep
the components uniform so that they sort well, foobar/V01r00/Part01).  That
is, it's part one of version one, so it should be possible to sort with the
version being more significant than the part.  I'm also tacking on bin, src,
or doc (as in foobar/V01r00-bin/Part01) when I want to indicate what is in
the package, although this distinction may only be useful when archiving both
sources and binaries, as with comp.{sources,binaries}.amiga.
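
To illustrate the sorting point, here is a small sketch that turns a dotted
version such as "1.0" into the fixed-width VxxRyy form used above. The
function name version_component is hypothetical, and only "major.minor"
version strings are assumed.

```python
def version_component(version, kind=None):
    """Format '1.0' as 'V01r00', optionally with a -src/-bin/-doc suffix."""
    major, _, minor = version.partition(".")
    comp = "V%02dr%02d" % (int(major), int(minor or 0))
    return comp + "-" + kind if kind else comp
```

Because both fields are zero-padded to two digits, a plain string sort puts
foobar/V01r02/Part01 ahead of foobar/V01r10/Part01, which an unpadded
"1.2" versus "1.10" would not.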

>Archive-keywords: C, K&R, SunOS4, NeWS, binary

I'd find this useful, but I'd rather have the category (or categories) of the
package listed -- it'd be more useful to me if I knew that the primary
category of a MIDI library package was "libraries" but that it should be
cross-filed under "audio/music" as well.  I'd buy a "Categories:" auxiliary
header for this purpose; that would be even better than my proposal, but
it would involve a change to the types of headers, which I was trying to
avoid.  (What do you think of that, Bob?)  The keywords somehow seem too
low a level of description.

>Warnings: contains uuencoded, compressed bitmaps

Hmmmmm.....  No opinion here; but then, I'd expect to find this sort of
thing in comp.binaries.amiga.

>Moderator: ...

It seems to me that this is something that should be hashed out among the
moderators themselves.
-- 
-- Greg Noel, NCR Rancho Bernardo   Greg.Noel@SanDiego.NCR.COM  or  greg@ncr-sd