pml@usl.usl.edu (Patrick Landry) (03/24/89)
I am currently putting together a script to save postings to the
sources and binaries groups currently using the extra header lines
a la comp.sources.unix. One thing I would like to see is some
indication of the number of parts in the package in these headers.
Once a directory is created and parts of a package start pouring in,
it is difficult to tell if all the parts have arrived. Currently this
information is only found in the Subject: line and not in any
standard format.

My proposal would be to add a number to the end of the Archive-naem
header such as
	Archive-name: foo/bar/Part01of25
If articles were stored using these archive names, an ls in the
directory would indicate whether all parts were there.
The only other option I have come up with is
	Archive-name: foo/bar/Part25.final
on the last part. Personally I don't care for this as much.

I sent mail to Rich Salz and he suggested opening up the discussion
here. Your thoughts?
--
patrick   pml@usl.usl.edu   ...!uunet!dalsqnt!usl!pml
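A minimal Python sketch of the "ls tells you what is missing" check,
assuming parts are stored under the proposed PartNNofTT names. The
function name missing_parts and the regular expression are
illustrative only; they are not taken from any existing archiver.

```python
import re

# Hypothetical pattern for the proposed "PartNNofTT" file names.
PART_RE = re.compile(r"^Part(\d+)of(\d+)$")


def missing_parts(filenames):
    """Return the sorted part numbers not yet present.

    Returns None when no file follows the PartNNofTT scheme.
    """
    seen = set()
    total = None
    for name in filenames:
        m = PART_RE.match(name)
        if not m:
            continue  # ignore files that do not follow the scheme
        seen.add(int(m.group(1)))
        # If the totals disagree between parts, keep the largest one.
        total = max(total or 0, int(m.group(2)))
    if total is None:
        return None
    return [n for n in range(1, total + 1) if n not in seen]
```

In practice the list of names could come from os.listdir() on the
package directory, e.g. missing_parts(os.listdir("foo/bar")).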
ncoverby@ndsuvax.UUCP (Glen Overby) (03/25/89)
In article <780@usl.usl.edu> pml@usl.usl.edu (Patrick Landry) writes:
>I am currently putting together a script to save
>postings to the sources and binaries groups currently
>using the extra header lines a la comp.sources.unix.
[ ... ]
>My proposal would be to add a number to the end of the
>Archive-naem header such as
>	Archive-name: foo/bar/Part01of25
>If articles were stored using these archive names an ls
>in the directory would indicate whether all parts were there.
>The only other option I have come up with is
>	Archive-name: foo/bar/Part25.final
>on the last part. Personally I don't care for this as much.

If I'm interpreting your proposal correctly, all you're changing is the
last filename element of the Archive-Name line. This might be nice, but
I get the feeling that it would make things a bit more cluttered.

I have a program which will parse the message header and part of the
body for recognisable strings, such as the standard header lines, Rich's
extra header-style lines, uudecode lines, several types of shell
archives, etc. I find the current Archive-Name to work extremely well.
I assume there are many others like me who run such programs; if the
Archive-Name field is going to be significantly modified, an advance
warning should be posted so that those of us with automatic news savers
can update our programs.

As an alternative to modifying the current Archive-Name line, I would
like to propose an additional header-style line, "Archive-Part", of the
format:

	Archive-Part: NN of TT

where NN is the part number and TT is the total number of archives. It
could also be extended for patches, but that might be overkill. A
reasonably smart program could then maintain a database of what programs
have been received in full, and possibly combine and decode (binary) or
un-archive (source) the distribution.

If you want a program to save news, look at the "narc" program in
comp.sources.unix (one of the past 3 volumes). I haven't looked at it
too deeply, but it was a pretty nice program. You might not have to
build your own wheel after all!
--
Glen Overby <ncoverby@plains.nodak.edu>
uunet!ndsuvax!ncoverby (UUCP)   ncoverby@ndsuvax (Bitnet)
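A rough Python sketch of how a news saver might read the proposed
"Archive-Part: NN of TT" line and keep a small database of what has
arrived. Everything here is an assumption for illustration: the header
is only a proposal, and parse_archive_part, record_part and the
dictionary layout are hypothetical names.

```python
import re

# Matches the proposed header, e.g. "Archive-Part: 03 of 12".
ARCHIVE_PART_RE = re.compile(
    r"^Archive-Part:\s*(\d+)\s+of\s+(\d+)\s*$", re.IGNORECASE
)


def parse_archive_part(lines):
    """Return (part, total) from the first Archive-Part line, or None."""
    for line in lines:
        m = ARCHIVE_PART_RE.match(line)
        if m:
            return int(m.group(1)), int(m.group(2))
    return None


def record_part(db, package, part, total):
    """Note that a part has arrived; return True once all parts are in."""
    entry = db.setdefault(package, {"total": total, "parts": set()})
    # Keep the largest total seen in case a later part corrects the count.
    entry["total"] = max(entry["total"], total)
    entry["parts"].add(part)
    wanted = set(range(1, entry["total"] + 1))
    return wanted <= entry["parts"]
```

Once record_part returns True, a saver could go on to combine and
decode or un-archive the distribution as described above.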
gnu@hoptoad.uucp (John Gilmore) (03/27/89)
I have seen plenty of postings that came out as "Part 38 of 37" because
something was forgotten along the way. I don't think this can be
automated.

I'm wondering why you want to automate it, actually. If you want to
know if your archive contains everything ever posted to
comp.sources.unix, compare it to the index that's periodically posted,
or suck down uunet's "ls -lR.Z" from ~uucp or ~tcp and compare it to
yours. This will not only check for all "parts" but will also tell you
if you missed a 1-part thing or a patch.

Also, if this shows that you have *more* stuff archived than uunet or
Rich, they can check whether *they* have a problem :-).
--
John Gilmore    {sun,pacbell,uunet,pyramid,amdahl}!hoptoad!gnu    gnu@toad.com
"Use the Source, Luke...."
Copyright 1989 John Gilmore; you may redistribute only if your recipients may.
mhw@wittsend.LBP.HARRIS.COM (Michael H. Warfield) (03/28/89)
In article <780@usl.usl.edu> pml@usl.usl.edu (Patrick Landry) writes:
>My proposal would be to add a number to the end of the
>Archive-naem header such as
>	Archive-name: foo/bar/Part01of25
>If articles were stored using these archive names an ls
>in the directory would indicate whether all parts were there.

We use "Part01_25", which amounts to the same thing. Yeah, I like this
idea a lot. I have a binary program running the archiving here because
some of the gyrations (compressing and linking the archive name with
the "volume/issue" name) are just too inefficient from a script (even
if it is done only at night).
----
Michael H. Warfield (The Mad Wizard) | gatech.edu!galbp!wittsend!mhw
(404) 270-2123 / 270-2098            | mhw@wittsend.LBP.HARRIS.COM
An optimist believes we live in the best of all possible worlds.
A pessimist is sure of it!
barnett@crdgw1.crd.ge.com (Bruce Barnett) (03/28/89)
>> Archive-name: foo/bar/Part01of25
I have a version of savenews that works with any article, and saves it
under the form
/usr/spool/savenews/news.group/yy-mm/message-id
Where yy-mm is the year and month of the article.
If a compressed version of the file is there, or if two articles
with the same message ID come in, it keeps both copies.
I have scripts that weed out duplicates, compress large files, etc.
Yes, it doesn't handle the archive name, but it doesn't really bother me.
I can grep the LOGS files (which contain the filename and subject line,
one file per newsgroup) and extract all of the pieces with a shell script.
This allows me to do commands like
cd /usr/spool/savenews/LOGS
grep -i emacs *editors* gnu* comp.unix* |grep VMS |browse-articles
Sources usually end up in the same directory anyway.
I can archive older directories onto tape.
And it works for any newsgroup. I have about 100,000 articles on disk
and 200,000 on tape. Then again, I don't have an automated archival retrieval
system in place (this machine is not on the internet and has no UUCP links).
--
Bruce G. Barnett <barnett@crdgw1.ge.com> a.k.a. <barnett@[192.35.44.4]>
uunet!steinmetz!barnett, <barnett@steinmetz.ge.com>
greg@ncr-sd.SanDiego.NCR.COM (Greg Noel) (03/29/89)
In article <780@usl.usl.edu> pml@usl.usl.edu (Patrick Landry) writes:
>My proposal would be to add a number to the end of the
>Archive-naem [sic] header such as
>	Archive-name: foo/bar/Part01of25

In article <2464@ndsuvax.UUCP> ncoverby@ndsuvax.UUCP (Glen Overby) writes:
>... all you're changing is the last
>filename element of the Archive-Name line. This might be nice, but I get
>the feeling that it would make things a bit more cluttered.

I'd like to see it become even more "cluttered" -- I'd like to see the
version identifier and the classification be included as part of the
archive-name. That is, if the posting is, say, a revised version of a
previously-posted game of hangman, the archive-name might be:

	Archive-name: fun/hangman/V01r01-src/Part01of03

(I could imagine that the classification could be even more articulated.)

This way, not only do the pieces of the program end up associated, but
bug fixes and patches could be posted to the same location, thus keeping
them together as well. Suffixes of src, bin, and doc could be attached
to the version specifier to separate particular components of the
package. And it would keep programs of a similar nature together, so
that if you wanted to look for all the games (or to ignore all the
games), that could be automated as well.

That's a rather messy paragraph, but I think you get the general idea.
--
-- Greg Noel, NCR Rancho Bernardo   Greg.Noel@SanDiego.NCR.COM  or  greg@ncr-sd
dhesi@bsu-cs.UUCP (Rahul Dhesi) (03/29/89)
In article <1187@ncr-sd.SanDiego.NCR.COM> greg@ncr-sd.SanDiego.NCR.COM
(Greg Noel) recommends headers like:
>	Archive-name: fun/hangman/V01r01-src/Part01of03

In comp.binaries.ibm.pc, I make sure the Archive-name: is in legal
filename syntax for 4.xBSD, System V, and MS-DOS. This allows the
archive name to be used by MS-DOS users too.

Perhaps what we need is a new header:

	Part-info: total=9, first=part01, last=part09

The first field gives the total number of parts. The second tells you
the archive-name of the first part, and the third tells you the
archive-name of the last part. The *only* difference between the names
of the various parts should be in the last two characters, which should
be two decimal digits. This provides all the information you need to
check for missing parts.

The Part-info header should occur only in the first part of a multipart
posting. If the Part-info header is found again on another part, the
header specifying a greater total= should replace the other. This will
allow the part numbers to be corrected. E.g., we initially post:

	Part-info: total=7, first=part00, last=part06

This means that part00, part01, ..., part06 are the various parts. Then
the moderator suddenly realizes that another part is needed. When he
posts this new part, he includes another Part-info: header:

	Part-info: total=8, first=part00, last=part07

The archiving software finds another Part-info: header, notes that the
total count is now greater, and lets this new header supersede the old
Part-info header.

What if the total was too high? In this case the moderator needs to
post the right number of parts anyway, with the superfluous one(s)
simply being place-holders with a summary line of the type: "place
holder for incorrect part count -- may be deleted". With any luck this
should happen very rarely.
--
Rahul Dhesi   UUCP: <backbones>!{iuvax,pur-ee}!bsu-cs!dhesi
              ARPA: dhesi@bsu-cs.bsu.edu
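The Part-info rules above (parse the three fields, let a larger total=
supersede a smaller one, derive the expected part names from the last
two digits) could be sketched in Python roughly as follows. This is
only an illustration under those stated rules; Part-info is a proposed
header, and the function names are made up for the example.

```python
def parse_part_info(value):
    """Parse the value of a Part-info header, e.g.
    'total=9, first=part01, last=part09'."""
    fields = {}
    for item in value.split(","):
        key, _, val = item.strip().partition("=")
        fields[key] = val
    return {
        "total": int(fields["total"]),
        "first": fields["first"],
        "last": fields["last"],
    }


def merge_part_info(current, new):
    """Apply the supersede rule: the header with the greater total wins."""
    if current is None or new["total"] > current["total"]:
        return new
    return current


def expected_names(info):
    """List every part name; names differ only in the last two digits."""
    stem = info["first"][:-2]
    start = int(info["first"][-2:])
    end = int(info["last"][-2:])
    return [f"{stem}{n:02d}" for n in range(start, end + 1)]
```

Comparing expected_names() against the files actually saved gives the
list of missing parts.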
page%rishathra@Sun.COM (Bob Page) (04/01/89)
I'm not a fan of the 'part X of Y' stuff in the archive name because
moderators tend to miscount. Even so, I do it in the subject line.
If you want to add XofY info, you can get it from there and add it to
the archive name at your leisure. The R$-supplied 'post' program
that many moderators use does this in a standard format. Just don't
start flaming when you see 'Part Y+1 of Y'.
greg@ncr-sd.SanDiego.NCR.COM (Greg Noel) proposed something like:
>Archive-name: fun/hangman/V01r01-src/Part01of03
The problem with this is the sites that use things like tar and cpio
to store the entire collection of a posting. I've had a couple of
requests to limit my archive names to seven unique characters,
so folks could archive the postings as 'pgmname.CPIO.Z' and
still fit in 14 characters (the current SystemV limit). I'm not
sure how the above proposed archive-name would work for those sites.
..bob
Bob Page page@sun.com sun!page 415/336-2745
djz@cbnews.ATT.COM (Danny Zerkel) (04/03/89)
In article <97030@sun.Eng.Sun.COM> page@sun.UUCP (Bob Page) writes:
>I'm not a fan of the 'part X of Y' stuff in the archive name because
>moderators tend to miscount. Even so, I do it in the subject line.
...
>greg@ncr-sd.SanDiego.NCR.COM (Greg Noel) proposed something like:
>>Archive-name: fun/hangman/V01r01-src/Part01of03
>
>The problem with this is the sites that use things like tar and cpio
>to store the entire collection of a posting. I've had a couple of
>requests to limit my archive names to seven unique characters,
>so folks could archive the postings as 'pgmname.CPIO.Z' and
>still fit in 14 characters (the current SystemV limit). I'm not
>sure how the above proposed archive-name would work for those sites.
>
>..bob
>Bob Page page@sun.com sun!page 415/336-2745

I've been doing my best to archive the sources from comp.sources.unix,
comp.sources.games, comp.sources.x, comp.sources.misc, and alt.sources.
I tried comp.binaries.ibm.*, but being unfamiliar with their contents I
was unsure what would be a useful naming scheme.

For the comp.sources.* stuff, I use the following method:

Use regular expressions to search the first 2048 bytes of the article,
looking for lines like:

	Archive-name: fred/Part02
	Archive-name: fred/Patch01
	Archive-name: fred-killer

Parts are then stored thusly:

	fred.02

Usually in a directory reflecting its original group, such as:

	unix/fred.02

Patches have the following appearance:

	unix/fred.p01

And single part postings are given the number 00:

	unix/fred-kille.00

Notice the limit on name length is 10, which leaves 4 characters for
part separation.

These parts and patches are then lumped together in a zoo archive:

	unix/fred.zoo
	unix/fred-kille.zoo

I have only recently started using zoo; before, I was using cpio and
compress and calling the archive fred.ZZ. But these are difficult to
deal with. Zoo seems to work well and the automatic compression is
nice, but it acts a bit twitchy about implied .zoo extensions on
archives. So I've taken to typing the .zoo at all times.

This system easily expands to handle fixes: .f?? (usually the number is
selected by me at random), repostings .r??, and other miscellaneous
bits which I have seen. The only problem with the fixes stuff is that I
can never remember the original archive name for a fix article like:

>
>Wow dudes!  I kept getting core dumps in "Bogus Rouges from Space"
>until I figured out this fix:
>
>	o.main.c: 66327
>		printf("Narley!");
>	main.c: 66327
>		printf("Fer sure!");
>

So if some standardized method of naming and archiving is developed,
I'm very interested in extending it to fixes of an unofficial nature.
Of course, anything would help in alt.sources (i.e., "Subject: Bit
Shuffler PART 1 of 83"). I currently pipe whole newsgroups into my
archiving program, and end up adding more expressions every time it
burps and saves parts 1 through 83 as funky/bitshuff.00.

****************************************************************************
Danny J. Zerkel
AT&T Bell Labs
Maloderous-Cow-Town-USA, OH
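The mapping from Archive-name values to stored file names described in
this post can be sketched in Python as below. It is a reading of the
examples given (fred/Part02, fred/Patch01, fred-killer), not the
poster's actual program; stored_name and the max_stem parameter are
hypothetical names.

```python
import re


def stored_name(archive_name, max_stem=10):
    """Map an Archive-name value to a short file name.

    fred/Part02  -> fred.02
    fred/Patch01 -> fred.p01
    fred-killer  -> fred-kille.00  (single-part posting, stem cut to 10)
    """
    stem, _, tail = archive_name.partition("/")
    stem = stem[:max_stem]  # leaves 4 characters for the suffix
    if not tail:
        return f"{stem}.00"
    m = re.fullmatch(r"Part(\d+)", tail)
    if m:
        return f"{stem}.{int(m.group(1)):02d}"
    m = re.fullmatch(r"Patch(\d+)", tail)
    if m:
        return f"{stem}.p{int(m.group(1)):02d}"
    # Anything else would need another rule, as the post notes.
    raise ValueError(f"unrecognised archive name: {archive_name!r}")
```

With a 10-character stem and a suffix of at most 4 characters, every
result fits the 14-character file name limit mentioned earlier in the
thread.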
greg@ncr-sd.SanDiego.NCR.COM (Greg Noel) (04/05/89)
There seem to be two issues here: Those of us who wish to conveniently
index and retrieve the packages, and those who wish to conveniently
store the packages.

greg@ncr-sd.SanDiego.NCR.COM (that's me) proposed something like:
>Archive-name: fun/hangman/V01r01-src/Part01of03

In article <97030@sun.Eng.Sun.COM> page@sun.UUCP (Bob Page) writes:
>I'm not a fan of the 'part X of Y' stuff in the archive name ....

I don't really care if the '... of Y' stuff is in the archive name; my
point was that the terminal component of the name frequently carries
too much semantic weight. In my example above, Bob might have posted it
as "fun/hangman11src.1" -- that is, the name, version, type, and number
are all squeezed into one component. (And you can't really tell if it's
version one or eleven...) What I'm saying is that I'd like to see the
components separated out to make it easy to archive and retrieve.
Whether it is given as Part03of05 or just Part03 makes little
difference here.

Bob does an admirable job in providing the initial categorization (even
if he occasionally puts something in the "langauge" category); I
consider the categorization to be the \hard/ part. But for those of us
that archive and index these groups, it's not always obvious that
"fun/hangman.p1" is a patch to version 1.1 of the source, while
"fun/hangman/V01r01-src/Patch01" is.

>I've had ... requests to limit my archive names to seven unique characters,
>so folks could archive the postings as 'pgmname.CPIO.Z' and still fit in 14
>characters (the current SystemV limit). ....

(Aside: I'm one of those people using SysV; I'd certainly like to keep
the pathname \components/ under fourteen characters -- that's an
intentional side-effect of my suggestion.)

I think this is a valid point, but it's a different problem. Perhaps
this could be met by dividing the archive-name into two pieces: One to
identify the \package/ and the other one to identify the \component/.
That is, "Archive-name: fun/hangman V01r02-src/Part01" would work if
the pieces were separated by white space. Then those who wish to sort
their indexes so that source, binary, and patches of one version are
kept together can do it, while those who wish to keep compressed cpio
archives can strip off the leading pathname components of the first
piece and store it under "hangman.CPIO.Z".

This is a serious proposal. Right now, I have to duplicate some of
Bob's work (determining the version and part) before I can archive and
index the articles; it's amazing how much effort that can be sometimes.
(Although having to do that really makes one respect just how much work
the moderators put in to have a smoothly-running news group. It's not
at all surprising that they will sometimes make a mistake and have a
'Part 10 of 9' posting!)
--
-- Greg Noel, NCR Rancho Bernardo   Greg.Noel@SanDiego.NCR.COM  or  greg@ncr-sd
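A small Python sketch of the two-piece archive-name idea: split the
value at white space into a package piece and a component piece, then
derive the cpio file name from the last pathname component of the
package. The helper names are invented for illustration, and the
seven-character cut in cpio_name is an assumption based on the
'pgmname.CPIO.Z' request quoted above.

```python
def split_archive_name(value):
    """Split 'fun/hangman V01r02-src/Part01' into (package, component).

    The component is '' when the value has only one piece.
    """
    pieces = value.split(None, 1)  # split on the first run of white space
    if not pieces:
        raise ValueError("empty Archive-name value")
    package = pieces[0]
    component = pieces[1].strip() if len(pieces) > 1 else ""
    return package, component


def cpio_name(package):
    """Drop leading pathname components and build a 14-character-safe
    name for a compressed cpio archive of the whole package."""
    base = package.rsplit("/", 1)[-1]
    return base[:7] + ".CPIO.Z"  # 7 + 7 characters = 14 at most
```

An indexing site would sort on both pieces, while a site storing
compressed cpio archives would use only cpio_name(package).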
allbery@ncoast.ORG (Brandon S. Allbery) (04/12/89)
As quoted from <1220@ncr-sd.SanDiego.NCR.COM> by greg@ncr-sd.SanDiego.NCR.COM (Greg Noel):
+---------------
| >I've had ... requests to limit my archive names to seven unique characters,
| >so folks could archive the postings as 'pgmname.CPIO.Z' and still fit in 14
| >characters (the current SystemV limit). ....
|
| (Aside: I'm one of those people using SysV; I'd certainly like to keep the
| pathname \components/ under fourteen characters -- that's an intentional
| side-effect of my suggestion.)
+---------------

Side comment: I try to keep my archive names to a maximum of 10
characters; I also save compressed cpio's of some things. My convention
is to use the extension ".cZ"....

+---------------
| This is a serious proposal. Right now, I have to duplicate some of Bob's
| work (determining the version and part) before I can archive and index
| the articles; it's amazing how much effort that can be sometimes. (Although
| having to do that really makes one respect just how much work the moderators
| put in to have a smoothly-running news group. It's not at all surprising
| that they will sometimes make a mistake and have a 'Part 10 of 9' posting!)
+---------------

At least once, I've received a "part 5 of 4" from the *author* of a
package.

I'm trying to come up with some kind of convention to make archive
names more useful myself. My conventions go a bit farther than these;
and I'm still contemplating moving parts elsewhere. Example: For about
a month, Archive-names in comp.sources.misc included a compatibility
section (and I have continued this in certain cases); this consists of
something like ".bsd" or ".s5" (or ".xenix" or ".uport", etc.) appended
to the archive name. I've since decided that this belongs in a
standardized Keywords: line, and I'll probably add this feature for
Volume 7 if I have enough time to work on it. (I will include a
comprehensive dictionary of keywords in the Welcome! posting when I
implement this.) By standardizing the contents of the Keywords: line,
you can use it to generate an index to find only, say, System
V-compatible sources.

Thought: extend the auxiliary headers as follows:

	Submitted-by: ...
	Posting-number: ...
	Archive-name: foobar/Part01
	Archive-version: 1.0
	Archive-keywords: C, K&R, SunOS4, NeWS, binary
	Warnings: contains uuencoded, compressed bitmaps

and possibly,

	Moderator: ...

to make it easy to identify postings by a sub-moderator (although this
may not be necessary or even desirable; it's just a thought).

Note that the Archive-keywords: line includes information about the
language required; there have been times when I've FTP'd something
large only to discover that it requires ANSI C (ncoast, being System
III, has neither an ANSI C nor the ability to run GCC -- at least, not
without a massive rewrite and a guarantee of thrashing the system while
it's running...). This also handles the occasional Ada or Pascal source
(and I seem to remember at least one posting in CLU!).

The list above isn't necessarily definitive. Suggestions, anyone?

++Brandon
--
Brandon S. Allbery, moderator of comp.sources.misc	allbery@ncoast.org
uunet!hal.cwru.edu!ncoast!allbery		ncoast!allbery@hal.cwru.edu
Send comp.sources.misc submissions to comp-sources-misc@<backbone>
NCoast Public Access UN*X - (216) 781-6201, 300/1200/2400 baud, login: makeuser
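As a sketch of how a standardized keywords line could drive an index,
the Python below builds a keyword-to-postings table from (archive name,
keywords) pairs. The Archive-keywords header is only a suggestion in
this thread; the function names, the lower-casing of keywords, and the
second sample posting in the usage note are assumptions made for the
example.

```python
def parse_keywords(value):
    """Split 'C, K&R, SunOS4, NeWS, binary' into a set of keywords.

    Keywords are lower-cased so that lookups are case-insensitive.
    """
    return {k.strip().lower() for k in value.split(",") if k.strip()}


def build_index(postings):
    """Build {keyword: set of archive names} from (name, keywords) pairs."""
    index = {}
    for archive_name, keyword_line in postings:
        for keyword in parse_keywords(keyword_line):
            index.setdefault(keyword, set()).add(archive_name)
    return index
```

For example, build_index([("foobar/Part01", "C, K&R, SunOS4"),
("baz", "C, SysV")])["sysv"] would yield only the made-up "baz"
posting.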
greg@ncr-sd.SanDiego.NCR.COM (Greg Noel) (04/14/89)
In article <13558@ncoast.ORG> allbery@ncoast.UUCP (Brandon S. Allbery) writes:
>Thought: extend the auxiliary headers as follows:
>Submitted-By: ...

I'd like to see this expanded to include the author's name (if
different):

	Submitted-by: Joe Jones
	Written-by: Sam Smith

(I know, it's usually in the documentation, but I think it's valuable
to give highlighted credit where credit is due.)

>Archive-name: foobar/Part01
>Archive-version: 1.0

It's my point that this is really foobar/1.0/Part01 (or actually, to
keep the components uniform so that they sort well,
foobar/V01r00/Part01). That is, it's part one of version one, so it
should be possible to sort with the version being more significant than
the part. I'm also tacking on bin, src, or doc (as in
foobar/V01r00-bin/Part01) when I want to indicate what is in the
package, although this distinction may only be useful when archiving
both sources and binaries, as with comp.{sources,binaries}.amiga.

>Archive-keywords: C, K&R, SunOS4, NeWS, binary

I'd find this useful, but I'd rather have the category (or categories)
of the package listed -- it'd be more useful to me if I knew that the
primary category of a MIDI library package was "libraries" but that it
should be cross-filed under "audio/music" as well. I'd buy a
"Categories:" auxiliary header for this purpose; that would be even
better than my proposal, but it would involve a change to the types of
headers, which I was trying to avoid. (What do you think of that, Bob?)
The keywords somehow seem too low a level of description.

>Warnings: contains uuencoded, compressed bitmaps

Hmmmmm..... No opinion here; but then, I'd expect to find this sort of
thing in comp.binaries.amiga.

>Moderator: ...

It seems to me that this is something that should be hashed out among
the moderators themselves.
--
-- Greg Noel, NCR Rancho Bernardo   Greg.Noel@SanDiego.NCR.COM  or  greg@ncr-sd
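The point about uniform components sorting well can be shown with a
short Python sketch that builds names of the form foobar/V01r00/Part01
from a package name, a dotted version and a part number. The function
uniform_name and its arguments are hypothetical, and it assumes
versions of the simple "major.minor" form.

```python
def uniform_name(package, version, part, kind=None):
    """Build 'package/VMMrNN[-kind]/PartPP' with zero-padded fields.

    kind is an optional 'src', 'bin' or 'doc' suffix on the version.
    """
    major, _, minor = version.partition(".")
    component = f"V{int(major):02d}r{int(minor or 0):02d}"
    if kind:
        component += f"-{kind}"
    return f"{package}/{component}/Part{part:02d}"


# Because every field is zero-padded, a plain string sort orders the
# names by version first and part second.
names = sorted([
    uniform_name("foobar", "1.1", 1),
    uniform_name("foobar", "1.0", 2),
    uniform_name("foobar", "1.0", 1),
])
```

Here names comes out with both version 1.0 parts ahead of the
version 1.1 part, which is the ordering described above.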