[comp.sources.d] Archiving Sources

kent@ssbell.UUCP (Kent Landfield) (03/20/89)

J. Eric Townsend in <405@flatline.UUCP> writes:
# >In article <590@alice.marlow.uucp> fox@marlow.UUCP (Paul Fox) writes:
# >>In article <2706@rtech.rtech.com> daveb@rtech.com (Dave Brower) writes:
# Please move this discussion to comp.sources.d.  Alt.sources is for source
# postings only.  Posting messages here plays havoc with automatic archivers.

It should not play havoc with the archivers, just force the archive 
administrator at the site to review the archived members of alt.sources,
trashing non-source members.

Chip Rosenthal in <727@vector.UUCP> writes:
# [discussion moved from alt.sources -- what a novel concept]
# Automatic archiving of alt.sources? Ha! What a joke.
# I've long ago decided it was much easier to manually archive the "wheat"
# rather than automatically archive everything and throw out the "chaff".

I disagree! I do not have the time on a daily basis to manually save the
"wheat". I am envious of your position if you truly have time to waste. :-)
I find it much easier to automatically archive the entire newsgroup. Then,
when I can find the time, I remove members that are non-sources. 5 minutes
a week max...

Chip Rosenthal in <727@vector.UUCP> writes:
# The only way archiving is going to work for alt.sources is if posters
# start using secondary headers (a la comp.sources newsgroups) when real
# sources are posted there.  Personally, I'd like to see something like:
#     Submitted-by: My Name <my!address>
#     Posting-number: Volume 89
#     Archive-name: pgm_name
# This leaves off the "Issue" part of the "Posting-number" field.  Roll the
# volume number annually.  12 chars or less on archive name, please, to
# allow for ".Z".  Comments?

Well this is not the *only* way it will work but it is a method that fits
the r$alz approach. Chip's intent is to move more toward the standard method
of labeling archive members so that existing (and about to be released) 
archivers can deal with alt.sources. Why stop at alt.sources ? Why not
have a source posting program so that all sources posted to any non-moderated
newsgroup on the net have the same format of auxiliary headers. In this manner 
a process could be set on the news directory daily that would archive *all*
sources sent to *any* group.
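
A minimal sketch of what such a daily process could look for, in Python.
It assumes the auxiliary headers are the leading "Name: value" lines of the
article body, in the form Chip proposed. The function name and the list of
recognized names are illustrative and not taken from any existing archiver.

```python
AUX_NAMES = ("submitted-by", "posting-number", "archive-name")

def parse_aux_headers(article):
    """Return a dict of auxiliary headers found at the top of the body.

    Assumes the news headers end at the first blank line and that the
    auxiliary headers are the first "Name: value" lines after it.
    """
    _, _, body = article.partition("\n\n")
    found = {}
    for line in body.splitlines():
        name, sep, value = line.partition(":")
        if not sep or name.strip().lower() not in AUX_NAMES:
            break  # first line that is not an auxiliary header ends the block
        found[name.strip().lower()] = value.strip()
    return found
```

An empty result would mean "no auxiliary headers, probably not a source
posting", so the archiver could skip the article or fall back to another
method.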

(Rich, are you listening ? Is it time for a posting of post.c ?)

Jef Poskanzer in <10985@well.UUCP> writes:
# Why do you need Submitted-by: when you already have From:?  Why do you
# need a number, especially if it's only going to change once per year?

Chip Rosenthal in <763@vector.UUCP> writes:
# To minimize the chance of breaking existing archiving programs.

This is altogether too true. Everyone has their own method of archiving
sources. There has been no real tool for the average site to use. Each
site had to come up with their own tools to get the job done. To some I 
have run into, archiving is almost as touchy an issue as RHF.(:-)

Jef Poskanzer in <10985@well.UUCP> writes:
# Why do you need a separate Archive-name:, when some simple conventions
# for what goes in the Subject: line would work even better?  

For referencing sources, it is much easier to specify an Archive-name
than to rely on the standardization of new "simple conventions". I
can't see how the Subject: line would work "even better".  The Subject 
line should be used for informing us as to what the contents of the archive 
member are in English, not some cryptic "convention".

Jef Poskanzer in <10985@well.UUCP> writes:
# Why do you want to store the postings in a filename specified by the poster,
# with all the security issues that brings up?

I really fail to see a security issues problem as long as archivers do not
use absolute paths. Source archiving should be done from a separate uid/gid,
as it stands now, NOT as root. Posting software should not allow
absolute paths, and the archivers would enforce this by only placing the 
files within an archive directory.

Jef Poskanzer in <10985@well.UUCP> writes:
# What's wrong with just saving the posting in a numbered file, grepping
# out the From: and Subject: lines to save in an index, and compressing
# the file?

Chip Rosenthal in <763@vector.UUCP> writes:
# Boy...that's a giant step backwards.  What's wrong with doing all of
# that, but using a meaningful name instead of a random number?

Nothing is wrong with Jef's approach. It is the Message-ID method of
archiving. It is the method used by many software archivers for groups 
that do not have auxiliary headers. Optimally, all source posted to any
newsgroup would have headers specifying the appropriate information so
that archivers could use a "meaningful name". Currently though, that is 
not the case.
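
For groups without auxiliary headers, the index that Jef describes could be
built roughly as follows. This is only a sketch in Python: the tab-separated
layout and the function name are my own assumptions, continuation lines in
headers are ignored, and the compression step is left out.

```python
def index_entry(number, article):
    """Build one index line from an article's From: and Subject: headers."""
    headers, _, _ = article.partition("\n\n")
    wanted = {"from": "", "subject": ""}
    for line in headers.splitlines():
        name, sep, value = line.partition(":")
        if sep and name.lower() in wanted:
            wanted[name.lower()] = value.strip()
    # number <TAB> From <TAB> Subject, one line per archived article
    return "%d\t%s\t%s" % (number, wanted["from"], wanted["subject"])
```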

Presently, there are three methods of archiving:

o Archive-name - The moderators of *most* sources groups assign an
   official Archive-name to each article that gets submitted to the net. 
   The Archive-name line in each file has a "new-login" or "elm/part06" 
   type of format.  For multi-part postings, a subdirectory is created 
   (as indicated in the elm example) to hold the separate "parts". This 
   format is used by many large archive sites because it is easier for 
   retrieval via mail request software such as netlib. The filenames also
   give hints as to what the software is.

o Volume-Issue - Software sent via *most* moderated sources groups has an 
   assigned Volume and Issue number. This allows the moderators to track and 
   reference the individual items that have been posted to the group. Each
   individual article is given an "Issue" number. The Issues are grouped 
   together into a "Volume". There are roughly 100 articles in each Volume 
   but this is an arbitrary split totally up to each moderator.  This format 
   is extremely useful when the software archives are cataloged. It makes 
   searching of the files quicker and verification of complete volumes 
   easier. This archive format is recommended for any site that will be doing 
   massive searches of the individual volumes since it keeps the quadratic 
   nature of directory searches from making your life miserable. 

o Message-ID   - The news software stores the articles locally by naming the 
   news article by a number generated on every site. The Message-ID number 
   ordering is unique to each site. If a Message-ID archive method is used,
   (or required by the newsgroup), the news article file is copied to the 
   archive directory.  The name of the archived article will match the 
   Message-ID number of the article contained within.  There is nothing 
   wrong with this method as long as an index is generated to assist in 
   archive member identification. This method is in use for alt.sources 
   and comp.sources.mac at ssbell since neither group has auxiliary
   headers.
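
To make the difference between the three methods concrete, here is a rough
Python sketch of how an archiver might pick the storage path for each one.
The "volumeNN/NNN" directory layout, the method labels, and the function
name are assumptions for illustration. The Posting-number line is assumed
to read "Volume N, Issue M", and the third method is labeled by the
site-local article number it stores under.

```python
import re

def archive_path(method, aux, article_number):
    """Pick the relative path an article would be stored under.

    method is "archive-name", "volume-issue" or "article-number";
    aux is a dict of auxiliary headers keyed by lower-case name.
    """
    if method == "archive-name":
        return aux["archive-name"]              # e.g. "elm/part06"
    if method == "volume-issue":
        m = re.match(r"Volume (\d+), Issue (\d+)$", aux["posting-number"])
        if m is None:
            raise ValueError("no volume/issue in Posting-number")
        return "volume%02d/%03d" % (int(m.group(1)), int(m.group(2)))
    return str(article_number)                  # number assigned by this site
```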

Chip Rosenthal in <763@vector.UUCP> writes:
# I'm sorry, I missed the counterproposal in your message.  Were you trying
# to say that one shouldn't try to archive alt.sources?  Or were you just
# trying to trash my suggestion?  Nor have you explained why this is such
# a crummy idea for alt.sources even though it seems to work for the other
# sources newsgroups.

No. Jef was not trying to say it was a crummy idea. Discussion between 
people on an idea _usually_ produces a better solution. I did not see
the replies as flames, but as a "why bother in a group where anything 
goes, when archiving can be accomplished already". I disagree with Jef 
here. I would like to see a move towards auxiliary headers on all sources
posted to any newsgroup, whether they are moderated or not. I do not have
the time to read every single article in every single news group to examine
if it contains sources or not. I HATE it when I need something only to find
out that it went through a newsgroup I never read two weeks before I was 
forced to write it out of necessity. Alt.sources was established as a place
where people could get sources posted without going through a moderator.
The group was established for source postings, *not* discussions. 

Cool heads can accomplish something for the net, not just alt.sources.
Source/patches distribution needs to be improved in all NON-moderated
sources groups. How about modifications to Pnews/postnews so that:

Does this posting contain compilable sources ? [n]: y
Please enter requested Archive-name: pgm_name

produces an auxiliary header of the type,

	Submitted-by: My Name <my!address>
	Archive-name: pgm_name
	Posting-number: Volume 89

that can then be read by automated archivers. This could save us all a
lot of time. I know my family would appreciate seeing me more... 1/2 :-)
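
The header-building step could look something like this in Python. It is a
sketch, not a patch to Pnews or postnews. The function name and arguments
are made up, the volume is taken to be the two-digit year as Chip
suggested, and the 12-character limit from his proposal is applied to the
last path component.

```python
def aux_header(author, address, archive_name, year):
    """Format the auxiliary header block for a source posting.

    Refuses absolute names and names whose last component is longer
    than 12 characters (to leave room for a ".Z" suffix).
    """
    if not archive_name or archive_name.startswith("/"):
        raise ValueError("Archive-name must be a relative name")
    if len(archive_name.split("/")[-1]) > 12:
        raise ValueError("Archive-name component longer than 12 characters")
    return (
        "Submitted-by: %s <%s>\n"
        "Archive-name: %s\n"
        "Posting-number: Volume %02d\n"
    ) % (author, address, archive_name, year % 100)
```

Refusing a leading "/" at posting time is only a courtesy check; the
archiver still has to validate the name itself.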

		-Kent+
--
Kent Landfield               UUCP:     kent@ssbell
Sterling Software FSG/IMD    INTERNET: kent%ssbell.uucp@uunet.uu.net
1404 Ft. Crook Rd. South     Phone:    (402) 291-8300 
Bellevue, NE. 68005-2969     FAX:      (402) 291-4362

rsalz@bbn.com (Rich Salz) (03/21/89)

In <448@ssbell.UUCP> kent@ssbell.UUCP (Kent Landfield) writes:
>(Rich, are you listening ? Is is time for a posting of post.c ?)

Yeah, I'm here. :-)  I dunno if the world really needs to see "post.c";
my impression is that only the really conscientious would use it (although
your idea of hacking Pnews/Postnews is a good one), and those folks
tend to send things off to moderators, anyhow.  (And then there's all
those nasty legal issues, post.c is copyright and trade-secret.)  (Just
kidding.)

I think the only way alt.sources is really gonna work is as a beta-release
for stuff that shows up in the real, moderated newsgroups.  Of course,
I'm biased, and Piercarlo Grandi's recent postings may prove me wrong.
CRISP may help prove me right :-)

>I can't see how the Subject: line would work "even better".  The Subject 
>line should be used for informing us as to what the contents of the archive 
>member are in English, not some cryptic "convention".
Yeah, I tried cramming all info into the Subject line, and it turns
people off.  As Brad Templeton has pointed out, READER time is what
counts most on Usenet, and it's unfair to take descriptive space away
to put something like an archive name.  Volume/Issue stuff could go away,
but folks like to know right away that they've found a gap.
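
The gap check that Volume/Issue numbering makes possible fits in a few
lines of Python. This sketch assumes issues within a volume are numbered
from 1 with no intentional holes; the function name is made up.

```python
def missing_issues(issues_seen):
    """Return the issue numbers absent between 1 and the highest seen."""
    if not issues_seen:
        return []
    have = set(issues_seen)
    return [n for n in range(1, max(have) + 1) if n not in have]
```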

	/r$
-- 
Please send comp.sources.unix-related mail to rsalz@uunet.uu.net.

lwall@jpl-devvax.JPL.NASA.GOV (Larry Wall) (03/22/89)

In article <448@ssbell.UUCP> kent@ssbell.UUCP (Kent Landfield) writes:
: I really fail to see a security issues problem as long as archivers do not
: use absolute paths.

Just so's you remember that ../../../../../../../../../../../etc/passwd
is effectively an absolute path.  And are any of your devices in /dev
world writable?  Anyone for a distributed Berkeley Break tonight?  Wanna
look for old Masscomps that default the system disk to world writable?

If you are archiving to a directory that might have had a foreign makefile
run in it, you'd best think a bit about symbolic links too.
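
A hedged sketch, in Python, of the kind of check this calls for. It
resolves the poster-supplied name against the archive directory and refuses
anything that lands outside it. The function name is hypothetical, and the
check does not close the window between testing the path and writing to it.

```python
import os

def safe_archive_path(archive_dir, name):
    """Return the path for name inside archive_dir, or raise ValueError.

    realpath() collapses ".." and follows symbolic links that already
    exist, so a name that escapes the archive by either route is caught.
    """
    root = os.path.realpath(archive_dir)
    target = os.path.realpath(os.path.join(root, name))
    if target == root or os.path.commonpath([root, target]) != root:
        raise ValueError("archive name escapes the archive: %r" % name)
    return target
```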

Larry Wall
lwall@jpl-devvax.jpl.nasa.gov

kent@ssbell.UUCP (Kent Landfield) (03/23/89)

Warning: This is long!

Jef Poskanzer in <2165@helios.ee.lbl.gov> writes:
# In the referenced message, kent@ssbell.UUCP (Kent Landfield) wrote:
# }I really fail to see a security issues problem as long as archivers do not
# }use absolute paths.
#
# Everyone keeps failing to see the security issue.  All right, I'll be
# specific: if you are doing automatic archiving using filenames
# contained in or in any way derived from the postings, then you are
# vulnerable to having your archive overwritten.  

Automatic archivers *should* be smart enough to recognize that condition.
Store the file in a newsgroup "problems" directory using the article number 
as the file name and then send mail to the archive administrator alerting them 
to the problem.  It then becomes a manual problem for review and correction.
This also solves the accidental name-space collision. Duplicate filenames
have happened in moderated groups mainly when volume/issue numbers are 
duplicated accidentally. Sites that use volume/issue archiving have had to
deal with that more than the archive/name sites. This is not really a problem
(as long as your archiver can handle it :-)) since it happens soooo seldom. 
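
Roughly, the collision handling could go like this. This is a Python sketch
under a few assumptions: the name has already been checked to stay inside
the archive, the "problems" directory sits under the archive directory, and
mailing the administrator is left to the caller. The function name and the
returned flag are illustrative.

```python
import os

def store_article(archive_dir, name, article_number, text):
    """Write text under archive_dir/name unless that file already exists.

    On a collision the article goes to archive_dir/problems/<number>
    instead. Returns (path, ok); ok is False when the administrator
    should be notified.
    """
    path = os.path.join(archive_dir, name)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    try:
        with open(path, "x") as f:      # "x" mode fails if the file exists
            f.write(text)
        return path, True
    except FileExistsError:
        problems = os.path.join(archive_dir, "problems")
        os.makedirs(problems, exist_ok=True)
        fallback = os.path.join(problems, str(article_number))
        with open(fallback, "w") as f:
            f.write(text)
        return fallback, False
```

Opening with exclusive-create means an existing archive member is never
overwritten, whether the duplicate name was an accident or an attack.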

Larry Wall in <4632@jpl-devvax.JPL.NASA.GOV> writes:
# Just so's you remember that ../../../../../../../../../../../etc/passwd
# is effectively an absolute path.  

Absolutely!! Sorry... How many of you have taken a good look at your 
archiving lately?

# If you are archiving to a directory that might have had a foreign makefile
# run in it, you'd best think a bit about symbolic links too.

My personal opinion is that archive directories should be just that.
No testing, compiling, etc. should be done in the archive directories.

Jef Poskanzer in <2165@helios.ee.lbl.gov> writes:
# In the referenced message, kent@ssbell.UUCP (Kent Landfield) wrote:
# }Jef Poskanzer in <10985@well.UUCP> writes:
# }# What's wrong with just saving the posting in a numbered file, grepping
# }# out the From: and Subject: lines to save in an index, and compressing
# }# the file?
# }
# }Nothing is wrong with Jef's approach. It is the Message-ID method of
# }archiving.

# Bad name, since the filename has nothing at all to do with the Message-ID
# header line in the message. 

Yep! **Real** bad name. How about "article number" ? 

# }                                    Optimally, all source posted to any
# }newsgroup would have headers specifying the appropriate information so
# }that archivers could use a "meaningful name". 

# What's all this stuff about "meaningful names"?  Why do you care what
# filename a posting gets stored in?  

I really don't care what name a file is given. I would just like some 
indication that a file in a non-moderated newsgroup contained sources.
Yes, you can get a bunch by grepping for "/bin/sh" or "This is a shell 
archive". This works on postings that have multiple files. It does not
work for single file postings that have not been shar'ed or sources in 
such groups as comp.os.vms, etc.  And grepping for "#include" will get 
more .signatures than source.. :-)
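
For what it is worth, the grep heuristic amounts to something like this
Python sketch. The function name is made up, and the two markers are just
the strings mentioned above.

```python
def looks_like_shar(article):
    """Guess whether an article body contains a shell archive.

    Only catches shar-style postings; a single source file posted
    without shar wrapping will not match.
    """
    _, _, body = article.partition("\n\n")
    markers = ("/bin/sh", "This is a shell archive")
    return any(marker in body for marker in markers)
```

It also says yes to any discussion article that merely mentions /bin/sh,
which is one more reason to prefer an explicit header.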

#                                     If you are archiving any substantial
# quantity of source, you have to use an index to find things anyway, so
# why bother with any additional (insecure) mechanism?

If you use article number archiving, an index is the *only* way to find
something short of grepping through 5000+ files. By having names that
reflect the contents, not only can you use the index, see what is there 
via 'ls', but a multi-part posting directory can be quickly copied 
to a target machine for compilation and testing.  Multi-part postings can
also be retrieved from archives via mail request software such as netlib.
I'd rather mail off a 
		"send elm"
 request instead of 
		"send 1723"
	 	...
		"send 1731"
	 	...
I don't know about you, but anonymous ftp to uunet (or wherever) would not 
be as much fun if doing a 'dir' in a comp.sources.any directory just produced 
a sea of numbers as filenames. In other words, it is exactly because I am 
archiving substantial quantities of sources that I need the additional 
mechanisms.  I still fail to see a security issues problem as long as 
the archiver does not use absolute paths.

# }    I would like to see a move towards auxiliary headers on all sources
# }posted to any newsgroup, whether they are moderated or not.

# Well, you won't see such a move.  Certainly it's easier *for you* if
# everyone on the net follows the standard *you* like when posting
# source; but people don't generally do what's easier *for you*.  They do
# what's easier *for them*.  

That is what I am counting on. I want them to do what is easier for them
because chances are it is also probably easier for me! Answering one or two
additional prompts in Pnews/postnews is petty, but would allow a truly 
automated archiver to be written so sources in non-moderated groups are
available when they are *needed*, *not* just when they were posted.

			-Kent+


allbery@ncoast.ORG (Brandon S. Allbery) (03/27/89)

As quoted from <1589@fig.bbn.com> by rsalz@bbn.com (Rich Salz):
+---------------
| I think the only way alt.sources is really gonna work is as a beta-release
| for stuff that shows up in the real, moderated newsgroups.  Of course,
| I'm biased, and Piercarlo Grandi's recent postings may prove me wrong.
| CRISP may help prove me right :-)
+---------------

Smile when you say that.  ;-)  CRISP came *that* close to coming out in
comp.sources.misc....

++Brandon
-- 
Brandon S. Allbery, moderator of comp.sources.misc	     allbery@ncoast.org
uunet!hal.cwru.edu!ncoast!allbery		    ncoast!allbery@hal.cwru.edu
      Send comp.sources.misc submissions to comp-sources-misc@<backbone>
NCoast Public Access UN*X - (216) 781-6201, 300/1200/2400 baud, login: makeuser