[comp.sources.d] alt.sources archiving

pokey@well.UUCP (Jef Poskanzer) (03/16/89)

In the referenced message, chip@vector.UUCP (Chip Rosenthal) wrote:
}Personally, I'd like to see something like:
}    Submitted-by: My Name <my!address>
}    Posting-number: Volume 89
}    Archive-name: pgm_name
}This leaves off the "Issue" part of the "Posting-number" field.  Roll the
}volume number annually.  12 chars or less on archive name, please, to
}allow for ".Z".  Comments?

Why do you need Submitted-by: when you already have From:?  Why do you
need a number, especially if it's only going to change once per year?
Why do you need a separate Archive-name:, when some simple conventions
for what goes in the Subject: line would work even better?  Why do you
want to store the postings in a filename specified by the poster, with
all the security issues that brings up?

What's wrong with just saving the posting in a numbered file, grepping
out the From: and Subject: lines to save in an index, and compressing
the file?
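
The numbered-file scheme described above could be sketched roughly as follows. This is only an illustration, not the archiver actually in use: the function name `archive_posting`, the `INDEX` file name, the five-digit numbering, and the use of gzip in place of compress(1) are all assumptions.

```python
import gzip
import os


def archive_posting(archive_dir, article_text):
    """Store one posting as the next numbered file and index its From:/Subject: lines."""
    os.makedirs(archive_dir, exist_ok=True)
    index_path = os.path.join(archive_dir, "INDEX")  # hypothetical index file name

    # The next number is one more than the count of index entries so far.
    number = 1
    if os.path.exists(index_path):
        with open(index_path) as index:
            number = sum(1 for _ in index) + 1

    # Pull From: and Subject: out of the header block (text before the first blank line).
    fields = {"From": "", "Subject": ""}
    for line in article_text.split("\n\n", 1)[0].splitlines():
        name, sep, value = line.partition(":")
        if sep and name in fields:
            fields[name] = value.strip()

    # gzip stands in for compress(1).  The stored name never comes from the
    # posting, and mode "xb" refuses to overwrite an existing file.
    with gzip.open(os.path.join(archive_dir, "%05d.gz" % number), "xb") as out:
        out.write(article_text.encode("utf-8"))

    with open(index_path, "a") as index:
        index.write("%05d\t%s\t%s\n" % (number, fields["From"], fields["Subject"]))
    return number
```

Finding a posting later is then a matter of searching `INDEX` for an author or subject and uncompressing the file with the matching number.
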
---
Jef

            Jef Poskanzer   jef@helios.ee.lbl.gov   ...well!pokey
       "I delete letters like this all the time. I couldn't care less."

randy@oetl.UUCP (Randy O'Meara) (03/19/89)

In article <10985@well.UUCP> Jef Poskanzer <jef@helios.ee.lbl.gov> writes:
>In the referenced message, chip@vector.UUCP (Chip Rosenthal) wrote:
>}Personally, I'd like to see something like:
>}    Submitted-by: My Name <my!address>
>}    Posting-number: Volume 89
>}    Archive-name: pgm_name
>}This leaves off the "Issue" part of the "Posting-number" field.  Roll the
>}volume number annually.  12 chars or less on archive name, please, to
>}allow for ".Z".  Comments?

	* Hear! Hear!

>Why do you need Submitted-by: when you already have From:?  Why do you
>need a number, especially if it's only going to change once per year?
>Why do you need a separate Archive-name:, when some simple conventions

	* The Posting-number uniquely identifies the top level directory
	  and the Archive-name uniquely identifies all subdirectories
	  under the top level directory.  Take a look at c.s.amiga
	  and c.b.amiga for an excellent example of very functional
	  (and consistent) naming conventions.  For example, if the
	  binaries and sources for a distribution are separated and
	  shoved down their appropriate pipelines ( comp.sources.?
	  and comp.binaries.? ), but contain the same Archive-name,
	  a script that processes both distributions actually joins
	  the sources and the binaries in the same destination
	  directory.  Now that's automation!
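
A minimal sketch of how such a script might pick the destination directory, assuming a `Posting-number` value of the form "Volume N" or "Volume N, Issue M". The function name `destination_dir` and the `volumeN/name` layout are hypothetical and not taken from any real archiver.

```python
import posixpath
import re


def destination_dir(root, posting_number, archive_name):
    """Map a Posting-number volume and an Archive-name to one directory."""
    match = re.match(r"Volume\s+(\d+)", posting_number)
    if match is None:
        raise ValueError("unrecognized Posting-number: %r" % posting_number)
    # The newsgroup is deliberately not part of the path, so a source posting
    # and a binary posting with the same headers land in the same directory.
    return posixpath.join(root, "volume" + match.group(1), archive_name)
```

With this layout, a source posting and a binary posting that both carry volume 89 and the name `pgm_name` resolve to the same path under `root`.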

>for what goes in the Subject: line would work even better?  Why do you

	* Grepping a Subject: line for a consistent filename is a *real*
	  pain, whereas using a *defined* format for Archive-name is
	  soooo simple.  You know what's supposed to be there, and if
	  your scripts find an inconsistency, they can alert you right
	  away.
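
As a rough illustration of checking a defined format, the sketch below reads `Archive-name` from the auxiliary headers at the top of the article body and complains if it is missing or malformed. The 12-character limit follows the proposal quoted above; the allowed character set and the name `read_archive_name` are assumptions.

```python
import re

# Assumed format: 1 to 12 characters, no slashes, not starting with a dot.
NAME_PATTERN = re.compile(r"[A-Za-z0-9][A-Za-z0-9_.+-]{0,11}$")


def read_archive_name(article_text):
    """Return the Archive-name from the auxiliary headers, or raise ValueError."""
    _, _, body = article_text.partition("\n\n")
    for line in body.lstrip("\n").splitlines():
        if not line.strip():
            break  # auxiliary headers end at the first blank line of the body
        name, sep, value = line.partition(":")
        if sep and name.strip() == "Archive-name":
            value = value.strip()
            if not NAME_PATTERN.match(value):
                raise ValueError("malformed Archive-name: %r" % value)
            return value
    raise ValueError("no Archive-name header found")
```

A calling script could catch the `ValueError` and mail the archive maintainer instead of filing the posting.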

>want to store the postings in a filename specified by the poster, with
>all the security issues that brings up?

	* There are *no* security issues that I'm aware of unless you
	  run your scripts as root and use absolute pathnames.  I have
	  set up a special UID and GID for this purpose.  Regular-old
	  Unix file access security is all that's needed.  Allowing
	  the poster to specify a *relative* pathname allows everyone
	  to refer to the files in the distribution coherently.  In
	  other words, if you have a trashed file in your copy of the
	  distribution, you don't have to ask for the entire Part m of n,
	  just ask for a new copy of the trashed file.

>What's wrong with just saving the posting in a numbered file, grepping
>out the From: and Subject: lines to save in an index, and compressing
>the file?

	* Nothing if you just deal with a few sources.  If you archive
	  *a lot* of the source/binary groups and actually work with the
	  postings (compile/install/modify/reference), then you (I) must
	  maintain them in a usable state (zoo/arc/tar/etc).


	* Just an aside here.  I archive games, unix, amiga, x, pc,
	  and alt sources (and some binaries).  Alt sources is the
	  *only* one in this list that requires a modified perl
	  script.  All of the others have (reasonably) consistent
	  headers.

-- 
 _______________________________________________________________
<  Randy O'Meara -- LMSC -- SCF                                 >
<          {pyramid,leadsv}!oetl!randy   PHONE:  (408) 425-6249 >
<_______________________________________________________________>

chip@vector.UUCP (Chip Rosenthal) (03/19/89)

In article <10985@well.UUCP> Jef Poskanzer <jef@helios.ee.lbl.gov> writes:
[ in response to my proposal to use secondary headers in alt.sources ]
>Why do you need Submitted-by: when you already have From:?  Why do you
>need a number, especially if it's only going to change once per year?

To minimize the chance of breaking existing archiving programs.

>Why do you need a separate Archive-name:, when some simple conventions
>for what goes in the Subject: line would work even better?

So I don't have to write a new archiving program.

>Why do you
>want to store the postings in a filename specified by the poster, with
>all the security issues that brings up?

Are you saying that comp.sources.unix should stop using archive names??

>What's wrong with just saving the posting in a numbered file, grepping
>out the From: and Subject: lines to save in an index, and compressing
>the file?

Boy...that's a giant step backwards.  What's wrong with doing all of
that, but using a meaningful name instead of a random number?

I'm sorry, I missed the counterproposal in your message.  Were you trying
to say that one shouldn't try to archive alt.sources?  Or were you just
trying to trash my suggestion?  Nor have you explained why this is such
a crummy idea for alt.sources even though it seems to work for the other
sources newsgroups.
-- 
Chip Rosenthal     chip@vector.UUCP    | -------- watch this space --------
Dallas Semiconductor   214-450-5337    | - real domain address coming soon -

rsalz@bbn.com (Rich Salz) (03/21/89)

In <425@oetl1.oetl.UUCP> randy@oetl.UUCP (Randy O'Meara) writes:
>	    Take a look at c.s.amiga
>	  and c.b.amiga for an excellent example of very functional
>	  (and consistent) naming conventions.

Credit where it's due...  Bob uses a program he got from me called "post"
that does most of that standardization.  I've been able to foist it off
on most of the source-group moderators, and many of the binary-group ones.

My program is descended from one written by the Grandfather of Us All,
John Nelson of Genrad.  He was the first moderator of mod.sources --
one of the first moderators at all, in fact -- and started many of
the current practices...
	/r "the 5th commandment" $
-- 
Please send comp.sources.unix-related mail to rsalz@uunet.uu.net.

jef@ace.ee.lbl.gov (Jef Poskanzer) (03/22/89)

In the referenced message, chip@vector.UUCP (Chip Rosenthal) wrote:
}Are you saying that comp.sources.unix should stop using archive names??

As far as I am concerned, that would be fine.  I have never used them,
since my own simple and robust solution outlined previously works just fine.
---
Jef

            Jef Poskanzer   jef@helios.ee.lbl.gov   ...well!pokey
  "Necessity is the plea for every infringement of human freedom. It is the
       argument of tyrants; it is the creed of slaves." -- William Pitt

jef@ace.ee.lbl.gov (Jef Poskanzer) (03/22/89)

In the referenced message, kent@ssbell.UUCP (Kent Landfield) wrote:
}Jef Poskanzer in <10985@well.UUCP> writes:
}# Why do you want to store the postings in a filename specified by the poster,
}# with all the security issues that brings up?
}
}I really fail to see a security issues problem as long as archivers do not
}use absolute paths.

Everyone keeps failing to see the security issue.  All right, I'll be
specific: if you are doing automatic archiving using filenames
contained in or in any way derived from the postings, then you are
vulnerable to having your archive overwritten.  Let's say Person A
posted some software to some sources group, and Person B had a grudge
against Person A.  Person B can send out a fake article with the same
Archive-name: header, and trash Person A's software all over the net.

Or there could be an accidental name-space collision.  This is not a
problem with the moderated sources groups -- I assume that all the
moderators always check whether a name has been used before assigning
it -- but it would definitely be a problem with an unmoderated sources
group such as alt.sources.  Oh yeah, remember alt.sources?  That's what
we are talking about here.

}Jef Poskanzer in <10985@well.UUCP> writes:
}# What's wrong with just saving the posting in a numbered file, grepping
}# out the From: and Subject: lines to save in an index, and compressing
}# the file?
}
}Chip Rosenthal in <763@vector.UUCP> writes:
}# Boy...that's a giant step backwards.  What's wrong with doing all of
}# that, but using a meaningful name instead of a random number?
}
}Nothing is wrong with Jef's approach. It is the Message-ID method of
}archiving.

Bad name, since the filename has nothing at all to do with the Message-ID
header line in the message.  Or did you want me to bring up security again?

}                                    Optimally, all source posted to any
}newsgroup would have headers specifying the appropriate information so
}that archivers could use a "meaningful name". Currently though, that is 
}not the case.

What's all this stuff about "meaningful names"?  Why do you care what
filename a posting gets stored in?  If you are archiving any substantial
quantity of source, you have to use an index to find things anyway, so
why bother with any additional (insecure) mechanism?

}      I would like to see a move towards auxiliary headers on all sources
}posted to any newsgroup, whether they are moderated or not.

Well, you won't see such a move.  Certainly it's easier *for you* if
everyone on the net follows the standard *you* like when posting
source; but people don't generally do what's easier *for you*.  They do
what's easier *for them*.  Seems to me that a simple, robust mechanism
that works with people's actual behaviour is preferable to requiring
people to change their behaviour and flaming them when they don't.
---
Jef

            Jef Poskanzer   jef@helios.ee.lbl.gov   ...well!pokey
                             Driver has no cache.

denbeste@bgsuvax.UUCP (William C. DenBesten) (03/22/89)

Jef Poskanzer in <10985@well.UUCP> writes:
# Why do you want to store the postings in a filename specified by the poster,
# with all the security issues that brings up?

In the referenced message, kent@ssbell.UUCP (Kent Landfield) wrote:
} I really fail to see a security issues problem as long as archivers do not
} use absolute paths.

From article <2165@helios.ee.lbl.gov>, by jef@ace.ee.lbl.gov (Jef Poskanzer):
> Everyone keeps failing to see the security issue.  All right, I'll be
> specific:
  ...
> there could be an accidental name-space collision.

If you want to avoid name-space collisions, have your archiver check
for collisions.  When (and if) you find a collision, find a new name
to use.  You could make the pathname relative (not absolute) at the
same time.  E.g.:

  -----------------------------------------
set origname = "$filename"

# check for an absolute pathname and strip the leading slashes
if ("$filename" =~ /*) then
  set filename = `echo "$filename" | sed 's:^/*::'`
endif

# resolve any collisions by prefixing X- until the name is unused
while (-e "$filename")
  set filename = "X-$filename"
end

# report any anomalies
if ("$origname" != "$filename") then
   mail -s "archiver: $origname conflict" $user << EOF
Archiver notice: $origname caused a conflict and was archived as $filename.
EOF
endif
  -----------------------------------------

IMHO, security must begin at home.

Beware, however, I have not tested this code.
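
The pathname check in the script above can also be sketched in Python. This version goes one step further and rejects names that climb out of the archive with `..` components, which the csh script does not attempt; that extra check and the name `safe_relative_path` are assumptions for illustration.

```python
import posixpath


def safe_relative_path(name):
    """Return name as a normalized relative path, or raise ValueError."""
    # Drop leading slashes so an absolute pathname becomes relative.
    cleaned = posixpath.normpath(name.lstrip("/"))
    # After normalization, any remaining ".." would escape the archive root.
    if cleaned in ("", ".") or ".." in cleaned.split("/"):
        raise ValueError("unsafe archive name: %r" % name)
    return cleaned
```

For example, `/etc/passwd` becomes `etc/passwd` under the archive directory, while `../../etc/passwd` is refused outright.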

-- 
 William C. DenBesten
 denbeste@bgsu.edu
denbesten@bgsuopie.bitnet

stacy@mcl.UUCP (Stacy L. Millions) (03/25/89)

In article <2165@helios.ee.lbl.gov>, jef@ace.ee.lbl.gov (Jef Poskanzer) writes:
> Everyone keeps failing to see the security issue.  All right, I'll be
> specific: if you are doing automatic archiving using filenames
> contained in or in any way derived from the postings, then you are
> vulnerable to having your archive overwritten.

So your automatic archiving program should check to see if it is
going to overwrite a file before it does so. This is not particularly
difficult to do. I didn't even think of *NOT* doing it when I
wrote my archiver.
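
One way to make that check, sketched in Python: open the file in exclusive-create mode, so that an existing file is reported instead of being truncated. The function name `store_without_overwrite` is hypothetical, and this is not a description of any particular archiver.

```python
def store_without_overwrite(path, data):
    """Write data to path only if nothing is there yet.

    Returns True on success and False on a name collision.
    """
    try:
        # Mode "xb" raises FileExistsError rather than replacing the file.
        with open(path, "xb") as out:
            out.write(data)
    except FileExistsError:
        return False
    return True
```

Because the existence test and the creation are a single operation, there is no window between checking and writing in which another posting could claim the same name.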

> Or there could be an accidental name-space collision.  This is not a
> problem with the moderated sources groups -- I assume that all the
> moderators always check whether a name has been used before assigning
> it -- but it would definitely be a problem with an unmoderated sources
> group such as alt.sources.  Oh yeah, remember alt.sources?  That's what
> we are talking about here.

Accidents do happen (wasn't it "sao" that got out of whack). Then there
are reposts of gibbled postings, not all postings show up in the proper
order. Are you going to blindly assume that just because it got to your
system last, it is the most recent?

About once every couple months my archiver tells me there is a file
name collision for some reason or another, and then I have to decide
which one I really want. We live in an imperfect world, usenet is
imperfect as well (but at least it is better than _reality_ :-), so
why not do some error checking instead of "assume[ing] that all the
moderators always check whether a name has been used before assigning it"?

-stacy

-- 

"You should not drink and bake."
				- Arnold Schwarzenegger, _Raw Deal_
S. L. Millions                                            ..!tmsoft!mcl!stacy

pokey@well.UUCP (Jef Poskanzer) (03/26/89)

In the referenced message, stacy@mcl.UUCP (Stacy L. Millions) wrote:
}                         We live in an imperfect world, usenet is
}imperfect as well (but at least it is better than _reality_ :-), so
}why not do some error checking instead of "assume[ing] that all the
}moderators always check whether a name has been used before assigning it"

But I don't rely on such an assumption.  That was my whole point, haven't
you been listening?  You can check for name collisions, or you can ignore
the Archive-name: junk altogether, but either way there *is* a security
issue.  Deal with it one way, or deal with it the other way, but don't
ignore it or "fail to see" it.
---
Jef

            Jef Poskanzer   jef@helios.ee.lbl.gov   ...well!pokey
             "The white zone is for loading and unloading only."

page%rishathra@Sun.COM (Bob Page) (04/01/89)

[somebody mentioned I should be reading this newsgroup.]

Randy O'Meara wrote:
)Take a look at c.s.amiga and c.b.amiga for an excellent example of
)very functional (and consistent) naming conventions.

rsalz@bbn.com (Rich Salz) replied:
)Credit where it's due...  Bob uses a program he got from me called "post"
)that does most of that standardization.

I do use such a program, and it's been an immense help.  However, it
only helps with the standardization of the headers, not of the archive
name, which is what the original discussion was about.  I do not
follow the comp.sources.unix archive name convention.

Essentially, for those who have not looked, I have broken the archive
into subsections by functionality.  I get to choose which category the
posting belongs in, and sometimes my choices appear arbitrary.  Within
each category, all file names are unique.  If 'foo' is in the archive
and I get another one to post, I rename it if it's from a different
author, or call it 'fooXX', where XX is some form of the version number
(I toss any separator between 'major' and 'minor' revision numbers).
The result is that version 1.10 and 11.0 have the same archive name,
but there's no problem with that since it's expected that you'll
toss old versions.  If you don't (some people don't), by the time
version 11.0 comes, I hardly see a need for version 1.10 to exist,
so overwriting it seems OK to me.
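
The naming rule described above can be illustrated with a short sketch. The function name `versioned_name` is hypothetical, and treating every non-alphanumeric character as a separator is an assumption about what "some form of the version number" means.

```python
import re


def versioned_name(base, version):
    """Append the version to base with all separators removed."""
    return base + re.sub(r"[^0-9A-Za-z]", "", version)


# Versions 1.10 and 11.0 collapse to the same archive name.
print(versioned_name("foo", "1.10"))  # foo110
print(versioned_name("foo", "11.0"))  # foo110
```

Both calls produce `foo110`, which is the collision mentioned above.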

Anyway, a public thanks to R$ for making my moderator's job a lot easier,
and to jpn for showing us the way.

..bob
Bob Page    page@sun.com    sun!page    415/336-2745