[news.sysadmin] Source Propagation and Archiving

rwhite@nusdhub.UUCP (Robert C. White Jr.) (04/06/88)

Hello,

	I have been watching sporadic traffic on the net discussing
several related topics.  The discussions recur with some frequency and
always contain many valid complaints about the exchange of news,
especially as it pertains to 1) Sources, 2) Binaries, and 3) material
collecting up, waiting for delivery.  I have some thoughts on these
issues which I would like to share with the net.  I was going to
hack together working diffs for the ideas, but my diff does not
support "-c", and I just discovered I was at patchlevel 0 when the
rest of the world is somewhere around 14, so my initial look through
the sources may not count for much.  [So maybe I wasn't paying that
much attention to everything ... ;-)]

	Most of my ideas involve hacks to control.c, with a few bonus
hacks to (un)batch and the central reading routine [and possibly others,
but if so, very few].

	My first two ideas deal directly with the problem of material
piling up awaiting transfer.
	1) The following methods allow a feeding site to ignore the
presence of lower-order [fed] sites until they actually ask for their
mail.  Each feeding will [probably] require two connects, one to
trigger the batching and one to get it, in close succession.
Probability dictates that the second attempt will be successful,
because it is dependent on the success of the first.
	1a) Allow uux permission to run a simple script containing
"sendbatch -c ${UU_SYS}" [the actual variable name is different
under different implementations of uucp].
	1b) A control message which accomplishes the exact same thing
but does not require any messing around with uux permissions.  A
perfect example would be "Control: DoSend <sysname>".
	With this method, the feeding system need only queue a piece
of bouncy-mail to ensure call-back after a legit disconnect.  A
"Control: Queueback <sysname>" message could even be provided for this if
desired.
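
	A minimal sketch of how idea 1b might be handled, in Python for
brevity rather than the C of the news sources.  The function name
dosend_command and the fed_sites set are hypothetical; the only thing
taken from the text above is the "sendbatch -c <sysname>" invocation.
The sketch only builds the command and leaves running it to the caller.

```python
import re

# ASSUMPTION: system names are restricted to these characters; the post
# does not say so, but it keeps shell metacharacters out of the command.
_SYSNAME = re.compile(r"^[A-Za-z0-9._-]+$")


def dosend_command(control_value, fed_sites):
    """Turn the text after "Control:" into a sendbatch argv, or None.

    control_value -- e.g. "DoSend nusdhub"
    fed_sites     -- names of the sites this system actually feeds
    """
    parts = control_value.split()
    if len(parts) != 2 or parts[0].lower() != "dosend":
        return None
    sysname = parts[1]
    # Only batch for a well-formed name that is one of our own feeds.
    if not _SYSNAME.match(sysname) or sysname not in fed_sites:
        return None
    return ["sendbatch", "-c", sysname]
```

Refusing names that are not in the local feed list means a stray or
forged DoSend cannot trigger batching for an arbitrary system.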

	My next group of ideas work in concert and are addressed to
the multiple issues of source and binary distribution.  Specifically, the
suppression of unwanted sources and binaries along expensive [i.e.
overseas] routes, and allowing archiving sites to distinguish between
sources, binaries, patches, and open discussion within a group.  A
direct implication of this is the removal of the ".d" groups.

The control messages I would add to the system follow:
	Source, Binary, Patch, Iwant, Request, Killstep, Deliver
These messages imply the new header lines labeled:
	Source:, Binary:, Patch:, and LimitP:
These messages imply the newsgroups:
	control.source control.binary and control.patch

These are the purposes of each new header or control message:

Control: Source <orig-message-id>
Control: Binary <orig-message-id>
Control: Patch <orig-message-id>
	These message types are generated when a user "attaches" a
	file to a normal text message.  The contents of the file
	are [depending on implementation if adopted] encoded, shar-ed,
	or whatever as appropriate, and the message is secreted in
	its control subgroup.  This file may be un-attached on the
	receiving end during reading [just like "w" or "s" from
	inside vnews].  These messages receive special treatment
	as shown below.

Source: <orig-message-id> <message-id> [<message-id> ...]
Binary: <orig-message-id> <message-id> [<message-id> ...]
Patch: <orig-message-id> <message-id> [<message-id> ...]
	These header(s) are added to the original text description
	of the posting when the file [i.e. source, binary, or patch]
	is attached to the message.  These headers are copied into
	the header-block of all replies and followups.  The
	information in these headers is static for all replies
	and followups.  More than one of these headers may appear
	in a single message, and types may be mingled freely.  The
	<orig-message-id> is the id of the text portion and will
	be identical to the ID in the "Message-Id:" header.
		Special treatment is ascribed to the presence of
	these headers, and _all_ the named message-ids will
	be adjusted in accordance with the local system settings.
	If any of the named message-id(s) exist on the local system
	receiving a message containing these headers, the named messages
	will have their individual expiration-date values adjusted
	to the greater of 1) the original value, or 2) the value
	of this most recent message.  Any reference to a message-id
	which is not present on the local system will be silently
	ignored.
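
	The expiration rule above can be sketched roughly as follows.
This is an illustration only: the history dict stands in for the local
history file, and the function names are made up for the example.

```python
def parse_attachment_header(value):
    """Split a Source:/Binary:/Patch: header value into its message-ids."""
    ids = value.split()
    if len(ids) < 2:
        raise ValueError("need <orig-message-id> and at least one <message-id>")
    return ids


def extend_expiration(history, header_value, new_expires):
    """Raise each named article's expiration to max(original, new).

    history      -- hypothetical stand-in for the history file:
                    message-id -> expiration time (seconds since epoch)
    header_value -- text after "Source:", "Binary:" or "Patch:"
    new_expires  -- expiration time of the message just received
    Returns the ids whose expiration was actually moved.
    """
    touched = []
    for msg_id in parse_attachment_header(header_value):
        if msg_id not in history:
            continue  # not on this system: silently ignored
        if new_expires > history[msg_id]:
            history[msg_id] = new_expires
            touched.append(msg_id)
    return touched
```

The effect is that a source stays on disk as long as anyone is still
talking about it, without anything ever being expired early.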

LimitP: <prev.sys> <path>
	This header will cause the sequential _and_ linear
	propagation of a given message.  This function is specified
	as a header so as not to violate the one-control-header
	per message rule.  This header is the inverse of the "Path:"
	header.  Each machine removes its name from the front of the
	list of machine names.  This header is only valid within
	the header-block of a control message.  If the control
	message may be satisfied, the message is not propagated.
	If there is another system in "<path>" the message is
	forwarded.  If there are no more systems, the next system
	is unknown, or the control message function dies badly,
	an error result is mailed back to the originator.
		The process pattern is a) save prev.sys, b)
	send killstep to prev.sys, c) remove prev.sys, d) change
	first "!" to " ", e) attempt function, f) if !e then
	propagate or error.
		In all cases the message is canceled on the
	preceding system.  [see Control: Killstep]
		This header allows a control message to act like
	a guided missile.  This missile would most often be aimed
	at the poster of a source, or an archive site.  Often a
	request for text [i.e. source, patch, binary, (etc?)]
	sent to the original poster could be satisfied
	by a much closer site, and need not traverse the entire net.
		It is valid to have more than one LimitP: in a
	single message, as this processing will fork the message
	wherever necessary, but it is not recommended.
		See "Control: Request" "Control: Iwant"
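
	One way to read steps a) through f) is sketched below.  It
assumes that <path> is a "!"-separated list whose first entry is the
local system, so that steps c) and d) leave the local name in the
<prev.sys> slot for the next hop.  That reading, the function name
step_limitp, and the action tuples it returns are all assumptions
made for the example.

```python
def step_limitp(value, attempt, known_systems):
    """Process one hop of a LimitP: header.

    value         -- header value "<prev.sys> <path>", path joined by "!"
    attempt       -- callable; True if the control function was satisfied
    known_systems -- names of neighbours this site can forward to
    Returns a list of (action, argument) tuples for the caller to carry out.
    """
    prev_sys, _, path = value.strip().partition(" ")   # a) save prev.sys
    path = path.strip()
    if not prev_sys or not path:
        raise ValueError("LimitP: needs <prev.sys> and <path>")
    actions = [("killstep", prev_sys)]                 # b) cancel upstream copy
    # c) + d) dropping prev.sys and turning the first "!" into " "
    # leaves "<this-site> <rest-of-path>" as the outgoing header value.
    new_value = path.replace("!", " ", 1)
    rest = new_value.partition(" ")[2]
    try:
        satisfied = attempt()                          # e) attempt function
    except Exception:
        return actions + [("mail-error", "function failed")]
    if satisfied:
        return actions                                 # satisfied: stop here
    next_sys = rest.split("!", 1)[0] if rest else ""   # f) propagate or error
    if not next_sys:
        actions.append(("mail-error", "path exhausted"))
    elif next_sys not in known_systems:
        actions.append(("mail-error", "unknown system " + next_sys))
    else:
        actions.append(("forward", (next_sys, new_value)))
    return actions
```

For example, a site sysB that receives "sysA sysB!sysC!sysD" and
cannot satisfy the request would kill the copy on sysA and forward
the message to sysC with the header rewritten to "sysB sysC!sysD".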

Control: Killstep <sysname> <message-id>
	This message is an aimed "cancel" message.  It _NEVER_
	propagates because it is sent to "sysname" only, and only
	if "sysname" is adjacent to the current site.  It is only
	acted on by a system which identifies itself as "sysname".
	[This allows it to be generated locally by any means, and
	only acted on in the defined case.]  It will only cancel
	certain types of control messages [c.f. this document].
	In all other ways this message is identical to a cancel message.
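
	A rough sketch of the acceptance test for Killstep follows.
The text above does not list which control types may be cancelled, so
the CANCELLABLE set is a guess based on the messages that mention
Killstep or LimitP; the function and parameter names are likewise
hypothetical.

```python
# ASSUMPTION: not specified in the proposal; adjust to taste.
CANCELLABLE = {"iwant", "request", "forward", "deliver"}


def killstep_applies(control_value, local_name, lookup_type):
    """Return the message-id to cancel, or None if Killstep is ignored.

    control_value -- text after "Control:", e.g. "Killstep sysA <12@sysB>"
    local_name    -- the name this system identifies itself by
    lookup_type   -- callable: message-id -> control type of the stored
                     article (lower case), or None if it is absent or
                     not a control message
    """
    parts = control_value.split()
    if len(parts) != 3 or parts[0].lower() != "killstep":
        return None
    _, sysname, msg_id = parts
    if sysname != local_name:
        return None          # aimed at someone else: never acted on
    if lookup_type(msg_id) not in CANCELLABLE:
        return None          # ordinary articles are out of reach
    return msg_id
```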

Control: Iwant <orig-sys> <org-user> asked <message-id> [<message-id>
							...]
Control: Iwant <orig-sys> <org-user> failed <message-id>
					[<message-id> ...]
	This control message is a low-level request for the message(s)
	with the named id(s).  If the local system can satisfy the
	request it will return a "Control: Forward" message [if the
	local system is an archive or originator it may use "Deliver";
	this will have to be decided if this plan is implemented]
	and then issue an authorized "cancel" message against the
	message-id of the Iwant.  If the local system can't fulfill the
	request and is at the end of the "LimitP:" it will respond
	with an "Iwant ...  failed" message aimed at the originating
	system via the "Path".
		If a system receives an "Iwant ... failed" and
	its local name matches the <orig-sys> it may 1) issue a
	"Control: Request", 2) notify <org-user> locally that the
	request failed, 3) silently discard the response, or 4)
	combine actions 1 and 2.
		Some systems may want to wait for an arbitrary
	number of requests for the same message id before forwarding
	the request.  The best reason for this type of need is a
	prohibitively expensive link, such as the U.S. to
	Europe, which might be willing to propagate sources
	if 20 people need them, but not for just one person.  Such
	systems would collect the requests in a single place
	[possibly a prototype Iwant, with the body listing all
	the downstream "Iwant" lines and their text bodies preceded
	by ">"'s] providing a nesting of request depth.  [I am not
	sure about the difficulty of this bit, but it seems like
	it would be reasonably possible, if a little nasty.]
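
	The "wait for N requests" behaviour could look something like
the sketch below.  The class name IwantPool and its methods are
invented for the example, and counting each (system, user) pair only
once per message-id is an assumption, not something the proposal
states.

```python
from collections import defaultdict


class IwantPool:
    """Hold downstream Iwant requests until enough sites have asked."""

    def __init__(self, threshold):
        self.threshold = threshold
        # message-id -> set of (orig_sys, orig_user) who asked for it
        self.pending = defaultdict(set)

    def add(self, orig_sys, orig_user, message_ids):
        """Record one Iwant; return the ids now ready to go upstream."""
        ready = []
        for msg_id in message_ids:
            self.pending[msg_id].add((orig_sys, orig_user))
            if len(self.pending[msg_id]) >= self.threshold:
                ready.append(msg_id)
        return ready

    def release(self, msg_id):
        """Forget a forwarded id and return its requesters, sorted."""
        return sorted(self.pending.pop(msg_id, set()))
```

A site on the expensive side of an overseas link would build the pool
with a threshold of 20 and only send its own Iwant upstream once
add() reports the id as ready.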

Control: Request <orig-sys> <org-user> asked <message-id>
					[<message-id> ...]
Control: Request <orig-sys> <org-user> failed <message-id>
					[<message-id> ...]
	This statement is identical to "Iwant" in syntax and function
	except that during the asking phase, if the LimitP: is
	exhausted then the system will send the "Request" line
	and text body to a command pipe of the system administrator's
	choice.  This will often be an automated "failed" reply,
	though on an archive site, or one which frequently submits
	sources, this could be a tool for processing requests
	[i.e. something which will preserve the entire return
	pattern, and also determine the necessary tape et al.]
		CAVEAT: "Request" is and should be used to ask for
	sources when the original message-id is unknown.  The
	message-id should be set to >unknown< [or something] so that
	the terminal site in the LimitP can generate mail about the
	request [using the message text].

Control: Forward <prev.sys> <orig-message-id>
	This message is used to propagate a message in a hostile
	environment.  The body of the text is defined as containing
	1) a control block [if found necessary during implementation]
	followed by a blank line followed by 2) the original message
	header block followed by a blank line followed by 3) the
	original message text.
		This message, like all messages, has a unique
	message-id.  As it is received by each system it sends a
	"Control: Killstep" to the previous system for its own
	message-id.  It then attempts to re-constitute the message
	it is carrying in its body.  It will succeed in the
	re-creation of the message in all cases, except when the
	message is listed as received and expired, or if a message
	with that id is already present [very rare].  If necessary
	the message will be placed in general, junk, or some group
	chosen by the system administrator at compile time.
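
	A sketch of taking a Forward body apart and deciding whether
to re-create the carried message.  It assumes the optional control
block is always present, and it does not handle continuation lines in
the header block; the function names are hypothetical.

```python
def split_forward_body(body):
    """Split a Forward body into (control_block, headers, text).

    ASSUMPTION: the control block is always present.  Headers come back
    as a list of (name, value) pairs so that repeated Source:, Binary:
    or Patch: lines are kept.
    """
    parts = body.split("\n\n", 2)
    if len(parts) != 3:
        raise ValueError("expected control block, header block and text")
    control, header_block, text = parts
    headers = []
    for line in header_block.splitlines():
        name, sep, value = line.partition(":")
        if not sep:
            raise ValueError("bad header line: " + line)
        headers.append((name.strip().lower(), value.strip()))
    return control, headers, text


def should_reconstitute(headers, present, expired):
    """True unless the id is already here or was received and expired.

    present -- message-ids currently on the system
    expired -- message-ids the history lists as received and expired
    """
    msg_id = dict(headers).get("message-id")
    return msg_id is not None and msg_id not in present and msg_id not in expired
```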

Control: Deliver <dest.system> <dest.user> <message.id>
					[<message.id> ...]
	This message is similar to Forward except that the message
	body is not deposited on intervening systems.  Its primary
	usage is for moderated newsgroups.  When the message reaches
	dest.system the message is routed through a command defined
	at compile time, with the arguments to that command being
	the options to the Deliver.
		For backbone routing the message would flow to the
	nearest backbone, and be piped out to the moderator with
	the command "Deliver BACKBONE comp.sources.xxx <message>".
		While this method would automatically use up two
	message-ids per message [one for the text, and one for the
	Deliver itself], it would set up the entire mechanism for
	the rest of this idea with a consistent framework.  It would
	also make the moderator's job easier.
		This also allows for a second level of moderation.
	Presently a group is either open "y" or moderated "m"; this
	message allows for moderated attached material "a".  In this
	environment open discussion could take place on the channel
	while any posted sources and patches and binaries could pass
	through the moderator [and the moderator would not have to
	be continually reading and tossing out discussion pieces].


	In general the pattern of data flow is normal.  The
control.source, control.binary, and control.patch groups propagate
normally.  When a user is coasting through "comp.sources.ibmpc",
for example, he comes to a description of a wondrous program to
assist in the day-to-day removal of unwanted files from his disks.
When he gets to the end of the description he types "df" (detach file)
and the source is fetched from control.source and dropped (copied)
into the directory of his choice.  [The source was located
by use of the local history file.]

	An archive site, similarly, would, if they wanted to archive
_all_ sources [etc.], simply archive control.source [etc.],
whereas, if they wanted to just archive comp.sources.unix, they would
expire that group with the -a, and what was ready to expire for that
group from control.source would be archived and the primitive text
would simply be discarded [a special option for keeping text would also
be available] while the original description would be archived with
the source.  This allows automated archiving of a source group without
restricting the conversation within the group.

	A site which needs or wants to restrict the flow of sources
[etc.] simply has his feed include !control.source [etc.] in his sys
entry.  That site may then carry the source groups, and only get the
text of the descriptions and discussion.  If someone at that site
decides that the source for X is a must-have, they type "df" and get
"The source <message-id> is not available, would you like to request
it? (n)"  If the user replies with a "y" the system starts an Iwant
sequence.  If the Iwant, or subsequent Request, is fulfilled, the
Forward or Deliver will then establish that message on the system and
make it available to downstream sites if they Iwant or Request it.
Since the local machine probably does not have any of the control.X
newsgroups, the files will go to junk.  This, however, is not
important, as the system can find it there just as easily as anywhere by
use of the history file(s).
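
	The reader-side "df" behaviour described above might be
sketched like this.  The history dict again stands in for the history
file, ask is whatever the newsreader uses to prompt the user, and the
returned tuples are made up for the example.

```python
def detach_file(msg_id, history, ask):
    """Decide what "df" should do for one attached message-id.

    history -- hypothetical stand-in for the history file:
               message-id -> spool path (any group, junk included)
    ask     -- callable taking a prompt and returning the user's reply
    """
    path = history.get(msg_id)
    if path is not None:
        return ("copy", path)        # already on this system
    prompt = ("The source %s is not available, "
              "would you like to request it? (n) " % msg_id)
    if ask(prompt).strip().lower().startswith("y"):
        return ("iwant", msg_id)     # start an Iwant sequence
    return ("none", msg_id)          # default answer is no
```

Because the lookup goes through the history file, it makes no
difference whether the article landed in control.source or in junk.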


	While this is a massive change to the system, and many of the
messages would blindly propagate through older software, this plan has
a lot of potential value.  On top of what is hinted at through the
text, this would cut way down on the "I Want This" and the "Me Too"
messages which clutter up the visible bandwidth.  These messages would
not need to propagate through hell and beyond when a site three hops
upstream could satisfy the request.

	Put simply, this looks like it would increase net traffic
substantially, but when actually in use, it would improve the
signal-to-noise ratio remarkably, especially since the material
affected is quite bulky [e.g. 50000-byte sources].

<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
<<  All the STREAM is but a page,<<|>>	Robert C. White Jr.		   <<
<<  and we are merely layers,	 <<|>>	nusdhub!rwhite  nusdhub!usenet	   <<
<<  port owners and port payers, <<|>>>>>>>>"The Avitar of Chaos"<<<<<<<<<<<<
<<  each an others audit fence,	 <<|>>	Network tech,  Gamer, Anti-christ, <<
<<  approaching the sum reel.	 <<|>>	Voter, and General bad influence.  <<
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
##  Disclaimer:  You thought I was serious???......  Really????		   ##
##  Interogative:  So... what _is_ your point?			    ;-)	   ##
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^