[news.stargate] Details on `quality control' and transferable usenet postings

webber@brandx.rutgers.edu.UUCP (05/16/87)

In article <769@mcgill-vision.UUCP>, mouse@mcgill-vision.UUCP (der Mouse) writes:
> How is this enforced?  What's to stop me, say, ...
> ...    It sounds as though you have some mechanism
> other than uucp in mind for news transfer - what?

[I am going to spell this out in detail.  If you are not interested,
please feel free to skip the rest of this message.  It has no new
philosophical content, it simply addresses some of the technical
aspects of doing things correctly.]

As far as I know, there is nothing about uucp specifically that is a
controlling factor in the behavior of the net.  I assume that by
`uucp', you are actually referring to the problems specific to the
distributed way news flows across links that are sometimes established
once an hour and other times once a day/week.  Hence, a solution can
not rely on quick communication between all sites and some `central'
site.  I see the solution as evolving in response to the problem as
follows.

Initially, everyone has the resources to gather and broadcast the
entire flow.  Whether or not we are still in that state, it is
inevitable that eventually some people will find their resources
overwhelmed.  At this point, there is a basic problem -- i.e., that
only X bytes of information can be gathered and broadcast by some site
S in a given period of time.  At this point, said site S restricts its
flow to X bytes/cycle.  Sites that feed off S are now in a position
either to let S's resources become a bottleneck on their access to the
net or to find other places to get their news from (in the
simplest case, they can take over S's sources and then redistribute to
S, thus moving closer to being a backbone site themselves as it
becomes clear that they have more resources than their neighbors).

Given that S has access to more than X bytes/cycle, the question
arises of how does S choose which bytes to take and which to ignore.
Currently, the news mechanisms appear to be structured to encourage S
to make the decision based on the names of various groups.  Thus S is
faced with the choice of deciding to carry the hi-tech names like
comp.sys.masscomp or names like talk.bizarre.  S is worried about the
fact that S is already slightly overextended with regard to the
resources S is expending on the net, and makes the obvious
conservative choice of the longer-established talk.bizarre.  This
denies all the hackers that rely on S for news access to the technical
curiosities of the masscomp systems.  However, this is silly because
the problem was not one of what kind of material was flowing through
the site S, but rather of how much.  Thus the proper approach is to
view the flow strictly as bytes/cycle.

The most natural approach is to take the oldest X unseen bytes from
the sources available and then wait until S can afford to gather
another X bytes.  However, there are some aspects of the X bytes that
are important to the maintenance of the net flow and hence should not be
ignored.
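The oldest-X-unseen-bytes policy can be sketched as follows; the
message fields (posted_at, size) and the function name are
illustrative assumptions, not part of any existing news software.

```python
# Sketch of the oldest-first byte-budget policy described above.
# Message structure is an assumption for illustration.

def select_batch(unseen, budget_bytes):
    """Gather the oldest unseen messages that fit in this cycle's budget."""
    batch = []
    for msg in sorted(unseen, key=lambda m: m["posted_at"]):
        if msg["size"] > budget_bytes:
            break                     # budget spent; wait for the next cycle
        batch.append(msg)
        budget_bytes -= msg["size"]
    return batch

msgs = [{"id": "a", "posted_at": 1, "size": 400},
        {"id": "b", "posted_at": 2, "size": 300},
        {"id": "c", "posted_at": 3, "size": 500}]
print([m["id"] for m in select_batch(msgs, 800)])   # prints ['a', 'b']
```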

For example, moderated groups are particularly annoying to S since they
force S to handle the same message twice, once when it is going to the 
moderator and once when it is coming back with its seal of approval.
This is particularly annoying when it is a group that carries large
messages, such as a sources group.

Also of interest to S is the fact that messages bear different
distribution zones.  Some news is meant for only people on a given
machine, other news for people in a given machine group, etc.  For
convenience, let us denote these zones as person, machine, organization,
state, country, and world (although these will not have quite the
expected geographic meanings since some sites have private lines or
perhaps are even willing to make massive long distance calls - it
might be more reasonable to think in terms of mail domains, but that
is not how news currently works).  As well as having different
distribution zones, a message also has different creation zones.
Given that S prefers neighbors to strangers, there is a preference to
handle the postings created by people on S's machine first, then those
in S's organization, then S's state, country, and then finally `the
world'.

For each of these postings, S recognizes that S has a different kind
of impact.  Clearly, if S doesn't handle postings created within S's
machine (no matter where they are bound), they won't get handled at
all, so S handles them.
With the remaining resources, S handles postings created within S's
organization and distributed only within S's organization.  Then S
handles messages created within S's state and for distribution only
within S's state.  Then S handles those created within S's country and bound for within
said country.  And finally, anything left over gets handled.  [Note:
postings created within S's organization but for distribution within,
say, S's state, get handled after the postings that are distributed
only within the organization.]
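This handling order can be sketched as a sort key; the zone names and
message fields here are assumptions for illustration, with lower keys
handled first.

```python
# Sketch of the handling order described above: postings created on S's
# own machine come first regardless of destination, then org/org,
# state/state, country/country, and finally everything else.  Per the
# note, an org-created posting bound for the state sorts after org/org.

ZONES = ["machine", "organization", "state", "country", "world"]

def priority(msg):
    created = ZONES.index(msg["created_in"])
    distributed = ZONES.index(msg["distribution"])
    if created == 0:                       # created on S itself
        return (0, distributed)
    return (max(created, distributed), distributed)

msgs = [
    {"id": "org->state",     "created_in": "organization", "distribution": "state"},
    {"id": "org->org",       "created_in": "organization", "distribution": "organization"},
    {"id": "machine->world", "created_in": "machine",      "distribution": "world"},
    {"id": "world->world",   "created_in": "world",        "distribution": "world"},
]
print([m["id"] for m in sorted(msgs, key=priority)])
# prints ['machine->world', 'org->org', 'org->state', 'world->world']
```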

Within this scheme, there is something else going on.  S stands in a
special relation to the people, say T1, T2, etc., that S gets S's news
from.  Although S is too far from other machines to impact the flow
created by them, S can have an impact on T1, etc.  To this end, S
imposes a quota on S's neighbors - thus of the X bytes/cycle, only Y
bytes/cycle will be accepted that were created by machine T1.
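A minimal sketch of this per-neighbor quota, assuming each message
records its originating neighbor; the field names are illustrative.

```python
# Sketch of the quota above: of the X-byte cycle budget, at most Y bytes
# may be messages created at any single neighbor such as T1.

def accept_cycle(incoming, total_budget, per_origin_budget):
    used_by_origin = {}
    accepted = []
    for msg in incoming:                     # assumed already oldest-first
        if msg["size"] > total_budget:
            break                            # overall X-byte budget spent
        used = used_by_origin.get(msg["origin"], 0)
        if used + msg["size"] > per_origin_budget:
            continue                         # this neighbor is over quota
        accepted.append(msg["id"])
        used_by_origin[msg["origin"]] = used + msg["size"]
        total_budget -= msg["size"]
    return accepted

flow = [{"id": 1, "origin": "T1", "size": 300},
        {"id": 2, "origin": "T1", "size": 300},
        {"id": 3, "origin": "T2", "size": 200}]
print(accept_cycle(flow, 700, 400))          # prints [1, 3]
```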

Now we move to the plight of machine T1.  The neighbors of T1 have
imposed quotas on the flow out of T1 so that only Z bytes/cycle
created by users on T1 can get out to the net.  Note that so far, the
only people at either S or T1 who have been involved are the news
administrators at those sites.  Bad behavior by a site administrator,
such as T1 making its messages appear to come from Q in order to
evade the quotas, would be dealt with the same way the net would
currently handle a site that routinely faked moderator certification.

Now, the administrator of T1 has to figure out how to distribute the Z
bytes/cycle quota among the users of T1.  The simplest way would be to
create a queue of all submissions to news from users of T1.  A new
submission goes into the queue behind every earlier submission from a
user with nothing else already queued, but in front of the first
submission that would be some user's second message in the queue; in
other words, each user's first queued message outranks any user's
second.  Site S then drains the queue at a rate of Z bytes/cycle.
When all sites that feed from T1 have gotten a message, that message
is removed from the queue and the remaining messages are reordered as
appropriate.
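This insertion rule amounts to round-robin fairness among T1's users;
a sketch follows, where the queue representation is an assumption.

```python
# Sketch of the submission queue ordering: each user's n-th queued
# message ranks behind every user's (n-1)-th, with arrival order
# breaking ties within a round.

def enqueue(queue, user, msg):
    rank = sum(1 for r, u, m in queue if u == user)  # user's msgs already queued
    pos = len(queue)
    for i, (r, u, m) in enumerate(queue):
        if r > rank:                                 # first later-round entry
            pos = i
            break
    queue.insert(pos, (rank, user, msg))

q = []
enqueue(q, "alice", "a1")
enqueue(q, "alice", "a2")   # alice's second message waits its turn
enqueue(q, "bob", "b1")     # bob's first jumps ahead of a2
print([m for r, u, m in q])                          # prints ['a1', 'b1', 'a2']
```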

Now, part of the reason that S is enforcing the quota on T1 is that S
feels that this is the duty of all the nodes of the net in order to
control the flow.  However, S realizes that besides controlling the
flow of its neighbors, its own flow is being controlled.  It is
reasonable to give S the option of allowing T1 to break quota by
having S take the overage and place it into its own queue and treat it
as messages created by users of S.  

However, T1 usually presents S with more than Z bytes of newly
created news, so how does S know which messages to take, and where in
its own queue should it place them?  The answer is
that it doesn't get handled at the site level, it gets handled at the
user level.  User T1a sends user Sb a mail message notifying Sb that
there is a message in the queue with id #XYZ that Sb should find
worthwhile enough to sacrifice Sb's position in S's queue in order to
get this message into the net.  Sb then posts a special mail message
to the news handler of S requesting that the message #XYZ on machine
T1 be read in the next news transfer, but instead of going into the
general flow, it would go into the queue of submissions from users of
S that would be appropriate if Sb had just submitted message #XYZ.
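The position-yielding exchange might look like the following sketch;
every name here (the fetch callback, the queue layout) is an
assumption, since the mechanism above is deliberately described only
at the level of mail messages between users.

```python
# Sketch of the yield described above: user Sb asks S's news handler to
# read message #XYZ from T1 on the next transfer and queue it as if Sb
# had just submitted it, spending Sb's own place in S's queue.

def yield_position(s_queue, sponsor, remote_id, fetch_from_t1):
    msg = fetch_from_t1(remote_id)     # read #XYZ during the next transfer
    s_queue.append((sponsor, msg))     # queued under the sponsor's name,
    return msg                         # so it counts as Sb's submission

t1_pending = {"XYZ": "T1a's article"}
s_queue = []
yield_position(s_queue, "Sb", "XYZ", t1_pending.get)
print(s_queue)                         # prints [('Sb', "T1a's article")]
```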

Of course, there is nothing to prevent user Sb from then turning
around and making the same request of a user of machine T2.  In this
way it is possible that a message will actually get into the net
because a user on some distant machine has yielded a position in the
queue to the user T1a.  Of course, this will not happen often, but is
a possibility that would allow users to handle the notion that they
would like to increase the ability of certain other users to post to
the net.

Note that this does not require any advanced technology such as
stargate.  This is not to say that stargate-like technology could not
help the net.  Just that volume is not the place where this technology
is needed.  The place where this technology is needed is in handling
the timewarp problem.

I hope that this has answered any questions as to the feasibility of
handling a usenet-like uncensored medium in the absence of increased
communication resources.  Comments are welcome.  Since they can't be
prevented, one accepts the inevitable :-)

--------------------- BOB (webber@aramis.rutgers.edu ; rutgers!aramis!webber)