[news.software.nntp] Can nntp handle a hierarchy of spools?

richv@hpinddu.cup.hp.com (Rich Van Gaasbeck) (02/07/91)

Here is a problem that is probably common to many large news-reading
organizations.  

Scenario A: "Typical".  Say you have a large population of news readers
(10,000 people), a large number of news machines (100) each with some
disk space (200Meg).  This might typically be arranged as 100 machines
running B or C news, with 100 users on each machine.  Lets say 200Meg
holds about 1 Month of news.

Let's say you decide that one month's worth of news is not enough.
You want to keep "local" groups and other groups that your
organization thinks important for several years.  Let's say you
estimate that you will need 2 Gigabytes to store one copy.  Let's also
assume that your organization doesn't want to buy 2 Gigabytes for each
of 100 machines.  Given the resource limitation and your realization
that each machine is storing pretty much the same information, you
come up with a new scheme for your local news network.

Scenario B: "Central".  You gather all the spool disk drives from all
the news machines and put them on one super colossal machine.  Users
read news using an nntp based news reader or log in via telnet to use
a local news reader.

This doesn't work out very well either.  You may not find a machine
fast enough to handle 10,000 users.  If your organization is
world-wide there may be no time when it is night everywhere, so it
will be impossible to take down the system to do backups.  And if the
system goes down, all 10,000 users lose access to their news at once.

Here is the way I would like to see C news/nntp work.  I think it
would be possible to build, but I don't believe it can be done today
(I could be wrong; I suppose I should look at the docs and source).

Scenario C: "Ideal".  Start with Scenario A.  Take away 100 Meg from
each machine and give it to a central machine (for a total of 10
Gigabytes).  Configure the 99 "local" machines to automatically expire
articles that haven't been read recently (optionally just use the
current expire mechanism with parameters to keep it under 100 megs).
Change the nntp daemon to list both local information and information
from the central machine when asked about active newsgroups, headers,
etc.  When asked to retrieve an article it would get it from the
central server if necessary, give it to the user and also store it in
the local spool.  Basically it would act as a giant cache.  To the
news reading programs it would look like a single central machine, but
with the advantages that the central machine would be less busy, could
be located on the far side of a low performance network and the local
machines could continue to allow access to cached articles while the
central machine is down.  Additionally a larger amount of news can be
made available to a greater number of people while using much less
disk space.  Slight variations could also be useful.  If your caching
algorithm has a high hit rate you might be able to get by with a much
smaller spool (5 or 10 Meg).  You might also want to distribute the
central "machine".  For instance one "central" machine might hold
comp.sources.unix, a different one rec.*.
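
To make the cache-on-read idea concrete, here is a minimal sketch of
the lookup such a modified nntpd might perform.  It is written in
Python for brevity, using the standard nntplib module; the spool path
and the central host name are illustrative assumptions, not part of
any existing nntpd:

    import os
    import nntplib

    SPOOL = "/usr/spool/news"        # local cache spool (assumption)
    CENTRAL_HOST = "news-central"    # hypothetical central server

    def fetch_article(group, number):
        """Return an article, consulting the local spool first."""
        path = os.path.join(SPOOL, group.replace(".", "/"), str(number))
        if os.path.exists(path):                   # cache hit
            with open(path, "rb") as f:
                return f.read()
        # Cache miss: fetch from the central server ...
        with nntplib.NNTP(CENTRAL_HOST) as remote:
            remote.group(group)
            _resp, info = remote.article(str(number))
            text = b"\r\n".join(info.lines)
        # ... and populate the local cache on the way out.
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "wb") as f:
            f.write(text)
        return text

Note that this naive sketch keys the cache on the article number; as
the follow-up below points out, article numbers are not preserved
across servers, so a real implementation would have to key on
something stable such as the Message-ID.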

As I mentioned above, I don't think the current C news/nntp
implementation can handle Scenario C.  Several parts are missing.  I
don't think C news can expire articles based on readership patterns.
I also don't think nntpd can keep track of a local spool plus
potentially several remote sources of articles and present them to
the remote news reader as if they all came from the same machine.

I would be interested in hearing any comments on the above.
Specifically, does this seem like a common problem for large
organizations?  Does this mixture of nntp, C news, and caching
concepts look like a good solution to the problem?  Are the nntp or
C news authors doing any work in this area?  What kinds of changes
would be needed to the C news and nntp sources to make Scenario C
work?

Richv
richv%hpinddu.cup.hp.com@hplabs.hpl.hp.com

jerry@olivey.olivetti.com (Jerry Aguirre) (02/08/91)

In article <RICHV.91Feb6175707@hpinddr.cup.hp.com> richv@hpinddu.cup.hp.com (Rich Van Gaasbeck) writes:
>Scenario C: "Ideal".  Start with Scenario A.  Take away 100 Meg from
>each machine and give it to a central machine (for a total of 10
>Gigabytes).  Configure the 99 "local" machines to automatically expire

Another point that would interfere with this scheme is that existing
news transmission does not preserve the article number, and current
news readers depend on it.  Thus if the user is reading along through
articles 98, 99, 100, and article 101 is not available on the local
system, then when the reader goes to the master system it is possible
that the master's article 100 is the same as the local article 98,
and that article 105 is what was actually intended.  This is very
evident if one switches from one news server to another.  Even if two
servers start out aligned, cancel messages alone will guarantee that
they gradually drift apart.
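
A quick way to see this drift is to compare the Message-ID stored at
the same article number on two servers.  A hedged sketch in Python,
using the standard nntplib module, with made-up host names:

    import nntplib

    def same_article(group, number,
                     host_a="news-local", host_b="news-master"):
        """True when both servers hold the same article at a number."""
        with nntplib.NNTP(host_a) as a, nntplib.NNTP(host_b) as b:
            a.group(group)
            b.group(group)
            _r, _n, id_a = a.stat(str(number))
            _r, _n, id_b = b.stat(str(number))
        return id_a == id_b      # False once cancels make them drift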

I have considered writing an "nntpslave" that would function like
nntpxfer, but instead of processing the article through rnews it
would just store it in the appropriate place in the news spool
partition.  (The "Xref" line would come in handy for cross-posted
articles.)  Given that, plus a copy of the master system's active and
history files, one would have a slave server that could be used
transparently with the master or other slaves.  But that does not, by
itself, eliminate the duplicate storage.
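
The Xref trick might look something like this.  A sketch in Python,
assuming the usual one-file-per-article spool layout (the spool path
is illustrative, and no "nntpslave" actually exists yet):

    import os

    SPOOL = "/usr/spool/news"    # assumed spool location

    def spool_paths(xref_line):
        """'Xref: master comp.foo:123 rec.bar:45' -> spool file names."""
        fields = xref_line.split()[2:]  # skip "Xref:" and the host name
        paths = []
        for field in fields:
            group, number = field.rsplit(":", 1)
            paths.append(os.path.join(SPOOL, group.replace(".", "/"),
                                      number))
        return paths

    def store(article_bytes, xref_line):
        """File an article under every group:number the master gave it."""
        first, *rest = spool_paths(xref_line)
        os.makedirs(os.path.dirname(first), exist_ok=True)
        with open(first, "wb") as f:
            f.write(article_bytes)
        for extra in rest:               # cross-posts become hard links,
            os.makedirs(os.path.dirname(extra), exist_ok=True)
            if not os.path.exists(extra):
                os.link(first, extra)    # just as in a C news spool
        return [first] + rest

Because the slave takes its article numbers from the master's Xref
line, the numbering stays aligned and the drift problem above goes
away.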

One could split the storage and servers by news groups with one system
handling alt, another handling comp, etc.  There was a version of the
"vn" news reader that was set up to do that.  It would switch NNTP
servers based on the group being read.  That would distribute the load
and the storage among many systems.  Presumably a master system would
deal with the external world and redistribute to the partial slave
systems.
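
That per-group switch amounts to a small routing table.  A sketch,
with all host names invented for illustration:

    SERVERS = {
        "alt":  "news-alt",      # all host names invented
        "comp": "news-comp",
        "rec":  "news-rec",
    }
    DEFAULT = "news-master"

    def server_for(group):
        """Pick an NNTP server from the group's top-level hierarchy."""
        return SERVERS.get(group.split(".", 1)[0], DEFAULT)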

If one were really spread out across time zones, such as America and
Europe, then I would strongly recommend separate servers for each
geographical area.  As you say, there would never be a good time to
take the system down, and then there is the question of transmitting
an article every time someone reads it rather than once.

If the only problem is handling reading by a large number of local
users, then there are specialized servers that can handle very large
amounts of NFS and disk traffic.

					Jerry Aguirre

gary@proa.sv.dg.com (Gary Bridgewater) (02/09/91)

In article <50331@olivea.atc.olivetti.com> jerry@olivey.olivetti.com (Jerry Aguirre) writes:
>In article <RICHV.91Feb6175707@hpinddr.cup.hp.com> richv@hpinddu.cup.hp.com (Rich Van Gaasbeck) writes:
>>Scenario C: "Ideal".  Start with Scenario A.  Take away 100 Meg from
>>each machine and give it to a central machine (for a total of 10
>>Gigabytes).  Configure the 99 "local" machines to automatically expire
>
>Another point that would interfear with this scheme is that existing
>news transmission does not preserve the article number and current news
>readers depend on that.  Thus if the user is reading along thru article
>98, 99, 100,  and article 101 is not available on the local system then
>when it goes to to the master system it is possible that article 100 will
>be the same as article 98 and article 105 is what was intended.  This is
>very evident if one switches from one news server to another.  Even if
>they start out aligned the cancel messages alone will guarantee that
>they gradually drift apart.
>...

This might be an application for the broadcast packet technology
being experimented with by the TCP/IP folks.
Have multiple NNTP server systems, each holding some portion of the
hierarchy.  As the news is fed in, it is sent out on the net and the
appropriate server wakes up and stores it locally.  Articles
cross-posted across servers would just get saved more than once, but
that could possibly be worked around (or maybe that is a feature).
When the user reads news, broadcast packets for the groups of
interest are sent out and the server(s) responsible would respond.
The client nntp would then pick a server (if more than one responded)
and make a "conventional" connection to transfer the groups.
In fact, if such redundancy were designed in, I expect a lot of sites
would jump on it.
You would never have to worry about losing your ~spool/news partition
again.
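
The discovery step could be as simple as a UDP broadcast that the
responsible server answers.  A sketch in Python; the port number and
the wire format are invented, since no such protocol exists:

    import socket

    DISCOVERY_PORT = 1119    # invented; no such protocol exists

    def find_server(group, timeout=2.0):
        """Broadcast a group name; return the first host that answers."""
        s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        s.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        s.settimeout(timeout)
        try:
            s.sendto(group.encode(), ("<broadcast>", DISCOVERY_PORT))
            _data, (host, _port) = s.recvfrom(512)  # first answer wins
            return host
        except socket.timeout:
            return None                  # no server carries this group
        finally:
            s.close()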
-- 
Gary Bridgewater, Data General Corporation, Sunnyvale California
gary@sv.dg.com or {amdahl,aeras,amdcad}!dgcad!gary
C++ - it's the right thing to do.

kurt@rufus.almaden.ibm.com (Kurt Shoens) (02/12/91)

Rich Van Gaasbeck (richv@hpinddu.cup.hp.com) ...

   Scenario C: "Ideal" ...  Change the nntp daemon to list both local
   information and information from the central machine when asked about
   active newsgroups, headers, etc.  When asked to retrieve an article it
   would get it from the central server if necessary, give it to the user
   and also store it in the local spool.

This looks similar to the Andrew File System (AFS).  Perhaps you could
get the same effect by having your clients AFS mount the news spool.
AFS will cache recently read files locally on the clients.  By such a
scheme, one would have only a single news system (that on the server)
to administer.

You could handle a great many news-reading clients by building a
central server with lots of disk space to hold all the articles, plus
a number of AFS clients connected to the central server that would
each run nntp to handle the news-reading load.  The AFS clients would
tend to cache articles that had appeared in the last few days ....

With respect to expire ... you can easily modify expire to use
different criteria for deleting articles.  The approximate form is:
gather information on all the articles one might potentially delete,
sort that information in order of desirability, and remove as many of
the least desirable articles as necessary to meet some goal (say, a
particular amount of free space).
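
A sketch of that loop in Python, using last-access time as the
desirability measure (which also fits the read-recency idea from the
original post); the spool path and the policy are assumptions:

    import os

    SPOOL = "/usr/spool/news"    # assumption

    def expire(bytes_to_free):
        """Remove least recently read articles until enough is freed."""
        candidates = []
        for dirpath, _dirs, files in os.walk(SPOOL):
            for name in files:
                path = os.path.join(dirpath, name)
                st = os.stat(path)
                candidates.append((st.st_atime, st.st_size, path))
        candidates.sort()        # least recently read sort first
        freed = 0
        for _atime, size, path in candidates:
            if freed >= bytes_to_free:
                break
            os.unlink(path)
            freed += size
        return freed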

Now, if you had a few years of netnews online, it might be useful to
reengineer the storage structure of the history file.  As it currently
stands, the history file has to be read and written completely each time
you expire.
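
One plausible restructuring is a keyed database, so expire can delete
entries in place instead of rewriting the whole flat file.  A sketch
using Python's dbm module; the file name is an assumption:

    import dbm

    def open_history(path="/usr/lib/news/history.db"):
        return dbm.open(path, "c")

    def remember(db, message_id, entry):
        db[message_id] = entry       # e.g. "<date> comp.foo/123"

    def forget(db, message_id):
        if message_id in db:         # delete in place, no full rewrite
            del db[message_id]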
--
Kurt Shoens