[news.sysadmin] Netnews redesign

rbraun@spdcc.COM (Rich Braun) (05/07/91)

Years ago, I wrote a bulletin board program for the DEC-10 which stored
what amounted to a newsgroup in a single file, with an index table.  Each
message was basically a record in a database, containing fields like
From, Date, and so on, with a variable-length field containing the
text.

This provided not only efficient file storage in terms of space, but also
made retrieval and search operations extremely fast.  (By caching a
newsgroup's index table in memory, retrieval of any given message could
be done in 0 or 1 seeks of the disk.)
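
A rough sketch of the layout described above, not the original DEC-10
code: one file per newsgroup, plus an index table held in memory that
maps article numbers to byte offsets. The class name GroupFile, the
4-byte length prefix, and the header layout are all assumptions made
for illustration.

```python
import io
import struct


class GroupFile:
    """One newsgroup kept in a single file, with an in-memory index table."""

    # Assumed on-disk layout: a 4-byte little-endian length, then the record.
    HEADER = struct.Struct("<I")

    def __init__(self, fh):
        self.fh = fh
        self.index = {}  # article number -> (offset of record body, length)

    def append(self, number, sender, date, text):
        # Fixed fields first, then a blank line, then the variable-length text.
        record = "\n".join(
            [f"From: {sender}", f"Date: {date}", "", text]
        ).encode("utf-8")
        self.fh.seek(0, io.SEEK_END)
        offset = self.fh.tell()
        self.fh.write(self.HEADER.pack(len(record)))
        self.fh.write(record)
        self.index[number] = (offset + self.HEADER.size, len(record))

    def fetch(self, number):
        offset, length = self.index[number]  # memory lookup, no disk access
        self.fh.seek(offset)                 # at most one seek per message
        return self.fh.read(length).decode("utf-8")


group = GroupFile(io.BytesIO())
group.append(1, "rbraun@spdcc.COM", "05/07/91", "Netnews redesign")
group.append(2, "nelson@sun.soe.clarkson.edu", "05/08/91", "Hide it behind NNTP")
print(group.fetch(2))
```

A real store would also have to write the index table to disk and
rebuild it after a crash; the sketch leaves both out.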

Alas, this code was written in assembly language and is no longer useful.
I've mentioned to a few acquaintances the possibility of creating a new
news system (or modifying the existing one) which stored newsgroups as
databases rather than as directories of individual files, and the
common response is "I'd lose the use of all my familiar Unix tools",
probably just meaning that slow-as-molasses 'grep' wouldn't work.

Is there a netnews development project anywhere with which I could
correspond on this type of design issue?  I've toyed with the idea of
creating a complete communications package, at the heart of which would
be an improved netnews system.  But if someone else is working on one
I'd desperately love to talk to that person.  My other interest in this,
naturally, is to create a completely portable (and free) system which
could run on systems like DOS, OS/2, Macintosh, and VAX/VMS.  Netnews
is simply too valuable to be relegated only to the Unix world.

-rich

nelson@sun.soe.clarkson.edu (Russ Nelson) (05/08/91)

   Is there a netnews development project anywhere with which I could
   correspond on this type of design issue?

Not that I know of.  But I've proposed a similar idea to yours.  Hide
the database behind NNTP.  The draft for a revised NNTP even has a
"grep" call, so you wouldn't need to do that by hand...

Expire is the problem, as always.  I've thought about doing it using "buckets".
Each bucket would be labelled "expire in 1 day", "expire in 2 days",
"expire in 3 days", etc.  The problem with that is that you'd really like
to expire as the disk gets full, rather than trying to express things
in terms of days.

Perhaps a better algorithm would be to keep articles in "chunks" with
some maximum size, say 100K.  These chunks would be queued up for
expiration.  Each chunk would have a "velocity" parameter that
dictated its speed through the queue.  talk.* could have a velocity of
20, comp.* could have a velocity of 10, and comp.archives a velocity
of 0 (Hi, Ed!).  When it came time to find space for new news, the
chunk with the greatest displacement would be deleted, and its database
entries removed.
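
One possible reading of the chunk queue, as a sketch only. The post
does not define displacement, so the code assumes displacement =
velocity * days spent in the queue. It also assumes that a chunk with
velocity 0 is never chosen. The names Chunk and free_space are
hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    name: str
    velocity: int   # e.g. 20 for talk.*, 10 for comp.*, 0 for comp.archives
    created: int    # day the chunk entered the queue (assumed unit: days)
    size: int = 0   # bytes of articles stored in the chunk

    def displacement(self, today):
        # Assumption: displacement grows linearly with time in the queue.
        return self.velocity * (today - self.created)


def free_space(chunks, needed, today):
    """Delete chunks, greatest displacement first, until `needed` bytes are freed."""
    candidates = sorted(
        (c for c in chunks if c.velocity > 0),  # velocity 0 never expires
        key=lambda c: c.displacement(today),
        reverse=True,
    )
    victims, freed = [], 0
    for chunk in candidates:
        if freed >= needed:
            break
        victims.append(chunk)
        freed += chunk.size
    for chunk in victims:
        chunks.remove(chunk)  # a real system would also drop its database entries
    return victims


queue = [
    Chunk("talk.1", 20, 0, 100_000),
    Chunk("comp.1", 10, 0, 100_000),
    Chunk("comp.archives.1", 0, 0, 100_000),
]
print([c.name for c in free_space(queue, 100_000, today=10)])
```

Because space is reclaimed only when it is needed, this expires news
as the disk fills up rather than on a fixed schedule in days.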

An algorithm would be needed to balance the chunk size versus its displacement.
Once a chunk has reached a certain displacement, it shouldn't be added to.
Perhaps 10% of the highest displacement?
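
The cutoff could be sketched like this. The function name
accepts_new_articles is hypothetical, the 10% figure and the 100K
limit are the tentative numbers from the text, and the handling of an
empty queue is an assumption.

```python
def accepts_new_articles(displacement, highest, size,
                         max_size=100_000, fraction=0.10):
    """Decide whether a chunk may still receive new articles.

    displacement: this chunk's current displacement
    highest:      the greatest displacement of any chunk in the queue
    size:         bytes already stored in this chunk
    """
    if size >= max_size:
        return False   # chunk is full
    if highest <= 0:
        return True    # assumption: nothing has moved yet, so stay open
    return displacement < fraction * highest


print(accepts_new_articles(displacement=5, highest=200, size=40_000))
print(accepts_new_articles(displacement=50, highest=200, size=40_000))
```

A chunk that fails the check is sealed and a fresh one is started, so
articles of very different ages do not end up expiring together.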

--
--russ <nelson@clutx.clarkson.edu> I'm proud to be a humble Quaker.
It's better to get mugged than to live a life of fear -- Freeman Dyson
I joined the League for Programming Freedom, and I hope you'll join too.