[news.software.b] New News

amanda@mermaid.intercon.com (Amanda Walker) (12/06/89)

In article <1989Dec6.031329.13569@utzoo.uucp>, henry@utzoo.uucp (Henry Spencer)
writes:
> Be careful not to throw the baby out with the bathwater.  [...]
> The Unix file system is not a bad way to organize a database.

This is true as far as it goes.  One thing I should probably mention is that
I'm in the position of thinking about and implementing news-style software
on non-UNIX machines, and so I tend to look twice at a lot of things such
as the organization of the article database, the active & history files,
and so on.  The article was tossed off the top of my head--it wasn't meant
as a "hit list" :-).

On the other hand, it's also possible to argue that the current newsgroup
structures and so forth are in part artifacts of using a UNIX directory
tree as a database...

I don't want to solve problems that aren't there, but I also think that it's
worth taking a second look at news on an overall level, as opposed to doing
a new implementation within existing constraints.  I also don't want to give
the impression that I think that such efforts aren't valuable, either.  For
example, I've been quite impressed by things in both Cnews and TMNN.  They
are, in fact, part of what got me started on this line of thinking.

Amanda Walker
InterCon Systems Corporation
amanda@mermaid.intercon.com
--

brad@looking.on.ca (Brad Templeton) (12/07/89)

There is one useful concept that the current structure makes difficult.

I see three classes of article in a newsgroup.

The first class I would call "permanent articles" -- articles that are expected
to stay around forever, but might be updated in place.  These would be things
like "commonly asked questions" and "about this newsgroup" etc.

The second class would be semi-permanent articles.  These would be the
roots of popular, recurring discussions, such as "Blade Runner" in
rec.arts.sf-lovers.

The third class is ordinary ephemeral articles like we have today.

To support these classes, particularly the semi-permanent ones, the current
system would require that very low message numbers be kept around.

That mucks things up a lot, since it would make the first number in the
active file a waste of time, and require lots of work to check for articles
that have gone.

For the permanent articles, the concept of article numbers outside the
regular space would be fine -- negative article numbers, for example.

But to do the semi-permanents, there is no easy solution, unless we can
somehow move their article numbers.  But that doesn't sit well with current
schemes, although once a semi-permanent article reaches the expiry date for
an ephemeral article, it could probably be moved without trouble.

But such solutions would be kludges, and it would be nice to design a system
that supports this from the ground up.
-- 
Brad Templeton, ClariNet Communications Corp. -- Waterloo, Ontario 519/884-7473
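
The three classes Brad describes can be sketched as a small data model.
This is only an illustration under assumptions: the names ArticleClass,
Article, expire and lowest_ephemeral are hypothetical, and the negative
number for the permanent article simply follows his suggestion.  It shows
why the retained low-numbered root (12) sits far below the lowest live
ephemeral article (3100), which is the gap that makes the first number in
the active file unhelpful.

```python
from dataclasses import dataclass
from enum import Enum


class ArticleClass(Enum):
    PERMANENT = "permanent"            # FAQs, "about this newsgroup"
    SEMI_PERMANENT = "semi-permanent"  # roots of recurring discussions
    EPHEMERAL = "ephemeral"            # ordinary articles, as today


@dataclass
class Article:
    number: int        # assumption: negative numbers mark permanent articles
    subject: str
    cls: ArticleClass
    age_days: int


def expire(articles, max_age_days):
    """Drop only ephemeral articles older than max_age_days."""
    return [
        a for a in articles
        if a.cls is not ArticleClass.EPHEMERAL or a.age_days <= max_age_days
    ]


def lowest_ephemeral(articles):
    """Lowest number among ordinary articles, or None if there are none."""
    nums = [a.number for a in articles if a.cls is ArticleClass.EPHEMERAL]
    return min(nums) if nums else None


group = [
    Article(-1, "Commonly asked questions", ArticleClass.PERMANENT, 400),
    Article(12, "Blade Runner", ArticleClass.SEMI_PERMANENT, 90),
    Article(3050, "Re: Blade Runner", ArticleClass.EPHEMERAL, 30),
    Article(3100, "New paperback releases", ArticleClass.EPHEMERAL, 2),
]
kept = expire(group, max_age_days=14)
```

After expiry the group holds articles -1, 12 and 3100, so a reader scanning
upward from 12 would have to probe thousands of numbers that no longer exist.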

jef@well.UUCP (Jef Poskanzer) (12/07/89)

A couple of years ago I designed a SQL database for news.  It supported
all the operations required by then-available news transports and readers,
and of course it allowed all sorts of new operations.  Retrofitting all
of said transports and readers to use it was too big a job to interest me
then, and it would be an even bigger job now.  But it seems to me that if
you're really interested in re-designing news from scratch, a good place
to start would be the *interface* between the underlying database and
the transports and readers.  If this interface was nicely specified, then
providing alternate implementations (SQL vs. Unix filesystem vs. MS-DOS
filesystem vs. whatever) would suddenly become a hell of a lot easier.
---
Jef

  Jef Poskanzer  jef@well.sf.ca.us  {ucbvax, apple, hplabs}!well!jef
                          One size fits all.
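
A minimal sketch of the kind of interface Jef suggests specifying, with one
trivial implementation behind it.  Nothing here comes from his SQL design:
ArticleStore, MemoryStore and the four operations are assumptions chosen for
illustration.  The point is only that a transport or reader written against
the abstract class would not care whether SQL or a directory tree sat
underneath.

```python
from abc import ABC, abstractmethod


class ArticleStore(ABC):
    """Hypothetical storage interface shared by transports and readers."""

    @abstractmethod
    def add(self, message_id, groups, text):
        """Store an article; return False if message_id is already known."""

    @abstractmethod
    def get(self, message_id):
        """Return the article text, or None if it is not stored."""

    @abstractmethod
    def list_group(self, group):
        """Return message-ids filed under a newsgroup, in arrival order."""

    @abstractmethod
    def remove(self, message_id):
        """Delete an article (expiry or cancel)."""


class MemoryStore(ArticleStore):
    """Dictionary-backed implementation, standing in for SQL or a filesystem."""

    def __init__(self):
        self._text = {}
        self._groups = {}

    def add(self, message_id, groups, text):
        if message_id in self._text:
            return False
        self._text[message_id] = text
        for g in groups:
            self._groups.setdefault(g, []).append(message_id)
        return True

    def get(self, message_id):
        return self._text.get(message_id)

    def list_group(self, group):
        return list(self._groups.get(group, []))

    def remove(self, message_id):
        self._text.pop(message_id, None)
        for ids in self._groups.values():
            if message_id in ids:
                ids.remove(message_id)


store = MemoryStore()
store.add("<1@example>", ["news.software.b"], "first")
dup = store.add("<1@example>", ["news.software.b"], "first again")
```

A second implementation would subclass ArticleStore in the same way, and
callers would be handed whichever store the site had configured.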

amanda@mermaid.intercon.com (Amanda Walker) (12/08/89)

In article <14853@well.UUCP>, jef@well.UUCP (Jef Poskanzer) writes:
> ... it seems to me that if
> you're really interested in re-designing news from scratch, a good place
> to start would be the *interface* between the underlying database and
> the transports and readers.  If this interface was nicely specified, then
> providing alternate implementations (SQL vs. Unix filesystem vs. MS-DOS
> filesystem vs. whatever) would suddenly become a hell of a lot easier.

This is part of what I'm interested in talking and thinking about.  SQL is
a fairly obvious thing to look at, although I haven't gone into as much
detail in my napkin-sketches as you evidently have.

Another part of what I want to do is to "design first, then code."  A fair
amount of the current state of news (especially B news & rn) seems to have
been approached with an attitude of "well, now that we have these articles,
what can we do with them?".  Now, for adding function
to an existing system which is effectively immutable, this is the only way
to do it.  Software like 'rn', 'newsclip' (apologies if you want weird
capitalization, Brad :-)), and NNTP show that you can do a lot with this,
but I think it's worth thinking about the actual operations before picking
an interface & representation.  I mean, what kinds of things DO we want to
do with articles anyway?  Brad's stuff about different classes of articles
is an excellent example.  To take it even further, though, why use article
sequence numbers at all?  Toss the active & history files, and have that
information be part of the article database itself...

Anybody know how Andrew deals with message bases?  or Notes?

--Amanda
--
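
One way to picture Amanda's suggestion of dropping sequence numbers and
folding the active and history files into the article database is sketched
below using Python's built-in sqlite3 module.  The schema, the table names
(article, filing) and the functions receive, expire and active are all
assumptions made for illustration, not a description of any existing news
software.  Articles are keyed by message-id, an expired article keeps its
row with an empty body so that duplicates are still rejected, and the
per-group summary is computed by a query rather than stored in a file.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE article (
    message_id TEXT PRIMARY KEY,
    arrived    INTEGER NOT NULL,
    body       TEXT               -- NULL after expiry, row kept as history
);
CREATE TABLE filing (
    message_id TEXT NOT NULL REFERENCES article(message_id),
    newsgroup  TEXT NOT NULL,
    PRIMARY KEY (message_id, newsgroup)
);
""")


def receive(message_id, groups, arrived, body):
    """Insert an article; return False for a duplicate message-id."""
    try:
        with db:  # commits on success, rolls back on error
            db.execute("INSERT INTO article VALUES (?, ?, ?)",
                       (message_id, arrived, body))
            db.executemany("INSERT INTO filing VALUES (?, ?)",
                           [(message_id, g) for g in groups])
    except sqlite3.IntegrityError:
        return False
    return True


def expire(before):
    """Blank the body of articles that arrived before the given time."""
    with db:
        db.execute("UPDATE article SET body = NULL WHERE arrived < ?",
                   (before,))


def active():
    """Per-group count of unexpired articles, derived on demand."""
    return db.execute("""
        SELECT f.newsgroup, COUNT(*)
        FROM filing f JOIN article a USING (message_id)
        WHERE a.body IS NOT NULL
        GROUP BY f.newsgroup
        ORDER BY f.newsgroup
    """).fetchall()


first = receive("<a@x>", ["comp.misc", "news.misc"], 100, "hello")
receive("<b@x>", ["comp.misc"], 200, "world")
dup = receive("<a@x>", ["comp.misc"], 300, "hello again")
before = active()
expire(150)
after = active()
```

Here the duplicate check that the history file performs becomes a primary-key
constraint, and no article number is ever assigned.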

rsalz@bbn.com (Rich Salz) (12/08/89)

I think that one particularly interesting problem about netnews is that it
is such a dynamic database -- a megabyte a day of turnover.  Very few text
retrieval and hypertext systems are cut out for that kind of thing.

"Design first, then write" is nice, but at some point you wanna release
something to users.  Before people got fancy and called it "rapid
prototyping" it used to be called iteration, and it was a cornerstone of
the techniques used by the first set of Unix authors.
	/r$
-- 
Please send comp.sources.unix-related mail to rsalz@uunet.uu.net.
Use a domain-based address or give alternate paths, or you may lose out.

jmr@jrowan.austin.ibm.com (Jim Rowan/100000) (12/08/89)

In article <1610@intercon.com> amanda@mermaid.intercon.com (Amanda Walker) writes:
>
>Anybody know how Andrew deals with message bases?  or Notes?
>
>--Amanda
>--

Notes keeps each topic (== newsgroup) in a set of three files.
If I remember correctly, there's one which holds the text of all of the
notes (== articles), an index for base notes, and an index for responses.


-- 
Jim Rowan 	(My ravings are my own, and don't belong to my employer.)
		cs.utexas.edu!ibmchs!jrowan (outside the wall)
		or jmr@jrowan.austin.ibm.com (inside the wall)

jef@well.UUCP (Jef Poskanzer) (12/08/89)

In the referenced message, rsalz@bbn.com (Rich Salz) wrote:
}I think that one particularly interesting problem about netnews is that it
}is such a dynamic database -- a megabyte a day of turnover.  Very few text
}retrieval and hypertext systems are cut out for that kind of thing.

I thought it was more like three megabytes a day now.  Even that is only
34 bytes per second.  A commercial database such as Sybase (where I was
working when I designed the SQL news database) has no problem handling data
rates hundreds of times higher.  The indexing speed (and/or space) is more
important than the turnover rate.
---
Jef

  Jef Poskanzer  jef@well.sf.ca.us  {ucbvax, apple, hplabs}!well!jef
 "It says he made us all to be just like him.  So if we're dumb, then
god is dumb, and maybe even a little ugly on the side." -- Frank Zappa
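
Jef's figure of 34 bytes per second can be checked with a few lines of
arithmetic.  The sketch assumes a megabyte of 1,000,000 bytes and traffic
spread evenly over the day; the function name bytes_per_second is made up
for the example.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def bytes_per_second(megabytes_per_day, bytes_per_megabyte=1_000_000):
    """Average data rate for a given daily volume."""
    return megabytes_per_day * bytes_per_megabyte / SECONDS_PER_DAY


rate_3mb = bytes_per_second(3)  # Jef's estimate of three megabytes a day
rate_1mb = bytes_per_second(1)  # Rich's estimate of one megabyte a day
```

Three megabytes a day works out to a little under 35 bytes per second, and
one megabyte a day to a little under 12, so either estimate is tiny as a
sustained write rate.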

nelson@sun.soe.clarkson.edu (Russ Nelson) (12/08/89)

In article <2203@prune.bbn.com> rsalz@bbn.com (Rich Salz) writes:

   I think that one particularly interesting problem about netnews is that it
   is such a dynamic database -- a megabyte a day of turnover.

But is that good or bad?  We've all seen the same old subjects get rehashed
and rehashed.  The nice thing about Usenet is that if you miss a topic of
discussion, wait.  It will be discussed again.  Perhaps people would be
more careful about what they said if they knew it was going to be around
permanently?
--
--russ (nelson@clutx [.bitnet | .clarkson.edu])
Live up to the light thou hast, and more will be granted thee.
A recession now appears more than 2 years away -- John D. Mathon, 4 Oct 1989.
I think killing is value-neutral in and of itself. -- Gary Strand, 8 Nov 1989.
Liberals run this country, by and large. -- Clayton Cramer, 20 Nov 1989.
Shut up and mind your Canadian business, you meddlesome foreigner. -- TK, 23 N.

rsalz@bbn.com (Rich Salz) (12/08/89)

I said that the news turnover will be a problem.  I wasn't clear about
what I meant because in <14863@well.UUCP> Jef Poskanzer <jef@well.sf.ca.us>
says that:
>I thought it was more like three megabytes a day now....
>A commercial database such as Sybase ...  has no problem handling data
>rates hundreds of times higher.  The indexing speed (and/or space) is more
>important than the turnover rate.

The problem isn't necessarily with indexing or feeding data into the
system.  The problem is with retrieval and display.  When I come back
to a newsgroup after a couple of days, I don't want to have to rummage
around all those hypertextish links again, just to establish my context.

The hypertext systems I've seen are oriented for the case where all the
data is "old news," and not where some is old and some is new.  I think
THAT is the issue which creates many interesting problems.

Is this a better explanation?
	/r$
-- 
Please send comp.sources.unix-related mail to rsalz@uunet.uu.net.
Use a domain-based address or give alternate paths, or you may lose out.
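
The retrieval problem Rich describes, telling new from old and recovering
context on return, can be illustrated with a small sketch.  It is an
assumption-laden toy, not a description of any reader: the function
unread_with_context, the parents mapping (article id to parent id) and the
seen set are all hypothetical, and the reply structure is assumed to be a
tree with no cycles.

```python
def unread_with_context(parents, seen):
    """Map each unread article to its chain of ancestors, root first.

    parents: dict of article id -> parent id (None for a thread root)
    seen:    set of article ids the reader has already read
    """
    result = {}
    for article_id, parent in parents.items():
        if article_id in seen:
            continue
        chain = []
        while parent is not None:
            chain.append(parent)
            # An ancestor missing from the mapping (for example, expired)
            # ends the walk.
            parent = parents.get(parent)
        result[article_id] = chain[::-1]
    return result


parents = {"root": None, "r1": "root", "r2": "r1", "other": None}
seen = {"root", "r1", "other"}
todo = unread_with_context(parents, seen)
```

With this state the reader would be shown only r2, together with the path
root, r1 that leads to it, instead of being asked to walk the links again.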