[news.sysadmin] Why are news articles separate files?

lenny@icus.UUCP (Lenny Tropiano) (08/25/88)

Excuse me if this has been discussed before... but with new standards
in Netnews software being developed (e.g. News 3.0 and C News), why is
each and every news article kept in a separate file within separate
directories?  Does this have to do with some sort of RFC standard that
was proposed way back when the news software first hit the net?

Wouldn't a database solution be more apropos?  For example, store each
article, once unbatched, in a database (possibly within the directory
structure)?  This would eliminate the problem of i-nodes decreasing to
nothingness.  Granted, there are some things that would be slowed down
(e.g. expires).

The databases can be keyed by something, maybe the Article#+<Message-ID:>
or some other key.  Maybe this is totally off the wall... but several people
asked me "why?" and I couldn't really find a *good* answer to their question.
Maybe the problem is that you can't please everyone with the choice of DBM
software...  Maybe you need to create your own dbm libraries...  Or make it a
compile-time option whether to use dbm or not?!
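
To make the idea concrete, here is a minimal sketch of a keyed article
store.  It is only an illustration, not code from any news package: it
assumes Python's portable dbm.dumb backend and keys on the Message-ID
alone, and the function names (open_store, store_article, fetch_article)
are made up for the example.

```python
import dbm.dumb


def open_store(path):
    """Open (or create) the article database at the given path prefix."""
    # dbm.dumb keeps everything in a few files sharing this prefix,
    # however many articles are stored.
    return dbm.dumb.open(path, "c")


def store_article(db, message_id, text):
    """Store one article under its Message-ID key."""
    db[message_id.encode("utf-8")] = text.encode("utf-8")


def fetch_article(db, message_id):
    """Return the article text for a Message-ID, or None if absent."""
    raw = db.get(message_id.encode("utf-8"))
    return None if raw is None else raw.decode("utf-8")
```

All articles end up in a handful of files, which is where the i-node
saving would come from.  Expiring an article becomes a key deletion
rather than an unlink, which is one of the places the cost moves to.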

E-mail your responses to me and I'll be glad to summarize...

-Lenny
-- 
Paper-net: Lenny Tropiano          | @-net:         lenny@icus.UUCP
           ICUS Software Systems   | !-net:      ...sbcs   \
           PO Box 1                |                boulder \
           Islip Terrace, NY 11752 |                talcott  !icus!lenny
Vocal-net: (516) 582-5525 [work]   |                pacbell /
           (516) 968-8576 [home]   |                hombre /
Telex-net: 154232428 ICUS          | Another-net:   attmail!icus!lenny

henry@utzoo.uucp (Henry Spencer) (08/26/88)

In article <471@icus.UUCP> lenny@icus.UUCP (Lenny Tropiano) writes:
>Excuse me if this has been discussed before... but with new standards
>in Netnews software being developed (e.g. News 3.0 and C News), why is
>each and every news article kept in a separate file within separate
>directories?

The simple answer is "compatibility".  In the case of the C News crew,
we really didn't have a choice, since we didn't plan to rewrite all the
news readers.  3.0 has been a bit more ambitious, but even there it's
a substantial win if old news readers continue to work, since there are
several of them and it's a lot of work to replace them all.

In truth, we thought about the matter at some length beforehand, and
basically decided that we couldn't think of any new way that would be
*enough* better to justify it.

>Wouldn't a database solution be more apropos?  For example, store each
>article, once unbatched, in a database (possibly within the directory
>structure)?  This would eliminate the problem of i-nodes decreasing to
>nothingness.  Granted, there are some things that would be slowed down
>(e.g. expires).

Aside from inode conservation, exactly what is the win in this?  We could
not see any in particular.  Our solution to the inode problem is to have
plenty of inodes -- they are not expensive.  Performance was THE big issue
with us, and the 3.0 crew aren't ignoring it either.

The existing scheme, although arguably crude, has a lot going for it.  It
is simple.  It is robust.  It is amenable to manipulation by the standard
Unix tools, instead of requiring a whole new set of its own.  It is fairly
efficient for the sorts of things that are done often.  These are important
advantages.
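
As an illustration of how little machinery the one-file-per-article
layout needs, here is a rough sketch that locates and counts articles
with nothing but ordinary file operations.  It assumes the conventional
spool layout (dots in the group name become directory levels, and each
article is a file named by its article number); the helper names are
hypothetical.

```python
import os


def group_dir(spool_dir, newsgroup):
    """Directory holding one group: dots in the name become path levels."""
    return os.path.join(spool_dir, *newsgroup.split("."))


def article_path(spool_dir, newsgroup, article_number):
    """Map (newsgroup, article number) to a file path in the spool tree."""
    return os.path.join(group_dir(spool_dir, newsgroup), str(article_number))


def count_articles(spool_dir, newsgroup):
    """Count the article files (numeric names only) in one group."""
    directory = group_dir(spool_dir, newsgroup)
    return sum(
        1
        for name in os.listdir(directory)
        if name.isdigit() and os.path.isfile(os.path.join(directory, name))
    )
```

The same questions can be answered from the shell with ls, find and
grep, which is the point: no news-specific tools are needed.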
-- 
Intel CPUs are not defective,  |     Henry Spencer at U of Toronto Zoology
they just act that way.        | uunet!attcan!utzoo!henry henry@zoo.toronto.edu

sow@eru.mt.luth.se (Sven-Ove Westberg) (08/28/88)

In article <1988Aug26.160040.22326@utzoo.uucp> henry@utzoo.uucp (Henry Spencer) writes:
|In article <471@icus.UUCP> lenny@icus.UUCP (Lenny Tropiano) writes:
|
|Aside from inode conservation, exactly what is the win in this?  We could
|not see any in particular.  Our solution to the inode problem is to have
|plenty of inodes -- they are not expensive.  Performance was THE big issue
|with us, and the 3.0 crew aren't ignoring it either.

I don't agree with you. The new discs have a lot of heads and sectors
on each cylinder. I recently tried to create a news partition on a disc
and I couldn't get as many inodes as I wanted. Mkfs on a Sun says that the
number of cyl/group should be a multiple of 8, and there is a limit of
at most 2048 inodes/cyl group. So inodes, not disk space, will be the
limiting factor in the future.
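
To put rough numbers on that limit, taking the two mkfs constraints
reported above at face value: the size of a cylinder group divided by
2048 is the smallest average file size at which the partition runs out
of blocks before it runs out of inodes.  The sketch below just does that
arithmetic; the function name and the default 512-byte sector size are
assumptions for the example.

```python
def min_bytes_per_inode(cyl_per_group, heads, sectors_per_track,
                        sector_size=512, max_inodes_per_group=2048):
    """Bytes of disk per inode when a cylinder group is at its inode cap."""
    # Total bytes in one cylinder group for the given disc geometry.
    group_bytes = cyl_per_group * heads * sectors_per_track * sector_size
    # With the inode count capped, each inode has to cover this many bytes.
    return group_bytes / max_inodes_per_group
```

For a made-up geometry of 16 heads and 64 sectors per track, 8 cylinders
per group works out to 2048 bytes per inode, and 16 cylinders per group
to 4096.  If the average article is smaller than that figure, the inodes
are used up while there is still free space.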

I don't mean that 3.0 should work with a database. This must be
fixed in the operating system.

Sven-Ove Westberg, CAD, University of Lulea, S-951 87 Lulea, Sweden.
Internet: sow@cad.luth.se

eric@snark.UUCP (Eric S. Raymond) (08/30/88)

In article <1988Aug26.160040.22326@utzoo.uucp> Henry Spencer writes:
> In article <471@icus.UUCP> lenny@icus.UUCP (Lenny Tropiano) writes:
> >why is each and every news article kept in a separate file within separate
> >directories?
> 
> The simple answer is "compatibility".  In the case of the C News crew,
> we really didn't have a choice, since we didn't plan to rewrite all the
> news readers.  3.0 has been a bit more ambitious, but even there it's
> a substantial win if old news readers continue to work, since there are
> several of them and it's a lot of work to replace them all.

Actually, I *have* replaced all the readers with upward-compatible rewrites,
and added three special-purpose new ones. But all the changes retain news
database compatibility with B2.11. This is not to dispute Henry's point,
just to clarify it. Having old readers continue to work is good.

> In truth, we thought about the matter at some length beforehand, and
> basically decided that we couldn't think of any new way that would be
> *enough* better to justify it.

Ditto.

> >Wouldn't a database solution be more apropos? 
> 
> Aside from inode conservation, exactly what is the win in this?  We could
> not see any in particular.  Our solution to the inode problem is to have
> plenty of inodes -- they are not expensive.

Ditto again.

>                                              Performance was THE big issue
> with us, and the 3.0 crew aren't ignoring it either.

Correct. I know B3.0 ain't quite the screaming hot-rod C news is reputed
to be, but it's up there; last time I checked our profile figures against C
news's published ones there was maybe 15% or so difference. 3.0's goals are
different -- more focused on maintainability, ease of administration, better
reader interfaces and providing a migration path to full distributed
hypertext.

> The existing scheme, although arguably crude, has a lot going for it.  It
> is simple.  It is robust.  It is amenable to manipulation by the standard
> Unix tools, instead of requiring a whole new set of its own.  It is fairly
> efficient for the sorts of things that are done often.  These are important
> advantages.

100% agreement that these are the right reasons for keeping the format as it
is. In fact, my one experimental change in article tree format turned out to
be premature optimization, and I've removed it.

One thing I am building in right now is code to parse mailboxes as though they
were pseudo-newsgroups. This has involved rigorously separating the get-article
primitive from the reader 'session' and 'presentation' layers above it and
the database or nntp-access layer below it (the analogy with an OSI stack is
intentional, the service libraries really are layered kind of that way).

So if you want to experiment with a database representation, snarf the beta and
do it. You'll only have to change one module each in the reader libraries and
rnews.
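
The kind of layering described above might look roughly like the sketch
below.  This is not the 3.0 code: every class and function name here is
hypothetical, and an in-memory dictionary stands in for a real database
backend.  The point is only that the reader-level helper calls
get_article() and never learns how the article is stored.

```python
import os
from abc import ABC, abstractmethod


class ArticleStore(ABC):
    """Storage layer: the only thing upper layers may call is get_article."""

    @abstractmethod
    def get_article(self, newsgroup, number):
        """Return the article text, or None if there is no such article."""


class SpoolStore(ArticleStore):
    """One file per article, in a directory tree named after the group."""

    def __init__(self, spool_dir):
        self.spool_dir = spool_dir

    def get_article(self, newsgroup, number):
        path = os.path.join(self.spool_dir, *newsgroup.split("."), str(number))
        try:
            with open(path, encoding="utf-8", errors="replace") as f:
                return f.read()
        except FileNotFoundError:
            return None


class MemoryStore(ArticleStore):
    """Stand-in for a database backend, keyed by (newsgroup, number)."""

    def __init__(self):
        self._articles = {}

    def put_article(self, newsgroup, number, text):
        self._articles[(newsgroup, number)] = text

    def get_article(self, newsgroup, number):
        return self._articles.get((newsgroup, number))


def first_line(store, newsgroup, number):
    """Reader-layer helper that depends only on get_article()."""
    text = store.get_article(newsgroup, number)
    return None if text is None else text.split("\n", 1)[0]
```

Swapping the representation then means writing one more ArticleStore
subclass and leaving everything above it alone.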

-- 
      Eric S. Raymond                     (the mad mastermind of TMN-Netnews)
      UUCP: ...!{uunet,att,rutgers}!snark!eric = eric@snark.UUCP
      Post: 22 S. Warren Avenue, Malvern, PA 19355      Phone: (215)-296-5718

okamoto@hpccc.HP.COM (Jeff Okamoto) (09/09/88)

cball@ishmael writes:

> Since no one else has mentioned it, there is a system that keeps "news"
> in a database.  It is called notes.  Notes is an independent development
> that was derived from the PLATO system originated by CDC.  Notes was
> written by Ray Essick at the University of Illinois.  The latest full
> release I've seen was 1.7 and was released in early 1985.

Notes is actively used here at HP.  We are up to release 2.8.2.

Everything else that was mentioned is true.

Jeff Okamoto
HP Corporate Computing Center
(415) 857-6236